Neo4j: Features, Install, Advantage & More

Mrinal Singh Last Updated : 06 Feb, 2024

9 min read

Today, organizations increasingly harness graph analytics to glean insights from vast, intricate data sets. Neo4j, a leading graph database, empowers developers and data scientists with potent tools for building intelligent applications and workflows. This guide aims to aid beginners in Neo4j, from fundamentals, install neo4j to practical concepts. It defines key terms and provides a code example for Neo4j installation and setup. Prior knowledge of Database Systems and Graph Theory is recommended.

This article was published as a part of the Data Science Blogathon.

What is a Neo4j / Graph Database?
Why Graph Database or Neo4j?
Relational Database vs. Graph Database
Advantages Of Neo4j
Features Of Neo4j
Neo4j Property Graph Data Model
How To Install Neo4j On Ubuntu?
- Prerequisites
- Install Neo4j
Start Neo4j Service
Testing and Working With Neo4j
Neo4j CQL: Clauses, Functions, Datatypes, and Operators
Conclusion
Frequently Asked Questions

What is a Neo4j / Graph Database?

Neo4j is a highly scalable graph database management system, purpose-built for storing and traversing relationships in computing. It replaces tables or documents with nodes and relationships in a graph structure for semantic queries.

Key Points

Node-Based Data: Neo4j stores data entities (e.g., users, companies) as nodes.
Relationship Clarity: Edges represent relationships, making data relationships easily understandable.
Flexibility: Data is stored without a predefined model, enabling adaptable analysis.
Relationship Variety: Nodes can have multiple relationships, while edges have start and end nodes, type, and direction.
Common Use Cases: Organizations use Neo4j to extract precise insights from complex queries involving customer and user data.”

Why Graph Database or Neo4j?

In today’s rapidly evolving tech landscape, companies grapple with vast data volumes. Extracting insights and identifying connections within data are paramount. To address this, companies need a database technology that treats relationship details as a primary entity for maximizing data relationships.

Although existing relational databases can manage these relationships, their performance in handling data relationships remains subpar. The solution to many business needs lies in graph databases. They store relationships alongside nodes (data elements) in a more efficient, flexible format. They excel in swiftly navigating data, making them ideal for adapting to evolving business requirements.

Relational Database vs. Graph Database

If you are still stumbling to find the answer to “How does a relational database differ from a graph-based database, the below difference table will help you understand it in a better way:

	Relational Database	Graph Database
Format	It has tables with rows and columns.	It has nodes and edges showing relationships among each other.
Relationships	Relationships are connected across tables where they are established using foreign keys between tables.	Considering data, the relationships are represented between edges and nodes.
Complex Queries	Relational databases require complex joins between tables.	Graph databases operate quickly and do not require joins.
Top Use-Cases	Relationational databases are widely adopted for transaction applications such as online transactions and accountings.	Graph databases are mainly used for relationship-heavy use cases, including fraud detection and recommendations engines.

Advantages Of Neo4j

As we all know Graph database is the solution to make rapid progress on mission-critical enterprises. Still, there is a list of benefits of using Neo4j. We are going to study that now:

It has a simple, flexible, and robust data model. It can be effortlessly adjusted according to your application needs and business demands.
It delivers results based on real-time data.
It offers high availability for big company real-time applications with transactional contracts.
Neo4j is a schema-free database and provides a straightforward representation of connected and semi-structured data.
Using Neo4j, you can represent and easily retrieve (traverse/navigate) connected data faster than other databases comparatively.
Neo4j feeds a declarative query language (Cypher Query Language) to illustrate the graph visually employing ASCII-art syntax. The commands of this language are straightforward to understand and are humanly readable.
Neo4j is fast because more interconnected data is straightforward to retrieve and navigate.
Using Neo4j does not need complicated joins to retrieve connected/related data. It is straightforward to retrieve its neighboring node or relationship attributes without joins or indexes.
Neo4j provides higher vertical scaling, improved operational characteristics, higher concurrency, and simplified tuning.

Features Of Neo4j

To summarize, until now, we have seen what Neo4j is. Why is Neo4j so popular? Difference between Neo4j and relational database management system, and advantages of Neo4j as a widely used graph database across all enterprises and businesses.

In this section, we will examine a list of significant features of Neo4j:

Data Model With Flexible Schema

Neo4j adheres to the property graph model as its data model. The graph comprises nodes representing entities, and these nodes are interconnected through relationships. Both nodes and relationships store data in key-value pairs, referred to as properties. Neo4j imposes no fixed schema, allowing you to add or remove properties based on your requirements. Additionally, Neo4j provides schema constraints for enhanced data management.

ACID Properties

Neo4j supports rich ACID properties:

A: Atomicity
C: Consistency
I: Isolation
D: Durability

Scalability & Reliability

Neo4j allows you to scale the database by increasing the number of reads/writes operations and the volume without impacting the query processing speed and data integrity. It also furnishes permission for replication for data protection and reliability.

Built-in Web applications

Neo4j also offers a built-in Neo4j browser web application that can be utilized to construct and retrieve your graph data.

Indexing & GraphDB

Neo4j sustains Indexes by employing Apache Lucence & follows Property Graph Data Model.

Some Other General Features

It provides REST API to work with programming languages such as Java, Spring, Scala, etc.
It supports UNIQUE constraints and uses native graph storage along native GPE(Graph Processing Engine).
It provides Java Script to work with UI MVC frameworks such as Node JS.
Cypher API and Native Java API are the two types of Java API supported by Neo4j. Using these APIs, you can develop robust Java applications.
In addition to these, Neo4j supports exporting query data to JSON and XLS format to work with other databases such as MongoDB or Cassandra.

Neo4j Property Graph Data Model

As we saw in the features section, Neo4j follows a property graph data model to store and manipulate its data. This section will discuss some of the critical features and central building blocks of the property graph data model, which are:

Nodes

First, it is essential to know that a graph data structure consists of nodes (data entities or discrete objects) represented by relationships.
Nodes can have one or more labels to classify their nodes and are represented using circles.

Relationships

Relationships represent a link between a source node and a target node. They can be Unidirectional and Bidirectional only (that means it always has a direction). It is only denoted using arrow keys.
Relationships must have a type to classify their ties, i.e., “Start Node” or “From Node” and “To Node” or “End Node.”

Properties

Nodes and relationships can have properties where they are key-value pairs, further describing them.

How To Install Neo4j On Ubuntu?

This section will explain installing and configuring the Neo4j on Ubuntu 20.04 server.

Prerequisites

For setting-up Neo4j, the following setting is recommended:

1GB Ram, 15GB of free storage, and a single-core server
All commands should be run in root mode. If you are not in root mode, controls must be followed by the Sudo command. In this tutorial, we will be using the sudo command for your ease.
Ubuntu 20.04

Install Neo4j

The official Ubuntu package repositories do not officially include Neo4j in the standard package repository. To install the upstream supported package from Neo4j, we will add the package source pointing to the location of the Neo4j repository. Then we will add the GPG key from Neo4j for confirmation. After that, we will install Neo4j.

Step 1: We start by updating the existing list of packages.

Command

Step 2: Install neo4j packages

This step will install a few prerequisite packages for HTTPS connections to secure the installation. This application may be already installed in your systems by default. Still, it is safe to run the following command anyways.

Command

Step 3: Add security GPG key

We will add the security GPG key for the official Neo4j package repository in this step. This key will confirm that you can trust what you are installing is from the official Neo4j upstream repository.

Command

Output

Step 4: Add the Neo4j repository to your system’s package manager list.

Command

Step 5: Install the Neo4j package

The final phase in this module is to install the Neo4j package and all of its dependencies. It is necessary to mention that this installation will also download and install a compatible Java package to work with Neo4j. So you can enter “Y” to accept this software install. If your system already has Java installed, the installer will skip this stage.

Command

Start Neo4j Service

Step 1: Enable neo4j.service

After the installation, Neo4j should be running. However, we need to enable it as a “neo4j.service” service to set it to start on a reboot of the system.

Command

Step 2: Use systemctl command

Next, examine Neo4j’s status using the “systemctl” command. This step is essential to verify that everything is working as expected.

Command

Testing and Working With Neo4j

Now that you have Neo4j and its dependencies installed on your system and its services started, you are all set to test the DB connection and configure the admin user.

Step 1: Interact with Neo4J

To interact with the Neo4j database on the command line, we will launch the internal utility using the “cypher-shell” command.

Command

Output

Initially, you’ll need to provide a username and password, which are set to ‘neo4j’ by default. After successful authentication, you’ll be prompted to update the administrator password according to your preference.

Once the password is updated, you’ll gain access to the interactive ‘neo4j’ prompt. Here, you can interact with the Neo4j database by inserting and querying nodes

Step 2: Use exit command

Use the exit command after setting an administrator password and testing a connection to Neo4j.

Command

Output

Bye!

Neo4j CQL: Clauses, Functions, Datatypes, and Operators

As discussed in the earlier section, Neo4j has CQL (Cypher Query Language) as query language. Now we will see some of the clauses, functions, data types, and operators supported in CQL.

Neo4j CQL Clauses

Command	Description
MATCH	Searches data with a specified pattern.
OPTIONAL MATCH	Functions like MATCH but allows null for missing parts.
WHERE	Adds conditions to CQL queries.
START	Locates initial points through legacy indexes.
LOAD CSV	Imports data from a locally stored CSV file.
CREATE	Creates nodes, properties, and relationships in the DB.
SET	Updates labels on nodes and properties on nodes/relationships
MERGE	Checks if a pattern exists; creates if not.
DELETE	Removes nodes, relationships, and paths from the DB.
REMOVE	Eliminates elements and properties from nodes/relationships.
FOREACH	Updates data within a list.
CREATE UNIQUE	Matches and creates a unique pattern.
RETURN	Specifies the query result set.
ORDER BY	Arranges query output in order (used with RETURN or WITH).
LIMIT	Restricts result rows to a specific value.
SKIP	Chains query parts together.
UNWIND	Expands a list into rows.
UNION	Joins outcomes of multiple queries.
CALL	Invokes a deployed procedure in the database.

Neo4j CQL Functions

Term	Description
String	Used when working with string literals.
Aggregation	Conducts aggregation operations on CQL query results.
Relationship	Used to fetch details of relationships, such as start and end nodes.

Neo4j CQL Data Types

Most Neo4j data types are similar to the java language data types. They are also used to define the properties of a node or a relationship.

Data Type	Description
Boolean	Defines boolean values (True, False).
byte	Describes an 8-bit integer.
short	Determines 16-bit integers.
int	Defines 32-bit integers.
long	Describes 64-bit integer.
float	Describes a 32-bit floating-point number.
double	Expresses a 64-bit floating-point number.
char	Represents a 16-bit character.
String	Represents a literal string.

Neo4j CQL Operators

Here are the operators supported by Neo4j CQL:

Operator Type	Operators
Mathematical Operators	+, -, *, /, %, ^
Comparison Operators	>, <, >=, <=, =
Boolean Operators	AND, OR, XOR, NOT
String Operators	+
List Operators	+, IN, [X], [X?..Y]
Regular Expression	=~
Matching String	STARTS WITH, ENDS WITH, CONSTRAINTS

Conclusion

The prime motive behind the launch of the Neo4j graph database was to help users solve many different kinds of business and technical needs. It is simple to use and fits your use-cases whether you depend on graph transactions, market analysis, operational optimizations, or anything else. It has always delivered a seamless experience for integrating additional tools with the rest of your existing system.

Here are a few resources to support your further journey into this tool:

Frequently Asked Questions

Q1. What is Neo4j used for?

A. Neo4j is used for managing and querying graph data, making it ideal for applications involving complex relationships like social networks, recommendation engines, and knowledge graphs.

Q2. Is Neo4j SQL or NoSQL?

A. Neo4j is a NoSQL database specifically designed for graph data, not SQL.

Q3. What are the downsides of Neo4j?

A. Downsides of Neo4j include a learning curve, resource-intensive operations on large graphs, and limited support for certain types of queries.

Q4. Is Neo4j still relevant?

A. Neo4j remains relevant for applications requiring complex relationship modeling and querying, as graph databases continue to find use in various industries.

The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.

Mrinal Singh

Data Scientist and a Technical Writer! I will give you the best of Open-Source and AI.

Talks about #chatgpt, #opensource, #contentcreation, #communitybuilding, and #artificialintelligence

Technical Writer | Data Science, ML, AI, Open-Source | Do More with Data - Litmus

Free Courses

4.7

Generative AI - A Way of Life

Explore Generative AI for beginners: create text and images, use top AI tools, learn practical skills, and ethics.

4.5

Getting Started with Large Language Models

Master Large Language Models (LLMs) with this course, offering clear guidance in NLP and model training made simple.

4.6

Building LLM Applications using Prompt Engineering

This free course guides you on building LLM apps, mastering prompt engineering, and developing chatbots with enterprise data.

4.8

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Explore practical solutions, advanced retrieval strategies, and agentic RAG systems to improve context, relevance, and accuracy in AI-driven applications.

4.7

Microsoft Excel: Formulas & Functions

Master MS Excel for data analysis with key formulas, functions, and LookUp tools in this comprehensive course.

MUID

Used by Microsoft Clarity, to store and track visits across websites.

Expiry: 1 Year

Type: HTTP

_clck

Used by Microsoft Clarity, Persists the Clarity User ID and preferences, unique to that site, on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.

Expiry: 1 Year

Type: HTTP

_clsk

Used by Microsoft Clarity, Connects multiple page views by a user into a single Clarity session recording.

Expiry: 1 Day

Type: HTTP

SRM_I

Collects user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Years

Type: HTTP

SM

Use to measure the use of the website for internal analytics

Expiry: 1 Years

Type: HTTP

CLID

The cookie is set by embedded Microsoft Clarity scripts. The purpose of this cookie is for heatmap and session recording.

Expiry: 1 Year

Type: HTTP

SRM_B

Collected user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Months

Type: HTTP

_gid

This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected includes the number of visitors, the source where they have come from, and the pages visited in an anonymous form.

Expiry: 399 Days

Type: HTTP

_ga_#

Used by Google Analytics, to store and count pageviews.

Expiry: 399 Days

Type: HTTP

_gat_#

Used by Google Analytics to collect data on the number of times a user has visited the website as well as dates for the first and most recent visit.

Expiry: 1 Day

Type: HTTP

collect

Used to send data to Google Analytics about the visitor's device and behavior. Tracks the visitor across devices and marketing channels.

Expiry: Session

Type: PIXEL

AEC

cookies ensure that requests within a browsing session are made by the user, and not by other sites.

Expiry: 6 Months

Type: HTTP

G_ENABLED_IDPS

use the cookie when customers want to make a referral from their gmail contacts; it helps auth the gmail account.

Expiry: 2 Years

Type: HTTP

test_cookie

This cookie is set by DoubleClick (which is owned by Google) to determine if the website visitor's browser supports cookies.

Expiry: 1 Year

Type: HTTP

_we_us

this is used to send push notification using webengage.

Expiry: 1 Year

Type: HTTP

WebKlipperAuth

used by webenage to track auth of webenagage.

Expiry: Session

Type: HTTP

ln_or

Linkedin sets this cookie to registers statistical data on users' behavior on the website for internal analytics.

Expiry: 1 Day

Type: HTTP

JSESSIONID

Use to maintain an anonymous user session by the server.

Expiry: 1 Year

Type: HTTP

li_rm

Used as part of the LinkedIn Remember Me feature and is set when a user clicks Remember Me on the device to make it easier for him or her to sign in to that device.

Expiry: 1 Year

Type: HTTP

AnalyticsSyncHistory

Used to store information about the time a sync with the lms_analytics cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

lms_analytics

Used to store information about the time a sync with the AnalyticsSyncHistory cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

liap

Cookie used for Sign-in with Linkedin and/or to allow for the Linkedin follow feature.

Expiry: 6 Months

Type: HTTP

visit

allow for the Linkedin follow feature.

Expiry: 1 Year

Type: HTTP

li_at

often used to identify you, including your name, interests, and previous activity.

Expiry: 2 Months

Type: HTTP

s_plt

Tracks the time that the previous page took to load

Expiry: Session

Type: HTTP

lang

Used to remember a user's language setting to ensure LinkedIn.com displays in the language selected by the user in their settings

Expiry: Session

Type: HTTP

s_tp

Tracks percent of page viewed

Expiry: Session

Type: HTTP

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

Indicates the start of a session for Adobe Experience Cloud

Expiry: Session

Type: HTTP

s_pltp

Provides page name value (URL) for use by Adobe Analytics

Expiry: Session

Type: HTTP

s_tslv

Used to retain and fetch time since last visit in Adobe Analytics

Expiry: 6 Months

Type: HTTP

li_theme

Remembers a user's display preference/theme setting

Expiry: 6 Months

Type: HTTP

li_theme_set

Remembers which users have updated their display / theme preferences

Expiry: 6 Months

Type: HTTP

Reading list

Introduction to NLP

Text Pre-processing

NLP Libraries

Regular Expressions

String Similarity

Spelling Correction

Topic Modeling

Text Representation

Information Retrieval System

Word Vectors

Word Senses

Dependency Parsing

Language Modeling

Getting Started with RNN

Different Variants of RNN

Machine Translation and Attention

Self Attention and Transformers

Transfomers and Pretraining

Question Answering

Text Summarization

Named Entity Recognition

Coreference Resolution

Audio Data

ASR

Audio Separation

Chatbot

Auto NLP

Neo4j: Features, Install, Advantage & More

Table of contents

What is a Neo4j / Graph Database?

Why Graph Database or Neo4j?

Relational Database vs. Graph Database

Advantages Of Neo4j

Features Of Neo4j

Data Model With Flexible Schema

ACID Properties

Scalability & Reliability

Built-in Web applications

Indexing & GraphDB

Some Other General Features

Neo4j Property Graph Data Model

Nodes

Relationships

Properties

How To Install Neo4j On Ubuntu?

Prerequisites

Install Neo4j

Step 1: We start by updating the existing list of packages.

Step 2: Install neo4j packages

Step 3: Add security GPG key

Step 4: Add the Neo4j repository to your system’s package manager list.

Step 5: Install the Neo4j package

Start Neo4j Service

Step 1: Enable neo4j.service

Step 2: Use systemctl command

Testing and Working With Neo4j

Step 1: Interact with Neo4J

Step 2: Use exit command

Neo4j CQL: Clauses, Functions, Datatypes, and Operators

Neo4j CQL Clauses

Neo4j CQL Functions

Neo4j CQL Data Types

Neo4j CQL Operators

Conclusion

Frequently Asked Questions

Login to continue reading and enjoy expert-curated content.

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Write for us

Analytics Vidhya (4)

brahmaid

csrftoken

Identityid