What kind of database did you use to build your most recent application? According to ScaleGrid’s 2019 database trends report, SQL databases are the most popular, accounting for more than 60% of use, followed by NoSQL databases at more than 39%. MySQL is the most common SQL database, while MongoDB is the most widely used NoSQL database. However, another kind of database is becoming increasingly popular: the graph database. Do you know what powers the real-time search on Facebook and Google? Graph databases power their search engines, returning relevant results, including results you might not have specifically asked for.
Learning Objectives
The learning objective of this article is to explore graph databases and their purpose. We will go through Neo4j, a system for creating and managing graph databases. Along with creating the basic elements of a graph database, we will explore some of Neo4j’s functionality.
This article was published as a part of the Data Science Blogathon.
A graph in a graph database is a collection of nodes and the relationships between them. A node can be thought of as a real-world object: for example, a person, a country, a book, or a movie title. Each node is connected to one or, usually, several other nodes through relationships. Consider the image below, which shows a sample graph of nodes and relationships.
The circles represent the nodes, and the arrows linking them represent relationships. Note that each arrow has a direction. Our example includes only unidirectional links, but graph databases also allow bidirectional links and self-linkages (a relationship from a node to itself). In Neo4j, every stored relationship has a direction, although queries can choose to ignore it. The person (Mr. Narendra Modi), the Indian flag (representing India), Politician, and Prime Minister are nodes, and there are five links, or relationships, between them. Below are the nodes and relationships that can be drawn from the graph.
Neo4j is a graph database management system for building native graph databases, which store data together with the relationships between data items and can answer relationship-heavy queries quickly. The community edition of Neo4j is open source under the GPLv3 license. Neo4j lets you create the nodes and relationships that make up a graph database, along with additional features such as node labels and properties.
Now that we have a fair understanding of graph databases, nodes, relationships, and Neo4j’s features, let us dig a little further and learn how to create a basic graph database using Neo4j. First, download Neo4j Desktop (community or enterprise edition) from the official Neo4j website; the enterprise edition is available free of charge under a developer license. After installing the application, follow the steps below to create a new project and add a database to it.
Neo4j – Steps
Step 1: Create a new project.
Step 2: Create a new database within the project.
Step 3: Enter a name and a password for the database, and click Create.
Step 4: Start the database.
After completing these steps, you will see a console bar where you can start writing queries.
Neo4j queries are written in Cypher, Neo4j’s graph query language. We can start working on our database by writing Cypher queries that create nodes and relationships.
The CREATE clause creates a node in our graph database. The most basic node has no labels or properties, which makes it a blank, or empty, node. Even so, Neo4j assigns it an integer ID. Every node receives such an ID automatically, and it is unique among the nodes currently in the database (Neo4j may reuse the IDs of deleted nodes). We can create an empty node using the query:
CREATE ()
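As a rough analogy (plain Python, not Neo4j’s API), here is a minimal sketch of automatic ID assignment: a hypothetical Node class whose IDs come from a counter. Real Neo4j IDs are managed by the database and can be reused after a node is deleted, which this sketch does not model.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical ID source: hands out 0, 1, 2, ... in creation order.
_next_id = itertools.count()


@dataclass
class Node:
    """Toy stand-in for a graph node; not Neo4j's API."""
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)
    # Assigned automatically when the node is created, like Neo4j's internal ID.
    id: int = field(default_factory=lambda: next(_next_id))


empty = Node()   # analogous to CREATE (): no labels, no properties
second = Node()  # another empty node gets the next ID

print(empty.id, second.id)             # 0 1
print(empty.labels, empty.properties)  # set() {}
```

Both nodes are "empty", yet they remain distinguishable because each one carries its own ID.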
To give a node a label, write a colon followed by the label name inside the parentheses. To create a node with the label ‘Politician’, we can use the query:
CREATE (:Politician)
A node can also have more than one label: follow the same pattern and keep adding a colon followed by the next label name. To create a node with the labels ‘Country’, ‘Asia’, and ‘India’, we can use the query:
CREATE (:Country:Asia:India)
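Labels can be pictured as a set of tags on each node. The minimal Python sketch below is an analogy only. The Node class and the create and match_label helpers are hypothetical names and do not belong to any Neo4j library. It mirrors the three CREATE queries so far and shows how a label lets you pick out nodes, much as MATCH (n:Asia) RETURN n would.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy node: an immutable set of labels plus a property map."""
    labels: frozenset = frozenset()
    properties: dict = field(default_factory=dict)


def create(*labels, **properties):
    """Hypothetical helper, roughly CREATE (:Label1:Label2 {key: value})."""
    return Node(frozenset(labels), dict(properties))


def match_label(nodes, label):
    """Roughly MATCH (n:Label) RETURN n: keep the nodes that carry the label."""
    return [n for n in nodes if label in n.labels]


graph = [
    create(),                            # CREATE ()
    create("Politician"),                # CREATE (:Politician)
    create("Country", "Asia", "India"),  # CREATE (:Country:Asia:India)
]

print(len(match_label(graph, "Asia")))    # 1
print(len(match_label(graph, "Person")))  # 0
```

A node with several labels is found by any one of them, so the ‘Country’ node above is returned for ‘Asia’ and for ‘India’ alike.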
A node can also have one or more properties, given as key-value pairs inside curly braces: {key: value}. Let us create a node labeled ‘Person’ with the value ‘Mr. Narendra Modi’ for the key ‘name’ and 70 for the key ‘age’:
CREATE (:Person {name: 'Mr. Narendra Modi', age: 70})
After you have successfully run the above queries, you can view your nodes using the query:
MATCH (node) RETURN node
The image below shows the query output, which contains the four nodes. You can view the labels and properties of a specific node by hovering over it; in the image, we can see the label ‘Person’ and the properties ‘name’ and ‘age’ of the node with ID 3.
Neo4j Output – Creating Nodes
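Properties form a key-value map on each node, and the values keep their types: 70 written without quotes is stored as an integer, while 'Mr. Narendra Modi' is a string. The following Python sketch approximates how a pattern such as (:Person {name: 'Mr. Narendra Modi'}) selects nodes by label and property values, using the Person node created above as data. It relies on hypothetical node and matches helpers, not a real Neo4j API.

```python
def node(*labels, **properties):
    """Hypothetical helper: a node as a plain dict of labels and typed properties."""
    return {"labels": set(labels), "properties": dict(properties)}


def matches(n, label, **expected):
    """Roughly the pattern (:Label {key: value, ...}).

    The node must carry the label and have every listed property with an
    equal value; extra properties on the node are allowed.
    """
    props = n["properties"]
    return label in n["labels"] and all(
        key in props and props[key] == value for key, value in expected.items()
    )


# Mirrors CREATE (:Person {name: 'Mr. Narendra Modi', age: 70})
modi = node("Person", name="Mr. Narendra Modi", age=70)

print(modi["properties"]["age"] + 1)                      # 71: age is a number, not text
print(matches(modi, "Person", name="Mr. Narendra Modi"))  # True
print(matches(modi, "Person", age="70"))                  # False: '70' is a string
```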
This section shows how to create relationships between nodes. First, a quick tip: to clear all the existing nodes and relationships in a database, we can use the query:
MATCH (n) DETACH DELETE n
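DETACH DELETE matters because, in Neo4j, a plain DELETE fails on a node that still has relationships attached. The hypothetical Graph class below is a toy Python model of that rule, not Neo4j code. Its delete method refuses while relationships remain, detach_delete removes them first, and clear mimics the tip’s query.

```python
class Graph:
    """Toy graph: a set of node names and a list of (start, type, end) relationships."""

    def __init__(self):
        self.nodes = set()
        self.relationships = []

    def attached(self, node):
        """Relationships that start or end at the given node."""
        return [r for r in self.relationships if node in (r[0], r[2])]

    def delete(self, node):
        """Like DELETE n: refuse while relationships are still attached."""
        if self.attached(node):
            raise ValueError(f"{node!r} still has relationships; detach them first")
        self.nodes.discard(node)

    def detach_delete(self, node):
        """Like DETACH DELETE n: remove attached relationships, then the node."""
        self.relationships = [r for r in self.relationships if node not in (r[0], r[2])]
        self.nodes.discard(node)

    def clear(self):
        """Like MATCH (n) DETACH DELETE n: detach-delete every node."""
        for node in list(self.nodes):
            self.detach_delete(node)


g = Graph()
g.nodes |= {"Ram", "India"}
g.relationships.append(("Ram", "lives_in", "India"))

try:
    g.delete("Ram")
except ValueError as err:
    print(err)  # 'Ram' still has relationships; detach them first

g.clear()
print(g.nodes, g.relationships)  # set() []
```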
To start with, let us see how to create a self-linkage, that is, a relationship from a node to itself. We will create a ‘Person’ node and add a ‘loves’ relationship that points back to the same node, representing a person’s self-love.
CREATE (p:Person {name: 'Ram'})-[:loves]->(p)
Let us take a quick look at this query. Here, p is a reference variable that names the node within the query; a reference variable can also be added to the node-creation queries from the previous section, for example CREATE (p:Person). The node has the label ‘Person’ and one property, ‘name’, with the value ‘Ram’. The content inside the square brackets is the relationship type, and the arrow gives the direction of the relationship. Because the variable p appears again at the end of the pattern, the relationship ends at the same node it starts from.
Relationships can have properties too. Suppose we want the ‘loves’ relationship to carry a property with the key ‘since’ and the value ‘always’. Because CREATE always creates new elements, the following query adds a new ‘Ram’ node with its own relationship rather than modifying the existing one:
CREATE (p:Person {name: 'Ram'})-[:loves {since: 'always'}]->(p)
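The role of the reference variable can be imitated in Python, where p is simply a name bound to one node object. In this sketch the Node and Relationship classes are hypothetical. The self-linkage comes from passing the same object as both start and end, and the relationship’s properties sit in their own map, as with {since: 'always'}. It also shows that creating a second node with identical labels and properties yields a distinct node, which is how CREATE behaves.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # nodes compare by identity: equal contents still mean two nodes
class Node:
    labels: tuple
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    type: str
    end: Node
    properties: dict = field(default_factory=dict)


# Roughly CREATE (p:Person {name: 'Ram'})-[:loves {since: 'always'}]->(p)
p = Node(("Person",), {"name": "Ram"})                    # p names one node object
loves = Relationship(p, "loves", p, {"since": "always"})  # reusing p closes the loop

print(loves.start is loves.end)   # True: a self-linkage
print(loves.properties["since"])  # always

# Creating again gives a new, separate node, even with identical contents.
p_again = Node(("Person",), {"name": "Ram"})
print(p_again is p, p_again == p)  # False False
```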
To define a relationship between two different nodes, we can create both nodes and then connect them using their reference variables. Let us create a relationship between India and its Prime Minister. Note that pressing Shift+Enter in the console starts a new line, so a single query can span several lines:
CREATE (p1:Person {name: 'Mr. Narendra Modi'})
CREATE (c1:Country {name: 'India'})
CREATE (p1)-[r1:prime_minister]->(c1)
After running all the queries in this section, we can view the output below using the query:
MATCH (node) RETURN node
Neo4j Output – Creating Relationships
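Although the prime_minister relationship is stored with one direction, a query can follow it either way. The minimal Python sketch below, built from plain triples and hypothetical helper functions rather than Neo4j code, finds the country from the person and the person from the country. It also models a bidirectional link as two opposite relationships.

```python
# Relationships stored as (start, type, end) triples; plain strings stand in for nodes.
relationships = [("Mr. Narendra Modi", "prime_minister", "India")]


def targets(start, rel_type):
    """Follow relationships forward: the end nodes reached from start."""
    return [e for s, t, e in relationships if s == start and t == rel_type]


def sources(end, rel_type):
    """Follow relationships backward: the start nodes that point at end."""
    return [s for s, t, e in relationships if e == end and t == rel_type]


def is_bidirectional(a, rel_type, b):
    """A 'bidirectional' link is modelled as two relationships in opposite directions."""
    return b in targets(a, rel_type) and a in targets(b, rel_type)


print(targets("Mr. Narendra Modi", "prime_minister"))  # ['India']
print(sources("India", "prime_minister"))              # ['Mr. Narendra Modi']
print(is_bidirectional("Mr. Narendra Modi", "prime_minister", "India"))  # False
```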
According to Neo4j, a depth-5 search on a database of one million users could not finish within an hour on a relational database but took only 2.132 seconds in a Neo4j graph database. Results like these have led to graph databases being used in many big-data use cases; for instance, Google’s Knowledge Graph, which enriches its search results, is built on graph data. The share of graph databases is still minuscule compared with relational or NoSQL databases. Still, given their ability to work efficiently and quickly with large amounts of connected data, the rising trend of using graph databases is likely to continue for a long time. The key takeaways from this article are:
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.