Database people are very much familiar with JOINS. When we want to fetch data from multiple tables we often join the tables based on Primary Keys and Foreign Keys. Let’s learn simple techniques to perform join operations in MongoDB in this article.
The above diagram is a pictorial representation of the relational database schema of any organization. Those blocks are tables storing particular kinds of data (students/professors/employees) and the lines & arrows represent the relation between the tables using common keys. We can perform joins across the tables based on the relations between them.
For eg: In an Organization, there are separate tables for storing data of employees, departments, projects, etc where data is stored in a normalized way. To fetch the details of employees and in which department and project they are working, we need to perform a join across the tables and fetch the requisite data.
Again, in a University, there might be separate tables to store students’ and professors’ data. To find out which all professors teach a particular student, we need to perform join across the tables.
In this particular tutorial, we are going to see how we can perform different join operations(Inner Join, Outer Join, Right Join & Left Join Operations) in MongoDB.
This article was published as a part of the Data Science Blogathon.
The majority of us have knowledge of SQL databases. There we often perform four main kinds of joins which we will discuss below.
1. Inner Join: Only rows with common keys from both tables will be there in the resultant table.
As we can see after performing inner joins we only returned the rows where the Roll No key is common to both.
2. Left Outer Join: All rows of left tables(matching + non-matching keys) will be on the resultant table. Thus in the resultant table, there be only rows of matching keys from the right table; all rows where keys not matching, you can’t eliminate it.
After performing the left join, we have all the columns from the left table. Since Class Rank for Deepak K. is not present in the right table so fill it with null. As discussed only those records of the right table which match the Roll No with the left table are present in the results. For that reason tuple (3D5RE,16) from the right table is not in the result.
3. Right Outer Join: Opposite of left outer join. Here all rows of the right table will be there in the resultant table and only rows with matching keys will be there from the left table.
As expected all records/tuples from the right table are present in the result, but the record (2A3AS, Deepak K., 87) from the left table is absent.
4. Full Outer Join: All the rows from both tables (matching+non-matching keys) will be present in the resultant table.
As expected, we have all the tuples from both tables in our result. The places where values are not present are fill it with null.
MongoDB is a document-based NoSQL Database. NoSQL databases are better for storing large-scale, non-relational, unstructured and frequently changing data. The below two blogs have comparison and operations in MongoDB.
MongoDB database consists of one or more Collections. Collections can be considered equivalent to tables in SQL Databases. Each collection consists of one or more documents. So documents can be thought of as rows or tuples of a table in SQL databases. Data is stored in BSON(Binary JSON format) inside MongoDB.
Now let’s see how different join operations will perform on MongoDB collections.
Convert the two tables Marks and Rank to Collections with each tuple within them as the documents of the respective collections. Store the collections in a database named School inside MongoDB.
Left Outer Join
Code:
db.Marks.aggregate([{$lookup:{from:"Rank",localField:"Roll No",
foreignField:"Roll No",as:"Ranks"}}])
As we can see the rank details of the respective student are appended to their document. For Deepak, there are no rank details in the Rank table so obviously, his Ranks field is an empty list.
Now let us understand the parameters used:
Now one thing to remember, in MongoDB $lookup can only perform left joins and there is no specific syntax available for other types of joins. So we need to derive the other joins by using different tricks and operations
Now right join is just opposite to the left join where apart from matching records, non-matching records of the right collection/table should also be there in the resulting collection/table.
Now a simple way to do that is just to alter the position of two collections; then our right collection becomes the left and vice-versa. So now, the join will contain all the rows(matching+non-matching) of our right table.
Code:
db.Rank.aggregate([{$lookup:{from:"Marks", localField:"Roll No",
foreignField:"Roll No", as:"Marks_Students"}}])
We can efficiently perform an inner join with a simple trick!!! We will do a left join and then remove all those records where the as field is empty. So we will only be left with the records where the keys are present in both tables(collections).
Code:
db.Rank.aggregate([{$lookup:{from:"Marks", localField:"Roll No", foreignField:"Roll No",
as: "Marks_Students"}}, {$match:{"Marks_Students":{$ne:[]}}}])
As we can see in the above result we only have the records for which the keys from both collections matched. Here {$match:{“Marks_Students”:{$ne:[]}}} dictates to match only those records where Marks_Students field is not [](empty list)
Full outer join is a little complicated, I designed it by a combination of 3 operations. So if it looks confusing on the first go, I request you to give it a read multiple times for better understanding.
Step 1: We will do a left join of Marks(left collection) and Rank(right collection) and add an empty field named Marks to all the resultant records and send/output the result in a new collection called J2.
Code:
db.Marks.aggregate([{$lookup:{from:"Rank",localField:"Roll No", foreignField:"Roll No", as:"Rank"}},
{$addFields:{Marks:[]}},{$out:"J2"}])
So our new collection looks like the screenshot above.
So now as obvious, our database School contains 3 collections-
Marks, Rank, J2
Step 2: We will perform the right join(as discussed before considering Rank as the left collection) of Marks and Rank and append the result to the J2 collection.
Code:
db.Rank.aggregate([{$lookup:{from:"Marks",localField:"Roll No",foreignField:"Roll No",
as:"Marks"}},{$merge:"J2"}])
Notice how the system appends the new output to the bottom of the old output of step 1.
Step 3:
We will only keep/retain the records in our aggregate result where the Match field is [](empty) and discard the rest. In this way, remove the duplicates and we will only have in our results all the distinct fields from both collections/tables. (You may have noticed in the output of Step 2 there are duplicates e.g: There are two records of Nikita)
Atlast we will remove the empty Marks field from the aggregation result since it’s an empty field and no use displaying it. Its purpose was to remove the duplicates.
Code:
db.J2.aggregate([{$redact:{$cond:[{$eq:["$Marks",[]]},"$KEEP","$PRUNE"]}},
{$unset:”Marks”}])
So finally we have our desired output. We have all the records which matched in both the tables(collections) as well as other records present in either of the tables(collections). (Deepak’s in Marks, and roll no. 3D5RE in Rank).
So we successfully derived different kinds of joins from left join in MongoDB. So just to recap, there is only direct syntax available in MongoDB for performing left joins. Other types of joins should be derived by applying different types of operations and techniques on the left join. For eg: removing collections with empty as field in case of an inner join, etc.
While deriving these joins we realized:
If you know better ways you are welcome to share in the comments.
A. Yes, we can only perform the left join of collections using $lookup directly. Perform other join operations using a combination of different operators as shown in this article.
A. Yes, everybody is welcome to try out and come up with better solutions and post in the comments. These solutions especially the outer join are the ways, we can do it. There may be better and more optimized ways to do it. A proper understanding of the operators and syntax, how they work, what result they give etc. will help you come up with innovative solutions.
A. MongoDB queries on large datasets can really become big, messy and complicated. But proper knowledge of the operations will help to write the correct queries. The MongoDB official documentation is one of the best resources to learn MongoDB operations in depth. Apart from that MongoDB University provides excellent certified courses for free.
A. Of course! Proper indexing is always a plus point! I haven’t written a blog on indexing yet, it’s still in the pipeline. MongoDB documentation and their free courses provide good quality free lessons on these topics.
A. In MongoDB Atlas, you get to buy clusters or there is a shared cluster for free where you can do all cool stuff like build and manage applications and dashboards, which can be a cool way to show your MongoDB skills. Integrate MongoDB with the Google Cloud Platform(GCP) and uses services of the GCP platform to easily build and deploy products.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.