Following the global spread of the internet, we are generating data at an unprecedented pace. Because performing any analysis would require us to collect/query the necessary data from the database, we must choose the right tool to query the data. Consequently, we cannot imagine ourselves using SQL to work with this volume of data, as every single query will be expensive. This is precisely where MongoDB comes in. MongoDB is an unstructured database that stores data in the form of documents. This means you do not have to provide a schema to store data in the MongoDB database. This is in contrast to relational databases, where you must provide the schema firsthand before inserting any data. Additionally, MongoDB can handle huge volumes of data very efficiently using PyMongo. This makes it the most widely used NoSQL database as it offers rich query language and flexible and fast access to data.
This article will provide multiple examples of how to query a MongoDB database using PyMongo. We will also learn the basics of aggregation pipelines and use comparison operators, logical operators, and regular expressions.
Learning Objectives
This article continues the MongoDB in Python Tutorial for Beginners, which covered the challenges of unstructured databases, installation steps, and basic operations of MongoDB. So, if you are a complete beginner in MongoDB, I recommend you go through that article first.
PyMongo is a Python library that enables us to connect with MongoDB. It is the most recommended way to work with MongoDB in Python.
We have also chosen Python to interact with MongoDB because it is one of the most commonly used and considerably powerful data science and machine learning languages. PyMongo allows us to retrieve the data with dictionary-like syntax. This is because, internally, MongoDB stores all the documents in BSON format, the Binary form of JSON (JavaScript Object Notation). Now, reading these documents in different programming languages would convert them into the internally supported data types in that programming language. Therefore, in Python, these BSON documents are retrieved as Python dictionaries.
New to Python? Enroll in our free course: Introduction to Python and start your coding journey today!
Installing PyMongo is simple and straightforward. Here, I am assuming you already have Python 3 and MongoDB installed. The following command will help you to install PyMongo:
pip3 install pymongo
Let us set things up before querying a MongoDB database using PyMongo. Firstly, we will insert the data into the database. The following steps will help you with this-
Import the libraries and connect to the mongo client.
Start the MongoDB server on your machine. I am assuming it is running at localhost:27017.
Let’s start by importing some of the libraries we will use. The MongoDB server runs at port 27017 on the local machine by default. Then, we will connect to the MongoDB client using the PyMongo library. The code for it is as follows:
client = pymongo.MongoClient('mongodb://localhost:27017')
It provides a client for a running MongoDB instance.
Get the database instance of the database sample_db.
If it doesn’t exist, MongoDB will create one for you. The code for it is as follows:
database = client['sample_db']
# importing the required libraries
import pymongo
import pprint
import json
import warnings
warnings.filterwarnings('ignore')
# connect to the mongoclient
client = pymongo.MongoClient('mongodb://localhost:27017')
# get the database
database = client['sample_db']
Note- You can view all the existing database names using MongoClient().list_database_names()
Create the Collections from the JSON files.
We will use the data from a meal delivery company that operates in multiple cities. Additionally, they have various fulfilment centres in these cities to dispatch meal orders to their customers. You can download the data and the code here.
1. weekly_demand:
2. meal_info:
Then we will create two collections in the sample_db database. For this we will use the create_collection() function as follows:
# create weekly demand collection
database.create_collection("weekly_demand")
# create meal_info collection
database.create_collection("meal_info")
The list_collection_names () function allows you to view a list of all the collections in your database.
Insert data into Collections.
The data we have is in JSON format. Then, we will get the collection instance, read the data file, and insert the data using the insert_many function.
# get collection weekly_demand
weekly_demand_collection = database.get_collection("weekly_demand")
# open the weekly_demand json file
with open("weekly_demand.json") as f:
file_data = json.load(f)
# insert the data into the collection
weekly_demand_collection.insert_many(file_data)
# get the count of total data points
weekly_demand_collection.find().count()
# >> 456548
# get collection meal_info
meal_info_collection = database.get_collection("meal_info")
# open the meal_info json file
with open("meal_info.json") as f:
file_data = json.load(f)
# insert the data into the collection
meal_info_collection.insert_many(file_data)
# get the count of the total data points
meal_info_collection.find().count()
# >> 51
Finally, we have 456548 documents in the weekly_demand_collection and 51 in the meal info collection. Let’s examine one document from each collection.
Note: insert_one() is a CRUD function in PyMongo to insert documents in a collection.
Let’s examine a document from the first MongoDB collection using the find_one() function. It returns a single document from the collection.
weekly_demand_collection
weekly_demand_collection.find_one()
Notice that by default the _id field of type ObjectId().
meal_info_collection
meal_info_collection.find_one()
Now, our data is ready. Let’s move on to query this database.
Let’s understand this better.
We can query a MongoDB database using PyMongo, using the find function to get all the results satisfying the given condition and the find_one function, which will return only one result satisfying the condition.
The following is the syntax of the find and find_one:
your_collection.find( {<< query >>} , { << fields>>} )
The find_one() takes two parameters. The first is the query parameter based on which the documents will be fetched from the collection. The second parameter is the fields we want to output from these retrieved documents.
You can query the database using the following filtering techniques:
For instance, you have hundreds of fields and want to see only a few. You can do that by putting all the required field names with 1. For example:
weekly_demand_collection.find_one( {}, { "week": 1, "checkout_price" : 1})
On the other hand, if you want to discard a few fields only from the complete document, you can put the field names equal to 0. Therefore, only those fields will be excluded. Please note that you cannot combine 1s and 0s to get the fields. Either all should be one, or all should be zero.
weekly_demand_collection.find_one( {}, {"num_orders" : 0, "meal_id" : 0})
This section will provide a condition in the first braces and fields to discard in the second. Consequently, it will return the first document with center_id equal to 55 and meal_id equal to 1885 and discard the fields _id and week.
weekly_demand_collection.find_one( {"center_id" : 55, "meal_id" : 1885}, {"_id" : 0, "week" : 0} )
The following are the nine comparison operators in MongoDB.
NAME | DESCRIPTION |
$eq | It will match the values that are equal to a specified value. |
$gt | It will match the values that are greater than a specified value. |
$gte | It will match all the values that are greater than or equal to a specified value. |
$in | It will match any of the values specified in an array. |
$lt | It will match all the values that are less than a specified value. |
$lte | It will match all the values that are less than or equal to a specified value. |
$ne | It will match all the values that are not equal to a specified value. |
$nin | It will match none of the values specified in an array. |
The following are some examples of using these comparison operators-
1. Equal to and Not Equal to
We will find all the documents where center_id equals 55 and homepage_featured does not equal 0. Since we will use the find function, it will return the cursor for that command. Further, use a for loop to iterate through the query’s results.
result_1 = weekly_demand_collection.find({
"center_id" : { "$eq" : 55},
"homepage_featured" : { "$ne" : 0}
})
for i in result_1:
print(i)
2. In the List and Not in the List
For instance, you need to match an element with multiple elements. Instead of using the $eq operator multiple times, we can use the $in operator. We will try to find all the documents where center_id is 24 or 11.
Then, we find all the documents where center_id is not in the specified list. The following query will return all the documents where center_id is not 24 and 11.
3. Less than and Greater than
Now, let us find all the documents where center_id is 55 and checkout_price is greater than 100 and less than 200. Use the following syntax for this-
result_4 = weekly_demand_collection.find({
"center_id" : 55,
"checkout_price" : { "$lt" : 200, "$gt" : 100}
})
for i in result_4:
print(i)
NAME | It will join query clauses with a logical AND and return all documents that match both conditions. |
$and | It will invert the effect of a query and return documents that do not match the query expression. |
$not | It will invert the effect of a query and returns documents that do not match the query expression. |
$nor | It will join the query clauses with a logical NOR and return all documents that fail to match the clauses. |
$or | It will join the query clauses with a logical OR and return all documents that match the conditions of either clause. |
The following examples illustrate the use of logical operators-
The following query will return all the documents where the center_id is equal to 11, and also the meal_id is not equal to 1778. The subqueries for the and operator will come inside a list.
result_5 = weekly_demand_collection.find({
"$and" : [{
"center_id" : { "$eq" : 11}
},
{
"meal_id" : { "$ne" : 1778}
}]
})
for i in result_5:
print(i)
The following query will return all the documents where either the center_id equals 11 or the meal_id is either 1207 or 2707. Further, the subqueries for the or operator will come inside a list.
result_6 = weekly_demand_collection.find({
"$or" : [{
"center_id" : { "$eq" : 11}
},
{
"meal_id" : { "$in" : [1207, 2707]}
}]
})
for i in result_6:
print(i)
Regular Expressions are useful when you have text fields and want to search for documents with a specific pattern. If you want to learn more about regular expressions, I highly recommend you go through this article: Beginners Tutorial for Regular Expressions in Python.
It can be used with the operator $regex, and we can provide value to the operator for the regex pattern to match. We will use the meal_info collection for this query, and then we will find the documents where the cuisine field starts with character C.
result_7 = meal_info_collection.find({
"cuisine" : { "$regex" : "^C" }
})
for i in result_7:
print(i)
Let’s take another regular expression example. We will find all the documents in which the category starts with the character “S” and the cuisine ends with “Ian.”
result_8 = meal_info_collection.find({
"$and" : [
{
"category" : {
"$regex" : "^S"
}},
{
"cuisine" : {
"$regex" : "ian$"
}}
]
})
for i in result_8:
print(i)
MongoDB’s aggregation pipeline provides a framework to perform a series of data transformations on a dataset. The following is its syntax:
your_collection.aggregate( [ { <stage1> }, { <stage2> },.. ] )
The first stage takes the complete set of documents as input, and from there, each subsequent stage takes the previous transformation’s result set as input to the next stage and produces the output.
There are around 10 transformations available in the MongoDB aggregate, of which we will see $match and $group in this article. We will discuss each of the transformations in detail in the upcoming article on MongoDB.
For example, in the first stage, we will match the documents where center_id is equal to 11, and in the next stage, it will count the number of documents with center_id equal to 11. Note that we assigned the $count operator the value equalling total_rows in the second stage, which is the name of the field we want in the output.
result_9 = weekly_demand_collection.aggregate([
## stage 1
{
"$match" :
{"center_id" : {"$eq" : 11 } }
},
## stage 2
{
"$count" : "total_rows"
}
])
for i in result_9:
print(i)
Now, let’s take another example where the first stage is the same as before, i.e., center_id is equal to 11. In the second stage, we want to calculate the average of the field num_orders for center_id 11 and the unique meal_ids for center_id 11.
result_10 = weekly_demand_collection.aggregate([
## stage 1
{
"$match" :
{"center_id" : {"$eq" : 11 } }
},
## stage 2
{
"$group" : { "_id" : 0 ,
"average_num_orders": { "$avg" : "$num_orders"},
"unique_meal_id" : {"$addToSet" : "$meal_id"}}
}
])
for i in result_10:
print(i)
The unfathomable amount of data generated today makes it necessary to find better alternatives like this to query data. In this article, we learned how to query a MongoDB database using PyMongo. Also, we understood how to apply various filters per the required situation.
A. You can connect to PyMongo in the following manner –import pymongo client = pymongo.Mongo
client('mongodb://localhost:27017')
A. You can retrieve documents from MongoDB collection using the find() or find_one() functions. In both of these functions, you can provide the filter query and the specific fields you want to retrieve from these documents.
A. You can insert data in MongoDB collection using the insert() or insert_many() functions.
A. To update a document in MongoDB using Python, you can use the update_one()
or update_many()
methods provided by PyMongo. For example, to update a specific field in a document, you would specify the filter criteria and the new values to set.
A. To delete a document from MongoDB using Python, you can utilize the delete_one()
or delete_many()
methods. These methods allow you to specify the criteria for which documents to remove from the collection.