Pymongo: Python Library to Query a MongoDB Database

Lakshay arora Last Updated : 08 Nov, 2024
11 min read

Following the global spread of the internet, we are generating data at an unprecedented pace. Because performing any analysis would require us to collect/query the necessary data from the database, we must choose the right tool to query the data. Consequently, we cannot imagine ourselves using SQL to work with this volume of data, as every single query will be expensive. This is precisely where MongoDB comes in. MongoDB is an unstructured database that stores data in the form of documents. This means you do not have to provide a schema to store data in the MongoDB database. This is in contrast to relational databases, where you must provide the schema firsthand before inserting any data. Additionally, MongoDB can handle huge volumes of data very efficiently using PyMongo. This makes it the most widely used NoSQL database as it offers rich query language and flexible and fast access to data.

This article will provide multiple examples of how to query a MongoDB database using PyMongo. We will also learn the basics of aggregation pipelines and use comparison operators, logical operators, and regular expressions.

Learning Objectives

  • Learn to set up a database in MongoDB using PyMongo.
  • Learn basic querying of MongoDB database, like filtering with different operators – logical, comparison, regular expressions, etc.

This article continues the MongoDB in Python Tutorial for Beginners, which covered the challenges of unstructured databases, installation steps, and basic operations of MongoDB. So, if you are a complete beginner in MongoDB, I recommend you go through that article first.

Table of Contents

What Is PyMongo?

PyMongo is a Python library that enables us to connect with MongoDB. It is the most recommended way to work with MongoDB in Python.

We have also chosen Python to interact with MongoDB because it is one of the most commonly used and considerably powerful data science and machine learning languages. PyMongo allows us to retrieve the data with dictionary-like syntax. This is because, internally, MongoDB stores all the documents in BSON format, the Binary form of JSON (JavaScript Object Notation). Now, reading these documents in different programming languages would convert them into the internally supported data types in that programming language. Therefore, in Python, these BSON documents are retrieved as Python dictionaries.

New to Python? Enroll in our free course: Introduction to Python and start your coding journey today!

How to Install PyMongo?

Installing PyMongo is simple and straightforward. Here, I am assuming you already have Python 3 and MongoDB installed. The following command will help you to install PyMongo:

pip3 install pymongo

How Do You Create Collections and Load Data with PyMongo?

Let us set things up before querying a MongoDB database using PyMongo. Firstly, we will insert the data into the database. The following steps will help you with this-

Step 1

Import the libraries and connect to the mongo client.

Start the MongoDB server on your machine. I am assuming it is running at localhost:27017.

Let’s start by importing some of the libraries we will use. The MongoDB server runs at port 27017 on the local machine by default. Then, we will connect to the MongoDB client using the PyMongo library. The code for it is as follows:

client = pymongo.MongoClient('mongodb://localhost:27017')

It provides a client for a running MongoDB instance.

Step 2

Get the database instance of the database sample_db.

If it doesn’t exist, MongoDB will create one for you. The code for it is as follows:

database = client['sample_db']

Python Code

# importing the required libraries
import pymongo
import pprint
import json
import warnings
warnings.filterwarnings('ignore')

# connect to the mongoclient
client = pymongo.MongoClient('mongodb://localhost:27017')

# get the database
database = client['sample_db']

Note- You can view all the existing database names using MongoClient().list_database_names()

Step 3

Create the Collections from the JSON files.

We will use the data from a meal delivery company that operates in multiple cities. Additionally, they have various fulfilment centres in these cities to dispatch meal orders to their customers. You can download the data and the code here.

1. weekly_demand:

  • id: Unique ID for each document
  • week: Week Number
  • center_id: Unique ID for fulfilment center
  • meal_id: Unique ID for Meal
  • checkout_price: Final price including discount, taxes & delivery charges
  • base_price: Base price of the meal
  • emailer_for_promotion: Emailer sent for promotion of meal
  • homepage_featured: Meal featured on the homepage
  • num_orders: (Target) Orders Count

2. meal_info:

  • meal_id: Unique ID for the meal
  • category: Type of meal (beverages/snacks/soups….)
  • cuisine: Meal cuisine (Indian/Italian/…)

Then we will create two collections in the sample_db database. For this we will use the create_collection() function as follows:

# create weekly demand collection
database.create_collection("weekly_demand")

# create meal_info collection
database.create_collection("meal_info")
Query MongoDB using PyMongo - weekly demand collection
Query MongoDB using PyMongo - meal info collection

The list_collection_names () function allows you to view a list of all the collections in your database.

Step 4

Insert data into Collections.

The data we have is in JSON format. Then, we will get the collection instance, read the data file, and insert the data using the insert_many function.

# get collection weekly_demand
weekly_demand_collection = database.get_collection("weekly_demand")

# open the weekly_demand json file
with open("weekly_demand.json") as f:
    file_data = json.load(f)
# insert the data into the collection
weekly_demand_collection.insert_many(file_data)

# get the count of total data points
weekly_demand_collection.find().count()
# >> 456548

# get collection meal_info 
meal_info_collection = database.get_collection("meal_info")

# open the meal_info json file
with open("meal_info.json") as f:
    file_data = json.load(f)
    
# insert the data into the collection
meal_info_collection.insert_many(file_data)

# get the count of the total data points
meal_info_collection.find().count()
# >> 51

Finally, we have 456548 documents in the weekly_demand_collection and 51 in the meal info collection. Let’s examine one document from each collection.

Note: insert_one() is a CRUD function in PyMongo to insert documents in a collection.

Let’s examine a document from the first MongoDB collection using the find_one() function. It returns a single document from the collection.

weekly_demand_collection

weekly_demand_collection.find_one()
CRUD function in PyMongo


Notice that by default the _id field of type ObjectId().

meal_info_collection

meal_info_collection.find_one()
Query MongoDB using PyMongo - meal info collection

Now, our data is ready. Let’s move on to query this database.

Let’s understand this better.

How to Query MongoDB Using PyMongo?

We can query a MongoDB database using PyMongo, using the find function to get all the results satisfying the given condition and the find_one function, which will return only one result satisfying the condition.

The following is the syntax of the find and find_one:

your_collection.find( {<< query >>} , { << fields>>} )

The find_one() takes two parameters. The first is the query parameter based on which the documents will be fetched from the collection. The second parameter is the fields we want to output from these retrieved documents.

You can query the database using the following filtering techniques:

Filter based on fields

For instance, you have hundreds of fields and want to see only a few. You can do that by putting all the required field names with 1. For example:

weekly_demand_collection.find_one( {}, { "week": 1, "checkout_price" : 1})
Query MongoDB using PyMongo - Filter based on fields

On the other hand, if you want to discard a few fields only from the complete document, you can put the field names equal to 0. Therefore, only those fields will be excluded. Please note that you cannot combine 1s and 0s to get the fields. Either all should be one, or all should be zero.

weekly_demand_collection.find_one( {}, {"num_orders" : 0, "meal_id" : 0})
Query MongoDB using PyMongo - Filter based on fields

Filter with a condition.

This section will provide a condition in the first braces and fields to discard in the second. Consequently, it will return the first document with center_id equal to 55 and meal_id equal to 1885 and discard the fields _id and week.

weekly_demand_collection.find_one( {"center_id" : 55, "meal_id" : 1885}, {"_id" : 0, "week" : 0} )
Query MongoDB using PyMongo - Filter with a condition

Filter based on Comparison Operators

The following are the nine comparison operators in MongoDB.

NAMEDESCRIPTION
$eqIt will match the values that are equal to a specified value.
$gtIt will match the values that are greater than a specified value.
$gteIt will match all the values that are greater than or equal to a specified value.
$inIt will match any of the values specified in an array.
$ltIt will match all the values that are less than a specified value.
$lteIt will match all the values that are less than or equal to a specified value.
$neIt will match all the values that are not equal to a specified value.
$ninIt will match none of the values specified in an array.

The following are some examples of using these comparison operators-

1. Equal to and Not Equal to

We will find all the documents where center_id equals 55 and homepage_featured does not equal 0. Since we will use the find function, it will return the cursor for that command. Further, use a for loop to iterate through the query’s results.

result_1 = weekly_demand_collection.find({
    "center_id" : { "$eq" : 55},
    "homepage_featured" : { "$ne" : 0}
})

for i in result_1:
    print(i)
Comparison Operators


2. In the List and Not in the List

For instance, you need to match an element with multiple elements. Instead of using the $eq operator multiple times, we can use the $in operator. We will try to find all the documents where center_id is 24 or 11.

Then, we find all the documents where center_id is not in the specified list. The following query will return all the documents where center_id is not 24 and 11.

3. Less than and Greater than

Now, let us find all the documents where center_id is 55 and checkout_price is greater than 100 and less than 200. Use the following syntax for this-

result_4 = weekly_demand_collection.find({
    "center_id" : 55,
    "checkout_price" : { "$lt" : 200, "$gt" : 100}
})

for i in result_4:
    print(i)
Query MongoDB using PyMongo - Less than and greater than

Filter based on Logical Operator

NAMEIt will join query clauses with a logical AND and return all documents that match both conditions.
$andIt will invert the effect of a query and return documents that do not match the query expression.
$notIt will invert the effect of a query and returns documents that do not match the query expression.
$norIt will join the query clauses with a logical NOR and return all documents that fail to match the clauses.
$orIt will join the query clauses with a logical OR and return all documents that match the conditions of either clause.

The following examples illustrate the use of logical operators-

  • AND Operator

    The following query will return all the documents where the center_id is equal to 11, and also the meal_id is not equal to 1778. The subqueries for the and operator will come inside a list.

result_5 = weekly_demand_collection.find({
    "$and" : [{
                 "center_id" : { "$eq" : 11}
              },
              {
                   "meal_id" : { "$ne" : 1778}
              }]
})

for i in result_5:
    print(i)
Query MongoDB using PyMongo - And operator
  • OR Operator

    The following query will return all the documents where either the center_id equals 11 or the meal_id is either 1207 or 2707. Further, the subqueries for the or operator will come inside a list.

result_6 = weekly_demand_collection.find({
    "$or" : [{
                 "center_id" : { "$eq" : 11}
              },
              {
                   "meal_id" : { "$in" : [1207, 2707]}
              }]
})

for i in result_6:
    print(i)
Query MongoDB using PyMongo - or operator

Filter with Regular Expressions

Regular Expressions are useful when you have text fields and want to search for documents with a specific pattern. If you want to learn more about regular expressions, I highly recommend you go through this article: Beginners Tutorial for Regular Expressions in Python.

It can be used with the operator $regex, and we can provide value to the operator for the regex pattern to match. We will use the meal_info collection for this query, and then we will find the documents where the cuisine field starts with character C.

result_7 = meal_info_collection.find({
    "cuisine" : { "$regex" : "^C" }
})

for i in result_7:
    print(i)

Let’s take another regular expression example. We will find all the documents in which the category starts with the character “S” and the cuisine ends with “Ian.”

result_8 = meal_info_collection.find({
    "$and" : [
        { 
            "category" : {
            "$regex" : "^S"
        }},
        {
            "cuisine" : {
                "$regex" : "ian$"
        }}
    ]
})

for i in result_8:
    print(i)

Aggregation Pipelines

MongoDB’s aggregation pipeline provides a framework to perform a series of data transformations on a dataset. The following is its syntax:

your_collection.aggregate( [ { <stage1> }, { <stage2> },.. ] )

The first stage takes the complete set of documents as input, and from there, each subsequent stage takes the previous transformation’s result set as input to the next stage and produces the output.

There are around 10 transformations available in the MongoDB aggregate, of which we will see $match and $group in this article. We will discuss each of the transformations in detail in the upcoming article on MongoDB.

For example, in the first stage, we will match the documents where center_id is equal to 11, and in the next stage, it will count the number of documents with center_id equal to 11. Note that we assigned the $count operator the value equalling total_rows in the second stage, which is the name of the field we want in the output.

result_9 = weekly_demand_collection.aggregate([
    ## stage 1
    {
        "$match" : 
                 {"center_id" : {"$eq" : 11 } }
    },
    ## stage 2
    {
        "$count" : "total_rows"
    }
])

for i in result_9:
    print(i)

Now, let’s take another example where the first stage is the same as before, i.e., center_id is equal to 11. In the second stage, we want to calculate the average of the field num_orders for center_id 11 and the unique meal_ids for center_id 11.

result_10 = weekly_demand_collection.aggregate([
    ## stage 1
    {
        "$match" : 
                 {"center_id" : {"$eq" : 11 } }
    },
    ## stage 2
    {
        "$group" : { "_id" : 0 ,
                     "average_num_orders": { "$avg" : "$num_orders"},
                     "unique_meal_id" : {"$addToSet" : "$meal_id"}} 
    }
])

for i in result_10:
    print(i)
Query MongoDB using PyMongo

Conclusion

The unfathomable amount of data generated today makes it necessary to find better alternatives like this to query data. In this article, we learned how to query a MongoDB database using PyMongo. Also, we understood how to apply various filters per the required situation.

Key Takeaways

  • Using create_collection(“<collection_name>”), you can create a collection in MongoDB
  • You can use the insert_many() function to insert multiple records in a collection.
  • The find_one() or find() functions can be used in conjunction with operators like $and, $or, $in, etc., which can be used to filter the documents in a MongoDB collection
  • The Aggregation pipeline provides a framework for performing a series of transformations on the documents in a collection, such as $match and $group.

Q1. How to connect to MongoDB using PyMongo?

A. You can connect to PyMongo in the following manner –
import pymongo client = pymongo.Mongo
client('mongodb://localhost:27017')

Q2. How to retrieve specific data from MongoDB using Python?

A. You can retrieve documents from MongoDB collection using the find() or find_one() functions. In both of these functions, you can provide the filter query and the specific fields you want to retrieve from these documents.

Q3. How to insert data into MongoDB using Python?

A. You can insert data in MongoDB collection using the insert() or insert_many() functions.

Q4. How do you update a document in MongoDB using Python?

A. To update a document in MongoDB using Python, you can use the update_one() or update_many() methods provided by PyMongo. For example, to update a specific field in a document, you would specify the filter criteria and the new values to set.

Q5. How do you delete a document from MongoDB using Python?

A. To delete a document from MongoDB using Python, you can utilize the delete_one() or delete_many() methods. These methods allow you to specify the criteria for which documents to remove from the collection.

Ideas have always excited me. The fact that we could dream of something and bring it to reality fascinates me. Computer Science provides me a window to do exactly that. I love programming and use it to solve problems and a beginner in the field of Data Science.

Responses From Readers

Clear

We use cookies essential for this site to function well. Please click to help us improve its usefulness with additional cookies. Learn about our use of cookies in our Privacy Policy & Cookies Policy.

Show details