Automating Email Sorting and Labelling with CrewAI

Abhiraj Suresh Last Updated : 22 Oct, 2024

Introduction

Ray Tomlinson, the inventor of email, could hardly have imagined how far this piece of tech would reach. Today, email is a primary pillar of corporate and professional communication and is used in countless facets of the working world. This has propelled the creation of a whole set of tools and plugins to optimise your inbox. With the advent of generative AI, the latest talk of the town is the use of AI agents to manage email. In this blog, I will explain how to automate email sorting and labelling by building agents using crewAI.

Overview

  • Learn how to give your applications access to your Gmail using Google Cloud Console’s OAuth 2.0.
  • Understand the code and build an agent that uses an LLM to read and categorise emails into pre-defined categories.
  • Learn how to automate the process of email sorting and labelling using crewAI by simply running a Python script.

Understanding the Context

If you have an email address, you know how flooded your inbox tends to get after a night’s sleep: marketing emails, personal messages, professional communications, and so on, especially if you are a busy working professional. A cluttered inbox quickly becomes frustrating.


The worst part is that even after creating relevant labels in the email (Gmail in the case of the author), one has to take the time to assign these emails to the correct label. With the developments in generative AI, shouldn’t it be much easier to sort and label emails in Gmail? Why don’t we reduce the steps of sorting relevant emails to a single click?  

Let me illustrate how we can use crewAI to build an LLM agent and automate this process. The end goal is to automatically sort our unread emails into three categories using crewAI: ‘Reply Immediately’, ‘No Reply’, and ‘Irrelevant’. We then apply the matching Gmail label to each categorised email.

Note: We manually created these labels –  ‘Reply Immediately’, ‘No Reply’, and ‘Irrelevant’ – in Gmail.



Steps for Google Authentication

Before we jump to the code to sort emails in Gmail, you need to enable the Gmail API and generate the OAuth 2.0 credentials. This will help your email sorting agentic system access your emails. Here are the steps to do this.

Step 1: Create a New Project in Google Cloud

The first step in a Google authentication process is to create a new project in Google Cloud.

  • Visit the Google Cloud console and log in with your email address. First-time users must create an account.
  • Then select “New Project” in the dropdown, give it a name, and click Create. This project will hold the required API-related configurations. While adding the new project, choose your organisation name as the location; we selected analyticsvidhya.com.

Step 2: Enable Gmail API

The next step is to enable the Gmail API.

  • From the console’s Dashboard, open the Navigation Menu and go to Explore and Enable APIs under “Getting Started.”
  • On the left side of the screen, select Library and search for “Gmail API.” Then enable it for the project you created.

Step 3: Set Up OAuth 2.0 Credentials

The last step in Google Authentication is to set up OAuth 2.0 credentials.

  • First, set up the OAuth consent screen under APIs & Services. Click Configure Consent Screen.
  • Choose the user type (e.g., External for apps used by anyone). We chose Internal since we are using it only for our own email ID. Then click Create.
  • Name your app and add the User support email and Developer contact information. Ideally, you should add your work email ID. Click SAVE AND CONTINUE at the bottom of the screen.
  • Now, define the scopes in the consent screen setup. Scopes dictate what the API can access in your Google account. For email-related tasks, you’ll need the following: 'https://www.googleapis.com/auth/gmail.modify'. This scope allows our email sorting and labelling system to read, send, and modify emails in your Gmail account.
  • Click on ADD OR REMOVE SCOPES and then select the scope mentioned above.
  • Click on Update. You can see the scope has been added. Press SAVE AND CONTINUE.
  • Go through the summary and click BACK TO DASHBOARD.

Step 4: Creating Credentials

  • Now, choose Credentials under APIs & Services and click CREATE CREDENTIALS.
  • Select OAuth client ID.
  • For local development, we will choose the Desktop App option. And then press CREATE.

Step 5: Download credentials.json

  • Now download the JSON file and save it locally at your preferred location. The code below refers to this file as credentials.json.
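
Before moving on, it can help to confirm that the downloaded file is the right kind of credential. The sketch below is a minimal check, assuming the file follows the layout Google uses for Desktop App OAuth clients, where the client details sit under a top-level "installed" key. The function name check_desktop_credentials is hypothetical and not part of any Google library.

```python
import json


def check_desktop_credentials(path):
    """Return the client ID if `path` looks like a Desktop App OAuth client file."""
    with open(path, "r", encoding="utf-8") as handle:
        data = json.load(handle)

    # Desktop App clients are stored under "installed"; Web clients use "web".
    if "installed" not in data:
        raise ValueError(
            f"{path} has top-level keys {sorted(data)}; expected 'installed'. "
            "Was the OAuth client created as a Desktop App?"
        )

    client = data["installed"]
    missing = [key for key in ("client_id", "client_secret") if key not in client]
    if missing:
        raise ValueError(f"{path} is missing fields: {missing}")
    return client["client_id"]
```

Calling check_desktop_credentials with the path to your downloaded file should return the client ID, or raise a ValueError that explains what looks wrong.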

Python Code for Sorting and Labelling Emails Using crewAI

We will now begin coding the email sorting and labelling system using crewAI. Ensure you have installed crewAI and utils.

Install crewAI

#pip install crewai==0.28.8 crewai_tools==0.1.6
#pip install utils
#pip install google-auth-oauthlib
#pip install google-api-python-client
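
To confirm the installation worked before running the rest of the script, a quick check like the following sketch may help. It only looks up whether each module can be found, without importing it. The helper name missing_modules is hypothetical; the module names are the ones imported later in this article.

```python
import importlib.util


def missing_modules(module_names):
    """Return the module names from `module_names` that cannot be found."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package is absent or the name is malformed.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    required = ["crewai", "crewai_tools", "google_auth_oauthlib", "googleapiclient", "pandas"]
    print("Missing modules:", missing_modules(required) or "none")
```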

Import Libraries

Now, we will import the relevant libraries.

# Import standard libraries
import json                   # For working with JSON data
import os                     # For interacting with the operating system
import os.path                # For manipulating filesystem pathnames
import pickle                 # For serializing and deserializing Python objects

# Import Google API client libraries for authentication and API requests
from google.auth.transport.requests import Request            # For making authenticated HTTP requests
from google.oauth2.credentials import Credentials             # For handling OAuth 2.0 credentials
from google_auth_oauthlib.flow import InstalledAppFlow        # For managing OAuth 2.0 authorization flows
from googleapiclient.discovery import build                   # For constructing a Resource object for interacting with an API
from googleapiclient.errors import HttpError                  # For handling HTTP errors from API requests

# Import crewAI components
from crewai import Agent, Task, Crew                          # For managing agents, tasks, and crews
from crewai_tools import Tool                                 # For wrapping a Python function as an agent tool

# Import pandas library
import pandas as pd                                           # For data manipulation and analysis

Access to LLM

After this, we will give our email sorter and labeller access to an LLM. I am using the GPT-4o model for this task, but you can use the LLM of your choice.

# Set the OpenAI API key and model name
os.environ['OPENAI_API_KEY'] = "Your API Key"
os.environ["OPENAI_MODEL_NAME"] = 'Model Name'
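
Hard-coding the key in a script makes it easy to leak by accident. As an alternative, here is a minimal sketch that reads a value from an environment variable set outside the script and fails early with a clear message when it is absent. The helper name require_env is hypothetical.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running the script.")
    return value


# Usage (assumes the variable was exported in your shell beforehand):
# api_key = require_env("OPENAI_API_KEY")
```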

Now, we will create the EmailCollector class, which uses Google’s API to interact with Gmail and fetch unread emails.

class EmailCollector:
    def __init__(self):
        self.creds = None
        self._authenticate()

    def _authenticate(self):
        SCOPES = ['https://www.googleapis.com/auth/gmail.modify']  # Use modify to update emails
        # Reuse stored credentials from a previous run, if any
        if os.path.exists('token.pickle'):
            with open('token.pickle', 'rb') as token:
                self.creds = pickle.load(token)
        if not self.creds or not self.creds.valid:
            if self.creds and self.creds.expired and self.creds.refresh_token:
                # Refresh expired credentials without asking the user to log in again
                self.creds.refresh(Request())
            else:
                # Change the name and path of your file below accordingly
                flow = InstalledAppFlow.from_client_secrets_file(
                    '/Users/admin/Desktop/Blogs/credentials.json', SCOPES)
                self.creds = flow.run_local_server(port=0)
            # Save the new or refreshed credentials for future runs
            with open('token.pickle', 'wb') as token:
                pickle.dump(self.creds, token)

    def get_unread_emails(self):
        service = build('gmail', 'v1', credentials=self.creds)
        results = service.users().messages().list(userId='me', q='is:unread').execute()
        messages = results.get('messages', [])
        email_list = []
        if not messages:
            print('No unread emails found.')
        else:
            for message in messages:
                # Fetch the full message (headers and body) for each unread email
                msg = service.users().messages().get(userId='me', id=message['id'], format='full').execute()
                email_list.append(msg)
        return email_list

Code Explanation

Here is what the above code does:

  1. First, the EmailCollector initialises with self.creds = None and calls the _authenticate method to handle authentication.
  2. Then, the _authenticate method checks if a token.pickle file exists. If it exists, it loads the stored credentials from the file using pickle.
  3. If credentials are expired or missing, it checks if a refresh token is available to refresh them. If not, it initiates an OAuth flow using InstalledAppFlow, prompting the user to log in.
  4. After obtaining the credentials (new or refreshed), they are saved to the token.pickle file for future use.
  5. The get_unread_emails method retrieves a list of unread emails after connecting to the Gmail API using the authenticated credentials.
  6. It loops through each unread email, retrieves full message details, and stores them in the email_list.
  7. Finally, the list of unread emails is returned for further use.
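
One detail worth knowing: the Gmail API returns message lists in pages, so a single list call like the one above returns only the first page of results. The sketch below shows one way to follow nextPageToken until all matching message IDs are collected. It assumes service is the object returned by build('gmail', 'v1', ...); list_all_message_ids is a hypothetical helper and not part of the original code.

```python
def list_all_message_ids(service, query="is:unread", user_id="me"):
    """Collect the IDs of every message matching `query`, following pagination."""
    message_ids = []
    page_token = None
    while True:
        response = (
            service.users()
            .messages()
            .list(userId=user_id, q=query, pageToken=page_token)
            .execute()
        )
        message_ids.extend(item["id"] for item in response.get("messages", []))
        # The API omits nextPageToken on the last page.
        page_token = response.get("nextPageToken")
        if not page_token:
            return message_ids
```

For a very full inbox you may prefer to cap the number of pages, since every message is later sent to the LLM.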

Create Mail Data Gatherer

Now, we will create the mailDataGatherer() function to gather the unread emails and save them as a .csv file.

import base64  # For decoding the base64url-encoded message bodies

def mailDataGatherer():
    # Create the Email Collector instance
    email_collector_instance = EmailCollector()
    # Get unread emails
    email_list = email_collector_instance.get_unread_emails()
    # Prepare data for DataFrame
    emails_data = []
    for email in email_list:
        payload = email['payload']
        # Fall back to an empty subject if the header is missing
        subject = next((header['value'] for header in payload['headers'] if header['name'] == 'Subject'), '')
        body = ''
        if 'parts' in payload:
            # Multipart email: use the first plain-text part
            for part in payload['parts']:
                if part['mimeType'] == 'text/plain':
                    body = part['body'].get('data', '')
                    break
        else:
            # Single-part email: the body sits directly on the payload
            body = payload['body'].get('data', '')
        # Gmail returns the body base64url-encoded
        body = base64.urlsafe_b64decode(body).decode('utf-8', errors='replace')
        emails_data.append({
            'Subject': subject,
            'Body': body
        })
    # Create a DataFrame with the expected columns, even if there are no emails
    df = pd.DataFrame(emails_data, columns=['Subject', 'Body'])
    # Fill missing values before saving
    df.fillna("", inplace=True)
    # Save DataFrame to CSV
    df.to_csv('unread_emails.csv', index=False)
    # Return the DataFrame
    return df

Code Explanation

Here is an explanation of the above code:

  • First, the function handles authentication and retrieves unread emails using an instance of the EmailCollector class.
  • Then, we fetch the unread emails using the get_unread_emails() method from the EmailCollector instance.
  • After that, we process the email data inside the loop:
    1. The function extracts the subject from the email headers for each email in the list.
    2. It checks the email’s body, decoding it from base64 format to get the actual text.
    3. It collects both the subject and body of the emails.
  • We will store the processed email data (subject and body) in a Pandas DataFrame.
  • The DataFrame is saved as a CSV file called unread_emails.csv, and any missing values are filled with empty strings.
  • Finally, the function returns the DataFrame for further use or viewing.
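
The loop above looks only at the top-level parts of each message. Some emails nest their plain-text part one or more levels deeper (for example, inside a multipart/alternative part). A minimal sketch of a recursive lookup follows, assuming the payload dictionaries have the same mimeType, body, and parts keys used above. extract_plain_text is a hypothetical helper.

```python
import base64


def extract_plain_text(payload):
    """Return the first text/plain body found in a Gmail message payload, or ''."""
    data = payload.get("body", {}).get("data")
    if payload.get("mimeType") == "text/plain" and data:
        # Restore any stripped '=' padding before decoding the base64url data.
        padded = data + "=" * (-len(data) % 4)
        return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")

    # Otherwise search nested parts depth-first.
    for part in payload.get("parts", []):
        text = extract_plain_text(part)
        if text:
            return text
    return ""
```

Inside mailDataGatherer, the body lookup could then be reduced to a single call such as extract_plain_text(email['payload']).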

Create a Tool to Extract

Now, we will create the extract_mail_tool for our agent using the Tool functionality within crewAI. This tool will use the mailDataGatherer function to gather the subject and body of the unread emails.

# Create the Extract Subjects tool
extract_mail_tool = Tool(
    name="mailDataGatherer",
    description="Get all the subjects and body content of unread emails.",
    func=mailDataGatherer
)

Now, we will create a function that takes in the JSON representation of the unread emails returned by the agent. It parses the JSON and labels the corresponding emails in our Gmail inbox as per the pre-defined categories: ‘Reply Immediately’, ‘No Reply’, and ‘Irrelevant’.

def push_mail_label(dfjson):
    # Strip the markdown code fence that the LLM may wrap around its JSON output
    emails = pd.DataFrame(json.loads(dfjson.replace("```json","").replace("```","")))
    SCOPES = ['https://www.googleapis.com/auth/gmail.modify']  # Use modify to update emails
    # Change the name and path of your file below accordingly
    flow = InstalledAppFlow.from_client_secrets_file(
        '/Users/admin/Desktop/Blogs/credentials.json', SCOPES)
    creds = flow.run_local_server(port=0)
    service = build('gmail', 'v1', credentials=creds)
    # Map existing label names to their IDs
    labels = service.users().labels().list(userId='me').execute().get('labels', [])
    label_map = {label['name']: label['id'] for label in labels}

    def get_or_create_label(label_name):
        if label_name in label_map:
            return label_map[label_name]
        # Create new label if it doesn't exist
        new_label = {
            'labelListVisibility': 'labelShow',
            'messageListVisibility': 'show',
            'name': label_name
        }
        created_label = service.users().labels().create(userId='me', body=new_label).execute()
        return created_label['id']

    # Map the categories to Gmail labels, creating them if they don't exist
    category_label_map = {
        "Reply Immediately": get_or_create_label("Reply Immediately"),
        "No Reply": get_or_create_label("No Reply"),
        "Irrelevant": get_or_create_label("Irrelevant")
    }

    for i in range(emails.shape[0]):
        subject = emails.iloc[i]['Subject']
        category = emails.iloc[i]['Category']
        # Find the message by its subject line
        search_query = f'subject:("{subject}")'
        results = service.users().messages().list(userId='me', q=search_query).execute()
        messages = results.get('messages', [])
        if not messages:
            # Skip emails that the search could not find instead of crashing
            print(f'No email found with subject: {subject}')
            continue
        message_id = messages[0]['id']  # Get the first matching message ID
        label_id = category_label_map.get(category, None)
        # Apply label based on category
        if label_id:
            service.users().messages().modify(
                userId='me',
                id=message_id,
                body={
                    'addLabelIds': [label_id],
                    'removeLabelIds': []  # Optionally add label removal logic here
                }
            ).execute()
            print(f'Updated email with subject: {subject} to label: {category}')
        else:
            print(f'Category {category} does not match any Gmail labels.')

How does this function work?

The function works as follows:

  1. The dfjson string is first cleaned of extra characters. It is then parsed into a Pandas DataFrame that contains columns like Subject, Body, and Category for each email.
  2. Then, with Google OAuth 2.0, the function authenticates with Gmail through InstalledAppFlow using the gmail.modify scope. This allows it to modify emails (adding labels in our case).
  3. Next, the function retrieves all existing Gmail labels from the user’s account and stores them in a label_map dictionary (label name to label ID).
  4. After that, the get_or_create_label() function checks if a label already exists. If not, it creates a new one using the Gmail API.
  5. Next, it maps the categories in the email data to their respective Gmail labels (Reply Immediately, No Reply, Irrelevant), creating them if necessary.
  6. For each email, the function searches Gmail for a matching message by the subject, retrieves the message ID and applies the corresponding label based on the category.
  7. At the end, the function prints a message indicating success or failure for each email it processes.
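
Step 1 relies on simple string replacement to clean the agent's output. A slightly more defensive variant is sketched here, under the assumption that the agent returns either bare JSON or JSON wrapped in a markdown code fence. It removes only the fence markers at the start and end and leaves the email text itself untouched. parse_agent_json is a hypothetical helper.

```python
import json
import re

# Matches an opening fence (with an optional "json" tag) at the very start,
# or a closing fence at the very end, of the text.
_FENCE = re.compile(r"^\s*`{3}(?:json)?\s*|\s*`{3}\s*$", re.IGNORECASE)


def parse_agent_json(text):
    """Parse JSON that may be wrapped in a markdown code fence."""
    cleaned = _FENCE.sub("", text.strip())
    return json.loads(cleaned)
```

The parsed result can be passed straight to pd.DataFrame, as push_mail_label does with its own cleaned string.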

Defining the Email Sorting Agent

This agent will classify the unread emails into the three categories we have defined. The agent uses the extract_mail_tool to access the emails, which in turn uses the mailDataGatherer() function to get them. Since this agent uses an LLM, you must give proper instructions on what is expected. For example, one explicit instruction I have added is about the kind of emails that belong in each category. You can read the instructions in the agent’s backstory parameter.

sorter = Agent(
    role="Email Sorter",
    goal="Classify the emails into one of the 3 categories provided",
    tools=[extract_mail_tool],  # Add tools if needed
    verbose=True,
    # Adjacent strings are joined as-is, so each one ends with a space
    backstory=(
        "You are an expert personal assistant working for a busy corporate professional. "
        "You are handling the personal email account of the CEO. "
        "You have been assigned a task to classify the unread emails into one of the three categories. "
        "List of the three categories is : ['Reply Immediately', 'No Reply' and 'Irrelevant'] "
        "'Reply Immediately' should include unread emails from within the organization and outside. This can be a general inquiry, query and calendar invites. "
        "'No Reply' includes unread emails where someone is letting me know of some development within the company. It also includes Newsletters that I have subscribed to. "
        "'Irrelevant' contains unread emails that are from external businesses marketing their products. "
        "You can apply your own intelligence while sorting the unread emails. "
        "You get the data of unread mails in a pandas dataframe using the tool mailDataGatherer. "
        "You will get 'Subject' and 'Body' columns in the dataframe returned by tool. "
        "You should classify the mails on basis of subject and body and call it 'Category'. "
        "There is no tool which you can use for classification. Use your own intelligence to do it. "
        "Finally you return a json with 3 keys for each mail: 'Subject', 'Body', 'Category'"
    )
)

Note that we have set verbose = True, which helps us view how the model thinks. 

Define the Task

Now, we will define the Task that the sorter agent will perform. We have added the exact list of steps, which you can read in the code below.

# Task for Sorter: Sort Emails
sort_task = Task(
    # Task has no 'goal' parameter, so the overall goal is stated at the top of the description
    description=(
        "Sort the unread emails into: ['Reply Immediately', 'No Reply' and 'Irrelevant']. "
        "1. At first, get the dataframe of unread emails using the 'mailDataGatherer'. "
        "2. Do not use the tool again once you have got the subject and body. "
        "3. Classify the emails into one of the above categories: ['Reply Immediately', 'No Reply' and 'Irrelevant']. "
        "4. There is no tool which you can use for classification. Use the instructions and your own intelligence to do it. "
        "5. Finally, return a json with three keys for each row: ['Subject', 'Body', 'Category']. "
        "6. All the emails must strictly be classified into one of above three categories only."
    ),
    expected_output="A json file with all the unread emails. There should be 3 keys for each mail- Subject, Body and Category ",
    agent=sorter,
    async_execution=False
)

Enable Collaboration in CrewAI

Next, we will use crewAI’s functionality to enable collaboration and let the sorter agent perform the steps defined in sort_task above.

# Create the Crew
email_sorting_crew = Crew(
    agents=[sorter],
    tasks=[sort_task],
    verbose=True
)

Finally, we will kick off our crew to let it sort the unread emails automatically and save the output in the result variable.

# Running the crew

result = email_sorting_crew.kickoff()

Since we set verbose = True for both the agent and the crew, we get to see how our agent performs.
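
Because the categories come from an LLM, it can be worth checking the output before pushing anything to Gmail. The sketch below assumes the result has already been parsed into a list of dictionaries with the 'Subject', 'Body', and 'Category' keys requested in the task. split_valid_records is a hypothetical helper.

```python
ALLOWED_CATEGORIES = ("Reply Immediately", "No Reply", "Irrelevant")
REQUIRED_KEYS = {"Subject", "Body", "Category"}


def split_valid_records(records):
    """Separate records with all required keys and an allowed category from the rest."""
    valid, rejected = [], []
    for record in records:
        is_valid = (
            isinstance(record, dict)
            and REQUIRED_KEYS.issubset(record)
            and record["Category"] in ALLOWED_CATEGORIES
        )
        (valid if is_valid else rejected).append(record)
    return valid, rejected
```

Rejected records can be logged and reviewed by hand rather than silently dropped.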


Now, we push this result into our Gmail using the push_mail_label() function.

#push the results

push_mail_label(result)

Note that during execution, you will be redirected twice to a browser window to grant the necessary permissions: once when the emails are collected and once more when the labels are pushed. Once the code can access the emails again, the unread emails will be placed under the relevant labels.

And Voila! We have the emails with labels attached. For privacy, I will not display my entire inbox.

Conclusion

Remember the frustration we discussed earlier in the blog? Well, crewAI’s email sorting and labelling agentic system allows us to save a significant amount of time.

With LLM agents, the possibilities of task automation are endless. It depends on how well you think and structure your agentic system. As a next step, you can try building an agent that checks whether the first agent’s categorisation is correct. Or, you can also try creating an agent that prepares an email for the ‘Reply Immediately’ category and pushes it to the drafts in Gmail. Sounds ambitious? So, go on and get your hands dirty. Happy learning!!
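
For the draft idea mentioned above, the Gmail API accepts a draft as a base64url-encoded MIME message. The following is only a sketch of building that request body with Python's standard library. build_draft_body is a hypothetical helper, and the commented call assumes service is an authenticated Gmail client like the one created earlier.

```python
import base64
from email.message import EmailMessage


def build_draft_body(to_address, subject, text):
    """Build the request body for creating a Gmail draft."""
    message = EmailMessage()
    message["To"] = to_address
    message["Subject"] = subject
    message.set_content(text)
    # Gmail expects the whole MIME message base64url-encoded under "raw".
    raw = base64.urlsafe_b64encode(message.as_bytes()).decode("ascii")
    return {"message": {"raw": raw}}


# Usage (requires an authenticated `service` object):
# service.users().drafts().create(
#     userId="me", body=build_draft_body("someone@example.com", "Re: Hello", "Thanks!")
# ).execute()
```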

Frequently Asked Questions

Q1. What is crewAI?

A. crewAI is an open-source Python framework that supports developing and managing multi-agent AI systems. Using crewAI, you can build LLM-backed AI agents that can autonomously make decisions within an environment based on the variables present.

Q2. Can I use crewAI to sort my Gmail emails?

A. Yes! You can use crewAI to build LLM-backed agents to automate email sorting and labelling tasks.

Q3. How many agents do I need to build using crewAI to sort emails in Gmail?

A. Depending on your agentic system structure, you can use as many agents as you like to sort emails in Gmail. However, it is recommended that you build one agent per task. 

Q4. What tasks can CrewAI do?

A. CrewAI can perform various tasks, including sorting and writing emails, planning projects, generating articles, scheduling and posting social media content, and more.

Q5. How does sorting and labelling emails in Gmail using crewAI save time?

A. Since GenAI agents built using crewAI make decisions using LLMs, they reduce the need to allocate human resources to go through emails and make decisions.

Q6. How can email sorting be automated using crewAI?

A. To automate email sorting using crewAI, you must define the agents with a descriptive backstory using the Agent class, define tasks for each agent using the Task class, and then create a Crew to enable different agents to collaborate.

My name is Abhiraj. I am currently a manager for the Instruction Design team at Analytics Vidhya. My interests include badminton, voracious reading, and meeting new people. On a daily basis I love learning new things and spreading my knowledge.
