LangGraph vs CrewAI vs AutoGen to Build a Data Analysis Agent

Santhosh Reddy Dandavolu Last Updated : 31 Jan, 2025

8 min read

In today’s data-driven world, organizations rely on data analysts to interpret complex datasets, uncover actionable insights, and drive decision-making. But what if we could enhance the efficiency and scalability of this process using AI? Enter the Data Analysis Agent, to automate analytical tasks, execute code, and adaptively respond to data queries. LangGraph, CrewAI, and AutoGen are three popular frameworks used to build AI Agents. We will be using and comparing all three in this article to build a simple data analysis agent.

Working of Data Analysis Agent
Building a Data Analysis Agent with LangGraph
- Pre-requisites
Steps to Build a Data Analysis Agent with LangGraph
Building a Data Analysis Agent with CrewAI
Building a Data Analysis Agent with AutoGen
LangGraph vs CrewAI vs AutoGen
Frequently Asked Questions

Working of Data Analysis Agent

The data analysis agent will first take the query from the user and generate the code to read the file and analyze the data in the file. Then the generated code will be executed using the Python repl tool. The result of the code is sent back to the agent. The agent then analyzes the result received from the code execution tool and replies to the user query. LLMs can generate arbitrary code, so we must carefully execute the LLM-generated code in a local environment.

Building a Data Analysis Agent with LangGraph

If you are new to this topic or wish to brush up on your knowledge of LangGraph, here’s an article I would recommend: What is LangGraph?

Pre-requisites

Before building agents, ensure you have the necessary API keys for the required LLMs.

Load the .env file with the API keys needed.

from dotenv import load_dotenv

load_dotenv(./env)

Key Libraries Required

langchain – 0.3.7

langchain-experimental – 0.3.3

langgraph – 0.2.52

crewai – 0.80.0

Crewai-tools – 0.14.0

autogen-agentchat – 0.2.38

Now that we’re all set, let’s begin building our agent.

Steps to Build a Data Analysis Agent with LangGraph

1. Import the necessary libraries.

import pandas as pd
from IPython.display import Image, display
from typing import List, Literal, Optional, TypedDict, Annotated
from langchain_core.tools import tool
from langchain_core.messages import ToolMessage
from langchain_experimental.utilities import PythonREPL
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition
from langgraph.checkpoint.memory import MemorySaver

2. Let’s define the state.

class State(TypedDict):
	messages: Annotated[list, add_messages]
graph_builder = StateGraph(State)

3. Define the LLM and the code execution function and bind the function to the LLM.

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.1)

@tool
def python_repl(code: Annotated[str, "filename to read the code from"]):
    """Use this to execute python code read from a file. If you want to see the output of a value,
    Make sure that you read the code from correctly
    you should print it out with `print(...)`. This is visible to the user."""

    try:
        result = PythonREPL().run(code)
        print("RESULT CODE EXECUTION:", result)
    except BaseException as e:
        return f"Failed to execute. Error: {repr(e)}"
    return f"Executed:\n```python\n{code}\n```\nStdout: {result}"

llm_with_tools = llm.bind_tools([python_repl])

4. Define the function for the agent to reply and add it as a node to the graph.

def chatbot(state: State):
    return {"messages": [llm_with_tools.invoke(state["messages"])]}
    
graph_builder.add_node("agent", chatbot)

5. Define the ToolNode and add it to the graph.

code_execution = ToolNode(tools=[python_repl])

graph_builder.add_node("tools", code_execution)

If the LLM returns a tool call, we need to route it to the tool node; otherwise, we can end it. Let’s define a function for routing. Then we can add other edges.

def route_tools(state: State,):
    """
    Use in the conditional_edge to route to the ToolNode if the last message
    has tool calls. Otherwise, route to the end.
    """
    if isinstance(state, list):
        ai_message = state[-1]
    elif messages := state.get("messages", []):
        ai_message = messages[-1]
    else:
        raise ValueError(f"No messages found in input state to tool_edge: {state}")
    if hasattr(ai_message, "tool_calls") and len(ai_message.tool_calls) > 0:
        return "tools"
    return END
    
graph_builder.add_conditional_edges(
    "agent",
    route_tools,
    {"tools": "tools", END: END},
)

graph_builder.add_edge("tools", "agent")

6. Let us also add the memory so that we can chat with the agent.

memory = MemorySaver()

graph = graph_builder.compile(checkpointer=memory)

7. Compile and display the graph.

graph = graph_builder.compile(checkpointer=memory)

display(Image(graph.get_graph().draw_mermaid_png()))

8. Now we can start the chat. Since we have added memory, we will give each conversation a unique thread_id and start the conversation on that thread.

config = {"configurable": {"thread_id": "1"}}

def stream_graph_updates(user_input: str):
    events = graph.stream(
        {"messages": [("user", user_input)]}, config, stream_mode="values"
    )
    for event in events:
        event["messages"][-1].pretty_print()
        
while True:
    user_input = input("User: ")
    if user_input.lower() in ["quit", "exit", "q"]:
        print("Goodbye!")
        break
    stream_graph_updates(user_input)

While the loop is running, we start by giving the path of the file and then asking any questions based on the data.

The output will be as follows:

As we have included memory, we can ask any questions on the dataset in the chat. The agent will generate the required code and the code will be executed. The code execution result will be sent back to the LLM. An example is shown below:

Also Read: How to Create Your Personalized News Digest Agent with LangGraph

Building a Data Analysis Agent with CrewAI

Now, we will use CrewAI for data analysis task.

1. Import the necessary libraries.

from crewai import Agent, Task, Crew
from crewai.tools import tool
from crewai_tools import DirectoryReadTool, FileReadTool
from langchain_experimental.utilities import PythonREPL

2. We will build one agent for generating the code and another for executing that code.

coding_agent = Agent(
	role="Python Developer",
	goal="Craft well-designed and thought-out code to answer the given problem",
	backstory="""You are a senior Python developer with extensive experience in software and its best practices.
            	You have expertise in writing clean, efficient, and scalable code. """,
	llm='gpt-4o',
	human_input=True,
)
coding_task = Task(
	description="""Write code to answer the given problem
                	assign the code output to the 'result' variable
                    	Problem: {problem},
                    	""",
	expected_output="code to get the result for the problem. output of the code should be assigned to the 'result' variable",
	agent=coding_agent
)

3. To execute the code, we will use PythonREPL(). Define it as a crewai tool.

@tool("repl")
def repl(code: str) -> str:
	"""Useful for executing Python code"""
	return PythonREPL().run(command=code)

4. Define executing agent and tasks with access to repl and FileReadTool()

executing_agent = Agent(
	role="Python Executor",
	goal="Run the received code to answer the given problem",
	backstory="""You are a Python developer with extensive experience in software and its best practices.
            	"You can execute code, debug, and optimize Python solutions effectively.""",
	llm='gpt-4o-mini',
	human_input=True,
	tools=[repl, FileReadTool()]
)
executing_task = Task(
	description="""Execute the code to answer the given problem
                	assign the code output to the 'result' variable
                    	Problem: {problem},
                    	""",
	expected_output='the result for the problem',
	agent=executing_agent
)

5. Build the crew with both agents and corresponding tasks.

analysis_crew = Crew(
	agents=[coding_agent, executing_agent],
	tasks=[coding_task, executing_task],
	verbose=True
)

6. Run the crew with the following inputs.

inputs = {'problem': """read this file and return the column names and find mean age
   "/home/santhosh/Projects/Code/LangGraph/gym_members_exercise_tracking.csv""",}

result = analysis_crew.kickoff(inputs=inputs)

print(result.raw)

Here’s how the output will look like:

Also Read: Build LLM Agents on the Fly Without Code With CrewAI

Building a Data Analysis Agent with AutoGen

1. Import the necessary libraries.

from autogen import ConversableAgent
from autogen.coding import LocalCommandLineCodeExecutor, DockerCommandLineCodeExecutor

2. Define the code executor and an agent to use the code executor.

executor = LocalCommandLineCodeExecutor(
	timeout=10,  # Timeout for each code execution in seconds.
	work_dir='./Data',  # Use the directory to store the code files.
)
code_executor_agent = ConversableAgent(
	"code_executor_agent",
	llm_config=False,
	code_execution_config={"executor": executor},
	human_input_mode="ALWAYS",
)

3. Define an agent to write the code with a custom system message.

Take the code_writer system message from https://microsoft.github.io/autogen/0.2/docs/tutorial/code-executors/



code_writer_agent = ConversableAgent(
    "code_writer_agent",
    system_message=code_writer_system_message,
    llm_config={"config_list": [{"model": "gpt-4o-mini"}]},
    code_execution_config=False,
)

4. Define the problem to solve and initiate the chat.

problem = """Read the file at the path '/home/santhosh/Projects/Code/LangGraph/gym_members_exercise_tracking.csv'
        	and print mean age of the people."""

chat_result = code_executor_agent.initiate_chat(
	code_writer_agent,
	message=problem,
)

Once the chat starts, we can also ask any subsequent questions on the dataset mentioned above. If the code encounters any error, we can ask to modify the code. If the code is fine, we can just press ‘enter’ to continue executing the code.

5. We can also print the questions asked by us and their answers, if required, using this code.

for message in chat_result.chat_history:
    if message['role'] == 'assistant':
        if 'exitcode' not in message['content']:
            print(message['content'])
            print('\n')
            
    else:
        if 'TERMINATE' in message['content']:        
            print(message['content'])
            print("----------------------------------------")

Here’s the result:

Also Read: Hands-on Guide to Building Multi-Agent Chatbots with AutoGen

LangGraph vs CrewAI vs AutoGen

Now that you’ve learned to build a data analysis agent with all the 3 frameworks, let’s explore the differences between them, when it comes to code execution:

Framework	Key Features	Strengths	Best Suited For
LangGraph	– Graph-based structure (nodes represent agents/tools, edges define interactions) – Seamless integration with PythonREPL	– Highly flexible for creating structured, multi-step workflows – Safe and efficient code execution with memory preservation across tasks	Complex, process-driven analytical tasks that demand clear, customizable workflows
CrewAI	– Collaboration-focused – Multiple agents working in parallel with predefined roles – Integrates with LangChain tools	– Task-oriented design – Excellent for teamwork and role specialization – Supports safe and reliable code execution with PythonREPL	Collaborative data analysis, code review setups, task decomposition, and role-based execution
AutoGen	– Dynamic and iterative code execution – Conversable agents for interactive execution and debugging – Built-in chat feature	– Adaptive and conversational workflows – Focus on dynamic interaction and debugging – Ideal for rapid prototyping and troubleshooting	Rapid prototyping, troubleshooting, and environments where tasks and requirements evolve frequently

Conclusion

In this article, we demonstrated how to build data analysis agents using LangGraph, CrewAI, and AutoGen. These frameworks enable agents to generate, execute, and analyze code to address data queries efficiently. By automating repetitive tasks, these tools make data analysis faster and more scalable. The modular design allows customization for specific needs, making them valuable for data professionals. These agents showcase the potential of AI to simplify workflows and extract insights from data with ease.

To know more about AI Agents, checkout our exclusive Agentic AI Pioneer Program!

Frequently Asked Questions

Q1. What are the key benefits of using AI frameworks like LangGraph, CrewAI, and AutoGen for data analysis?

A. These frameworks automate code generation and execution, enabling faster data processing and insights. They streamline workflows, reduce manual effort, and enhance productivity for data-driven tasks.

Q2. Can these data analysis agents handle multiple datasets or complex queries?

A. Yes, the agents can be customized to handle diverse datasets and complex analytical queries by integrating appropriate tools and adjusting their workflows.

Q3. What precautions should be taken when executing LLM-generated code?

A. LLM-generated code may include errors or unsafe operations. Always validate the code in a controlled environment to ensure accuracy and security before execution.

Q4. How does memory integration enhance these data analysis agents?

A. Memory integration allows agents to retain the context of past interactions, enabling adaptive responses and continuity in complex or multi-step queries.

Q5. What types of tasks can these data analysis agents automate?

A. These agents can automate tasks such as reading files, performing data cleaning, generating summaries, executing statistical analyses, and answering user queries about the data.

Santhosh Reddy Dandavolu

I am working as an Associate Data Scientist at Analytics Vidhya, a platform dedicated to building the Data Science ecosystem. My interests lie in the fields of Natural Language Processing (NLP), Deep Learning, and AI Agents.

Free Courses

4.7

Generative AI - A Way of Life

Explore Generative AI for beginners: create text and images, use top AI tools, learn practical skills, and ethics.

4.5

Getting Started with Large Language Models

Master Large Language Models (LLMs) with this course, offering clear guidance in NLP and model training made simple.

4.6

Building LLM Applications using Prompt Engineering

This free course guides you on building LLM apps, mastering prompt engineering, and developing chatbots with enterprise data.

4.8

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Explore practical solutions, advanced retrieval strategies, and agentic RAG systems to improve context, relevance, and accuracy in AI-driven applications.

4.7

Microsoft Excel: Formulas & Functions

Master MS Excel for data analysis with key formulas, functions, and LookUp tools in this comprehensive course.

MUID

Used by Microsoft Clarity, to store and track visits across websites.

Expiry: 1 Year

Type: HTTP

_clck

Used by Microsoft Clarity, Persists the Clarity User ID and preferences, unique to that site, on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.

Expiry: 1 Year

Type: HTTP

_clsk

Used by Microsoft Clarity, Connects multiple page views by a user into a single Clarity session recording.

Expiry: 1 Day

Type: HTTP

SRM_I

Collects user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Years

Type: HTTP

SM

Use to measure the use of the website for internal analytics

Expiry: 1 Years

Type: HTTP

CLID

The cookie is set by embedded Microsoft Clarity scripts. The purpose of this cookie is for heatmap and session recording.

Expiry: 1 Year

Type: HTTP

SRM_B

Collected user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Months

Type: HTTP

_gid

This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected includes the number of visitors, the source where they have come from, and the pages visited in an anonymous form.

Expiry: 399 Days

Type: HTTP

_ga_#

Used by Google Analytics, to store and count pageviews.

Expiry: 399 Days

Type: HTTP

_gat_#

Used by Google Analytics to collect data on the number of times a user has visited the website as well as dates for the first and most recent visit.

Expiry: 1 Day

Type: HTTP

collect

Used to send data to Google Analytics about the visitor's device and behavior. Tracks the visitor across devices and marketing channels.

Expiry: Session

Type: PIXEL

AEC

cookies ensure that requests within a browsing session are made by the user, and not by other sites.

Expiry: 6 Months

Type: HTTP

G_ENABLED_IDPS

use the cookie when customers want to make a referral from their gmail contacts; it helps auth the gmail account.

Expiry: 2 Years

Type: HTTP

test_cookie

This cookie is set by DoubleClick (which is owned by Google) to determine if the website visitor's browser supports cookies.

Expiry: 1 Year

Type: HTTP

_we_us

this is used to send push notification using webengage.

Expiry: 1 Year

Type: HTTP

WebKlipperAuth

used by webenage to track auth of webenagage.

Expiry: Session

Type: HTTP

ln_or

Linkedin sets this cookie to registers statistical data on users' behavior on the website for internal analytics.

Expiry: 1 Day

Type: HTTP

JSESSIONID

Use to maintain an anonymous user session by the server.

Expiry: 1 Year

Type: HTTP

li_rm

Used as part of the LinkedIn Remember Me feature and is set when a user clicks Remember Me on the device to make it easier for him or her to sign in to that device.

Expiry: 1 Year

Type: HTTP

AnalyticsSyncHistory

Used to store information about the time a sync with the lms_analytics cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

lms_analytics

Used to store information about the time a sync with the AnalyticsSyncHistory cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

liap

Cookie used for Sign-in with Linkedin and/or to allow for the Linkedin follow feature.

Expiry: 6 Months

Type: HTTP

visit

allow for the Linkedin follow feature.

Expiry: 1 Year

Type: HTTP

li_at

often used to identify you, including your name, interests, and previous activity.

Expiry: 2 Months

Type: HTTP

s_plt

Tracks the time that the previous page took to load

Expiry: Session

Type: HTTP

lang

Used to remember a user's language setting to ensure LinkedIn.com displays in the language selected by the user in their settings

Expiry: Session

Type: HTTP

s_tp

Tracks percent of page viewed

Expiry: Session

Type: HTTP

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

Indicates the start of a session for Adobe Experience Cloud

Expiry: Session

Type: HTTP

s_pltp

Provides page name value (URL) for use by Adobe Analytics

Expiry: Session

Type: HTTP

s_tslv

Used to retain and fetch time since last visit in Adobe Analytics

Expiry: 6 Months

Type: HTTP

li_theme

Remembers a user's display preference/theme setting

Expiry: 6 Months

Type: HTTP

li_theme_set

Remembers which users have updated their display / theme preferences

Expiry: 6 Months

Type: HTTP

Reading list

Introduction to Generative AI

Introduction to Generative AI applications

No-code Generative AI app development

Code-focused Generative AI App Development

Introduction to Responsible AI

LLMS

Prompt Engineering

Finetuning LLMs

Training LLMs from Scratch

Langchain

RAG

LlamaIndex

Stable Diffusion

LangGraph vs CrewAI vs AutoGen to Build a Data Analysis Agent

Table of Contents

Working of Data Analysis Agent

Building a Data Analysis Agent with LangGraph

Pre-requisites

Steps to Build a Data Analysis Agent with LangGraph

1. Import the necessary libraries.

2. Let’s define the state.

3. Define the LLM and the code execution function and bind the function to the LLM.

4. Define the function for the agent to reply and add it as a node to the graph.

5. Define the ToolNode and add it to the graph.

6. Let us also add the memory so that we can chat with the agent.

7. Compile and display the graph.

8. Now we can start the chat. Since we have added memory, we will give each conversation a unique thread_id and start the conversation on that thread.

Building a Data Analysis Agent with CrewAI

1. Import the necessary libraries.

2. We will build one agent for generating the code and another for executing that code.

3. To execute the code, we will use PythonREPL(). Define it as a crewai tool.

4. Define executing agent and tasks with access to repl and FileReadTool()

5. Build the crew with both agents and corresponding tasks.

6. Run the crew with the following inputs.

Building a Data Analysis Agent with AutoGen

1. Import the necessary libraries.

2. Define the code executor and an agent to use the code executor.

3. Define an agent to write the code with a custom system message.

4. Define the problem to solve and initiate the chat.

5. We can also print the questions asked by us and their answers, if required, using this code.

LangGraph vs CrewAI vs AutoGen

Conclusion

Frequently Asked Questions

Login to continue reading and enjoy expert-curated content.

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Write for us

Analytics Vidhya (4)

brahmaid

csrftoken

Identityid

sessionid

Google (1)

g_state

Microsoft (7)

MUID

_clck

_clsk

SRM_I

SM

CLID

SRM_B

Google (7)

_gid

_ga_#

_gat_#

collect

AEC

G_ENABLED_IDPS

test_cookie

Webengage (2)

_we_us

WebKlipperAuth