Artificial Intelligence has entered a new era. Gone are the days when models could only answer from the static knowledge baked into their training data. The cutting-edge approach in AI today revolves around Retrieval-Augmented Generation (RAG) systems and, more specifically, the use of agents to intelligently retrieve, analyze, and verify information. This is the future of intelligent data retrieval, where machine learning models not only answer questions but do so with far greater accuracy and depth.
In this blog, we'll walk through how you can build your own agent-powered RAG system using CrewAI and LangChain, two of the most powerful tools revolutionizing the way we interact with AI. But before we dive into the code, let's get familiar with these technologies.
RAG represents a hybrid approach in modern AI. Unlike traditional models that solely rely on pre-existing knowledge baked into their training, RAG systems pull real-time information from external data sources (like databases, documents, or the web) to augment their responses.
In simple terms, a RAG system doesn’t just guess or rely on what it “knows”—it actively retrieves relevant, up-to-date information and then generates a coherent response based on it. This ensures that the AI’s answers are not only accurate but also grounded in real, verifiable facts.
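Before we bring in any frameworks, it helps to see the core loop in miniature. The sketch below is a toy illustration of retrieve-then-generate; retrieve and generate are hypothetical stand-ins for a real vector-store lookup and a real LLM call, not functions from any library used later in this post:

def retrieve(question: str, documents: list, top_k: int = 3) -> list:
    """Toy retriever: rank documents by word overlap with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def generate(question: str, context: list) -> str:
    """Stand-in for the LLM call: a real system would prompt a model with the context."""
    return f"Answer to {question!r}, grounded in: {context}"

docs = [
    "Self-attention lets every token attend to every other token.",
    "Paris is the capital of France.",
]
print(generate("What is self-attention?", retrieve("What is self-attention?", docs)))

A production retriever swaps the word-overlap scoring for embedding similarity over a vector store, which is exactly what the PDFSearchTool we configure later does for us.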
Now that you understand what RAG is, imagine supercharging it with agents—AI entities that handle specific tasks like retrieving data, evaluating its relevance, or verifying its accuracy. This is where CrewAI and LangChain come into play, making the process even more streamlined and powerful.
Think of CrewAI as an intelligent manager that orchestrates a team of agents. Each agent specializes in a particular task, whether it’s retrieving information, grading its relevance, or filtering out errors. The magic happens when these agents collaborate—working together to process complex queries and deliver precise, accurate answers.
While CrewAI brings the intelligence of agents, LangChain enables you to build workflows that chain together complex AI tasks. It ensures that agents perform their tasks in the right order, creating seamless, highly orchestrated AI processes.
One of its key features is LLM orchestration: LangChain works with a wide variety of large language models (LLMs), from OpenAI to Hugging Face models, enabling complex natural language processing.
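To make the chaining idea concrete, here is a minimal, self-contained LangChain sketch: a prompt template piped into a chat model and then into an output parser. The FakeListChatModel is a LangChain testing utility that returns canned responses, so this example runs without any API key; later in this post we wire in a real Groq-backed chat model instead:

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_models.fake import FakeListChatModel

# A canned-response model so the sketch runs offline; swap in any real chat model.
fake_llm = FakeListChatModel(responses=["Self-attention relates every token to every other token."])

prompt = ChatPromptTemplate.from_template("Explain {topic} in one sentence.")
chain = prompt | fake_llm | StrOutputParser()  # prompt -> model -> plain string

print(chain.invoke({"topic": "self-attention"}))

Each | pipes the output of one step into the next; the agent pipeline we build below relies on the same pattern, just at a higher level of abstraction.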
By combining CrewAI’s agent-based framework with LangChain’s task orchestration, you can create a robust Agentic RAG system. In this system, each agent plays a role—whether it’s fetching relevant documents, verifying the quality of retrieved information, or grading answers for accuracy. This layered approach ensures that responses are not only accurate but are grounded in the most relevant and recent information available.
Let’s move forward and build an Agent-Powered RAG System that answers complex questions using a pipeline of AI agents.
We will now start building our own agentic RAG System step by step below:
Before diving into the code, let’s install the necessary libraries:
!pip install crewai==0.28.8 crewai_tools==0.1.6 langchain_community==0.0.29 sentence-transformers langchain-groq --quiet
!pip install langchain_huggingface --quiet
Note that the versions are pinned deliberately; running pip install --upgrade on crewai, langchain, or langchain_community afterwards would override these pins and may break the code below.
We start by importing the necessary libraries:
import os

from langchain_openai import ChatOpenAI
from langchain_community.tools.tavily_search import TavilySearchResults
from crewai import Agent, Crew, Task
from crewai_tools import PDFSearchTool, tool
In this step, we imported os for handling environment variables, ChatOpenAI as the chat-model wrapper, PDFSearchTool for RAG over our PDF, TavilySearchResults for web search, the tool decorator for defining custom tools, and the core CrewAI classes Agent, Task, and Crew.
To access the Groq API, you need to authenticate with an API key. You can generate one by logging into the Groq Console, opening the API Keys section, and creating a new key; copy it somewhere safe, as it is typically shown only once.
This API key will be used in your HTTP headers for API requests to authenticate and interact with the Groq system.
Always refer to the official Groq documentation for specific details or additional steps related to accessing the API.
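To illustrate what "used in your HTTP headers" means, here is a minimal sketch of a raw request to Groq's OpenAI-compatible chat endpoint (the same base URL we configure below; the payload follows the standard OpenAI chat-completions format):

import os
import requests

# Assumes GROQ_API_KEY is already set in the environment (we set it in the code below).
resp = requests.post(
    "https://api.groq.com/openai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
    json={
        "model": "llama3-8b-8192",
        "messages": [{"role": "user", "content": "Say hello in one word."}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])

In practice we will not call the API by hand; LangChain's ChatOpenAI wrapper handles this for us, as shown next.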
import os

os.environ['GROQ_API_KEY'] = 'Add Your Groq API Key'

# Point LangChain's ChatOpenAI wrapper at Groq's OpenAI-compatible endpoint.
llm = ChatOpenAI(
    openai_api_base="https://api.groq.com/openai/v1",
    openai_api_key=os.environ['GROQ_API_KEY'],
    model_name="llama3-8b-8192",
    temperature=0.1,
    max_tokens=1000,
)
Here, we define the language model the system will use: LangChain's ChatOpenAI wrapper pointed at Groq's OpenAI-compatible endpoint, running the llama3-8b-8192 model with a low temperature (0.1) for more deterministic answers and a 1,000-token cap on responses.
To demonstrate how RAG works, we download a PDF and search through it:
import requests

pdf_url = 'https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf'
response = requests.get(pdf_url)

with open('attention_is_all_you_need.pdf', 'wb') as file:
    file.write(response.content)
This downloads the famous “Attention is All You Need” paper and saves it locally. We’ll use this PDF in the following step for searching.
In this section, we create a RAG tool that searches a PDF using a language model and an embedder for semantic understanding.
Finally, the rag_tool.run() function is executed with a query like “How did the self-attention mechanism evolve in large language models?” to retrieve information.
rag_tool = PDFSearchTool(
    pdf='/content/attention_is_all_you_need.pdf',
    config=dict(
        llm=dict(
            provider="groq",  # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama3-8b-8192",
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="huggingface",  # or openai, ollama, ...
            config=dict(
                model="BAAI/bge-small-en-v1.5",
                # task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    ),
)

rag_tool.run("How did the self-attention mechanism evolve in large language models?")
Set up your Tavily API key to enable the web-search functionality as well:
import os

# Set the Tavily API key
os.environ['TAVILY_API_KEY'] = "Add Your Tavily API Key"

web_search_tool = TavilySearchResults(max_results=3)  # retrieve up to 3 results

web_search_tool.run("What is the self-attention mechanism in large language models?")
This tool performs a web search via Tavily and retrieves up to three results, each containing the source URL and a snippet of page content.
@tool
def router_tool(question):
    """Router Function: decide between vectorstore and web search."""
    # Questions about self-attention are covered by the indexed PDF,
    # so send them to the vectorstore; everything else goes to web search.
    if 'self-attention' in question:
        return 'vectorstore'
    else:
        return 'web_search'
The router tool inspects the wording of the query and directs it either to the vectorstore (for questions the indexed paper covers, here anything mentioning self-attention) or to a web search.
We define a series of agents to handle different parts of the query-answering pipeline:
Routes questions to the right retrieval tool (PDF or web search).
Router_Agent = Agent(
    role='Router',
    goal='Route user question to a vectorstore or web search',
    backstory=(
        "You are an expert at routing a user question to a vectorstore or web search. "
        "Use the vectorstore for questions on concepts related to transformers and the self-attention mechanism. "
        "You do not need to be stringent with the keywords in the question related to these topics. "
        "Otherwise, use web search."
    ),
    verbose=True,
    allow_delegation=False,
    llm=llm,
)
Retrieves the information from the chosen source (PDF or web search).
Retriever_Agent = Agent(
    role="Retriever",
    goal="Use the information retrieved from the vectorstore to answer the question",
    backstory=(
        "You are an assistant for question-answering tasks. "
        "Use the information present in the retrieved context to answer the question. "
        "You have to provide a clear, concise answer."
    ),
    verbose=True,
    allow_delegation=False,
    llm=llm,
)
Ensures the retrieved information is relevant.
Grader_agent = Agent(
    role='Answer Grader',
    goal='Filter out erroneous retrievals',
    backstory=(
        "You are a grader assessing the relevance of a retrieved document to a user question. "
        "If the document contains keywords related to the user question, grade it as relevant. "
        "It does not need to be a stringent test. You have to make sure that the answer is relevant to the question."
    ),
    verbose=True,
    allow_delegation=False,
    llm=llm,
)
Filters out hallucinations (answers not grounded in facts).
hallucination_grader = Agent(
    role="Hallucination Grader",
    goal="Filter out hallucination",
    backstory=(
        "You are a hallucination grader assessing whether an answer is grounded in / supported by a set of facts. "
        "Make sure you meticulously review the answer and check if the response provided is in alignment with the question asked."
    ),
    verbose=True,
    allow_delegation=False,
    llm=llm,
)
Grades the final answer and ensures it’s useful.
answer_grader = Agent(
    role="Answer Grader",
    goal="Filter out hallucination from the answer.",
    backstory=(
        "You are a grader assessing whether an answer is useful to resolve a question. "
        "Make sure you meticulously review the answer and check if it makes sense for the question asked. "
        "If the answer is relevant, generate a clear and concise response. "
        "If the answer generated is not relevant, then perform a web search using the 'web_search_tool'."
    ),
    verbose=True,
    allow_delegation=False,
    llm=llm,
)
Each task is defined to assign a specific role to the agents:
Determines whether the query should go to the PDF search or web search.
router_task = Task(
    description=(
        "Analyse the keywords in the question {question}. "
        "Based on the keywords, decide whether it is eligible for a vectorstore search or a web search. "
        "Return a single word 'vectorstore' if it is eligible for vectorstore search. "
        "Return a single word 'websearch' if it is eligible for web search. "
        "Do not provide any other preamble or explanation."
    ),
    expected_output=(
        "Give a binary choice 'websearch' or 'vectorstore' based on the question. "
        "Do not provide any other preamble or explanation."
    ),
    agent=Router_Agent,
    tools=[router_tool],
)
Retrieves the necessary information.
retriever_task = Task(
    description=(
        "Based on the response from the router task, extract information for the question {question} with the help of the respective tool. "
        "Use the web_search_tool to retrieve information from the web in case the router task output is 'websearch'. "
        "Use the rag_tool to retrieve information from the vectorstore in case the router task output is 'vectorstore'."
    ),
    expected_output=(
        "You should analyse the output of the 'router_task'. "
        "If the response is 'websearch' then use the web_search_tool to retrieve information from the web. "
        "If the response is 'vectorstore' then use the rag_tool to retrieve information from the vectorstore. "
        "Return a clear and concise text as response."
    ),
    agent=Retriever_Agent,
    context=[router_task],
    # tools=[retriever_tool],
)
Grades the retrieved information.
grader_task = Task(
    description=(
        "Based on the response from the retriever task for the question {question}, evaluate whether the retrieved content is relevant to the question."
    ),
    expected_output=(
        "Binary score 'yes' or 'no' to indicate whether the document is relevant to the question. "
        "You must answer 'yes' if the response from the 'retriever_task' is in alignment with the question asked. "
        "You must answer 'no' if the response from the 'retriever_task' is not in alignment with the question asked. "
        "Do not provide any preamble or explanations except for 'yes' or 'no'."
    ),
    agent=Grader_agent,
    context=[retriever_task],
)
Ensures the answer is grounded in facts.
hallucination_task = Task(
    description=(
        "Based on the response from the grader task for the question {question}, evaluate whether the answer is grounded in / supported by a set of facts."
    ),
    expected_output=(
        "Binary score 'yes' or 'no' to indicate whether the answer is in sync with the question asked. "
        "Respond 'yes' if the answer is useful and contains facts about the question asked. "
        "Respond 'no' if the answer is not useful and does not contain facts about the question asked. "
        "Do not provide any preamble or explanations except for 'yes' or 'no'."
    ),
    agent=hallucination_grader,
    context=[grader_task],
)
Provides the final answer or performs a web search if needed.
answer_task = Task(
    description=(
        "Based on the response from the hallucination task for the question {question}, evaluate whether the answer is useful to resolve the question. "
        "If the answer is 'yes', return a clear and concise answer. "
        "If the answer is 'no', then perform a 'websearch' and return the response."
    ),
    expected_output=(
        "Return a clear and concise response if the response from 'hallucination_task' is 'yes'. "
        "Perform a web search using 'web_search_tool' and return a clear and concise response only if the response from 'hallucination_task' is 'no'. "
        "Otherwise respond as 'Sorry! unable to find a valid response'."
    ),
    context=[hallucination_task],
    agent=answer_grader,
    # tools=[answer_grader_tool],
)
We group the agents and tasks into a Crew that will manage the overall pipeline:
rag_crew = Crew(
    agents=[Router_Agent, Retriever_Agent, Grader_agent, hallucination_grader, answer_grader],
    tasks=[router_task, retriever_task, grader_task, hallucination_task, answer_task],
    verbose=True,
)
Finally, we ask a question and kick off the RAG system:
inputs = {"question": "How does self-attention mechanism help large language models?"}
result = rag_crew.kickoff(inputs=inputs)
print(result)
This pipeline processes the question through the agents, retrieves the relevant information, filters out hallucinations, and provides a concise and relevant answer.
The combination of RAG, CrewAI, and LangChain is a glimpse into the future of AI. By leveraging agentic intelligence and task chaining, we can build systems that are smarter, faster, and more accurate. These systems don’t just generate information—they actively retrieve, verify, and filter it to ensure the highest quality of responses.
With tools like CrewAI and LangChain at your disposal, the possibilities for building intelligent, agent-driven AI systems are endless. Whether you’re working in AI research, automated customer support, or any other data-intensive field, Agentic RAG systems are the key to unlocking new levels of efficiency and accuracy.
Q. What role does CrewAI play in an Agentic RAG system?
A. CrewAI orchestrates multiple AI agents, each specializing in tasks like retrieving information, verifying relevance, and ensuring accuracy.
Q. How does LangChain contribute to the system?
A. LangChain creates workflows that chain AI tasks together, ensuring each step of data processing and retrieval happens in the right order.
Q. Why use agents in a RAG system?
A. Agents handle specific tasks like retrieving data, verifying its accuracy, and grading responses, making the system more reliable and precise.
Q. What does the Groq API provide?
A. The Groq API provides access to powerful language models like Llama 3, enabling high-performance AI for complex tasks.