LLM agents play an increasingly important role in the generative AI landscape as reasoning engines, yet they frequently fail outright or drift into hallucinations. Agents built on Large Language Models (LLMs) face formidable challenges, including context understanding, coherence maintenance, and dynamic adaptability. LangGraph, a graph-based framework for orchestrating LLM workflows, lets an agent follow an explicit control flow rather than leaving every decision to the model. Combined with advanced RAG techniques such as Adaptive RAG, Corrective RAG, and Self-RAG, this helps mitigate these issues.
This article uses these RAG techniques to build reliable and fail-safe LLM agents with LangChain's LangGraph and the Cohere LLM.
The essential principle underlying agents is to use a language model to choose a sequence of actions. In chains, that sequence is hardcoded. Agents, in contrast, use a language model as a reasoning engine to decide which actions to take and in what order.
An agent comprises three components: the language model that serves as the reasoning engine, the tools it can call, and the executor that runs the reasoning-and-action loop.
Agents can be created using the ReAct concept with LangChain or with LangGraph.
1. Reliability: A ReAct / LangChain agent is less reliable because the LLM has to make the correct decision at every step, whereas LangGraph is more reliable because the control flow is set in advance and the LLM performs one specific job at each node of the graph.
2. Flexibility: A ReAct / LangChain agent is more flexible because the LLM can choose any sequence of action steps, whereas LangGraph is less flexible because actions are constrained by the control flow defined at each node.
3. Compatibility with smaller LLMs: ReAct / LangChain agents do not work well with smaller LLMs, whereas LangGraph is a better fit for them.
LangGraph is a package that extends LangChain by enabling circular computing in LLM applications. LangGraph allows for the inclusion of cycles, whereas earlier LangChain allowed the definition of computation chains (Directed Acyclic Graphs or DAGs). This enables more complex, agent-like behaviors in which an LLM can be called in a loop to decide the next action to execute.
1. Stateful Graph: LangGraph revolves around a stateful graph, where each node represents a step in your computation. The graph maintains a state passed around and updated as the computation progresses.
2. Nodes: Nodes are the building blocks of your LangGraph. Each node represents a function or a computation step. You define nodes to perform specific tasks, such as processing input, making decisions, or interacting with external APIs.
3. Edges: Edges connect the nodes in your graph, defining the computation flow. LangGraph supports conditional edges, allowing you to dynamically determine the next node to execute based on the current state of the graph.
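To make these three building blocks concrete, here is a minimal, self-contained sketch that is independent of the tax-document agent built later in this article; the state, node names, and toy routing logic are purely illustrative.
from typing import TypedDict
from langgraph.graph import StateGraph, END

# Toy state: a single text field passed between nodes
class ToyState(TypedDict):
    text: str

def shout(state: ToyState) -> ToyState:
    # Node: upper-case the text
    return {"text": state["text"].upper()}

def whisper(state: ToyState) -> ToyState:
    # Node: lower-case the text
    return {"text": state["text"].lower()}

def pick_route(state: ToyState) -> str:
    # Conditional edge: choose the next node based on the current state
    return "shout" if len(state["text"]) < 10 else "whisper"

toy_graph = StateGraph(ToyState)
toy_graph.add_node("shout", shout)
toy_graph.add_node("whisper", whisper)
toy_graph.set_conditional_entry_point(pick_route, {"shout": "shout", "whisper": "whisper"})
toy_graph.add_edge("shout", END)
toy_graph.add_edge("whisper", END)

toy_app = toy_graph.compile()
print(toy_app.invoke({"text": "hello"}))  # {'text': 'HELLO'}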
Tavily Search API is a search engine optimized for LLMs, aiming for efficient, quick, and persistent search results. Unlike other search APIs like Serp or Google, Tavily optimizes search for AI developers and autonomous AI agents.
Cohere is an AI platform for the enterprise that specialises in large language model-powered solutions. Its main offering is the Command R model (along with Command R+, whose weights are openly released for research use), providing scalable, high-performance models that compete with offerings from firms such as OpenAI and Mistral.
We need to generate a free API key to use the Cohere LLM. Visit the Cohere website and log in using a Google or GitHub account. Once logged in, you will land on the Cohere dashboard page. Click on the API Keys option, and you will see that a free trial API key has been generated.
For the Tavily Search API, visit the sign-in page and log in with your Google account (or another supported account). You will land on your account's home page, which shows a default free plan, called the “Research” plan, with an API key already generated.
Once the API keys are generated, install the required libraries as shown below. You can use a Colab notebook for development.
!pip install --quiet langchain langchain_cohere tiktoken chromadb pymupdf
Set the API Keys as environment variables
### Set API Keys
import os
os.environ["COHERE_API_KEY"] = "Cohere API Key"
os.environ["TAVILY_API_KEY"] = "Tavily API Key"
Build a vector index on top of the PDF using Cohere embeddings.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_cohere import CohereEmbeddings
#from langchain_community.document_loaders import WebBaseLoader
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.vectorstores import Chroma
# Set embeddings
embd = CohereEmbeddings()
# Load Docs to Index
loader = PyMuPDFLoader('/content/cleartax-in-s-income-tax-slabs.pdf')
data = loader.load()
#print(data[10])
# Split
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=512, chunk_overlap=0
)
doc_splits = text_splitter.split_documents(data)
# Add to vectorstore
vectorstore = Chroma.from_documents(
    persist_directory='/content/vector',
    documents=doc_splits,
    embedding=embd,
)
vectorstore_retriever = vectorstore.as_retriever()
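As a quick, optional sanity check of the index (assuming the PDF loaded correctly), you can query the retriever directly and inspect the top chunk:
# Optional sanity check: fetch the top chunks for a sample query
sample_docs = vectorstore_retriever.invoke("income tax slabs new regime")
print(len(sample_docs))
print(sample_docs[0].page_content[:300])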
Install this second set of libraries. Don’t install all libraries together; otherwise, it will throw a dependency error.
!pip install langchain-openai langchainhub chromadb langgraph --quiet
Now, we will build a router that decides whether a query should go to the vector index or elsewhere. This follows the Adaptive RAG technique, which routes each query to the most suitable node.
### Router
from typing import Literal
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_cohere import ChatCohere
# Data model
class web_search(BaseModel):
    """
    The internet. Use web_search for questions that are related to anything other than the Income Tax of India New and Old Regime Rules.
    """

    query: str = Field(description="The query to use when searching the internet.")


class vectorstore(BaseModel):
    """
    A vectorstore containing documents related to the Income Tax of India New and Old Regime Rules. Use the vectorstore for questions on these topics.
    """

    query: str = Field(description="The query to use when searching the vectorstore.")
# Preamble
preamble = """You are an expert at routing a user question to a vectorstore or web search.
The vectorstore contains documents related to Income Tax of India New and Old Regime Rules.
Use the vectorstore for questions on these topics. Otherwise, use web-search."""
# LLM with tool use and preamble
llm = ChatCohere(model="command-r", temperature=0)
structured_llm_router = llm.bind_tools(
    tools=[web_search, vectorstore], preamble=preamble
)
# Prompt
route_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{question}"),
    ]
)
question_router = route_prompt | structured_llm_router
response = question_router.invoke(
    {"question": "When will the results of General Elections 2024 of India be declared?"}
)
print(response.response_metadata["tool_calls"])
response = question_router.invoke({"question": "What are the income tax slabs in New Tax Regime?"})
print(response.response_metadata["tool_calls"])
response = question_router.invoke({"question": "Hi how are you?"})
print("tool_calls" in response.response_metadata)
Outputs
We can see output prints of the tool to which the query is routed, such as “web search” or “vector store,” and their corresponding response. When we ask questions about the general election, it does a web search. When we ask a query related to Tax Regime (our pdf), it directs us to the vector store.
[{'id': '1c86d1f8baa14f3484d1b99c9a53ab3a', 'function': {'name': 'web_search', 'arguments': '{"query": "General Elections 2024 of India results declaration date"}'}, 'type': 'function'}]
[{'id': 'c1356c914562418b943d50d61c2590ea', 'function': {'name': 'vectorstore', 'arguments': '{"query": "income tax slabs in New Tax Regime"}'}, 'type': 'function'}]
False
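If you want to read the routed datasource programmatically, the tool name can be pulled out of the metadata printed above; this small check is illustrative and mirrors what the routing edge will do later via additional_kwargs:
# Extract the routed tool name, if any, from the last response
tool_calls = response.response_metadata.get("tool_calls", [])
if tool_calls:
    print(tool_calls[0]["function"]["name"])  # e.g. 'web_search' or 'vectorstore'
else:
    print("No tool call; the question can fall back to the plain LLM")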
Now, we will build a retrieval binary grader that will grade whether the retrieved documents are relevant to the query or not.
### Retrieval Grader
# Data model
class GradeDocuments(BaseModel):
    """Binary score for relevance check on retrieved documents."""

    binary_score: str = Field(
        description="Documents are relevant to the question, 'yes' or 'no'"
    )
# Prompt
preamble = """You are a grader assessing relevance of a retrieved document to a user question. \n
If the document contains keyword(s) or semantic meaning related to the user question, grade it as relevant. \n
Give a binary score 'yes' or 'no' score to indicate whether the document is relevant to the question."""
# LLM with function call
llm = ChatCohere(model="command-r", temperature=0)
structured_llm_grader = llm.with_structured_output(GradeDocuments, preamble=preamble)
grade_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "Retrieved document: \n\n {document} \n\n User question: {question}"),
    ]
)
retrieval_grader = grade_prompt | structured_llm_grader
question = "Old tax regime slabs"
docs = vectorstore_retriever.invoke(question)
doc_txt = docs[1].page_content
response = retrieval_grader.invoke({"question": question, "document": doc_txt})
print(response)
Output
binary_score='yes'
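To see the grader reject an off-topic pairing, you can pass the same tax-regime chunk with an unrelated question; this check is not from the original walkthrough, but it should typically come back as 'no':
# Off-topic question against the same tax-regime chunk; expect binary_score='no'
off_topic = retrieval_grader.invoke(
    {"question": "Who won the Cricket World Cup 2023?", "document": doc_txt}
)
print(off_topic)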
Now, we will build the Answer generator, which will generate an answer based on information obtained from the vector store or web search.
### Generate
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
import langchain
from langchain_core.messages import HumanMessage
# Preamble
preamble = """You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise."""
# LLM
llm = ChatCohere(model_name="command-r", temperature=0).bind(preamble=preamble)
# Prompt
prompt = lambda x: ChatPromptTemplate.from_messages(
    [
        HumanMessage(
            f"Question: {x['question']} \nAnswer: ",
            additional_kwargs={"documents": x["documents"]},
        )
    ]
)
# Chain
rag_chain = prompt | llm | StrOutputParser()
# Run
generation = rag_chain.invoke({"documents": docs, "question": question})
print(generation)
Output
Under the old tax regime in India, there were separate slab rates for different categories of taxpayers. Taxpayers with an income of up to 5 lakhs were eligible for a rebate.
This plain LLM chain is the default fallback when a question cannot be routed to the vectorstore or web search. Note that its prompt does not include the “documents” variable.
### LLM fallback
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
import langchain
from langchain_core.messages import HumanMessage
# Preamble
preamble = """You are an assistant for question-answering tasks. Answer the question based upon your knowledge. Use three sentences maximum and keep the answer concise."""
# LLM
llm = ChatCohere(model_name="command-r", temperature=0).bind(preamble=preamble)
# Prompt
prompt = lambda x: ChatPromptTemplate.from_messages(
    [HumanMessage(f"Question: {x['question']} \nAnswer: ")]
)
# Chain
llm_chain = prompt | llm | StrOutputParser()
# Run
question = "Hi how are you?"
generation = llm_chain.invoke({"question": question})
print(generation)
Output
I don't have feelings as an AI chatbot, but I'm here to assist you with any questions or tasks you may have. How can I help you today?
Now, we will build a simple hallucination grader that returns a binary “yes” or “no” score indicating whether the generated response is grounded in the retrieved context and free from hallucination.
### Hallucination Grader
# Data model
class GradeHallucinations(BaseModel):
    """Binary score for hallucination present in generation answer."""

    binary_score: str = Field(
        description="Answer is grounded in the facts, 'yes' or 'no'"
    )
# Preamble
preamble = """You are a grader assessing whether an LLM generation is grounded in / supported by a set of retrieved facts. \n
Give a binary score 'yes' or 'no'. 'Yes' means that the answer is grounded in / supported by the set of facts."""
# LLM with function call
llm = ChatCohere(model="command-r", temperature=0)
structured_llm_grader = llm.with_structured_output(
    GradeHallucinations, preamble=preamble
)
# Prompt
hallucination_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "Set of facts: \n\n {documents} \n\n LLM generation: {generation}"),
    ]
)
hallucination_grader = hallucination_prompt | structured_llm_grader
hallucination_grader.invoke({"documents": docs, "generation": generation})
Once the hallucination grader passes the response along, this answer grader checks whether the generated answer actually addresses the question.
### Answer Grader
# Data model
class GradeAnswer(BaseModel):
    """Binary score to assess answer addresses question."""

    binary_score: str = Field(
        description="Answer addresses the question, 'yes' or 'no'"
    )
# Preamble
preamble = """You are a grader assessing whether an answer addresses / resolves a question \n
Give a binary score 'yes' or 'no'. Yes' means that the answer resolves the question."""
# LLM with function call
llm = ChatCohere(model="command-r", temperature=0)
structured_llm_grader = llm.with_structured_output(GradeAnswer, preamble=preamble)
# Prompt
answer_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "User question: \n\n {question} \n\n LLM generation: {generation}"),
    ]
)
answer_grader = answer_prompt | structured_llm_grader
answer_grader.invoke({"question": question, "generation": generation})
Now, we will build the web search tool using Tavily API.
### Search
from langchain_community.tools.tavily_search import TavilySearchResults
web_search_tool = TavilySearchResults()
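Before wiring it into the graph, it can help to invoke the tool once and look at the result structure; each result is a dict whose “content” field is what the web_search node below concatenates. This is an illustrative check, not part of the original flow:
# Each result is a dict; the web_search node joins the "content" fields
results = web_search_tool.invoke({"query": "income tax slabs new regime India"})
print(type(results), len(results))
print(results[0]["content"][:200])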
We will now capture the workflow of our agent. First, we define the class that maintains the state passed between each decision point.
from typing_extensions import TypedDict
from typing import List
class GraphState(TypedDict):
    """
    Represents the state of our graph.

    Attributes:
        question: question
        generation: LLM generation
        documents: list of documents
    """

    question: str
    generation: str
    documents: List[str]
We now define the nodes and edges of the graph.
from langchain.schema import Document
def retrieve(state):
    """
    Retrieve documents

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, documents, that contains retrieved documents
    """
    print("---RETRIEVE---")
    question = state["question"]
    # Retrieval
    documents = vectorstore_retriever.invoke(question)
    return {"documents": documents, "question": question}
def llm_fallback(state):
    """
    Generate answer using the LLM w/o vectorstore

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, generation, that contains LLM generation
    """
    print("---LLM Fallback---")
    question = state["question"]
    generation = llm_chain.invoke({"question": question})
    return {"question": question, "generation": generation}
def generate(state):
    """
    Generate answer using the vectorstore

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, generation, that contains LLM generation
    """
    print("---GENERATE---")
    question = state["question"]
    documents = state["documents"]
    if not isinstance(documents, list):
        documents = [documents]
    # RAG generation
    generation = rag_chain.invoke({"documents": documents, "question": question})
    return {"documents": documents, "question": question, "generation": generation}
def grade_documents(state):
    """
    Determines whether the retrieved documents are relevant to the question.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Updates documents key with only filtered relevant documents
    """
    print("---CHECK DOCUMENT RELEVANCE TO QUESTION---")
    question = state["question"]
    documents = state["documents"]
    # Score each doc
    filtered_docs = []
    for d in documents:
        score = retrieval_grader.invoke(
            {"question": question, "document": d.page_content}
        )
        grade = score.binary_score
        if grade == "yes":
            print("---GRADE: DOCUMENT RELEVANT---")
            filtered_docs.append(d)
        else:
            print("---GRADE: DOCUMENT NOT RELEVANT---")
            continue
    return {"documents": filtered_docs, "question": question}
def web_search(state):
    """
    Web search based on the re-phrased question.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Updates documents key with appended web results
    """
    print("---WEB SEARCH---")
    question = state["question"]
    # Web search
    docs = web_search_tool.invoke({"query": question})
    web_results = "\n".join([d["content"] for d in docs])
    web_results = Document(page_content=web_results)
    return {"documents": web_results, "question": question}
### Edges ###
def route_question(state):
    """
    Route question to web search or RAG.

    Args:
        state (dict): The current graph state

    Returns:
        str: Next node to call
    """
    print("---ROUTE QUESTION---")
    question = state["question"]
    source = question_router.invoke({"question": question})

    # Fallback to LLM or raise error if no decision
    if "tool_calls" not in source.additional_kwargs:
        print("---ROUTE QUESTION TO LLM---")
        return "llm_fallback"
    if len(source.additional_kwargs["tool_calls"]) == 0:
        raise ValueError("Router could not decide source")

    # Choose datasource
    datasource = source.additional_kwargs["tool_calls"][0]["function"]["name"]
    if datasource == "web_search":
        print("---ROUTE QUESTION TO WEB SEARCH---")
        return "web_search"
    elif datasource == "vectorstore":
        print("---ROUTE QUESTION TO RAG---")
        return "vectorstore"
    else:
        print("---ROUTE QUESTION TO LLM---")
        return "llm_fallback"
def decide_to_generate(state):
    """
    Determines whether to generate an answer, or re-generate a question.

    Args:
        state (dict): The current graph state

    Returns:
        str: Binary decision for next node to call
    """
    print("---ASSESS GRADED DOCUMENTS---")
    question = state["question"]
    filtered_documents = state["documents"]

    if not filtered_documents:
        # All documents have been filtered out by the relevance check
        # Fall back to web search
        print("---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, WEB SEARCH---")
        return "web_search"
    else:
        # We have relevant documents, so generate answer
        print("---DECISION: GENERATE---")
        return "generate"
def grade_generation_v_documents_and_question(state):
    """
    Determines whether the generation is grounded in the document and answers question.

    Args:
        state (dict): The current graph state

    Returns:
        str: Decision for next node to call
    """
    print("---CHECK HALLUCINATIONS---")
    question = state["question"]
    documents = state["documents"]
    generation = state["generation"]

    score = hallucination_grader.invoke(
        {"documents": documents, "generation": generation}
    )
    grade = score.binary_score

    # Check hallucination
    if grade == "yes":
        print("---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---")
        # Check question-answering
        print("---GRADE GENERATION vs QUESTION---")
        score = answer_grader.invoke({"question": question, "generation": generation})
        grade = score.binary_score
        if grade == "yes":
            print("---DECISION: GENERATION ADDRESSES QUESTION---")
            return "useful"
        else:
            print("---DECISION: GENERATION DOES NOT ADDRESS QUESTION---")
            return "not useful"
    else:
        print("---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---")
        return "not supported"
Now add the nodes and edges to the workflow: first add all the nodes, then the direct edges, and finally the conditional edges.
import pprint
from langgraph.graph import END, StateGraph
workflow = StateGraph(GraphState)
# Define the nodes
workflow.add_node("web_search", web_search) # web search
workflow.add_node("retrieve", retrieve) # retrieve
workflow.add_node("grade_documents", grade_documents) # grade documents
workflow.add_node("generate", generate) # rag
workflow.add_node("llm_fallback", llm_fallback) # llm
# Build graph
workflow.set_conditional_entry_point(
    route_question,
    {
        "web_search": "web_search",
        "vectorstore": "retrieve",
        "llm_fallback": "llm_fallback",
    },
)
workflow.add_edge("web_search", "generate")
workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
    "grade_documents",
    decide_to_generate,
    {
        "web_search": "web_search",
        "generate": "generate",
    },
)
workflow.add_conditional_edges(
    "generate",
    grade_generation_v_documents_and_question,
    {
        "not supported": "generate",  # Hallucinations: re-generate
        "not useful": "web_search",  # Fails to answer question: fall-back to web-search
        "useful": END,
    },
)
workflow.add_edge("llm_fallback", END)
# Compile
app = workflow.compile()
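The compiled app is itself a runnable: app.invoke(inputs) returns the final state in one call, while app.stream(inputs), used in the examples below, yields the state updates node by node, which makes the control flow easy to trace. A minimal sketch (the question text is just an example):
# Get the final state in a single call
final_state = app.invoke({"question": "What are the income tax slabs in the new tax regime?"})
print(final_state["generation"])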
We will now install additional libraries to visualize the workflow graph.
!apt-get install python3-dev graphviz libgraphviz-dev pkg-config
!pip install pygraphviz
The dashed edges are conditional edges, whereas solid edges are non-conditional direct edges.
from IPython.display import Image
Image(app.get_graph().draw_png())
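If pygraphviz proves troublesome to install, recent LangGraph versions can also render the same graph via Mermaid without graphviz; whether this is available depends on your installed version, so treat it as an optional alternative.
# Alternative rendering without pygraphviz (requires a recent langgraph / langchain-core)
Image(app.get_graph().draw_mermaid_png())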
We now execute our workflow to check if it gives the desired output based on the defined workflow.
Example 1 – Web Search Query
# Execute
inputs = {
    "question": "Give the dates of different phases of general election 2024 in India?"
}
for output in app.stream(inputs):
    for key, value in output.items():
        # Node
        pprint.pprint(f"Node '{key}':")
        # Optional: print full state at each node
    pprint.pprint("\n---\n")
# Final generation
pprint.pprint(value["generation"])
Output
---ROUTE QUESTION---
---ROUTE QUESTION TO RAG---
---RETRIEVE---
"Node 'retrieve':"
'\n---\n'
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE: DOCUMENT NOT RELEVANT---
---ASSESS GRADED DOCUMENTS---
---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, WEB SEARCH---
"Node 'grade_documents':"
'\n---\n'
---WEB SEARCH---
"Node 'web_search':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
"Node 'generate':"
'\n---\n'
('The 2024 Indian general election will take place in seven phases, with '
'voting scheduled for: April 19, April 26, May 7, May 13, May 20, May 25, and '
'June 1.')
Example 2 – Vector search query relevant
# Run
inputs = {"question": "What are the slabs of new tax regime?"}
for output in app.stream(inputs):
    for key, value in output.items():
        # Node
        pprint.pprint(f"Node '{key}':")
        # Optional: print full state at each node
        # pprint.pprint(value["keys"], indent=2, width=80, depth=None)
    pprint.pprint("\n---\n")
# Final generation
pprint.pprint(value["generation"])
Output
---ROUTE QUESTION---
---ROUTE QUESTION TO RAG---
---RETRIEVE---
"Node 'retrieve':"
'\n---\n'
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---ASSESS GRADED DOCUMENTS---
---DECISION: GENERATE---
"Node 'grade_documents':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
"Node 'generate':"
'\n---\n'
('Here are the slabs of the new tax regime for the given years:\n'
'\n'
'## FY 2022-23 (AY 2023-24)\n'
'- Up to Rs 2,50,000: Nil\n'
'- Rs 2,50,001 to Rs 5,00,000: 5%\n'
'- Rs 5,00,001 to Rs 7,50,000: 10%\n'
'- Rs 7,50,001 to Rs 10,00,000: 15%\n'
'- Rs 10,00,001 to Rs 12,50,000: 20%\n'
'- Rs 12,50,001 to Rs 15,00,000: 25%\n'
'- Rs 15,00,001 and above: 30%\n'
'\n'
'## FY 2023-24 (AY 2024-25)\n'
'- Up to Rs 3,00,000: Nil\n'
'- Rs 3,00,000 to Rs 6,00,000: 5% on income above Rs 3,00,000\n'
'- Rs 6,00,000 to Rs 900,000: Rs. 15,000 + 10% on income above Rs 6,00,000\n'
'- Rs 9,00,000 to Rs 12,00,000: Rs. 45,000 + 15% on income above Rs 9,00,000\n'
'- Rs 12,00,000 to Rs 1500,000: Rs. 90,000 + 20% on income above Rs '
'12,00,000\n'
'- Above Rs 15,00,000: Rs. 150,000 + 30% on income above Rs 15,00,000')
LangGraph is a versatile tool for developing complex, stateful applications with LLMs. By understanding its core ideas and working through basic examples such as this one, beginners can apply it to their own projects. The critical points are maintaining state, handling conditional edges, and ensuring that the graph has no dead-end nodes.
In my view, it is more advantageous than ReAct agents because we retain full control of the workflow instead of leaving every decision to the agent.
Q1. Is the Cohere API free to use?
A. Yes, Cohere currently allows free, rate-limited API calls for research and prototyping.
Q2. Why use the Tavily Search API instead of other search APIs?
A. It is more optimized for searches with RAG and LLMs compared to other conventional search APIs.
Q3. Is LangGraph compatible with existing LangChain agents?
A. LangGraph offers compatibility with existing LangChain agents, allowing developers to modify AgentExecutor internals more easily. The state of the graph includes familiar concepts like input, chat_history, intermediate_steps, and agent_outcome.
Q4. How can this Adaptive RAG strategy be improved further?
A. We can further enhance this Adaptive RAG strategy by integrating Self-Reflection in RAG, which fetches documents with self-reasoning and iteratively refines the answer.
Q5. Which models does Cohere offer?
A. Cohere offers many different models; the initial versions were Command and Command R. Command R+ is the latest multilingual model with a larger 128k context window. Apart from these LLMs, it also has an embedding model, Embed, and a reranking model, Rerank.