This article explores Adaptive Question-Answering (QA) frameworks, specifically the Adaptive RAG strategy. It discusses how this framework dynamically selects the most suitable method for large language models (LLMs) based on query complexity. It covers the learning objectives, features, and implementation of Adaptive RAG, its efficiency, and its integration with LangChain and the Cohere LLM. The article also discusses the ReAct Agent’s role in classifying queries and directing them to the appropriate tools. It concludes that Adaptive RAG can meaningfully improve QA systems.
Adaptive RAG is an adaptive Question-Answering (QA) framework designed to select the best method for (retrieval-augmented) large language models (LLMs), ranging from basic to sophisticated, based on query complexity. This QA strategy was introduced as Adaptive RAG in this paper.
This article was published as a part of the Data Science Blogathon.
Adaptive-RAG presents a dynamic QA framework that can change its response method depending on query complexity. For each query it selects the most appropriate strategy, whether that is an iterative retrieval-augmented procedure, a single-step retrieval-augmented procedure, or bypassing retrieval entirely.
To achieve this, the paper proposes an adaptive QA framework that selects the most appropriate technique for (retrieval-augmented) large language models, from simple to sophisticated, based on query complexity. The selection is made by a classifier, a smaller LM trained to predict query complexity levels from automatically acquired labels derived from real model predictions and patterns in the underlying datasets. This methodology enables a flexible strategy that seamlessly switches between iterative retrieval-augmented, single-step retrieval-augmented, and non-retrieval approaches to address a wide range of queries.
In the diagram above we can see a conceptual comparison of different retrieval-augmented LLM approaches to question answering. The single-step approach may not be sufficient for complex queries that require multi-step reasoning. Likewise, the multi-step approach, which iteratively retrieves documents and generates intermediate answers, may be wasteful or inaccurate for simple queries. The adaptive approach selects the most suitable strategy based on the query complexity determined by the classifier, as sketched below.
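To make the routing idea concrete, here is a minimal Python sketch of the adaptive selection logic. The helper functions (classify_complexity, answer_directly, single_step_rag, multi_step_rag) are hypothetical stand-ins for the classifier and the three answering strategies; this is illustrative only, not code from the paper.
# Illustrative sketch of Adaptive RAG routing; the helpers below are
# hypothetical stand-ins, not functions from the paper or from LangChain.
def classify_complexity(query: str) -> str:
    """Dummy classifier: the paper trains a smaller LM for this step.
    'A' = simple (no retrieval), 'B' = moderate (single-step), 'C' = complex (multi-step)."""
    words = len(query.split())
    return "A" if words < 8 else ("B" if words < 20 else "C")
def answer_directly(query: str) -> str:
    return f"[LLM-only answer to: {query}]"
def single_step_rag(query: str) -> str:
    return f"[Answer after one retrieval round for: {query}]"
def multi_step_rag(query: str) -> str:
    return f"[Answer after iterative retrieval and reasoning for: {query}]"
def adaptive_answer(query: str) -> str:
    complexity = classify_complexity(query)
    if complexity == "A":          # simple query: bypass retrieval
        return answer_directly(query)
    if complexity == "B":          # moderate query: single-step retrieval
        return single_step_rag(query)
    return multi_step_rag(query)   # complex query: iterative retrieval
print(adaptive_answer("Who wrote Hamlet?"))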
In this implementation we use the simple architecture depicted in the flowchart. The LangChain ReAct Agent acts as the classifier in the context of Adaptive RAG here: it analyses the query, determines its type, and routes it to the correct tool or option.
ReAct (Reasoning + Acting) is a prompting strategy created by Princeton University researchers in collaboration with Google researchers. It aims to let LLMs mimic how humans operate in the real world, where we reason verbally and take actions to gather information. ReAct enables LLMs to interface with external tools, thereby improving decision-making. With ReAct, an LLM can interpret and generate text, make informed judgements, and take actions based on what it understands.
ReAct combines reasoning and acting to solve complex language reasoning and decision-making tasks.
Chain-of-Thought (CoT) prompting works with reasoning steps only, relying heavily on the internal knowledge of the LLM, which makes it prone to fact hallucination. ReAct addresses this by letting the LLM generate both verbal reasoning traces and actions for a task.
This interaction is achieved through text actions that the model can use to ask questions or perform tasks to gain more information and better understand a situation. For instance, when faced with a multi-hop reasoning question, ReAct might initiate multiple search actions, each potentially being a call to an external tool.
The results of these actions are then used to generate a final answer.
By forcing the LLM to alternate between thinking and acting, ReAct converts it into an active agent in its surroundings, capable of completing tasks in a human-like fashion.
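For intuition, an invented ReAct trace for a multi-hop question might look like the following (illustrative only, not actual model output):
Thought: I need to find who directed the film before I can find their birth year.
Action: internet_search("director of Inception")
Observation: Inception was directed by Christopher Nolan.
Thought: Now I need Christopher Nolan's year of birth.
Action: internet_search("Christopher Nolan year of birth")
Observation: Christopher Nolan was born on 30 July 1970.
Thought: I can now answer the question.
Final Answer: Inception was directed by Christopher Nolan, who was born in 1970.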
ReAct is ideal for scenarios where the LLM has to rely on external tools or agents and must interact with them to fetch information for the various reasoning steps.
Let us now look at the important components used:
Cohere’s Command R is a scalable generative model targeting RAG and tool use, built to enable production-scale AI for enterprises.
We require a vector store for RAG. In our implementation we use Chroma DB, a popular open-source vector store for storing and indexing embeddings. It is available as a LangChain integration.
For the web search tool we require an internet search API. Instead of the conventional DuckDuckGo search API, we will use Tavily AI, a specialized search API. It is a search engine optimized for LLMs and RAG, aimed at efficient, quick, and persistent search results.
Orchestration tools, in the context of LLM applications, are software frameworks designed to streamline and manage complex processes involving multiple components and interactions with LLMs. For building LLM chatbots and applications we need a framework to handle the glue code so that we can focus on the higher-level logic. LangChain is the most popular such framework, and we will use it to build the ReAct Agent that serves as our query classifier.
Let us now implement a simple Adaptive RAG using a LangChain Agent and the Cohere LLM:
We need to generate a free API key to use the Cohere LLM. Visit the Cohere website and log in using a Google or GitHub account. Once logged in you will land on the Cohere dashboard page shown below. Click on the API Keys option; you will see that a free Trial API key has been generated.
Similarly, generate the Tavily API key. Visit the sign-in page of the site here and log in using a Google or GitHub account.
Once you sign in, you will land on your account’s home page, which shows a default free plan with an API key already generated, similar to the screen below.
Once the API keys are generated, install the required libraries as shown below. You can use Colab notebooks for development.
! pip install --quiet langchain langchain_cohere tiktoken chromadb pymupdf
Set the API Keys as environment variables:
### Set API Keys
import os
os.environ["COHERE_API_KEY"] = "Cohere API Key"
os.environ["TAVILY_API_KEY"] = "Tavily API Key"
Now we will create the web search tool using an instance of “TavilySearchResults”, the LangChain integration for Tavily Search:
from langchain_community.tools.tavily_search import TavilySearchResults
internet_search = TavilySearchResults()
internet_search.name = "internet_search"
internet_search.description = "Returns a list of relevant document snippets for a textual query retrieved from the internet."
from langchain_core.pydantic_v1 import BaseModel, Field

class TavilySearchInput(BaseModel):
    query: str = Field(description="Query to search the internet with")

internet_search.args_schema = TavilySearchInput
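Optionally, assuming the TAVILY_API_KEY is already set, we can sanity-check the tool with a direct call (the query string here is just an example):
# Optional quick test of the web search tool (requires a valid TAVILY_API_KEY)
results = internet_search.invoke({"query": "Current Prime Minister of India"})
print(results[0])  # first returned snippet (a dict with url and content)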
Now we will create the RAG tool on top of a document. In our case we used an uploaded PDF.
We use Cohere Embeddings to embed the PDF and PyMuPDF to read the PDF text into Document objects. We also use the Recursive Character Text Splitter to split the documents into chunks.
Then, using Chroma DB, we store the document embeddings, index them, and persist them to a directory.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_cohere import CohereEmbeddings
#from langchain_community.document_loaders import WebBaseLoader
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.vectorstores import Chroma
# Set embeddings
embd = CohereEmbeddings()
# Load Docs to Index
loader = PyMuPDFLoader('/content/cleartax-in-s-income-tax-slabs.pdf') #PDF Path
data = loader.load()
#print(data[10])
# Split
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=512, chunk_overlap=0
)
doc_splits = text_splitter.split_documents(data)
# Add to vectorstore
vectorstore = Chroma.from_documents(
    persist_directory='/content/vector',
    documents=doc_splits,
    embedding=embd,
)
vectorstore_retriever = vectorstore.as_retriever()
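Before wrapping the retriever in a tool, it can help to check that the index returns sensible chunks for a sample question about the uploaded PDF (the query below is just an example; on older LangChain versions, get_relevant_documents can be used instead of invoke):
# Optional sanity check: query the retriever directly on the indexed PDF
docs = vectorstore_retriever.invoke("What are the income tax slabs under the new regime?")
print(len(docs), docs[0].page_content[:200])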
Now we use the vector retriever created above to build a retriever tool, which the classifier (the ReAct Agent) will use to direct appropriate queries to RAG.
from langchain.tools.retriever import create_retriever_tool
vectorstore_search = create_retriever_tool(
retriever=vectorstore_retriever,
name="vectorstore_search",
description="Retrieve relevant info from a vectorstore that contains documents related to Income Tax of India New and Old Regime Rules",
)
The ReAct agent is based on the Reasoning + Action framework for LLMs: at every step it reasons about the task and then takes an appropriate action based on that reasoning.
from langchain.agents import AgentExecutor
from langchain_cohere.react_multi_hop.agent import create_cohere_react_agent
from langchain_core.prompts import ChatPromptTemplate
# LLM
from langchain_cohere.chat_models import ChatCohere
chat = ChatCohere(model="command-r-plus", temperature=0.3)
# Preamble
preamble = """
You are an expert who answers the user's question with the most relevant datasource.
You are equipped with an internet search tool and a special vectorstore of information about Income Tax Rules and Regulations of India.
If the query covers the topics of Income tax old and new regime India Rules and regulations then use the vectorstore search.
"""
# Prompt
prompt = ChatPromptTemplate.from_template("{input}")
# Create the ReAct agent
agent = create_cohere_react_agent(
llm=chat,
tools=[internet_search, vectorstore_search],
prompt=prompt,
)
Now that we have all the required components, we create an executor wrapper through which we can call the ReAct Agent. We pass the agent in the agent parameter and the list of tools in the tools parameter.
# Agent Executor
agent_executor = AgentExecutor(
agent=agent, tools=[internet_search, vectorstore_search], verbose=True
)
Now let us test the ReAct Agent by asking different queries.
Asking Query on Current Affairs
output = agent_executor.invoke(
{
"input": "What is the general election schedule of India 2024?",
"preamble": preamble,
}
)
print(output)
print(output['output'])
Output:
The 2024 Indian general election will be held between April 19 and June 1, across
seven phases. The counting of votes will take place on June 4, 2024.
Query related to Document
output = agent_executor.invoke(
{
"input": "How much deduction is required for a salary of 13lakh so that Old regime is better tahn New regime Threshold?",
"preamble": preamble,
}
)
print(output)
print(output['output'])
Output:
The old regime is better for people who have a financial plan for wealth creation by making investments in tax-saving instruments; medical claims and life insurance; making payments of children’s tuition fees; payment of EMIs on education loan; buying a house with a home loan; and so on. The old regime helps with higher tax deductions and lower tax outgo.
The new regime is better for people who make low investments. As the new regime offers six lower-income tax slabs, anyone paying taxes without claiming tax deductions can benefit from paying a lower rate of tax under the new tax regime.
For a salary of 13 lakhs, the old regime will be better if the total deductions are more than 3.75 lakhs.
Directly Answer Queries
Now we will ask a query related to neither internet search nor RAG.
output = agent_executor.invoke(
{
"input": "What is your name?",
"preamble": preamble,
}
)
print(output)
print(output['output'])
Output:
I am an AI assistant trained to answer your queries about the Income Tax Rules
and Regulations of India. I do not have a name.
Adaptive RAG is a dynamic QA framework that uses a classifier to predict query complexity levels and switches between non-retrieval, single-step, and iterative retrieval strategies accordingly. It enhances efficiency and accuracy in QA systems. Implemented here with a LangChain Agent and the Cohere LLM, it offers improved decision-making and versatile interaction with external tools. As language models and QA systems evolve, Adaptive RAG is a valuable strategy for managing information retrieval and response selection.
A. Yes, Cohere currently allows free, rate-limited API calls for research and prototyping.
A. Tavily is more optimized for searches with RAG and LLMs compared to other conventional search APIs.
A. Although Adaptive RAG is a novel question-answering strategy, it has its limitations. One such limitation is its dependency on a good classifier, generally a smaller LLM, to dynamically route queries to the appropriate tool.
A. We can further enhance this Adaptive RAG strategy by integrating Self-Reflection into RAG, which fetches documents with self-reasoning and refines the answer iteratively.
A. Cohere offers several model versions. The initial versions were Command and Command R; Command R+ is the latest model, which is multilingual and has a larger 128k context window. Apart from these LLMs, Cohere also offers an embedding model, Embed, and a reranking model, Rerank.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.