This article explores Adaptive Question-Answering (QA) frameworks, specifically the Adaptive RAG strategy. It discusses how this framework dynamically selects the most suitable method for large language models (LLMs) based on query complexity. It covers the learning objectives, features, and implementation of Adaptive RAG, its efficiency, and its integration with LangChain and the Cohere LLM. The article also discusses the ReAct Agent’s role in classifying queries and directing them to the appropriate tools. It concludes that Adaptive RAG can meaningfully improve QA systems.
Adaptive RAG is an adaptive Question-Answering (QA) framework designed to select the best method for (retrieval-augmented) large language models (LLMs), ranging from basic to sophisticated, based on query complexity. This QA strategy was introduced as Adaptive RAG in this paper.
This article was published as a part of the Data Science Blogathon.
Adaptive-RAG presents a dynamic QA framework that can change its response method depending on query complexity. For each query it selects the most appropriate strategy, whether that is an iterative retrieval-augmented procedure, a single-step retrieval-augmented procedure, or bypassing retrieval entirely.
To achieve this, the paper proposes an adaptive QA framework that selects the most appropriate technique for (retrieval-augmented) large language models, from simple to sophisticated, based on query complexity. The selection is made by a classifier, a smaller LM trained to predict query complexity levels from automatically acquired labels derived from real model predictions and patterns in the underlying datasets. This methodology enables a flexible strategy that seamlessly switches between iterative retrieval-augmented, single-step retrieval-augmented, and non-retrieval approaches to address a wide range of queries.
In the diagram above we can see a conceptual comparison of different retrieval-augmented LLM approaches to question answering. The single-step approach may not be sufficient for complex queries that require multi-step reasoning. Likewise, the multi-step approach, which iteratively retrieves documents and generates intermediate answers, may be wasteful or inaccurate for simple queries. The adaptive approach selects the most suitable strategy based on the query complexity determined by the classifier, as sketched below.
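To make the routing idea concrete, here is a minimal Python sketch of the adaptive selection logic. The helper functions (classify_complexity, answer_directly, single_step_rag, multi_step_rag) are hypothetical stand-ins for the classifier and the three answering strategies; this is illustrative only, not code from the paper.
# Illustrative sketch of Adaptive RAG routing; the helpers below are
# hypothetical stand-ins, not functions from the paper or from LangChain.
def classify_complexity(query: str) -> str:
    """Dummy classifier: the paper trains a smaller LM for this step.
    'A' = simple (no retrieval), 'B' = moderate (single-step), 'C' = complex (multi-step)."""
    words = len(query.split())
    return "A" if words < 8 else ("B" if words < 20 else "C")
def answer_directly(query: str) -> str:
    return f"[LLM-only answer to: {query}]"
def single_step_rag(query: str) -> str:
    return f"[Answer after one retrieval round for: {query}]"
def multi_step_rag(query: str) -> str:
    return f"[Answer after iterative retrieval and reasoning for: {query}]"
def adaptive_answer(query: str) -> str:
    complexity = classify_complexity(query)
    if complexity == "A":          # simple query: bypass retrieval
        return answer_directly(query)
    if complexity == "B":          # moderate query: single-step retrieval
        return single_step_rag(query)
    return multi_step_rag(query)   # complex query: iterative retrieval
print(adaptive_answer("Who wrote Hamlet?"))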
In this implementation we use the simple architecture depicted in the flowchart. The LangChain ReAct Agent acts as the classifier in the context of Adaptive RAG here: it analyses the query, determines its type, and routes it to the correct tool or option.
ReAct (Reasoning + Acting) is a prompting strategy created by Princeton University researchers in collaboration with Google researchers. It aims to let LLMs mimic how humans operate in the real world, where we reason verbally and take actions to gather information. ReAct enables LLMs to interface with external tools, thereby improving decision-making. With ReAct, an LLM can interpret and generate text, make informed judgements, and take actions based on what it understands.
ReAct combines reasoning and acting to solve complex language reasoning and decision-making tasks.
Chain-of-Thought (CoT) prompting works with reasoning steps only, relying heavily on the internal knowledge of the LLM, which makes it prone to fact hallucination. ReAct addresses this by letting the LLM generate both verbal reasoning traces and actions for a task.
This interaction is achieved through text actions that the model can use to ask questions or perform tasks to gain more information and better understand a situation. For instance, when faced with a multi-hop reasoning question, ReAct might initiate multiple search actions, each potentially being a call to an external tool.
The results of these actions are then used to generate a final answer.
By forcing the LLM to alternate between thinking and acting, ReAct converts it into an active agent in its surroundings, capable of completing tasks in a human-like fashion.
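For intuition, an invented ReAct trace for a multi-hop question might look like the following (illustrative only, not actual model output):
Thought: I need to find who directed the film before I can find their birth year.
Action: internet_search("director of Inception")
Observation: Inception was directed by Christopher Nolan.
Thought: Now I need Christopher Nolan's year of birth.
Action: internet_search("Christopher Nolan year of birth")
Observation: Christopher Nolan was born on 30 July 1970.
Thought: I can now answer the question.
Final Answer: Inception was directed by Christopher Nolan, who was born in 1970.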
ReAct is ideal for scenarios where the LLM has to rely on external tools or agents and must interact with them to fetch information for the various reasoning steps.
Let us now look at the important components used:
Cohere’s Command R is a scalable generative model targeting RAG and tool use, built to enable production-scale AI for enterprises.
We require a vector store for RAG. In our implementation we use Chroma DB, a popular open-source vector store for storing and indexing embeddings. It is available as a LangChain integration.
For the web search tool we require an internet search API. Instead of the conventional DuckDuckGo search API, we will use Tavily AI, a specialized search API. It is a search engine optimized for LLMs and RAG, aimed at efficient, quick, and persistent search results.
Orchestration tools, in the context of LLM applications, are software frameworks designed to streamline and manage complex processes involving multiple components and interactions with LLMs. For building LLM chatbots and applications we need a framework to handle the glue code so that we can focus on the higher-level logic. LangChain is the most popular such framework, and we will use it to build the ReAct Agent that serves as our query classifier.
Let us now implement a simple Adaptive RAG using a LangChain Agent and the Cohere LLM:
We need to generate a free API key to use the Cohere LLM. Visit the Cohere website and log in using a Google or GitHub account. Once logged in you will land on the Cohere dashboard page shown below. Click on the API Keys option; you will see that a free Trial API key has been generated.
Similarly, generate the Tavily API key. Visit the sign-in page of the site here and log in using a Google or GitHub account.
Once you sign in, you will land on your account’s home page, which shows a default free plan with an API key already generated, similar to the screen below.
Once the API keys are generated, install the required libraries as shown below. You can use Colab notebooks for development.
! pip install --quiet langchain langchain_cohere tiktoken chromadb pymupdf
Set the API Keys as environment variables:
### Set API Keys
import os
os.environ["COHERE_API_KEY"] = "Cohere API Key"
os.environ["TAVILY_API_KEY"] = "Tavily API Key"
Now we will create the web search tool using an instance of “TavilySearchResults”, the LangChain integration for Tavily Search:
from langchain_community.tools.tavily_search import TavilySearchResults
internet_search = TavilySearchResults()
internet_search.name = "internet_search"
internet_search.description = "Returns a list of relevant document snippets for a textual query retrieved from the internet."
from langchain_core.pydantic_v1 import BaseModel, Field

class TavilySearchInput(BaseModel):
    query: str = Field(description="Query to search the internet with")

internet_search.args_schema = TavilySearchInput
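Optionally, assuming the TAVILY_API_KEY is already set, we can sanity-check the tool with a direct call (the query string here is just an example):
# Optional quick test of the web search tool (requires a valid TAVILY_API_KEY)
results = internet_search.invoke({"query": "Current Prime Minister of India"})
print(results[0])  # first returned snippet (a dict with url and content)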
Now we will create the RAG tool on top of a document. In our case we used an uploaded PDF.
We use Cohere Embeddings to embed the PDF and PyMuPDF to read the PDF text into Document objects. We also use the Recursive Character Text Splitter to split the documents into chunks.
Then, using Chroma DB, we store the document embeddings, index them, and persist them to a directory.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_cohere import CohereEmbeddings
#from langchain_community.document_loaders import WebBaseLoader
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.vectorstores import Chroma
# Set embeddings
embd = CohereEmbeddings()
# Load Docs to Index
loader = PyMuPDFLoader('/content/cleartax-in-s-income-tax-slabs.pdf') #PDF Path
data = loader.load()
#print(data[10])
# Split
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=512, chunk_overlap=0
)
doc_splits = text_splitter.split_documents(data)
# Add to vectorstore
vectorstore = Chroma.from_documents(
    persist_directory='/content/vector',
    documents=doc_splits,
    embedding=embd,
)
vectorstore_retriever = vectorstore.as_retriever()
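Before wrapping the retriever in a tool, it can help to check that the index returns sensible chunks for a sample question about the uploaded PDF (the query below is just an example; on older LangChain versions, get_relevant_documents can be used instead of invoke):
# Optional sanity check: query the retriever directly on the indexed PDF
docs = vectorstore_retriever.invoke("What are the income tax slabs under the new regime?")
print(len(docs), docs[0].page_content[:200])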
Now we use the vector retriever created above to build a retriever tool, which the classifier (the ReAct Agent) will use to direct appropriate queries to RAG.
from langchain.tools.retriever import create_retriever_tool
vectorstore_search = create_retriever_tool(
retriever=vectorstore_retriever,
name="vectorstore_search",
description="Retrieve relevant info from a vectorstore that contains documents related to Income Tax of India New and Old Regime Rules",
)
The ReAct agent is based on the Reasoning + Action framework for LLMs: at every step it reasons about the task and then takes an appropriate action based on that reasoning.
from langchain.agents import AgentExecutor
from langchain_cohere.react_multi_hop.agent import create_cohere_react_agent
from langchain_core.prompts import ChatPromptTemplate
# LLM
from langchain_cohere.chat_models import ChatCohere
chat = ChatCohere(model="command-r-plus", temperature=0.3)
# Preamble
preamble = """
You are an expert who answers the user's question with the most relevant datasource.
You are equipped with an internet search tool and a special vectorstore of information about Income Tax Rules and Regulations of India.
If the query covers the topics of Income tax old and new regime India Rules and regulations then use the vectorstore search.
"""
# Prompt
prompt = ChatPromptTemplate.from_template("{input}")
# Create the ReAct agent
agent = create_cohere_react_agent(
llm=chat,
tools=[internet_search, vectorstore_search],
prompt=prompt,
)
Now that we have all the required components, we create an executor wrapper through which we can call the ReAct Agent. We pass the agent in the agent parameter and the list of tools in the tools parameter.
# Agent Executor
agent_executor = AgentExecutor(
agent=agent, tools=[internet_search, vectorstore_search], verbose=True
)
Now let us test the ReAct Agent by asking different queries.
Asking Query on Current Affairs
output = agent_executor.invoke(
{
"input": "What is the general election schedule of India 2024?",
"preamble": preamble,
}
)
print(output)
print(output['output'])
Output:
The 2024 Indian general election will be held between April 19 and June 1, across
seven phases. The counting of votes will take place on June 4, 2024.
Query related to Document
output = agent_executor.invoke(
{
"input": "How much deduction is required for a salary of 13lakh so that Old regime is better tahn New regime Threshold?",
"preamble": preamble,
}
)
print(output)
print(output['output'])
Output:
The old regime is better for people who have a financial plan for wealth creation by making investments in tax-saving instruments; medical claims and life insurance; making payments of children’s tuition fees; payment of EMIs on education loan; buying a house with a home loan; and so on. The old regime helps with higher tax deductions and lower tax outgo.
The new regime is better for people who make low investments. As the new regime offers six lower-income tax slabs, anyone paying taxes without claiming tax deductions can benefit from paying a lower rate of tax under the new tax regime.
For a salary of 13 lakhs, the old regime will be better if the total deductions are more than 3.75 lakhs.
Directly Answer Queries
Now we will ask a query related to neither internet search nor RAG.
output = agent_executor.invoke(
{
"input": "What is your name?",
"preamble": preamble,
}
)
print(output)
print(output['output'])
Output:
I am an AI assistant trained to answer your queries about the Income Tax Rules
and Regulations of India. I do not have a name.
Adaptive RAG is a dynamic QA framework that uses a classifier to predict query complexity levels and switches between non-retrieval, single-step, and iterative retrieval strategies accordingly. It enhances efficiency and accuracy in QA systems. Implemented here with a LangChain Agent and the Cohere LLM, it offers improved decision-making and versatile interaction with external tools. As language models and QA systems evolve, Adaptive RAG is a valuable strategy for managing information retrieval and response selection.
A. Yes, Cohere currently allows free, rate-limited API calls for research and prototyping.
A. Tavily is more optimized for searches with RAG and LLMs compared to other conventional search APIs.
A. Although Adaptive RAG is a novel question-answering strategy, it has its limitations. One such limitation is its dependency on a good classifier, generally a smaller LLM, to dynamically route queries to the appropriate tool.
A. We can further enhance this Adaptive RAG strategy by integrating Self-Reflection into RAG, which fetches documents with self-reasoning and refines the answer iteratively.
A. Cohere offers several model versions. The initial versions were Command and Command R; Command R+ is the latest model, which is multilingual and has a larger 128k context window. Apart from these LLMs, Cohere also offers an embedding model, Embed, and a reranking model, Rerank.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.