Building an Agentic RAG Application using LangChain, Tavily & GPT-4

Sukanya Bag | Last Updated: 11 Dec, 2024

In our fast-paced digital age, accessing the most current information quickly is more important than ever. Traditional sources often fall short, either because they’re outdated or simply unavailable when we need them the most. This is where the concept of a real-time, web-enhanced Agentic RAG application steps in, offering a groundbreaking solution. By harnessing the capabilities of LangChain and LLMs for language understanding and Tavily for real-time web data integration, developers can create applications that go beyond the limitations of static databases.

This innovative approach enables the application to scour the web for the latest information, ensuring that users receive the most relevant and current answers to their queries. It’s an intelligent assistant that doesn’t just respond with pre-loaded information but actively seeks out and incorporates new data in real time. This article aims to guide you through the development of such an application, addressing potential challenges like maintaining accuracy and ensuring speedy responses. The goal is to democratize access to information, making it as up-to-date and accessible as possible, thereby removing the barriers to the wealth of knowledge available on the internet. Join us in exploring how to build an AI-powered, web-enhanced Agentic RAG Application that puts the world’s information at your fingertips.

Learning Objectives

  1. Gain a comprehensive understanding of building a state-of-the-art, real-time Agentic Retrieval-Augmented Generation (RAG) application.
  2. Learn to integrate advanced technologies seamlessly into your application.

This article was published as a part of the Data Science Blogathon.

What is Agentic RAG and How Does it Work?

Agentic Retrieval-Augmented Generation (RAG) is an advanced framework that coordinates multiple tools to tackle complex tasks by integrating information retrieval with language generation. This system enhances traditional RAG by utilizing specialized tools, each focused on distinct subtasks, to produce more accurate and contextually relevant outputs. The process begins by decomposing a complex problem into smaller, manageable subtasks, with each tool handling a specific aspect of the task. These tools interact through a shared memory or message-passing mechanism, allowing them to build upon each other’s outputs and refine the overall response.

Certain tools are equipped with retrieval capabilities, enabling access to external data sources such as databases or the Internet. This ensures that the generated content is grounded in accurate and up-to-date information. After processing their respective tasks, the tools combine their findings to generate a coherent and comprehensive final output that addresses the original query or task.

This approach offers several benefits:

  • Specialization: Each tool excels in a particular area, leading to more precise handling of complex tasks.
  • Scalability: The modular nature of the system allows for easy adaptation to various applications and domains.
  • Reduced hallucination: Incorporating multiple tools with retrieval capabilities enables cross-verification of information, minimizing the risk of generating incorrect or fabricated content.

In our current application, we will use the Tavily web search and vector store retrieval tools to create an advanced RAG pipeline.

Skills Necessary to Implement

Below are the knowledge and skills necessary to implement this solution effectively:

  • Tavily Search API: From their docs, the 'Tavily Search API is a search engine optimized for LLMs, aimed at efficient, quick and persistent search results.' Using the Tavily API, we can easily integrate real-time web search into LLM-based applications. LangChain provides an integration of the Tavily API that searches the web for information relevant to the user's query. Tavily can fetch relevant information from multiple sources, including the URL, relevant images, and the content, all in a structured JSON format. This fetched information is then used as context for the LLM to answer the user's query. We will build an agent using LangChain's Tavily integration, which the pipeline will use to answer queries the LLM cannot answer from the provided document. (A standalone sketch of the Tavily client follows this list.)
  • OpenAI GPT-4 Turbo: We will use OpenAI's GPT-4 Turbo model for this experiment, but you can use any model for this pipeline, including local models. Avoid OpenAI's GPT-4o model, as it has been observed to underperform on agentic applications.
  • Apple's 2023 10-K Document: We will use Apple's 2023 Form 10-K annual report for this experiment, but you can use any document of your choice.
  • DeepLake Vector Store: For this application, we will use the DeepLake vector store. It's fast and lightweight, which helps keep the application's latency low.
  • Simple SQL Chat Memory (optional): Additionally, we will implement a basic SQL-based chat memory system to maintain context and continuity across interactive chat sessions. This is optional, but keeping it in the application enhances the user experience.
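
Before wiring Tavily into LangChain, here is a minimal standalone sketch of the Tavily Python client (it assumes the tavily-python package installed in the next section; the query string and key placeholder are illustrative):

from tavily import TavilyClient

# Replace with your actual Tavily API key
client = TavilyClient(api_key="tavily_api_key")

# Illustrative query; the response is a structured dict
response = client.search(
    "Apple fiscal year 2023 highlights",
    search_depth="advanced",
    max_results=3,
)

# Each result includes the source URL and the extracted content
for result in response["results"]:
    print(result["url"], "-", result["content"][:100])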

Implementation of Agentic RAG Application

Now we’ll walk through the creation of this simple, yet powerful Retrieval-Augmented Generation (RAG) system designed to answer user queries with high accuracy and relevance. The code outlined below demonstrates how to integrate these components into a cohesive application capable of retrieving information both from a specific document and the vast resources of the web. Let’s dive into the specifics of the implementation, examining how each part of the code contributes to the overall functionality of the system.

Creating Environment

First, create an environment using the packages listed below:

#install dependencies
deeplake==3.9.27
ipykernel==6.29.5
ipython==8.29.0
jupyter_client==8.6.3
jupyter_core==5.7.2
langchain==0.3.7
langchain-community==0.3.5
langchain-core==0.3.15
langchain-experimental==0.3.3
langchain-openai
langchain-text-splitters==0.3.2
numpy==1.26.4
openai==1.54.4
pandas==2.2.3
pillow==10.4.0
PyMuPDF==1.24.13
tavily-python==0.5.0
tiktoken==0.8.0
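
If you save the list above as a requirements.txt file, the whole environment can be installed in one step (a standard pip workflow; the file name is just a convention):

pip install -r requirements.txt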

Environment Setup and Initial Configurations: First, the necessary libraries are imported. This sets up the foundation for the application to interact with various services and functionalities.

Initial Configurations

import os

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain.tools.retriever import create_retriever_tool
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_community.vectorstores.deeplake import DeepLake
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.chat_message_histories import SQLChatMessageHistory
from langchain.memory import ConversationBufferWindowMemory

Tavily Search Tool Configuration

Tavily Search Tool Configuration: Next, the TavilySearchResults tool is configured for web searches, setting parameters like max_results and search_depth. First, we export the Tavily API key as an OS environment variable. You can generate your own Tavily API key from the Tavily website.

os.environ["TAVILY_API_KEY"] = "tavily_api_key"
search_tool = TavilySearchResults(
  name="tavily_search_engine",
  description="A search engine optimized for retrieving information from web based on user query.",
  max_results=5,
  search_depth="advanced",
  include_answer=True,
  include_raw_content=True,
  include_images=True,
  verbose=False,
)
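
As a quick sanity check, you can invoke the tool directly before handing it to the agent (a sketch with an illustrative query):

results = search_tool.invoke("Nvidia Q2 FY2024 earnings highlights")
print(results)  # a list of result dicts with source URLs and content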

Chat OpenAI Configuration

Chat OpenAI Configuration: The ChatOpenAI model is configured with the GPT-4 Turbo model name and API key. Note that any chat model can be used for this use case; check the LangChain documentation for the model you wish to use and modify the code accordingly.

llm = ChatOpenAI(
  model="gpt-4-turbo",
  temperature=0.3,
  api_key="openai_api_key",  # replace with your actual OpenAI API key
)
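
Optionally, verify the model connection with a plain chat call before building the agent:

print(llm.invoke("Reply with 'ready' if you can read this.").content)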

Defining the Prompt Template

Defining the Prompt Template: A ChatPromptTemplate is defined to guide the chatbot’s interaction with users, emphasizing the use of document context and web search.

prompt = ChatPromptTemplate([
  ("system",
  """You are a helpful chatbot. You need to answer the user's queries in
  detail from the document context. You have access to two tools:
  deeplake_vectorstore_retriever and tavily_search_engine.
  Always use the deeplake_vectorstore_retriever tool first to retrieve
  the context and answer the question. If the context does not contain a
  relevant answer for the user's query, use the tavily_search_engine to
  fetch web search results to answer them. NEVER give an incomplete answer.
  Always try your best to find the answer through web search if it is not
  found in the context."""),
  ("human", "{user_input}"),
  ("placeholder", "{messages}"),
  ("placeholder", "{agent_scratchpad}"),
])
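
Because the placeholders are optional, you can preview the rendered prompt with just the user input (an inspection sketch with an illustrative question):

preview = prompt.invoke({"user_input": "What were Apple's 2023 net sales?"})
for message in preview.to_messages():
    print(message.type, ":", message.content[:80])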

Document Pre-processing and Ingestion

Document Pre-processing and Ingestion: The document is loaded, split into chunks, and these chunks are processed to create embeddings, which are then stored in a DeepLake vector store.

data = "Apple 10k 2023.pdf" 
loader = PyMuPDFLoader(file_path=data) 
text_splitter = CharacterTextSplitter(separator="\n", chunk_size=1000, chunk_overlap=200) 
docs = loader.load_and_split(text_splitter=text_splitter) 
embeddings = OpenAIEmbeddings(api_key="openai_api_key",) 
vectorstore = DeepLake(dataset_path="dataset", embedding=embeddings,) 
_ = vectorstore.add_documents(documents=docs)
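
Optionally, verify the ingestion with a direct similarity search before wiring the store into the agent (the query text is illustrative):

hits = vectorstore.similarity_search("total net sales in fiscal 2023", k=2)
for doc in hits:
    print(doc.page_content[:120])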

Creating the Retrieval Tool

Creating the Retrieval Tool: A retrieval tool is created from the vector store to facilitate document search.

retriever_tool = create_retriever_tool(
  retriever=vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={
      "k": 6,
      "fetch_k": 12,
    },
  ),
  name="deeplake_vectorstore_retriever",
  description="Searches and returns contents from the uploaded Apple SEC document based on the user's query.",
)

tools = [
  retriever_tool,
  search_tool,
]
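
The retriever tool can also be invoked on its own; it returns the matching chunks joined into a single string (an optional check with an illustrative query):

print(retriever_tool.invoke({"query": "fiscal 2023 total net sales"})[:300])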

Implementing a Simple Chat History/Memory (Optional)

Implementing a simple chat history/memory (optional): An SQL-based chat memory is configured to maintain context. A SQLite database will be created, and the chat conversations will be stored in it.

history = SQLChatMessageHistory(
  session_id="ghdcfhdxgfx",
  connection_string="sqlite:///chat_history.db",
  table_name="message_store",
  session_id_field_name="session_id",
)

memory = ConversationBufferWindowMemory(
  chat_memory=history,
  memory_key="messages",  # must match the "{messages}" placeholder in the prompt
  return_messages=True,   # the placeholder expects a list of messages
)
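
Optionally, the SQLite-backed history can be inspected or seeded directly (the messages here are illustrative):

history.add_user_message("Hello!")                          # store a human turn
history.add_ai_message("Hi! Ask me about the Apple 10-K.")  # store an AI turn
print(history.messages)  # turns persisted in chat_history.db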

Initializing the Agent and Agent Executor

Initializing the Agent and Agent Executor: An agent is created and executed to process user queries and generate responses based on the configured tools and memory.

agent = create_openai_tools_agent(llm, tools, prompt)

agent_executor = AgentExecutor(
  tools=tools,
  agent=agent,
  verbose=False,
  max_iterations=8,
  early_stopping_method="force",
  memory=memory,
)

Testing the Pipeline on a Few Samples

Testing the pipeline on a few samples: With this, our RAG pipeline with real-time search is ready. Let's test the application with some queries.

result = agent_executor.invoke({
  "user_input": "What is the fiscal year highlights of Apple inc. for 2nd quarter of 2024?",
},)
print(result["output"])

Output

The document does not contain specific information about the fiscal year
highlights of Apple Inc. for the 2nd quarter of 2024. The latest information
includes:

Total Net Sales and Net Income: Apple's total net sales were $383.3 billion,
and net income was $97.6 billion during 2023.

Sales Performance: There was a 3% decrease in total net sales compared to
2022, amounting to an $11.0 billion reduction. This decrease was attributed
to currency exchange headwinds and weaker consumer demand.

Product Announcements: Significant product announcements during fiscal year
2023 included:

First Quarter: iPad and iPad Pro, Next-generation Apple TV 4K, and MLS Season
Pass (a Major League Soccer subscription streaming service).
Second Quarter: MacBook Pro 14", MacBook Pro 16", and Mac mini; Second-
generation HomePod.
Third Quarter: MacBook Air 15", Mac Studio and Mac Pro; Apple Vision Pro™
(the company's first spatial computer featuring its new visionOS).
Fourth Quarter: iPhone 15 series (iPhone 15, iPhone 15 Plus, iPhone 15 Pro,
and iPhone 15 Pro Max); Apple Watch Series 9 and Apple Watch Ultra 2.
Share Repurchase Program: In May 2024, Apple announced a new share repurchase
program of up to $90 billion and raised its quarterly dividend.

For the specific details regarding the 2nd quarter of 2024, it would be
necessary to consult Apple's official publications or financial reports.

The query we asked was about 2024; the model replied that the context has no information on 2024 highlights and instead gave the details for 2023, which it retrieved from the vector store. Next, let's try asking the same question, but for Nvidia.

result = agent_executor.invoke({
  "user_input": "What is the fiscal year highlights of Nvidia inc. for 2nd quarter of 2024?",
},)
print(result["output"])

Output

For the 2nd quarter of the fiscal year 2024, Nvidia Inc. reported exceptional
growth, particularly driven by its Data Center segment. Here are the
highlights:

Total Revenues: Nvidia's total revenue saw a significant increase, up 101%
year-over-year (YOY) to $13.51 billion, which was above the outlook.

Data Center Revenue: The Data Center segment experienced a remarkable growth
of 171% YOY, reaching $10.32 billion. This growth underscores the strong
demand for AI-related computing solutions.

Gaming Revenue: The Gaming segment also saw growth, up 22% YOY to $2.49
billion.

Record Data Center Revenue: Nvidia achieved a record in Data Center revenue,
highlighting the strong demand for its products and services.

These results reflect Nvidia's strong position in the market, particularly in
areas related to accelerated computing and AI platforms. The company
continues to benefit from trends in artificial intelligence and machine
learning.

For more detailed information, you can refer to Nvidia's official press
release regarding their financial results for the second quarter of fiscal
year 2024.

Key Takeaways

  1. Integration of Advanced Technologies: Combines tools like LangChain, OpenAI, Tavily Search API, and DeepLake vector stores to create a robust system for information retrieval and NLP.
  2. Retrieval-Augmented Generation (RAG): Blends real-time web search, document retrieval, and conversational AI to deliver accurate and context-aware responses.
  3. Efficient Document Management: Uses DeepLake vector stores for quick retrieval of relevant document sections, optimizing large document handling.
  4. OpenAI Language Modeling: Leverages GPT-4 Turbo for coherent, human-like, and contextually appropriate responses.
  5. Dynamic Web Search: Tavily Search API enriches responses with real-time web information for comprehensive answers.
  6. Context and Memory Management: SQL-based chat memory ensures coherent, context-aware interactions across sessions.
  7. Flexible, Scalable Architecture: Modular and configurable design supports easy expansion with additional models, information sources, or features.

Conclusion

In this article, we explored the creation of a real-time, Agentic RAG Application using LangChain, Tavily, and OpenAI GPT-4. This powerful integration of technologies enables the application to provide accurate, contextually relevant answers by combining document retrieval, real-time web search, and conversational memory. Our guide presents a flexible and scalable approach, adaptable to various models, data sources, and functionalities. By following these steps, developers can build advanced AI-powered solutions that meet today’s demand for up-to-date and comprehensive information accessibility.

If you are looking for a RAG course online, explore: RAG System Essentials.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Frequently Asked Questions

Q1. What is the main purpose of an Agentic RAG application?

Ans. The primary purpose of an Agentic RAG application is to provide accurate, real-time answers by combining document retrieval, real-time web search, and language generation. It leverages specialized tools for subtasks, ensuring precise and contextually relevant responses.

Q2. How does the Tavily Search API enhance the RAG system?

Ans. The Tavily Search API integrates real-time web search into the system, fetching up-to-date, relevant information in a structured JSON format. It provides additional context to queries that cannot be answered by pre-loaded document data alone.

Q3. Why is DeepLake used in this application?

Ans. DeepLake serves as the vector store for storing document embeddings. It allows for efficient retrieval of relevant document chunks based on similarity search, ensuring the application can quickly access and use the required information.

An ace multi-skilled programmer whose major area of work and interest lies in Software Development, Data Science, and Machine Learning. A proactive and detail-oriented individual who loves data storytelling, and is curious and passionate to solve complex value-oriented business problems with Data Science and Machine Learning to deliver robust machine learning pipelines that ensure maximum impact.

In my free time, I focus on creating Data Science and AI/ML content, providing 1:1 mentorships, career guidance and interview preparation tips, with a sole focus on teaching complex topics the easier way, to help people make a successful career transition to Data Science with the right skillset!
