In the rapidly evolving field of artificial intelligence, the ability to process and understand vast amounts of information is becoming increasingly crucial. Enter Multi-Document Agentic RAG – a powerful approach that combines Retrieval-Augmented Generation (RAG) with agent-based systems to create AI that can reason across multiple documents. This guide will walk you through the concept, implementation, and potential of this exciting technology.
This article was published as a part of the Data Science Blogathon.
Retrieval-Augmented Generation (RAG) is a technique that enhances language models by allowing them to access and use external knowledge. Instead of relying solely on their trained parameters, RAG models can retrieve relevant information from a knowledge base to generate more accurate and informed responses.
Multi-Document Agentic RAG takes this concept further by enabling an AI agent to work with multiple documents simultaneously. This approach is particularly valuable for tasks that require synthesizing information from various sources, such as academic research, market analysis, or legal document review.
Let us understand why multi-document agentic RAG is a game-changer.
We will now look into the key strengths of multi-document agentic RAG systems.
Imagine you’re building a super-smart digital library assistant. This assistant can read thousands of books, understand complex questions, and give you detailed answers using information from multiple sources. That’s essentially what a Multi-Document Agentic RAG system does. Let’s break down the key components that make this possible:
Converts all types of documents (PDFs, web pages, Word files, etc.) into a format that our AI can understand.
Transforms the processed text into numerical vectors (sequences of numbers) that represent the meaning and context of the information.
In simple terms, imagine creating a super-condensed summary of each paragraph in your library, but instead of words, you use a unique code. This code captures the essence of the information in a way that computers can quickly compare and analyze.
It creates an efficient structure to store and retrieve these embeddings. This is like creating the world’s most efficient card catalog for our digital library. It allows our AI to quickly locate relevant information without having to scan every single document in detail.
It uses the query (your question) to find the most relevant pieces of information from the indexed embeddings. When you ask a question, this component races through our digital library, using that super-efficient card catalog to pull out all the potentially relevant pieces of information.
An AI agent interprets the retrieved information in the context of your query, deciding how to use it to formulate an answer. This is like having a genius AI agent who not only finds the right documents but also understands the deeper meaning of your question. They can connect dots across different sources and figure out the best way to answer you.
It produces a human-readable answer based on the agent’s reasoning and the retrieved information. This is where our genius agent explains their findings to you in clear, concise language. They take all the complex information they’ve gathered and analyzed, and present it in a way that directly answers your question.
This powerful combination allows Multi-Document Agentic RAG systems to provide insights and answers that draw from a vast pool of knowledge, making them incredibly useful for complex research, analysis, and problem-solving tasks across many fields.
Let’s start by building a simple agentic RAG that can work with three academic papers. We’ll use the llama_index library, which provides powerful tools for building RAG systems.
To get started with building your AI agent, you need to install the necessary libraries. Here are the steps to set up your environment:
python -m venv ai_agent_env
source ai_agent_env/bin/activate # On Windows, use `ai_agent_env\Scripts\activate`
pip install openai llama-index==0.10.27 llama-index-llms-openai==0.1.15
pip install llama-index-embeddings-openai==0.1.7
To use the OpenAI API, you need an API key. Follow these steps to set up your API key:
export OPENAI_API_KEY='your_openai_api_key_here'
import os
import openai
import nest_asyncio
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.core.vector_stores import MetadataFilters, FilterCondition
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner
from typing import List, Optional
import subprocess
openai.api_key = os.getenv('OPENAI_API_KEY')
#optionally, you could simply add openai key directly. (not a good practice)
#openai.api_key = 'your_openai_api_key_here'
nest_asyncio.apply()
As stated earlier, I am only using three papers to make this agentic rag, we would later scale this agentic rag to more papers in some other blog. You could use your own documents (optionally).
# List of URLs to download
urls = [
"https://openreview.net/pdf?id=VtmBAGCN7o",
"https://openreview.net/pdf?id=6PmJoRfdaK",
"https://openreview.net/pdf?id=hSyW5go0v8",
]
# Corresponding filenames to save the files as
papers = [
"metagpt.pdf",
"longlora.pdf",
"selfrag.pdf",
]
# Loop over both lists and download each file with its respective name
for url, paper in zip(urls, papers):
subprocess.run(["wget", url, "-O", paper])
The below function, get_doc_tools, is designed to create two tools: a vector query tool and a summary query tool. These tools help in querying and summarizing a document using an agent-based retrieval-augmented generation (RAG) approach. Below are the steps and their explanations.
def get_doc_tools(
file_path: str,
name: str,
) -> str:
"""Get vector query and summary query tools from a document."""
The function starts by loading the document using SimpleDirectoryReader
, which takes the provided file_path
and reads the document’s contents. Once the document is loaded, it’s processed by SentenceSplitter
, which breaks the document into smaller chunks, or nodes, each containing up to 1024 characters. These nodes are then indexed using VectorStoreIndex
, a tool that allows for efficient vector-based queries. This index will later be used to perform searches over the document content based on vector similarity, making it easier to retrieve relevant information.
# Load documents from the specified file path
documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
# Split the loaded document into smaller chunks (nodes) of 1024 characters
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
# Create a vector index from the nodes for efficient vector-based queries
vector_index = VectorStoreIndex(nodes)
Here, the function defines vector_query
, which is responsible for answering specific questions about the document. The function accepts a query string and an optional list of page numbers. If no page numbers are provided, the entire document is queried. The function first checks if page_numbers
is provided; if not, it defaults to an empty list.
Then, it creates metadata filters that correspond to the specified page numbers. These filters help narrow down the search to specific parts of the document. The query_engine
is created using the vector index and is configured to use these filters, along with a similarity threshold, to find the most relevant results. Finally, the function executes the query using this engine and returns the response.
# vector query function
def vector_query(
query: str,
page_numbers: Optional[List[str]] = None
) -> str:
"""Use to answer questions over a given paper.
Useful if you have specific questions over the paper.
Always leave page_numbers as None UNLESS there is a specific page you want to search for.
Args:
query (str): the string query to be embedded.
page_numbers (Optional[List[str]]): Filter by set of pages. Leave as NONE
if we want to perform a vector search
over all pages. Otherwise, filter by the set of specified pages.
"""
page_numbers = page_numbers or []
metadata_dicts = [
{"key": "page_label", "value": p} for p in page_numbers
]
query_engine = vector_index.as_query_engine(
similarity_top_k=2,
filters=MetadataFilters.from_dicts(
metadata_dicts,
condition=FilterCondition.OR
)
)
response = query_engine.query(query)
return response
This part of the function creates the vector_query_tool
, a tool that links the previously defined vector_query
function to a dynamically generated name based on the name
parameter provided when calling get_doc_tools
.
The tool is created using FunctionTool.from_defaults
, which automatically configures it with the necessary defaults. This tool can now be used to perform vector-based queries over the document using the function defined earlier.
# Creating the Vector Query Tool
vector_query_tool = FunctionTool.from_defaults(
name=f"vector_tool_{name}",
fn=vector_query
)
In this final section, the function creates a tool for summarizing the document. First, it creates a SummaryIndex
from the nodes that were previously split and indexed. This index is designed specifically for summarization tasks. The summary_query_engine
is then created with a response mode of "tree_summarize"
, which allows the tool to generate concise summaries of the document content.
The summary_tool
is finally created using QueryEngineTool.from_defaults
, which links the query engine to a dynamically generated name based on the name
parameter. The tool is also given a description indicating its purpose for summarization-related queries. This summary tool can now be used to generate summaries of the document based on user queries.
# Summary Query Tool
summary_index = SummaryIndex(nodes)
summary_query_engine = summary_index.as_query_engine(
response_mode="tree_summarize",
use_async=True,
)
summary_tool = QueryEngineTool.from_defaults(
name=f"summary_tool_{name}",
query_engine=summary_query_engine,
description=(
f"Useful for summarization questions related to {name}"
),
)
return vector_query_tool, summary_tool
paper_to_tools_dict = {}
for paper in papers:
print(f"Getting tools for paper: {paper}")
vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
paper_to_tools_dict[paper] = [vector_tool, summary_tool]
initial_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]
len(initial_tools)
This code processes each paper and creates two tools for each: a vector tool for semantic search and a summary tool for generating concise summaries, in this case 6 tools.
Earlier we created tools for agent to use, now we will create our agent using then FunctionCallingAgentWorker class. We would be using “gpt-3.5-turbo” as our llm.
llm = OpenAI(model="gpt-3.5-turbo")
agent_worker = FunctionCallingAgentWorker.from_tools(
initial_tools,
llm=llm,
verbose=True
)
agent = AgentRunner(agent_worker)
This agent can now answer questions about the three papers we’ve processed.
We asked our agent different questions from the three papers, and here is its response. Here are examples and explanation of how it works within.
In this example, we queried our agent to extract specific information from three research papers, particularly about the evaluation dataset and results used in the LongLoRA study. The agent interacts with the documents using the vector query tool, and here’s how it processes the information step-by-step:
=== Calling Function ===
Calling function: vector_tool_longlora with args: {"query": "evaluation dataset"}
=== Calling Function ===
Calling function: vector_tool_longlora with args: {"query": "evaluation results"}
Few More Examples for Other Papers
In this instance, the agent was tasked with providing summaries of both Self-RAG and LongLoRA. The behavior observed in this case differs from the previous example:
=== Calling Function ===
Calling function: summary_tool_selfrag with args: {"input": "Self-RAG"}
Unlike the earlier example, which involved querying specific details (like evaluation datasets and results), here the agent directly utilized the summary_tool functions designed for Self-RAG and LongLoRA. This shows the agent’s ability to adaptively switch between query tools based on the nature of the question—opting for summarization when a broader overview is required.
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "LongLoRA"}
The agent separately called summary_tool_selfrag and summary_tool_longlora to obtain the summaries, demonstrating its capacity to handle multi-part queries efficiently. It identifies the need to engage distinct summarization tools tailored to each paper rather than executing a single combined retrieval.
The responses provided by the agent were concise and directly addressed the prompt. This indicates that the agent can extract high-level insights effectively, contrasting with the previous example where it provided more granular data points based on specific vector queries.
This interaction highlights the agent’s capability to deliver high-level overviews versus detailed, context-specific responses observed previously. This shift in behavior underscores the versatility of the agentic RAG system in adjusting its query strategy based on the nature of the user’s question—whether it’s a need for in-depth detail or a broad summary.
While Multi-Document Agentic RAG is powerful, there are some challenges to keep in mind:
Multi-Document Agentic RAG represents a significant advancement in the field of AI, enabling more accurate and context-aware responses by synthesizing information from multiple sources. This approach is particularly valuable in complex domains like research, legal analysis, and technical documentation, where precise information retrieval and reasoning are crucial. By leveraging embeddings, agent-based reasoning, and robust retrieval mechanisms, this system not only enhances the depth and reliability of AI-generated content but also paves the way for more sophisticated applications in knowledge-intensive industries. As technology continues to evolve, Multi-Document Agentic RAG is poised to become an essential tool for extracting meaningful insights from vast amounts of data.
A. Multi-Document Agentic RAG combines Retrieval-Augmented Generation (RAG) with agent-based systems to enable AI to reason across multiple documents.
A. It enhances accuracy by synthesizing information from various sources, allowing AI to connect facts and provide more precise answers.
A. It’s particularly valuable in academic research, legal document analysis, market intelligence, and technical documentation.
A. The key components include document processing, creating embeddings, indexing, retrieval, agent-based reasoning, and generation.
A. Embeddings convert text into numerical vectors, capturing the meaning and context of information for efficient comparison and analysis.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.