Building Multi-Document Agentic RAG using LLamaIndex

Ketan Kumar 05 Sep, 2024
12 min read

Introduction

In the rapidly evolving field of artificial intelligence, the ability to process and understand vast amounts of information is becoming increasingly crucial. Enter Multi-Document Agentic RAG – a powerful approach that combines Retrieval-Augmented Generation (RAG) with agent-based systems to create AI that can reason across multiple documents. This guide will walk you through the concept, implementation, and potential of this exciting technology.

Learning Objectives

  • Understand the fundamentals of Multi-Document Agentic RAG systems and their architecture.
  • Learn how embeddings and agent-based reasoning enhance AI’s ability to generate contextually accurate responses.
  • Explore advanced retrieval mechanisms that improve information extraction in knowledge-intensive applications.
  • Gain insights into the applications of Multi-Document Agentic RAG in complex fields like research and legal analysis.
  • Develop the ability to evaluate the effectiveness of RAG systems in AI-driven content generation and analysis.

This article was published as a part of the Data Science Blogathon.

Understanding RAG and Multi-Document Agents

Retrieval-Augmented Generation (RAG) is a technique that enhances language models by allowing them to access and use external knowledge. Instead of relying solely on their trained parameters, RAG models can retrieve relevant information from a knowledge base to generate more accurate and informed responses.

Understanding RAG and Multi-Document Agents

Multi-Document Agentic RAG takes this concept further by enabling an AI agent to work with multiple documents simultaneously. This approach is particularly valuable for tasks that require synthesizing information from various sources, such as academic research, market analysis, or legal document review.

Why Multi-Document Agentic RAG is a Game-Changer?

Let us understand why multi-document agentic RAG is a game-changer.

  • Smarter Understanding of Context: Imagine having a super-smart assistant that doesn’t just read one book, but an entire library to answer your question. That’s what enhanced contextual understanding means. By analyzing multiple documents, the AI can piece together a more complete picture, giving you answers that truly capture the big picture.
  • Boost in Accuracy for Tricky Tasks: We’ve all played “connect the dots” as kids. Multi-Document Agentic RAG does something similar, but with information. By connecting facts from various sources, it can tackle complex problems with greater precision. This means more reliable answers, especially when dealing with intricate topics.
  • Handling Information Overload Like a Pro: In today’s world, we’re drowning in data. Multi-Document Agentic RAG is like a supercharged filter, sifting through massive amounts of information to find what’s truly relevant. It’s like having a team of experts working around the clock to digest and summarize vast libraries of knowledge.
  • Adaptable and Growable Knowledge Base: Think of this as a digital brain that can easily learn and expand. As new information becomes available, Multi-Document Agentic RAG can seamlessly incorporate it. This means your AI assistant is always up-to-date, ready to tackle the latest questions with the freshest information.

Key Strengths of Multi-Document Agentic RAG Systems

We will now look into the key strengths of multi-document agentic RAG systems.

  • Supercharging Academic Research: Researchers often spend weeks or months synthesizing information from hundreds of papers. Multi-Document Agentic RAG can dramatically speed up this process, helping scholars quickly identify key trends, gaps in knowledge, and potential breakthroughs across vast bodies of literature.
  • Revolutionizing Legal Document Analysis: Lawyers deal with mountains of case files, contracts, and legal precedents. This technology can swiftly analyze thousands of documents, spotting critical details, inconsistencies, and relevant case law that might take a human team days or weeks to uncover.
  • Turbocharging Market Intelligence: Businesses need to stay ahead of trends and competition. Multi-Document Agentic RAG can continuously scan news articles, social media, and industry reports, providing real-time insights and helping companies make data-driven decisions faster than ever before.
  • Navigating Technical Documentation with Ease: For engineers and IT professionals, finding the right information in sprawling technical documentation can be like searching for a needle in a haystack. This AI-powered approach can quickly pinpoint relevant sections across multiple manuals, troubleshooting guides, and code repositories, saving countless hours of frustration.

Building Blocks of Multi-Document Agentic RAG

Imagine you’re building a super-smart digital library assistant. This assistant can read thousands of books, understand complex questions, and give you detailed answers using information from multiple sources. That’s essentially what a Multi-Document Agentic RAG system does. Let’s break down the key components that make this possible:

Building Blocks of Multi-Document Agentic RAG

Document Processing

Converts all types of documents (PDFs, web pages, Word files, etc.) into a format that our AI can understand.

Creating Embeddings

Transforms the processed text into numerical vectors (sequences of numbers) that represent the meaning and context of the information.

In simple terms, imagine creating a super-condensed summary of each paragraph in your library, but instead of words, you use a unique code. This code captures the essence of the information in a way that computers can quickly compare and analyze.

Indexing

It creates an efficient structure to store and retrieve these embeddings. This is like creating the world’s most efficient card catalog for our digital library. It allows our AI to quickly locate relevant information without having to scan every single document in detail.

Retrieval

It uses the query (your question) to find the most relevant pieces of information from the indexed embeddings. When you ask a question, this component races through our digital library, using that super-efficient card catalog to pull out all the potentially relevant pieces of information.

Agent-based Reasoning

An AI agent interprets the retrieved information in the context of your query, deciding how to use it to formulate an answer. This is like having a genius AI agent who not only finds the right documents but also understands the deeper meaning of your question. They can connect dots across different sources and figure out the best way to answer you.

Generation

It produces a human-readable answer based on the agent’s reasoning and the retrieved information. This is where our genius agent explains their findings to you in clear, concise language. They take all the complex information they’ve gathered and analyzed, and present it in a way that directly answers your question.

This powerful combination allows Multi-Document Agentic RAG systems to provide insights and answers that draw from a vast pool of knowledge, making them incredibly useful for complex research, analysis, and problem-solving tasks across many fields.

Implementing a Basic Multi-Document Agentic RAG

Let’s start by building a simple agentic RAG that can work with three academic papers. We’ll use the llama_index library, which provides powerful tools for building RAG systems.

Step1: Installation of Required Libraries

To get started with building your AI agent, you need to install the necessary libraries. Here are the steps to set up your environment:

  • Install Python: Ensure you have Python installed on your system. You can download it from the official Python website: Download Python
  • Set Up a Virtual Environment: It’s good practice to create a virtual environment for your project to manage dependencies. Run the following commands to set up a virtual environment:
python -m venv ai_agent_env
source ai_agent_env/bin/activate  # On Windows, use `ai_agent_env\Scripts\activate`
  • Install OpenAI API and LlamaIndex:
pip install openai llama-index==0.10.27 llama-index-llms-openai==0.1.15
pip install llama-index-embeddings-openai==0.1.7

Step2: Setting Up API Keys and Environment Variables

To use the OpenAI API, you need an API key. Follow these steps to set up your API key:

  • Obtain an API Key: Sign up for an account on the OpenAI website and obtain your API key from the API section.
  • Set Up Environment Variables: Store your API key in an environment variable to keep it secure. Add the following line to your .bashrc or .zshrc file (or use the appropriate method for your operating system)
export OPENAI_API_KEY='your_openai_api_key_here'
  • Access the API Key in Your Code: In your Python code, import necessary libraries, and access the API key using the os module
import os
import openai
import nest_asyncio
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.core.vector_stores import MetadataFilters, FilterCondition
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner
from typing import List, Optional
import subprocess
openai.api_key = os.getenv('OPENAI_API_KEY')

#optionally, you could simply add openai key directly. (not a good practice)
#openai.api_key = 'your_openai_api_key_here'

nest_asyncio.apply()

Step3: Downloading Documents

As stated earlier, I am only using three papers to make this agentic rag, we would later scale this agentic rag to more papers in some other blog. You could use your own documents (optionally).

# List of URLs to download
urls = [
    "https://openreview.net/pdf?id=VtmBAGCN7o",
    "https://openreview.net/pdf?id=6PmJoRfdaK",
    "https://openreview.net/pdf?id=hSyW5go0v8",
]

# Corresponding filenames to save the files as
papers = [
    "metagpt.pdf",
    "longlora.pdf",
    "selfrag.pdf",
]

# Loop over both lists and download each file with its respective name
for url, paper in zip(urls, papers):
    subprocess.run(["wget", url, "-O", paper])

Step4: Creating Vector and Summary Tool

The below function, get_doc_tools, is designed to create two tools: a vector query tool and a summary query tool. These tools help in querying and summarizing a document using an agent-based retrieval-augmented generation (RAG) approach. Below are the steps and their explanations.

def get_doc_tools(
    file_path: str,
    name: str,
) -> str:
    """Get vector query and summary query tools from a document."""

Loading Documents and Preparing for Vector Indexing

The function starts by loading the document using SimpleDirectoryReader, which takes the provided file_path and reads the document’s contents. Once the document is loaded, it’s processed by SentenceSplitter, which breaks the document into smaller chunks, or nodes, each containing up to 1024 characters. These nodes are then indexed using VectorStoreIndex, a tool that allows for efficient vector-based queries. This index will later be used to perform searches over the document content based on vector similarity, making it easier to retrieve relevant information.

# Load documents from the specified file path
documents = SimpleDirectoryReader(input_files=[file_path]).load_data()

# Split the loaded document into smaller chunks (nodes) of 1024 characters
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

# Create a vector index from the nodes for efficient vector-based queries
vector_index = VectorStoreIndex(nodes)

Defining the Vector Query Function

Here, the function defines vector_query, which is responsible for answering specific questions about the document. The function accepts a query string and an optional list of page numbers. If no page numbers are provided, the entire document is queried. The function first checks if page_numbers is provided; if not, it defaults to an empty list.

Then, it creates metadata filters that correspond to the specified page numbers. These filters help narrow down the search to specific parts of the document. The query_engine is created using the vector index and is configured to use these filters, along with a similarity threshold, to find the most relevant results. Finally, the function executes the query using this engine and returns the response.

  # vector query function
    def vector_query(
        query: str, 
        page_numbers: Optional[List[str]] = None
    ) -> str:
        """Use to answer questions over a given paper.
    
        Useful if you have specific questions over the paper.
        Always leave page_numbers as None UNLESS there is a specific page you want to search for.
    
        Args:
            query (str): the string query to be embedded.
            page_numbers (Optional[List[str]]): Filter by set of pages. Leave as NONE 
                if we want to perform a vector search
                over all pages. Otherwise, filter by the set of specified pages.
        
        """
    
        page_numbers = page_numbers or []
        metadata_dicts = [
            {"key": "page_label", "value": p} for p in page_numbers
        ]
        
        query_engine = vector_index.as_query_engine(
            similarity_top_k=2,
            filters=MetadataFilters.from_dicts(
                metadata_dicts,
                condition=FilterCondition.OR
            )
        )
        response = query_engine.query(query)
        return response

Creating the Vector Query Tool

This part of the function creates the vector_query_tool, a tool that links the previously defined vector_query function to a dynamically generated name based on the name parameter provided when calling get_doc_tools.

The tool is created using FunctionTool.from_defaults, which automatically configures it with the necessary defaults. This tool can now be used to perform vector-based queries over the document using the function defined earlier.

       
    # Creating the Vector Query Tool
    vector_query_tool = FunctionTool.from_defaults(
        name=f"vector_tool_{name}",
        fn=vector_query
    )

Creating the Summary Query Tool

In this final section, the function creates a tool for summarizing the document. First, it creates a SummaryIndex from the nodes that were previously split and indexed. This index is designed specifically for summarization tasks. The summary_query_engine is then created with a response mode of "tree_summarize", which allows the tool to generate concise summaries of the document content.

The summary_tool is finally created using QueryEngineTool.from_defaults, which links the query engine to a dynamically generated name based on the name parameter. The tool is also given a description indicating its purpose for summarization-related queries. This summary tool can now be used to generate summaries of the document based on user queries.

# Summary Query Tool
    summary_index = SummaryIndex(nodes)
    summary_query_engine = summary_index.as_query_engine(
        response_mode="tree_summarize",
        use_async=True,
    )
    summary_tool = QueryEngineTool.from_defaults(
        name=f"summary_tool_{name}",
        query_engine=summary_query_engine,
        description=(
            f"Useful for summarization questions related to {name}"
        ),
    )

    return vector_query_tool, summary_tool

Calling Function to Build Tools for Each Paper

paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

initial_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]
len(initial_tools)
Calling Function to Build Tools for Each Paper

This code processes each paper and creates two tools for each: a vector tool for semantic search and a summary tool for generating concise summaries, in this case 6 tools.

Step5: Creating the Agent

Earlier we created tools for agent to use, now we will create our agent using then FunctionCallingAgentWorker class. We would be using “gpt-3.5-turbo” as our llm. 

llm = OpenAI(model="gpt-3.5-turbo")

agent_worker = FunctionCallingAgentWorker.from_tools(
    initial_tools, 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

This agent can now answer questions about the three papers we’ve processed.

Step6: Analyzing Responses from the Agent

We asked our agent different questions from the three papers, and here is its response. Here are examples and explanation of how it works within.

Analyzing Responses from the Agent

Explanation of the Agent’s Interaction with LongLoRA Papers

In this example, we queried our agent to extract specific information from three research papers, particularly about the evaluation dataset and results used in the LongLoRA study. The agent interacts with the documents using the vector query tool, and here’s how it processes the information step-by-step:

  • User Input: The user asked two sequential questions regarding the evaluation aspect of LongLoRA: first about the evaluation dataset and then about the results.
  • Agent’s Query Execution: The agent identifies that it needs to search the LongLoRA document specifically for information about the evaluation dataset. It uses the vector_tool_longlora function, which is the vector query tool set up specifically for LongLoRA.
=== Calling Function ===
Calling function: vector_tool_longlora with args: {"query": "evaluation dataset"}
  • Function Output for Evaluation Dataset: The agent retrieves the relevant section from the document, identifying that the evaluation dataset used in LongLoRA is the “PG19 test split,” which is a dataset commonly used for language model evaluation due to its long-form text nature.
  • Agent’s Second Query Execution: Following the first response, the agent then processes the second part of the user’s question, querying the document about the evaluation results of LongLoRA.
=== Calling Function ===
Calling function: vector_tool_longlora with args: {"query": "evaluation results"}
  • Function Output for Evaluation Results: The agent returns detailed results showing how the models perform better in terms of perplexity with larger context sizes. It highlights key findings, such as improvements with larger context windows and specific context lengths (100k, 65536, and 32768). It also notes a trade-off, as extended models experience some perplexity degradation on smaller context sizes due to Position Interpolation—a common limitation in such models.
  • Final LLM Response: The agent synthesizes the results into a concise response that answers the initial question about the dataset. Further explanation of the evaluation results would follow, summarizing the performance findings and their implications.

Few More Examples for Other Papers

Few More Examples for Other Papers

Explanation of the Agent’s Behavior: Summarizing Self-RAG and LongLoRA

In this instance, the agent was tasked with providing summaries of both Self-RAG and LongLoRA. The behavior observed in this case differs from the previous example:

Summary Tool Usage

=== Calling Function ===
Calling function: summary_tool_selfrag with args: {"input": "Self-RAG"}

Unlike the earlier example, which involved querying specific details (like evaluation datasets and results), here the agent directly utilized the summary_tool functions designed for Self-RAG and LongLoRA. This shows the agent’s ability to adaptively switch between query tools based on the nature of the question—opting for summarization when a broader overview is required.

Distinct Calls to Separate Summarization Tools

=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "LongLoRA"}

The agent separately called summary_tool_selfrag and summary_tool_longlora to obtain the summaries, demonstrating its capacity to handle multi-part queries efficiently. It identifies the need to engage distinct summarization tools tailored to each paper rather than executing a single combined retrieval.

Conciseness and Directness of Responses

The responses provided by the agent were concise and directly addressed the prompt. This indicates that the agent can extract high-level insights effectively, contrasting with the previous example where it provided more granular data points based on specific vector queries.

This interaction highlights the agent’s capability to deliver high-level overviews versus detailed, context-specific responses observed previously. This shift in behavior underscores the versatility of the agentic RAG system in adjusting its query strategy based on the nature of the user’s question—whether it’s a need for in-depth detail or a broad summary.

Challenges and Considerations

While Multi-Document Agentic RAG is powerful, there are some challenges to keep in mind:

  • Scalability: As the number of documents grows, efficient indexing and retrieval become crucial.
  • Coherence: Ensuring that the agent produces coherent responses when integrating information from multiple sources.
  • Bias and Accuracy: The system’s output is only as good as its input documents and retrieval mechanism.
  • Computational Resources: Processing and embedding large numbers of documents can be resource-intensive.

Conclusion

Multi-Document Agentic RAG represents a significant advancement in the field of AI, enabling more accurate and context-aware responses by synthesizing information from multiple sources. This approach is particularly valuable in complex domains like research, legal analysis, and technical documentation, where precise information retrieval and reasoning are crucial. By leveraging embeddings, agent-based reasoning, and robust retrieval mechanisms, this system not only enhances the depth and reliability of AI-generated content but also paves the way for more sophisticated applications in knowledge-intensive industries. As technology continues to evolve, Multi-Document Agentic RAG is poised to become an essential tool for extracting meaningful insights from vast amounts of data.

Key Takeaways

  • Multi-Document Agentic RAG improves AI response accuracy by integrating information from multiple sources.
  • Embeddings and agent-based reasoning enhance the system’s ability to generate context-aware and reliable content.
  • The system is particularly valuable in complex fields like research, legal analysis, and technical documentation.
  • Advanced retrieval mechanisms ensure precise information extraction, supporting knowledge-intensive industries.
  • Multi-Document Agentic RAG represents a significant step forward in AI-driven content generation and data analysis.

Frequently Asked Questions

Q1. What is Multi-Document Agentic RAG?

A. Multi-Document Agentic RAG combines Retrieval-Augmented Generation (RAG) with agent-based systems to enable AI to reason across multiple documents.

Q2. How does Multi-Document Agentic RAG improve accuracy?

A. It enhances accuracy by synthesizing information from various sources, allowing AI to connect facts and provide more precise answers.

Q3. In which fields is Multi-Document Agentic RAG most beneficial?

A. It’s particularly valuable in academic research, legal document analysis, market intelligence, and technical documentation.

Q4. What are the key components of a Multi-Document Agentic RAG system?

A. The key components include document processing, creating embeddings, indexing, retrieval, agent-based reasoning, and generation.

Q5. What is the role of embeddings in this system?

A. Embeddings convert text into numerical vectors, capturing the meaning and context of information for efficient comparison and analysis.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Ketan Kumar 05 Sep, 2024

Hey everyone, Ketan Kumar here! I'm an M.Sc. student at VIT AP with a burning passion for Generative AI. My expertise lies in crafting machine learning models and wielding Natural Language Processing for innovative projects. Currently, I'm putting this knowledge to work in drug discovery research at Syngene International, exploring the potential of LLMs. Always eager to connect and delve deeper into the ever-evolving world of data science!

Frequently Asked Questions

Lorem ipsum dolor sit amet, consectetur adipiscing elit,