In an era where artificial intelligence (AI) is tasked with navigating and synthesizing vast amounts of information, the efficiency and accuracy of retrieval methods are paramount. Anthropic, a leading AI research company, has introduced a groundbreaking approach called Contextual Retrieval-Augmented Generation (RAG). This method marries traditional retrieval techniques with innovative tweaks, significantly enhancing retrieval accuracy and relevance. Dubbed “stupidly brilliant,” Anthropic’s Contextual RAG demonstrates that simplicity, when applied thoughtfully, can lead to extraordinary advancements in AI.
Retrieval-Augmented Generation (RAG) is a pivotal technique in the AI landscape, aiming to fetch pertinent information that a model can utilize to generate accurate, context-rich responses. Traditional RAG systems predominantly rely on embeddings, which adeptly capture the semantic essence of text but sometimes falter in precise keyword matching. Recognizing these limitations, Anthropic has developed Contextual RAG—a series of ingenious optimizations that elevate the retrieval process without adding undue complexity.
By integrating embeddings with BM25, increasing the number of chunks fed to the model, and implementing reranking, Contextual RAG redefines the potential of RAG systems. This layered approach ensures that the AI not only understands the context but also retrieves the most relevant information with remarkable precision.
Anthropic’s Contextual RAG stands out due to its strategic combination of established retrieval methods enhanced with subtle, yet impactful modifications. Let’s delve into the four key innovations that make this approach exceptionally effective.
Embeddings are vector representations of text that capture semantic relationships, enabling models to understand context and meaning beyond mere keyword matching. On the other hand, BM25 is a robust keyword-based retrieval algorithm known for its precision in lexical matching.
Contextual RAG runs both methods in parallel and merges their results: embeddings surface semantically related passages, while BM25 catches the exact keyword and phrase matches that embeddings can miss. The combined candidate set is then deduplicated before moving to the next stage.
Why It’s Brilliant: While combining these methods might appear straightforward, the synergy they create is profound. BM25’s precision complements embeddings’ contextual depth, resulting in a retrieval process that is both accurate and contextually aware. This dual approach allows the model to grasp the intent behind queries more effectively, leading to higher quality responses.
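To make the combination concrete, here is a minimal sketch of one common fusion strategy, reciprocal rank fusion (RRF), written against the same FAISS and BM25Okapi interfaces used in the hands-on section below. The helper name and constants are illustrative, and Anthropic’s exact fusion method may differ:

from typing import List
from rank_bm25 import BM25Okapi

def rrf_fuse(query: str, vectorstore, bm25: BM25Okapi, corpus: List[str],
             k: int = 20, c: int = 60) -> List[str]:
    """Fuse semantic and keyword rankings with reciprocal rank fusion."""
    # Semantic ranking from the vector store (LangChain FAISS interface).
    semantic_docs = [d.page_content for d in vectorstore.similarity_search(query, k=k)]
    # Keyword ranking from BM25 over the same corpus.
    keyword_docs = bm25.get_top_n(query.split(), corpus, n=k)
    scores = {}
    for ranking in (semantic_docs, keyword_docs):
        for rank, doc in enumerate(ranking):
            # Each list contributes 1 / (c + rank); documents found by both methods rise.
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)[:k]

Because a document found by both rankings accumulates two scores, the fused list naturally favors chunks that are both semantically relevant and lexically matched.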
Traditional RAG systems often limit retrieval to the top 5 or 10 chunks of information, which can constrain the model’s ability to generate comprehensive responses. Contextual RAG breaks this limitation by expanding the retrieval to the top-20 chunks.
Benefits of top-20 chunk retrieval include broader coverage of the source material, resilience against any single poor match, and more supporting evidence for multi-part questions.
Why It’s Brilliant: Simply increasing the number of retrieved chunks amplifies the variety and depth of information available to the model. This broader context ensures that responses are not only accurate but also nuanced and well-rounded.
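In the LangChain FAISS interface used later in this article, the expansion is a one-argument change (assuming a vectorstore built as in the hands-on section):

# Widen the candidate pool from the usual k=3-5 to the top 20 chunks.
top_20_chunks = vectorstore.similarity_search(query, k=20)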
In Contextual RAG, each retrieved chunk contains additional context, ensuring clarity and relevance when viewed independently. This is particularly crucial for complex queries where individual chunks might be ambiguous.
Implementation of Self-Contained Chunks: each chunk is made self-contained by prompting a language model to write a brief, chunk-specific summary of where the chunk sits within the source document; that summary is prepended to the chunk before indexing. The hands-on section below implements exactly this.
Why It’s Brilliant: Enhancing each chunk with additional context minimizes ambiguity and ensures that the model can effectively utilize each piece of information. This leads to more precise and coherent responses, as the AI can better discern the significance of each chunk in relation to the query.
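For illustration, here is a hand-written example of the transformation, using figures from the sample Tesla report processed later in this article (the hands-on code generates such context automatically with an LLM):

raw_chunk = "Gross profit for Q3 2023 stood at $4.18 billion, with a gross margin of 17.9%."

# After contextualization, the prepended context makes the chunk self-contained:
contextualized_chunk = (
    "From Tesla, Inc.'s Q3 2023 financial report, profitability overview; gross margin "
    "declined from 25.1% in Q3 2022 but remains above industry averages.\n\n"
    "Gross profit for Q3 2023 stood at $4.18 billion, with a gross margin of 17.9%."
)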
After retrieving the most relevant chunks, reranking is employed to order them based on their relevance. This step ensures that the highest-quality information is prioritized, which is especially important when dealing with token limitations.
Reranking Process: after the initial broad retrieval (e.g., the top-20 chunks), a dedicated reranking model scores each candidate against the query, and only the highest-scoring chunks are passed on to the generation model.
Why It’s Brilliant: Reranking acts as a final filter that elevates the most relevant and high-quality chunks to the forefront. This prioritization ensures that the model focuses on the most critical information, maximizing the effectiveness of the response even within token constraints.
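As one way to implement this step, a generic open-source cross-encoder can score query-chunk pairs. This sketch uses the sentence-transformers package (an extra dependency not installed in the hands-on section), and the model name is an illustrative choice, not the reranker Anthropic uses:

from sentence_transformers import CrossEncoder

def rerank(query: str, chunks: list[str], top_n: int = 5) -> list[str]:
    """Score each (query, chunk) pair and keep the best top_n chunks."""
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = model.predict([(query, chunk) for chunk in chunks])
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_n]]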
The true genius of Contextual RAG lies in how these four innovations interconnect and amplify each other. Individually, each enhancement offers significant improvements, but their combined effect creates a highly optimized retrieval pipeline.
Synergistic Integration: hybrid retrieval casts a wide yet precise net, the top-20 expansion keeps more candidates in play, contextualized chunks make every candidate intelligible on its own, and reranking distills the pool down to the strongest evidence before generation.
Outcome: This layered approach transforms traditional RAG systems into a refined, highly effective retrieval mechanism. The synergy between these strategies results in a system that is not only more accurate and relevant but also more robust in handling diverse and complex queries.
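To make the layering concrete, here is a compact sketch of the full pipeline, assuming the rrf_fuse and rerank helpers sketched above and an answer-generation function like the generate_answer method implemented in the hands-on section:

def contextual_rag_answer(query, vectorstore, bm25_index, corpus, answer_fn):
    # Steps 1-2: hybrid (embeddings + BM25) retrieval, widened to the top 20 chunks.
    candidates = rrf_fuse(query, vectorstore, bm25_index, corpus, k=20)
    # Step 3 happened at indexing time: each chunk already carries its own context.
    # Step 4: rerank and keep only the strongest evidence within the token budget.
    best_chunks = rerank(query, candidates, top_n=5)
    # Hand the distilled evidence to the generative model.
    return answer_fn(query, best_chunks)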
This hands-on exercise lets you experience how Contextual RAG retrieves, contextualizes, reranks, and generates answers using a retrieval-augmented generation model. The workflow includes detailed steps showing how context is generated for each chunk from the original document and the chunk itself, and how that surrounding context is prepended before the chunk is indexed into the vector database.
Make sure to install the following dependencies to run the code:
pip install langchain langchain-openai openai faiss-cpu python-dotenv rank_bm25
pip install -U langchain-community
Load essential Python libraries for text processing, embeddings, and retrieval. Import LangChain modules for text splitting, vector stores, and AI model interactions.
import hashlib
import os
import getpass
from typing import List, Tuple
from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.prompts import ChatPromptTemplate
from rank_bm25 import BM25Okapi
Set the OPENAI_API_KEY using Colab's secure userdata module. This provides seamless access to OpenAI's language models without exposing sensitive credentials in the notebook.
from google.colab import userdata
os.environ["OPENAI_API_KEY"] =userdata.get('openai')
This sets the OPENAI_API_KEY environment variable to the value stored in userdata under the name ‘openai’, making the key securely available to OpenAI functions in this environment.
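If you are not running in Colab, one common alternative is to prompt for the key interactively with the getpass module imported earlier:

# Outside Colab: prompt for the key instead of reading Colab userdata.
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")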
This code defines the ContextualRetrieval class, which processes documents to enhance searchability by creating contextualized chunks.
class ContextualRetrieval:
    """
    A class that implements the Contextual Retrieval system.
    """

    def __init__(self):
        """
        Initialize the ContextualRetrieval system.
        """
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=800,
            chunk_overlap=100,
        )
        self.embeddings = OpenAIEmbeddings()
        self.llm = ChatOpenAI(
            model="gpt-4o",
            temperature=0,
            max_tokens=None,
            timeout=None,
            max_retries=2,
        )

    def process_document(self, document: str) -> Tuple[List[Document], List[Document]]:
        """
        Process a document by splitting it into chunks and generating context for each chunk.
        """
        chunks = self.text_splitter.create_documents([document])
        contextualized_chunks = self._generate_contextualized_chunks(document, chunks)
        return chunks, contextualized_chunks

    def _generate_contextualized_chunks(self, document: str, chunks: List[Document]) -> List[Document]:
        """
        Generate contextualized versions of the given chunks.
        """
        contextualized_chunks = []
        for chunk in chunks:
            context = self._generate_context(document, chunk.page_content)
            contextualized_content = f"{context}\n\n{chunk.page_content}"
            contextualized_chunks.append(
                Document(page_content=contextualized_content, metadata=chunk.metadata)
            )
        return contextualized_chunks

    def _generate_context(self, document: str, chunk: str) -> str:
        """
        Generate context for a specific chunk using the language model.
        """
        prompt = ChatPromptTemplate.from_template("""
        You are an AI assistant specializing in financial analysis, particularly for Tesla, Inc. Your task is to provide brief, relevant context for a chunk of text from Tesla's Q3 2023 financial report.
        Here is the financial report:
        <document>
        {document}
        </document>

        Here is the chunk we want to situate within the whole document:
        <chunk>
        {chunk}
        </chunk>

        Provide a concise context (2-3 sentences) for this chunk, considering the following guidelines:
        1. Identify the main financial topic or metric discussed (e.g., revenue, profitability, segment performance, market position).
        2. Mention any relevant time periods or comparisons (e.g., Q3 2023, year-over-year changes).
        3. If applicable, note how this information relates to Tesla's overall financial health, strategy, or market position.
        4. Include any key figures or percentages that provide important context.
        5. Do not use phrases like "This chunk discusses" or "This section provides". Instead, directly state the context.

        Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else.

        Context:
        """)
        messages = prompt.format_messages(document=document, chunk=chunk)
        response = self.llm.invoke(messages)
        return response.content

    def create_vectorstores(self, chunks: List[Document]) -> FAISS:
        """
        Create a FAISS vector store for the given chunks.
        """
        return FAISS.from_documents(chunks, self.embeddings)

    def create_bm25_index(self, chunks: List[Document]) -> BM25Okapi:
        """
        Create a BM25 index for the given chunks.
        """
        tokenized_chunks = [chunk.page_content.split() for chunk in chunks]
        return BM25Okapi(tokenized_chunks)

    @staticmethod
    def generate_cache_key(document: str) -> str:
        """
        Generate a stable cache key (MD5 hash) for a document.
        """
        return hashlib.md5(document.encode()).hexdigest()

    def generate_answer(self, query: str, relevant_chunks: List[str]) -> str:
        """
        Generate an answer to the query from the retrieved chunks.
        """
        prompt = ChatPromptTemplate.from_template("""
        Based on the following information, please provide a concise and accurate answer to the question.
        If the information is not sufficient to answer the question, say so.

        Question: {query}

        Relevant information:
        {chunks}

        Answer:
        """)
        messages = prompt.format_messages(query=query, chunks="\n\n".join(relevant_chunks))
        response = self.llm.invoke(messages)
        return response.content
This block assigns a detailed financial document about Tesla, Inc.’s Q3 2023 performance to the variable document; it serves as the source text for the contextual retrieval pipeline.
# Example financial document
document = """
Tesla, Inc. (TSLA) Financial Analysis and Market Overview - Q3 2023
Executive Summary:
Tesla, Inc. (NASDAQ: TSLA) continues to lead the electric vehicle (EV) market, showcasing strong financial performance and strategic growth initiatives in Q3 2023. This comprehensive analysis delves into Tesla's financial statements, market position, and future outlook, providing investors and stakeholders with crucial insights into the company's performance and potential.
1. Financial Performance Overview:
Revenue:
Tesla reported total revenue of $23.35 billion in Q3 2023, marking a 9% increase year-over-year (YoY) from $21.45 billion in Q3 2022. The automotive segment remained the primary revenue driver, contributing $19.63 billion, up 5% YoY. Energy generation and storage revenue saw significant growth, reaching $1.56 billion, a 40% increase YoY.
Profitability:
Gross profit for Q3 2023 stood at $4.18 billion, with a gross margin of 17.9%. While this represents a decrease from the 25.1% gross margin in Q3 2022, it remains above industry averages. Operating income was $1.76 billion, resulting in an operating margin of 7.6%. Net income attributable to common stockholders was $1.85 billion, translating to diluted earnings per share (EPS) of $0.53.
Cash Flow and Liquidity:
Tesla's cash and cash equivalents at the end of Q3 2023 were $26.08 billion, a robust position that provides ample liquidity for ongoing operations and future investments. Free cash flow for the quarter was $0.85 billion, reflecting the company's ability to generate cash despite significant capital expenditures.
2. Operational Highlights:
Production and Deliveries:
Tesla produced 430,488 vehicles in Q3 2023, a 17% increase YoY. The Model 3/Y accounted for 419,666 units, while the Model S/X contributed 10,822 units. Total deliveries reached 435,059 vehicles, up 27% YoY, demonstrating strong demand and improved production efficiency.
Manufacturing Capacity:
The company's installed annual vehicle production capacity increased to over 2 million units across its factories in Fremont, Shanghai, Berlin-Brandenburg, and Texas. The Shanghai Gigafactory remains the highest-volume plant, with an annual capacity exceeding 950,000 units.
Energy Business:
Tesla's energy storage deployments grew by 90% YoY, reaching 4.0 GWh in Q3 2023. Solar deployments also increased by 48% YoY to 106 MW, reflecting growing demand for Tesla's energy products.
3. Market Position and Competitive Landscape:
Global EV Market Share:
Tesla maintained its position as the world's largest EV manufacturer by volume, with an estimated global market share of 18% in Q3 2023. However, competition is intensifying, particularly from Chinese manufacturers like BYD and established automakers accelerating their EV strategies.
Brand Strength:
Tesla's brand value continues to grow, ranked as the 12th most valuable brand globally by Interbrand in 2023, with an estimated brand value of $56.3 billion, up 4% from 2022.
Technology Leadership:
The company's focus on innovation, particularly in battery technology and autonomous driving capabilities, remains a key differentiator. Tesla's Full Self-Driving (FSD) beta program has expanded to over 800,000 customers in North America, showcasing its advanced driver assistance systems.
4. Strategic Initiatives and Future Outlook:
Product Roadmap:
Tesla reaffirmed its commitment to launching the Cybertruck in 2023, with initial deliveries expected in Q4. The company also hinted at progress on a next-generation vehicle platform, aimed at significantly reducing production costs.
Expansion Plans:
Plans for a new Gigafactory in Mexico are progressing, with production expected to commence in 2025. This facility will focus on producing Tesla's next-generation vehicles and expand the company's North American manufacturing footprint.
Battery Production:
Tesla continues to ramp up its in-house battery cell production, with 4680 cells now being used in Model Y vehicles produced at the Texas Gigafactory. The company aims to achieve an annual production rate of 1,000 GWh by 2030.
5. Risk Factors and Challenges:
Supply Chain Constraints:
While easing compared to previous years, supply chain issues continue to pose challenges, particularly in sourcing semiconductor chips and raw materials for batteries.
Regulatory Environment:
Evolving regulations around EVs, autonomous driving, and data privacy across different markets could impact Tesla's operations and expansion plans.
Macroeconomic Factors:
Rising interest rates and inflationary pressures may affect consumer demand for EVs and impact Tesla's profit margins.
Competition:
Intensifying competition in the EV market, especially in key markets like China and Europe, could pressure Tesla's market share and pricing power.
6. Financial Ratios and Metrics:
Profitability Ratios:
- Return on Equity (ROE): 18.2%
- Return on Assets (ROA): 10.3%
- EBITDA Margin: 15.7%
Liquidity Ratios:
- Current Ratio: 1.73
- Quick Ratio: 1.25
Efficiency Ratios:
- Asset Turnover Ratio: 0.88
- Inventory Turnover Ratio: 11.2
Valuation Metrics:
- Price-to-Earnings (P/E) Ratio: 70.5
- Price-to-Sales (P/S) Ratio: 7.8
- Enterprise Value to EBITDA (EV/EBITDA): 41.2
7. Segment Analysis:
Automotive Segment:
- Revenue: $19.63 billion (84% of total revenue)
- Gross Margin: 18.9%
- Key Products: Model 3, Model Y, Model S, Model X
Energy Generation and Storage:
- Revenue: $1.56 billion (7% of total revenue)
- Gross Margin: 14.2%
- Key Products: Powerwall, Powerpack, Megapack, Solar Roof
Services and Other:
- Revenue: $2.16 billion (9% of total revenue)
- Gross Margin: 5.3%
- Includes vehicle maintenance, repair, and used vehicle sales
Conclusion:
Tesla's Q3 2023 financial results demonstrate the company's continued leadership in the EV market, with strong revenue growth and operational improvements. While facing increased competition and margin pressures, Tesla's robust balance sheet, technological innovations, and expanding product portfolio position it well for future growth. Investors should monitor key metrics such as production ramp-up, margin trends, and progress on strategic initiatives to assess Tesla's long-term value proposition in the rapidly evolving automotive and energy markets.
"""
Initializing ContextualRetrieval prepares the system to process documents, create context-based embeddings, and search for relevant content. After this step, cr is ready for further operations such as processing documents or generating answers to queries.
# Initialize ContextualRetrieval
cr = ContextualRetrieval()
cr
This code splits the document into smaller pieces and creates two versions of them: original_chunks, which preserves each piece exactly as it appears in the source, and contextualized_chunks, in which each piece is prefixed with LLM-generated context. It then counts the contextualized chunks and prints the first original chunk to show what the opening of the document looks like in unaltered form.
# Process the document
original_chunks, contextualized_chunks = cr.process_document(document)
len(contextualized_chunks)
print(original_chunks[0])
Print a few chunks from each version to compare the raw text with its contextualized counterpart.
print(contextualized_chunks[0])
print(original_chunks[10])
print(contextualized_chunks[10])
Next, create search indexes over both the original and the context-enhanced chunks. Building a vector store (for semantic, meaning-based search) and a BM25 index (for keyword-based search) over each version prepares both retrieval styles, making it easy to compare how they perform on the two corpora.
# Create vectorstores
original_vectorstore = cr.create_vectorstores(original_chunks)
contextualized_vectorstore = cr.create_vectorstores(contextualized_chunks)
# Create BM25 indexes
original_bm25_index = cr.create_bm25_index(original_chunks)
contextualized_bm25_index = cr.create_bm25_index(contextualized_chunks)
This step generates a unique cache key for the document to efficiently track and store its processed data, preventing the need to re-process it later. It also prints out the number of chunks the document was divided into and displays the generated cache key, helping to confirm the document’s processing and caching status. This is useful for optimizing document retrieval and managing processed data efficiently.
# Generate cache key for the document
cache_key = cr.generate_cache_key(document)
cache_key
print(f"Processed {len(original_chunks)} chunks")
print(f"Cache key for the document: {cache_key}")
# Example queries related to financial information
queries = [
    "What was Tesla's total revenue in Q3 2023? what was the gross profit and cash position?",
    "How does the automotive gross margin in Q3 2023 compare to the previous year?",
    "What is Tesla's current debt-to-equity ratio?",
    "How much did Tesla invest in R&D during Q3 2023?",
    "What is Tesla's market share in the global EV market for Q3 2023?"
]
for query in queries:
    print(f"\nQuery: {query}")

    # Retrieve from original vectorstore
    original_vector_results = original_vectorstore.similarity_search(query, k=3)
    # Retrieve from contextualized vectorstore
    contextualized_vector_results = contextualized_vectorstore.similarity_search(query, k=3)

    # Retrieve from original BM25
    original_tokenized_query = query.split()
    original_bm25_results = original_bm25_index.get_top_n(original_tokenized_query, original_chunks, n=3)
    # Retrieve from contextualized BM25
    contextualized_tokenized_query = query.split()
    contextualized_bm25_results = contextualized_bm25_index.get_top_n(contextualized_tokenized_query, contextualized_chunks, n=3)

    # Generate answers
    original_vector_answer = cr.generate_answer(query, [doc.page_content for doc in original_vector_results])
    contextualized_vector_answer = cr.generate_answer(query, [doc.page_content for doc in contextualized_vector_results])
    original_bm25_answer = cr.generate_answer(query, [doc.page_content for doc in original_bm25_results])
    contextualized_bm25_answer = cr.generate_answer(query, [doc.page_content for doc in contextualized_bm25_results])

    print("\nOriginal Vector Search Results:")
    for i, doc in enumerate(original_vector_results, 1):
        print(f"{i}. {doc.page_content[:200]}...")
    print("\nOriginal Vector Search Answer:")
    print(original_vector_answer)
    print("\n" + "-"*50)

    print("\nContextualized Vector Search Results:")
    for i, doc in enumerate(contextualized_vector_results, 1):
        print(f"{i}. {doc.page_content[:200]}...")
    print("\nContextualized Vector Search Answer:")
    print(contextualized_vector_answer)
    print("\n" + "-"*50)

    print("\nOriginal BM25 Search Results:")
    for i, doc in enumerate(original_bm25_results, 1):
        print(f"{i}. {doc.page_content[:200]}...")
    print("\nOriginal BM25 Search Answer:")
    print(original_bm25_answer)
    print("\n" + "-"*50)

    print("\nContextualized BM25 Search Results:")
    for i, doc in enumerate(contextualized_bm25_results, 1):
        print(f"{i}. {doc.page_content[:200]}...")
    print("\nContextualized BM25 Search Answer:")
    print(contextualized_bm25_answer)
    print("\n" + "="*50)
This loop answers each query four different ways, pairing both retrieval methods (vector similarity and BM25) with both data versions (original and contextualized chunks), so you can compare the approaches directly.
# Complex queries requiring contextual information
queries = [
    "How do Tesla's financial results in Q3 2023 reflect its overall strategy in both the automotive and energy sectors? Consider revenue growth, profitability, and investments in each sector.",
    "Analyze the relationship between Tesla's R&D spending, capital expenditures, and its financial performance. How might this impact its competitive position in the EV and energy storage markets over the next 3-5 years?",
    "Compare Tesla's financial health and market position in different geographic regions. How do regional variations in revenue, market share, and growth rates inform Tesla's global strategy?",
    "Evaluate Tesla's progress in vertical integration, considering its investments in battery production, software development, and manufacturing capabilities. How is this reflected in its financial statements and future outlook?",
    "Assess the potential impact of Tesla's Full Self-Driving (FSD) technology on its financial projections. Consider revenue streams, liability risks, and required investments in the context of the broader autonomous vehicle market.",
    "How does Tesla's financial performance and strategy in the energy storage and generation segment align with or diverge from its automotive business? What synergies or conflicts exist between these segments?",
    "Analyze Tesla's capital structure and liquidity position in the context of its growth strategy and market conditions. How well-positioned is the company to weather potential economic downturns or increased competition?",
    "Evaluate Tesla's pricing strategy across its product lines and geographic markets. How does this strategy impact its financial metrics, market share, and competitive positioning?",
    "Considering Tesla's current financial position, market trends, and competitive landscape, what are the most significant opportunities and risks for the company in the next 2-3 years? How might these factors affect its financial projections?",
    "Assess the potential financial implications of Tesla's expansion into new markets or product categories (e.g., Cybertruck, robotaxis, AI). How do these initiatives align with the company's core competencies and financial strategy?"
]
for query in queries:
    print(f"\nQuery: {query}")

    # Retrieve from original vectorstore
    original_vector_results = original_vectorstore.similarity_search(query, k=3)
    # Retrieve from contextualized vectorstore
    contextualized_vector_results = contextualized_vectorstore.similarity_search(query, k=3)

    # Retrieve from original BM25
    original_tokenized_query = query.split()
    original_bm25_results = original_bm25_index.get_top_n(original_tokenized_query, original_chunks, n=3)
    # Retrieve from contextualized BM25
    contextualized_tokenized_query = query.split()
    contextualized_bm25_results = contextualized_bm25_index.get_top_n(contextualized_tokenized_query, contextualized_chunks, n=3)

    # Generate answers
    original_vector_answer = cr.generate_answer(query, [doc.page_content for doc in original_vector_results])
    contextualized_vector_answer = cr.generate_answer(query, [doc.page_content for doc in contextualized_vector_results])
    original_bm25_answer = cr.generate_answer(query, [doc.page_content for doc in original_bm25_results])
    contextualized_bm25_answer = cr.generate_answer(query, [doc.page_content for doc in contextualized_bm25_results])

    print("\nOriginal Vector Search Results:")
    for i, doc in enumerate(original_vector_results, 1):
        print(f"{i}. {doc.page_content[:200]}...")
    print("\nOriginal Vector Search Answer:")
    print(original_vector_answer)
    print("\n" + "-"*50)

    print("\nContextualized Vector Search Results:")
    for i, doc in enumerate(contextualized_vector_results, 1):
        print(f"{i}. {doc.page_content[:200]}...")
    print("\nContextualized Vector Search Answer:")
    print(contextualized_vector_answer)
    print("\n" + "-"*50)

    print("\nOriginal BM25 Search Results:")
    for i, doc in enumerate(original_bm25_results, 1):
        print(f"{i}. {doc.page_content[:200]}...")
    print("\nOriginal BM25 Search Answer:")
    print(original_bm25_answer)
    print("\n" + "-"*50)

    print("\nContextualized BM25 Search Results:")
    for i, doc in enumerate(contextualized_bm25_results, 1):
        print(f"{i}. {doc.page_content[:200]}...")
    print("\nContextualized BM25 Search Answer:")
    print(contextualized_bm25_answer)
    print("\n" + "="*50)
This code runs a series of complex financial queries about Tesla and retrieves relevant chunks using four different search methods: original vectorstore, contextualized vectorstore, original BM25, and contextualized BM25. For each method, it retrieves the top three relevant chunks and generates an answer grounded in their content. The printed results let you compare directly how each search method and data version (original vs. contextualized) performs on these detailed financial questions.
This hands-on exercise demonstrates how contextual RAG workflows enhance document retrieval and answer generation by adding context and using multiple search techniques. It’s particularly useful for handling large, complex documents like financial reports, where understanding the relationships between various parts of the document is key to accurate and meaningful answers.
Anthropic’s Contextual RAG exemplifies the profound impact of seemingly simple optimizations on complex systems. By intelligently stacking straightforward improvements—combining embeddings with BM25, expanding the retrieval pool, enriching chunks with context, and implementing reranking—Anthropic has transformed traditional RAG into a highly optimized retrieval system.
Contextual RAG stands out by delivering substantial improvements through elegant simplicity in a field where incremental changes often yield marginal gains. This approach not only enhances retrieval accuracy and relevance but also sets a new standard for how AI systems can effectively manage and utilize vast amounts of information.
Anthropic’s work serves as a testament to the idea that sometimes, the most effective solutions are those that leverage simplicity with strategic insight. Contextual RAG’s “stupidly brilliant” design proves that in the quest for better AI, thoughtful layering of simple techniques can lead to extraordinary results.
For more details, refer to Anthropic’s post introducing Contextual Retrieval.
Q. How does Contextual RAG improve retrieval accuracy?
A. Contextual RAG improves retrieval accuracy by integrating embeddings with BM25, expanding the retrieval pool, making chunks self-contained, and reranking results for optimal relevance. This multi-layered approach enhances both precision and contextual depth.
Q. Why retrieve the top-20 chunks rather than the usual 5 or 10?
A. Expanding the number of retrieved chunks increases the diversity of information the model receives, leading to more comprehensive and well-rounded responses.
Q. What role does reranking play in the pipeline?
A. Reranking ensures that the highest-relevance chunks appear first, helping the model focus on the most valuable information. This is especially useful when token limits restrict the number of chunks used.
Q. Can Contextual RAG be used with different generative AI models?
A. Yes, Contextual RAG can integrate with various generative AI models. The retrieval and reranking methods are model-agnostic and can work alongside different architectures.
Q. Does the extra machinery make Contextual RAG slow or complex?
A. Although it involves several steps, Contextual RAG remains efficient. The combination of embeddings with BM25, self-contained chunks, and reranking improves retrieval without adding undue complexity.