In an era where artificial intelligence (AI) is tasked with navigating and synthesizing vast amounts of information, the efficiency and accuracy of retrieval methods are paramount. Anthropic, a leading AI research company, has introduced a groundbreaking approach called Contextual Retrieval-Augmented Generation (RAG). This method marries traditional retrieval techniques with innovative tweaks, significantly enhancing retrieval accuracy and relevance. Dubbed “stupidly brilliant,” Anthropic’s Contextual RAG demonstrates that simplicity, when applied thoughtfully, can lead to extraordinary advancements in AI.
Retrieval-Augmented Generation (RAG) is a pivotal technique in the AI landscape, aiming to fetch pertinent information that a model can utilize to generate accurate, context-rich responses. Traditional RAG systems predominantly rely on embeddings, which adeptly capture the semantic essence of text but sometimes falter in precise keyword matching. Recognizing these limitations, Anthropic has developed Contextual RAG—a series of ingenious optimizations that elevate the retrieval process without adding undue complexity.
By integrating embeddings with BM25, increasing the number of chunks fed to the model, and implementing reranking, Contextual RAG redefines the potential of RAG systems. This layered approach ensures that the AI not only understands the context but also retrieves the most relevant information with remarkable precision.
Anthropic’s Contextual RAG stands out due to its strategic combination of established retrieval methods enhanced with subtle, yet impactful modifications. Let’s delve into the four key innovations that make this approach exceptionally effective.
Embeddings are vector representations of text that capture semantic relationships, enabling models to understand context and meaning beyond mere keyword matching. On the other hand, BM25 is a robust keyword-based retrieval algorithm known for its precision in lexical matching.
Contextual RAG runs both methods in parallel and merges their results: embeddings surface semantically related passages, while BM25 catches the exact keyword and phrase matches that embeddings can miss. The combined candidate set is then deduplicated before moving to the next stage.
Why It’s Brilliant: While combining these methods might appear straightforward, the synergy they create is profound. BM25’s precision complements embeddings’ contextual depth, resulting in a retrieval process that is both accurate and contextually aware. This dual approach allows the model to grasp the intent behind queries more effectively, leading to higher quality responses.
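To make the combination concrete, here is a minimal sketch of one common fusion strategy, reciprocal rank fusion (RRF), written against the same FAISS and BM25Okapi interfaces used in the hands-on section below. The helper name and constants are illustrative, and Anthropic’s exact fusion method may differ:

from typing import List
from rank_bm25 import BM25Okapi

def rrf_fuse(query: str, vectorstore, bm25: BM25Okapi, corpus: List[str],
             k: int = 20, c: int = 60) -> List[str]:
    """Fuse semantic and keyword rankings with reciprocal rank fusion."""
    # Semantic ranking from the vector store (LangChain FAISS interface).
    semantic_docs = [d.page_content for d in vectorstore.similarity_search(query, k=k)]
    # Keyword ranking from BM25 over the same corpus.
    keyword_docs = bm25.get_top_n(query.split(), corpus, n=k)
    scores = {}
    for ranking in (semantic_docs, keyword_docs):
        for rank, doc in enumerate(ranking):
            # Each list contributes 1 / (c + rank); documents found by both methods rise.
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)[:k]

Because a document found by both rankings accumulates two scores, the fused list naturally favors chunks that are both semantically relevant and lexically matched.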
Traditional RAG systems often limit retrieval to the top 5 or 10 chunks of information, which can constrain the model’s ability to generate comprehensive responses. Contextual RAG breaks this limitation by expanding the retrieval to the top-20 chunks.
Benefits of top-20 chunk retrieval include broader coverage of the source material, resilience against any single poor match, and more supporting evidence for multi-part questions.
Why It’s Brilliant: Simply increasing the number of retrieved chunks amplifies the variety and depth of information available to the model. This broader context ensures that responses are not only accurate but also nuanced and well-rounded.
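In the LangChain FAISS interface used later in this article, the expansion is a one-argument change (assuming a vectorstore built as in the hands-on section):

# Widen the candidate pool from the usual k=3-5 to the top 20 chunks.
top_20_chunks = vectorstore.similarity_search(query, k=20)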
In Contextual RAG, each retrieved chunk contains additional context, ensuring clarity and relevance when viewed independently. This is particularly crucial for complex queries where individual chunks might be ambiguous.
Implementation of Self-Contained Chunks: each chunk is made self-contained by prompting a language model to write a brief, chunk-specific summary of where the chunk sits within the source document; that summary is prepended to the chunk before indexing. The hands-on section below implements exactly this.
Why It’s Brilliant: Enhancing each chunk with additional context minimizes ambiguity and ensures that the model can effectively utilize each piece of information. This leads to more precise and coherent responses, as the AI can better discern the significance of each chunk in relation to the query.
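For illustration, here is a hand-written example of the transformation, using figures from the sample Tesla report processed later in this article (the hands-on code generates such context automatically with an LLM):

raw_chunk = "Gross profit for Q3 2023 stood at $4.18 billion, with a gross margin of 17.9%."

# After contextualization, the prepended context makes the chunk self-contained:
contextualized_chunk = (
    "From Tesla, Inc.'s Q3 2023 financial report, profitability overview; gross margin "
    "declined from 25.1% in Q3 2022 but remains above industry averages.\n\n"
    "Gross profit for Q3 2023 stood at $4.18 billion, with a gross margin of 17.9%."
)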
After retrieving the most relevant chunks, reranking is employed to order them based on their relevance. This step ensures that the highest-quality information is prioritized, which is especially important when dealing with token limitations.
Reranking Process: after the initial broad retrieval (e.g., the top-20 chunks), a dedicated reranking model scores each candidate against the query, and only the highest-scoring chunks are passed on to the generation model.
Why It’s Brilliant: Reranking acts as a final filter that elevates the most relevant and high-quality chunks to the forefront. This prioritization ensures that the model focuses on the most critical information, maximizing the effectiveness of the response even within token constraints.
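As one way to implement this step, a generic open-source cross-encoder can score query-chunk pairs. This sketch uses the sentence-transformers package (an extra dependency not installed in the hands-on section), and the model name is an illustrative choice, not the reranker Anthropic uses:

from sentence_transformers import CrossEncoder

def rerank(query: str, chunks: list[str], top_n: int = 5) -> list[str]:
    """Score each (query, chunk) pair and keep the best top_n chunks."""
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = model.predict([(query, chunk) for chunk in chunks])
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_n]]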
The true genius of Contextual RAG lies in how these four innovations interconnect and amplify each other. Individually, each enhancement offers significant improvements, but their combined effect creates a highly optimized retrieval pipeline.
Synergistic Integration: hybrid retrieval casts a wide yet precise net, the top-20 expansion keeps more candidates in play, contextualized chunks make every candidate intelligible on its own, and reranking distills the pool down to the strongest evidence before generation.
Outcome: This layered approach transforms traditional RAG systems into a refined, highly effective retrieval mechanism. The synergy between these strategies results in a system that is not only more accurate and relevant but also more robust in handling diverse and complex queries.
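To make the layering concrete, here is a compact sketch of the full pipeline, assuming the rrf_fuse and rerank helpers sketched above and an answer-generation function like the generate_answer method implemented in the hands-on section:

def contextual_rag_answer(query, vectorstore, bm25_index, corpus, answer_fn):
    # Steps 1-2: hybrid (embeddings + BM25) retrieval, widened to the top 20 chunks.
    candidates = rrf_fuse(query, vectorstore, bm25_index, corpus, k=20)
    # Step 3 happened at indexing time: each chunk already carries its own context.
    # Step 4: rerank and keep only the strongest evidence within the token budget.
    best_chunks = rerank(query, candidates, top_n=5)
    # Hand the distilled evidence to the generative model.
    return answer_fn(query, best_chunks)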
This hands-on exercise lets you experience how Contextual RAG retrieves, contextualizes, reranks, and generates answers using a retrieval-augmented generation model. The workflow includes detailed steps showing how context is generated for each chunk from the original document and the chunk itself, and how that surrounding context is prepended before the chunk is indexed into the vector database.
Make sure to install the following dependencies to run the code:
pip install langchain langchain-openai openai faiss-cpu python-dotenv rank_bm25
pip install -U langchain-community
Load essential Python libraries for text processing, embeddings, and retrieval. Import LangChain modules for text splitting, vector stores, and AI model interactions.
import hashlib
import os
import getpass
from typing import List, Tuple
from dotenv import load_dotenv
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.prompts import ChatPromptTemplate
from rank_bm25 import BM25Okapi
Set the OPENAI_API_KEY using Colab's secure userdata module. This provides seamless access to OpenAI's language models without exposing sensitive credentials in the notebook.
from google.colab import userdata
os.environ["OPENAI_API_KEY"] =userdata.get('openai')
This sets the OPENAI_API_KEY environment variable to the value stored in userdata under the name ‘openai’, making the key securely available to OpenAI functions in this environment.
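If you are not running in Colab, one common alternative is to prompt for the key interactively with the getpass module imported earlier:

# Outside Colab: prompt for the key instead of reading Colab userdata.
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")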
This code defines the ContextualRetrieval class, which processes documents to enhance searchability by creating contextualized chunks.
class ContextualRetrieval:
    """
    A class that implements the Contextual Retrieval system.
    """

    def __init__(self):
        """
        Initialize the ContextualRetrieval system.
        """
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=800,
            chunk_overlap=100,
        )
        self.embeddings = OpenAIEmbeddings()
        self.llm = ChatOpenAI(
            model="gpt-4o",
            temperature=0,
            max_tokens=None,
            timeout=None,
            max_retries=2,
        )

    def process_document(self, document: str) -> Tuple[List[Document], List[Document]]:
        """
        Process a document by splitting it into chunks and generating context for each chunk.
        """
        chunks = self.text_splitter.create_documents([document])
        contextualized_chunks = self._generate_contextualized_chunks(document, chunks)
        return chunks, contextualized_chunks

    def _generate_contextualized_chunks(self, document: str, chunks: List[Document]) -> List[Document]:
        """
        Generate contextualized versions of the given chunks.
        """
        contextualized_chunks = []
        for chunk in chunks:
            context = self._generate_context(document, chunk.page_content)
            contextualized_content = f"{context}\n\n{chunk.page_content}"
            contextualized_chunks.append(
                Document(page_content=contextualized_content, metadata=chunk.metadata)
            )
        return contextualized_chunks

    def _generate_context(self, document: str, chunk: str) -> str:
        """
        Generate context for a specific chunk using the language model.
        """
        prompt = ChatPromptTemplate.from_template("""
        You are an AI assistant specializing in financial analysis, particularly for Tesla, Inc. Your task is to provide brief, relevant context for a chunk of text from Tesla's Q3 2023 financial report.
        Here is the financial report:
        <document>
        {document}
        </document>

        Here is the chunk we want to situate within the whole document:
        <chunk>
        {chunk}
        </chunk>

        Provide a concise context (2-3 sentences) for this chunk, considering the following guidelines:
        1. Identify the main financial topic or metric discussed (e.g., revenue, profitability, segment performance, market position).
        2. Mention any relevant time periods or comparisons (e.g., Q3 2023, year-over-year changes).
        3. If applicable, note how this information relates to Tesla's overall financial health, strategy, or market position.
        4. Include any key figures or percentages that provide important context.
        5. Do not use phrases like "This chunk discusses" or "This section provides". Instead, directly state the context.

        Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else.

        Context:
        """)
        messages = prompt.format_messages(document=document, chunk=chunk)
        response = self.llm.invoke(messages)
        return response.content

    def create_vectorstores(self, chunks: List[Document]) -> FAISS:
        """
        Create a FAISS vector store for the given chunks.
        """
        return FAISS.from_documents(chunks, self.embeddings)

    def create_bm25_index(self, chunks: List[Document]) -> BM25Okapi:
        """
        Create a BM25 index for the given chunks.
        """
        tokenized_chunks = [chunk.page_content.split() for chunk in chunks]
        return BM25Okapi(tokenized_chunks)

    @staticmethod
    def generate_cache_key(document: str) -> str:
        """
        Generate a stable cache key (MD5 hash) for a document.
        """
        return hashlib.md5(document.encode()).hexdigest()

    def generate_answer(self, query: str, relevant_chunks: List[str]) -> str:
        """
        Generate an answer to the query from the retrieved chunks.
        """
        prompt = ChatPromptTemplate.from_template("""
        Based on the following information, please provide a concise and accurate answer to the question.
        If the information is not sufficient to answer the question, say so.

        Question: {query}

        Relevant information:
        {chunks}

        Answer:
        """)
        messages = prompt.format_messages(query=query, chunks="\n\n".join(relevant_chunks))
        response = self.llm.invoke(messages)
        return response.content
This block assigns a detailed financial document about Tesla, Inc.’s Q3 2023 performance to the variable document; it serves as the source text for the contextual retrieval pipeline.
# Example financial document
document = """
Tesla, Inc. (TSLA) Financial Analysis and Market Overview - Q3 2023
Executive Summary:
Tesla, Inc. (NASDAQ: TSLA) continues to lead the electric vehicle (EV) market, showcasing strong financial performance and strategic growth initiatives in Q3 2023. This comprehensive analysis delves into Tesla's financial statements, market position, and future outlook, providing investors and stakeholders with crucial insights into the company's performance and potential.
1. Financial Performance Overview:
Revenue:
Tesla reported total revenue of $23.35 billion in Q3 2023, marking a 9% increase year-over-year (YoY) from $21.45 billion in Q3 2022. The automotive segment remained the primary revenue driver, contributing $19.63 billion, up 5% YoY. Energy generation and storage revenue saw significant growth, reaching $1.56 billion, a 40% increase YoY.
Profitability:
Gross profit for Q3 2023 stood at $4.18 billion, with a gross margin of 17.9%. While this represents a decrease from the 25.1% gross margin in Q3 2022, it remains above industry averages. Operating income was $1.76 billion, resulting in an operating margin of 7.6%. Net income attributable to common stockholders was $1.85 billion, translating to diluted earnings per share (EPS) of $0.53.
Cash Flow and Liquidity:
Tesla's cash and cash equivalents at the end of Q3 2023 were $26.08 billion, a robust position that provides ample liquidity for ongoing operations and future investments. Free cash flow for the quarter was $0.85 billion, reflecting the company's ability to generate cash despite significant capital expenditures.
2. Operational Highlights:
Production and Deliveries:
Tesla produced 430,488 vehicles in Q3 2023, a 17% increase YoY. The Model 3/Y accounted for 419,666 units, while the Model S/X contributed 10,822 units. Total deliveries reached 435,059 vehicles, up 27% YoY, demonstrating strong demand and improved production efficiency.
Manufacturing Capacity:
The company's installed annual vehicle production capacity increased to over 2 million units across its factories in Fremont, Shanghai, Berlin-Brandenburg, and Texas. The Shanghai Gigafactory remains the highest-volume plant, with an annual capacity exceeding 950,000 units.
Energy Business:
Tesla's energy storage deployments grew by 90% YoY, reaching 4.0 GWh in Q3 2023. Solar deployments also increased by 48% YoY to 106 MW, reflecting growing demand for Tesla's energy products.
3. Market Position and Competitive Landscape:
Global EV Market Share:
Tesla maintained its position as the world's largest EV manufacturer by volume, with an estimated global market share of 18% in Q3 2023. However, competition is intensifying, particularly from Chinese manufacturers like BYD and established automakers accelerating their EV strategies.
Brand Strength:
Tesla's brand value continues to grow, ranked as the 12th most valuable brand globally by Interbrand in 2023, with an estimated brand value of $56.3 billion, up 4% from 2022.
Technology Leadership:
The company's focus on innovation, particularly in battery technology and autonomous driving capabilities, remains a key differentiator. Tesla's Full Self-Driving (FSD) beta program has expanded to over 800,000 customers in North America, showcasing its advanced driver assistance systems.
4. Strategic Initiatives and Future Outlook:
Product Roadmap:
Tesla reaffirmed its commitment to launching the Cybertruck in 2023, with initial deliveries expected in Q4. The company also hinted at progress on a next-generation vehicle platform, aimed at significantly reducing production costs.
Expansion Plans:
Plans for a new Gigafactory in Mexico are progressing, with production expected to commence in 2025. This facility will focus on producing Tesla's next-generation vehicles and expand the company's North American manufacturing footprint.
Battery Production:
Tesla continues to ramp up its in-house battery cell production, with 4680 cells now being used in Model Y vehicles produced at the Texas Gigafactory. The company aims to achieve an annual production rate of 1,000 GWh by 2030.
5. Risk Factors and Challenges:
Supply Chain Constraints:
While easing compared to previous years, supply chain issues continue to pose challenges, particularly in sourcing semiconductor chips and raw materials for batteries.
Regulatory Environment:
Evolving regulations around EVs, autonomous driving, and data privacy across different markets could impact Tesla's operations and expansion plans.
Macroeconomic Factors:
Rising interest rates and inflationary pressures may affect consumer demand for EVs and impact Tesla's profit margins.
Competition:
Intensifying competition in the EV market, especially in key markets like China and Europe, could pressure Tesla's market share and pricing power.
6. Financial Ratios and Metrics:
Profitability Ratios:
- Return on Equity (ROE): 18.2%
- Return on Assets (ROA): 10.3%
- EBITDA Margin: 15.7%
Liquidity Ratios:
- Current Ratio: 1.73
- Quick Ratio: 1.25
Efficiency Ratios:
- Asset Turnover Ratio: 0.88
- Inventory Turnover Ratio: 11.2
Valuation Metrics:
- Price-to-Earnings (P/E) Ratio: 70.5
- Price-to-Sales (P/S) Ratio: 7.8
- Enterprise Value to EBITDA (EV/EBITDA): 41.2
7. Segment Analysis:
Automotive Segment:
- Revenue: $19.63 billion (84% of total revenue)
- Gross Margin: 18.9%
- Key Products: Model 3, Model Y, Model S, Model X
Energy Generation and Storage:
- Revenue: $1.56 billion (7% of total revenue)
- Gross Margin: 14.2%
- Key Products: Powerwall, Powerpack, Megapack, Solar Roof
Services and Other:
- Revenue: $2.16 billion (9% of total revenue)
- Gross Margin: 5.3%
- Includes vehicle maintenance, repair, and used vehicle sales
Conclusion:
Tesla's Q3 2023 financial results demonstrate the company's continued leadership in the EV market, with strong revenue growth and operational improvements. While facing increased competition and margin pressures, Tesla's robust balance sheet, technological innovations, and expanding product portfolio position it well for future growth. Investors should monitor key metrics such as production ramp-up, margin trends, and progress on strategic initiatives to assess Tesla's long-term value proposition in the rapidly evolving automotive and energy markets.
"""
Initializing ContextualRetrieval prepares the system to process documents, create context-based embeddings, and search for relevant content. After this step, cr is ready for further operations such as processing documents or generating answers to queries.
# Initialize ContextualRetrieval
cr = ContextualRetrieval()
cr
This code splits the document into smaller pieces and creates two versions of them: original_chunks, which preserves each piece exactly as it appears in the source, and contextualized_chunks, in which each piece is prefixed with LLM-generated context. It then counts the contextualized chunks and prints the first original chunk to show what the opening of the document looks like in unaltered form.
# Process the document
original_chunks, contextualized_chunks = cr.process_document(document)
len(contextualized_chunks)
print(original_chunks[0])
Print a few chunks from each version to compare the raw text with its contextualized counterpart.
print(contextualized_chunks[0])
print(original_chunks[10])
print(contextualized_chunks[10])
Next, create search indexes over both the original and the context-enhanced chunks. Building a vector store (for semantic, meaning-based search) and a BM25 index (for keyword-based search) over each version prepares both retrieval styles, making it easy to compare how they perform on the two corpora.
# Create vectorstores
original_vectorstore = cr.create_vectorstores(original_chunks)
contextualized_vectorstore = cr.create_vectorstores(contextualized_chunks)
# Create BM25 indexes
original_bm25_index = cr.create_bm25_index(original_chunks)
contextualized_bm25_index = cr.create_bm25_index(contextualized_chunks)
This step generates a unique cache key for the document to efficiently track and store its processed data, preventing the need to re-process it later. It also prints out the number of chunks the document was divided into and displays the generated cache key, helping to confirm the document’s processing and caching status. This is useful for optimizing document retrieval and managing processed data efficiently.
# Generate cache key for the document
cache_key = cr.generate_cache_key(document)
cache_key
print(f"Processed {len(original_chunks)} chunks")
print(f"Cache key for the document: {cache_key}")
# Example queries related to financial information
queries = [
    "What was Tesla's total revenue in Q3 2023? what was the gross profit and cash position?",
    "How does the automotive gross margin in Q3 2023 compare to the previous year?",
    "What is Tesla's current debt-to-equity ratio?",
    "How much did Tesla invest in R&D during Q3 2023?",
    "What is Tesla's market share in the global EV market for Q3 2023?"
]
for query in queries:
    print(f"\nQuery: {query}")

    # Retrieve from original vectorstore
    original_vector_results = original_vectorstore.similarity_search(query, k=3)
    # Retrieve from contextualized vectorstore
    contextualized_vector_results = contextualized_vectorstore.similarity_search(query, k=3)

    # Retrieve from original BM25
    original_tokenized_query = query.split()
    original_bm25_results = original_bm25_index.get_top_n(original_tokenized_query, original_chunks, n=3)
    # Retrieve from contextualized BM25
    contextualized_tokenized_query = query.split()
    contextualized_bm25_results = contextualized_bm25_index.get_top_n(contextualized_tokenized_query, contextualized_chunks, n=3)

    # Generate answers
    original_vector_answer = cr.generate_answer(query, [doc.page_content for doc in original_vector_results])
    contextualized_vector_answer = cr.generate_answer(query, [doc.page_content for doc in contextualized_vector_results])
    original_bm25_answer = cr.generate_answer(query, [doc.page_content for doc in original_bm25_results])
    contextualized_bm25_answer = cr.generate_answer(query, [doc.page_content for doc in contextualized_bm25_results])

    print("\nOriginal Vector Search Results:")
    for i, doc in enumerate(original_vector_results, 1):
        print(f"{i}. {doc.page_content[:200]}...")
    print("\nOriginal Vector Search Answer:")
    print(original_vector_answer)
    print("\n" + "-"*50)

    print("\nContextualized Vector Search Results:")
    for i, doc in enumerate(contextualized_vector_results, 1):
        print(f"{i}. {doc.page_content[:200]}...")
    print("\nContextualized Vector Search Answer:")
    print(contextualized_vector_answer)
    print("\n" + "-"*50)

    print("\nOriginal BM25 Search Results:")
    for i, doc in enumerate(original_bm25_results, 1):
        print(f"{i}. {doc.page_content[:200]}...")
    print("\nOriginal BM25 Search Answer:")
    print(original_bm25_answer)
    print("\n" + "-"*50)

    print("\nContextualized BM25 Search Results:")
    for i, doc in enumerate(contextualized_bm25_results, 1):
        print(f"{i}. {doc.page_content[:200]}...")
    print("\nContextualized BM25 Search Answer:")
    print(contextualized_bm25_answer)
    print("\n" + "="*50)
This loop answers each query four different ways, pairing both retrieval methods (vector similarity and BM25) with both data versions (original and contextualized chunks), so you can compare the approaches directly.
# Complex queries requiring contextual information
queries = [
    "How do Tesla's financial results in Q3 2023 reflect its overall strategy in both the automotive and energy sectors? Consider revenue growth, profitability, and investments in each sector.",
    "Analyze the relationship between Tesla's R&D spending, capital expenditures, and its financial performance. How might this impact its competitive position in the EV and energy storage markets over the next 3-5 years?",
    "Compare Tesla's financial health and market position in different geographic regions. How do regional variations in revenue, market share, and growth rates inform Tesla's global strategy?",
    "Evaluate Tesla's progress in vertical integration, considering its investments in battery production, software development, and manufacturing capabilities. How is this reflected in its financial statements and future outlook?",
    "Assess the potential impact of Tesla's Full Self-Driving (FSD) technology on its financial projections. Consider revenue streams, liability risks, and required investments in the context of the broader autonomous vehicle market.",
    "How does Tesla's financial performance and strategy in the energy storage and generation segment align with or diverge from its automotive business? What synergies or conflicts exist between these segments?",
    "Analyze Tesla's capital structure and liquidity position in the context of its growth strategy and market conditions. How well-positioned is the company to weather potential economic downturns or increased competition?",
    "Evaluate Tesla's pricing strategy across its product lines and geographic markets. How does this strategy impact its financial metrics, market share, and competitive positioning?",
    "Considering Tesla's current financial position, market trends, and competitive landscape, what are the most significant opportunities and risks for the company in the next 2-3 years? How might these factors affect its financial projections?",
    "Assess the potential financial implications of Tesla's expansion into new markets or product categories (e.g., Cybertruck, robotaxis, AI). How do these initiatives align with the company's core competencies and financial strategy?"
]
for query in queries:
    print(f"\nQuery: {query}")

    # Retrieve from original vectorstore
    original_vector_results = original_vectorstore.similarity_search(query, k=3)
    # Retrieve from contextualized vectorstore
    contextualized_vector_results = contextualized_vectorstore.similarity_search(query, k=3)

    # Retrieve from original BM25
    original_tokenized_query = query.split()
    original_bm25_results = original_bm25_index.get_top_n(original_tokenized_query, original_chunks, n=3)
    # Retrieve from contextualized BM25
    contextualized_tokenized_query = query.split()
    contextualized_bm25_results = contextualized_bm25_index.get_top_n(contextualized_tokenized_query, contextualized_chunks, n=3)

    # Generate answers
    original_vector_answer = cr.generate_answer(query, [doc.page_content for doc in original_vector_results])
    contextualized_vector_answer = cr.generate_answer(query, [doc.page_content for doc in contextualized_vector_results])
    original_bm25_answer = cr.generate_answer(query, [doc.page_content for doc in original_bm25_results])
    contextualized_bm25_answer = cr.generate_answer(query, [doc.page_content for doc in contextualized_bm25_results])

    print("\nOriginal Vector Search Results:")
    for i, doc in enumerate(original_vector_results, 1):
        print(f"{i}. {doc.page_content[:200]}...")
    print("\nOriginal Vector Search Answer:")
    print(original_vector_answer)
    print("\n" + "-"*50)

    print("\nContextualized Vector Search Results:")
    for i, doc in enumerate(contextualized_vector_results, 1):
        print(f"{i}. {doc.page_content[:200]}...")
    print("\nContextualized Vector Search Answer:")
    print(contextualized_vector_answer)
    print("\n" + "-"*50)

    print("\nOriginal BM25 Search Results:")
    for i, doc in enumerate(original_bm25_results, 1):
        print(f"{i}. {doc.page_content[:200]}...")
    print("\nOriginal BM25 Search Answer:")
    print(original_bm25_answer)
    print("\n" + "-"*50)

    print("\nContextualized BM25 Search Results:")
    for i, doc in enumerate(contextualized_bm25_results, 1):
        print(f"{i}. {doc.page_content[:200]}...")
    print("\nContextualized BM25 Search Answer:")
    print(contextualized_bm25_answer)
    print("\n" + "="*50)
This code runs a series of complex financial queries about Tesla and retrieves relevant chunks using four different search methods: original vectorstore, contextualized vectorstore, original BM25, and contextualized BM25. For each method, it retrieves the top three relevant chunks and generates an answer grounded in their content. The printed results let you compare directly how each search method and data version (original vs. contextualized) performs on these detailed financial questions.
This hands-on exercise demonstrates how contextual RAG workflows enhance document retrieval and answer generation by adding context and using multiple search techniques. It’s particularly useful for handling large, complex documents like financial reports, where understanding the relationships between various parts of the document is key to accurate and meaningful answers.
Anthropic’s Contextual RAG exemplifies the profound impact of seemingly simple optimizations on complex systems. By intelligently stacking straightforward improvements—combining embeddings with BM25, expanding the retrieval pool, enriching chunks with context, and implementing reranking—Anthropic has transformed traditional RAG into a highly optimized retrieval system.
Contextual RAG stands out by delivering substantial improvements through elegant simplicity in a field where incremental changes often yield marginal gains. This approach not only enhances retrieval accuracy and relevance but also sets a new standard for how AI systems can effectively manage and utilize vast amounts of information.
Anthropic’s work serves as a testament to the idea that sometimes, the most effective solutions are those that leverage simplicity with strategic insight. Contextual RAG’s “stupidly brilliant” design proves that in the quest for better AI, thoughtful layering of simple techniques can lead to extraordinary results.
For more details, refer to Anthropic’s post introducing Contextual Retrieval.
Q. How does Contextual RAG improve retrieval accuracy?
A. Contextual RAG improves retrieval accuracy by integrating embeddings with BM25, expanding the retrieval pool, making chunks self-contained, and reranking results for optimal relevance. This multi-layered approach enhances both precision and contextual depth.
Q. Why retrieve the top-20 chunks rather than the usual 5 or 10?
A. Expanding the number of retrieved chunks increases the diversity of information the model receives, leading to more comprehensive and well-rounded responses.
Q. What role does reranking play in the pipeline?
A. Reranking ensures that the highest-relevance chunks appear first, helping the model focus on the most valuable information. This is especially useful when token limits restrict the number of chunks used.
Q. Can Contextual RAG be used with different generative AI models?
A. Yes, Contextual RAG can integrate with various generative AI models. The retrieval and reranking methods are model-agnostic and can work alongside different architectures.
Q. Does the extra machinery make Contextual RAG slow or complex?
A. Although it involves several steps, Contextual RAG remains efficient. The combination of embeddings with BM25, self-contained chunks, and reranking improves retrieval without adding undue complexity.