RAG has become a popular technology in 2025 because it avoids the expensive and time-consuming fine-tuning of models, and demand for RAG frameworks has grown accordingly. Let's understand what these frameworks are. Retrieval-augmented generation (RAG) frameworks are essential tools in the field of artificial intelligence. They enhance the capabilities of Large Language Models (LLMs) by allowing them to retrieve relevant information from external sources, which leads to more accurate and context-aware responses. Here, we will explore five notable RAG frameworks: LangChain, LlamaIndex, LangGraph, Haystack, and RAGFlow. Each framework offers unique features that can improve your AI projects.
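Before looking at individual frameworks, it helps to see the bare retrieve-then-generate loop that all of them implement. Below is a deliberately toy, framework-agnostic sketch: the sample documents, the word-overlap retriever, and the prompt string are illustrative stand-ins for the embedding models, vector stores, and LLM calls that the frameworks below provide.
# Toy sketch of the retrieve-then-generate loop behind every RAG system.
# The word-overlap "retriever" stands in for a real embedding model and vector store.
documents = [
    "LangChain provides composable chains for building RAG applications.",
    "LlamaIndex organizes external data into indexes that LLMs can query.",
    "Haystack offers pipelines for document search and question answering.",
]

def retrieve(question, k=2):
    # Rank documents by naive word overlap with the question
    overlap = lambda d: len(set(question.lower().split()) & set(d.lower().split()))
    return sorted(documents, key=overlap, reverse=True)[:k]

def build_prompt(question):
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# In a real pipeline this prompt would now be sent to an LLM
print(build_prompt("What does LlamaIndex do?"))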
LangChain is a flexible framework that simplifies the development of applications using LLMs. It provides tools for building RAG applications, making integration straightforward.
Here’s the hands-on:
!pip install langchain_community tiktoken langchain-openai langchainhub chromadb langchain
from getpass import getpass
openai = getpass("OpenAI API Key:")
import os
os.environ["OPENAI_API_KEY"] = openai
import bs4
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
Load the document for RAG using WebBaseLoader (replace this with your own data):
# Load Documents
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()
# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
# Embed
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()
# Prompt
prompt = hub.pull("rlm/rag-prompt")
# LLM
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
# Post-processing
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
# Chain
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
# Question
rag_chain.invoke("What is Task Decomposition?")
‘Task Decomposition is a technique used to break down complex tasks into
smaller and simpler steps. This approach helps agents to plan ahead and
tackle difficult tasks more effectively. Task decomposition can be done
through various methods, including using prompting techniques, task-specific
instructions, or human inputs.’
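If you prefer not to pull the prompt from the LangChain Hub, you can define your own prompt template instead. Here is a minimal sketch, assuming the retriever, llm, and format_docs objects from the code above are already in scope; the prompt wording itself is just an example.
from langchain_core.prompts import ChatPromptTemplate
# A hand-written replacement for the hub prompt used above
custom_prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)
custom_rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_prompt
    | llm
    | StrOutputParser()
)
custom_rag_chain.invoke("What is Task Decomposition?")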
LlamaIndex, previously known as the GPT Index, focuses on organizing and retrieving data efficiently for LLM applications. It helps developers access and use large datasets quickly.
Here’s the hands-on:
!pip install llama-index llama-index-readers-file
!pip install llama-index-embeddings-openai
!pip install llama-index-llms-openai
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
llm = OpenAI(model='gpt-4o')
embed_model = OpenAIEmbedding()
from llama_index.core import Settings
Settings.llm = llm
Settings.embed_model = embed_model
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O './uber_2021.pdf'
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader(input_files=["./uber_2021.pdf"]).load_data()
from llama_index.core.node_parser import TokenTextSplitter
splitter = TokenTextSplitter(
    chunk_size=512,
    chunk_overlap=0,
)
nodes = splitter.get_nodes_from_documents(documents)
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex(nodes)
query_engine = index.as_query_engine(similarity_top_k=2)
Invoking the LLM using RAG
response = query_engine.query("What is the revenue of Uber in 2021?")
print(response)
‘The revenue of Uber in 2021 was $171.7 million.’
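A useful follow-up is to check which chunks the query engine actually retrieved, and to persist the index so the PDF does not have to be re-embedded on every run. A minimal sketch, reusing the index and response objects from above:
# Inspect the retrieved chunks behind the answer
for node_with_score in response.source_nodes:
    print(node_with_score.score, node_with_score.node.get_content()[:200])
# Persist the index to disk and reload it later
from llama_index.core import StorageContext, load_index_from_storage
index.storage_context.persist(persist_dir="./uber_index")
storage_context = StorageContext.from_defaults(persist_dir="./uber_index")
index = load_index_from_storage(storage_context)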
LangGraph is an extension of LangChain for building stateful, graph-based LLM workflows: each step of an application is a node, and the control flow between steps is defined by edges. This makes it useful for RAG pipelines and agents that need explicit, multi-step control flow.
Here’s the hands-on:
%pip install --quiet --upgrade langchain-text-splitters langchain-community langgraph langchain-openai
from langchain.chat_models import init_chat_model
llm = init_chat_model("gpt-4o-mini", model_provider="openai")
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
from langchain_core.vectorstores import InMemoryVectorStore
vector_store = InMemoryVectorStore(embeddings)
import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict
# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)
# Index chunks
_ = vector_store.add_documents(documents=all_splits)
# Define prompt for question-answering
prompt = hub.pull("rlm/rag-prompt")
Defining the state, nodes, and edges in LangGraph:
# Define state for the application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str
# Define application steps
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}
def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}
# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
response = graph.invoke({"question": "What is Task Decomposition?"})
print(response["answer"])
Task Decomposition is the process of breaking down a complicated task into
smaller, manageable steps. This can be achieved using techniques like Chain
of Thought (CoT) or Tree of Thoughts, which guide models to reason step by
step or evaluate multiple possibilities. The goal is to simplify complex
tasks and enhance understanding of the reasoning process.
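Because the compiled graph is a standard LangGraph runnable, you can also stream intermediate results instead of waiting for the final state. A minimal sketch, reusing the graph compiled above:
# Stream the state update produced by each node as it finishes
for step in graph.stream({"question": "What is Task Decomposition?"}, stream_mode="updates"):
    print(step)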
Haystack is an end-to-end framework for developing applications powered by LLMs and transformer models. It excels in document search and question answering.
Here’s the hands-on:
!pip install haystack-ai
!pip install "datasets>=2.6.1"
!pip install "sentence-transformers>=3.0.0"
Import the document store and initialize it:
from haystack.document_stores.in_memory import InMemoryDocumentStore
document_store = InMemoryDocumentStore()
from datasets import load_dataset
from haystack import Document
dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
docs = [Document(content=doc["content"], meta=doc["meta"]) for doc in dataset]
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()
docs_with_embeddings = doc_embedder.run(docs)
document_store.write_documents(docs_with_embeddings["documents"])
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
retriever = InMemoryEmbeddingRetriever(document_store)
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage
template = [
    ChatMessage.from_user(
        """
Given the following information, answer the question.
Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}
Question: {{question}}
Answer:
"""
    )
]
prompt_builder = ChatPromptBuilder(template=template)
from haystack.components.generators.chat import OpenAIChatGenerator
chat_generator = OpenAIChatGenerator(model="gpt-4o-mini")
# The text embedder turns the user question into a query embedding for the retriever
from haystack.components.embedders import SentenceTransformersTextEmbedder
text_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
from haystack import Pipeline
basic_rag_pipeline = Pipeline()
# Add components to your pipeline
basic_rag_pipeline.add_component("text_embedder", text_embedder)
basic_rag_pipeline.add_component("retriever", retriever)
basic_rag_pipeline.add_component("prompt_builder", prompt_builder)
basic_rag_pipeline.add_component("llm", chat_generator)
# Now, connect the components to each other
basic_rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
basic_rag_pipeline.connect("retriever", "prompt_builder")
basic_rag_pipeline.connect("prompt_builder.prompt", "llm.messages")
question = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({"text_embedder": {"text": question}, "prompt_builder": {"question": question}})
print(response["llm"]["replies"][0].text)
‘The Colossus of Rhodes, a statue of the Greek sun-god Helios, is believed to
have stood approximately 33 meters (108 feet) tall and was constructed with
iron tie bars and brass plates forming its skin, filled with stone blocks.
Although the specific details of its appearance are not definitively known,
contemporary accounts suggest that it had curly hair with bronze or silver
spikes radiating like flames on the head. The statue likely depicted Helios
in a powerful, commanding pose, possibly with one hand shielding his eyes,
similar to other representations of the sun god from the time. Overall, it
was designed to project strength and radiance, celebrating Rhodes' victory
over its enemies.’
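Haystack 2.x pipelines can also be serialized to YAML, which is handy for versioning a RAG setup or recreating it on another machine. A minimal sketch, assuming the basic_rag_pipeline object built above:
# Serialize the pipeline to YAML and load it back later
yaml_pipeline = basic_rag_pipeline.dumps()
with open("rag_pipeline.yaml", "w") as f:
    f.write(yaml_pipeline)
from haystack import Pipeline
with open("rag_pipeline.yaml", "r") as f:
    restored_pipeline = Pipeline.loads(f.read())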
RAGFlow is an open-source RAG engine that integrates document parsing, retrieval, and generation in one platform, streamlining the development of RAG applications through a web-based workflow.
Here’s the hands-on:
Sign up on the RAGFlow website and click on Try RAGFlow.
Then click on Create Knowledge Base.
Then go to Model Providers, select the LLM you want to use (we are using Groq here), and paste its API key.
Then go to System Model Settings and select the chat model from there.
Now go to Datasets, upload the PDF you want, click the Play button near the Parsing Status column, and wait for the PDF to be parsed.
Now go to the Chat section and create an assistant there. Give it a name and select the knowledge base that you created.
Then create a new chat and ask a question; it will perform RAG over your knowledge base and answer accordingly.
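If you would rather script this flow than click through the UI, RAGFlow also exposes an HTTP API. The endpoint path, payload fields, and response shape below are illustrative assumptions for a generic chat-completion call, not taken verbatim from the RAGFlow docs, so check your instance's API reference for the exact routes.
import requests
# Hypothetical example of querying a RAGFlow chat assistant over HTTP.
# Replace RAGFLOW_URL, API_KEY, and the assumed endpoint path with the
# values documented by your own RAGFlow instance.
RAGFLOW_URL = "http://localhost:9380"
API_KEY = "your-ragflow-api-key"
resp = requests.post(
    f"{RAGFLOW_URL}/api/v1/chats/<assistant_id>/completions",  # assumed route
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"question": "Summarize the uploaded PDF", "stream": False},
)
print(resp.json())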
RAG has become an important technology for working with custom enterprise datasets, and the need for RAG frameworks has grown drastically as a result. Frameworks like LangChain, LlamaIndex, LangGraph, Haystack, and RAGFlow represent significant advancements in AI applications. By using these frameworks, developers can create systems that provide accurate and relevant information. As AI continues to evolve, these tools will play an important role in shaping intelligent applications.