LightRAG: Simple and Fast Alternative to GraphRAG

Nibedita Dutta Last Updated : 22 Jan, 2025
11 min read

As Large Language Models continue to evolve at a fast pace, enhancing their ability to leverage external knowledge has become a major challenge. Retrieval-Augmented Generation techniques improve model output by integrating relevant information during generation, but traditional RAG systems can be complex and resource-heavy. To address this, the HKU Data Science Lab has developed LightRAG, a more efficient alternative. LightRAG combines the power of knowledge graphs with vector retrieval, enabling it to process textual information effectively while preserving the structured relationships between data.

Learning Objectives

  • Understand the limitations of traditional Retrieval-Augmented Generation (RAG) systems and the need for LightRAG.
  • Learn the architecture of LightRAG, including its dual-level retrieval mechanism and graph-based text indexing.
  • Explore how LightRAG integrates graph structures with vector embeddings for efficient and context-rich information retrieval.
  • Compare the performance of LightRAG against GraphRAG through benchmarks across various domains.

This article was published as a part of the Data Science Blogathon.

Why LightRAG Over Traditional RAG Systems?

Current RAG systems face significant challenges that limit their effectiveness. One major issue is that many rely on simple, flat data representations, which restrict their ability to comprehend and retrieve information based on the complex relationships between entities. Another key drawback is the lack of contextual understanding, making it difficult for these systems to maintain coherence across different entities and their connections. This often leads to responses that fail to fully address user queries.

Traditional RAG Struggles to Integrate Information

For instance, if a user asks, “How does the rise of electric vehicles affect urban air quality and public transportation infrastructure?”, existing RAG systems might retrieve individual documents on electric vehicles, air pollution, and public transportation, but they may struggle to integrate this information into a unified answer. These systems could fail to explain how electric vehicles can improve air quality, which in turn influences the planning of public transportation systems. As a result, users may receive fragmented and incomplete answers that overlook the complex relationships between these topics.

How Does LightRAG Work?

LightRAG combines graph-based indexing with a dual-level retrieval mechanism. Together, these enable it to handle complex queries efficiently while preserving the relationships between entities for context-rich responses.

Figure: How LightRAG works (Source: LightRAG)

Graph-based Text Indexing

Figure: Graph-based text indexing (Source: LightRAG)

  • Chunking: Your documents are segmented into smaller, more manageable pieces.
  • Entity Recognition: LLMs are leveraged to identify and extract various entities (e.g., names, dates, locations, and events) along with the relationships between them.
  • Knowledge Graph Construction: The information collected in the previous step is used to create a comprehensive knowledge graph that highlights the connections and insights across the entire collection of documents. Any duplicate nodes or redundant relationships are removed to optimize the graph.
  • Embedding Storage: The descriptions and relationships are embedded into vectors and stored in a vector database.
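
The four steps above can be sketched in a few lines of plain Python. This is only a toy illustration, not LightRAG's actual implementation: `extract_triples` is a hypothetical stand-in for the LLM extraction call (it only recognizes two hard-coded verbs), `embed` is a simple bag-of-words counter rather than a real embedding model, and the "vector database" is an ordinary dictionary.

```python
import re
from collections import Counter

def chunk_text(text, max_words=6):
    """Step 1: split the text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def extract_triples(chunk):
    """Step 2 (hypothetical stand-in for the LLM): find 'X produces Y' or 'X exports Y'."""
    return re.findall(r"(\w+) (produces|exports) (\w+)", chunk)

def embed(text):
    """Toy embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def build_index(text, max_words=6):
    """Steps 3 and 4: build a deduplicated graph, then embed nodes and edges."""
    nodes, edges = set(), set()  # sets drop duplicate nodes and relationships
    for chunk in chunk_text(text, max_words):
        for subject, relation, obj in extract_triples(chunk):
            nodes.update((subject, obj))
            edges.add((subject, relation, obj))
    # Toy "vector database": one entry per entity and one per relationship
    vector_store = {name: embed(name) for name in nodes}
    vector_store.update({edge: embed(" ".join(edge)) for edge in edges})
    return nodes, edges, vector_store

SAMPLE = "Karnataka produces coffee. India exports coffee. Karnataka produces coffee."
nodes, edges, vector_store = build_index(SAMPLE)
print(sorted(nodes))
print(sorted(edges))
```

The repeated sentence in `SAMPLE` yields the same triple twice, and the set-based graph keeps only one copy, which mirrors the deduplication described in the Knowledge Graph Construction step.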

Dual-Level Retrieval

Figure: Dual-level retrieval (Source: LightRAG)

Queries are usually of two types, either very specific or abstract in nature, so LightRAG employs a dual-level retrieval mechanism to handle both.

  • Low-Level Retrieval: This stage concentrates on identifying particular entities and their relevant attributes or connections. Queries at this level are focused on obtaining detailed, specific data related to individual nodes or edges within the graph.
  • High-Level Retrieval: This level deals with broader subjects and general concepts. Queries here seek to gather information that spans multiple related entities and their connections, offering a comprehensive overview or summary of higher-level themes rather than specific facts or details.
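
The difference between the two levels can be illustrated with a small hand-written graph. The sketch below is an assumption-laden toy, not LightRAG's code: the `EDGES` data, the `themes` field, and the three function names are all hypothetical, and exact keyword matching stands in for the vector search a real system would use.

```python
# Hand-written toy graph; each relationship carries illustrative theme tags.
EDGES = [
    {"source": "Electric vehicles", "target": "Air quality",
     "description": "Electric vehicles reduce tailpipe emissions",
     "themes": {"environment", "urban health"}},
    {"source": "Air quality", "target": "Public transport",
     "description": "Air quality goals influence transit planning",
     "themes": {"environment", "urban planning"}},
    {"source": "Electric vehicles", "target": "Charging stations",
     "description": "Electric vehicles need charging stations",
     "themes": {"infrastructure"}},
]

def low_level_retrieve(entity_keywords, edges=EDGES):
    """Specific queries: return relationships that touch a named entity."""
    wanted = {k.lower() for k in entity_keywords}
    return [e for e in edges
            if e["source"].lower() in wanted or e["target"].lower() in wanted]

def high_level_retrieve(theme_keywords, edges=EDGES):
    """Abstract queries: return relationships tagged with a broader theme."""
    wanted = {k.lower() for k in theme_keywords}
    return [e for e in edges if wanted & e["themes"]]

def dual_level_retrieve(low_keywords, high_keywords, edges=EDGES):
    """Combine both levels, keeping each relationship once."""
    merged = []
    for e in low_level_retrieve(low_keywords, edges) + high_level_retrieve(high_keywords, edges):
        if e not in merged:
            merged.append(e)
    return merged

for edge in dual_level_retrieve(["Charging stations"], ["environment"]):
    print(edge["source"], "->", edge["target"])
```

Here the low-level keyword pulls in the one relationship attached to a specific node, while the high-level keyword gathers relationships that span several entities sharing a theme.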

How is LightRAG Different from GraphRAG?

GraphRAG suffers from high token consumption and a large number of LLM API calls. In the retrieval phase, GraphRAG generates a large number of communities, many of which are actively used during the processing of a single query. Each community report averages a very high number of tokens, resulting in extremely high total token consumption. Additionally, because GraphRAG must traverse each community individually, a query can require hundreds of API calls, which significantly increases retrieval overhead.

For each query, LightRAG uses the LLM to generate relevant keywords. Like conventional Retrieval-Augmented Generation (RAG) systems, its retrieval mechanism relies on vector-based search. However, instead of retrieving chunks as in conventional RAG, it retrieves entities and relationships. This approach incurs far less retrieval overhead than the community-based traversal used in GraphRAG.
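
To make the idea of vector search over entities (rather than chunks) concrete, here is a minimal sketch. It assumes a toy bag-of-words embedding and cosine similarity; the `ENTITY_DESCRIPTIONS` store and the `top_k` helper are hypothetical names introduced for illustration, and a real system would use a learned embedding model and a vector database.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical entity store: entity name -> short description
ENTITY_DESCRIPTIONS = {
    "Karnataka": "state in southern india that grows most coffee",
    "Millennials": "young adults who visit coffee shops",
    "Tea": "dominant hot beverage in india",
}

def top_k(keywords, store, k=2):
    """Rank entities by similarity between the query keywords and their descriptions."""
    query_vec = embed(" ".join(keywords))
    ranked = sorted(store, key=lambda name: cosine(query_vec, embed(store[name])), reverse=True)
    return ranked[:k]

print(top_k(["coffee", "shops"], ENTITY_DESCRIPTIONS))
```

A single similarity search over the entity (or relationship) store replaces the per-community traversal, which is the source of the lower overhead described above.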

Performance Benchmarks of LightRAG

To evaluate LightRAG’s performance against existing RAG frameworks, a robust LLM, specifically GPT-4o-mini, was used to compare each baseline against LightRAG. The following four evaluation dimensions were used:

  • Comprehensiveness: How thoroughly does the answer address all aspects and details of the question?
  • Diversity: How varied and rich is the answer in offering different perspectives and insights related to the question?
  • Empowerment: How effectively does the answer enable the reader to understand the topic and make informed judgments?
  • Overall: This dimension assesses the cumulative performance across the three preceding criteria to identify the best overall answer.

The LLM directly compares two answers for each dimension and selects the superior response for each criterion. After identifying the winning answer for the three dimensions, the LLM combines the results to determine the overall better answer. Win rates are calculated accordingly, ultimately leading to the final results.
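
The win-rate calculation described above can be sketched as follows. The verdicts in `SAMPLE_VERDICTS` are made up purely to exercise the function and are not results from the LightRAG evaluation; the function name `win_rates` and the data layout (one dictionary per question, mapping each dimension to the winning system) are assumptions for illustration.

```python
from collections import Counter

DIMENSIONS = ("Comprehensiveness", "Diversity", "Empowerment", "Overall")

def win_rates(verdicts, system="LightRAG"):
    """Return the percentage of questions on which `system` won each dimension."""
    if not verdicts:
        raise ValueError("at least one verdict is required")
    wins = Counter()
    for verdict in verdicts:
        for dimension in DIMENSIONS:
            if verdict[dimension] == system:
                wins[dimension] += 1
    return {dimension: 100.0 * wins[dimension] / len(verdicts) for dimension in DIMENSIONS}

# Made-up judge verdicts for four questions (illustrative only)
SAMPLE_VERDICTS = [
    {"Comprehensiveness": "LightRAG", "Diversity": "LightRAG", "Empowerment": "Baseline", "Overall": "LightRAG"},
    {"Comprehensiveness": "Baseline", "Diversity": "LightRAG", "Empowerment": "Baseline", "Overall": "Baseline"},
    {"Comprehensiveness": "LightRAG", "Diversity": "LightRAG", "Empowerment": "LightRAG", "Overall": "LightRAG"},
    {"Comprehensiveness": "LightRAG", "Diversity": "Baseline", "Empowerment": "LightRAG", "Overall": "LightRAG"},
]

rates = win_rates(SAMPLE_VERDICTS)
for dimension, rate in rates.items():
    print(f"{dimension}: {rate:.1f}%")
```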

Table: LightRAG benchmark results (Source: LightRAG)

As seen in the table above, four domains were used for evaluation: Agriculture, Computer Science, Legal, and Mixed. The Mixed domain contains a rich variety of literary, biographical, and philosophical texts spanning a broad spectrum of disciplines, including cultural, historical, and philosophical studies.

  • When dealing with large volumes of tokens and intricate queries that require a deep understanding of the dataset’s context, graph-based retrieval models like LightRAG and GraphRAG consistently outperform simpler, chunk-based approaches such as NaiveRAG, HyDE, and RQRAG.
  • In comparison to various baseline models, LightRAG excels in the Diversity metric, particularly on the larger Legal dataset. Its consistent superiority in this area highlights LightRAG’s ability to generate a broader array of responses, making it especially valuable when diverse outputs are needed. This advantage may stem from LightRAG’s dual-level retrieval approach.

Hands-On Python Implementation on Google Colab Using an OpenAI Model

Below, we walk through a few steps on Google Colab using an OpenAI model:

Step 1: Install Necessary Libraries

Install the required libraries, including LightRAG, vector database tools, and Ollama, to set up the environment for implementation.

!pip install lightrag-hku
!pip install aioboto3
!pip install tiktoken
!pip install nano_vectordb

#Install Ollama
!sudo apt update
!sudo apt install -y pciutils
!pip install langchain-ollama
!curl -fsSL https://ollama.com/install.sh | sh
!pip install ollama==0.4.2

Step 2: Import Necessary Libraries and Define OpenAI Key

Import essential libraries, define the OPENAI_API_KEY, and prepare the setup for querying using OpenAI’s models.

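from lightrag import LightRAG, QueryParam
# Note: in newer lightrag-hku releases these helpers may be located in lightrag.llm.openai
from lightrag.llm import gpt_4o_mini_complete, gpt_4o_complete
import os

# Paste your OpenAI API key between the quotes
os.environ['OPENAI_API_KEY'] = ''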

Step 3: Calling The Tool and Loading the Data

Initialize LightRAG, define the working directory, and load data into the model using a sample text file for processing.

import nest_asyncio
nest_asyncio.apply()

WORKING_DIR = "./content"


# Create the working directory if it does not exist yet
os.makedirs(WORKING_DIR, exist_ok=True)

rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=gpt_4o_mini_complete  # Use gpt_4o_mini_complete LLM model
    # llm_model_func=gpt_4o_complete  # Optionally, use a stronger model
)

# Insert data (the file name must match the uploaded Coffee.txt)
with open("./Coffee.txt", encoding="utf-8") as f:
    rag.insert(f.read())

nest_asyncio is useful in environments such as Colab and Jupyter, where an event loop is already running. rag.insert() runs asynchronous code internally, and trying to drive it from inside an already-running loop would normally raise an error. Calling nest_asyncio.apply() patches asyncio so that such nested use of the event loop is allowed.
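
The underlying problem can be reproduced with the standard library alone. The sketch below is meant to be run as a plain Python script (not inside a notebook), and it uses hypothetical `insert` and `insert_async` functions that merely imitate a synchronous wrapper around asynchronous code; LightRAG's internals may differ in detail.

```python
import asyncio

async def insert_async(text):
    """Pretend asynchronous indexing step."""
    await asyncio.sleep(0)
    return len(text)

def insert(text):
    """Hypothetical synchronous wrapper that starts an event loop to drive the coroutine."""
    coro = insert_async(text)
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # avoid a "never awaited" warning before re-raising
        raise

async def notebook_cell():
    """Simulates calling insert() while a loop is already running, as in Colab."""
    try:
        insert("some document text")
    except RuntimeError as exc:
        return str(exc)
    return "no error"

# Called with no loop running, the wrapper works normally.
plain_call = insert("some document text")

# Called from inside a running loop, the same wrapper raises RuntimeError.
outcome = asyncio.run(notebook_cell())
print(plain_call)
print(outcome)
```

In a notebook, the code in every cell is effectively in the position of `notebook_cell`, which is why the patch from nest_asyncio is applied before calling rag.insert().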

We use this text file for querying: https://github.com/mimiwb007/LightRAG/blob/main/Coffee.txt. It can be downloaded from GitHub and then uploaded to the working directory of Colab.

Step 4: Querying on Specific Question

Use hybrid or naive modes to query the dataset for specific questions, showcasing LightRAG’s ability to retrieve detailed and relevant answers.

Hybrid Mode

print(rag.query("Which section of Indian Society is Coffee getting traction in?", param=QueryParam(mode="hybrid")))

Output


{
"high_level_keywords": ["Indian society", "Coffee consumption", "Cultural trends"],
"low_level_keywords": ["Urban areas", "Millennials", "Coffee shops", "Specialty
coffee", "Consumer behavior"]}
## Growing Popularity of Coffee in Indian Society
Coffee consumption in India is witnessing a notable rise, particularly among
specific demographics that reflect broader societal changes. Here are the key
sections of Indian society where coffee is gaining traction: ### Younger Generations
One significant demographic contributing to the growing popularity of coffee is the
younger generation, particularly individuals aged between 20 to 40 years. With
approximately **56% of Indians** showing increased interest in coffee,
### Women
Women are playing a vital role in driving the increasing consumption of coffee. This
segment of the population has shown a marked interest in coffee as part of their
daily routines and socializing habits, reflecting changing attitude
### Affluent Backgrounds
Individuals from affluent backgrounds are also becoming more engaged with coffee.
Their increased disposable income allows them to explore different coffee
experiences, contributing to the rise of premium coffee consumption and the d
###Lower-Tier Cities
Interestingly, coffee is also making strides in lower-tier cities in India. As
cultural and social trends evolve, people in these regions are increasingly
embracing coffee, marking a shift in beverage preferences that were traditional
###Southern States
Southern states like **Karnataka**, **Kerala**, and **Tamil Nadu** are particularly
significant in the coffee landscape. These regions not only lead in coffee
production but also reflect a growing coffee culture among their residents
## Conclusion
The rise of coffee in India underscores a significant cultural shift, with younger
consumers, women, and individuals from affluent backgrounds spearheading its
popularity. Additionally, the engagement of lower-tier cities points to a

As the output above shows, when the mode is hybrid, both high-level and low-level keywords are derived from the query and appear at the top of the output.

The output covers the points relevant to our query and organizes the response under clearly relevant sections such as “Younger Generations”, “Women”, and “Affluent Backgrounds”.
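
If you capture this text and want the keywords separately from the answer, a small helper can split the two. This is a sketch under the assumption that the captured text begins with a JSON object like the one shown above; the function name `split_keywords_and_answer` is hypothetical and not part of LightRAG. `strict=False` lets the decoder tolerate raw line breaks inside strings, which can appear when the output is wrapped.

```python
import json

def split_keywords_and_answer(raw_output):
    """Split a leading JSON keyword object from the Markdown answer that follows it."""
    text = raw_output.lstrip()
    if not text.startswith("{"):
        return {}, text  # no keyword block, e.g. naive mode
    keywords, end = json.JSONDecoder(strict=False).raw_decode(text)
    return keywords, text[end:].lstrip()

sample = (
    '{"high_level_keywords": ["Indian society"], '
    '"low_level_keywords": ["Millennials", "Coffee shops"]}\n'
    "## Growing Popularity of Coffee in Indian Society\n"
    "Coffee consumption in India is witnessing a notable rise."
)
keywords, answer = split_keywords_and_answer(sample)
print(keywords["low_level_keywords"])
print(answer.splitlines()[0])
```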

Naive Mode

print(rag.query("Which section of Indian Society is Coffee getting traction in?", param=QueryParam(mode="naive")))

Output


Coffee is gaining significant traction primarily among the younger generations in
Indian society, particularly individuals aged 20 to 40. This demographic shift
indicates a growing acceptance and preference for coffee, which can be at Moreover,
southern states, including Karnataka, Kerala, and Tamil Nadu-which are also the main
coffee-producing regions-are leading the charge in this growing popularity of
coffee. The shift toward coffee as a social beverage is infl Overall, while tea
remains the dominant beverage in India, the ongoing cultural changes and the
evolving tastes of the younger population suggest a robust potential for coffee
consumption to expand further in this segment of society.

As the output above shows, no high-level or low-level keywords appear when the mode is naive.

The output is also a brief summary of only a few sentences, unlike the hybrid-mode output, which organized the response under different sections.
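
To compare modes side by side rather than by eye, a small helper can run the same question through each mode and report simple statistics. This is a sketch: `compare_modes` and the `query_fn` callable are hypothetical names, and the helper is tested here with a fake query function so that it runs without an API key.

```python
def compare_modes(query_fn, question, modes=("naive", "hybrid")):
    """Run one question through several modes and summarize each answer."""
    results = {}
    for mode in modes:
        answer = query_fn(question, mode)
        results[mode] = {
            "characters": len(answer),
            # Count Markdown headings as a rough proxy for how sectioned the answer is
            "headings": sum(line.lstrip().startswith("#") for line in answer.splitlines()),
            "answer": answer,
        }
    return results

def fake_query(question, mode):
    """Stand-in for a real query call, used only to demonstrate the helper."""
    if mode == "hybrid":
        return "## Younger Generations\nDetails\n## Women\nMore details"
    return "A short unsectioned answer."

report = compare_modes(fake_query, "Which section of Indian society is coffee getting traction in?")
for mode, stats in report.items():
    print(mode, stats["characters"], stats["headings"])
```

With the rag object from Step 3, query_fn could be written as lambda q, m: rag.query(q, param=QueryParam(mode=m)).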

Step 5: Querying on a Broad Level Question

Demonstrate LightRAG’s capability to summarize entire datasets by querying broader topics using hybrid and naive modes.

Hybrid Mode

print(rag.query("Summarize content of the article", param=QueryParam(mode="hybrid")))

Output


{
"high_level_keywords": ["Article", "Content summary"],
"low_level_keywords": ["Key points", "Main ideas", "Themes", "Conclusions"]
}
# Summary of Coffee Consumption Trends in India
Coffee consumption in India is rising, particularly among the younger generations,
which is a notable shift influenced by changing demographics and lifestyle
preferences. Approximately 56% of Indians are embracing coffee, with a dist:
## Growing Popularity and Cultural Influence
The influence of Western culture is a significant factor in this rising trend.
Through media and lifestyle changes, coffee has become synonymous with modern
socializing for young adults aged 20 to 40. As a result, coffee has establis

## Market Growth and Consumption Statistics
The coffee market in India witnessed significant growth, with consumption reaching
approximately 1.23 million bags (each weighing 60 kilograms) in the financial year
2022-2023. There is an optimistic outlook for the market, projectin
## Coffee Production and Export Trends
India stands as the sixth-largest coffee producer globally, with Karnataka
contributing about 70% of the total output. In 2023, the country produced over
393,000 metric tons of coffee. While India is responsible for about 80% of its

## Challenges and Opportunities
Despite the positive growth trajectory, coffee consumption faces certain challenges,
primarily regarding perceptions of being expensive and unhealthy among non-
consumers; tea continues to be the dominant beverage choice for many. How In
conclusion, the landscape of coffee consumption in India is undergoing rapid
evolution, driven by demographic shifts and cultural adaptations. With promising
growth potential and emerging niche segments, the future of coffee in In

As the output above shows, when the mode is hybrid, both high-level and low-level keywords are derived from the query and appear at the top of the output.

The output covers the points relevant to our query and organizes the response under sections such as “Growing Popularity and Cultural Influence” and “Market Growth and Consumption Statistics”, all of which are relevant to summarizing the article.

Naive Mode

print(rag.query("Summarize content of the article", param=QueryParam(mode="naive")))

Output


# Summary of Coffee Consumption in India
India is witnessing a notable rise in coffee consumption, fueled by demographic
shifts and changing lifestyle preferences, especially among younger generations.
This trend is primarily seen in women and youthful urbanites, and is part
## Growing Popularity
Approximately **56% of Indians** are embracing coffee, influenced by Western culture
and media, which have made it a popular beverage for social interactions among
those aged 20 to 40. This cultural integration points towards a shift
## Market Growth
In the financial year 2022-2023, coffee consumption in India surged to around **1.23
million bags**. The market forecasts a robust growth trajectory, estimating a
**9.87% CAGR** from 2023 to 2032. This growth is particularly evident
## Coffee Production
India ranks as the **sixth-largest producer** of coffee globally, with Karnataka
responsible for **70%** of the national output, totaling **393,000 metric tons** of
coffee produced in 2023. Although a significant portion (about 80%)
## Challenges and Opportunities
Despite the growth trajectory, coffee faces challenges, including perceptions of
being costly and unhealthy, which may deter non-consumers. Tea continues to hold a
dominant position in the beverage preference of many. However, the exit
## Conclusion
In conclusion, India's coffee consumption landscape is rapidly changing, driven by
demographic and cultural shifts. The growth potential is significant, particularly
within the specialty coffee sector, even as traditional tea drinking

As the output above shows, no high-level or low-level keywords appear when the mode is naive.

However, since this is a summary query, the naive-mode output is also a structured summary that covers the response under relevant sections, similar to what we saw in hybrid mode.

Conclusion

LightRAG offers a substantial improvement over traditional RAG systems by addressing key limitations such as inadequate contextual understanding and poor integration of information. Traditional systems often struggle with complex, multi-dimensional queries, resulting in fragmented or incomplete responses. In contrast, LightRAG’s graph-based text indexing and dual-level retrieval mechanisms enable it to better understand and retrieve information from intricate, interrelated entities and concepts. This results in more comprehensive, diverse, and empowering answers to complex queries.

Performance benchmarks demonstrate LightRAG’s superiority in terms of comprehensiveness, diversity, and overall answer quality, solidifying its position as a more effective solution for nuanced information retrieval. Through its integration of knowledge graphs and vector embeddings, LightRAG provides a sophisticated approach to understanding and answering complex questions, making it a significant advancement in the field of RAG systems.

Key Takeaways

  • Traditional RAG systems struggle to integrate complex, interconnected information across multiple entities. LightRAG overcomes this by using graph-based text indexing, enabling the system to comprehend and retrieve data based on the relationships between entities, leading to more coherent and complete answers.
  • LightRAG introduces a dual-level retrieval system that handles both specific and abstract queries. This allows for precise extraction of detailed data at a low level, and comprehensive insights at a high level, offering a more adaptable and accurate approach to diverse user queries.
  • LightRAG utilizes entity recognition and knowledge graph construction to map out relationships and connections across documents. This method optimizes the retrieval process, ensuring that the system accesses relevant, interlinked information rather than isolated, disconnected data points.
  • By combining graph structures with vector embeddings, LightRAG improves its contextual understanding of queries, allowing it to retrieve and integrate information more effectively. This ensures that responses are more contextually rich, addressing the nuanced relationships between entities and their attributes.

Frequently Asked Questions

Q1. What is LightRAG, and how does it differ from traditional RAG systems?

A. LightRAG is an advanced retrieval-augmented generation (RAG) system that overcomes the limitations of traditional RAG systems by utilizing graph-based text indexing and dual-level retrieval mechanisms. Unlike traditional RAG systems, which often struggle with understanding complex relationships between entities, LightRAG effectively integrates interconnected information, providing more comprehensive and contextually accurate responses.

Q2. How does LightRAG handle complex queries involving multiple topics?

A. LightRAG excels at handling complex queries by leveraging its knowledge graph construction and dual-level retrieval approach. It breaks down documents into smaller, manageable chunks, identifies key entities, and understands the relationships between them. It then retrieves both specific details at a low level and broader conceptual information at a high level, ensuring that responses address the entire scope of complex queries.

Q3. What are the key features of LightRAG that improve its performance?

A. The key features of LightRAG include graph-based text indexing, entity recognition, knowledge graph construction, and dual-level retrieval. These features allow LightRAG to understand and integrate complex relationships between entities, retrieve relevant data efficiently, and provide answers that are more comprehensive, diverse, and insightful compared to traditional RAG systems.

Q4. How does LightRAG improve the coherence and relevance of its responses?

A. LightRAG improves the coherence and relevance of its responses by combining graph structures with vector embeddings. This integration allows the system to capture the contextual relationships between entities, ensuring that the information retrieved is interconnected and contextually appropriate, leading to more coherent and relevant answers.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Nibedita completed her master’s in Chemical Engineering from IIT Kharagpur in 2014 and is currently working as a Senior Data Scientist. In her current capacity, she works on building intelligent ML-based solutions to improve business processes.
