As Large Language Models continue to evolve at a fast pace, enhancing their ability to leverage external knowledge has become a major challenge. Retrieval-Augmented Generation techniques improve model output by integrating relevant information during generation, but traditional RAG systems can be complex and resource-heavy. To address this, the HKU Data Science Lab has developed LightRAG, a more efficient alternative. LightRAG combines the power of knowledge graphs with vector retrieval, enabling it to process textual information effectively while preserving the structured relationships between data.
Current RAG systems face significant challenges that limit their effectiveness. One major issue is that many rely on simple, flat data representations, which restrict their ability to comprehend and retrieve information based on the complex relationships between entities. Another key drawback is the lack of contextual understanding, making it difficult for these systems to maintain coherence across different entities and their connections. This often leads to responses that fail to fully address user queries.
For instance, if a user asks, “How does the rise of electric vehicles affect urban air quality and public transportation infrastructure?”, existing RAG systems might retrieve individual documents on electric vehicles, air pollution, and public transportation, but they may struggle to integrate this information into a unified answer. These systems could fail to explain how electric vehicles can improve air quality, which in turn influences the planning of public transportation systems. As a result, users may receive fragmented and incomplete answers that overlook the complex relationships between these topics.
LightRAG revolutionizes information retrieval by leveraging graph-based indexing and dual-level retrieval mechanisms. These innovations enable it to handle complex queries efficiently while preserving the relationships between entities for context-rich responses.
Since queries typically fall into two types, either very specific or abstract in nature, LightRAG employs a dual-level retrieval mechanism to handle both: low-level retrieval for specific entities and their attributes, and high-level retrieval for broader topics and themes.
High token consumption and a large number of API calls to the LLM: in the retrieval phase, GraphRAG generates a large number of communities, many of which are actively used for retrieval while a query is processed. Each community report averages a very high number of tokens, resulting in extremely high total token consumption. Additionally, GraphRAG's requirement to traverse each community individually leads to hundreds of API calls, significantly increasing retrieval overhead.
For each query, LightRAG uses the LLM to generate relevant keywords. Like existing Retrieval-Augmented Generation (RAG) systems, its retrieval mechanism relies on vector-based search; however, instead of retrieving text chunks as in conventional RAG, it retrieves entities and relationships. This incurs far less retrieval overhead than the community-based traversal used in GraphRAG.
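To make this concrete, here is a minimal sketch of dual-level retrieval. It is illustrative only: the prompt text, the extract_keywords helper, and the entity_index / relation_index objects (assumed to expose a search(text, top_k) method over vector embeddings) are hypothetical stand-ins, not LightRAG's actual internals.

import json

def extract_keywords(llm, query: str) -> dict:
    # Ask the LLM for low-level (specific) and high-level (abstract) keywords.
    # `llm` is assumed to be a callable that takes a prompt string and returns text.
    prompt = (
        "Extract keywords from the query below.\n"
        'Return JSON: {"high_level_keywords": [...], "low_level_keywords": [...]}\n'
        f"Query: {query}"
    )
    return json.loads(llm(prompt))

def dual_level_retrieve(llm, query, entity_index, relation_index, top_k=5):
    # Low-level keywords search the entity index; high-level keywords search
    # the relationship index.
    kw = extract_keywords(llm, query)
    local_hits = [hit for k in kw["low_level_keywords"]
                  for hit in entity_index.search(k, top_k=top_k)]
    global_hits = [hit for k in kw["high_level_keywords"]
                   for hit in relation_index.search(k, top_k=top_k)]
    # The merged entity/relationship context is what the LLM then uses to
    # generate the final answer.
    return local_hits + global_hits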
To evaluate LightRAG's performance against traditional RAG frameworks, a robust LLM, specifically GPT-4o-mini, was used to rank each baseline against LightRAG. Four evaluation dimensions were used: Comprehensiveness, Diversity, Empowerment, and Overall.
The LLM directly compares the two answers on each dimension and selects the superior response for each criterion. After identifying the winner on the first three dimensions, the LLM combines those judgments to determine the overall better answer. Win rates are then calculated across all queries, leading to the final results.
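A rough sketch of how this pairwise judging can be scripted is shown below. The judge prompt, the judge callable, and the simple majority vote for the overall winner are assumptions made for illustration; the actual evaluation prompts in the LightRAG paper ask the judge LLM to make the overall call itself.

from collections import Counter

DIMENSIONS = ["Comprehensiveness", "Diversity", "Empowerment"]

def pairwise_judge(judge, question, answer_a, answer_b):
    # `judge` is assumed to be an LLM callable (e.g. GPT-4o-mini) returning 'A' or 'B'.
    winners = {}
    for dim in DIMENSIONS:
        prompt = (
            f"Question: {question}\n\nAnswer A: {answer_a}\n\nAnswer B: {answer_b}\n\n"
            f"Which answer is better in terms of {dim}? Reply with only 'A' or 'B'."
        )
        winners[dim] = judge(prompt).strip()
    # Simplification: take a majority vote across the three dimensions as the
    # overall winner.
    winners["Overall"] = Counter(winners[d] for d in DIMENSIONS).most_common(1)[0][0]
    return winners

def win_rate(results, system="A"):
    # Fraction of overall comparisons won by `system` across all evaluated queries.
    return sum(1 for r in results if r["Overall"] == system) / len(results)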
As the evaluation table shows, four domains were used: Agricultural, Computer Science, Legal, and Mixed. The Mixed domain draws on a rich variety of literary, biographical, and philosophical texts spanning a broad range of disciplines, including cultural, historical, and philosophical studies.
Below, we walk through the implementation step by step on Google Colab using an OpenAI model:
Install the required libraries, including LightRAG, vector database tools, and Ollama, to set up the environment for implementation.
!pip install lightrag-hku
!pip install aioboto3
!pip install tiktoken
!pip install nano_vectordb
# Install Ollama
!sudo apt update
!sudo apt install -y pciutils
!pip install langchain-ollama
!curl -fsSL https://ollama.com/install.sh | sh
!pip install ollama==0.4.2
Import essential libraries, define the OPENAI_API_KEY, and prepare the setup for querying using OpenAI's models.
from lightrag import LightRAG, QueryParam
from lightrag.llm import gpt_4o_mini_complete, gpt_4o_complete
import os
os.environ['OPENAI_API_KEY'] = ""  # paste your OpenAI API key here
Initialize LightRAG, define the working directory, and load data into the model using a sample text file for processing.
import nest_asyncio
nest_asyncio.apply()

WORKING_DIR = "./content"

if not os.path.exists(WORKING_DIR):
    os.mkdir(WORKING_DIR)

rag = LightRAG(
    working_dir=WORKING_DIR,
    llm_model_func=gpt_4o_mini_complete  # Use the gpt-4o-mini model
    # llm_model_func=gpt_4o_complete  # Optionally, use a stronger model
)

# Insert data
with open("./Coffee.txt") as f:
    rag.insert(f.read())
nest_asyncio is particularly useful in environments where asynchronous code must run without conflicting with an existing event loop. Because rag.insert() starts its own event loop internally, calling it inside the notebook would otherwise clash with the loop Colab is already running; nest_asyncio.apply() lets the nested loops coexist.
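For context, this is roughly what nest_asyncio changes in a notebook; it is a standalone sketch, independent of the LightRAG pipeline.

import asyncio
import nest_asyncio

async def say_hello():
    return "hello"

# Colab/Jupyter already runs an event loop, so calling asyncio.run() here would
# normally raise a RuntimeError about a loop already running. Patching with
# nest_asyncio allows the nested call, which is what rag.insert() relies on.
nest_asyncio.apply()
print(asyncio.run(say_hello()))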
We use this text file for querying: https://github.com/mimiwb007/LightRAG/blob/main/Coffee.txt. It can be downloaded from GitHub and then uploaded to the Colab working directory.
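Alternatively, the file can be fetched directly inside Colab. Below is a small sketch that assumes the raw file is reachable at GitHub's usual raw.githubusercontent.com path for that repository and branch; adjust the URL if the layout differs.

import requests

# Hypothetical raw URL derived from the GitHub page linked above.
url = "https://raw.githubusercontent.com/mimiwb007/LightRAG/main/Coffee.txt"
with open("./Coffee.txt", "wb") as f:
    f.write(requests.get(url, timeout=30).content)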
Use hybrid or naive modes to query the dataset for specific questions, showcasing LightRAG’s ability to retrieve detailed and relevant answers.
print(rag.query("Which section of Indian Society is Coffee getting traction in?", param=QueryParam(mode="hybrid")))
Output
{
  "high_level_keywords": ["Indian society", "Coffee consumption", "Cultural trends"],
  "low_level_keywords": ["Urban areas", "Millennials", "Coffee shops", "Specialty coffee", "Consumer behavior"]
}
## Growing Popularity of Coffee in Indian Society
Coffee consumption in India is witnessing a notable rise, particularly among specific demographics that reflect broader societal changes. Here are the key sections of Indian society where coffee is gaining traction:
### Younger Generations
One significant demographic contributing to the growing popularity of coffee is the younger generation, particularly individuals aged between 20 to 40 years. With approximately **56% of Indians** showing increased interest in coffee,
### Women
Women are playing a vital role in driving the increasing consumption of coffee. This segment of the population has shown a marked interest in coffee as part of their daily routines and socializing habits, reflecting changing attitude
### Affluent Backgrounds
Individuals from affluent backgrounds are also becoming more engaged with coffee. Their increased disposable income allows them to explore different coffee experiences, contributing to the rise of premium coffee consumption and the d
### Lower-Tier Cities
Interestingly, coffee is also making strides in lower-tier cities in India. As cultural and social trends evolve, people in these regions are increasingly embracing coffee, marking a shift in beverage preferences that were traditional
### Southern States
Southern states like **Karnataka**, **Kerala**, and **Tamil Nadu** are particularly significant in the coffee landscape. These regions not only lead in coffee production but also reflect a growing coffee culture among their residents
## Conclusion
The rise of coffee in India underscores a significant cultural shift, with younger consumers, women, and individuals from affluent backgrounds spearheading its popularity. Additionally, the engagement of lower-tier cities points to a
As seen in the output above, both high-level and low-level keywords are generated from the query when we choose the hybrid mode.
The output covers all points relevant to our query, organizing the response under highly pertinent sections such as "Younger Generations", "Women", and "Affluent Backgrounds".
print(rag.query("Which section of Indian Society is Coffee getting traction in?", param=QueryParam(mode="naive")))
Output
Coffee is gaining significant traction primarily among the younger generations in Indian society, particularly individuals aged 20 to 40. This demographic shift indicates a growing acceptance and preference for coffee, which can be at
Moreover, southern states, including Karnataka, Kerala, and Tamil Nadu (which are also the main coffee-producing regions), are leading the charge in this growing popularity of coffee. The shift toward coffee as a social beverage is infl
Overall, while tea remains the dominant beverage in India, the ongoing cultural changes and the evolving tastes of the younger population suggest a robust potential for coffee consumption to expand further in this segment of society.
As seen in the output above, no high-level or low-level keywords are generated when we choose the naive mode.
Also, the output is summarized in just a few lines, unlike the hybrid-mode output, which organized the response under different sections.
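For a quick side-by-side comparison, the same question can be run across LightRAG's retrieval modes in a single loop. This sketch assumes the local and global modes exposed by the LightRAG package alongside naive and hybrid; they correspond to the low-level (entity-focused) and high-level (relationship-focused) retrieval paths.

question = "Which section of Indian Society is Coffee getting traction in?"

# "naive" is plain chunk-based vector search; "local" and "global" use the
# low-level and high-level retrieval paths respectively; "hybrid" combines both.
for mode in ["naive", "local", "global", "hybrid"]:
    print(f"\n===== mode: {mode} =====")
    print(rag.query(question, param=QueryParam(mode=mode)))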
Demonstrate LightRAG’s capability to summarize entire datasets by querying broader topics using hybrid and naive modes.
Hybrid Mode
print(rag.query("Summarize content of the article", param=QueryParam(mode="hybrid")))
Output
{
"high_level_keywords": ["Article", "Content summary"],
"low_level_keywords": ["Key points", "Main ideas", "Themes", "Conclusions"]
}
# Summary of Coffee Consumption Trends in India
Coffee consumption in India is rising, particularly among the younger generations, which is a notable shift influenced by changing demographics and lifestyle preferences. Approximately 56% of Indians are embracing coffee, with a dist:
## Growing Popularity and Cultural Influence
The influence of Western culture is a significant factor in this rising trend. Through media and lifestyle changes, coffee has become synonymous with modern socializing for young adults aged 20 to 40. As a result, coffee has establis
## Market Growth and Consumption Statistics
The coffee market in India witnessed significant growth, with consumption reaching approximately 1.23 million bags (each weighing 60 kilograms) in the financial year 2022-2023. There is an optimistic outlook for the market, projectin
## Coffee Production and Export Trends
India stands as the sixth-largest coffee producer globally, with Karnataka contributing about 70% of the total output. In 2023, the country produced over 393,000 metric tons of coffee. While India is responsible for about 80% of its
## Challenges and Opportunities
Despite the positive growth trajectory, coffee consumption faces certain challenges, primarily regarding perceptions of being expensive and unhealthy among non-consumers; tea continues to be the dominant beverage choice for many. How
In conclusion, the landscape of coffee consumption in India is undergoing rapid evolution, driven by demographic shifts and cultural adaptations. With promising growth potential and emerging niche segments, the future of coffee in In
As seen in the output above, both high-level and low-level keywords are generated from the query when we choose the hybrid mode.
The output covers all points relevant to our query, organizing the response under sections such as "Growing Popularity and Cultural Influence" and "Market Growth and Consumption Statistics", which are exactly what a summary of the article calls for.
print(rag.query("Summarize content of the article", param=QueryParam(mode="naive")))
Output
# Summary of Coffee Consumption in India
India is witnessing a notable rise in coffee consumption, fueled by demographic shifts and changing lifestyle preferences, especially among younger generations. This trend is primarily seen in women and youthful urbanites, and is part
## Growing Popularity
Approximately **56% of Indians** are embracing coffee, influenced by Western culture and media, which have made it a popular beverage for social interactions among those aged 20 to 40. This cultural integration points towards a shift
## Market Growth
In the financial year 2022-2023, coffee consumption in India surged to around **1.23 million bags**. The market forecasts a robust growth trajectory, estimating a **9.87% CAGR** from 2023 to 2032. This growth is particularly evident
## Coffee Production
India ranks as the **sixth-largest producer** of coffee globally, with Karnataka responsible for **70%** of the national output, totaling **393,000 metric tons** of coffee produced in 2023. Although a significant portion (about 80%)
## Challenges and Opportunities
Despite the growth trajectory, coffee faces challenges, including perceptions of being costly and unhealthy, which may deter non-consumers. Tea continues to hold a dominant position in the beverage preference of many. However, the exit
## Conclusion
In conclusion, India's coffee consumption landscape is rapidly changing, driven by demographic and cultural shifts. The growth potential is significant, particularly within the specialty coffee sector, even as traditional tea drinking
As seen in the output above, no high-level or low-level keywords are generated when we choose the naive mode.
However, since this is a summarization query, the output is still in summarized form and covers the response under relevant sections, much like the hybrid-mode output.
LightRAG offers a substantial improvement over traditional RAG systems by addressing key limitations such as inadequate contextual understanding and poor integration of information. Traditional systems often struggle with complex, multi-dimensional queries, resulting in fragmented or incomplete responses. In contrast, LightRAG’s graph-based text indexing and dual-level retrieval mechanisms enable it to better understand and retrieve information from intricate, interrelated entities and concepts. This results in more comprehensive, diverse, and empowering answers to complex queries.
Performance benchmarks demonstrate LightRAG’s superiority in terms of comprehensiveness, diversity, and overall answer quality, solidifying its position as a more effective solution for nuanced information retrieval. Through its integration of knowledge graphs and vector embeddings, LightRAG provides a sophisticated approach to understanding and answering complex questions, making it a significant advancement in the field of RAG systems.
Q. What is LightRAG, and how does it differ from traditional RAG systems?
A. LightRAG is an advanced retrieval-augmented generation (RAG) system that overcomes the limitations of traditional RAG systems by utilizing graph-based text indexing and dual-level retrieval mechanisms. Unlike traditional RAG systems, which often struggle with understanding complex relationships between entities, LightRAG effectively integrates interconnected information, providing more comprehensive and contextually accurate responses.
Q. How does LightRAG handle complex queries?
A. LightRAG excels at handling complex queries by leveraging its knowledge graph construction and dual-level retrieval approach. It breaks down documents into smaller, manageable chunks, identifies key entities, and understands the relationships between them. It then retrieves both specific details at a low level and broader conceptual information at a high level, ensuring that responses address the entire scope of complex queries.
Q. What are the key features of LightRAG?
A. The key features of LightRAG include graph-based text indexing, entity recognition, knowledge graph construction, and dual-level retrieval. These features allow LightRAG to understand and integrate complex relationships between entities, retrieve relevant data efficiently, and provide answers that are more comprehensive, diverse, and insightful compared to traditional RAG systems.
Q. How does LightRAG improve the coherence and relevance of its responses?
A. LightRAG improves the coherence and relevance of its responses by combining graph structures with vector embeddings. This integration allows the system to capture the contextual relationships between entities, ensuring that the information retrieved is interconnected and contextually appropriate, leading to more coherent and relevant answers.