The new langchain-kuzu integration package is now available on PyPI! This package bridges the powerful capabilities of LangChain with Kùzu’s cutting-edge graph database, enabling seamless transformation of unstructured text into structured graphs. Whether you’re a data scientist, developer, or AI enthusiast, this integration simplifies complex tasks like entity extraction, graph creation, and natural language querying. Let’s explore what makes this package a game-changer for your data workflows.
To get started, simply install the package on Google Colab:
pip install -U langchain-kuzu langchain-openai langchain-experimental
This installation includes dependencies for LangChain and Kùzu, along with support for LLMs like OpenAI’s GPT models. If you prefer other LLM providers, you can install their respective Python packages supported by LangChain.
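For example, if you wanted to use Anthropic’s Claude models instead, the setup might look like the sketch below (the langchain-anthropic package and the model name are illustrative and are not used in the rest of this walkthrough):

# Assumption: langchain-anthropic is installed (pip install langchain-anthropic)
# and the ANTHROPIC_API_KEY environment variable is set.
from langchain_anthropic import ChatAnthropic

# Any LangChain chat model can be passed wherever ChatOpenAI appears in the examples below.
llm = ChatAnthropic(model="claude-3-5-sonnet-latest", temperature=0)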
If you work with unstructured text data and want to create graph-based representations, this package is designed for you.
Key features include:
- Automatic extraction of entities and relationships from unstructured text using LLMs
- Flexible, user-defined graph schemas that control which node and relationship types are created
- Natural language querying of the resulting graph through a Text2Cypher pipeline
Let’s walk through a practical example to see this integration in action.
First, create a Kùzu database on your local machine and connect to it:
import kuzu
db = kuzu.Database("test_db")
conn = kuzu.Connection(db)
Kùzu’s integration with LangChain makes it convenient to create and update graphs from unstructured text, and also to query graphs via a Text2Cypher pipeline that leverages the power of LangChain’s LLM chains. To begin, create a KuzuGraph object by passing the database object created above to the KuzuGraph constructor.
from langchain_kuzu.graphs.kuzu_graph import KuzuGraph
graph = KuzuGraph(db, allow_dangerous_requests=True)
Imagine we want to transform the following text into a graph:
text = "Tim Cook is the CEO of Apple. Apple has its headquarters in California."
First, define the types of entities (nodes) and relationships you want to include.
# Define schema
allowed_nodes = ["Person", "Company", "Location"]
allowed_relationships = [
    ("Person", "IS_CEO_OF", "Company"),
    ("Company", "HAS_HEADQUARTERS_IN", "Location"),
]
Use the LLMGraphTransformer class to process the text into structured graph documents:
from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI
# Define the LLMGraphTransformer
llm_transformer = LLMGraphTransformer(
    llm=ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=OPENAI_API_KEY),  # noqa: F821
    allowed_nodes=allowed_nodes,
    allowed_relationships=allowed_relationships,
)
documents = [Document(page_content=text)]
graph_documents = llm_transformer.convert_to_graph_documents(documents)
Load the graph documents into Kùzu for further use:
from langchain_kuzu.graphs.kuzu_graph import KuzuGraph

graph = KuzuGraph(db, allow_dangerous_requests=True)
graph.add_graph_documents(graph_documents, include_source=True)
graph_documents[:2]
Note: If you get an error when constructing KuzuGraph, set the allow_dangerous_requests parameter to True (as shown above).
Output:
[GraphDocument(nodes=[Node(id='Tim Cook', type='Person', properties={}),
    Node(id='Apple', type='Company', properties={}),
    Node(id='California', type='Location', properties={})],
  relationships=[Relationship(source=Node(id='Tim Cook', type='Person', properties={}),
      target=Node(id='Apple', type='Company', properties={}),
      type='IS_CEO_OF', properties={}),
    Relationship(source=Node(id='Apple', type='Company', properties={}),
      target=Node(id='California', type='Location', properties={}),
      type='HAS_HEADQUARTERS_IN', properties={})],
  source=Document(metadata={},
    page_content='Tim Cook is the CEO of Apple. Apple has its headquarters in California.'))]
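As an optional sanity check (not part of the package’s own walkthrough), you can run a raw Cypher query through the kuzu connection opened earlier to confirm that the entities were written to the database:

# Verify the loaded graph with a direct Cypher query on the underlying connection.
result = conn.execute("MATCH (p:Person)-[:IS_CEO_OF]->(c:Company) RETURN p.id, c.id")
while result.has_next():
    print(result.get_next())  # expected: ['Tim Cook', 'Apple']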
With the KuzuQAChain, you can query the graph using natural language:
from langchain_kuzu.chains.graph_qa.kuzu import KuzuQAChain
# Create the KuzuQAChain with verbosity enabled to see the generated Cypher queries
chain = KuzuQAChain.from_llm(
    llm=ChatOpenAI(model="gpt-4o-mini", temperature=0.3, api_key=OPENAI_API_KEY),  # noqa: F821
    graph=graph,
    verbose=True,
    allow_dangerous_requests=True,
)
chain.invoke("Where is Apple headquartered?")
Output:
> Entering new KuzuQAChain chain...
Generated Cypher:
MATCH (c:Company {id: 'Apple'})-[:HAS_HEADQUARTERS_IN]->(l:Location) RETURN l
Full Context:
[{'l': {'_id': {'offset': 0, 'table': 1}, '_label': 'Location', 'id': 'California', 'type': 'entity'}}]
> Finished chain.
{'query': 'Where is Apple headquartered?',
'result': 'Apple is headquartered in California.'}
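You can ask further questions in the same way. For example, the query below exercises the other relationship type we defined (its output is omitted here):

chain.invoke("Who is the CEO of Apple?")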
The LangChain-Kùzu integration offers several advanced features to enhance your workflows:
- Custom graph schemas: restrict extraction to exactly the node and relationship types you define
- Separate LLMs for Cypher generation and answer generation via the cypher_llm and qa_llm parameters of KuzuQAChain (see the sketch below)
- Automatic schema refresh on each chain invocation, plus a manual refresh_schema() method on the KuzuGraph object
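Here is a minimal sketch of the two-LLM configuration, based on the cypher_llm and qa_llm parameters described in the FAQ below; the model choices are illustrative assumptions:

# A minimal sketch: one model translates questions into Cypher, another phrases the answer.
# Model names are illustrative; any LangChain chat models can be used.
chain = KuzuQAChain.from_llm(
    cypher_llm=ChatOpenAI(model="gpt-4o", temperature=0),
    qa_llm=ChatOpenAI(model="gpt-4o-mini", temperature=0.3),
    graph=graph,
    verbose=True,
    allow_dangerous_requests=True,
)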
Kùzu is a high-performance, embeddable graph database built for modern applications. Key highlights include:
- An embedded, in-process architecture, so there is no separate server to install or manage
- Cypher as its query language
- Fast bulk loading of data from sources such as CSV, JSON, and relational databases (a small sketch follows this list)
- An open-source codebase you can star and contribute to on GitHub
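As a quick illustration of loading data into Kùzu directly (independent of LangChain), here is a hedged sketch that assumes a fresh database and a hypothetical people.csv file with name and age columns:

import kuzu

# Assumption: a fresh database directory and a local people.csv with columns name,age.
db = kuzu.Database("demo_db")
conn = kuzu.Connection(db)
conn.execute("CREATE NODE TABLE People(name STRING, age INT64, PRIMARY KEY(name))")
conn.execute('COPY People FROM "people.csv" (HEADER=true)')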
Explore more in the Kùzu documentation.
To begin your journey, install the package, work through the example above, and visit the PyPI page for more detailed examples and updates. Don’t forget to star our repository on GitHub and share your feedback; your input drives our progress!
The langchain-kuzu integration redefines how you interact with unstructured data. Whether it’s transforming text into structured graphs or querying those graphs with natural language, this package unlocks powerful possibilities for AI-driven data insights. Try it today and discover a more intuitive way to work with graph data!
Q. How do I install the langchain-kuzu package?
A. Simply run the command pip install langchain-kuzu. Ensure you have Python 3.7 or later installed on your system.
Q. Which LLMs does the integration support?
A. The package supports OpenAI’s GPT models and can be extended to other LLM providers supported by LangChain.
Q. Can I define a custom graph schema?
A. Yes, you can define your own schema by specifying the nodes and relationships you want to extract from the text.
Q. How is the graph schema kept up to date?
A. The schema refreshes automatically when you invoke the chain. However, you can manually call the refresh_schema() method on the KuzuGraph object.
Q. Can I use different LLMs for Cypher generation and question answering?
A. Absolutely! You can configure separate LLMs for these tasks by specifying the cypher_llm and qa_llm parameters in the KuzuQAChain object.
Q. What data sources does Kùzu support?
A. Kùzu supports data from CSV, JSON, and relational databases, making it highly versatile.