In the fast-moving landscape of technology, data storage and retrieval are changing quickly. Imagine applications that can search and analyze vast amounts of information in milliseconds. This is the promise of vector databases, a technology that is changing the way we interact with data. In this article, we explore vector databases and their potential, focusing specifically on their role in building Large Language Model (LLM) applications. Join us as we look at how vector databases and LLMs fit together and how to use them to build data-driven applications.
For example, if you ask, “How do I change my language in the Android app?” to the Amazon customer service app, it might not have been trained on this exact text and hence might be unable to answer. This is where a vector database comes to the rescue. A vector database stores the domain texts (in this case, help docs) and past queries by all the users, including order history, etc., as numerical embeddings and provides a lookup of similar vectors in real-time. In this case, it encodes this query into a numerical vector and uses it to perform a similarity search in its database of vectors and find its closest neighbors. With this help, the chatbot can guide the user correctly to the “Change your language preference” section on the Amazon app.
This article was published as a part of the Data Science Blogathon.
Large Language Models (LLMs) are foundational machine learning models that use deep learning algorithms to process and understand natural language. These models are trained on massive amounts of text data to learn patterns and entity relationships in the language. LLMs can perform many types of language tasks, such as translating languages, analyzing sentiments, chatbot conversations, and more. They can understand complex textual data, identify entities and relationships between them, and generate new text that is coherent and grammatically accurate.
LLMs are trained on large amounts of data, often terabytes or even petabytes, and have billions or even trillions of parameters, which enables them to predict and generate relevant responses based on the user’s prompts or queries. They process input data through word embeddings, self-attention layers, and feedforward networks to generate meaningful text.
While LLMs seem to generate responses with quite high accuracy, even outperforming humans on many standardized tests, these models still have limitations. Firstly, they rely solely on their training data and hence may lack specific or current information. This can lead the model to generate incorrect or unusual responses, also known as “hallucinations.” There has been an ongoing effort to mitigate this. Secondly, the model may not behave or respond in a manner that aligns with the user’s expectations.
To address this, vector databases and embedding models enhance the knowledge of LLMs/Generative AI by providing lookups of similar content across modalities (text, image, video, etc.) for the information the user is seeking. Here is an example where LLMs do not have the response the user asks for and instead rely on a vector database to find that information.
Large Language Models (LLMs) are being used or integrated in many parts of industry, such as e-commerce, travel, search, content creation, and finance. These models rely on a relatively new type of database, known as a vector database, which stores text, images, videos, and other data as numerical representations called embeddings. This section highlights the fundamentals of vector databases and embeddings and, more significantly, focuses on how to use them to integrate with LLM applications.
A vector database is a database that stores embeddings and searches them in a high-dimensional space. These vectors are numerical representations of the features or attributes of the data. Using algorithms that calculate the distance or similarity between vectors, vector databases can quickly and efficiently retrieve similar data. Unlike traditional scalar-based databases that store data in rows or columns and use exact matching or keyword-based search methods, vector databases use vector indexes to search and compare a large collection of vectors in a very short amount of time (on the order of milliseconds) using techniques such as Approximate Nearest Neighbor (ANN) search.
AI models generate embeddings by feeding raw data such as text, video, or images into an embedding model such as word2vec. In the context of AI and machine learning, the features in these embeddings represent different dimensions of the data that are essential for understanding patterns, relationships, and underlying structures.
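One well-known family of ANN techniques is locality-sensitive hashing (LSH). The following is a minimal sketch of the idea using random hyperplanes, not a description of how any particular vector database builds its index. The function names and the number of hyperplanes are assumptions chosen for illustration.

```python
import random


def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplanes (as normal vectors) used to hash vectors."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def lsh_signature(vector, hyperplanes):
    """Hash a vector to a tuple of bits: one bit per hyperplane side."""
    return tuple(
        1 if sum(v * h for v, h in zip(vector, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def build_buckets(vectors, hyperplanes):
    """Group vector ids by signature so a query only scans one bucket."""
    buckets = {}
    for vec_id, vec in vectors.items():
        buckets.setdefault(lsh_signature(vec, hyperplanes), []).append(vec_id)
    return buckets


def candidates(query, buckets, hyperplanes):
    """Return ids sharing the query's bucket (true neighbors can be missed)."""
    return buckets.get(lsh_signature(query, hyperplanes), [])
```

Vectors that point in similar directions tend to fall on the same side of most hyperplanes and so land in the same bucket. The search is approximate because a close neighbor can still end up in a different bucket; real systems usually combine several hash tables or use other index structures to reduce such misses.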
Here is an example of how to generate word embeddings using word2vec.
1. Generate the model using your custom corpus of data or use a sample prebuilt model from Google or FastText. If you generate your own, you can save it to your file system as a “word2vec.model” file.
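import gensim
# Placeholder corpus: replace with your own list of tokenized sentences
corpus = [["the", "king", "rules", "the", "kingdom"],
          ["the", "queen", "rules", "the", "kingdom"]]
# Create a word2vec model (min_count=1 keeps every word of this tiny corpus)
model = gensim.models.Word2Vec(corpus, min_count=1)
# Save the model file
model.save('word2vec.model')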
2. Load the model, generate a vector embedding for an input word, and use it to get similar words in the vector embedding space.
import gensim
# Load the word2vec model
model = gensim.models.Word2Vec.load('word2vec.model')
# Get the vector for the word "king" (word vectors live on model.wv)
king_vector = model.wv['king']
# Get the most similar vectors to the king vector
# (the query word itself typically ranks first in this lookup)
similar_vectors = model.wv.similar_by_vector(king_vector, topn=5)
# Print each similar word with its similarity score
for word, score in similar_vectors:
    print(word, score)
3. Here are the top 5 words close to the input word. The exact words and scores depend on the corpus and model you use.
Output:
man 0.85
prince 0.78
queen 0.75
lord 0.74
emperor 0.72
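The scores in this output are cosine similarities between word vectors. As a rough illustration of what that number means, here is a minimal sketch that computes it by hand. The three-dimensional vectors are hypothetical values made up for this example and do not come from a trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional vectors, made up for illustration only
king = [0.9, 0.8, 0.1]
queen = [0.8, 0.9, 0.2]
apple = [0.1, 0.2, 0.9]

# Related words point in similar directions, so their score is higher
print(cosine_similarity(king, queen) > cosine_similarity(king, apple))  # True
```

Real word2vec vectors usually have a hundred or more dimensions, but the calculation is the same.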
At a high level, vector databases rely on embedding models for handling both the creation and querying of embeddings. On the ingestion path, the corpus content is encoded into vectors using the embedding model and stored in vector databases like Pinecone, ChromaDB, Weaviate, etc. On the read path, the application makes a query using sentences or words, and the embedding model again encodes it into a vector, which is then used to query the vector database and fetch the results.
LLM apps help with language tasks and belong to a broader class of models, Generative AI, which can generate images and videos in addition to text. In this section, we will learn how to build practical LLM/Generative AI applications using vector databases. I used the transformers and torch libraries for language models and Pinecone as a vector database. You can choose any language model for LLM apps/embeddings and any vector database for storage and searching.
To build a chatbot using a vector database, you encode your corpus into embeddings and store them in the vector database, encode each user query with the same embedding model, retrieve the most similar vectors, and pass the retrieved content to the language model to generate a response.
Here is a simple example of a chatbot application that uses a vector database and a language model:
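The ingestion and read paths described above can be sketched with a toy in-memory store. This is only an illustration under stated assumptions: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, `ToyVectorStore` is a hypothetical class rather than the API of Pinecone, ChromaDB, or Weaviate, and the search is an exact scan instead of an ANN index.

```python
import math
import re
import zlib

DIM = 64  # toy embedding size, chosen arbitrarily


def toy_embed(text):
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class ToyVectorStore:
    """In-memory stand-in for a vector database (exact search, no index)."""

    def __init__(self):
        self._items = []  # list of (vector, original text) pairs

    def upsert(self, text):
        # Ingestion path: encode the document, then store vector and text
        self._items.append((toy_embed(text), text))

    def query(self, text, top_k=1):
        # Read path: encode the query with the same model, then rank by
        # cosine similarity (a plain dot product, as vectors are normalized)
        q = toy_embed(text)
        scored = [(sum(a * b for a, b in zip(q, vec)), doc)
                  for vec, doc in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


store = ToyVectorStore()
store.upsert("Change your language preference in the app settings")
store.upsert("Track the delivery status of your order")
for score, doc in store.query("How do I change my language in the Android app?", top_k=2):
    print(round(score, 3), doc)
```

The important design point is that documents and queries go through the same embedding function, so their vectors are comparable. A real embedding model would capture meaning rather than word overlap, which is what this toy version relies on.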
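from pinecone import Pinecone
from sentence_transformers import SentenceTransformer
from transformers import AutoModelForCausalLM, AutoTokenizer
# Create an API client for the vector database (pinecone client v3+ interface)
pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("my-index")
# Assumption: the index was populated with this same embedding model and
# stores each document's text under the "text" metadata key
embedder = SentenceTransformer("all-MiniLM-L6-v2")
# Load the language model and its tokenizer
# ("gpt2" is a placeholder; any causal language model can be used)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
# Define a function to generate text
def generate_text(prompt):
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=100,
                             pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
# Define a function to retrieve the most similar vectors to the user's query vector
def retrieve_similar_vectors(query, top_k=3):
    # Encode the query text into a vector before searching
    query_vector = embedder.encode(query).tolist()
    results = index.query(vector=query_vector, top_k=top_k, include_metadata=True)
    return results.matches
# Define a function to generate a response to the user's query
def generate_response(query):
    # Retrieve the most similar vectors to the user's query vector
    similar_vectors = retrieve_similar_vectors(query)
    # Generate text based on the documents attached to the retrieved vectors
    context = "\n".join(match.metadata["text"] for match in similar_vectors)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate_text(prompt)
# Start the chatbot
while True:
    # Get the user's query
    query = input("What is your question? ")
    # Generate a response to the user's query
    response = generate_response(query)
    # Print the response
    print(response)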
This chatbot application will retrieve the most similar vectors to the user’s query vector from the vector database and then generate text using the language model based on the retrieved vectors.
ChatBot > What is your question?
User_A> How tall is the Eiffel Tower?
ChatBot>The height of the Eiffel Tower measures 324 meters (1,063 feet)
from its base to the top of its antenna.
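The step that turns retrieved content into model input can be sketched on its own. The `build_prompt` helper below is a hypothetical function, and the prompt wording is an assumption; it only shows one plausible way to combine the passages returned by the vector database with the user's question.

```python
def build_prompt(question, passages, max_passages=3):
    """Combine retrieved passages and the user's question into one prompt."""
    context_lines = [
        f"[{i}] {passage.strip()}"
        for i, passage in enumerate(passages[:max_passages], start=1)
    ]
    # Fall back to a marker when the lookup returned nothing
    context = "\n".join(context_lines) if context_lines else "(no relevant context found)"
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


# Hypothetical passage, as if returned by the vector database lookup
passages = ["The Eiffel Tower is 324 meters tall including its antenna."]
print(build_prompt("How tall is the Eiffel Tower?", passages))
```

Limiting the number of passages keeps the prompt within the language model's context window; the right limit depends on the model you choose.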
Let’s explore how to build an Image Generator app that uses both Generative AI and LLM libraries.
Here is a simple example of a program that integrates a vector database and a GAN to generate images:
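import torch
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer
from torchvision import transforms
# Create an API client for the vector database (pinecone client v3+ interface)
pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("my-index")
# Assumption: the index was populated with this same embedding model, and the
# stored vectors have the dimension the generator expects as input
embedder = SentenceTransformer("all-MiniLM-L6-v2")
# Load the GAN (a fully pickled model; only load files you trust)
generator = torch.load("generator.pt", weights_only=False)
generator.eval()
# Define a function to generate an image from a vector
def generate_image(vector):
    # Convert the vector to a tensor with a batch dimension of one
    tensor = torch.tensor(vector, dtype=torch.float32).unsqueeze(0)
    # Generate the image without tracking gradients
    with torch.no_grad():
        image = generator(tensor)[0]
    # Transform the image (assumed to be in the [0, 1] range) to a PIL image
    return transforms.ToPILImage()(image.clamp(0, 1))
# Start the image generator
while True:
    # Get the user's query
    query = input("What kind of image would you like to generate? ")
    # Retrieve the most similar vector to the user's query vector
    query_vector = embedder.encode(query).tolist()
    results = index.query(vector=query_vector, top_k=1, include_values=True)
    # Generate an image from the retrieved vector
    image = generate_image(results.matches[0].values)
    # Display the image
    image.show()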
This program will retrieve the most similar vector to the user’s query vector from the vector database and then generate an image using the GAN based on the retrieved vector.
ImageBot>What kind of image would you like to generate?
Me>An idyllic image of a mountain with a flowing river.
ImageBot> Wait a minute! Here you go...
You can customize this program to meet your specific needs. For example, you can train a GAN specialized in generating a particular type of image, such as portraits or landscapes.
Let’s explore how to build a movie recommendation app from a movie corpus. You can use a similar idea to build a recommendation system for products or other entities.
Here is an example of how to use the Pinecone API to recommend movies to users:
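from pinecone import Pinecone
# Create an API client (pinecone client v3+ interface)
pc = Pinecone(api_key="YOUR_API_KEY")
user_index = pc.Index("user-index")
movie_index = pc.Index("movie-index")
# Placeholder user ID; replace with the ID of the user to recommend for
user_id = "user-123"
# Get the user's vector
user_vector = user_index.fetch(ids=[user_id]).vectors[user_id].values
# Recommend movies to the user
# (assumes each movie vector stores its title under the "title" metadata key)
results = movie_index.query(vector=user_vector, top_k=5, include_metadata=True)
# Print the results
for match in results.matches:
    print(match.metadata["title"])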
Here is a sample recommendation for a user:
The Shawshank Redemption
The Dark Knight
Inception
The Godfather
Pulp Fiction
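The example above assumes a user vector already exists in the database. One common way to obtain one, assumed here purely for illustration, is to average the vectors of the movies the user has watched and then rank the remaining movies by cosine similarity. The two-dimensional embeddings below are hypothetical values, not real model output.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def recommend(user_history, movie_vectors, top_k=2):
    """Rank unseen movies by cosine similarity to the user's taste vector."""
    taste = mean_vector([movie_vectors[title] for title in user_history])
    scored = [
        (cosine(taste, vec), title)
        for title, vec in movie_vectors.items()
        if title not in user_history
    ]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]


# Hypothetical 2-d embeddings, made up for illustration only
movie_vectors = {
    "The Godfather": [0.9, 0.1],
    "Pulp Fiction": [0.8, 0.2],
    "Inception": [0.2, 0.9],
    "The Dark Knight": [0.6, 0.5],
}
print(recommend(["The Godfather"], movie_vectors))  # ['Pulp Fiction', 'The Dark Knight']
```

In a real system the movie vectors would come from an embedding model, and the ranking step would be handled by the vector database's similarity search rather than a Python loop.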
Vector databases are useful for building various LLM applications, such as image generation, movie or product recommendations, and chatbots. They provide LLM apps with additional or similar information that the underlying models have not been trained on. They store vector embeddings efficiently in a high-dimensional space and use nearest-neighbor search to find similar embeddings with high accuracy.
The key takeaways from this article are that vector databases are highly suitable for LLM applications and offer the following significant features for users to integrate with:
In addition, I would like to highlight that many existing SQL/NoSQL solutions already add vector embedding storage, indexing, and similarity search features, e.g., PostgreSQL and Redis.
A. LLMs are advanced Artificial Intelligence (AI) programs trained on a large corpus of text data using neural networks to mimic human-like responses with context. They can predict, answer, and generate textual data in the domain they have been trained on.
A. Embeddings are numerical representations of text, images, video, or other data formats. They make colocating and finding semantically similar objects easier in a high-dimensional space.
A. A vector database stores and queries high-dimensional vector embeddings to find similar vectors using nearest-neighbor algorithms such as locality-sensitive hashing. LLM apps/Generative AI need them to provide additional lookups for similar vectors instead of fine-tuning the LLMs themselves.
A. Vector databases are niche databases that help index and search vector embeddings. They are widely popular in the open-source community, and many organizations and apps are integrating with them. However, many existing SQL/NoSQL databases are adding similar capabilities, so the developer community will have many options in the near future.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.