How to Build Large Language Model Application Using Vector Database?

Apurva Kumar Last Updated : 10 Mar, 2025


Vector databases are changing how applications store and retrieve data. By representing text, images, and other content as numerical vectors, they let applications search and compare large volumes of information quickly. In this article, we explore vector databases and their potential, focusing on their role in building Large Language Model (LLM) applications. We look at how LLMs, embedding models, and vector databases fit together, and walk through code for building LLM apps on top of a vector database.

For example, if you ask the Amazon customer service app, “How do I change my language in the Android app?”, the model behind it might not have been trained on this exact text and hence might be unable to answer. This is where a vector database comes to the rescue. A vector database stores the domain texts (in this case, help docs), past user queries, order history, and so on as numerical embeddings, and provides a real-time lookup of similar vectors. In this case, the query is encoded into a numerical vector, which is used to run a similarity search over the stored vectors and find its closest neighbors. With this help, the chatbot can guide the user to the “Change your language preference” section of the Amazon app.

Learning Objectives

How LLMs work, what their limitations are, and why they need vector databases.
An introduction to embedding models and how to encode and use embeddings in applications.
What a vector database is and how it fits into LLM application architecture.
How to code LLM/Generative AI applications using vector databases with the transformers and torch libraries.

This article was published as a part of the Data Science Blogathon.

What are LLMs?
How do LLMs work?
Limitations of LLMs
LLMs and Vector Databases
A Quick Tutorial on Embeddings
LLM Application Architecture
LLM Applications Using Vector Databases
Building a Chatbot App
Building an Image Generator App
Building a Movie Recommendation LLM App
Real-world Use Cases of LLM Apps Using Vector Search/Databases

Frequently Asked Questions

What are LLMs?

Large Language Models (LLMs) are foundational machine learning models that use deep learning algorithms to process and understand natural language. These models are trained on massive amounts of text data to learn patterns and entity relationships in the language. LLMs can perform many types of language tasks, such as translating languages, analyzing sentiments, chatbot conversations, and more. They can understand complex textual data, identify entities and relationships between them, and generate new text that is coherent and grammatically accurate.

How do LLMs work?

LLMs are trained on very large amounts of text, often terabytes or more, and contain billions or even trillions of parameters. This scale enables them to predict and generate relevant responses based on the user’s prompts or queries. They process input through word embeddings, self-attention layers, and feedforward networks to generate meaningful text.
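To make the self-attention step a little more concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It is a simplification under stated assumptions: real models first apply learned projection matrices to obtain the queries, keys, and values, and use many attention heads. The function names and the toy two-dimensional token vectors are illustrative only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Score every key against this query, scaled by sqrt(dimension).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        outputs.append([
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ])
    return outputs

# Toy token embeddings (hypothetical values). Using the same vectors as
# queries, keys, and values skips the learned projections of a real model.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(tokens, tokens, tokens)
print(contextual)
```

Each output vector mixes information from all tokens, weighted by how strongly the token's query matches the other tokens' keys.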

Limitations of LLMs

While LLMs generate responses with quite high accuracy, in many standardized tests even better than humans, these models still have limitations. Firstly, they rely solely on their training data for their reasoning and hence may lack specific or current information. This can lead the model to generate incorrect or unusual responses, also known as “hallucinations,” and mitigating them is an ongoing effort. Secondly, the model may not behave or respond in a manner that aligns with the user’s expectations.

To address this, vector databases and embedding models extend the knowledge of LLMs/Generative AI by providing lookups of content (text, images, video, etc.) similar to what the user is asking about. When the LLM does not have the answer the user asks for, the application can rely on a vector database to find that information instead.

LLMs and Vector Databases

Large Language Models (LLMs) are being used or integrated in many parts of industry, such as e-commerce, travel, search, content creation, and finance. These applications often rely on a relatively new type of database, known as a vector database, which stores text, images, videos, and other data as numerical representations called embeddings. This section covers the fundamentals of vector databases and embeddings and, more importantly, how to integrate them with LLM applications.

A vector database stores embeddings and searches them in a high-dimensional space. These vectors are numerical representations of the data’s features or attributes. Using algorithms that calculate the distance or similarity between vectors in that space, vector databases can quickly and efficiently retrieve similar data. Traditional scalar-based databases store data in rows or columns and use exact matching or keyword-based search. Vector databases instead use vector indexes to search and compare a large collection of vectors in a very short time (on the order of milliseconds), using techniques such as Approximate Nearest Neighbor (ANN) search.
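As a rough illustration of what "similarity between vectors" means, the sketch below runs an exact nearest-neighbor search using cosine similarity in plain Python. This is the brute-force baseline that ANN indexes approximate, not how a production vector database is implemented. The three-dimensional vectors and the document keys are hypothetical; real embeddings typically have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, vectors, k=2):
    """Return the k (key, score) pairs most similar to the query, best first."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical embeddings of three help documents.
help_docs = {
    "change-language": [0.9, 0.1, 0.0],
    "track-order": [0.1, 0.9, 0.1],
    "return-item": [0.0, 0.8, 0.6],
}

# Stands in for the encoded user question.
query_vector = [0.8, 0.2, 0.1]
top = nearest_neighbors(query_vector, help_docs, k=2)
print(top)
```

The exact search compares the query against every stored vector, which is why large collections need ANN techniques to stay within millisecond latencies.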

A Quick Tutorial on Embeddings

AI models generate embeddings by feeding raw data such as text, video, or images to an embedding model or library such as word2vec. The result is a vector of numerical features. In the context of AI and machine learning, these features represent different dimensions of the data that are essential for understanding patterns, relationships, and underlying structures.

Here is an example of how to generate word embeddings using word2vec.

1. Generate the model using your custom corpus of data or use a sample prebuilt model from Google or FastText. If you generate your own, you can save it to your file system as a “word2vec.model” file.

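import gensim

# corpus is an iterable of tokenized sentences (lists of words).
# This tiny sample is only a placeholder; use your own data in practice.
corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "with", "the", "king"],
]

# Create a word2vec model (min_count=1 keeps every word of the small sample)
model = gensim.models.Word2Vec(corpus, min_count=1)

# Save the model file
model.save('word2vec.model')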

2. Load the model, generate a vector embedding for an input word, and use it to get similar words in the vector embedding space.

import gensim

# Load the word2vec model
model = gensim.models.Word2Vec.load('word2vec.model')

# Get the vector for the word "king" (word vectors live under model.wv)
king_vector = model.wv['king']

# Get the most similar vectors to the king vector.
# Ask for one extra result because "king" itself is the closest match.
similar_vectors = model.wv.similar_by_vector(king_vector, topn=6)

# Print the most similar words and their similarity scores
for word, score in similar_vectors:
    if word != 'king':
        print(word, round(score, 2))

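3. Here are the top 5 words close to the input word. The exact words and scores depend on the corpus the model was trained on, so treat the output below as illustrative.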

Output:

man 0.85
prince 0.78
queen 0.75
lord 0.74
emperor 0.72

LLM Application Architecture

At a high level, vector databases rely on embedding models for both creating and querying embeddings. On the ingestion path, the corpus content is encoded into vectors by the embedding model and stored in a vector database such as Pinecone, ChromaDB, or Weaviate. On the read path, the application issues a query as a sentence or words; the same embedding model encodes it into a vector, which is then used to query the vector database and fetch the results.
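The two paths can be sketched with a small in-memory store. This is a toy under stated assumptions, not a real vector database: the `embed` function is a bag-of-words stand-in for an embedding model, and `InMemoryVectorStore` with its `ingest` and `query` methods is a hypothetical class invented for this illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class InMemoryVectorStore:
    """Hypothetical minimal store mirroring the ingestion and read paths."""

    def __init__(self):
        self.records = []  # list of (vector, original text)

    def ingest(self, text):
        # Ingestion path: encode the content, then store vector and payload.
        self.records.append((embed(text), text))

    def query(self, text, top_k=1):
        # Read path: encode the query with the same model, then rank the
        # stored vectors by similarity to the query vector.
        query_vector = embed(text)
        scored = [(cosine(query_vector, vec), doc) for vec, doc in self.records]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:top_k]]

store = InMemoryVectorStore()
store.ingest("change your language preference in the app settings")
store.ingest("track the delivery status of your order")
result = store.query("how do I change my language", top_k=1)
print(result)
```

The important design point is that ingestion and querying share one embedding function, so stored vectors and query vectors live in the same space.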

LLM Applications Using Vector Databases

LLM apps help with language tasks and belong to a broader class of models, Generative AI, that can generate images and videos in addition to text. In this section, we will learn how to build practical LLM/Generative AI applications using vector databases. I used the transformers and torch libraries for language models and Pinecone as the vector database. You can choose any language model for LLM apps/embeddings and any vector database for storage and search.

Building a Chatbot App

To build a chatbot using a vector database, you can follow these steps:

Choose a vector database such as Pinecone, Chroma, Weaviate, Amazon Kendra, etc.
Create a vector index for your chatbot.
Train a language model, or pick a pretrained one, using a large text corpus of your choice. For example, for a news chatbot, you can feed in news data.
Integrate the vector database and the language model.

Here is a simple example of a chatbot application that uses a vector database and a language model:

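import transformers
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

# Create an API client for the vector database and open an existing index.
# Assumes "my-index" already holds document embeddings whose metadata stores
# the original passage under the key "text".
client = Pinecone(api_key="YOUR_API_KEY")
index = client.Index("my-index")

# Load an embedding model for queries. The index must have been built with the
# same model (all-MiniLM-L6-v2 is an illustrative choice).
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Load the language model. "google/bigbird-roberta-base" is an encoder and does
# not generate text, so a small causal model (gpt2) is used here instead.
generator = transformers.pipeline("text-generation", model="gpt2")

# Define a function to generate text
def generate_text(prompt):
    outputs = generator(prompt, max_new_tokens=100, return_full_text=False)
    return outputs[0]["generated_text"]

# Define a function to retrieve the passages most similar to the user's query
def retrieve_similar_passages(query, top_k=3):
    query_vector = embedder.encode(query).tolist()
    results = index.query(vector=query_vector, top_k=top_k, include_metadata=True)
    return [match.metadata["text"] for match in results.matches]

# Define a function to generate a response to the user's query
def generate_response(query):
    # Retrieve the passages closest to the user's query vector
    passages = retrieve_similar_passages(query)

    # Generate text with the retrieved passages supplied as context
    context = "\n".join(passages)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate_text(prompt)

# Start the chatbot
while True:
    # Get the user's query
    query = input("What is your question? ")

    # Generate a response to the user's query
    response = generate_response(query)

    # Print the response
    print(response)

This chatbot application encodes the user’s query as a vector, retrieves the most similar entries from the vector database, and then generates text with the language model using the retrieved passages as context.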

ChatBot> What is your question?
User_A> How tall is the Eiffel Tower?
ChatBot> The height of the Eiffel Tower measures 324 meters (1,063 feet) from its base to the top of its antenna.

Building an Image Generator App

Let’s explore how to build an image generator app that uses both Generative AI and LLM libraries.

Create a vector database to store your image vectors.
Extract image vectors from your training data.
Insert the image vectors into the vector database.
Train a generative adversarial network (GAN).
Integrate the vector database and the GAN.

Here is a simple example of a program that integrates a vector database and a GAN to generate images:

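import torch
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer
from torchvision import transforms

# Create an API client for the vector database and open an existing index.
# Assumes "my-index" stores image vectors that the generator accepts as input.
client = Pinecone(api_key="YOUR_API_KEY")
index = client.Index("my-index")

# Load a text encoder that maps the query into the same space as the stored
# image vectors (a CLIP model is an illustrative assumption).
encoder = SentenceTransformer("clip-ViT-B-32")

# Load the GAN generator. Assumes generator.pt holds a fully pickled model from
# a trusted source; recent PyTorch versions need weights_only=False for that.
generator = torch.load("generator.pt", weights_only=False)
generator.eval()

# Define a function to generate an image from a vector
def generate_image(vector):
    # Convert the vector to a batch containing one tensor
    tensor = torch.tensor(vector, dtype=torch.float32).unsqueeze(0)

    # Generate the image without tracking gradients
    with torch.no_grad():
        image = generator(tensor)[0]

    # Assumes the generator outputs values in [-1, 1]; rescale to [0, 1]
    image = ((image + 1) / 2).clamp(0, 1)

    # Transform the image to a PIL image
    return transforms.ToPILImage()(image)

# Start the image generator
while True:
    # Get the user's query
    query = input("What kind of image would you like to generate? ")

    # Encode the query and retrieve the most similar stored vector
    query_vector = encoder.encode(query).tolist()
    results = index.query(vector=query_vector, top_k=1, include_values=True)

    # Generate an image from the retrieved vector
    image = generate_image(results.matches[0].values)

    # Display the image
    image.show()

This program encodes the user’s query as a vector, retrieves the most similar vector from the vector database, and then generates an image with the GAN based on the retrieved vector.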

ImageBot>What kind of image would you like to generate?
Me>An idyllic image of a mountain with a flowing river.
ImageBot> Wait a minute! Here you go...

You can customize this program to meet your specific needs. For example, you can train a GAN specialized in generating a particular type of image, such as portraits or landscapes.

Building a Movie Recommendation LLM App

Let’s explore how to build a movie recommendation app from a movie corpus. You can use a similar idea to build a recommendation system for products or other entities.

Create a vector database to store your movie vectors.
Extract movie vectors from your movie metadata.
Insert the movie vectors into the vector database.
Recommend movies to users.

Here is an example of how to use the Pinecone API to recommend movies to users:

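from pinecone import Pinecone

# Create an API client and open the two indexes.
# Pinecone index names may contain only lowercase letters, digits, and hyphens.
client = Pinecone(api_key="YOUR_API_KEY")
user_index = client.Index("user-index")
movie_index = client.Index("movie-index")

# ID of the user to recommend for (placeholder value)
user_id = "USER_ID"

# Get the user's vector
user_vector = user_index.fetch(ids=[user_id]).vectors[user_id].values

# Recommend movies to the user by finding the closest movie vectors.
# Assumes user and movie vectors share one embedding space and that each
# movie vector stores its title in metadata.
results = movie_index.query(vector=user_vector, top_k=5, include_metadata=True)

# Print the results
for match in results.matches:
    print(match.metadata["title"])

Here is a sample recommendation for a user: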

The Shawshank Redemption
The Dark Knight
Inception
The Godfather
Pulp Fiction
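The example above assumes a user vector already exists in the database. One simple way to obtain one, shown here only as an assumption and not as the method used in this article, is to average the vectors of the movies a user has watched and rank the remaining movies by cosine similarity. The three-dimensional movie vectors below are made-up values for illustration.

```python
import math

def mean_vector(vectors):
    """Element-wise average of equal-length vectors."""
    n = len(vectors)
    return [sum(values) / n for values in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def recommend(user_vector, movie_vectors, exclude=(), top_k=2):
    """Rank movies not in `exclude` by cosine similarity to the user vector."""
    scored = [
        (title, cosine(user_vector, vec))
        for title, vec in movie_vectors.items()
        if title not in exclude
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [title for title, _ in scored[:top_k]]

# Hypothetical movie vectors with (crime, sci-fi, drama) style weights.
movies = {
    "The Godfather": [0.9, 0.0, 0.6],
    "Pulp Fiction": [0.8, 0.1, 0.3],
    "Inception": [0.2, 0.9, 0.3],
    "The Dark Knight": [0.7, 0.3, 0.4],
    "The Shawshank Redemption": [0.4, 0.0, 0.9],
}

watched = ["The Godfather", "Pulp Fiction"]
user_vector = mean_vector([movies[title] for title in watched])
recommendations = recommend(user_vector, movies, exclude=watched)
print(recommendations)
```

In a real system the ranking step would be delegated to the vector database query, and only the user-vector computation would stay in application code.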

Real-world Use Cases of LLM Apps Using Vector Search/Databases

Microsoft and TikTok use vector databases such as Pinecone for long-term memory and faster lookups. This is something an LLM cannot do alone without a vector database. It helps users save their past questions and responses and resume their session. For example, users can ask, “Tell me more about the pasta recipe we discussed last week.”

Flipkart’s Decision Assistant recommends products to users by first encoding the query as a vector embedding and then doing a lookup against vectors representing relevant products in a high-dimensional space. For example, if you search for “Wrangler leather jacket brown men medium,” it recommends relevant products using a vector similarity search. Otherwise, the LLM application would not have any recommendations, as no product catalog would contain such titles or product details.

Chipper Cash, a fintech in Africa, uses a vector database to reduce fraudulent user signups by 10x. It does this by storing the images from previous user signups as vector embeddings. Then, when a new user signs up, it encodes the new image as a vector and compares it against the existing users to detect fraud.

Vector Database in LLM — Source: Chipper Cash

Facebook has been using its vector search library, FAISS, in many products internally, including Instagram Reels and Facebook Stories, to quickly look up multimedia and find similar candidates for better suggestions to show to the user.

Conclusion

Vector databases are useful for building various LLM applications, such as image generation, movie or product recommendations, and chatbots. They provide LLM apps with additional or similar information that the underlying models have not been trained on. They store vector embeddings efficiently in a high-dimensional space and use nearest-neighbor search to find similar embeddings with high accuracy.

Key Takeaways

The key takeaways from this article are that vector databases are highly suitable for LLM applications and offer the following significant features:

Performance: Vector databases are specifically designed to efficiently store and retrieve vector data, which is important for developing high-performance LLM applications.
Precision: Vector databases can accurately match similar vectors, even if they exhibit slight variations. They use nearest-neighbor algorithms to find similar vectors.
Multi-Modal: Vector databases can accommodate various multi-modal data, including text, images, and sound. This versatility makes them a good choice for LLM/Generative AI apps that need to work with diverse data types.
Developer-friendly: Vector databases are relatively user-friendly, even for developers who may not possess extensive knowledge of machine learning techniques.

In addition, I would like to highlight that many existing SQL/NoSQL solutions have already added vector embedding storage, indexing, and similarity search features, e.g., PostgreSQL and Redis.

Frequently Asked Questions

Q1. What are LLMs?

A. LLMs are advanced Artificial Intelligence (AI) programs trained on a large corpus of text data using neural networks to produce human-like responses with context. They can predict, answer, and generate textual data in the domain they have been trained on.

Q2. What are embeddings?

A. Embeddings are numerical representations of text, images, video, or other data formats. They make it easier to colocate and find semantically similar objects in a high-dimensional space.

Q3. What is a vector database? Why do LLM apps need them?

A. A vector database stores and queries high-dimensional vector embeddings to find similar vectors using nearest-neighbour algorithms such as locality-sensitive hashing. LLM apps/Generative AI need them to provide additional lookups for similar vectors instead of fine-tuning the LLMs themselves.
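For readers curious about locality-sensitive hashing, the following is a minimal sketch of one common variant, random-hyperplane hashing, in plain Python. It is an illustration under stated assumptions rather than the algorithm any particular vector database uses: the function names, the number of hyperplanes, and the sample vectors are all hypothetical.

```python
import random

def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplane normals; the seed keeps the sketch repeatable."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]

def signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its positive side."""
    return tuple(
        1 if sum(h * x for h, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )

def build_buckets(vectors, hyperplanes):
    """Group vector keys by signature so similar vectors tend to share a bucket."""
    buckets = {}
    for key, vec in vectors.items():
        buckets.setdefault(signature(vec, hyperplanes), []).append(key)
    return buckets

def candidates(query, buckets, hyperplanes):
    """Look up only the bucket matching the query's signature."""
    return buckets.get(signature(query, hyperplanes), [])

planes = make_hyperplanes(num_planes=8, dim=3)
vectors = {
    "a": [1.0, 2.0, 3.0],
    "b": [2.0, 4.0, 6.0],     # same direction as "a"
    "c": [-1.0, -2.0, -3.0],  # opposite direction
}
buckets = build_buckets(vectors, planes)
found = candidates([4.0, 8.0, 12.0], buckets, planes)
print(found)
```

Vectors pointing in similar directions tend to receive the same signature, so a query only needs to be compared against the few vectors in its bucket instead of the whole collection. The trade-off is that a true neighbor can occasionally land in a different bucket, which is why the search is approximate.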

Q4. What is the future of vector databases?

A. Vector databases are niche databases that help index and search vector embeddings. They are widely popular in the open-source community, and many organizations and apps are integrating with them. However, many existing SQL/NoSQL databases are adding similar capabilities, so the developer community will have many options in the near future.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Apurva Kumar

Apurva Kumar is a Principal Software Engineer at Walmart Labs. He has over 16 years of experience in the tech industry in the AI and Data Infrastructure space at Amazon, Yahoo, Uber, and Samsung.
