RAG and Streamlit Chatbot: Chat with Documents Using LLM

rendyk Last Updated : 22 May, 2024


Introduction

This article builds an AI-powered RAG and Streamlit chatbot that answers users' questions based on custom documents. Users can upload documents, and the chatbot answers questions by referring to those documents. The interface is built with Streamlit, and the chatbot uses open-source Large Language Models (LLMs), so it can be used without paying for a commercial API. The chat experience is similar to ChatGPT, Gemini, and other AI applications, but whereas those are trained on general information, this chatbot grounds its answers in the uploaded documents. Let us now dive into how to develop a RAG and Streamlit chatbot and chat with documents using an LLM.

Learning Objectives

Understand the concept of LLM and Retrieval-Augmented Generation in the context of AI-powered chatbots.
Learn how to perform RAG step-by-step in a Jupyter Notebook environment, including document splitting, embedding, storing, answer retrieval, and generation.
Experiment with different open-source LLM models, temperature, and max_length parameters to enhance chatbot performance.
Gain proficiency in developing a Streamlit application as the User Interface for displaying the chatbot and utilizing LangChain memory.
Develop skills in creating a Streamlit application for uploading new documents and integrating them into the chatbot’s knowledge base.
Understand the significance of RAG in enhancing chatbot capabilities and its application in real-world scenarios, such as document-based question answering.

This article was published as a part of the Data Science Blogathon.

Introduction
RAG and Streamlit Chatbot
Implementing RAG in Jupyter Notebook
Summary of Outputs
LangChain Memory
Streamlit Experiment: Developing the User Interface
Demonstrate the Result
Streamlit chatbot with memory
Conclusion
Frequently Asked Questions

RAG and Streamlit Chatbot

This section describes how to build a chatbot with Streamlit and Retrieval-Augmented Generation (RAG).

RAG for Chatbots in Context

RAG stands for Retrieval-Augmented Generation. It is a method for augmenting large language models (LLMs) such as GPT-3 by feeding them additional context at generation time. This context usually comes from a custom knowledge base that you build yourself.

This is a condensed synopsis of RAG:

Indexing: To get your knowledge base ready, transform data (articles, papers, etc.) into a searchable format. In practice, this means splitting documents into chunks and converting each chunk into an embedding vector stored in a vector store.
Querying: When a user asks a question, the chatbot uses the question (and, if available, the conversation history) to retrieve the most relevant material from the knowledge base.
Generation: The LLM combines the retrieved material with its general knowledge to generate an answer.
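To make the three steps concrete, here is a deliberately tiny, dependency-free sketch. It is a toy under stated assumptions, not the article's implementation: word-count vectors over a fixed vocabulary stand in for a real embedding model, a Python list stands in for the vector store, and instead of calling an LLM it only builds the prompt an LLM would receive.

```python
import math
import re

# Toy knowledge base; in the article these would be chunks from PDF/TXT files.
CHUNKS = [
    "Naruto is a Japanese manga series written and illustrated by Masashi Kishimoto.",
    "Snakes are elongated, limbless reptiles of the suborder Serpentes.",
    "Most species of snakes lay eggs which they abandon shortly after laying.",
]


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def embed(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary (toy embedding)."""
    words = tokenize(text)
    return [words.count(term) for term in vocab]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# 1. Indexing: embed every chunk once and keep (chunk, vector) pairs.
VOCAB = sorted({w for chunk in CHUNKS for w in tokenize(chunk)})
INDEX = [(chunk, embed(chunk, VOCAB)) for chunk in CHUNKS]


# 2. Querying: embed the question and return the k most similar chunks.
def retrieve(question, k=2):
    q = embed(question, VOCAB)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# 3. Generation: build the prompt an LLM would receive (the LLM call is omitted).
def build_prompt(question, k=2):
    context = "\n".join(retrieve(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


print(build_prompt("Do snakes lay eggs?", k=1))
```

A real system swaps each toy piece for a proper component (an embedding model, a vector store such as FAISS, and an LLM call), but the data flow stays the same.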

Streamlit is a Python framework for building web apps quickly. It lets you design an intuitive user interface for your RAG chatbot. Streamlit provides:

Text Input: A text box is provided for users to type their questions.
Chat History: The history of the conversations is shown, giving future inquiries context.
Chatbot Response: The user interface displays the LLM’s answer to their question.
Advantages of RAG and Streamlit Together

Increased Accuracy: RAG grounds the chatbot in your own knowledge base, which tends to produce more relevant and accurate responses.
Customizable Knowledge: The knowledge base can be tailored to your domain or area of expertise.
User-Friendly Interface: To communicate with the chatbot, Streamlit offers a clear and simple interface.

Implementing RAG in Jupyter Notebook

You can find the notebook here. To start the experiment on a notebook, install the required packages and import them.

# Install packages
!pip install -q langchain faiss-cpu sentence-transformers==2.2.2 InstructorEmbedding pypdf

from langchain.document_loaders import TextLoader
from pypdf import PdfReader
from langchain import HuggingFaceHub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA, ConversationalRetrievalChain
from langchain.memory import ConversationBufferWindowMemory

PdfReader from pypdf, as its name suggests, is the class for reading PDF files. LangChain, the main library in this article, is a framework for developing LLM-based applications. It was released in late October 2022, so it is relatively new: at the time of publishing this article, it had been around for about one and a half years.

The process of developing RAG can be summarized in 3 steps:

Splitting Documents
Embedding and Storing
Answer Retrieval and Generation

Let’s start by loading the documents.

Splitting Documents

In this experiment, two source documents from Wikipedia serve as the custom knowledge: one about a popular manga and one about snakes in general. The code below reads a PDF file and prints its first 300 characters.

# Load pdf documents
documents_1 = ''

reader = PdfReader('../data sources/wikipedia_naruto.pdf')
for page in reader.pages:
    documents_1 += page.extract_text()

documents_1[:300]

Output

Source: This article is about the manga series. For the anime, see Naruto (TV series). For other uses, see Naruto (disambiguation). Not to be confused with Naruhito, the emperor of Japan.

The text is split into chunks, which are then transformed into embeddings and stored in a vector store. At question time, only the most relevant chunks are passed to the LLM, so it never has to process the entire document.

# Document Splitting
chunk_size = 200
chunk_overlap = 10

splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap
)
split_1 = splitter.split_text(documents_1)
split_1 = splitter.create_documents(split_1)

If there are multiple source documents, repeat the same steps for each. Below is an example of reading and chunking a .txt file with TextLoader. LangChain also provides loaders for other formats, such as CSV, Word (.doc/.docx), and PowerPoint files.

# Load txt documents
reader = TextLoader('../data sources/wikipedia_snake.txt')
reader = reader.load()
print(len(reader))
documents_2 = reader[0]

documents_2.page_content[:300]

Output

source: This article primarily focuses on snakes, the reptiles. For further distinctions, the term “Snake (disambiguation)” is used. Snakes belong to the scientific classification system as follows: Domain: Eukaryota, Kingdom: Animalia, Phylum: Chordata, Class: Reptilia, Order: Squamata, and form a clade within the evolutionary hierarchy.

# Document Splitting
split_2 = splitter.split_text(documents_2.page_content)
split_2 = splitter.create_documents(split_2)

The code splits the text with chunk_size = 200 and chunk_overlap = 10. chunk_size caps the number of characters in each chunk, while chunk_overlap repeats a few characters between consecutive chunks so that context is not cut off abruptly at chunk boundaries.
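The effect of the two parameters can be seen with a simplified sliding-window splitter. This is a stand-in, not LangChain's algorithm: RecursiveCharacterTextSplitter also tries to break on paragraph, line, and word boundaries, whereas this sketch cuts purely by character count. The function name split_fixed is hypothetical.

```python
def split_fixed(text, chunk_size=200, chunk_overlap=10):
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. Simplified stand-in:
    it never looks for word or paragraph boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # each window starts this far after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end
            break
    return chunks


chunks = split_fixed("abcdefghij" * 50, chunk_size=200, chunk_overlap=10)
print(len(chunks), [len(c) for c in chunks])  # prints: 3 [200, 200, 120]
```

Because each window starts chunk_size - chunk_overlap characters after the previous one, the last 10 characters of one chunk reappear at the start of the next.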

The ChunkViz visualization of this chunking shows each chunk in a different color, with mixed colors marking the overlap between consecutive chunks, for a chunk size of 200 characters.

Embedding and Storing

Embedding is the process of capturing the semantics, context, and relationships of the words in each text chunk as a high-dimensional vector. The example below uses “hkunlp/instructor-xl” as the embeddings model; alternatives include “hkunlp/instructor-large”, OpenAIEmbeddings, and others. The result is saved as a vector store.

This tutorial uses FAISS as the vector store. LangChain supports many other vector stores; PGVector, for example, lets developers store the vectors in Postgres.

# Load embeddings instructor
instructor_embeddings = HuggingFaceInstructEmbeddings(
    model_name='hkunlp/instructor-xl', model_kwargs={'device':'cuda'}
)

# Implement embeddings
db = FAISS.from_documents(split_1, instructor_embeddings)

# Save db
db.save_local('vector store/naruto')

# Implement embeddings for second doc
db_2 = FAISS.from_documents(split_2, instructor_embeddings)

# Save db
db_2.save_local('vector store/snake')

The two vector stores are saved separately. They can be merged and saved as another combined vector store.
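Conceptually, merging appends the entries of one store to another. That only makes sense when both stores were built with the same embedding model (here both use instructor_embeddings), so their vectors have the same dimension. Note that db.merge_from(db_2) modifies db in place, which is why the merged db is then saved under a new name. The toy dictionaries and the merge_stores helper below are hypothetical illustrations, not FAISS internals.

```python
def merge_stores(target, source):
    """Append source's (text, vector) entries to target, in place.

    Hypothetical helper illustrating the idea behind merging vector stores:
    both must come from the same embedding model, i.e. the same vector size.
    """
    if target["dim"] != source["dim"]:
        raise ValueError("cannot merge stores with different embedding dimensions")
    target["texts"].extend(source["texts"])
    target["vectors"].extend(source["vectors"])


# Toy stores with made-up 3-dimensional vectors
naruto = {"dim": 3, "texts": ["Naruto is a manga"], "vectors": [[0.9, 0.1, 0.0]]}
snake = {"dim": 3, "texts": ["Snakes are reptiles"], "vectors": [[0.0, 0.2, 0.9]]}

merge_stores(naruto, snake)  # naruto now holds both entries; snake is unchanged
print(naruto["texts"])  # prints: ['Naruto is a manga', 'Snakes are reptiles']
```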

# Merge two DBs
db.merge_from(db_2)
db.save_local('vector store/naruto_snake')

Answer Retrieval and Generation

This step runs when a user asks a question. The system converts the question into an embedding and uses it to search the vector store for similar text chunks. It then sends these chunks to the LLM, which generates an answer to the user’s question.

The code below loads the vector store if this process is started in a new notebook.

# Load db
loaded_db = FAISS.load_local(
    'vector store/naruto_snake', instructor_embeddings, allow_dangerous_deserialization=True
)

The code below searches for text chunks similar to the question “what is naruto?”. By default, it retrieves the 4 text chunks most likely to contain the expected answer.

# Retrieve answer
question = 'what is naruto?'

search = loaded_db.similarity_search(question)
search

Output

[Document(page_content=’Naruto is a Japanese manga series written and illustrated by Masashi Kishimoto. It tells the story of’),
Document(page_content=’Naruto Uzumaki, a young ninja who seeks recognition from his peers and dreams of becoming the’),
Document(page_content=’Naruto Uzumaki. \n Not to be confused with Naruhito, the emperor of Japan. \n Naruto’),
Document(page_content=’Source: https://en.wikipedia.org/wiki/Naruto \n \n This article is about the manga series. For the title character, see’)]

To query a different number of text chunks, pass the specified number to the k parameter. Here is an example of retrieving 6 text chunks.

# Query more or less text chunks
search = loaded_db.similarity_search(question, k=6)
search

Output

[Document(page_content=’Naruto is a Japanese manga series written and illustrated by Masashi Kishimoto. It tells the story of’),
Document(page_content=’Naruto Uzumaki, a young ninja who seeks recognition from his peers and dreams of becoming the’),
Document(page_content=’Naruto Uzumaki. \n Not to be confused with Naruhito, the emperor of Japan. \n Naruto’),
Document(page_content=’Source: https://en.wikipedia.org/wiki/Naruto \n \n This article is about the manga series. For the title character, see’),
Document(page_content=’Naruto is one of the best-selling manga series of all time, having 250 million copies in circulation’),
Document(page_content=”companies. The story of Naruto continues in Boruto, where Naruto’s son Boruto Uzumaki creates his own \nninja way instead of following his father’s.”)]

We can also check the similarity scores. The score is a distance, so a smaller score means the text chunk is closer to the query and therefore more likely to contain the answer.
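A scored lookup can be sketched as a brute-force nearest-neighbour search. This toy assumes squared Euclidean (L2) distance and hand-written 2-dimensional vectors in place of real embeddings; the function name mirrors the LangChain method only for readability. Like the call above, it returns the k = 4 closest chunks by default, sorted so the smallest (best) score comes first.

```python
def similarity_search_with_score(query_vec, store, k=4):
    """Return the k (text, distance) pairs closest to query_vec.

    Toy stand-in for a vector store lookup: squared Euclidean (L2) distance,
    so a smaller score means a closer, more relevant chunk.
    """
    scored = []
    for text, vec in store:
        dist = sum((q - v) ** 2 for q, v in zip(query_vec, vec))
        scored.append((text, dist))
    scored.sort(key=lambda pair: pair[1])  # ascending: best match first
    return scored[:k]


# Made-up 2-D "embeddings": Naruto chunks near (1, 1), snake chunks far away
STORE = [
    ("naruto manga", [1, 0]),
    ("naruto ninja", [2, 1]),
    ("snake venom", [5, 5]),
    ("snake eggs", [6, 5]),
    ("king cobra", [6, 6]),
]

print(similarity_search_with_score([1, 1], STORE))
# prints: [('naruto manga', 1), ('naruto ninja', 1), ('snake venom', 32), ('snake eggs', 41)]
```

This scans every vector; FAISS also offers approximate indexes for large collections, but the ranking idea is the same.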

search_scores = loaded_db.similarity_search_with_score(question)
search_scores

Output

[(Document(page_content=’Naruto is a Japanese manga series written and illustrated by Masashi Kishimoto. It tells the story of’), 0.33290553),
(Document(page_content=’Naruto Uzumaki, a young ninja who seeks recognition from his peers and dreams of becoming the’), 0.34495327),
(Document(page_content=’Naruto Uzumaki. Not to be confused with Naruhito, the emperor of Japan. \n Naruto’), 0.36766833),
(Document(page_content=’Source: https://en.wikipedia.org/wiki/Naruto . This article is about the manga series. For the title character, see’), 0.3688009)]

When loading the LLM, repo_id specifies which model to use, for example “tiiuae/falcon-7b-instruct”, “mistralai/Mistral-7B-Instruct-v0.2”, or “bigscience/bloom”. The default temperature is 1. Values above 1 produce more creative and random answers, while values below 1 produce more predictable ones. max_length caps the length of the generated text.
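Why temperature behaves this way can be seen in a small sketch of temperature-scaled softmax, the usual way sampling-based text generation turns a model's raw scores (logits) into next-token probabilities. The logits below are made-up numbers for three candidate tokens; the exact sampling procedure behind the Hugging Face endpoint is not shown in this article.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw next-token scores into probabilities.

    Dividing the logits by the temperature before the softmax sharpens the
    distribution when temperature < 1 and flattens it when temperature > 1.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [4.0, 2.0, 1.0]  # hypothetical scores for three candidate tokens
for t in (0.5, 1.0, 3.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature rises, probability mass shifts toward unlikely tokens, which is consistent with the incoherent output in Table-2 at temperature 3.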

temperature = 1
max_length = 300
llm_model = 'tiiuae/falcon-7b-instruct'

# Load LLM
llm = HuggingFaceHub(
    repo_id=llm_model,
    model_kwargs={'temperature': temperature, 'max_length': max_length},
    huggingfacehub_api_token=token
)

# Create the chatbot
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',
    retriever=loaded_db.as_retriever(),
    return_source_documents=True,
)

Ask a question by passing it under the “query” key. The response contains the query, the result, and the source documents. The result holds the full string of the prompt, the question, and the helpful answer, so the text after “Helpful Answer:” is parsed out to get the answer.
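The two pieces involved, stuffing the retrieved chunks into one prompt and pulling the answer out of the raw result, can be sketched in plain Python. The template text is copied from the result shown below; joining chunks with blank lines and the helper names build_stuff_prompt and extract_answer are assumptions for illustration. Taking the last element after splitting ([-1], as the later code in this article does) avoids the IndexError that [1] raises when the marker is missing.

```python
# Prompt wording copied from the "result" string in the output below
STUFF_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n{context}\n\nQuestion: {question}\nHelpful Answer:"
)


def build_stuff_prompt(chunks, question):
    """'Stuff' all retrieved chunks into a single prompt (hypothetical helper)."""
    return STUFF_TEMPLATE.format(context="\n\n".join(chunks), question=question)


def extract_answer(result, marker="Helpful Answer:"):
    """Return the text after the last marker, or the whole result if absent."""
    return result.split(marker)[-1].strip()


prompt = build_stuff_prompt(["Naruto is a Japanese manga series."], "what is naruto?")
# Simulated raw output that, like the result below, repeats the prompt first
fake_result = prompt + " Naruto is a Japanese manga series."
print(extract_answer(fake_result))  # prints: Naruto is a Japanese manga series.
```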

# Ask a question
question = 'what is naruto?'
response = qa({'query': question})
response

Output

(For the full version, refer to the notebook.)

{'query': 'what is naruto?',

'result': "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nNaruto is a Japanese . . . n\nQuestion: what is naruto?\nHelpful Answer: Naruto is a fictional character in the manga series of the same name. He is a young ninja who dreams of becoming the Hokage, the leader of his village.",

 'source_documents': [Document(page_content='Naruto is a Japanese manga series . . .  ')]}

answer = response.get('result').split('Helpful Answer:')[1].strip()

Output

Naruto is a fictional character in the manga series of the same name. He is a young ninja who dreams of becoming the Hokage, the leader of his village.

Let’s try a second question. Ideally, the LLM would recognize that it continues the topic of Naruto from the first question. However, it fails to do so because the chain has no memory: it answers each question separately without considering the previous chat log. A way to add memory to the model is shown later; for now, continue with the question-answering trial.

# Ask a question
question = 'do you know whom I am talking about?'
response = qa({'query': question})
answer = response.get('result').split('Helpful Answer:')[1].strip()
explanation = response.get('source_documents', [])
print(answer)

Output

Yes \n\nYes, I do. The person you are referring to is the character named Naruhito, who is the emperor of Japan.

The notebook also tries different LLM models, temperature values, and max_length values on similar questions. The results highlight how temperature affects creativity and randomness, and how max_length limits the length of the answer.

Summary of Outputs

The outputs are summarized in the tables below.

Summary in Table-1

Model: tiiuae/falcon-7b-instruct, Temperature: 1, Max_length: 300
Question	Answer
1a. what is naruto?	Naruto is a fictional character in the manga series of the same name. He is a young ninja who dreams of becoming the Hokage, the leader of his village.
1b. do you know whom I am talking about?	Yes Yes, I do. The person you are referring to is the character named Naruhito, who is the emperor of Japan.
2. what happened with the seven tailed beasts?	The seven tailed beasts were successfully captured and extracted by the Akatsuki. However, the process was not without its challenges, and the team had to face various obstacles and dangers. Ultimately, they succeeded in their mission and defeated the Tailed Beasts.
3. are there any species similar to snakes?	Yes, there are other species of lizards that have lost their limbs independently but which superficially look similar to snakes.
4. summarize about venomous snakes	Venomous snakes are classified in two taxonomic families: Elapids (cobras, kraits, mambas, Australian copperheads, sea snakes, and coral snakes) and Non-venomous snakes (most species). Most species of snake are nonvenomous and those that have venom use it primarily to kill and subdue prey rather than for self-defense. Some possess venomous fangs that are used to inject venom into their victims.
5. how to cook rice?	To cook rice, you need to follow these steps: 1. Rinse the rice thoroughly in a fine mesh strainer to remove excess starch.2. Place the rice in a pot and add the appropriate amount of water.3. Bring the water to a boil, then reduce the heat and let it simmer for about 10 minutes.4. After 10 minutes, remove the pot from the heat and let it cool for a few minutes.5. Fl
6. hi, how are you?	Hi, I’m fine.

Summary in Table-2

Model: mistralai/Mistral-7B-Instruct-v0.2, Temperature: 3, Max_length: 600
Question	Answer
1. what is naruto?	unqueunque Statistics Mode incorrectpackages geme household cescoveredthey g Att programme again day destroyed Main Count running browser�CU propagManagerкіSy Pom輸mar PHP Unlikesta Act(“-icolorn premrific usual Quick, vessels grantedа Store最��while leaves Tig Times King rubouteanie briefiti competition mutteredaper wait Agreconomwan BE jun give >=・ /** FA SC boagentmaker Fou ear modificationssoap\|ΦMASK membership sac String destination BiticeTabext U moreCHECKょ своиuries Shelаль yet

LangChain Memory

When we have a conversation with a chatbot, we want it to remember the previous chats: each message is connected to what came before rather than treated in isolation. In other words, the chatbot needs a memory.

Example of a Chatbot Without Memory

A conversation example of a chatbot without memory:

USER	AI CHATBOT
what fruits are in red?	red-colored fruits are apple, cherry, and strawberry.
how do they taste?	please elaborate with more context.

Example of a Chatbot With Memory

A conversation example of a chatbot with memory:

USER	AI CHATBOT
what fruits are in red?	red-colored fruits are apple, cherry, and strawberry.
how do they taste?	They taste sweet.

In the first example, the chatbot does not remember the topic of the previous exchange. In the second example, LangChain memory stores the previous conversation. If the next question is a follow-up (related to the previous question), a standalone question is generated first and then answered. Here, the standalone question would be “how do the apple, cherry, and strawberry taste?”.
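The standalone-question step can be sketched as a prompt that asks the LLM to rewrite the follow-up using the chat history. The wording below is an assumption modelled loosely on LangChain's default condense-question prompt, and condense_prompt is a hypothetical helper. The chain would send this prompt to the LLM and use its completion (e.g. “how do the apple, cherry, and strawberry taste?”) for retrieval.

```python
def condense_prompt(chat_history, follow_up):
    """Build a prompt asking the LLM to rewrite a follow-up as a standalone question.

    Hypothetical helper; wording is an assumption. chat_history is a list of
    (human, ai) message pairs.
    """
    history = "\n".join(f"Human: {h}\nAssistant: {a}" for h, a in chat_history)
    return (
        "Given the following conversation and a follow up question, rephrase the "
        "follow up question to be a standalone question.\n\n"
        f"Chat History:\n{history}\n"
        f"Follow Up Input: {follow_up}\n"
        "Standalone question:"
    )


history = [("what fruits are in red?", "Red-colored fruits are apple, cherry, and strawberry.")]
print(condense_prompt(history, "how do they taste?"))
```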

Types of Memories by LangChain

LangChain provides several types of memory. Four of them are:

Conversation Buffer Memory saves the whole conversation from the beginning of the session. In a long conversation, this memory needs more computation.
Conversation Buffer Window Memory saves only a specified number of the most recent exchanges. In a long conversation, it remembers the latest chats, not everything from the beginning.
Conversation Token Buffer Memory saves previous chats up to a specified number of tokens. This can help plan LLM cost when pricing depends on the number of tokens.
Conversation Summary Buffer Memory keeps recent chats and summarizes older ones once the token limit is reached.

The next experiment uses Conversation Buffer Window Memory with k=2 to keep the 2 latest exchanges. Note that the response now includes chat_history, which stores those exchanges.
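The windowing behaviour can be illustrated with a bounded deque. WindowMemory is a hypothetical, simplified stand-in for the idea behind ConversationBufferWindowMemory, not its actual implementation: with k=2, saving a third exchange silently drops the oldest one.

```python
from collections import deque


class WindowMemory:
    """Keep only the last k question/answer exchanges (hypothetical toy)."""

    def __init__(self, k=2):
        self.exchanges = deque(maxlen=k)  # oldest exchange drops out automatically

    def save(self, question, answer):
        self.exchanges.append((question, answer))

    def history(self):
        return list(self.exchanges)


memory = WindowMemory(k=2)
memory.save("who is naruto?", "A young ninja.")
memory.save("who is his team member?", "Sasuke Uchiha.")
memory.save("what is a snake?", "A limbless reptile.")
print(memory.history())
# prints: [('who is his team member?', 'Sasuke Uchiha.'), ('what is a snake?', 'A limbless reptile.')]
```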

Implementation with Code

temperature = 1
max_length = 400
llm_model = 'mistralai/Mistral-7B-Instruct-v0.2'

# Load LLM
llm = HuggingFaceHub(
    repo_id=llm_model,
    model_kwargs={'temperature': temperature, 'max_length': max_length},
    huggingfacehub_api_token=token
)

memory = ConversationBufferWindowMemory(
    k=2,
    memory_key="chat_history",
    output_key="answer",
    return_messages=True,
)

qa_conversation = ConversationalRetrievalChain.from_llm(
    llm=llm,
    chain_type='stuff',
    retriever=loaded_db.as_retriever(),
    return_source_documents=True,
    memory=memory,
)

question = 'who is naruto?'
response = qa_conversation({'question': question})
response

Output

{'question': 'who is naruto?',
 'chat_history': [],
 'answer': . . .}

The next question checks whether the chatbot remembers the topic of the previous exchange. It does, because the chat history now contains that exchange.

# Ask a question
question = 'do you know whom I am talking about?'
response = qa_conversation({'question': question})
response

answer = response.get('answer').split('Helpful Answer:')[-1].strip()
explanation = response.get('source_documents', [])
print(answer)
explanation

Output

Yes, you are referring to the same Naruto Uzumaki from the manga series.

Observe how the standalone question is generated: the pronoun “his” in the original question is resolved to “Naruto Uzumaki” based on the previous chat.

# Ask a question
question = 'who is his team member?'
response = qa_conversation({'question': question})
response

response.get('answer').split('Standalone question:')[2]

Original question: who is his team member?
Standalone question: " Who is a team member of Naruto Uzumaki in the manga series?
Helpful Answer: One of Naruto Uzumaki's team members is Sasuke Uchiha.

Example of Conversation

The following conversation is based on the snake knowledge and can also be found in the notebook. The first question asks about species similar to snakes. The second asks whether “they” are the only limbless animals, and the AI chatbot correctly resolves “they” to snakes.

USER	AI CHATBOT
are there any species similar to snakes?	to note that while snakes are limbless and evolved from lizards, these other species have lost their limbs independently.
are they the only limbless animals?	Yes, there are other limbless animals. For example, there are several species of apodid (or “apodan”) worm lizards, which are also limbless and belong to the same reptile order, Squamata. Additionally, there are some species of caecilians, which are limbless, legless amphibians.

Streamlit Experiment: Developing the User Interface

Completing the RAG experiment in a Jupyter Notebook is a good start, but users will not ask questions in the developer’s notebook. An interface is needed to host the RAG pipeline and let users interact with it. This part shows how to build a chatbot with Streamlit that holds a conversation based on custom documents; essentially, it wraps the notebook experiment above into a web application. The repository is rendy-k/LLM-RAG. It contains several important files:

rag_chatbot.py: The main file that runs the application. The Streamlit app has two pages, and this file is the first one: the chatbot for the conversation.
document_embeddings.py: The second page, which processes documents into embeddings and stores them in a vector store.
rag_functions.py: The functions called by the two pages to carry out their tasks.
vector store/: The folder containing the saved vector stores.

In rag_chatbot.py, import the libraries and then collect all the required inputs. There are 6 inputs: the Hugging Face token, the LLM model, the instruct embeddings model, the vector store, the temperature, and the maximum length.

Implementation with Code

import streamlit as st
import os
from pages.backend import rag_functions

st.title("RAG Chatbot")

# Setting the LLM
with st.expander("Setting the LLM"):
    st.markdown("This page is used to have a chat with the uploaded documents")
    with st.form("setting"):
        row_1 = st.columns(3)
        with row_1[0]:
            token = st.text_input("Hugging Face Token", type="password")

        with row_1[1]:
            llm_model = st.text_input("LLM model", value="tiiuae/falcon-7b-instruct")

        with row_1[2]:
            instruct_embeddings = st.text_input("Instruct Embeddings", value="hkunlp/instructor-xl")

        row_2 = st.columns(3)
        with row_2[0]:
            vector_store_list = os.listdir("vector store/")
            default_choice = (
                vector_store_list.index('naruto_snake')
                if 'naruto_snake' in vector_store_list
                else 0
            )
            existing_vector_store = st.selectbox("Vector Store", vector_store_list, default_choice)
        
        with row_2[1]:
            temperature = st.number_input("Temperature", value=1.0, step=0.1)

        with row_2[2]:
            max_length = st.number_input("Maximum character length", value=300, step=1)

        create_chatbot = st.form_submit_button("Create chatbot")

Prepare 3 session states: conversation, history, and source. Variables stored in the session state persist across reruns, so the LLM with memory, the chat history, and the source documents survive every rerun. The function prepare_rag_llm, defined in rag_functions.py, prepares the LLM for generating answers based on the given settings.

# Prepare the LLM model
if "conversation" not in st.session_state:
    st.session_state.conversation = None

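# Build the chain once the token is available, and rebuild it only when the
# "Create chatbot" button is clicked, so the memory is not reset on every rerun
if token and (create_chatbot or st.session_state.conversation is None):
    st.session_state.conversation = rag_functions.prepare_rag_llm(
        token, llm_model, instruct_embeddings, existing_vector_store, temperature, max_length
    )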

# Chat history
if "history" not in st.session_state:
    st.session_state.history = []

# Source documents
if "source" not in st.session_state:
    st.session_state.source = []

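# rag_functions.py (imports needed by prepare_rag_llm)
from langchain import HuggingFaceHub
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferWindowMemory

def prepare_rag_llm(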
    token, llm_model, instruct_embeddings, vector_store_list, temperature, max_length
):
    # Load embeddings instructor
    instructor_embeddings = HuggingFaceInstructEmbeddings(
        model_name=instruct_embeddings, model_kwargs={"device":"cuda"}
    )

    # Load db
    loaded_db = FAISS.load_local(
        f"vector store/{vector_store_list}",
        instructor_embeddings,
        allow_dangerous_deserialization=True
    )

    # Load LLM
    llm = HuggingFaceHub(
        repo_id=llm_model,
        model_kwargs={"temperature": temperature, "max_length": max_length},
        huggingfacehub_api_token=token
    )

    memory = ConversationBufferWindowMemory(
        k=2,
        memory_key="chat_history",
        output_key="answer",
        return_messages=True,
    )

    # Create the chatbot
    qa_conversation = ConversationalRetrievalChain.from_llm(
        llm=llm,
        chain_type="stuff",
        retriever=loaded_db.as_retriever(),
        return_source_documents=True,
        memory=memory,
    )

    return qa_conversation

Use this code to display the chat history in the application body.

# Display chats
for message in st.session_state.history:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

When a user enters a question, the following code runs. It appends the question to st.session_state.history. Then generate_answer takes the question, calls the LLM, and returns the answer and source documents. The answer is also appended to st.session_state.history, and the source documents for each question and answer are stored in st.session_state.source.

# Ask a question
if question := st.chat_input("Ask a question"):
    # Append user question to history
    st.session_state.history.append({"role": "user", "content": question})
    # Add user question
    with st.chat_message("user"):
        st.markdown(question)

    # Answer the question
    answer, doc_source = rag_functions.generate_answer(question, token)
    with st.chat_message("assistant"):
        st.write(answer)
    # Append assistant answer to history
    st.session_state.history.append({"role": "assistant", "content": answer})

    # Append the document sources
    st.session_state.source.append({"question": question, "answer": answer, "document": doc_source})

# rag_functions.py (also requires: import streamlit as st)
def generate_answer(question, token):
    answer = "An error has occurred"

    if token == "":
        answer = "Insert the Hugging Face token"
        doc_source = ["no source"]
    else:
        response = st.session_state.conversation({"question": question})
        answer = response.get("answer").split("Helpful Answer:")[-1].strip()
        explanation = response.get("source_documents", [])
        doc_source = [d.page_content for d in explanation]

    return answer, doc_source

Finally, display the source documents inside an expander.

# Source documents
with st.expander("Source documents"):
    st.write(st.session_state.source)

Output

The second page is in document_embedding.py. It builds the user interface for uploading a custom file, splitting it into text chunks, and converting the chunks into embeddings before saving them in a vector store.

Implementation with Code

The code below imports the library and sets the required inputs.

import streamlit as st
import os
from pages.backend import rag_functions

st.title("Document embedding")
st.markdown("This page is used to upload the documents as the custom knowledge for the chatbot.")

with st.form("document_input"):
    
    document = st.file_uploader(
        "Knowledge Documents", type=['pdf', 'txt'], help=".pdf or .txt file"
    )

    row_1 = st.columns([2, 1, 1])
    with row_1[0]:
        instruct_embeddings = st.text_input(
            "Model Name of the Instruct Embeddings", value="hkunlp/instructor-xl"
        )
    
    with row_1[1]:
        chunk_size = st.number_input(
            "Chunk Size", value=200, min_value=1, step=1,
        )
    
    with row_1[2]:
        chunk_overlap = st.number_input(
            "Chunk Overlap", value=10, min_value=0, step=1,
            help="Must be lower than the chunk size"
        )
    
    row_2 = st.columns(2)
    with row_2[0]:
        # List the existing vector stores
        vector_store_list = os.listdir("vector store/")
        vector_store_list = ["<New>"] + vector_store_list
        
        existing_vector_store = st.selectbox(
            "Vector Store to Merge the Knowledge", vector_store_list,
            help="""
              Which vector store to add the new documents.
              Choose <New> to create a new vector store.
                 """
        )

    with row_2[1]:
        # Name under which the vector store is saved
        new_vs_name = st.text_input(
            "New Vector Store Name", value="new_vector_store_name",
            help="""
              If you choose <New> in the dropdown, name the new vector store.
              Otherwise, fill in the name of the existing vector store to merge.
            """
        )

    save_button = st.form_submit_button("Save vector store")

Output

This application allows 3 options for users. A user can upload a new document and (1) create a new vector store, (2) merge and update an existing vector store with the new text chunks, or (3) create a new vector store by merging an existing vector store with the new text chunks.
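Assuming rag_functions.embedding_storing saves the merged result under the name typed in “New Vector Store Name”, the three options follow from the two form inputs as in the hypothetical helper below. It only illustrates the decision logic and is not part of the repository.

```python
def vector_store_action(existing, new_name):
    """Map the two form inputs to one of the three options (hypothetical helper).

    existing: the selectbox value, "<New>" or the name of an existing store
    new_name: the text typed into "New Vector Store Name"
    """
    if not new_name:
        raise ValueError("a vector store name is required")
    if existing == "<New>":
        return "create a new store"  # option (1)
    if new_name == existing:
        return "update the existing store"  # option (2): overwrite in place
    return "create a new store from the existing one plus the new chunks"  # option (3)


print(vector_store_action("naruto_snake", "all_docs"))
```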

When the “Save vector store” button is clicked, the following steps run on the uploaded document. The detailed functions are in rag_functions.py; they are discussed in the notebook experiment section above.

if save_button:
    # Stop early if no file was uploaded
    if document is None:
        st.error("Upload a .pdf or .txt file first")
        st.stop()

    # Read the uploaded file
    file_name = document.name.lower()
    if file_name.endswith(".pdf"):
        document = rag_functions.read_pdf(document)
    elif file_name.endswith(".txt"):
        document = rag_functions.read_txt(document)
    else:
        st.error("Check if the uploaded file is .pdf or .txt")
        st.stop()

    # Split document
    split = rag_functions.split_doc(document, chunk_size, chunk_overlap)

    # Check whether to create new vector store
    create_new_vs = None
    if existing_vector_store == "<New>" and new_vs_name != "":
        create_new_vs = True
    elif existing_vector_store != "<New>" and new_vs_name != "":
        create_new_vs = False
    else:
        st.error(
          """Check the 'Vector Store to Merge the Knowledge'
             and 'New Vector Store Name'""")
        st.stop()

    # Embeddings and storing
    rag_functions.embedding_storing(
        instruct_embeddings, split, create_new_vs, existing_vector_store, new_vs_name
    )

Demonstrate the Result

This part demonstrates the RAG chatbot deployed in Streamlit. Let’s start the conversation by saying hi. The chatbot replies by reminding the user to insert the Hugging Face token, which is required to load the LLM. After the token is inserted, the chatbot works as expected.

The first answer is relevant but contains a small mistake. According to the source documents, the boa constrictor and green anaconda are viviparous, not ovoviviparous as the chatbot answers.

Source documents transcript

Most species of snakes lay eggs which they abandon shortly after laying. However, a few species (such as the king cobra) construct nests and stay in the vicinity of the hatchlings after incubation.
Some species of snake are ovoviviparous and retain the eggs within their bodies until they are almost ready to hatch. Several species of snake, such as the boa constrictor and green anaconda, are
Most pythons coil around their egg-clutches and remain with them until they hatch. A female python will not leave the eggs, except to occasionally bask in the sun or drink water. She will even
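A transcript like the one above can be produced by asking the chain to return the chunks it retrieved. In LangChain, RetrievalQA.from_chain_type accepts return_source_documents=True, and the chain's output dict then carries the retrieved Document objects under "source_documents". The helper below is a hypothetical sketch for printing them; SimpleNamespace objects stand in for LangChain Document instances so the example runs without LangChain or an LLM.

```python
from types import SimpleNamespace


def format_sources(result: dict) -> str:
    """Render the retrieved chunks stored under result["source_documents"].

    Each item only needs a `page_content` attribute, as LangChain's Document has.
    """
    docs = result.get("source_documents", [])
    return "\n".join(doc.page_content.strip() for doc in docs)


# Stand-in for a chain response; a real one would come from the QA chain.
fake_result = {
    "result": "...",
    "source_documents": [
        SimpleNamespace(page_content="Most species of snakes lay eggs. "),
        SimpleNamespace(page_content="Most pythons coil around their egg-clutches."),
    ],
}
print(format_sources(fake_result))
```

Showing these chunks next to each answer is what makes mistakes like the viviparous/ovoviviparous mix-up easy to spot.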

The second question, “How about king cobra?”, is meant to ask whether a king cobra abandons its eggs. However, the question is too vague, so the answer fails to capture the context from the previous chat history. It even relies on external knowledge: the source documents below do not contain the answer.

The third question, “Does king cobra abandon them?”, asks the same thing more explicitly. This time the chatbot understands that “them” refers to the eggs and replies correctly.
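One common way to make short follow-ups like “How about king cobra?” retrievable is to have the LLM first rewrite them into a standalone question using the chat history, and then retrieve with the rewritten question. LangChain's ConversationalRetrievalChain follows this condense-question pattern. The function below is a hypothetical, framework-free sketch of the prompt for that rewriting step; it is not the article's code, the prompt wording is an assumption, and the sample history is made up for illustration.

```python
def build_standalone_question_prompt(chat_history: list, follow_up: str) -> str:
    """Build a prompt asking the LLM to make a follow-up self-contained.

    `chat_history` holds (user_message, bot_answer) pairs, oldest first.
    """
    history = "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)
    return (
        "Given the conversation below, rewrite the follow-up question so it "
        "can be understood without the conversation.\n\n"
        f"{history}\n\nFollow-up question: {follow_up}\nStandalone question:"
    )


history = [
    ("Do snakes abandon their eggs?",
     "Most species of snakes lay eggs which they abandon shortly after laying."),
]
prompt = build_standalone_question_prompt(history, "How about king cobra?")
print(prompt)
```

The LLM's reply to this prompt (ideally something like a question about whether king cobras abandon their eggs) would then be used as the retrieval query instead of the raw follow-up.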

Source documents transcript (How about king cobra?)

Most species of snakes lay eggs which they abandon shortly after laying. However, a few species (such as the king cobra) construct nests and stay in the vicinity of the hatchlings after incubation.
Venomous snakes are classified in two taxonomic families: Elapids – cobras including king cobras, kraits, mambas, Australian copperheads, sea snakes, and coral snakes.
Some of the most highly evolved snakes are the Crotalidae, or pit vipers—the rattlesnakes and their associates. Pit vipers have all the sense organs of other snakes, as well as additional aids. Pit
scales. Many species of snakes have skulls with several more joints than their lizard ancestors, enabling them to swallow prey much larger than their heads (cranial kinesis). To accommodate their

Source documents transcript (Does king cobra abandon them?)

Most species of snakes lay eggs which they abandon shortly after laying. However, a few species (such as the king cobra) construct nests and stay in the vicinity of the hatchlings after incubation.
However, elapids, such as cobras and kraits, have hollow fangs that cannot be erected toward the front of their mouths and cannot “stab” like a viper. They must actually
order, as a snake-like body has independently evolved at least 26 times. Tetrapodophis does not have distinctive snake features in its spine and skull. A study in 2021 places the animal in a group of
Cobras, vipers, and closely related species use venom to immobilize, injure, or kill their prey. Venom, delivered through fangs, modifies saliva. The fangs of ‘advanced’ venomous snakes are involved in this process.

Source documents transcript (How successful is Naruto as an anime and manga?)

Naruto is a Japanese manga series written and illustrated by Masashi Kishimoto. It tells the story of
Source: https://en.wikipedia.org/wiki/Naruto This article is about the manga series. For the anime, see Naruto (TV series). For the title character, see monthly Hop Step Award the following year, and Naruto (1997).

Move on to the second page, “Document Embedding”. The following demonstration uploads a PDF file.

Process the PDF file and export it as a vector store named “test”. Once the green success message appears, check the “vector store” folder: a new vector store named “test” is now there.

If the user does not name the new vector store, the application will display an error message.

Streamlit chatbot with memory

It is possible to build a Streamlit chatbot with memory. Here is a summary of the approach and the main tools to get started:

The Retrieval-Augmented Generation (RAG) core concept

RAG is a method that combines retrieval with generation. Applied to chatbot memory, it works as follows:

Keep Conversation History: A database keeps track of previous discussions.
Process User Input: A new prompt is created by combining the user’s message with part of the discussion history, such as the most recent messages.
Retrieval: The prompt is used to search the conversation history database for relevant earlier messages.
Generation: A large language model (LLM) creates a response based on the retrieved information and the user’s current message.
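The four steps can be sketched in plain Python. This is a toy under stated assumptions: a Python list replaces the history database, word overlap replaces embedding similarity, the function names are hypothetical, and no LLM is called, so the output is the prompt that would be sent for generation.

```python
import re


def tokens(text: str) -> set:
    """Lower-cased word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(history: list, query: str, k: int = 2) -> list:
    """Return up to k stored messages sharing the most words with the query."""
    q = tokens(query)
    scored = [(len(q & tokens(msg)), i) for i, msg in enumerate(history)]
    best = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [history[i] for _, i in best[:k]]


def build_prompt(history: list, user_message: str, n_recent: int = 1, k: int = 2) -> str:
    # 2. Process user input: combine it with the most recent messages.
    recent = history[-n_recent:] if n_recent else []
    query = " ".join(recent + [user_message])
    # 3. Retrieval: search the older history (recent turns are already included).
    older = history[:-n_recent] if n_recent else history
    retrieved = retrieve(older, query, k)
    # 4. Generation: this prompt would be sent to the LLM (no model is called here).
    lines = ["Relevant earlier messages:", *retrieved,
             "Recent messages:", *recent,
             f"User: {user_message}", "Answer:"]
    return "\n".join(lines)


# 1. Keep conversation history (a plain list here instead of a database).
history = [
    "Do pythons stay with their eggs?",
    "Most pythons coil around their eggs until they hatch.",
    "What do snakes eat?",
]
question = "Do king cobras stay with their eggs?"
print(build_prompt(history, question))
```

Recent turns are excluded from the retrieval candidates because they are already copied into the prompt; a real system would swap the word-overlap scorer for a vector store search.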
Tools and Libraries:
Streamlit: Builds the chatbot’s web application interface.
LangChain: Makes it easier to retrieve data and communicate with the LLM as part of the RAG procedure.
Database: In-memory storage is enough for basic demos; more complex applications can use an external database, such as SQLite or a cloud database service.
Large Language Model (LLM): Generates the answers. You can use a hosted LLM service such as OpenAI’s API, or an open-source model from Hugging Face as in this article.
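For the database option, Python’s built-in sqlite3 module is enough for a small demo. The sketch below is an assumption, not part of the article’s code: the table layout and the helper names open_history, save_message and recent_messages are hypothetical. It stores each message with a session id and reads back the most recent turns, which could then be combined with the user’s new message as described in the Process User Input step.

```python
import sqlite3


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a database with one table of chat messages."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " session_id TEXT NOT NULL,"
        " role TEXT NOT NULL,"
        " content TEXT NOT NULL)"
    )
    return conn


def save_message(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    with conn:  # commits the insert, or rolls back on error
        conn.execute(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )


def recent_messages(conn: sqlite3.Connection, session_id: str, limit: int = 4) -> list:
    """Return the last `limit` (role, content) pairs of a session, oldest first."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ?"
        " ORDER BY id DESC LIMIT ?",
        (session_id, limit),
    ).fetchall()
    return rows[::-1]


conn = open_history()
save_message(conn, "demo", "user", "hi")
save_message(conn, "demo", "assistant", "Hello! How can I help?")
save_message(conn, "demo", "user", "How about king cobra?")
print(recent_messages(conn, "demo", limit=2))
```

Passing a file path instead of ":memory:" keeps the history across app restarts, which st.session_state alone does not.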

Conclusion

LLM is an advanced AI technology capable of understanding and producing human-like natural language. It can perform tasks such as text classification, generation, and translation. Retrieval-Augmented Generation (RAG) enhances LLMs by integrating custom data sources, allowing them to answer questions based on specific information. Examples of open-source LLMs that can be used for RAG include “tiiuae/falcon-7b-instruct”, “mistralai/Mistral-7B-Instruct-v0.2”, and “bigscience/bloom”. Building a RAG system involves splitting documents, embedding and storing the chunks, and retrieving and generating answers. The main library used in this article is LangChain, whose memory feature keeps the conversation coherent across interactions. In this article, we saw how to develop a RAG and Streamlit chatbot to chat with documents using an LLM, and we also discussed how to build a Streamlit chatbot with memory.

Key Takeaways

LLM and RAG enable users to ask questions and get answers grounded in specific documents.
We learned how to perform RAG step-by-step in a Jupyter Notebook: splitting documents, embedding text chunks, creating vector stores, retrieving relevant chunks, and finally generating the answers.
We explored experimenting with different open-source LLMs, temperature, and max_length; each setting gives different results.
We used langchain.document_loaders.TextLoader and pypdf.PdfReader to read txt and pdf files, langchain.text_splitter.RecursiveCharacterTextSplitter to split files into text chunks, HuggingFaceInstructEmbeddings to load embedding models, langchain.vectorstores to create vector stores, and langchain.chains.RetrievalQA to retrieve and generate answers.
We used streamlit.chat_message to display chat messages, streamlit.chat_input to take questions from users, and streamlit.session_state to keep variables across reruns.

Frequently Asked Questions

Q1. What is LLM?

A. A Large Language Model (LLM) is an Artificial Intelligence (AI) model that can comprehend and generate human natural language (generative AI), including performing Natural Language Processing (NLP) tasks such as text classification, text generation, or translation.

Q2. What is RAG?

A. Retrieval-Augmented Generation (RAG) is an approach that improves an LLM by providing custom data sources so that it can answer questions by referring to the provided data.

Q3. What are the examples of LLMs for RAG?

A. “tiiuae/falcon-7b-instruct”, “mistralai/Mistral-7B-Instruct-v0.2”, and “bigscience/bloom”.

Q4. What is the use of LangChain?

A. LangChain, the main library used in this article, is a framework for developing LLM-based applications.

If you find this article interesting and would like to connect with me on LinkedIn, please find my profile here.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

rendyk

A Data Science professional with seasoned specializations in Machine Learning development and Geo-spatial analysis. Hold the TensorFlow Developer Certificate. Have strong work experience in: - delivering meaningful data-driven insights to support business goals, - automating data processing, - data analysis (tabular, time series, text/NLP, and image), - descriptive and inferential statistical analysis, - GIS or spatial data analysis, - data visualization and dashboard development, - Machine Learning modeling (regression, classification, clustering, dimensionality reduction, time series forecasting, recommender engine) - Deep Learning or Artificial Intelligence (regression and classification with MLP, image classification with CNN, time series forecasting with LSTM, text classification with LSTM) - Hugging face: transformers, fine-tuning - Large Language Models (LLM) - Stable Diffusion - web application development, - developing APIs, etc.


