This article aims to create an AI-powered RAG and Streamlit chatbot that can answer users' questions based on custom documents. Users can upload documents, and the chatbot answers questions by referring to those documents. The interface is built with Streamlit, and the chatbot uses open-source Large Language Models (LLMs), making it cost-free. This RAG and Streamlit chatbot resembles ChatGPT, Gemini, and other AI applications, which are trained on general information. Let us now dive deeper into how to develop a RAG and Streamlit chatbot and chat with documents using an LLM.
This article was published as a part of the Data Science Blogathon.
This section describes how to create a chatbot using Streamlit and Retrieval-Augmented Generation (RAG).
RAG for Chatbots in Context
RAG stands for Retrieval-Augmented Generation. It is a method for augmenting large language models (LLMs) such as GPT-3 by feeding them additional context at generation time. Usually, this context comes from a bespoke knowledge base that you build.
This is a condensed synopsis of RAG:
Indexing: To get your knowledge base ready, transform data (articles, papers, etc.) into a format that can be searched. This entails identifying important details and establishing links between them.
Querying: When a user asks a question, the chatbot retrieves pertinent material from the knowledge base, based on the query and the conversation history.
Generation: The LLM uses the retrieved data in addition to its general knowledge to generate a response.
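The indexing and querying steps above can be sketched with nothing but the standard library. This is a toy illustration, not the approach used later in the article: it ranks passages by shared words instead of embeddings, and the names `tokenize`, `index`, and `retrieve` are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

# Indexing: store each passage next to a searchable representation (its word set).
knowledge_base = [
    "Naruto is a Japanese manga series.",
    "Snakes are elongated, limbless reptiles.",
]
index = [(passage, tokenize(passage)) for passage in knowledge_base]

def retrieve(question, k=1):
    """Querying: rank passages by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(index, key=lambda item: len(question_words & item[1]), reverse=True)
    return [passage for passage, _ in ranked[:k]]

# Generation would pass the retrieved passages, plus the question, to an LLM.
top_passages = retrieve("what is naruto?")
```

A real system replaces the word-overlap score with embedding similarity, but the flow of index, retrieve, then generate stays the same.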
Streamlit is a Python framework for creating web apps quickly. It lets you design an intuitive user interface for your RAG chatbot. Streamlit offers the following:
Text Input: A text box is provided for users to type their questions.
Chat History: The history of the conversations is shown, giving future inquiries context.
Chatbot Response: The user interface displays the LLM’s answer to their question.
Advantages of RAG and Streamlit Together
Increased Accuracy: RAG makes sure the chatbot makes use of your unique knowledge base, which results in more pertinent and accurate responses.
Customizable Knowledge: The knowledge base can be tailored to your domain or area of expertise.
User-Friendly Interface: To communicate with the chatbot, Streamlit offers a clear and simple interface.
You can find the notebook here. To start the experiment on a notebook, install the required packages and import them.
# Install packages
!pip install -q langchain faiss-cpu sentence-transformers==2.2.2 InstructorEmbedding pypdf
from langchain.document_loaders import TextLoader
from pypdf import PdfReader
from langchain import HuggingFaceHub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA, ConversationalRetrievalChain
from langchain.memory import ConversationBufferWindowMemory
PdfReader from pypdf, as its name suggests, is the class for reading PDF files. LangChain, the main library of this article, is a library for developing LLM-based applications. It was released in late October 2022, so at the time of publishing this article it has been around for about one and a half years.
The process of developing RAG can be summarized in 3 steps: splitting the documents, embedding and storing them, and retrieving answers.
Let’s start by loading the documents.
In this experiment, two source documents serve as the custom knowledge. One is about a popular manga and the other covers general knowledge about snakes. Both come from Wikipedia. The code below reads a PDF file and shows its first 300 characters.
# Load pdf documents
documents_1 = ''
reader = PdfReader('../data sources/wikipedia_naruto.pdf')
for page in reader.pages:
    documents_1 += page.extract_text()
documents_1[:300]
Source: This article is about the manga series. For the anime, see Naruto (TV series). For other uses, see Naruto (disambiguation). Not to be confused with Naruhito, the emperor of Japan.
The text is split into text chunks, which are then transformed into embeddings and stored in a vector store. The LLM uses these chunks to generate answers without processing the entire document.
# Document Splitting
chunk_size = 200
chunk_overlap = 10
splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap
)
split_1 = splitter.split_text(documents_1)
split_1 = splitter.create_documents(split_1)
If there are multiple source documents, repeat the same steps for each. Below is an example of reading and chunking a txt file. Other supported file types include csv, doc, docx, ppt, etc.
# Load txt documents
reader = TextLoader('../data sources/wikipedia_snake.txt')
reader = reader.load()
print(len(reader))
documents_2 = reader[0]
documents_2.page_content[:300]
source: This article primarily focuses on snakes, the reptiles. For further distinctions, the term “Snake (disambiguation)” is used. Snakes belong to the scientific classification system as follows: Domain: Eukaryota, Kingdom: Animalia, Phylum: Chordata, Class: Reptilia, Order: Squamata, and form a clade within the evolutionary hierarchy.
# Document Splitting
split_2 = splitter.split_text(documents_2.page_content)
split_2 = splitter.create_documents(split_2)
The code splits the text with chunk_size = 200 and chunk_overlap = 10. The chunk size limits the maximum number of characters in each chunk, and the overlap preserves continuity between consecutive chunks.
ChunkViz visualizes chunking by showing each chunk in a paragraph in a different color, with mixed colors marking the overlap between consecutive chunks. The example uses a 200-character chunk size.
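To make chunk size and overlap concrete, here is a minimal sketch of a fixed-width sliding window. It is a simplification: RecursiveCharacterTextSplitter also tries to split on separators such as paragraphs and sentences, which this sketch ignores. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=200, chunk_overlap=10):
    """Split text into fixed-width chunks whose edges overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    # Each new chunk starts (chunk_size - chunk_overlap) characters after the last.
    step = chunk_size - chunk_overlap
    # Stop early so the final chunk always contains at least one new character.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]

sample = "".join(str(i % 10) for i in range(450))
chunks = chunk_text(sample)
```

With 450 characters, the chunks start at positions 0, 190, and 380, so the last 10 characters of one chunk reappear as the first 10 of the next.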
Embedding is the process of capturing the semantics, context, and relationships of words in the text chunks and storing them as high-dimensional vectors that represent the text. The example below uses “hkunlp/instructor-xl” as the embedding model. Other options include “hkunlp/instructor-large”, OpenAIEmbeddings, and others. The result is saved in a vector store.
This tutorial uses FAISS as the vector store. Many other vector store options are available. PGVector is one of them, and it allows developers to save the vector store in Postgres.
# Load embeddings instructor
instructor_embeddings = HuggingFaceInstructEmbeddings(
    model_name='hkunlp/instructor-xl', model_kwargs={'device':'cuda'}
)
# Implement embeddings
db = FAISS.from_documents(split_1, instructor_embeddings)
# Save db
db.save_local('vector store/naruto')
# Implement embeddings for second doc
db_2 = FAISS.from_documents(split_2, instructor_embeddings)
# Save db
db_2.save_local('vector store/snake')
The two vector stores are saved separately. They can be merged and saved as another combined vector store.
# Merge two DBs
db.merge_from(db_2)
db.save_local('vector store/naruto_snake')
This part is the session when a user asks a question. The system converts the question text into embeddings and utilizes them to search and retrieve similar text chunks from the vector store. Subsequently, it sends these text chunks to the LLM to generate sentences for answering the user’s question.
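The last step, sending the retrieved chunks to the LLM, amounts to assembling a prompt. The sketch below shows one way to do it; the instruction wording is copied from the prompt that appears in the chain output shown later in this article, while the function name `stuff_prompt` and the joining of chunks with blank lines are assumptions for illustration.

```python
PROMPT_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    "{context}\n\nQuestion: {question}\nHelpful Answer:"
)

def stuff_prompt(chunks, question):
    """Place all retrieved chunks into a single prompt, followed by the question."""
    context = "\n\n".join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = stuff_prompt(["Naruto is a Japanese manga series."], "what is naruto?")
```

Because every retrieved chunk is placed in the same prompt, retrieving more chunks makes the prompt longer, which is one reason small chunk sizes are convenient.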
The code below loads the vector store if this process is started in a new notebook.
# Load db
loaded_db = FAISS.load_local(
    'vector store/naruto_snake', instructor_embeddings, allow_dangerous_deserialization=True
)
This is the process of searching the similar text chunks. The question is “what is naruto?”. By default, it retrieves 4 text chunks which are most likely to contain the expected answers.
# Retrieve answer
question = 'what is naruto?'
search = loaded_db.similarity_search(question)
search
To query a different number of text chunks, pass the specified number to the k parameter. Here is an example of retrieving 6 text chunks.
# Query more or less text chunks
search = loaded_db.similarity_search(question, k=6)
search
We can also check the similarity scores. A smaller score means that the text chunk is closer to the query and is therefore more likely to contain the answer.
search_scores = loaded_db.similarity_search_with_score(question)
search_scores
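The idea that a smaller score means a closer chunk can be illustrated with a plain Euclidean distance. This is a sketch with made-up two-dimensional vectors; real embeddings have hundreds of dimensions, and the exact distance formula used by the vector store is not assumed here beyond "smaller is closer".

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical embeddings, invented for illustration only.
chunk_vectors = {
    "chunk about naruto": [0.9, 0.1],
    "chunk about snakes": [0.1, 0.8],
}
query_vector = [1.0, 0.0]

# Sort ascending: the smallest distance is the best match.
scored = sorted(
    ((text, l2_distance(query_vector, vector)) for text, vector in chunk_vectors.items()),
    key=lambda pair: pair[1],
)
```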
To call an LLM for text generation, the repo_id parameter specifies which model to use, for example “tiiuae/falcon-7b-instruct”, “mistralai/Mistral-7B-Instruct-v0.2”, “bigscience/bloom”, and others. The default temperature is 1. Setting it higher than 1 gives more creative and random answers, while setting it lower than 1 gives more predictable answers.
temperature = 1
max_length = 300
llm_model = 'tiiuae/falcon-7b-instruct'
# Load LLM (token is assumed to hold your Hugging Face API token, defined beforehand)
llm = HuggingFaceHub(
    repo_id=llm_model,
    model_kwargs={'temperature': temperature, 'max_length': max_length},
    huggingfacehub_api_token=token
)
# Create the chatbot
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',
    retriever=loaded_db.as_retriever(),
    return_source_documents=True,
)
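To see why temperature changes how random the answers are, the sketch below applies temperature scaling to a softmax over three made-up token scores. This shows the general mechanism only; it is not the code that runs inside the hosted models, and the `logits` values are invented for illustration.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, dividing by the temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
sharp = softmax_with_temperature(logits, 0.5)
default = softmax_with_temperature(logits, 1.0)
flat = softmax_with_temperature(logits, 3.0)
```

A low temperature concentrates probability on the top token, giving predictable output. A high temperature spreads probability across the candidates, so sampling picks unlikely tokens more often.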
Ask a question by passing it to the query. Notice that the response has the query as the question, result, and source documents. The result contains the string of the prompt, question, and helpful answer. The helpful answer is parsed to get the string.
# Ask a question
question = 'what is naruto?'
response = qa({'query': question})
response
(For the full version, refer to the notebook.)
{'query': 'what is naruto?',
'result': "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nNaruto is a Japanese . . . n\nQuestion: what is naruto?\nHelpful Answer: Naruto is a fictional character in the manga series of the same name. He is a young ninja who dreams of becoming the Hokage, the leader of his village.",
'source_documents': [Document(page_content='Naruto is a Japanese manga series . . . ')]}
answer = response.get('result').split('Helpful Answer:')[1].strip()
Naruto is a fictional character in the manga series of the same name. He is a young ninja who dreams of becoming the Hokage, the leader of his village.
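Indexing the split result with [1] raises an IndexError whenever the marker is missing from the output. A slightly more defensive variant is sketched below; the helper name `extract_answer` is hypothetical, and it takes the text after the last occurrence of the marker.

```python
def extract_answer(result, marker="Helpful Answer:"):
    """Return the text after the last marker, or the whole text if it is absent."""
    _, found, tail = result.rpartition(marker)
    # rpartition returns an empty separator when the marker does not occur.
    return tail.strip() if found else result.strip()

raw = "Question: what is naruto?\nHelpful Answer: Naruto is a manga series."
answer = extract_answer(raw)
```

Taking the last occurrence matters once the output can contain the marker more than once, as in a conversational chain that first rewrites the question.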
Let’s try a second question. Ideally, the LLM would continue the topic of Naruto from the first question. However, it fails to do so because it has no memory: it answers each question separately without considering the previous chat log. A later section shows how to add memory to the model. For now, continue with the question-answering trial.
# Ask a question
question = 'do you know whom I am talking about?'
response = qa({'query': question})
answer = response.get('result').split('Helpful Answer:')[1].strip()
explanation = response.get('source_documents', [])
print(answer)
Yes \n\nYes, I do. The person you are referring to is the character named Naruhito, who is the emperor of Japan.
The experiment tries various LLMs, temperature values, and max_length values on similar questions. Temperature affects creativity and randomness, while max_length affects the length of the answer.
Summary of the outputs in the form of tables:
Model: tiiuae/falcon-7b-instruct, Temperature: 1, Max_length: 300 | |
Question | Answer |
1a. what is naruto? | Naruto is a fictional character in the manga series of the same name. He is a young ninja who dreams of becoming the Hokage, the leader of his village. |
1b. do you know whom I am talking about? | Yes Yes, I do. The person you are referring to is the character named Naruhito, who is the emperor of Japan. |
2. what happened with the seven tailed beasts? | The seven tailed beasts were successfully captured and extracted by the Akatsuki. However, the process was not without its challenges, and the team had to face various obstacles and dangers. Ultimately, they succeeded in their mission and defeated the Tailed Beasts. |
3. are there any species similar to snakes? | Yes, there are other species of lizards that have lost their limbs independently but which superficially look similar to snakes. |
4. summarize about venomous snakes | Venomous snakes are classified in two taxonomic families: Elapids (cobras, kraits, mambas, Australian copperheads, sea snakes, and coral snakes) and Non-venomous snakes (most species). Most species of snake are nonvenomous and those that have venom use it primarily to kill and subdue prey rather than for self-defense. Some possess venomous fangs that are used to inject venom into their victims. |
5. how to cook rice? | To cook rice, you need to follow these steps: 1. Rinse the rice thoroughly in a fine mesh strainer to remove excess starch. 2. Place the rice in a pot and add the appropriate amount of water. 3. Bring the water to a boil, then reduce the heat and let it simmer for about 10 minutes. 4. After 10 minutes, remove the pot from the heat and let it cool for a few minutes. 5. Fl |
6. hi, how are you? | Hi, I’m fine. |
Model: mistralai/Mistral-7B-Instruct-v0.2, Temperature: 3, Max_length: 600 | |
Question | Answer |
1. what is naruto? | unqueunque Statistics Mode incorrectpackages geme household cescoveredthey g Att programme again day destroyed Main Count running browser�CU propagManagerкіSy Pom輸mar PHP Unlikesta Act(“-icolorn premrific usual Quick, vessels grantedа Store最��while leaves Tig Times King rubouteanie briefiti competition mutteredaper wait Agreconomwan BE jun give >=・ /** FA SC boagentmaker Fou ear modificationssoap|ΦMASK membership sac String destination BiticeTabext U moreCHECKょ своиuries Shelаль yet |
When we are having a conversation with a chatbot, we want the chatbot to remember the previous chats. Each chat is not separated, but connected. In other words, the chatbot has a memory.
A conversation example of a chatbot without memory:
USER | AI CHATBOT |
what fruits are in red? | red-colored fruits are apple, cherry, and strawberry? |
how do they taste? | please elaborate with more context. |
A conversation example of a chatbot with memory:
USER | AI CHATBOT |
what fruits are in red? | red-colored fruits are apple, cherry, and strawberry? |
how do they taste? | They taste sweet. |
In the first example, the chatbot does not remember the topic from the previous conversation. In the second example, LangChain memory saves the previous conversation. If the next question is identified to be a follow-up question (related to the previous question), a new standalone question will be generated to answer it. For example, the standalone question is “how do the apple, cherry, and strawberry taste?”.
There are 4 types of memory provided by LangChain:
In the next experiment, Conversation Buffer Window Memory is used to save the 2 latest chats. Notice that the response has chat_history to store the latest chats.
temperature = 1
max_length = 400
llm_model = 'mistralai/Mistral-7B-Instruct-v0.2'
# Load LLM
llm = HuggingFaceHub(
    repo_id=llm_model,
    model_kwargs={'temperature': temperature, 'max_length': max_length},
    huggingfacehub_api_token=token
)
memory = ConversationBufferWindowMemory(
    k=2,
    memory_key="chat_history",
    output_key="answer",
    return_messages=True,
)
qa_conversation = ConversationalRetrievalChain.from_llm(
    llm=llm,
    chain_type='stuff',
    retriever=loaded_db.as_retriever(),
    return_source_documents=True,
    memory=memory,
)
question = 'who is naruto?'
response = qa_conversation({'question': question})
response
{'question': 'who is naruto?',
'chat_history': [],
'answer': . . .}
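The windowed memory used above keeps only the k most recent exchanges. The sketch below mimics that behavior with a bounded deque. It is a conceptual stand-in, not LangChain's implementation, and the class name `WindowMemory` and the placeholder answers are hypothetical.

```python
from collections import deque

class WindowMemory:
    """Keep only the k most recent (question, answer) pairs."""

    def __init__(self, k=2):
        # A deque with maxlen silently drops the oldest item when it is full.
        self.turns = deque(maxlen=k)

    def save(self, question, answer):
        self.turns.append((question, answer))

    def history(self):
        return list(self.turns)

memory_sketch = WindowMemory(k=2)
memory_sketch.save("who is naruto?", "answer 1")
memory_sketch.save("do you know whom I am talking about?", "answer 2")
memory_sketch.save("who is his team member?", "answer 3")
```

After the third exchange, the first one has been dropped, so a follow-up question can only refer to the two most recent exchanges.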
The next question is to confirm the topic from the former chat. It still remembers it as the chat history is now filled with its memory.
# Ask a question
question = 'do you know whom I am talking about?'
response = qa_conversation({'question': question})
response
answer = response.get('answer').split('Helpful Answer:')[-1].strip()
explanation = response.get('source_documents', [])
print(answer)
explanation
Yes, you are referring to the same Naruto Uzumaki from the manga series.
Observe how the standalone question generation occurs. The pronoun “his” from the original question refers to “Naruto Uzumaki” based on the previous chat.
# Ask a question
question = 'who is his team member?'
response = qa_conversation({'question': question})
response
response.get('answer').split('Standalone question:')[2]
Original question: who is his team member?
Standalone question: " Who is a team member of Naruto Uzumaki in the manga series?
Helpful Answer: One of Naruto Uzumaki's team members is Sasuke Uchiha.
The following conversation is based on the snake knowledge. It can be found in the notebook, too. The first question asks about snake species. The second question asks if “they” are the only limbless animals. The AI chatbot understands that “they” refers to snakes.
USER | AI CHATBOT |
are there any species similar to snakes? | to note that while snakes are limbless and evolved from lizards, these other species have lost their limbs independently. |
are they the only limbless animals? | Yes, there are other limbless animals. For example, there are several species of apodid (or “apodan”) worm lizards, which are also limbless and belong to the same reptile order, Squamata. Additionally, there are some species of caecilians, which are limbless, legless amphibians. |
Completing the RAG experiment in a Jupyter Notebook is a good start. However, users will not borrow the developer's Jupyter Notebook to ask questions. An interface is necessary to house the RAG and let users interact with it. This part demonstrates how to build a chatbot using Streamlit to have a conversation based on custom documents, essentially wrapping the notebook experiment above into a web application. The repository is rendy-k/LLM-RAG. There are several important files:
In rag_chatbot.py, start by defining all of the required inputs after importing the libraries. Observe that there are 6 inputs.
import streamlit as st
import os
from pages.backend import rag_functions
st.title("RAG Chatbot")
# Setting the LLM
with st.expander("Setting the LLM"):
    st.markdown("This page is used to have a chat with the uploaded documents")
    with st.form("setting"):
        row_1 = st.columns(3)
        with row_1[0]:
            token = st.text_input("Hugging Face Token", type="password")
        with row_1[1]:
            llm_model = st.text_input("LLM model", value="tiiuae/falcon-7b-instruct")
        with row_1[2]:
            instruct_embeddings = st.text_input("Instruct Embeddings", value="hkunlp/instructor-xl")
        row_2 = st.columns(3)
        with row_2[0]:
            vector_store_list = os.listdir("vector store/")
            default_choice = (
                vector_store_list.index('naruto_snake')
                if 'naruto_snake' in vector_store_list
                else 0
            )
            existing_vector_store = st.selectbox("Vector Store", vector_store_list, default_choice)
        with row_2[1]:
            temperature = st.number_input("Temperature", value=1.0, step=0.1)
        with row_2[2]:
            max_length = st.number_input("Maximum character length", value=300, step=1)
        create_chatbot = st.form_submit_button("Create chatbot")
Prepare 3 session states: conversation, history, and source. Variables stored in the session states persist after a rerun. The LLM with memory, the chat history, and the source documents must persist after every rerun. The function prepare_rag_llm prepares the LLM for generating answers based on the given settings.
# Prepare the LLM model
if "conversation" not in st.session_state:
    st.session_state.conversation = None
if token:
    st.session_state.conversation = rag_functions.prepare_rag_llm(
        token, llm_model, instruct_embeddings, existing_vector_store, temperature, max_length
    )
# Chat history
if "history" not in st.session_state:
    st.session_state.history = []
# Source documents
if "source" not in st.session_state:
    st.session_state.source = []
def prepare_rag_llm(
    token, llm_model, instruct_embeddings, vector_store_list, temperature, max_length
):
    # Load embeddings instructor
    instructor_embeddings = HuggingFaceInstructEmbeddings(
        model_name=instruct_embeddings, model_kwargs={"device":"cuda"}
    )
    # Load db
    loaded_db = FAISS.load_local(
        f"vector store/{vector_store_list}",
        instructor_embeddings,
        allow_dangerous_deserialization=True
    )
    # Load LLM
    llm = HuggingFaceHub(
        repo_id=llm_model,
        model_kwargs={"temperature": temperature, "max_length": max_length},
        huggingfacehub_api_token=token
    )
    memory = ConversationBufferWindowMemory(
        k=2,
        memory_key="chat_history",
        output_key="answer",
        return_messages=True,
    )
    # Create the chatbot
    qa_conversation = ConversationalRetrievalChain.from_llm(
        llm=llm,
        chain_type="stuff",
        retriever=loaded_db.as_retriever(),
        return_source_documents=True,
        memory=memory,
    )
    return qa_conversation
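The "initialize only if missing" pattern used for the session states can be tried outside Streamlit. In the sketch below, a plain dict stands in for st.session_state and a function call stands in for one rerun of the script. Both are simplifying assumptions, and the helper names are hypothetical.

```python
session_state = {}  # stand-in for st.session_state, which outlives each rerun

def init_state(state, key, default):
    """Create the key only on the first run so later reruns keep its value."""
    if key not in state:
        state[key] = default
    return state[key]

def run_script(new_message=None):
    """Simulate one top-to-bottom rerun of the Streamlit script."""
    history = init_state(session_state, "history", [])
    if new_message is not None:
        history.append(new_message)
    return len(history)

run_script("hi")
run_script("what is naruto?")
count_after_idle_rerun = run_script()  # a rerun with no new input
```

Without the membership check, every rerun would reset the history to an empty list and the conversation would be lost.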
Use this code to display the chat history in the application body.
# Display chats
for message in st.session_state.history:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])
If a user enters a question, the following code runs. It appends the question to session_state.history. Then, generate_answer accepts the question and calls the LLM to return the answer and source documents. The system then saves the answer in session_state.history as well. Additionally, it stores the source documents of each question and answer in session_state.source.
# Ask a question
if question := st.chat_input("Ask a question"):
    # Append user question to history
    st.session_state.history.append({"role": "user", "content": question})
    # Add user question
    with st.chat_message("user"):
        st.markdown(question)
    # Answer the question
    answer, doc_source = rag_functions.generate_answer(question, token)
    with st.chat_message("assistant"):
        st.write(answer)
    # Append assistant answer to history
    st.session_state.history.append({"role": "assistant", "content": answer})
    # Append the document sources
    st.session_state.source.append({"question": question, "answer": answer, "document": doc_source})
def generate_answer(question, token):
    answer = "An error has occurred"
    if token == "":
        answer = "Insert the Hugging Face token"
        doc_source = ["no source"]
    else:
        response = st.session_state.conversation({"question": question})
        answer = response.get("answer").split("Helpful Answer:")[-1].strip()
        explanation = response.get("source_documents", [])
        doc_source = [d.page_content for d in explanation]
    return answer, doc_source
Finally, display the source documents inside an expander.
# Source documents
with st.expander("Source documents"):
    st.write(st.session_state.source)
The second page is in document_embedding.py. It builds the user interface to upload a custom file and process the splitting into text chunks and conversion into embeddings, before saving them into a vector store.
The code below imports the library and sets the required inputs.
import streamlit as st
import os
from pages.backend import rag_functions
st.title("Document embedding")
st.markdown("This page is used to upload the documents as the custom knowledge for the chatbot.")
with st.form("document_input"):
    document = st.file_uploader(
        "Knowledge Documents", type=['pdf', 'txt'], help=".pdf or .txt file"
    )
    row_1 = st.columns([2, 1, 1])
    with row_1[0]:
        instruct_embeddings = st.text_input(
            "Model Name of the Instruct Embeddings", value="hkunlp/instructor-xl"
        )
    with row_1[1]:
        chunk_size = st.number_input(
            "Chunk Size", value=200, min_value=0, step=1,
        )
    with row_1[2]:
        chunk_overlap = st.number_input(
            "Chunk Overlap", value=10, min_value=0, step=1,
            help="lower than chunk size"
        )
    row_2 = st.columns(2)
    with row_2[0]:
        # List the existing vector stores
        vector_store_list = os.listdir("vector store/")
        vector_store_list = ["<New>"] + vector_store_list
        existing_vector_store = st.selectbox(
            "Vector Store to Merge the Knowledge", vector_store_list,
            help="""
            Which vector store to add the new documents.
            Choose <New> to create a new vector store.
            """
        )
    with row_2[1]:
        # Name of the vector store to save
        new_vs_name = st.text_input(
            "New Vector Store Name", value="new_vector_store_name",
            help="""
            If choose <New> in the dropdown / multiselect box,
            name the new vector store. Otherwise, fill in the existing vector
            store to merge.
            """
        )
    save_button = st.form_submit_button("Save vector store")
This application allows 3 options for users. A user can upload a new document and (1) create a new vector store, (2) merge and update an existing vector store with the new text chunks, or (3) create a new vector store by merging an existing vector store with the new text chunks.
When the “Save vector store” button is clicked, the following steps run for the uploaded document. The detailed functions are in the file rag_functions.py, and the notebook experiment section above discusses them.
if save_button:
    # Read the uploaded file
    if document.name[-4:] == ".pdf":
        document = rag_functions.read_pdf(document)
    elif document.name[-4:] == ".txt":
        document = rag_functions.read_txt(document)
    else:
        st.error("Check if the uploaded file is .pdf or .txt")
        # Halt this run so an unsupported file is not split and embedded
        st.stop()
    # Split document
    split = rag_functions.split_doc(document, chunk_size, chunk_overlap)
    # Check whether to create new vector store
    create_new_vs = None
    if existing_vector_store == "<New>" and new_vs_name != "":
        create_new_vs = True
    elif existing_vector_store != "<New>" and new_vs_name != "":
        create_new_vs = False
    else:
        st.error(
            """Check the 'Vector Store to Merge the Knowledge'
            and 'New Vector Store Name'""")
    # Embeddings and storing
    rag_functions.embedding_storing(
        instruct_embeddings, split, create_new_vs, existing_vector_store, new_vs_name
    )
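The three save options described earlier follow from just two inputs: the dropdown choice and the name field. The sketch below spells out that decision as a pure function. The function name `resolve_save_mode` and its return labels are hypothetical and are not taken from rag_functions.py.

```python
NEW_OPTION = "<New>"

def resolve_save_mode(existing_vector_store, new_vs_name):
    """Map the two form inputs to one of the three save options, or None if invalid."""
    if not new_vs_name:
        return None  # the form shows an error when no name is given
    if existing_vector_store == NEW_OPTION:
        return "create"  # option 1: a brand-new vector store
    if new_vs_name == existing_vector_store:
        return "update"  # option 2: merge into the existing store and overwrite it
    return "merge_into_new"  # option 3: merge, then save under a different name

mode = resolve_save_mode("naruto", "naruto_snake")
```

Keeping the decision in a small function like this makes it easy to test each combination without launching the Streamlit app.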
This part demonstrates the use of the RAG deployed in Streamlit. Let’s start the conversation by saying hi to the chatbot. The chatbot replies by reminding the user to insert the Hugging Face token, which is required to load the LLM. After the token is inserted, the chatbot works well.
The first answer is relevant, but it contains a small mistake. The source documents state that the boa constrictor and green anaconda are actually viviparous, not ovoviviparous as the chatbot answers.
The second question, “How about king cobra?”, expects the chatbot to reply about whether a king cobra will abandon the eggs. But, the question is too general. As a result, the answer fails to capture the context from the previous chat history. It even answers with external knowledge. Check the source documents to find that the answer is not from there.
The third question asks the same thing again. This time the chatbot understands that the word “them” refers to eggs. It then can reply correctly.
Move on to the second page, “Document Embedding”. The following demonstration uploads a pdf file.
Process the PDF file and export it as a vector store named “test”. Once the green success message appears, check the “vector store” folder. Notice that a new vector store named “test” is ready.
If the user does not name the new vector store, the application will display an error message.
It is possible to create a Streamlit chatbot with memory! Here is a summary of the methodology and initial resources to help you get started:
The Retrieval-Augmented Generation (RAG) core concept
RAG is a method that combines memory-based retrieval and generation for chatbots. This is how it works:
Keep Conversation History: A database is used to keep track of previous discussions.
Process User Input: A new prompt is created by combining the user’s message with a section of the discussion history, such as the most recent messages.
Retrieval: The conversation history database is searched for pertinent prior discussions using the prompt.
Generation: A large language model (LLM) creates a response based on the information that has been retrieved and the user’s current message.
Tools and Libraries:
Streamlit: This builds your chatbot’s web application interface.
LangChain: This library makes it easier to retrieve data and communicate with the LLM as part of the RAG procedure.
Database: For more complex applications, external databases are an option, or in-memory storage for more basic demos. Cloud database solutions or SQLite are common options.
Large Language Model (LLM): For answer creation, you can select an LLM service such as OpenAI’s API.
An LLM is an advanced AI technology capable of understanding and producing human-like natural language. It handles tasks like text classification, generation, and translation. Retrieval-Augmented Generation (RAG) enhances LLMs by integrating custom data sources, allowing them to answer questions based on specific information. Examples of LLMs that can be used for RAG include “tiiuae/falcon-7b-instruct,” “mistralai/Mistral-7B-Instruct-v0.2,” and “bigscience/bloom.” Building a RAG system involves splitting documents, embedding and storing them, and retrieving answers. The primary library used here for LLM applications is LangChain, whose memory feature keeps conversations continuous across interactions. In this article, we saw how to develop a RAG and Streamlit chatbot and chat with documents using an LLM. We also covered the Streamlit chatbot with memory and how it performs.
A. Large Language Model (LLM) is the Artificial Intelligence (AI) that can comprehend and generate human natural language (generative AI), including performing Natural Language Processing (NLP) tasks, such as text classification, text generation, or translation.
A. Retrieval-Augmented Generation (RAG) is the approach of improving an LLM by providing custom data sources so that it can answer questions by referring to the provided data.
A. “tiiuae/falcon-7b-instruct”, “mistralai/Mistral-7B-Instruct-v0.2”, and “bigscience/bloom”.
A. LangChain, the main library of this article, is a library for developing LLM-based applications.
If you find this article interesting and would like to connect with me on LinkedIn, please find my profile here.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.