In an era where technology continues to transform the way we interact with information, the concept of a PDF chatbot brings a new level of convenience and efficiency to the table. This article delves into the intriguing realm of creating a PDF chatbot using Langchain and Ollama, where open-source models become accessible with minimal configuration. Say goodbye to the complexities of framework selection and model parameter adjustments, as we embark on a journey to unlock the potential of PDF chatbots. Discover how to seamlessly install Ollama, download models, and craft a PDF chatbot that provides intelligent responses to your queries. Let’s explore this exciting fusion of technology and document processing, making information retrieval easier than ever.
This article was published as a part of the Data Science Blogathon.
To understand this article properly, you need a basic familiarity with two things: Langchain and Ollama.
Langchain provides various functionalities for creating LLM apps and deserves a separate article of its own. If you don’t know what Langchain is, I suggest you go through some articles or tutorials about it. You can watch this video too.
Ollama gives you the ability to download open-source models and use them locally. It automatically downloads the models from the best source. If you have a dedicated GPU on your computer, it will run the model with GPU acceleration; you don’t need to set anything manually. You can even customize a model by changing its prompt (yes, you don’t need Langchain for that). Ollama is also available as a Docker image, so you can deploy your own model as a Docker container. Exciting, right? Now let’s see how you can install Ollama on your machine.
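As an example of that prompt customization, once Ollama is installed (installation is covered in the next section) you can bake your own system prompt into a model with a Modelfile. This is only a sketch – the model name and prompt below are placeholders for illustration:
# Modelfile – builds a customised variant of orca-mini
FROM orca-mini
SYSTEM "You are a concise assistant. Answer in at most three sentences."
PARAMETER temperature 0.2
Then build and run it from the terminal:
ollama create my-orca -f Modelfile
ollama run my-orca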
Unfortunately, Ollama is only available for MacOS and Linux. But don’t worry, there is also a way for Windows users to use Ollama – WSL2. If you don’t have WSL2 on your computer, go through this article, where I explained everything about WSL2 and how to use it in VS Code. If you have already installed it, open Ubuntu and run the command below in the terminal.
curl https://ollama.ai/install.sh | sh
This will install Ollama in WSL2. If you are using MacOS, visit here. Now you are ready to download a model using Ollama. Keep the terminal open; we are not done yet.
Ollama provides various models – llama2, llama2-uncensored, codellama, orca-mini, etc. If you’d like to know about all the available models, you can visit this website. Here you will download the orca-mini 3b model. It is a Llama model trained on Orca-style datasets created using the approaches defined in the Orca paper. Though this model is small (1.9 GB), it still gives good responses. To download this model, run the below command:
ollama run orca-mini
This command will download and run the orca-mini model in the terminal. Make sure that you have at least 8GB of RAM on your computer before you run this model.
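Once the download finishes, you can optionally confirm that the model is available locally by listing the installed models:
ollama list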
While the model is running, ask it some questions and see how it responds. Here is an example:
You can see from the above example that it sometimes gives irrelevant information. We can fix this by changing the prompt. We will change the prompt when we create the chatbot.
Now that the model works correctly, let’s create the PDF chatbot using it.
If you don’t know how to create a project in WSL2, refer to this article’s last part. There I explained everything. Here we use VS Code as an IDE.
Before we start, let’s install the required libraries. Create a file named requirements.txt and write the dependencies below.
langchain
pymupdf
huggingface-hub
faiss-cpu
sentence-transformers
After that, just open the terminal in VS Code and run the below command.
>>> pip install -r requirements.txt
Now it will take some time. You don’t need a separate Python package for Ollama, since it runs as a standalone application. Now we can start coding our chatbot.
To create a chatbot in Langchain, we have to follow these steps:
1. Load the PDF file
2. Split the document into chunks
3. Create embeddings and store them in a vector store
4. Define a custom prompt template
5. Build the question-answering chain and query it
These are the steps for a basic PDF chatbot. If you want to create a more complex chatbot, additional steps are needed – for example, adding memory, routing, etc. Now let’s create a function for every step so that we don’t have to repeat the code multiple times while testing. First, we import the necessary packages:
# Importing the necessary packages
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.document_loaders import PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
import textwrap
After that, we create our first function which will load the PDF file. Here you will read the PDF file using PyMuPDFLoader from Langchain.
# This will load the PDF file
def load_pdf_data(file_path):
    # Creating a PyMuPDFLoader object with file_path
    loader = PyMuPDFLoader(file_path=file_path)
    # loading the PDF file
    docs = loader.load()
    # returning the loaded document
    return docs
Then we have to split the documents into several chunks. Here we will use RecursiveCharacterTextSplitter from Langchain, which is the most popular splitter for text.
# Responsible for splitting the documents into several chunks
def split_docs(documents, chunk_size=1000, chunk_overlap=20):
    # Initializing the RecursiveCharacterTextSplitter with
    # chunk_size and chunk_overlap
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap
    )
    # Splitting the documents into chunks
    chunks = text_splitter.split_documents(documents=documents)
    # returning the document chunks
    return chunks
Now it’s time for the embeddings. For this, we first load the embedding model using HuggingFaceEmbeddings from Langchain, and then use FAISS to create the vector store. The embedding model used is all-MiniLM-L6-v2, which you will see later.
# function for loading the embedding model
def load_embedding_model(model_path, normalize_embedding=True):
    return HuggingFaceEmbeddings(
        model_name=model_path,
        model_kwargs={'device': 'cpu'},  # here we will run the model with CPU only
        encode_kwargs={
            'normalize_embeddings': normalize_embedding  # keep True to compute cosine similarity
        }
    )

# Function for creating embeddings using FAISS
def create_embeddings(chunks, embedding_model, storing_path="vectorstore"):
    # Creating the embeddings using FAISS
    vectorstore = FAISS.from_documents(chunks, embedding_model)
    # Saving the vectorstore in the current directory
    vectorstore.save_local(storing_path)
    # returning the vectorstore
    return vectorstore
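Since create_embeddings also saves the index to disk, on later runs you can skip re-embedding the PDF and load the saved index instead. A minimal sketch, assuming the same embedding model object; depending on your Langchain version, FAISS.load_local may additionally require allow_dangerous_deserialization=True:
# Reloading a previously saved FAISS index instead of re-embedding the PDF
def load_vectorstore(embedding_model, storing_path="vectorstore"):
    return FAISS.load_local(storing_path, embedding_model)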
Let’s create a custom prompt template so that the chatbot will work as expected. The default prompt for the orca-mini model is given below.
prompt = """
### System:
You are an AI Assistant that follows instructions extremely well. \
Help as much as you can.
### User:
{prompt}
### Response:
"""
Keeping the prompt in this format gives us better responses. Here is the modified prompt template:
template = """
### System:
You are a respectful and honest assistant. You have to answer the user's \
questions using only the context provided to you. If you don't know the answer, \
just say you don't know. Don't try to make up an answer.
### Context:
{context}
### User:
{question}
### Response:
"""
Now we must create the question-answering chain. Here we use RetrievalQA from Langchain. Note that RetrievalQA doesn’t give the chatbot memory, i.e., it answers your questions but doesn’t remember previous conversations (a sketch of how memory could be added follows the function below).
# Creating the chain for Question Answering
def load_qa_chain(retriever, llm, prompt):
    return RetrievalQA.from_chain_type(
        llm=llm,
        retriever=retriever,  # here we are using the vectorstore as a retriever
        chain_type="stuff",
        return_source_documents=True,  # including source documents in output
        chain_type_kwargs={'prompt': prompt}  # customizing the prompt
    )
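If you do want your chatbot to remember earlier turns of the conversation, one option is to swap RetrievalQA for Langchain’s ConversationalRetrievalChain together with a memory object. This is only a minimal sketch of that idea (it is not used anywhere else in this article), assuming the same llm and retriever objects as above:
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

# Sketch: a QA chain that also remembers previous questions and answers
def load_conversational_chain(retriever, llm):
    # Keeps the running chat history and feeds it back into the chain
    memory = ConversationBufferMemory(
        memory_key="chat_history",
        return_messages=True
    )
    return ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        memory=memory
    )
Note that this chain is called with a 'question' key (for example chain({'question': 'What is random forest?'})) and returns its text under 'answer', so the get_response helper below would need a small adjustment to use it.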
We will also create a function that will prettify the response. Here is the code:
# Prettifying the response
def get_response(query, chain):
    # Getting response from chain
    response = chain({'query': query})
    # Wrapping the text for better output in Jupyter Notebook
    wrapped_text = textwrap.fill(response['result'], width=100)
    print(wrapped_text)
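Because load_qa_chain sets return_source_documents=True, the response dictionary also contains the chunks the answer was based on. Here is an optional variation of get_response (a sketch, not part of the original code) that additionally prints which pages those chunks came from, assuming PyMuPDFLoader has stored a 'page' entry in each chunk’s metadata:
# Optional variation: also show which PDF pages the answer was drawn from
def get_response_with_sources(query, chain):
    # Getting response from chain
    response = chain({'query': query})
    print(textwrap.fill(response['result'], width=100))
    # Each retrieved chunk keeps the metadata attached by the PDF loader
    for doc in response['source_documents']:
        print("Source page:", doc.metadata.get("page"))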
Now that we have created all the necessary functions, it’s time to create the chatbot.
First, create a Jupyter Notebook in your directory. For this, create a new file with the extension .ipynb. Also, don’t forget to install Jupyter from pip; otherwise, the notebook will not run. Just run the below command to install it.
>>> pip install jupyter
If VS Code doesn’t detect the kernel for Jupyter Notebook, you will see an option Select Kernel on the top right side. You will see some options below.
Select Python Environments. It will again give you some options.
Select the Starred environment. After that, you are ready to use the Jupyter Notebook.
Now, we are going to import the necessary libraries. Here we will import three things: the helper functions we wrote earlier (saved in a file named lang_funcs.py), the Ollama class from Langchain, and PromptTemplate.
Let’s import these libraries:
from lang_funcs import *
from langchain.llms import Ollama
from langchain import PromptTemplate
Now we have to load the orca-mini model and the embedding model named all-MiniLM-L6-v2. This embedding model is small but effective.
# Loading orca-mini from Ollama
llm = Ollama(model="orca-mini", temperature=0)
# Loading the Embedding Model
embed = load_embedding_model(model_path="all-MiniLM-L6-v2")
Ollama models are hosted locally on port 11434. We don’t have to specify it, as it is already the default in Langchain’s Ollama() class. If the embedding model is not yet downloaded on your computer, it will be downloaded automatically from Hugging Face. Just wait for some time and you are good to go.
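If your Ollama server is running somewhere other than the default address, for example inside a Docker container or on another machine, you can point Langchain at it explicitly. A minimal sketch using the base_url parameter of the Ollama class (the URL shown is just the default value):
# Only needed when Ollama is not listening on the default localhost:11434
llm = Ollama(model="orca-mini", temperature=0, base_url="http://localhost:11434")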
It’s time to load the data and create the embedding. Here we will use a PDF book on ML.
# loading and splitting the documents
docs = load_pdf_data(file_path="data/ml_book.pdf")
documents = split_docs(documents=docs)
# creating vectorstore
vectorstore = create_embeddings(documents, embed)
# converting vectorstore to a retriever
retriever = vectorstore.as_retriever()
It will take some time to create the embeddings, so be patient. Programming is all about patience, isn’t it?
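By default, the retriever returns the four most similar chunks for every question. If you want the model to see more (or less) context, you can pass search_kwargs when converting the vectorstore; this is an optional tweak, not something the rest of the article relies on:
# Optional: control how many chunks are retrieved per question
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})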
I know you have come a long way. Now we have to create the chain and start testing our chatbot. So let’s do this.
# Creating the prompt from the template which we created before
prompt = PromptTemplate.from_template(template)
# Creating the chain
chain = load_qa_chain(retriever, llm, prompt)
Now we are ready to test our chatbot. Let’s ask it some questions. Here are some examples:
>>>get_response("What is random forest?", chain)
>>> Random Forest is an ensemble learning technique that uses decision trees as base classifiers to
create a new classifier. It is a type of ensemble learning method that combines multiple decision
trees to improve the accuracy and reduce overfitting of the final model. The Random Forest algorithm
introduces extra randomness when growing trees, instead of searching for the very best feature when
splitting a node (like regular Decision Trees do). This results in a greater tree diversity, which
trades a higher bias for a lower variance. It is also more computationally efficient than other
ensemble learning methods like Bagging and Boosting.
>>>get_response("What is Voting Classifier?", chain)
>>> A voting classifier is a type of machine learning algorithm that predicts the class of a given
input data point based on the majority vote of multiple classifiers. In other words, it takes the
predictions of several different classifiers and predicts the class that gets the most votes. This
approach can be used to improve the accuracy of machine learning models by using the diversity of
predictions from different classifiers. The hard voting classifier is a type of voting classifier
that predicts the class with the highest majority vote.
>>>get_response("What is the difference between voting classifier and random forest?", chain)
>>> Voting classifiers and Random Forests are two different approaches to creating an ensemble of
classifiers. Voting classifiers use majority vote to predict the class that receives the most
votes from the classifiers in the ensemble. This approach is based on the idea that the majority
opinion is likely to be correct, and that by aggregating the predictions of multiple classifiers, we
can improve our accuracy. The hard voting classifier is a simple implementation of this approach.
On the other hand, Random Forests is an ensemble learning method that uses decision trees as base
classifiers. It is trained via the bagging method, where each tree is grown from a randomly selected
subset of the training data. The Random Forest classifier is more convenient and optimized for
Decision Trees than voting classifiers, and it generally yields an overall better model.
It really gives good responses. You can ask some more questions to see how it responds.
I hope you now have a clear understanding of how to create a PDF chatbot using Langchain and Ollama. Ollama is a new kid in this domain, and it really makes our life easier. You have already seen how we initialized the orca-mini model in just one line; otherwise, you would have to use the Hugging Face pipeline from Langchain.
You will get all the code used in this article here.
A. It totally depends on your use case. But in my opinion, if you create a chatbot like the one in this article, use a relatively small model; otherwise, the performance of the chatbot will not be good. Here we use orca-mini, as it is not very large but still gives good responses.
A. If you want to read a single-column PDF, you can use PyPDFLoader. If you are going to read a multi-column PDF, you should use PyMuPDFLoader. This is just my preference; you can try other PDF loaders as well.
A. For running WSL2, you need at least 4 GB of RAM on your computer. If you have Windows 11 installed, the process is much smoother.
A. Langchain provides various customisations, but more customisability means you have to write more code. You can use LlamaIndex instead, as it lets you create a chatbot much more easily.
A. At the time of writing this article, Embedchain does not support Ollama. But if you want to create this chatbot without Ollama, you can do that.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.