A Step-by-Step Guide to PDF Chatbots with Langchain and Ollama

Subhradeep | Last Updated: 18 Dec, 2024

In an era where technology continues to transform the way we interact with information, the concept of a PDF chatbot brings a new level of convenience and efficiency to the table. This article delves into the intriguing realm of creating a PDF chatbot using Langchain and Ollama, where open-source models become accessible with minimal configuration. Say goodbye to the complexities of framework selection and model parameter adjustments, as we embark on a journey to unlock the potential of PDF chatbots. Discover how to seamlessly install Ollama, download models, and craft a PDF chatbot that provides intelligent responses to your queries. Let’s explore this exciting fusion of technology and document processing, making information retrieval easier than ever.

Learning Objectives

  • Understand how to install Ollama on your computer.
  • Learn how to download and run an open-source model using Ollama.
  • Discover the process of creating a PDF chatbot using Langchain and Ollama.

This article was published as a part of the Data Science Blogathon.

Prerequisites

To understand this article properly, you need:

  • Good knowledge of Python, and
  • Basic knowledge of Langchain, i.e. chains, vectorstores, etc.

Langchain provides many building blocks for creating LLM apps and deserves an article of its own. If you don’t know what Langchain is, I suggest you go through some articles or tutorials about it first. You can see this video too.

What is Ollama?

Ollama gives you the ability to download open-source models and use them locally. It automatically downloads the models from the best source. If you have a dedicated GPU on your computer, it will run the model with GPU acceleration; you don’t need to set that up manually. You can even customize a model by changing its prompt (yes, you don’t need Langchain for that). Ollama is also available as a Docker image, so you can deploy your own model as a Docker container. Exciting, right? Now let’s see how you can install Ollama on your machine.
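For example, customizing a model's prompt is done through a so-called Modelfile. Below is a minimal, illustrative sketch (the name my-orca is just an example I made up); see the Ollama documentation for the full Modelfile syntax.

# Modelfile (illustrative example)
FROM orca-mini
SYSTEM You are a concise assistant that answers in bullet points.

Then build and run the customized model from the terminal:

ollama create my-orca -f Modelfile
ollama run my-orca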

How to Install Ollama?

Unfortunately, Ollama is only available for macOS and Linux. But don’t worry, there is also a way for Windows users to use Ollama: WSL2. If you don’t have WSL2 on your computer, go through this article, where I explain everything about WSL2 and how to use it in VS Code. If you have already installed it, open Ubuntu and run the command below in the terminal.

curl https://ollama.ai/install.sh | sh

This will install Ollama in WSL2. If you are using macOS, visit here. Now you are ready to download a model using Ollama. Keep the terminal open; we are not done yet.
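To confirm that the installation worked, you can check the CLI version and list the locally available models (the list will be empty until you download one):

ollama --version
ollama list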

Downloading a Model

Ollama provides various models: llama2, llama2-uncensored, codellama, orca-mini, etc. If you’d like to see all the available models, you can go to this website. Here you will download the orca-mini 3b model. It is a Llama-based model trained on Orca-style datasets created using the approaches defined in the Orca paper. Though this model is small (about 1.9 GB), it still gives good responses. To download it, run the command below:

ollama run orca-mini

This command will download and run the orca-mini model in the terminal. Make sure that you have at least 8GB of RAM on your computer before you run this model.

Once the model is running, ask it some questions and see how it responds. Here is an example:

[Image: a sample conversation with the orca-mini model in the terminal]

You can see from the above example that it sometimes gives irrelevant information. We can fix this by changing the prompt. We will change the prompt when we create the chatbot.

Now that the model works correctly, let’s create the PDF chatbot using it.

Creating the Chatbot

Set up the Project Directory

If you don’t know how to create a project in WSL2, refer to the last part of that article, where I explain everything. Here we use VS Code as the IDE.

Installing Necessary Libraries

Before we start, let’s install the required libraries. Create a file named requirements.txt and write the dependencies below.

langchain
pymupdf
huggingface-hub
faiss-cpu
sentence-transformers

After that, just open the terminal in VS Code and run the below command.

>>> pip install -r requirements.txt

This will take some time. You don’t need a Python package for Ollama, because Ollama runs as a standalone application. Now we can start coding our chatbot.

Creating Necessary Functions

To create a chatbot in Langchain, we have to follow some steps:

  • Reading the PDF file using any PDF loader from Langchain.
  • If the document is really big, it’s a good idea to break it into smaller parts, also called chunks. This way, we can make sure the model gets the right information for your question without using too many resources. It’s like giving the model just the pieces it needs instead of overwhelming it with everything at once. Every model has a token limit, after all.
  • After that, we have to convert those chunks into vector embeddings so that the model gets a better understanding of the data. Then we have to create the vector store where we can store these embeddings and retrieve them efficiently when needed. Here we will use the FAISS vectorstore.
  • Now we are ready to query the PDF file. For this, we will use RetrievalQA from Langchain. It takes the vector store as a retriever along with the model we are going to use. A few other parameters can also be set according to our needs.

Import the Necessary Packages

These are the steps for a basic PDF chatbot. If you want to create a more complex chatbot, some additional steps are needed, such as adding memory or routing. Let’s create a function for every step so that we don’t have to repeat the code multiple times while testing. First, we import the necessary packages:

# Importing the necessary packages
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.document_loaders import PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
import textwrap

After that, we create our first function, which loads the PDF file using PyMuPDFLoader from Langchain.

# This will load the PDF file
def load_pdf_data(file_path):
    # Creating a PyMuPDFLoader object with file_path
    loader = PyMuPDFLoader(file_path=file_path)
    
    # loading the PDF file
    docs = loader.load()
    
    # returning the loaded document
    return docs

Then we have to split the documents into several chunks. Here we will use RecursiveCharacterTextSplitter from Langchain, which is the most popular splitter for text.

# Responsible for splitting the documents into several chunks
def split_docs(documents, chunk_size=1000, chunk_overlap=20):
    
    # Initializing the RecursiveCharacterTextSplitter with
    # chunk_size and chunk_overlap
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap
    )
    
    # Splitting the documents into chunks
    chunks = text_splitter.split_documents(documents=documents)
    
    # returning the document chunks
    return chunks
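As a quick sanity check, you can see how many chunks a document produces with different chunk sizes. A small sketch (the PDF path matches the one we will use later; the counts depend entirely on your document):

# Quick check: how many chunks do we get?
docs = load_pdf_data(file_path="data/ml_book.pdf")
print(len(split_docs(docs)))                    # default 1000-character chunks
print(len(split_docs(docs, chunk_size=500)))    # smaller chunks give more pieces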

Now it’s time for the embedding. For this, we first load the embedding model using HuggingFaceEmbeddings from Langchain, then we use FAISS to create the vector store. The model used for embedding is all-MiniLM-L6-v2, which you will see later.

# function for loading the embedding model
def load_embedding_model(model_path, normalize_embedding=True):
    return HuggingFaceEmbeddings(
        model_name=model_path,
        model_kwargs={'device':'cpu'}, # here we will run the model with CPU only
        encode_kwargs = {
            'normalize_embeddings': normalize_embedding # keep True to compute cosine similarity
        }
    )


# Function for creating embeddings using FAISS
def create_embeddings(chunks, embedding_model, storing_path="vectorstore"):
    # Creating the embeddings using FAISS
    vectorstore = FAISS.from_documents(chunks, embedding_model)
    
    # Saving the model in current directory
    vectorstore.save_local(storing_path)
    
    # returning the vectorstore
    return vectorstore
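Since the vectorstore is saved to disk, you can also reload it later instead of re-embedding the PDF every time. A minimal sketch (note that newer Langchain versions additionally require allow_dangerous_deserialization=True when loading a pickled FAISS index):

# Function for loading a previously saved FAISS vectorstore
def load_vectorstore(storing_path, embedding_model):
    return FAISS.load_local(storing_path, embedding_model)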

Let’s create a custom prompt template so that the chatbot will work as expected. The default prompt for the orca-mini model is given below.

prompt = """
### System:
You are an AI Assistant that follows instructions extremely well. \
Help as much as you can.

### User:
{prompt}

### Response:

"""

Keeping our custom prompt in the same format gives us better responses. Here is the modified prompt template:

template = """
### System:
You are a respectful and honest assistant. You have to answer the user's \
questions using only the context provided to you. If you don't know the answer, \
just say you don't know. Don't try to make up an answer.

### Context:
{context}

### User:
{question}

### Response:
"""

Now we must create the question-answering chain. Here we use RetrievalQA from Langchain. Note that RetrievalQA doesn’t give the chatbot memory, i.e. it just answers your questions but doesn’t remember previous conversations.

# Creating the chain for Question Answering
def load_qa_chain(retriever, llm, prompt):
    return RetrievalQA.from_chain_type(
        llm=llm,
        retriever=retriever, # here we are using the vectorstore as a retriever
        chain_type="stuff",
        return_source_documents=True, # including source documents in output
        chain_type_kwargs={'prompt': prompt} # customizing the prompt
    )
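As a side note, if you later want the chatbot to remember previous turns, Langchain's ConversationalRetrievalChain can be used instead of RetrievalQA. A minimal sketch, not used in the rest of this article:

from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

def load_conversational_chain(retriever, llm):
    # Keeps the chat history so that follow-up questions have context
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    return ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        memory=memory
    )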

We will also create a function that will prettify the response. Here is the code:

# Prettifying the response
def get_response(query, chain):
    # Getting response from chain
    response = chain({'query': query})
    
    # Wrapping the text for better output in Jupyter Notebook
    wrapped_text = textwrap.fill(response['result'], width=100)
    print(wrapped_text)

Now that we have created all the necessary functions, it’s time to build the chatbot.

Creating Jupyter Notebook

First, create a Jupyter Notebook in your project directory by creating a new file with the .ipynb extension. Also, don’t forget to install jupyter from pip; otherwise, the notebook will not run. Just run the command below to install it.

>>> pip install jupyter

If VS Code doesn’t detect a kernel for the Jupyter Notebook, you will see a Select Kernel option at the top right. Clicking it shows the options below.

[Image: the Select Kernel prompt in VS Code]

Select Python Environments. It will again give you some options.

[Image: the list of Python environments in VS Code]

Select the Starred environment. After that, you are ready to use the Jupyter Notebook.

Importing Necessary Libraries

Now, we are going to import the necessary libraries. Here we will import three libraries:

  • The Python script in which we wrote all the functions (I named the file lang_funcs.py),
  • Ollama from langchain.llms and,
  • PromptTemplate from langchain.

Let’s import these libraries:

from lang_funcs import *
from langchain.llms import Ollama
from langchain import PromptTemplate

Loading Models

Now we have to load the orca-mini model and the embedding model named all-MiniLM-L6-v2. This embedding model is small but effective.

# Loading orca-mini from Ollama
llm = Ollama(model="orca-mini", temperature=0)

# Loading the Embedding Model
embed = load_embedding_model(model_path="all-MiniLM-L6-v2")

Ollama models are hosted locally on port 11434. We don’t have to specify it, as it is already the default in Langchain’s Ollama() class. If the embedding model is not present on your computer, it will be downloaded automatically from Hugging Face. Just wait for some time and you are good to go.
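By the way, if your Ollama server runs somewhere other than the default address (for example inside a Docker container or on another machine), you can point Langchain at it explicitly with the base_url argument:

# Only needed when Ollama is not on the default localhost:11434
llm = Ollama(model="orca-mini", temperature=0, base_url="http://localhost:11434")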

Loading the Data and Creating the Vectorstore

It’s time to load the data and create the embedding. Here we will use a PDF book on ML.

# loading and splitting the documents
docs = load_pdf_data(file_path="data/ml_book.pdf")
documents = split_docs(documents=docs)

# creating vectorstore
vectorstore = create_embeddings(documents, embed)

# converting vectorstore to a retriever
retriever = vectorstore.as_retriever()

It will take some time to create the embeddings, so be patient. Programming is all about patience, isn’t it?
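By default, the retriever returns the four most similar chunks for every query. If the answers feel incomplete or too noisy, you can tune this through search_kwargs (the value below is just an example):

# Retrieve a different number of chunks per query
retriever = vectorstore.as_retriever(search_kwargs={'k': 3})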

You are Almost There!

I know you have come a long way. Now we have to create the chain and start testing our chatbot. So let’s do this.

# Creating the prompt from the template which we created before
prompt = PromptTemplate.from_template(template)

# Creating the chain
chain = load_qa_chain(retriever, llm, prompt)

Now we are ready to test our chatbot.

Testing the Chatbot

Now let’s ask some questions to our chatbot. Here are some examples:

>>>get_response("What is random forest?", chain)
>>> Random Forest is an ensemble learning technique that uses decision trees as base classifiers to
create a new classifier. It is a type of ensemble learning method that combines multiple decision
trees to improve the accuracy and reduce overfitting of the final model. The Random Forest algorithm
introduces extra randomness when growing trees, instead of searching for the very best feature when
splitting a node (like regular Decision Trees do). This results in a greater tree diversity, which
trades a higher bias for a lower variance. It is also more computationally efficient than other
ensemble learning methods like Bagging and Boosting.

>>>get_response("What is Voting Classifier?", chain)
>>> A voting classifier is a type of machine learning algorithm that predicts the class of a given
input data point based on the majority vote of multiple classifiers. In other words, it takes the
predictions of several different classifiers and predicts the class that gets the most votes. This
approach can be used to improve the accuracy of machine learning models by using the diversity of
predictions from different classifiers. The hard voting classifier is a type of voting classifier
that predicts the class with the highest majority vote.

>>>get_response("What is the difference between voting classifier and random forest?", chain)
>>> Voting classifiers and Random Forests are two different approaches to creating an ensemble of
classifiers.   Voting classifiers use majority vote to predict the class that receives the most
votes from the classifiers in the ensemble. This approach is based on the idea that the majority
opinion is likely to be correct, and that by aggregating the predictions of multiple classifiers, we
can improve our accuracy. The hard voting classifier is a simple implementation of this approach.
On the other hand, Random Forests is an ensemble learning method that uses decision trees as base
classifiers. It is trained via the bagging method, where each tree is grown from a randomly selected
subset of the training data. The Random Forest classifier is more convenient and optimized for
Decision Trees than voting classifiers, and it generally yields an overall better model.

It really gives good responses. You can ask some more questions to see how it responds.
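Because we set return_source_documents=True in the chain, you can also check which chunks of the PDF an answer was based on. A small sketch of how you might inspect them:

# Inspecting the retrieved source chunks for a query
response = chain({'query': "What is random forest?"})
for doc in response['source_documents']:
    print(doc.metadata.get('page'), doc.page_content[:100])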

Conclusion

I hope you now have a clear understanding of how to create a PDF chatbot using Langchain and Ollama. Ollama is a new kid in this domain, and it really makes our lives easier. You have already seen how we initialized the orca-mini model in just one line; otherwise, you would have to use the HuggingFacePipeline from Langchain.

Key Takeaways

  • Ollama Simplifies Model Deployment: Ollama simplifies the deployment of open-source models by providing an easy way to download and run them on your local computer.
  • PDF Chatbot Development: Learn the steps involved in creating a PDF chatbot, including loading PDF documents, splitting them into chunks, and creating a chatbot chain.
  • Customization for Better Responses: Understand how to customize prompts and templates to improve the responses of your chatbot.

You will find all the code used in this article here.

Frequently Asked Questions

Q1. How to know which model I have to use?

A. It totally depends on your use case. But in my opinion, if you are creating a chatbot like the one in this article, try to pick a relatively small model; otherwise, the performance of the chatbot will not be good. Here we use orca-mini, as it is not very large but still gives good responses.

Q2. Which PDFLoader should I use?

A. If you want to read a single-column PDF, you can use PyPDFLoader. If you are going to read a multi-column PDF, you should use PyMuPDFLoader. This is just my preference; you can try other PDF loaders as well.

Q3. What are the requirements of running WSL2 on my computer?

A. To run WSL2, you need at least 4 GB of RAM on your computer. If you have Windows 11 installed, the process is much smoother.

Q4. Though I am using Ollama, it looks like the chatbot creation process is still tedious. Why is that?

A. Langchain provides us with many customisation options, and more customisability means writing more code. You can use LlamaIndex instead, as it lets you create a chatbot much more easily.

Q5. Can I create this chatbot using Embedchain?

A. At the time of writing this article, Embedchain does not support Ollama. But if you want to create this chatbot without Ollama, you can do that.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Recently pursuing M.Tech in Artificial Intelligence and love to do anything about Data Science, Machine Learning, and AI. I also like to share my knowledge through Blogs. Ask me anything about Data Science, Machine Learning, and AI at [email protected].
