Since the release of ChatGPT, the pace of progress in the AI space shows no signs of slowing down; new tools and technologies are released every day. That is great for businesses and the AI space in general, but as a programmer, do you need to learn all of them to build something? The answer is no. A more pragmatic approach is to learn only the things you need. Plenty of tools and technologies promise to make things easier, and to some extent they do, but at times we do not need them at all. Using large frameworks for simple use cases only turns your code into a bloated mess. So, in this article, we will build a CLI PDF chatbot without LangChain and understand why we do not always need AI frameworks.
Over recent months, frameworks such as LangChain and LlamaIndex have seen a remarkable surge in popularity, primarily thanks to their capacity to make LLM app development convenient for developers. But for a lot of use cases, these frameworks are overkill. It's like bringing a bazooka to a gunfight.
They ship with things you may not need in your project. Python environments are already infamous for being bloated; adding dependencies you hardly use only makes them messier. One such use case is document querying. If your project does not involve an AI agent or similarly complicated machinery, you can ditch LangChain and build the workflow from scratch, reducing unnecessary bloat. Besides this, frameworks like LangChain and LlamaIndex are under rapid development; any code refactoring on their side might break your build.
If you have a higher-order need, such as building an agent to automate complicated software, or a project that would take long engineering hours to build from scratch, it makes sense to use prebuilt solutions. Never reinvent the wheel, unless you need a better wheel. There are countless other examples where using ready-made solutions with minor tweaks makes absolute sense.
One of the most sought-after use cases of LLMs has been document question answering. And after OpenAI made their ChatGPT endpoints public, it has become much easier to build an interactive conversational bot over any text data source. In this article, we will build an LLM Q&A CLI app from scratch. So, how do we approach the problem? Before building it, let's understand what we need to do.
A typical workflow will involve extracting text from the PDF, splitting it into chunks, embedding and storing the chunks in a vector database, retrieving the most relevant chunks for a user query, and feeding them to an LLM to generate an answer.
All these things will require a user-facing interface. For this article, we will build a simple Command Line Interface with Python Argparse.
Here is a workflow diagram of our CLI chatbot:
Before going into the coding part, let's understand a thing or two about vector databases and indexes.
As the name suggests, vector databases store vectors, or embeddings. So, why do we need vector databases? Building any AI application requires embeddings of real-world data, as machine learning models cannot directly process raw data such as text, images, or audio. When you are dealing with a large amount of such data that will be used repeatedly, it needs to be stored somewhere. So, why can't we use a traditional database for this? Well, you can use traditional databases for your search needs, but vector databases offer a significant advantage: they can perform vector similarity search in addition to lexical search.
In our case, whenever a user sends a query, the vector DB performs a vector similarity search over all the stored embeddings and fetches the K nearest neighbors. The search mechanism is very fast, as it employs an algorithm called HNSW.
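To make vector similarity search concrete, here is a small illustrative sketch (not part of our app, and the toy data is made up) that finds the K nearest neighbors of a query embedding with plain NumPy and cosine similarity:

import numpy as np

def k_nearest(query, vectors, k):
    # Cosine similarity reduces to a dot product after L2-normalising both sides
    query = query / np.linalg.norm(query)
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = vectors @ query
    # Return the indices of the k highest-scoring vectors, best first
    return np.argsort(scores)[::-1][:k]

# Toy data: 1000 embeddings of dimension 384
embeddings = np.random.rand(1000, 384).astype(np.float32)
query_vec = np.random.rand(384).astype(np.float32)
print(k_nearest(query_vec, embeddings, k=5))

This exhaustive scan is fine for small collections; indexes like HNSW exist precisely to avoid it at scale.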
HNSW stands for Hierarchical Navigable Small World. It is a graph-based algorithm and indexing method for Approximate Nearest Neighbor search (ANN). ANN is a type of search that finds the k most similar items to a given item.
HNSW works by building a graph of the data points. The nodes in the graph represent the data points, and the edges in the graph represent the similarity between the data points. The graph is then traversed to find the k most similar items to the given item.
The HNSW algorithm is fast, reliable, and scalable. Most vector databases use HNSW as their default search algorithm.
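ChromaDB builds and queries its HNSW index internally, so our app never touches it directly. Still, for intuition, here is a minimal sketch using the hnswlib package (an extra dependency chosen just for illustration; it is not required for this project):

import numpy as np
import hnswlib

dim = 384
data = np.random.rand(1000, dim).astype(np.float32)

# Build the graph index: M is the number of edges per node,
# ef_construction trades build time for graph quality
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=1000, ef_construction=200, M=16)
index.add_items(data, np.arange(1000))

# ef trades query speed for recall at search time
index.set_ef(50)
labels, distances = index.knn_query(data[0], k=5)
print(labels)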
Now, we are all set to delve into the code.
As with any Python project, start with creating a virtual environment. This keeps the development environment nice and tidy. Refer to this article for choosing the right Python environment for your project.
The project file structure is simple: we will have two Python files, one for defining the CLI and the other for processing, storing, and querying data. Also, create a .env file to store your OpenAI API key.
Here is the requirements.txt file; install the dependencies before getting started.

# requirements.txt
openai
chromadb
PyPDF2
python-dotenv
Now, import the necessary classes and functions.
import os
import openai
import PyPDF2
import re
from chromadb import Client, Settings
from chromadb.utils import embedding_functions
from PyPDF2 import PdfReader
from typing import List, Dict
from dotenv import load_dotenv
Load the OpenAI API key from the .env file.
load_dotenv()
key = os.environ.get('OPENAI_API_KEY')
openai.api_key = key
To store text embeddings and their metadata, we will create a collection with ChromaDB.
ef = embedding_functions.ONNXMiniLM_L6_V2()
client = Client(settings = Settings(persist_directory="./", is_persistent=True))
collection_ = client.get_or_create_collection(name="test", embedding_function=ef)
As the embedding model, we are using MiniLM-L6-v2 with the ONNX runtime. It is small yet capable, and on top of that, open source.
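As a quick sanity check, assuming a chromadb version where embedding functions are directly callable on a list of texts, you can inspect the embedding dimensionality:

vectors = ef(["Hello world"])  # "Hello world" is just a sample string
print(len(vectors), len(vectors[0]))  # 1 embedding of 384 dimensions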
Next, we will define a function to verify if a provided file path belongs to a valid PDF file.
def verify_pdf_path(file_path):
    try:
        # Attempt to open the PDF file in binary read mode
        with open(file_path, "rb") as pdf_file:
            # Create a PDF reader object using PyPDF2
            pdf_reader = PyPDF2.PdfReader(pdf_file)
            # Check if the PDF has at least one page
            if len(pdf_reader.pages) > 0:
                # If it has pages, the PDF is not empty, so do nothing (pass)
                pass
            else:
                # If it has no pages, raise an exception indicating that the PDF is empty
                raise ValueError("PDF file is empty")
    except PyPDF2.errors.PdfReadError:
        # Handle the case where the PDF cannot be read (e.g., it's corrupted or not a valid PDF)
        raise PyPDF2.errors.PdfReadError("Invalid PDF file")
    except FileNotFoundError:
        # Handle the case where the specified file doesn't exist
        raise FileNotFoundError("File not found, check file address again")
    except Exception as e:
        # Handle other unexpected exceptions and display the error message
        raise Exception(f"Error: {e}")
One of the major parts of a PDF Q&A app is to get text chunks. So, we need to define a function that gets us the required chunks of text.
def get_text_chunks(text: str, word_limit: int) -> List[str]:
    """
    Divide a text into chunks with a specified word limit
    while ensuring each chunk contains complete sentences.

    Parameters:
        text (str): The entire text to be divided into chunks.
        word_limit (int): The desired word limit for each chunk.

    Returns:
        List[str]: A list containing the chunks of text with
        the specified word limit and complete sentences.
    """
    # Split the text into sentences, avoiding splits on abbreviations like "e.g."
    sentences = re.split(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s', text)
    chunks = []
    current_chunk = []
    for sentence in sentences:
        words = sentence.split()
        # Compare word counts (not character counts) against the limit
        if len(current_chunk) + len(words) <= word_limit:
            current_chunk.extend(words)
        else:
            if current_chunk:
                chunks.append(" ".join(current_chunk))
            current_chunk = words
    if current_chunk:
        chunks.append(" ".join(current_chunk))
    return chunks
We have defined a basic chunking algorithm. The idea is to let users decide how many words a single text chunk should contain, while every chunk ends with a complete sentence, even if that means breaching the limit. It is a simple algorithm; you may create something of your own.
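Here is a quick illustration with a made-up paragraph; the word limit is deliberately tiny so the splits are visible:

sample = ("Python is popular. It has a huge ecosystem. "
          "Some of that ecosystem is bloat. Choose dependencies carefully.")
for chunk in get_text_chunks(sample, word_limit=8):
    print(repr(chunk))
# Each printed chunk ends on a sentence boundary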
Now, we need a function to load texts from PDFs and create a dictionary to keep track of text chunks belonging to a single page.
def load_pdf(file: str, word: int) -> Dict[int, List[str]]:
    # Create a PdfReader object from the specified PDF file
    reader = PdfReader(file)
    # Initialize an empty dictionary to store the extracted text chunks
    documents = {}
    # Iterate through each page in the PDF
    for page_no in range(len(reader.pages)):
        # Get the current page
        page = reader.pages[page_no]
        # Extract text from the current page
        texts = page.extract_text()
        # Use the get_text_chunks function to split the extracted text into chunks of 'word' words
        text_chunks = get_text_chunks(texts, word)
        # Store the text chunks in the documents dictionary with the page number as the key
        documents[page_no] = text_chunks
    # Return the dictionary containing page numbers as keys and text chunks as values
    return documents
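A quick usage sketch (the file name here is a placeholder): the function returns a dictionary mapping each page number to its list of chunks:

docs = load_pdf("report.pdf", 200)  # 'report.pdf' is a hypothetical path
print(list(docs.keys())[:3])   # first few page numbers, e.g. [0, 1, 2]
print(docs[0][0][:80])         # start of the first chunk on page 0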
Now, we need to store the data in a ChromaDB collection.
def add_text_to_collection(file: str, word: int = 200) -> str:
    # Load the PDF file and extract text chunks
    docs = load_pdf(file, word)
    # Initialize empty lists to store data
    docs_strings = []  # List to store text chunks
    ids = []  # List to store unique IDs
    metadatas = []  # List to store metadata for each text chunk
    chunk_id = 0  # Initialize the running chunk ID
    # Iterate through each page and text chunk in the loaded PDF
    for page_no in docs.keys():
        for doc in docs[page_no]:
            # Append the text chunk to the docs_strings list
            docs_strings.append(doc)
            # Append metadata for the text chunk, including the page number
            metadatas.append({'page_no': page_no})
            # Append a unique ID for the text chunk
            ids.append(chunk_id)
            # Increment the ID
            chunk_id += 1
    # Add the collected data to the collection
    collection_.add(
        ids=[str(i) for i in ids],  # Convert IDs to strings
        documents=docs_strings,  # Text chunks
        metadatas=metadatas,  # Metadata
    )
    # Return a success message
    return "PDF embeddings successfully added to collection"
In ChromaDB, the metadata field stores additional information about the documents. In our case, the page number of a text chunk is its metadata. After attaching metadata to each text chunk, we store everything in the collection we created earlier. This step runs only when the user provides a valid file path to a PDF file.
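Because the page number lives in the metadata, you can also restrict retrieval to a particular page using ChromaDB's where filter. For example (the query text is made up):

result = collection_.query(
    query_texts=["What does the introduction cover?"],
    n_results=2,
    where={"page_no": 0},  # only consider chunks from page 0
)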
We will now define a function that processes user queries to fetch data from the database.
def query_collection(texts: str, n: int) -> List[str]:
    # Retrieve the n chunks most similar to the query text
    result = collection_.query(
        query_texts=texts,
        n_results=n,
    )
    documents = result["documents"][0]
    metadatas = result["metadatas"][0]
    resulting_strings = []
    # Prefix each retrieved chunk with its page number from the metadata
    for page_no, text_chunk in zip(metadatas, documents):
        resulting_strings.append(f"Page {page_no['page_no']}: {text_chunk}")
    return resulting_strings
The above function uses the collection's query method to retrieve the "n" most relevant chunks from the database. We then format each chunk as a string that starts with the page number of the text chunk.
Now, the only major thing remaining is to feed the LLM with information.
def get_response(queried_texts: List[str]) -> str:
    global messages
    # System prompt instructing the model to quote the page number in its answers
    messages = [
        {"role": "system",
         "content": "You are a helpful assistant. "
                    "You will always answer the question asked in 'ques:' and "
                    "will quote the page number while answering any questions. "
                    "It is always at the start of the prompt in the format 'Page n'."},
        {"role": "user", "content": ''.join(queried_texts)}
    ]
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        temperature=0.2,
    )
    response_msg = response.choices[0].message.content
    # Append the assistant's reply to the message history
    messages = messages + [{"role": 'assistant', 'content': response_msg}]
    return response_msg
The global variable messages stores the conversation context. We have defined a system message so that the model quotes the page number from which it gets the answer.
Lastly, the ultimate utility function combines obtained text chunks with the user query, feeds it into the get_response() function, and returns the resulting answer string.
def get_answer(query: str, n: int):
    # Fetch the n most relevant chunks for the query
    queried_texts = query_collection(texts=query, n=n)
    # Join all retrieved chunks and append the user's question
    queried_string = " ".join(queried_texts) + f" ques: {query}"
    answer = get_response(queried_texts=queried_string)
    return answer
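One more helper: the CLI we build next imports a clear_coll function that empties the collection, but its body is not shown in this walkthrough. Here is a minimal sketch, assuming we simply delete the collection and recreate it under the same name:

def clear_coll():
    # Drop the existing collection and recreate an empty one with the same name
    global collection_
    client.delete_collection(name="test")
    collection_ = client.get_or_create_collection(name="test", embedding_function=ef)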
We are done with our utility functions. Let's move on to building the CLI.
To use the chatbot on demand, we need an interface. This could be a web app, a mobile app, or a CLI. In this article, we will build a CLI for our chatbot. If you want to build a nice-looking demo web app, you can use tools like Gradio or Streamlit. Check out this article on building a chatbot for PDFs: Build a ChatGPT for PDFs with Langchain.
To build the CLI, we will need the argparse library. Argparse is a powerful module that lets you create CLIs in Python. It has a simple and easy syntax for creating commands, sub-commands, and flags. So, before delving into it, here is a small primer on argparse.
The argparse module was first included in Python 3.2, providing a quick and convenient way to build CLI applications in Python without relying on third-party installations. It allows us to parse command-line arguments, create sub-commands, and more, making it a reliable tool for building CLIs.
Here's a small example of argparse in action:
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-f", "--filename", help="The name of the file to read.")
parser.add_argument("-n", "--number", help="The number of lines to print.", type=int)
parser.add_argument("-s", "--sort", help="Sort the lines in the file.", action="store_true")
args = parser.parse_args()

with open(args.filename) as f:
    lines = f.readlines()
if args.sort:
    lines.sort()
if args.number:
    lines = lines[:args.number]  # Limit output to the requested number of lines
for line in lines:
    print(line, end="")
The add_argument method lets us define arguments with checks and balances. We can define the type of an argument, the action to take when a flag is provided, and a help parameter that explains the use of a particular argument. The --help flag will display all the flags and their use cases.
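Assuming the snippet above is saved as read_lines.py (a name chosen only for this illustration), a typical invocation looks like this:

python read_lines.py -f notes.txt -n 5 -s

And python read_lines.py --help prints the auto-generated usage text.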
On a similar note, we will define the arguments for the chatbot CLI.
Import Argparse and necessary utility functions.
import argparse
from utils import (
add_text_to_collection,
get_answer,
verify_pdf_path,
clear_coll
)
Define Argument parser and add arguments.
def main():
    # Create a command-line argument parser with a description
    parser = argparse.ArgumentParser(description="PDF Processing CLI Tool")
    # Define command-line arguments
    parser.add_argument("-f", "--file", help="Path to the input PDF file")
    parser.add_argument(
        "-c", "--count",
        default=200,
        type=int,
        help="Optional integer value for the number of words in a single chunk"
    )
    parser.add_argument(
        "-q", "--question",
        type=str,
        help="Ask a question"
    )
    parser.add_argument(
        "-cl", "--clear",
        action="store_true",
        help="Clear existing collection data"
    )
    parser.add_argument(
        "-n", "--number",
        type=int,
        default=1,
        help="Number of results to be fetched from the collection"
    )
    # Parse the command-line arguments
    args = parser.parse_args()
We have defined a few arguments, such as --file, --count, and --question. Note that we use action="store_true" for the --clear flag rather than type=bool, since argparse would treat any non-empty string (even "False") as True.
Now, we process the arguments:
    if args.file is not None:
        verify_pdf_path(args.file)
        confirmation = add_text_to_collection(file=args.file, word=args.count)
        print(confirmation)

    if args.question is not None:
        n = args.number if args.number else 1
        answer = get_answer(args.question, n=n)
        print("Answer:", answer)

    if args.clear:
        clear_coll()
        print("Current collection cleared successfully")
Putting everything together.
import argparse
from utils import (
    add_text_to_collection,
    get_answer,
    verify_pdf_path,
    clear_coll
)

def main():
    # Create a command-line argument parser with a description
    parser = argparse.ArgumentParser(description="PDF Processing CLI Tool")

    # Define command-line arguments
    parser.add_argument("-f", "--file", help="Path to the input PDF file")
    parser.add_argument(
        "-c", "--count",
        default=200,
        type=int,
        help="Optional integer value for the number of words in a single chunk"
    )
    parser.add_argument(
        "-q", "--question",
        type=str,
        help="Ask a question"
    )
    parser.add_argument(
        "-cl", "--clear",
        action="store_true",
        help="Clear existing collection data"
    )
    parser.add_argument(
        "-n", "--number",
        type=int,
        default=1,
        help="Number of results to be fetched from the collection"
    )

    # Parse the command-line arguments
    args = parser.parse_args()

    # Check if the '--file' argument is provided
    if args.file is not None:
        # Verify the PDF file path and add its text to the collection
        verify_pdf_path(args.file)
        confirmation = add_text_to_collection(file=args.file, word=args.count)
        print(confirmation)

    # Check if the '--question' argument is provided
    if args.question is not None:
        n = args.number if args.number else 1  # Set 'n' to the specified number or default to 1
        answer = get_answer(args.question, n=n)
        print("Answer:", answer)

    # Check if the '--clear' flag is provided
    if args.clear:
        clear_coll()
        print("Current collection cleared successfully")

if __name__ == "__main__":
    main()
Now open your terminal and run the command below.
python cli.py -f "path/to/file.pdf" -c 1000 -n 1 -q "query"
To clear the collection, run
python cli.py -cl
If the provided file path does not exist, the script raises a FileNotFoundError; if the file is not a valid PDF, it raises a PdfReadError.
The GitHub Repository: https://github.com/sunilkumardash9/pdf-cli-chatbot
A chatbot running as a CLI tool can be used in many real-world applications, such as:
Academic Research: Researchers often deal with numerous research papers and articles in PDF format. A CLI chatbot could help them extract relevant information, create bibliographies, and organize their references efficiently.
Language Translation: Language professionals can use the chatbot to extract text from PDFs, translate it, and then generate translated documents, all from the command line.
Educational Institutions: Teachers and educators can extract content from educational resources to create customized learning materials or to prepare course content. Students can extract useful information from large PDFs using the chatbot CLI.
Open Source Project Management: CLI chatbots can help open-source software projects manage documentation, extract code snippets, and generate release notes from PDF manuals.
So, this was all about building a PDF Q&A chatbot with a command-line interface, without using frameworks such as LangChain and LlamaIndex. Here is a quick summary of the things we covered.
Frameworks like LangChain and LlamaIndex are convenient, but they can be overkill for simple workflows such as document querying.
Vector databases store embeddings and support fast similarity search, typically via the HNSW index.
We extracted text from PDFs with PyPDF2, chunked it by sentence, and stored the chunks with page-number metadata in a ChromaDB collection.
User queries retrieve the most similar chunks, which are fed to GPT-3.5 Turbo along with the question.
Finally, we wrapped the whole workflow in a CLI built with Python's argparse.
Q. What is a PDF chatbot?
A. A PDF chatbot is an interactive bot specially designed to retrieve information from PDFs.
Q. What is LangChain used for?
A. LangChain is an open-source framework that simplifies the creation of applications using large language models. It can be used for a variety of tasks, including chatbots, document analysis, code analysis, question answering, and generative tasks.
Q. Are chatbots AI tools?
A. Yes, chatbots are AI tools. They use artificial intelligence (AI) and natural language processing (NLP) to simulate human conversation. Chatbots can be used to provide customer service, answer questions, and even generate creative content.
Q. What are chatbots for PDFs?
A. Chatbots for PDFs are tools that allow you to interact with PDF files using natural language. You can ask questions about the PDF, and the chatbot will try to answer them. You can also ask it to summarize the PDF or to extract specific information from it.
Q. Can I chat with a PDF?
A. Yes, with the advent of capable large language models and vector stores, it is possible to chat with PDFs.