In the second part of our series on building a RAG application on a Raspberry Pi, we’ll expand on the foundation we laid in the first part, where we created and tested the core pipeline. Now, we’re going to take things a step further by building a FastAPI application to serve our RAG pipeline and creating a Reflex app to give users a simple and interactive way to access it. This part will guide you through setting up the FastAPI back-end, designing the front-end with Reflex, and getting everything up and running on your Raspberry Pi. By the end, you’ll have a complete, working application that’s ready for real-world use.
If you missed the previous edition, be sure to check it out here: Self-Hosting RAG Applications on Edge Devices with Langchain and Ollama – Part I.
Before we start creating the application, we need to set up the environment. Create a virtual environment and install the dependencies below:
deeplake
boto3==1.34.144
botocore==1.34.144
fastapi==0.110.3
gunicorn==22.0.0
httpx==0.27.0
huggingface-hub==0.23.4
langchain==0.2.6
langchain-community==0.2.6
langchain-core==0.2.11
langchain-experimental==0.0.62
langchain-text-splitters==0.2.2
langsmith==0.1.83
marshmallow==3.21.3
numpy==1.26.4
pandas==2.2.2
pydantic==2.8.2
pydantic_core==2.20.1
PyMuPDF==1.24.7
PyMuPDFb==1.24.6
python-dotenv==1.0.1
pytz==2024.1
PyYAML==6.0.1
reflex==0.5.6
requests==2.32.3
reflex-hosting-cli==0.1.13
Once the required packages are installed, we need to have the required models present on the device. We will do this using Ollama. Follow the steps from Part 1 of this article to download both the language and embedding models. Finally, create two directories for the back-end and front-end applications.
Once the models are pulled using Ollama, we are ready to build the final application.
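Optionally, you can run a quick sanity check to confirm that Ollama is serving both models before building the app. The snippet below is a minimal sketch, assuming Ollama is running on its default port (11434) and that the phi3 and nomic-embed-text models from our config have already been pulled:

# sanity_check.py -- a quick check that Ollama serves both models.
from langchain.embeddings.ollama import OllamaEmbeddings
from langchain_community.llms.ollama import Ollama

OLLAMA_URL = "http://localhost:11434"

embeddings = OllamaEmbeddings(model="nomic-embed-text", base_url=OLLAMA_URL)
llm = Ollama(model="phi3", base_url=OLLAMA_URL)

# Embed a short string and print the vector dimension.
vector = embeddings.embed_query("hello raspberry pi")
print(f"Embedding dimension: {len(vector)}")

# Ask the language model for a short reply.
print(llm.invoke("Reply with one short sentence."))

If both calls succeed, the device is ready to serve the pipeline.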
In Part 1 of this article, we built the RAG pipeline with both the Ingestion and QnA modules and tested them on a few documents to confirm everything worked as expected. Now we need to wrap the pipeline with FastAPI to create a consumable API. This will help us integrate it with any front-end framework like Streamlit, Chainlit, Gradio, Reflex, React, Angular, etc. Let’s start by building a structure for the application. Following this structure is completely optional, but make sure to check the dependency imports if you use a different structure for the app.
Below is the tree structure we will follow:
backend
├── app.py
├── requirements.txt
└── src
├── config.py
├── doc_loader
│ ├── base_loader.py
│ ├── __init__.py
│ └── pdf_loader.py
├── ingestion.py
├── __init__.py
└── qna.py
Let’s start with config.py. This file will contain all the configurable options for the application, such as the Ollama URL, the LLM name, and the embeddings model name. Below is an example:
LANGUAGE_MODEL_NAME = "phi3"
EMBEDDINGS_MODEL_NAME = "nomic-embed-text"
OLLAMA_URL = "http://localhost:11434"
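Since python-dotenv is part of our requirements, these values can optionally be read from a .env file instead of being hard-coded. The following is a hypothetical variant of config.py that falls back to the same defaults; the hard-coded version above works just as well:

# Optional config.py variant that reads settings from a .env file.
import os
from dotenv import load_dotenv

load_dotenv()  # loads a .env file from the working directory, if present

LANGUAGE_MODEL_NAME = os.getenv("LANGUAGE_MODEL_NAME", "phi3")
EMBEDDINGS_MODEL_NAME = os.getenv("EMBEDDINGS_MODEL_NAME", "nomic-embed-text")
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")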
The base_loader.py file contains the parent document loader class that child document loaders will inherit. In this application we are only working with PDF files, so a child PDFLoader class will be created that inherits from the BaseLoader class.
Below are the contents of base_loader.py and pdf_loader.py:
# base_loader.py
from abc import ABC, abstractmethod
class BaseLoader(ABC):
def __init__(self, file_path: str) -> None:
self.file_path = file_path
@abstractmethod
async def load_document(self):
pass
# pdf_loader.py
import os
from .base_loader import BaseLoader
from langchain.schema import Document
from langchain.document_loaders.pdf import PyMuPDFLoader
from langchain.text_splitter import CharacterTextSplitter
class PDFLoader(BaseLoader):
def __init__(self, file_path: str) -> None:
super().__init__(file_path)
async def load_document(self):
self.file_name = os.path.basename(self.file_path)
loader = PyMuPDFLoader(file_path=self.file_path)
text_splitter = CharacterTextSplitter(
separator="\n",
chunk_size=1000,
chunk_overlap=200,
)
pages = await loader.aload()
total_pages = len(pages)
chunks = []
for idx, page in enumerate(pages):
chunks.append(
Document(
page_content=page.page_content,
metadata=dict(
{
"file_name": self.file_name,
"page_no": str(idx + 1),
"total_pages": str(total_pages),
}
),
)
)
final_chunks = text_splitter.split_documents(chunks)
return final_chunks
We discussed the working of the PDF loader in Part 1 of this article.
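If you want to try the loader on its own, here is a minimal sketch. It assumes a sample file at the hypothetical path data/sample.pdf and is run from the src directory so the doc_loader package is importable:

# quick_loader_test.py -- standalone check for PDFLoader.
import asyncio

from doc_loader import PDFLoader

async def main():
    loader = PDFLoader(file_path="data/sample.pdf")
    chunks = await loader.load_document()
    print(f"Created {len(chunks)} chunks")
    # Each chunk carries the file name, page number, and total pages.
    print(chunks[0].metadata)

if __name__ == "__main__":
    asyncio.run(main())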
Next, let’s build the Ingestion class. This is the same as the one we built in Part 1 of this article.
import os
import config as cfg
from langchain.vectorstores.deeplake import DeepLake
from langchain.embeddings.ollama import OllamaEmbeddings
from .doc_loader import PDFLoader
class Ingestion:
"""Document Ingestion pipeline."""
def __init__(self):
try:
self.embeddings = OllamaEmbeddings(
model=cfg.EMBEDDINGS_MODEL_NAME,
base_url=cfg.OLLAMA_URL,
show_progress=True,
)
self.vector_store = DeepLake(
dataset_path="data/text_vectorstore",
embedding=self.embeddings,
num_workers=4,
verbose=False,
)
except Exception as e:
raise RuntimeError(f"Failed to initialize Ingestion system. ERROR: {e}")
async def create_and_add_embeddings(
self,
file: str,
):
try:
loader = PDFLoader(
file_path=file,
)
chunks = await loader.load_document()
size = await self.vector_store.aadd_documents(documents=chunks)
return len(size)
except (ValueError, RuntimeError, KeyError, TypeError) as e:
raise Exception(f"ERROR: {e}")
Now that we have set up the Ingestion class, we’ll move on to creating the QnA class. This too is the same as the one we created in Part 1 of this article.
import os
import config as cfg
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain.vectorstores.deeplake import DeepLake
from langchain.embeddings.ollama import OllamaEmbeddings
from langchain_community.llms.ollama import Ollama
from .doc_loader import PDFLoader
class QnA:
"""Document Ingestion pipeline."""
def __init__(self):
try:
self.embeddings = OllamaEmbeddings(
model=cfg.EMBEDDINGS_MODEL_NAME,
base_url=cfg.OLLAMA_URL,
show_progress=True,
)
self.model = Ollama(
model=cfg.LANGUAGE_MODEL_NAME,
base_url=cfg.OLLAMA_URL,
verbose=True,
temperature=0.2,
)
self.vector_store = DeepLake(
dataset_path="data/text_vectorstore",
embedding=self.embeddings,
num_workers=4,
verbose=False,
)
self.retriever = self.vector_store.as_retriever(
search_type="similarity",
search_kwargs={
"k": 10,
},
)
except Exception as e:
raise RuntimeError(f"Failed to initialize Ingestion system. ERROR: {e}")
def create_rag_chain(self):
try:
system_prompt = """<Instructions>\n\nContext: {context}"
"""
prompt = ChatPromptTemplate.from_messages(
[
("system", system_prompt),
("human", "{input}"),
]
)
question_answer_chain = create_stuff_documents_chain(self.model, prompt)
rag_chain = create_retrieval_chain(self.retriever, question_answer_chain)
return rag_chain
except Exception as e:
raise RuntimeError(f"Failed to create retrieval chain. ERROR: {e}")
With this, we have finished building the core functionality of the RAG app. Now let’s wrap it with FastAPI.
import sys
import os
import uvicorn
from src import QnA, Ingestion
from fastapi import FastAPI, Request, File, UploadFile
from fastapi.responses import StreamingResponse
app = FastAPI()
ingestion = Ingestion()
chatbot = QnA()
rag_chain = chatbot.create_rag_chain()
@app.get("/")
def hello():
return {"message": "API Running in server 8089"}
@app.post("/query")
async def ask_query(request: Request):
data = await request.json()
question = data.get("question")
async def event_generator():
for chunk in rag_chain.pick("answer").stream({"input": question}):
yield chunk
return StreamingResponse(event_generator(), media_type="text/plain")
@app.post("/ingest")
async def ingest_document(file: UploadFile = File(...)):
try:
os.makedirs("files", exist_ok=True)
file_location = f"files/{file.filename}"
with open(file_location, "wb+") as file_object:
file_object.write(file.file.read())
size = await ingestion.create_and_add_embeddings(file=file_location)
return {"message": f"File ingested! Document count: {size}"}
except Exception as e:
return {"message": f"An error occured: {e}"}
if __name__ == "__main__":
try:
uvicorn.run(app, host="0.0.0.0", port=8089)
except KeyboardInterrupt as e:
print("App stopped!")
Let’s break down the app endpoint by endpoint:
GET / is a simple health-check route that confirms the API is up.
POST /query accepts a JSON body containing a question, runs it through the RAG chain, and streams the generated answer back as plain text.
POST /ingest accepts an uploaded file, saves it under the files directory, and passes it to the Ingestion pipeline to be chunked and embedded.
Finally, we run the app with the uvicorn package, specifying the host and port. To test the app, simply run it using the following command:
python app.py
Use an API testing tool like Postman, Insomnia, or Bruno to test the application. You can also use the Thunder Client extension for VS Code to do the same.
Testing the Ingestion endpoint:
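If you prefer testing from Python instead of a GUI client, a small requests sketch like the one below can exercise the ingest endpoint. It assumes the API is reachable on port 8089 and uses a hypothetical sample.pdf in the current directory:

import requests

# Upload a local PDF to the /ingest endpoint.
with open("sample.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8089/ingest",
        files={"file": ("sample.pdf", f, "application/pdf")},
    )
print(response.json())  # e.g. {"message": "File ingested! Document count: ..."}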
Testing the query endpoint:
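Similarly, the streaming query endpoint can be checked with a short sketch that prints the answer chunks as they arrive (same host and port assumed):

import requests

payload = {"question": "What is this document about?"}
with requests.post("http://localhost:8089/query", json=payload, stream=True) as response:
    # Print each chunk of the streamed answer as soon as it arrives.
    for chunk in response.iter_content(chunk_size=512):
        if chunk:
            print(chunk.decode(), end="", flush=True)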
We have successfully created a FastAPI app for the back-end of our RAG application. It’s time to build our front-end. You can choose any front-end library for this, but for this particular article we will build the front-end using Reflex. Reflex is a Python-only front-end library created for building web applications purely in Python. It provides us with templates for common applications like a calculator, image generation, and a chatbot. We will use the chatbot application template as the starting point for our user interface. Our final app will have the following structure, so let’s keep it here for reference.
We will have a frontend directory for this:
frontend
├── assets
│ └── favicon.ico
├── docs
│ └── demo.gif
├── chat
│ ├── components
│ │ ├── chat.py
│ │ ├── file_upload.py
│ │ ├── __init__.py
│ │ ├── loading_icon.py
│ │ ├── modal.py
│ │ └── navbar.py
│ ├── __init__.py
│ ├── chat.py
│ └── state.py
├── requirements.txt
├── rxconfig.py
└── uploaded_files
Follow these steps to prepare the groundwork for the final app. First, clone the chat template and initialize the Reflex project:
git clone https://github.com/reflex-dev/reflex-chat.git .
reflex init
This will set up the Reflex app, ready to run and develop. To confirm everything works, start it with the following command:
reflex run
Let’s start modifying the components. First, let’s modify the chat.py file inside the components directory. Below is the code for it:
import reflex as rx
from reflex_demo.components import loading_icon
from reflex_demo.state import QA, State
message_style = dict(
display="inline-block",
padding="0 10px",
border_radius="8px",
max_width=["30em", "30em", "50em", "50em", "50em", "50em"],
)
def message(qa: QA) -> rx.Component:
"""A single question/answer message.
Args:
qa: The question/answer pair.
Returns:
A component displaying the question/answer pair.
"""
return rx.box(
rx.box(
rx.markdown(
qa.question,
background_color=rx.color("mauve", 4),
color=rx.color("mauve", 12),
**message_style,
),
text_align="right",
margin_top="1em",
),
rx.box(
rx.markdown(
qa.answer,
background_color=rx.color("accent", 4),
color=rx.color("accent", 12),
**message_style,
),
text_align="left",
padding_top="1em",
),
width="100%",
)
def chat() -> rx.Component:
"""List all the messages in a single conversation."""
return rx.vstack(
rx.box(rx.foreach(State.chats[State.current_chat], message), width="100%"),
py="8",
flex="1",
width="100%",
max_width="50em",
padding_x="4px",
align_self="center",
overflow="hidden",
padding_bottom="5em",
)
def action_bar() -> rx.Component:
"""The action bar to send a new message."""
return rx.center(
rx.vstack(
rx.chakra.form(
rx.chakra.form_control(
rx.hstack(
rx.input(
rx.input.slot(
rx.tooltip(
rx.icon("info", size=18),
content="Enter a question to get a response.",
)
),
placeholder="Type something...",
id="question",
width=["15em", "20em", "45em", "50em", "50em", "50em"],
),
rx.button(
rx.cond(
State.processing,
loading_icon(height="1em"),
rx.text("Send", font_family="Ubuntu"),
),
type="submit",
),
align_items="center",
),
is_disabled=State.processing,
),
on_submit=State.process_question,
reset_on_submit=True,
),
rx.text(
"ReflexGPT may return factually incorrect or misleading responses. Use discretion.",
text_align="center",
font_size=".75em",
color=rx.color("mauve", 10),
font_family="Ubuntu",
),
rx.logo(margin_top="-1em", margin_bottom="-1em"),
align_items="center",
),
position="sticky",
bottom="0",
left="0",
padding_y="16px",
backdrop_filter="auto",
backdrop_blur="lg",
border_top=f"1px solid {rx.color('mauve', 3)}",
background_color=rx.color("mauve", 2),
align_items="stretch",
width="100%",
)
The changes compared to the version that ships natively with the template are minimal.
Next, we will edit the main chat.py file at the root of the chat package, which defines the pages and the app itself. Below is the code for it:
import reflex as rx
from reflex_demo.components import chat, navbar, upload_form
from reflex_demo.state import State
@rx.page(route="/chat", title="RAG Chatbot")
def chat_interface() -> rx.Component:
return rx.chakra.vstack(
navbar(),
chat.chat(),
chat.action_bar(),
background_color=rx.color("mauve", 1),
color=rx.color("mauve", 12),
min_height="100vh",
align_items="stretch",
spacing="0",
)
@rx.page(route="/", title="RAG Chatbot")
def index() -> rx.Component:
return rx.chakra.vstack(
navbar(),
upload_form(),
background_color=rx.color("mauve", 1),
color=rx.color("mauve", 12),
min_height="100vh",
align_items="stretch",
spacing="0",
)
# Add state and page to the app.
app = rx.App(
theme=rx.theme(
appearance="dark",
accent_color="jade",
),
stylesheets=["https://fonts.googleapis.com/css2?family=Ubuntu&display=swap"],
style={
"font_family": "Ubuntu",
},
)
app.add_page(index)
app.add_page(chat_interface)
This is the code for the chat interface. We have only added the font family to the app config; the rest of the code is the same.
Next, let’s edit the state.py file. This is where the front-end makes calls to the API endpoints and handles the responses.
import requests
import reflex as rx
class QA(rx.Base):
question: str
answer: str
DEFAULT_CHATS = {
"Intros": [],
}
class State(rx.State):
chats: dict[str, list[QA]] = DEFAULT_CHATS
current_chat = "Intros"
url: str = "http://localhost:8089/query"
question: str
processing: bool = False
new_chat_name: str = ""
def create_chat(self):
"""Create a new chat."""
# Add the new chat to the list of chats.
self.current_chat = self.new_chat_name
self.chats[self.new_chat_name] = []
def delete_chat(self):
"""Delete the current chat."""
del self.chats[self.current_chat]
if len(self.chats) == 0:
self.chats = DEFAULT_CHATS
self.current_chat = list(self.chats.keys())[0]
def set_chat(self, chat_name: str):
"""Set the name of the current chat.
Args:
chat_name: The name of the chat.
"""
self.current_chat = chat_name
@rx.var
def chat_titles(self) -> list[str]:
"""Get the list of chat titles.
Returns:
The list of chat names.
"""
return list(self.chats.keys())
async def process_question(self, form_data: dict[str, str]):
# Get the question from the form
question = form_data["question"]
# Check if the question is empty
if question == "":
return
model = self.openai_process_question
async for value in model(question):
yield value
async def openai_process_question(self, question: str):
"""Get the response from the API.
Args:
            question: The current question.
"""
# Add the question to the list of questions.
qa = QA(question=question, answer="")
self.chats[self.current_chat].append(qa)
payload = {"question": question}
# Clear the input and start the processing.
self.processing = True
yield
response = requests.post(self.url, json=payload, stream=True)
        # Stream the results, yielding after every chunk so the UI updates live.
        for chunk in response.iter_content(chunk_size=512):
            # Decode each chunk and append it to the latest answer,
            # skipping any empty keep-alive chunks.
            if chunk:
                self.chats[self.current_chat][-1].answer += chunk.decode()
            self.chats = self.chats
            yield
# Toggle the processing flag.
self.processing = False
In this file, we have defined the URL for the query endpoint. We have also modified the openai_process_question method to send a POST request to the query endpoint and stream the response, which is displayed in the chat interface as it arrives.
Finally, let’s write the contents of the file_upload.py file. This component is displayed on the landing page and allows us to upload a file for ingestion.
import reflex as rx
import os
import time
import requests
class UploadExample(rx.State):
uploading: bool = False
ingesting: bool = False
progress: int = 0
total_bytes: int = 0
ingestion_url = "http://127.0.0.1:8089/ingest"
async def handle_upload(self, files: list[rx.UploadFile]):
self.ingesting = True
yield
for file in files:
file_bytes = await file.read()
file_name = file.filename
            # Build the multipart payload without shadowing the loop's files argument.
            payload = {
                "file": (os.path.basename(file_name), file_bytes, "multipart/form-data")
            }
            response = requests.post(self.ingestion_url, files=payload)
self.ingesting = False
yield
if response.status_code == 200:
                # Ingestion succeeded; send the user to the chat page.
                yield rx.redirect("/chat")
def handle_upload_progress(self, progress: dict):
self.uploading = True
self.progress = round(progress["progress"] * 100)
if self.progress >= 100:
self.uploading = False
def cancel_upload(self):
self.uploading = False
return rx.cancel_upload("upload3")
def upload_form():
return rx.vstack(
rx.upload(
rx.flex(
rx.text(
"Drag and drop file here or click to select file",
font_family="Ubuntu",
),
rx.icon("upload", size=30),
direction="column",
align="center",
),
id="upload3",
border="1px solid rgb(233, 233,233, 0.4)",
margin="5em 0 10px 0",
background_color="rgb(107,99,246)",
border_radius="8px",
padding="1em",
),
rx.vstack(rx.foreach(rx.selected_files("upload3"), rx.text)),
rx.cond(
~UploadExample.ingesting,
rx.button(
"Upload",
on_click=UploadExample.handle_upload(
rx.upload_files(
upload_id="upload3",
on_upload_progress=UploadExample.handle_upload_progress,
),
),
),
rx.flex(
rx.spinner(size="3", loading=UploadExample.ingesting),
rx.button(
"Cancel",
on_click=UploadExample.cancel_upload,
),
align="center",
spacing="3",
),
),
rx.alert_dialog.root(
rx.alert_dialog.trigger(
rx.button("Continue to Chat", color_scheme="green"),
),
rx.alert_dialog.content(
rx.alert_dialog.title("Redirect to Chat Interface?"),
rx.alert_dialog.description(
"You will be redirected to the Chat Interface.",
size="2",
),
rx.flex(
rx.alert_dialog.cancel(
rx.button(
"Cancel",
variant="soft",
color_scheme="gray",
),
),
rx.alert_dialog.action(
rx.button(
"Continue",
color_scheme="green",
variant="solid",
on_click=rx.redirect("/chat"),
),
),
spacing="3",
margin_top="16px",
justify="end",
),
style={"max_width": 450},
),
),
align="center",
)
This component allows us to upload a file and ingest it into the vector store. It uses the ingest endpoint of our FastAPI app to upload and ingest the file. After ingestion, the user can simply move to the chat interface to ask queries.
With this, we have completed building the front-end for our application. Now we need to test the application with some documents.
Now let’s test the application on some manuals or documents. To use the application, we need to run both the back-end app and the Reflex app separately. Run the back-end app from its directory using the following command:
python app.py
Wait for the FastAPI app to start running. Then, in another terminal instance, run the front-end app using the following command:
reflex run
Once the apps are up and running, go to the front-end URL printed in the terminal to access the Reflex app. Initially, we land on the file upload page. Upload a file and press the Upload button. The file will be uploaded and ingested. This will take a while, depending on the document size and the device specs. Once it’s done, click on the ‘Continue to Chat’ button to move to the chat interface. Write your query and press Send.
In this two-part series, you’ve now built a complete and functional RAG application on a Raspberry Pi, from creating the core pipeline to wrapping it with a FastAPI back-end and developing a Reflex-based front-end. With these tools, your RAG pipeline is accessible and interactive, providing real-time query processing through a user-friendly web interface. By mastering these steps, you’ve gained valuable experience in building and deploying end-to-end applications on a compact, efficient platform. This setup opens the door to countless possibilities for deploying AI-driven applications on resource-constrained devices like the Raspberry Pi, making cutting-edge technology more accessible and practical for everyday use.
A. There is a platform named Tailscale that allows your devices to be connected to a private, secure network accessible only to you. You can add your Raspberry Pi and other devices to your Tailscale network and connect to the VPN to access your apps from anywhere in the world.
A. That constraint is due to the Raspberry Pi’s limited hardware specifications. This article is just an introductory tutorial on how to start building a RAG app using a Raspberry Pi and Ollama.