Building a Financial Report Retrieval System with LlamaIndex and Gemini 2.0

Avijit Biswas | Last Updated: 12 Mar, 2025
9 min read

Financial reports are critical for assessing a company’s health. They often span hundreds of pages, making it difficult to extract specific insights efficiently. Analysts and investors spend hours sifting through balance sheets, income statements, and footnotes just to answer simple questions such as: What was the company’s revenue in 2024? With recent advancements in LLMs and vector search technologies, we can automate financial report analysis using LlamaIndex and related frameworks. This blog post explores how we can use LlamaIndex, ChromaDB, Gemini 2.0, and Ollama to build a robust financial RAG system that answers queries from lengthy reports with precision.

Learning Objectives

  • Understand the need for financial report retrieval systems for efficient analysis.
  • Learn how to preprocess and vectorize financial reports using LlamaIndex.
  • Explore ChromaDB for building a robust vector database for document retrieval.
  • Implement query engines using Gemini 2.0 and Llama 3.2 for financial data analysis.
  • Discover advanced query routing techniques using LlamaIndex for enhanced insights.

This article was published as a part of the Data Science Blogathon.

Why do we need a Financial Report Retrieval System?

Financial reports contain critical insights about a company’s performance, including revenue, expenses, liabilities, and profitability. However, these reports are lengthy and full of technical jargon, making it extremely time-consuming for analysts, investors, and executives to extract relevant information manually.

A Financial Report Retrieval System can automate this process by enabling natural language queries. Instead of searching through PDFs, users can simply ask questions like, “What was the revenue in 2023?” or “Summarize the liquidity concerns for 2023.” The system quickly retrieves and summarizes relevant sections, saving hours of manual effort.

Project Implementation

For project implementation we need to first set up the environment and install the required libraries:

Step 1: Setting Up the Environment

We will start by creating a conda environment for our development work.

$conda create --name finrag python=3.12

$conda activate finrag

Step 2: Install essential Python libraries

Installing libraries is a crucial step for any project implementation:

$pip install llama-index llama-index-vector-stores-chroma chromadb
$pip install llama-index-llms-gemini llama-index-llms-ollama
$pip install llama-index-embeddings-gemini llama-index-embeddings-ollama
$pip install python-dotenv nest-asyncio pypdf

Step 3: Creating Project Directory

Now create a project directory and, inside it, a file named .env. Put all your API keys in that file for secure API key management.

# on .env file

GOOGLE_API_KEY="<your-api-key>"

We load the environment variables from that .env file to keep the sensitive API key secure. This ensures that our Gemini (Google) API key remains protected.

We will implement the project in a Jupyter Notebook. Create a notebook file and follow along step by step.

Step 4: Loading API key

Now we will load the API key below:

import os
from dotenv import load_dotenv

load_dotenv()

GEMINI_API_KEY = os.getenv("GOOGLE_API_KEY")

# Only to check .env is accessing properly or not.
# print(f"GEMINI_API_KEY: {GEMINI_API_KEY}")

Now our environment is ready, so we can move on to the next and most important phase.

Document Processing with LlamaIndex

We collect the Motorsport Games Inc. financial report from the AnnualReports website.

Download Link here.

The first page looks like this:

[Image: first page of the report (Source: Report)]

The report has a total of 123 pages, but we take only the financial statements and create a new PDF for our project.

How do we do it? It is very easy with the pypdf library.

from pypdf import PdfReader, PdfWriter

reader = PdfReader("NASDAQ_MSGM_2023.pdf")
writer = PdfWriter()

# Pages 66-103 (0-indexed) contain the financial statements.
pages_to_extract = range(66, 104)

for page_num in pages_to_extract:
    writer.add_page(reader.pages[page_num])

output_pdf = "Motorsport_Games_Financial_report.pdf"
with open(output_pdf, "wb") as outfile:
    writer.write(outfile)  # write to the already-open file handle

print(f"New PDF created: {output_pdf}")

The new report file has only 38 pages, which will help us embed the document quickly.
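
As a quick sanity check, you can confirm the page count of the new file with pypdf:

from pypdf import PdfReader

# The extracted report should contain 38 pages
print(len(PdfReader("Motorsport_Games_Financial_report.pdf").pages))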

Loading and Splitting Financial Reports

In your project’s data directory, place the newly created Motorsport_Games_Financial_report.pdf file; it will be indexed for the project.

Financial reports are typically PDFs containing extensive tabular data, footnotes, and legal statements. We use LlamaIndex’s SimpleDirectoryReader to load these files and convert them into Document objects.

from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data").load_data()

Since the reports are too large to process as a single document, we split them into smaller chunks, or nodes. Each chunk corresponds to a page or section, which makes retrieval more efficient.

from copy import deepcopy
from llama_index.core.schema import TextNode

def get_page_nodes(docs, separator="\n---\n"):
    """Split each document into page node, by separator."""
    nodes = []
    for doc in docs:
        doc_chunks = doc.text.split(separator)
        for doc_chunk in doc_chunks:
            node = TextNode(
                text=doc_chunk,
                metadata=deepcopy(doc.metadata),
            )
            nodes.append(node)

    return nodes
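
We now call this function on the loaded documents to produce the page-level nodes; the resulting page_nodes list is used later when building the summary index:

page_nodes = get_page_nodes(documents)
print(f"Total page nodes: {len(page_nodes)}")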

To understand the document ingestion process, see the diagram below.

[Image: document ingestion flow diagram (Source: Author)]

Now our financial data is ready for vectorizing and storing for retrieval.

Building the Vector Database with ChromaDB

We will use ChromaDB as a fast, accurate, and local vector database. The embedded representations of the financial text will be stored in ChromaDB.

We initialize the vector database and configure the nomic-embed-text model, served through Ollama, for local embedding generation.

import chromadb
from llama_index.llms.gemini import Gemini
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import Settings

embed_model = OllamaEmbedding(model_name="nomic-embed-text")

chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("financial_collection")

vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

Finally, we create a vector index using LlamaIndex’s VectorStoreIndex. This index links our vector database to LlamaIndex’s query engine.

from llama_index.core import VectorStoreIndex, StorageContext

storage_context = StorageContext.from_defaults(vector_store=vector_store)
vector_index = VectorStoreIndex.from_documents(
    documents=documents,
    storage_context=storage_context,
    embed_model=embed_model,
)

The above code creates the vector index from the financial text documents using nomic-embed-text embeddings. This will take some time, depending on your local system’s specifications.

Once indexing is done, you can reuse the stored embeddings whenever needed without re-indexing:

vector_index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store, embed_model=embed_model
)

This lets you load the embeddings directly from the ChromaDB storage.

Now that the heavy lifting is done, it’s time to query the report and relax.

Query Financial Data with Gemini 2.0

Once our financial data is indexed, we can ask natural language questions and receive accurate answers. For querying, we will use the Gemini 2.0 Flash model, which interacts with our vector database to fetch relevant sections and generate insightful responses.

Setting up Gemini 2.0

from llama_index.llms.gemini import Gemini

llm = Gemini(api_key=GEMINI_API_KEY, model_name="models/gemini-2.0-flash")

Initialize the query engine using Gemini 2.0 with the vector index:

query_engine = vector_index.as_query_engine(llm=llm, similarity_top_k=5)

Example Queries and Responses

Below are several example queries and their responses:

Query-1

response = query_engine.query("what is the revenue of on 2022 Year Ended December 31?")

print(str(response))

Response

[Image: query response (Source: Author)]

Corresponding image from the report:

[Image: revenue figures from the report (Source: Author)]

Query-2

response = query_engine.query(
    "what is the Net Loss Attributable to Motossport Games Inc. on 2022 Year Ended December 31?"
)

print(str(response))

Response

[Image: query response (Source: Author)]

Corresponding image from the report:

[Image: net loss section of the report (Source: Author)]

Query-3

response = query_engine.query(
    "What are the Liquidity and Going concern for the Company on December 31, 2023"
)

print(str(response))

Response

[Image: query response (Source: Author)]

Query-4

response = query_engine.query(
    "Summarise the Principal versus agent considerations of the company?"
)

print(str(response))

Response

[Image: query response (Source: Author)]

Corresponding image from the report:

[Image: principal versus agent considerations from the report (Source: Author)]

Query-5

response = query_engine.query(
    "Summarise the Net Loss Per Common Share of the company with financial data?"
)

print(str(response))

Response

[Image: query response (Source: Author)]

Corresponding image from the report:

[Image: net loss per common share figures from the report (Source: Author)]

Query-6

response = query_engine.query(
    "Summarise Property and equipment consist of the following balances as of December 31, 2023 and 2022 of the company with financial data?"
)

print(str(response))

Response

[Image: query response (Source: Author)]

Corresponding image from the report:

[Image: property and equipment balances from the report (Source: Author)]

Query-7

response = query_engine.query(
    "Summarise The Intangible Assets on December 21, 2023 of the company with financial data?"
)

print(str(response))

Response

[Image: query response (Source: Author)]

Query-8

response = query_engine.query(
    "What are leases of the company with yearwise financial data?"
)

print(str(response))

Response

[Image: query response (Source: Author)]

Corresponding image from the report:

[Image: lease details from the report (Source: Author)]

Local Query Using Llama 3.2

Leverage Llama 3.2 locally to query financial reports without relying on cloud-based models.

Setting Up Llama 3.2:1b

from llama_index.llms.ollama import Ollama

local_llm = Ollama(model="llama3.2:1b", request_timeout=1000.0)
local_query_engine = vector_index.as_query_engine(llm=local_llm, similarity_top_k=3)

Query-9

response = local_query_engine.query(
    "Summary of chart of Accrued expenses and other liabilities using the financial data of the company"
)

print(str(response))

Response

[Image: query response (Source: Author)]

Corresponding image from the report:

[Image: accrued expenses and other liabilities from the report (Source: Author)]

Advanced Query Routing with LlamaIndex

Sometimes, we need both detailed retrieval and summarized insights. We can achieve this by combining the vector index with a summary index:

  • Vector Index for precise document retrieval
  • Summary Index for concise financial summaries

We have already built the vector index; now we will create a summary index that uses a hierarchical approach to summarize the financial statements.

from llama_index.core import SummaryIndex

summary_index = SummaryIndex(nodes=page_nodes)

Then integrate RouterQueryEngine, which conditionally decides whether to retrieve data from the summary index or the vector index based on the query type.

from llama_index.core.tools import QueryEngineTool
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

Now we create the summary query engine:

summary_query_engine = summary_index.as_query_engine(
    llm=llm, response_mode="tree_summarize", use_async=True
)

This summary query engine goes into the summary tool, and the vector query engine goes into the vector tool.

# Creating summary tool
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "Useful for summarization questions related to Motorsport Games Company."
    ),
)


# Creating vector tool

vector_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    description=(
        "Useful for retriving specific context from the Motorsport Games Company."
    ),
)

With both tools ready, we connect them through the router. When a query passes through the router, it decides which tool to use by analyzing the user’s query.

# Router Query Engine

adv_query_engine = RouterQueryEngine(
    llm=llm,
    selector=LLMSingleSelector.from_defaults(llm=llm),
    query_engine_tools=[summary_tool, vector_tool],
    verbose=True,
)

Our advanced query system is fully set up; now let’s query our new advanced query engine.

Query-10

response = adv_query_engine.query(
    "Summarize the charts describing the revenure of the company."
)
print(str(response))

Response

[Image: query response (Source: Author)]

You can see that the intelligent router decided to use the summary tool because the query asks for a summary.

Query-11

response = adv_query_engine.query("What is the Total Assets of the company Yearwise?")
print(str(response))

Response

[Image: query response (Source: Author)]

And here the router selects the vector tool because the user asks for specific information, not a summary.

All the code used in this article is available here.

Conclusion

We can efficiently analyze financial reports with LlamaIndex, ChromaDB, and advanced LLMs. This system enables automated financial insights, real-time querying, and powerful summarization. It makes financial analysis more accessible and efficient, supporting better decisions in investing, trading, and business.

Key Takeaways

  • An LLM-powered document retrieval system can drastically reduce the time spent analyzing complex financial reports.
  • A hybrid approach using cloud and local LLMs makes the system design cost-effective, private, and flexible.
  • LlamaIndex’s modular framework makes it easy to automate financial report RAG workflows.
  • This type of system can be adapted to other domains such as legal documents, medical reports, and regulatory filings, making it a versatile RAG solution.

Frequently Asked Questions

Q1. How does the system handle different financial reports?

A. The system is designed to process any structured financial document by breaking it into text chunks, embedding them, and storing them in ChromaDB. New reports can be added dynamically without a complete re-indexing, as sketched below.
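
For instance, a minimal sketch of adding a new report to the existing index, assuming the new PDF has been placed in a hypothetical ./new_data directory:

from llama_index.core import SimpleDirectoryReader

# Load only the newly added report (./new_data is a hypothetical folder)
new_docs = SimpleDirectoryReader("./new_data").load_data()

# Insert each document; it is embedded and added to the existing ChromaDB-backed index
for doc in new_docs:
    vector_index.insert(doc)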

Q2. Can this be extended to generate financial charts and visualizations?

A. Yes. By integrating Matplotlib, Pandas, and Streamlit, you can visualize trends such as revenue growth, net loss analysis, or asset distribution, as in the sketch below.
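
A minimal Matplotlib sketch, assuming you have already extracted year-wise revenue figures from the query responses (the numbers below are placeholders, not values from the report):

import matplotlib.pyplot as plt

# Placeholder year-wise revenue figures (not from the actual report)
revenue = {"2021": 15.0, "2022": 12.0, "2023": 10.0}

plt.bar(revenue.keys(), revenue.values())
plt.title("Revenue by Year (placeholder data)")
plt.ylabel("Revenue ($ millions)")
plt.show()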

Q3. How does the query routing system improve accuracy?

A. The RouterQueryEngine automatically detects whether a query requires a summarized response or specific financial data retrieval. This reduces irrelevant outputs and ensures precision in responses.

Q4. Is this system suitable for real-time financial analysis?

A. It can be, but it depends on how frequently the vector store is updated. With a continuous ingestion pipeline that embeds and indexes new reports as they arrive (for example, using an embedding API such as OpenAI’s), the system can answer queries on fresh financial reports in near real time.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

A self-taught, project-driven learner, I love to work on complex projects in deep learning, computer vision, and NLP. I always try to gain a deep understanding of a topic, whether in deep learning, machine learning, or physics. I love creating content about what I learn and sharing my understanding with the world.
