Ever wished you had a personal tutor to help you solve tricky math problems? In this article, we’ll explore how to build a math problem solver chat app using LangChain, Gemma 2 9b, Llama 3.2 Vision, and Streamlit. Our app will not only understand and solve text-based math problems but also handle image-based questions. Let’s look at the problem statement and explore how to approach and solve it step by step.
We are an EdTech company looking to develop an innovative AI-powered application that can solve both text-based and image-based math problems in real-time. The app should provide solutions with step-by-step explanations to enhance learning and engagement for students, educators, and independent learners.
We are tasking you with designing and building this application using the latest AI technologies. The app must be scalable, user-friendly, and capable of processing both textual inputs and images with a seamless experience.
We will now discuss the components of the proposed solution below:
Gemma 2 9b is an open-source large language model from Google, designed to process and generate human-like text with remarkable accuracy. In this application, it powers the text-based problem solver.
Llama 3.2 Vision is an open-source model from Meta AI, capable of processing and analyzing images, including handwritten or printed math problems.
LangChain is a framework specifically designed for building applications that involve interactions between language models and external systems.
Streamlit is an open-source Python library for creating interactive web applications quickly and easily.
The process begins by setting up the environment, checking the Groq API key, and configuring the Streamlit page settings. It then initializes the text LLM (ChatGroq) and integrates tools like Wikipedia and a Calculator to enhance the text agent’s capabilities. A welcome message and sidebar navigation guide the user through the interface, where they can input either text or image-based queries. The text section collects user questions and processes them with the text agent, which uses the LLM and external tools to generate answers. Similarly, the image section allows users to upload images, which are then processed by the Llama 3.2 Vision model through the Groq client.
Once the text or image query is processed, the respective agent generates and displays the appropriate answer. After displaying the answer, the system is ready for the next query, switching freely between text and image handling. This flow creates an intuitive, multi-modal experience where users can ask both text and image-based questions and receive accurate, efficient responses.
Setting up the foundation is a crucial step in ensuring a seamless integration of tools and processes, laying the groundwork for the successful operation of the system.
First things first, set up your development environment. Make sure you have Python installed and create a virtual environment to keep your project dependencies organized.
# Create a virtual environment
python -m venv env
# Activate it on Windows
.\env\Scripts\activate
# Activate it on macOS/Linux
source env/bin/activate
Install the necessary libraries using the project’s requirements file:
pip install -r https://raw.githubusercontent.com/Gouravlohar/Math-Solver/refs/heads/master/requirements.txt
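If you prefer to pin the dependencies yourself, the requirements file boils down to roughly the following packages. This list is inferred from the imports used later in this article, so treat it as an approximation of the actual file rather than an exact copy:

# requirements.txt (approximate, inferred from the imports below)
streamlit
python-dotenv
langchain
langchain-community
langchain-groq
groq
wikipedia   # backend package used by WikipediaAPIWrapper
numexpr     # required by LLMMathChain to evaluate expressions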
import streamlit as st
import os
import base64
from dotenv import load_dotenv
from langchain_groq import ChatGroq
from langchain.chains import LLMMathChain, LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.utilities import WikipediaAPIWrapper
from langchain.agents.agent_types import AgentType
from langchain.agents import Tool, initialize_agent
from langchain_community.callbacks.streamlit import StreamlitCallbackHandler
from groq import Groq
These imports collectively set up the necessary libraries and modules to create a Streamlit web application that interacts with language models for solving mathematical problems and answering questions based on text and image inputs.
load_dotenv()
groq_api_key = os.getenv("GROQ_API_KEY")

if not groq_api_key:
    st.error("Groq API Key not found in .env file")
    st.stop()
This section of the code loads the environment variables and ensures that the Groq API key is available; if the key is missing, the app shows an error and stops.
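For reference, the .env file lives in the project root and needs just one entry (the value below is a placeholder, not a real key):

# .env
GROQ_API_KEY=your_groq_api_key_here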
st.set_page_config(page_title="Math Solver", page_icon="👨‍🔬")
st.title("Math Solver")
llm_text = ChatGroq(model="gemma2-9b-it", groq_api_key=groq_api_key)
llm_image = ChatGroq(model="llama-3.2-90b-vision-preview", groq_api_key=groq_api_key)
This section of the code sets up the Streamlit application by configuring its page title and icon. It then initializes two language models (LLMs): llm_text for handling text-based questions using the “gemma2-9b-it” model, and llm_image for handling questions that include images using the “llama-3.2-90b-vision-preview” model. Both models are authenticated with the previously retrieved Groq API key.
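As a quick connectivity check, you can call the text model directly before building anything on top of it. This is a minimal sketch; invoke is the standard LangChain entry point for chat models, and it returns a message object whose content attribute holds the generated text:

# One-off call to verify the Groq connection and model name work
result = llm_text.invoke("What is 12 * 7?")
print(result.content)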
wikipedia_wrapper = WikipediaAPIWrapper()
wikipedia_tool = Tool(
    name="Wikipedia",
    func=wikipedia_wrapper.run,
    description="A tool for searching the Internet to find various information on the topics mentioned."
)

math_chain = LLMMathChain.from_llm(llm=llm_text)
calculator = Tool(
    name="Calculator",
    func=math_chain.run,
    description="A tool for solving mathematical problems. Provide only the mathematical expressions."
)
prompt = """
You are a mathematical problem-solving assistant tasked with helping users solve their questions. Arrive at the solution logically, providing a clear and step-by-step explanation. Present your response in a structured point-wise format for better understanding.
Question: {question}
Answer:
"""
prompt_template = PromptTemplate(
input_variables=["question"],
template=prompt
)
# Combine all the tools into a chain for text questions
chain = LLMChain(llm=llm_text, prompt=prompt_template)
reasoning_tool = Tool(
    name="Reasoning Tool",
    func=chain.run,
    description="A tool for answering logic-based and reasoning questions."
)

# Initialize the agent for text questions
assistant_agent_text = initialize_agent(
    tools=[wikipedia_tool, calculator, reasoning_tool],
    llm=llm_text,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=False,
    handle_parsing_errors=True
)
This part of the code initializes the tools and configuration required to handle text-based questions in the Streamlit application. It sets up a Wikipedia search tool using WikipediaAPIWrapper, which lets the application fetch information from the internet, and a Calculator tool built on the LLMMathChain class, which uses the llm_text model and accepts only mathematical expressions. It also defines a prompt template that structures questions and expected answers in a clear, step-by-step manner; this template guides the language model to generate a logical, well-explained response to each user query. Finally, the Wikipedia, Calculator, and Reasoning tools are combined into a single ReAct-style agent.
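Before wiring the agent into the Streamlit UI, you can give it a quick sanity check from a plain Python shell. This is a minimal sketch assuming the setup above has already run; the exact wording of the answer will vary between runs:

# Quick smoke test of the text agent, outside Streamlit
answer = assistant_agent_text.run(
    "A train travels 60 km in 45 minutes. What is its average speed in km/h?"
)
print(answer)  # e.g. "The average speed of the train is 80 km/h."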
if "messages" not in st.session_state:
st.session_state["messages"] = [
{"role": "assistant", "content": "Welcome! I am your Assistant. How can I help you today?"}
]
for msg in st.session_state.messages:
if msg["role"] == "user" and "image" in msg:
st.chat_message(msg["role"]).write(msg['content'])
st.image(msg["image"], caption='Uploaded Image', use_column_width=True)
else:
st.chat_message(msg["role"]).write(msg['content'])
The code initializes the chat messages in the session state if they do not exist, starting with a default welcome message from the assistant. It then loops through the messages in st.session_state and renders each one in the chat interface. For a user message that carries an image, both the text content and the uploaded image are rendered, with a caption on the image; otherwise only the text content is displayed. This ensures that all chat messages, along with any uploaded images, appear correctly in the chat interface.
st.sidebar.header("Navigation")

if st.sidebar.button("Text Question"):
    st.session_state["section"] = "text"
if st.sidebar.button("Image Question"):
    st.session_state["section"] = "image"

if "section" not in st.session_state:
    st.session_state["section"] = "text"

def clean_response(response):
    if "```" in response:
        response = response.split("```")[1].strip()
    return response
This section of the code builds the sidebar navigation for the Text and Image sections, and the clean_response function strips markdown code fences from the LLM’s response.
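To see what clean_response does in practice, here is a small illustrative example (the raw string below is invented for demonstration):

# clean_response keeps only the content inside the first markdown code fence
raw = "Here is the solution:\n```\nTotal fruits left = 3 + 3 = 6\n```"
print(clean_response(raw))
# Output: Total fruits left = 3 + 3 = 6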
Processing text-based inquiries focuses on handling and addressing user questions in text form, utilizing language models to generate precise responses based on the input provided.
if st.session_state["section"] == "text":
st.header("Text Question")
st.write("Please enter your mathematical question below, and I will provide a detailed solution.")
question = st.text_area("Your Question:", "Example: I have 5 apples and 3 oranges. If I eat 2 apples, how many fruits do I have left?")
if st.button("Get Answer"):
if question:
with st.spinner("Generating response..."):
st.session_state.messages.append({"role": "user", "content": question})
st.chat_message("user").write(question)
st_cb = StreamlitCallbackHandler(st.container(), expand_new_thoughts=False)
try:
response = assistant_agent_text.run(st.session_state.messages, callbacks=[st_cb])
cleaned_response = clean_response(response)
st.session_state.messages.append({'role': 'assistant', "content": cleaned_response})
st.write('### Response:')
st.success(cleaned_response)
except ValueError as e:
st.error(f"An error occurred: {e}")
else:
st.warning("Please enter a question to get an answer.")
This section of the code handles the “Text Question” section of the Streamlit application. When the section is active, it displays a header and a text area for entering any mathematics-related question. On clicking the “Get Answer” button, if a question has been entered, a spinner indicates that a response is being generated. The question is appended to the session state messages and rendered in the chat interface, after which the text agent produces the answer and displays it with a success message.
Processing image-based inquiries involves analyzing and interpreting images uploaded by users, using advanced models to generate accurate responses or insights based on the visual content.
elif st.session_state["section"] == "image":
    st.header("Image Question")
    st.write("Please enter your question below and upload an image. I will provide a detailed solution.")
    question = st.text_area("Your Question:", "Example: What will be the answer?")
    uploaded_file = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
    if st.button("Get Answer"):
        if question and uploaded_file is not None:
            with st.spinner("Generating response..."):
                image_data = uploaded_file.read()
                image_data_url = f"data:image/jpeg;base64,{base64.b64encode(image_data).decode()}"
                st.session_state.messages.append({"role": "user", "content": question, "image": image_data})
                st.chat_message("user").write(question)
                st.image(image_data, caption='Uploaded Image', use_column_width=True)
This section of the code handles the “Image Question” functionality in the Streamlit application. When the “Image Question” section is active, it displays a header, a text area for users to input their questions, and an option to upload an image. Upon clicking the “Get Answer” button, if both a question and an image are provided, it shows a spinner indicating that a response is being generated. The uploaded image is read and encoded in base64 format. The user’s question and the image data are appended to the session state messages and displayed in the chat interface, with the image shown alongside the question. This setup ensures that both the text and image inputs are correctly captured and displayed for further processing.
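For clarity, the data URL built from the uploaded file has the following general shape (the encoded bytes are truncated here; base64-encoded JPEG data typically starts with /9j/):

data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQ...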
client = Groq()
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": question
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": image_data_url
                }
            }
        ]
    }
]
This section prepares the message payload for the Llama vision model, pairing the user’s text question with the base64-encoded image in the multi-part content format expected by the Groq API.
try:
    completion = client.chat.completions.create(
        model="llama-3.2-90b-vision-preview",
        messages=messages,
        temperature=1,
        max_tokens=1024,
        top_p=1,
        stream=False,
        stop=None,
    )
This setup sends the user’s question and image to the Groq API, which processes the inputs using the specified model and returns a generated response.
    response = completion.choices[0].message.content
    cleaned_response = clean_response(response)
    st.session_state.messages.append({'role': 'assistant', "content": cleaned_response})
    st.write('### Response:')
    st.success(cleaned_response)
except ValueError as e:
    st.error(f"An error occurred: {e}")
else:
    st.warning("Please enter a question and upload an image to get an answer.")
This section of the code processes the response from the Groq API after generating a completion. It extracts the content of the response from the first choice in the completion result and cleans it using the clean_response function. The system appends the cleaned response to the session state messages with the role of “assistant” and displays it in the chat interface. The response appears under a “Response” header with a success message. If a ValueError occurs, the system displays an error message. If either the question or the image is not provided, a warning prompts the user to enter both to get an answer.
Check out the full code in the GitHub repo here.
To test the app, we ask it a classic pipes-and-cistern problem:

A tank has three pipes attached to it. Pipe A can fill the tank in 4 hours, Pipe B can fill it in 6 hours, and Pipe C can empty the tank in 3 hours. If all three pipes are opened together, how long will it take to fill the tank completely?
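For reference, the expected answer can be verified by hand. Working in tank-fractions per hour, Pipe A fills 1/4, Pipe B fills 1/6, and Pipe C drains 1/3:

\[
\frac{1}{4} + \frac{1}{6} - \frac{1}{3}
= \frac{3 + 2 - 4}{12}
= \frac{1}{12}\ \text{tank per hour}
\quad\Longrightarrow\quad
\text{time to fill} = 12\ \text{hours}
\]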
By combining the powers of Gemma 2 9b, Llama 3.2 Vision, LangChain, and Streamlit, it is possible to create a robust, user-friendly math problem-solving app that can revolutionize how students learn and engage with mathematics, providing step-by-step solutions and real-time feedback. This not only helps learners get past the complexity of mathematical concepts but, more importantly, offers a scalable and accessible solution for learners at all levels.
This is one example of many ways such large language models and AI can be used in education. As we continue to develop these technologies, even more creative and impactful applications will emerge to change how we learn and teach.
What do you think of such a concept? Have you ever tried to develop AI-based edutainment applications? Share your experiences and ideas in the comments below!
Q. What is Gemma 2 9b?
A. Gemma 2 9b is a powerful language model developed by Google, capable of understanding and solving complex math problems presented in text form.

Q. How does the app handle image-based math problems?
A. The app uses the Meta Llama 3.2 Vision model to interpret math problems in images. It extracts the problem from the image and generates the response.

Q. Can the app show step-by-step solutions?
A. Yes, you can design the app to display the steps involved in solving a problem, which can be a valuable learning tool for users.

Q. Are there ethical considerations when building such an app?
A. It’s important to ensure the app is used responsibly and doesn’t facilitate cheating or hinder genuine learning. Design features that promote understanding and encourage users to engage with the problem-solving process.

Q. Where can I learn more about the technologies used?
A. You can find more information about Gemma 2 9b, Llama 3.2, Groq, LangChain, and Streamlit on Analytics Vidhya and on their respective official websites and documentation pages.