Ever wished you had a personal tutor to help you solve tricky math problems? In this article, we’ll explore how to build a math problem solver chat app using LangChain, Gemma 2 9b, Llama 3.2 Vision, and Streamlit. Our app will not only understand and solve text-based math problems but also handle image-based questions. Let’s look at the problem statement and explore how to approach and solve it step by step.
We are an EdTech company looking to develop an innovative AI-powered application that can solve both text-based and image-based math problems in real-time. The app should provide solutions with step-by-step explanations to enhance learning and engagement for students, educators, and independent learners.
We are tasking you with designing and building this application using the latest AI technologies. The app must be scalable, user-friendly, and capable of processing both textual inputs and images with a seamless experience.
We will now discuss the components of the proposed solution below:
Gemma 2 9b is an open-source large language model from Google, designed to process and generate human-like text with remarkable accuracy. In this application, it powers the text-based problem solver.
Llama 3.2 Vision is an open-source model from Meta AI, capable of processing and analyzing images, including handwritten or printed math problems.
LangChain is a framework specifically designed for building applications that involve interactions between language models and external systems.
Streamlit is an open-source Python library for creating interactive web applications quickly and easily.
The process begins by setting up the environment, checking the Groq API key, and configuring the Streamlit page settings. It then initializes the text LLM (ChatGroq) and integrates tools like Wikipedia and a Calculator to enhance the text agent’s capabilities. A welcome message and sidebar navigation guide the user through the interface, where they can input either text or image-based queries. The text section collects user questions and processes them with the text agent, which uses the LLM and external tools to generate answers. Similarly, the image section allows users to upload images, which are then processed by the Llama 3.2 Vision model through the Groq client.
Once the text or image query is processed, the respective agent generates and displays the appropriate answer. After displaying the answer, the system is ready for the next query, switching freely between text and image handling. This flow creates an intuitive, multi-modal experience where users can ask both text and image-based questions and receive accurate, efficient responses.
Setting up the foundation is a crucial step in ensuring a seamless integration of tools and processes, laying the groundwork for the successful operation of the system.
First things first, set up your development environment. Make sure you have Python installed and create a virtual environment to keep your project dependencies organized.
# Create a virtual environment
python -m venv env
# Activate it on Windows
.\env\Scripts\activate
# Activate it on macOS/Linux
source env/bin/activate
Install the necessary libraries using the project’s requirements file:
pip install -r https://raw.githubusercontent.com/Gouravlohar/Math-Solver/refs/heads/master/requirements.txt
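If you prefer to pin the dependencies yourself, the requirements file boils down to roughly the following packages. This list is inferred from the imports used later in this article, so treat it as an approximation of the actual file rather than an exact copy:

# requirements.txt (approximate, inferred from the imports below)
streamlit
python-dotenv
langchain
langchain-community
langchain-groq
groq
wikipedia   # backend package used by WikipediaAPIWrapper
numexpr     # required by LLMMathChain to evaluate expressions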
import streamlit as st
import os
import base64
from dotenv import load_dotenv
from langchain_groq import ChatGroq
from langchain.chains import LLMMathChain, LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.utilities import WikipediaAPIWrapper
from langchain.agents.agent_types import AgentType
from langchain.agents import Tool, initialize_agent
from langchain_community.callbacks.streamlit import StreamlitCallbackHandler
from groq import Groq
These imports collectively set up the necessary libraries and modules to create a Streamlit web application that interacts with language models for solving mathematical problems and answering questions based on text and image inputs.
load_dotenv()
groq_api_key = os.getenv("GROQ_API_KEY")

if not groq_api_key:
    st.error("Groq API Key not found in .env file")
    st.stop()
This section of the code loads the environment variables and ensures that the Groq API key is available; if the key is missing, the app shows an error and stops.
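For reference, the .env file lives in the project root and needs just one entry (the value below is a placeholder, not a real key):

# .env
GROQ_API_KEY=your_groq_api_key_here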
st.set_page_config(page_title="Math Solver", page_icon="👨‍🔬")
st.title("Math Solver")
llm_text = ChatGroq(model="gemma2-9b-it", groq_api_key=groq_api_key)
llm_image = ChatGroq(model="llama-3.2-90b-vision-preview", groq_api_key=groq_api_key)
This section of the code sets up the Streamlit application by configuring its page title and icon. It then initializes two language models (LLMs): llm_text for handling text-based questions using the “gemma2-9b-it” model, and llm_image for handling questions that include images using the “llama-3.2-90b-vision-preview” model. Both models are authenticated with the previously retrieved Groq API key.
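As a quick connectivity check, you can call the text model directly before building anything on top of it. This is a minimal sketch; invoke is the standard LangChain entry point for chat models, and it returns a message object whose content attribute holds the generated text:

# One-off call to verify the Groq connection and model name work
result = llm_text.invoke("What is 12 * 7?")
print(result.content)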
wikipedia_wrapper = WikipediaAPIWrapper()
wikipedia_tool = Tool(
    name="Wikipedia",
    func=wikipedia_wrapper.run,
    description="A tool for searching the Internet to find various information on the topics mentioned."
)

math_chain = LLMMathChain.from_llm(llm=llm_text)
calculator = Tool(
    name="Calculator",
    func=math_chain.run,
    description="A tool for solving mathematical problems. Provide only the mathematical expressions."
)
prompt = """
You are a mathematical problem-solving assistant tasked with helping users solve their questions. Arrive at the solution logically, providing a clear and step-by-step explanation. Present your response in a structured point-wise format for better understanding.
Question: {question}
Answer:
"""
prompt_template = PromptTemplate(
input_variables=["question"],
template=prompt
)
# Combine all the tools into a chain for text questions
chain = LLMChain(llm=llm_text, prompt=prompt_template)
reasoning_tool = Tool(
    name="Reasoning Tool",
    func=chain.run,
    description="A tool for answering logic-based and reasoning questions."
)

# Initialize the agent for text questions
assistant_agent_text = initialize_agent(
    tools=[wikipedia_tool, calculator, reasoning_tool],
    llm=llm_text,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=False,
    handle_parsing_errors=True
)
This part of the code initializes the tools and configuration required to handle text-based questions in the Streamlit application. It sets up a Wikipedia search tool using WikipediaAPIWrapper, which lets the application fetch information from the internet, and a Calculator tool built on the LLMMathChain class, which uses the llm_text model and accepts only mathematical expressions. It also defines a prompt template that structures questions and expected answers in a clear, step-by-step manner; this template guides the language model to generate a logical, well-explained response to each user query. Finally, the Wikipedia, Calculator, and Reasoning tools are combined into a single ReAct-style agent.
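Before wiring the agent into the Streamlit UI, you can give it a quick sanity check from a plain Python shell. This is a minimal sketch assuming the setup above has already run; the exact wording of the answer will vary between runs:

# Quick smoke test of the text agent, outside Streamlit
answer = assistant_agent_text.run(
    "A train travels 60 km in 45 minutes. What is its average speed in km/h?"
)
print(answer)  # e.g. "The average speed of the train is 80 km/h."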
if "messages" not in st.session_state:
st.session_state["messages"] = [
{"role": "assistant", "content": "Welcome! I am your Assistant. How can I help you today?"}
]
for msg in st.session_state.messages:
if msg["role"] == "user" and "image" in msg:
st.chat_message(msg["role"]).write(msg['content'])
st.image(msg["image"], caption='Uploaded Image', use_column_width=True)
else:
st.chat_message(msg["role"]).write(msg['content'])
The code initializes the chat messages in the session state if they do not exist, starting with a default welcome message from the assistant. It then loops through the messages in st.session_state and renders each one in the chat interface. For a user message that carries an image, both the text content and the uploaded image are rendered, with a caption on the image; otherwise only the text content is displayed. This ensures that all chat messages, along with any uploaded images, appear correctly in the chat interface.
st.sidebar.header("Navigation")

if st.sidebar.button("Text Question"):
    st.session_state["section"] = "text"
if st.sidebar.button("Image Question"):
    st.session_state["section"] = "image"

if "section" not in st.session_state:
    st.session_state["section"] = "text"

def clean_response(response):
    if "```" in response:
        response = response.split("```")[1].strip()
    return response
This section of the code builds the sidebar navigation for the Text and Image sections, and the clean_response function strips markdown code fences from the LLM’s response.
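To see what clean_response does in practice, here is a small illustrative example (the raw string below is invented for demonstration):

# clean_response keeps only the content inside the first markdown code fence
raw = "Here is the solution:\n```\nTotal fruits left = 3 + 3 = 6\n```"
print(clean_response(raw))
# Output: Total fruits left = 3 + 3 = 6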
Processing text-based inquiries focuses on handling and addressing user questions in text form, utilizing language models to generate precise responses based on the input provided.
if st.session_state["section"] == "text":
st.header("Text Question")
st.write("Please enter your mathematical question below, and I will provide a detailed solution.")
question = st.text_area("Your Question:", "Example: I have 5 apples and 3 oranges. If I eat 2 apples, how many fruits do I have left?")
if st.button("Get Answer"):
if question:
with st.spinner("Generating response..."):
st.session_state.messages.append({"role": "user", "content": question})
st.chat_message("user").write(question)
st_cb = StreamlitCallbackHandler(st.container(), expand_new_thoughts=False)
try:
response = assistant_agent_text.run(st.session_state.messages, callbacks=[st_cb])
cleaned_response = clean_response(response)
st.session_state.messages.append({'role': 'assistant', "content": cleaned_response})
st.write('### Response:')
st.success(cleaned_response)
except ValueError as e:
st.error(f"An error occurred: {e}")
else:
st.warning("Please enter a question to get an answer.")
This section of the code handles the “Text Question” section of the Streamlit application. When the section is active, it displays a header and a text area for entering any mathematics-related question. On clicking the “Get Answer” button, if a question has been entered, a spinner indicates that a response is being generated. The question is appended to the session state messages and rendered in the chat interface, after which the text agent produces the answer and displays it with a success message.
Processing image-based inquiries involves analyzing and interpreting images uploaded by users, using advanced models to generate accurate responses or insights based on the visual content.
elif st.session_state["section"] == "image":
    st.header("Image Question")
    st.write("Please enter your question below and upload an image. I will provide a detailed solution.")
    question = st.text_area("Your Question:", "Example: What will be the answer?")
    uploaded_file = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
    if st.button("Get Answer"):
        if question and uploaded_file is not None:
            with st.spinner("Generating response..."):
                image_data = uploaded_file.read()
                image_data_url = f"data:image/jpeg;base64,{base64.b64encode(image_data).decode()}"
                st.session_state.messages.append({"role": "user", "content": question, "image": image_data})
                st.chat_message("user").write(question)
                st.image(image_data, caption='Uploaded Image', use_column_width=True)
This section of the code handles the “Image Question” functionality in the Streamlit application. When the “Image Question” section is active, it displays a header, a text area for users to input their questions, and an option to upload an image. Upon clicking the “Get Answer” button, if both a question and an image are provided, it shows a spinner indicating that a response is being generated. The uploaded image is read and encoded in base64 format. The user’s question and the image data are appended to the session state messages and displayed in the chat interface, with the image shown alongside the question. This setup ensures that both the text and image inputs are correctly captured and displayed for further processing.
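For clarity, the data URL built from the uploaded file has the following general shape (the encoded bytes are truncated here; base64-encoded JPEG data typically starts with /9j/):

data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQ...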
client = Groq()
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": question
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": image_data_url
                }
            }
        ]
    }
]
This section prepares the message payload for the Llama vision model, pairing the user’s text question with the base64-encoded image in the multi-part content format expected by the Groq API.
try:
    completion = client.chat.completions.create(
        model="llama-3.2-90b-vision-preview",
        messages=messages,
        temperature=1,
        max_tokens=1024,
        top_p=1,
        stream=False,
        stop=None,
    )
This setup sends the user’s question and image to the Groq API, which processes the inputs using the specified model and returns a generated response.
    response = completion.choices[0].message.content
    cleaned_response = clean_response(response)
    st.session_state.messages.append({'role': 'assistant', "content": cleaned_response})
    st.write('### Response:')
    st.success(cleaned_response)
except ValueError as e:
    st.error(f"An error occurred: {e}")
else:
    st.warning("Please enter a question and upload an image to get an answer.")
This section of the code processes the response from the Groq API after generating a completion. It extracts the content of the response from the first choice in the completion result and cleans it using the clean_response function. The system appends the cleaned response to the session state messages with the role of “assistant” and displays it in the chat interface. The response appears under a “Response” header with a success message. If a ValueError occurs, the system displays an error message. If either the question or the image is not provided, a warning prompts the user to enter both to get an answer.
Check out the full code in the GitHub repo here.
To test the app, we ask it a classic pipes-and-cistern problem:

A tank has three pipes attached to it. Pipe A can fill the tank in 4 hours, Pipe B can fill it in 6 hours, and Pipe C can empty the tank in 3 hours. If all three pipes are opened together, how long will it take to fill the tank completely?
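For reference, the expected answer can be verified by hand. Working in tank-fractions per hour, Pipe A fills 1/4, Pipe B fills 1/6, and Pipe C drains 1/3:

\[
\frac{1}{4} + \frac{1}{6} - \frac{1}{3}
= \frac{3 + 2 - 4}{12}
= \frac{1}{12}\ \text{tank per hour}
\quad\Longrightarrow\quad
\text{time to fill} = 12\ \text{hours}
\]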
By combining the powers of Gemma 2 9b, Llama 3.2 Vision, LangChain, and Streamlit, it is possible to create a robust, user-friendly math problem-solving app that can revolutionize how students learn and engage with mathematics, providing step-by-step solutions and real-time feedback. This not only helps learners get past the complexity of mathematical concepts but, more importantly, offers a scalable and accessible solution for learners at all levels.
This is one example of many ways such large language models and AI can be used in education. As we continue to develop these technologies, even more creative and impactful applications will emerge to change how we learn and teach.
What do you think of such a concept? Have you ever tried to develop AI-based edutainment applications? Share your experiences and ideas in the comments below!
Q. What is Gemma 2 9b?
A. Gemma 2 9b is a powerful language model developed by Google, capable of understanding and solving complex math problems presented in text form.

Q. How does the app handle image-based math problems?
A. The app uses the Meta Llama 3.2 Vision model to interpret math problems in images. It extracts the problem from the image and generates the response.

Q. Can the app show step-by-step solutions?
A. Yes, you can design the app to display the steps involved in solving a problem, which can be a valuable learning tool for users.

Q. Are there ethical considerations when building such an app?
A. It’s important to ensure the app is used responsibly and doesn’t facilitate cheating or hinder genuine learning. Design features that promote understanding and encourage users to engage with the problem-solving process.

Q. Where can I learn more about the technologies used?
A. You can find more information about Gemma 2 9b, Llama 3.2, Groq, LangChain, and Streamlit on Analytics Vidhya and on their respective official websites and documentation pages.