Implementing an automatic grading system for handwritten answer sheets using a multi-agent framework streamlines evaluation, reduces manual effort, and enhances consistency. A multi-agent system (MAS) consists of autonomous agents that extract information, grade answers, and suggest improvements. By building handwritten answer evaluation with Griptape, educators can automate grading while preserving accuracy and efficiency, freeing teachers to focus on personalized feedback and student development without sacrificing fairness or reliability in assessments.
Multi-Agent Systems (MAS) are complex systems composed of multiple interacting intelligent agents, each with its own specialized capabilities and goals. These agents can be software programs, robots, drones, sensors, humans, or a combination thereof. MAS leverage collective intelligence, cooperation, and coordination among agents to solve problems that are too complex for a single agent to handle alone.
MAS can adapt to changing environments by adding or removing agents, making them highly scalable for complex problem-solving. Decentralized control ensures continued system operation despite component failures. MAS can tackle large-scale tasks by combining the expertise of multiple agents, outperforming single-agent systems.
The core components of multi-agent systems include agents, which are autonomous entities with specific roles and goals, acting as the cognitive core of the system. Tasks represent specific jobs assigned to these agents, ensuring that their efforts are directed towards achieving the system’s objectives. Tools extend the capabilities of agents, allowing them to interact with external systems and perform specialized tasks efficiently. Additionally, processes outline how agents interact and coordinate actions, ensuring that tasks are executed in harmony. The environment provides the context in which agents operate, influencing their decisions and actions.
Finally, communication protocols enable agents to share information and negotiate, fostering collaboration or competition depending on the system’s design. These components work together to enable complex problem-solving and adaptability in multi-agent systems.
Multi-agent AI systems can be useful in a variety of applications across different industries; the automatic grading system built in this article is one such example.
Griptape is a modular Python framework designed to build and operate multi-agent systems, which are crucial components of agentic AI systems. These systems enable large language models to autonomously handle complex tasks by integrating multiple AI agents that work together seamlessly. Griptape simplifies the creation of such systems by providing structures like agents, pipelines, and workflows, allowing developers to build business logic using Python and ensuring better security, performance, and cost efficiency.
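To make this concrete, here is a minimal sketch of a single Griptape agent equipped with one tool. It uses the same classes we rely on later in this article; the prompt string is just an illustrative placeholder, and it assumes OPENAI_API_KEY is set, since Griptape's default prompt driver uses OpenAI.
from griptape.structures import Agent
from griptape.tools import WebSearchTool
from griptape.drivers.web_search.duck_duck_go import DuckDuckGoWebSearchDriver

# Agent = the autonomous entity; the tool extends what it can do.
agent = Agent(tools=[WebSearchTool(web_search_driver=DuckDuckGoWebSearchDriver())])

# Task = the specific job we hand the agent (illustrative prompt).
agent.run("Find one recent article about multi-agent systems and summarize it.")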
With the increasing prevalence of online classes and various modes of education, there is a growing shortage of staff to evaluate students’ exams. The slow pace of evaluation remains a major bottleneck in improving instructors’ productivity. Teachers often spend a significant amount of time grading hundreds of answer sheets, time that could be better utilized for tasks like projects, research, or directly assisting students. This issue is particularly relevant as multiple-choice exams are not always effective in assessing a student’s understanding of a subject. In this article, we will develop a multi-agent system designed to automatically grade handwritten papers.
Implementing a multi-agent system for automatic grading of handwritten answer sheets can significantly streamline the evaluation process for educators. This system utilizes specialized agents to extract relevant information from the sheets, assess the answers based on predefined criteria, and even provide suggestions for improved responses. By automating these tasks, teachers can focus on more critical aspects of education, such as personalized feedback and student development. This technology can also enhance grading consistency and reduce the time spent on manual evaluation.
We will build this system using Griptape on Google Colab with a T4 GPU (free tier).
Automating handwritten answer evaluation with a multi-agent system can improve accuracy, efficiency, and consistency. By leveraging Griptape, educators can streamline grading, reduce manual effort, and ensure fair assessments.
The code below installs necessary dependencies for working with Griptape, Ollama, and Langchain, followed by importing various modules to facilitate creating and managing agents, tasks, and tools for handling different data types and web searches. It prepares the environment to execute a multi-agent system using AI models and external tools like file management and web search.
!pip install griptape
!sudo apt update
!sudo apt install -y pciutils
!pip install langchain-ollama
!curl -fsSL https://ollama.com/install.sh | sh
!pip install ollama==0.4.2
!pip install "duckduckgo-search>=7.0.1"
import os
import requests

from griptape.drivers.file_manager.local import LocalFileManagerDriver
from griptape.drivers.prompt.ollama import OllamaPromptDriver
from griptape.drivers.prompt.openai import OpenAiChatPromptDriver
from griptape.drivers.structure_run.local import LocalStructureRunDriver
from griptape.drivers.web_search.duck_duck_go import DuckDuckGoWebSearchDriver
from griptape.loaders import ImageLoader
from griptape.structures import Agent, Workflow
from griptape.tasks import PromptTask, StructureRunTask
from griptape.tools import FileManagerTool, ImageQueryTool, PromptSummaryTool, WebSearchTool
The following code starts the Ollama server in a background thread. We also pull the "minicpm-v" vision model from Ollama so it can be used to extract text from the handwritten sheets.
import threading
import subprocess
import time

# Start the Ollama server in a background thread so the notebook stays responsive.
def run_ollama_serve():
    subprocess.Popen(["ollama", "serve"])

thread = threading.Thread(target=run_ollama_serve)
thread.start()
time.sleep(5)  # give the server a moment to come up
!ollama pull minicpm-v
# os was already imported above; set your OpenAI API key here (left blank intentionally).
os.environ["OPENAI_API_KEY"] = ""
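If you prefer not to hard-code the key in the notebook, a small variation (assuming an interactive Colab session) is to prompt for it at runtime:
from getpass import getpass

# Prompt for the key instead of embedding it in the notebook.
os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")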
The code below defines images_dir, the directory that stores our images (the handwritten answer sheets). A function reading_answersheet is also defined that initializes an agent with tools for managing files and querying images using a vision language model ("minicpm-v"). The agent uses a file manager tool and an image query tool to process the images.
images_dir = os.getcwd()

def reading_answersheet():
    # File manager driver scoped to the directory holding the answer sheets.
    driver = LocalFileManagerDriver(workdir=images_dir)
    return Agent(
        tools=[
            FileManagerTool(file_manager_driver=driver),
            ImageQueryTool(
                prompt_driver=OllamaPromptDriver(model="minicpm-v"),
                image_loader=ImageLoader(file_manager_driver=driver),
            ),
        ]
    )
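As a quick optional sanity check (not part of the original workflow), we can run the reader agent directly on an image before wiring it into the workflow; the prompt is illustrative, and "sample.jpg" is the answer sheet used later in this article:
# Smoke test: extract text from one sheet directly, outside the workflow.
reader = reading_answersheet()
reader.run("Extract in text format all the lines in the image sample.jpg")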
This code defines a function evaluation_answer that creates an agent equipped with a DuckDuckGo web search tool and a prompt summary tool, so it can fact-check answers against the web.
def evaluation_answer():
    return Agent(
        tools=[
            WebSearchTool(web_search_driver=DuckDuckGoWebSearchDriver()),
            PromptSummaryTool(off_prompt=False),
        ],
    )
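Similarly, we can smoke-test the evaluator agent on a single claim before running the full pipeline; the claim and scoring instruction below are illustrative:
# Smoke test: fact-check one hand-written claim and score it.
evaluator = evaluation_answer()
evaluator.run(
    "Verify whether this line is factually correct and score it 1 to 10: "
    "'The mitochondria is the powerhouse of the cell.'"
)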
We use this image for automatic evaluation, saving it in our current working directory as "sample.jpg". It's a handwritten answer sheet. The agentic system will first extract the handwritten answers, then evaluate which answers are correct, score them, and finally suggest improvements.
In the following code blocks, we define three tasks: a research task that extracts the handwritten text, an evaluation task that scores each answer, and an answer-improvement task that suggests better responses.
image_file_name = "sample.jpg"

# Task 1: extract the handwritten text from the image.
research_task = StructureRunTask(
    (
        """Extract IN TEXT FORMAT ALL THE LINES IN THE GIVEN IMAGE %s""" % (image_file_name),
    ),
    id="research",
    structure_run_driver=LocalStructureRunDriver(
        create_structure=reading_answersheet,
    ),
)
# Task 2: fact-check and score each extracted answer line.
evaluate_task = StructureRunTask(
    (
        """Verify whether all the ANSWER containing lines in the TEXT {{ parent_outputs["research"] }} are correct and Score only on FACTUAL CORRECTNESS FOR each of these lines on a scale of 1 to 10 based on the correctness of the line.
        DON'T BE too strict in evaluation. IGNORE LINES WHICH DO NOT FIT IN THE CONTEXT AND MAY BE JUNK.
        """,
    ),
    id="evaluate",
    structure_run_driver=LocalStructureRunDriver(
        create_structure=evaluation_answer,
    ),
)
# Task 3: suggest improvements for answers that did not score a 10.
answer_improvement = StructureRunTask(
    (
        """ADD TO THE PREVIOUS OUTPUT, SUGGESTIONS ON HOW THE ANSWERS IN THE ANSWER containing lines in the TEXT {{ parent_outputs["research"] }} CAN BE IMPROVED BY PROVIDING BETTER OR MORE ACCURATE ANSWERS FOR THOSE ANSWERS THAT DO NOT HAVE A 10 SCORE BASED ON THE OUTPUT {{ parent_outputs["evaluate"] }}.
        DO INCLUDE THE WHOLE OUTPUT FROM THE PREVIOUS AGENT {{ parent_outputs["evaluate"] }} AS WELL IN THE FINAL OUTPUT.
        """,
    ),
    structure_run_driver=LocalStructureRunDriver(
        create_structure=evaluation_answer,
    ),
)
This code wires the task graph: evaluate_task and answer_improvement are added as children of research_task, and answer_improvement is also added as a child of evaluate_task, so it runs only after the first two tasks complete. The workflow is then created with all three tasks, run, and the final output is printed.
# Wire the task graph: answer_improvement depends on both upstream tasks.
research_task.add_child(evaluate_task)
evaluate_task.add_child(answer_improvement)
research_task.add_child(answer_improvement)

team = Workflow(
    tasks=[research_task, evaluate_task, answer_improvement],
)

answer = team.run()
print(answer.output)
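Optionally, the graded report can be persisted for later review. This is a simple sketch that writes the workflow's final output to a text file; graded_report.txt is an arbitrary, hypothetical filename:
# Save the final graded report to disk (hypothetical filename).
with open("graded_report.txt", "w") as f:
    f.write(str(answer.output))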
Input Image
Output:
As seen from the output, this agentic system not only scores each answer but also suggests an improvement for each one, which can be very helpful to both teachers and students.
Another Example
Output:
As seen from the output, this agentic system again scores each answer and suggests improvements. For the second answer, however, the system is unable to verify the claim from the web and therefore scores it 5 out of 10; answers like these would need human intervention at the end of the loop. Despite this, agentic systems like this one can certainly help teachers speed up the evaluation of hundreds of answer sheets.
The implementation of handwritten answer evaluation using Griptape for the automatic grading of handwritten answer sheets offers a transformative solution for the educational sector. By automating the grading process, educators can save valuable time, ensure more consistent evaluations, and focus on providing personalized feedback to students. Leveraging a framework like Griptape further enhances the flexibility and scalability of the system, making it a highly effective tool for modernizing assessments. This approach not only benefits teachers but also improves the overall fairness and reliability of academic evaluations.
Q1. What is a Multi-Agent System (MAS)?
A. A Multi-Agent System (MAS) is a decentralized framework composed of multiple autonomous agents that interact with each other within a shared environment to achieve individual or collective goals. These agents can be software programs, robots, sensors, or other intelligent entities that make decisions based on their local data.
Q2. How does the automatic grading system work?
A. The automatic grading system uses multiple specialized agents to perform tasks such as extracting information from handwritten answer sheets, grading the answers based on predefined criteria, and suggesting improvements. These tasks are carried out independently but cooperatively, helping educators evaluate answers more efficiently and consistently.
Q3. What are the benefits of using a MAS for grading?
A. The main benefits of using a MAS for grading are reduced manual grading time, enhanced consistency and fairness in assessments, the ability for educators to focus on personalized feedback and student development, and improved reliability in the grading process.
Q4. What is Griptape?
A. Griptape is a modular Python framework that simplifies the creation of multi-agent systems. It provides structures like agents, pipelines, and workflows to help developers design complex AI architectures. With Griptape, developers can build and operate multi-agent systems efficiently, leveraging tools and engines to handle diverse tasks such as grading and feedback generation.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.