Multimodal agentic systems combine diverse data types (text, images, audio, and video) in a single system built around autonomous intelligent agents. These agents independently process, analyze, and synthesize information from several sources, which supports a deeper and more nuanced understanding of complex situations.
By pairing multimodal inputs with agentic functionality, these systems can adapt in real time to changing environments and user interactions. This improves operational efficiency across a range of industries and makes human-computer interaction more fluid, intuitive, and context-aware. As a result, multimodal agentic frameworks are likely to change how we interact with and use technology in many applications.
This article was published as a part of the Data Science Blogathon.
Agentic AI systems with image analysis capabilities are being applied in industries such as healthcare, manufacturing, and retail, where tasks like medical diagnosis and quality control depend on accurate, real-time image recognition.
CrewAI is an open-source framework for orchestrating autonomous AI agents into teams that tackle complex tasks collaboratively. Within CrewAI, each agent is assigned a specific role, equipped with designated tools, and driven by well-defined goals, mirroring the structure of a real-world work crew.
The Vision Tool extends CrewAI by letting agents extract text from images, so that visual information can feed into their decisions. An agent passes the tool either a URL or a file path pointing to the image. Once the text is extracted, agents can use it to generate responses or detailed reports, which automates more of the workflow. The Vision Tool requires an OpenAI API key to be set in the environment variables (OPENAI_API_KEY).
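Since the Vision Tool accepts either a URL or a file path, it can help to validate the input before handing it to an agent. The helper below is a minimal sketch using only the Python standard library; the function name classify_image_source is hypothetical and is not part of CrewAI.

```python
import os
from urllib.parse import urlparse


def classify_image_source(source: str) -> str:
    """Return 'url' for http(s) links and 'file' for existing local files."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    if os.path.isfile(source):
        return "file"
    # Anything else is rejected early, before an agent tries to read it.
    raise ValueError(f"Not an http(s) URL or an existing file: {source!r}")
```

Calling this on each chart location before building the crew would surface a mistyped path or link immediately rather than in the middle of an agent run.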
We will build a multimodal agentic system that first uses CrewAI's Vision Tool to interpret and analyze the stock charts (supplied as images) of two companies. The system then uses the DeepSeek-R1-Distill-Qwen-7B model to explain each stock's behaviour and to compare the two companies' performance. Combining visual data extraction with a reasoning-focused language model in this way gives a fuller comparison of market trends than either step alone.
To bring DeepSeek R1’s reasoning abilities to more compact language models, its creators compiled a dataset of 800,000 examples generated by DeepSeek R1 itself. These examples were then used to fine-tune existing models such as Qwen and Llama. The results showed that this relatively simple knowledge distillation method effectively transferred R1’s reasoning capabilities to the smaller models.
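The data-preparation side of this kind of distillation can be sketched in a few lines: a teacher model produces a reasoning trace and an answer for each prompt, and the pairs become supervised fine-tuning records for the student. The sketch below is illustrative only. The toy_teacher function is a hypothetical stand-in for the large model, and the record layout (including the think tags) is an assumption, not the format DeepSeek actually used.

```python
from typing import Callable, Dict, List, Tuple


def build_distillation_dataset(
    prompts: List[str],
    teacher: Callable[[str], Tuple[str, str]],
) -> List[Dict[str, str]]:
    """Turn teacher outputs into prompt/completion pairs for fine-tuning."""
    records = []
    for prompt in prompts:
        reasoning, answer = teacher(prompt)
        # The student is trained to reproduce both the reasoning and the answer.
        completion = f"<think>{reasoning}</think>\n{answer}"
        records.append({"prompt": prompt, "completion": completion})
    return records


def toy_teacher(prompt: str) -> Tuple[str, str]:
    """Hypothetical stand-in for the teacher model: adds the two numbers in the prompt."""
    a, b = (int(tok) for tok in prompt.split() if tok.isdigit())
    return f"Add {a} and {b}.", str(a + b)


dataset = build_distillation_dataset(["What is 2 plus 3 ?"], toy_teacher)
print(dataset[0]["completion"])
```

In a real pipeline the records would be written out and passed to a standard fine-tuning job for the smaller model; that step is omitted here.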
The DeepSeek-R1-Distill-Qwen-7B model is one of these distilled models. It is a smaller version of the larger DeepSeek-R1 model, designed to be more efficient while retaining robust performance. Here are some key features:
The model excels in mathematical tasks, achieving an impressive score of 92.8% on the MATH-500 benchmark, demonstrating its capability to handle complex mathematical reasoning effectively.
In addition to its mathematical prowess, the DeepSeek-R1-Distill-Qwen-7B performs reasonably well on factual question-answering tasks, scoring 49.1% on GPQA Diamond, indicating a good balance between mathematical and factual reasoning abilities.
We will use this model to explain the reasons behind the companies' stock behaviour once the information has been extracted from the stock chart images.
We will use Ollama to pull the LLM and a T4 GPU on Google Colab to build this multimodal agentic system.
!pip install crewai crewai_tools
!sudo apt update
!sudo apt install -y pciutils
!pip install langchain-ollama
!curl -fsSL https://ollama.com/install.sh | sh
!pip install ollama==0.4.2
import threading
import subprocess
import time

def run_ollama_serve():
    # Launch the Ollama server as a background process
    subprocess.Popen(["ollama", "serve"])

thread = threading.Thread(target=run_ollama_serve)
thread.start()
time.sleep(5)  # give the server a few seconds to start
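A fixed five-second sleep assumes the server is ready by then, which may not hold on a slow runtime. As an alternative, the sketch below polls the server until it responds. It assumes Ollama is listening on its default address, http://localhost:11434; the helper name wait_for_server is hypothetical.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://localhost:11434",
                    timeout: float = 30.0,
                    interval: float = 0.5) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting connections yet
        time.sleep(interval)
    return False
```

Calling wait_for_server() in place of the sleep returns True as soon as the server answers and False if it never comes up within the timeout.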
!ollama pull deepseek-r1
import os
from crewai import Agent, Task, Crew, LLM
from crewai_tools import VisionTool

# The Vision Tool needs an OpenAI API key, so set it before the crew runs
os.environ["OPENAI_API_KEY"] = ""  # paste your OpenAI API key here
os.environ["OPENAI_MODEL_NAME"] = "gpt-4o-mini"

vision_tool = VisionTool()

# DeepSeek-R1 model pulled and served locally by Ollama
llm = LLM(
    model="ollama/deepseek-r1",
)
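The API key above is left as an empty placeholder, and a forgotten key would only surface as an error partway through a crew run. A small fail-fast check can catch this earlier. This is a minimal sketch; require_env is a hypothetical helper, not a CrewAI function.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

For example, calling require_env("OPENAI_API_KEY") just before building the crew would stop the notebook with a clear message if the key was never filled in.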
def create_crew(image_url, image_url1):
    # Agent for extracting information from the stock charts.
    # No llm is passed, so it uses the OpenAI model configured above.
    stockchartexpert = Agent(
        role="STOCK CHART EXPERT",
        goal=f"Your goal is to EXTRACT INFORMATION FROM THE TWO GIVEN {image_url} & {image_url1} stock charts correctly",
        backstory="""You are a STOCK CHART expert""",
        verbose=True,
        tools=[vision_tool],
        allow_delegation=False,
    )

    # Agent for researching why the stocks behaved in a specific way (runs on DeepSeek-R1)
    stockmarketexpert = Agent(
        role="STOCK BEHAVIOUR EXPERT",
        goal="""BASED ON THE PREVIOUSLY EXTRACTED INFORMATION, RESEARCH ABOUT THE RECENT UPDATES OF THE TWO COMPANIES and EXPLAIN AND COMPARE IN SPECIFIC POINTS WHY THE STOCK BEHAVED THIS WAY.""",
        backstory="""You are a STOCK BEHAVIOUR EXPERT""",
        verbose=True,
        allow_delegation=False,
        llm=llm,
    )

    # Task for extracting information from the stock charts
    task1 = Task(
        description=f"Your goal is to EXTRACT INFORMATION FROM THE GIVEN {image_url} & {image_url1} stock chart correctly",
        expected_output="information in text format",
        agent=stockchartexpert,
    )

    # Task for explaining, with reasons, why the stocks behaved in a specific way
    task2 = Task(
        description="""BASED ON THE PREVIOUSLY EXTRACTED INFORMATION, RESEARCH ABOUT THE RECENT UPDATES OF THE TWO COMPANIES and EXPLAIN AND COMPARE IN SPECIFIC POINTS WHY THE STOCK BEHAVED THIS WAY.""",
        expected_output="Reasons behind stock behavior in BULLET POINTS",
        agent=stockmarketexpert,
    )

    # Define the crew from the agents and tasks above
    crew = Crew(
        agents=[stockchartexpert, stockmarketexpert],
        tasks=[task1, task2],
        verbose=True,  # print detailed execution logs
    )

    result = crew.kickoff()
    return result
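By default a crew runs its tasks sequentially, so the output of the first task becomes context for the second. The toy sketch below illustrates that hand-off in plain Python. It is not CrewAI's implementation, and the two step functions are hypothetical stand-ins for the extraction and explanation tasks.

```python
from typing import Callable, List


def run_sequential(steps: List[Callable[[str], str]], initial_context: str = "") -> str:
    """Run steps in order, feeding each step's output to the next one."""
    context = initial_context
    for step in steps:
        context = step(context)
    return context


def extract_chart_info(_: str) -> str:
    # Hypothetical stand-in for task1 (vision-based extraction).
    return "Stock A: upward trend; Stock B: volatile"


def explain_behaviour(extracted: str) -> str:
    # Hypothetical stand-in for task2 (reasoning over task1's output).
    return f"Explanation based on -> {extracted}"


print(run_sequential([extract_chart_info, explain_behaviour]))
```

The point of the sketch is the ordering: the explanation step can only be as good as the text that the extraction step passes on to it.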
The two stock charts below were given as input to the crew.
from pprint import pprint

text = create_crew("https://www.eqimg.com/images/2024/11182024-chart6-equitymaster.gif","https://www.eqimg.com/images/2024/03262024-chart4-equitymaster.gif")
pprint(text)
Mamaearth's stock exhibited volatility during the year due to internal
challenges that led to significant price changes. These included unexpected
product launches and market controversies which caused both peaks and
troughs in the share price, resulting in an overall fluctuating trend.
On the other hand, Zomato demonstrated a generally upward trend in its share
price over the same period. This upward movement can be attributed to
expanding business operations, particularly with successful forays into
cities like Bengaluru and Pune, enhancing their market presence. However,
near the end of 2024, external factors such as a major scandal or regulatory
issues might have contributed to a temporary decline in share price despite
the overall positive trend.
In summary, Mamaearth's stock volatility stems from internal inconsistencies
and external controversies, while Zomato's upward trajectory is driven by
successful market expansion with minor setbacks due to external events.
As the final output shows, the agentic system produced a reasonable analysis and comparison of the share price behaviour in the two stock charts. It also offered reasons for the upward trend in Zomato's share price, such as expansion of business operations and entry into new cities.
Let’s now compare the share price behaviour in the stock charts of two more companies, Jubilant FoodWorks and Bikaji Foods International Ltd., for the year 2024.
text = create_crew("https://s3.tradingview.com/p/PuKVGTNm_mid.png","https://images.cnbctv18.com/uploads/2024/12/bikaji-dec12-2024-12-b639f48761fab044197b144a2f9be099.jpg?im=Resize,width=360,aspect=fit,type=normal")
print(text)
The stock behavior of Jubilant Foodworks and Bikaji can be compared based on
their recent updates and patterns observed in their stock charts.
Jubilant Foodworks:
Cup & Handle Pattern: This pattern is typically bullish, indicating that the
buyers have taken control after a price decline. It suggests potential
upside as the candlestick formation may signal a reversal or strengthening
buy interest.
Breakout Point: The horizontal dashed line marking the breakout point implies
that the stock has reached a resistance level and may now test higher
prices. This is a positive sign for bulls, as it shows strength in the
upward movement.
Trend Line Trend: The uptrend indicated by the trend line suggests ongoing
bullish sentiment. The price consistently moves upwards along this line,
reinforcing the idea of sustained growth.
Volume Correlation: Volume bars at the bottom showing correlation with price
movements indicate that trading volume is increasing alongside upward price
action. This is favorable for buyers as it shows more support and stronger
interest in buying.
Bikaji:
Recent Price Change: The stock has shown a +4.80% change, indicating positive
momentum in the short term.
Year-to-Date Performance: Over the past year, the stock has increased by
61.42%, which is significant and suggests strong growth potential. This
performance could be attributed to various factors such as market
conditions, company fundamentals, or strategic initiatives.
Time Frame: The time axis spans from January to December 2024, providing a
clear view of the stock's performance over the next year.
Comparison:
Both companies' stocks are showing upward trends, but Jubilant Foodworks has
a more specific bullish pattern (Cup & Handle) that supports its current
movement. Bikaji, on the other hand, has demonstrated strong growth over the
past year and continues to show positive momentum with a recent price
increase. The volume in Jubilant Foodworks correlates well with upward
movements, indicating strong buying interest, while Bikaji's performance
suggests sustained or accelerated growth.
The stock behavior reflects different strengths: Jubilant Foodworks benefits
from a clear bullish pattern and strong support levels, whereas Bikaji
stands out with its year-to-date growth. Both indicate positive
developments, but the contexts and patterns differ slightly based on their
respective market positions and dynamics.
As the final output shows, the agentic system again produced a reasonable analysis and comparison of the share price behaviour in the two stock charts, with detailed explanations of the trends it observed, such as Bikaji’s sustained performance in contrast to Jubilant FoodWorks’ bullish pattern.
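DeepSeek-R1 family models usually write their chain of thought between think tags before giving the final answer. If that reasoning ends up in the crew's output, it can be removed in a post-processing step. The sketch below assumes the result has already been converted to a plain string; strip_reasoning is a hypothetical helper.

```python
import re

# Matches a <think>...</think> block, including any newlines inside it.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Remove <think>...</think> blocks and trim surrounding whitespace."""
    return _THINK_BLOCK.sub("", text).strip()
```

Text that contains no such block passes through unchanged apart from the whitespace trimming.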
In conclusion, multimodal agentic frameworks combine diverse data types with autonomous agents to support better real-time decision-making. Integrating image analysis with agentic capabilities makes these systems more adaptive, which can improve efficiency and accuracy across sectors. The CrewAI Vision Tool and the DeepSeek R1 distilled model show how such a framework can be applied to a concrete task, here analyzing and comparing stock behaviour from chart images.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
Q1. What are multimodal agentic frameworks?
Ans. Multimodal agentic frameworks combine diverse data types like text, images, audio, and video into a unified AI system. This integration enables intelligent agents to analyze and process multiple forms of data for more nuanced and efficient decision-making.
Q2. What is CrewAI?
Ans. CrewAI is an open-source framework designed to coordinate autonomous AI agents into cohesive teams that work collaboratively to complete complex tasks. Each agent within the system is assigned a specific role, equipped with designated tools, and driven by well-defined goals, mimicking the structure and function of a real-world work crew.
Q3. What does the CrewAI Vision Tool do?
Ans. The CrewAI Vision Tool allows agents to extract and process text from images. This capability enables the system to understand visual data and integrate it into decision-making processes, further improving workflow efficiency.
Q4. Which industries benefit most from these systems?
Ans. These systems are especially beneficial in industries like healthcare, manufacturing, and retail, where real-time analysis and precision in image recognition are critical for tasks such as medical diagnosis and quality control.
Q5. What are DeepSeek-R1’s distilled models?
Ans. DeepSeek-R1’s distilled models are smaller, more efficient versions of the larger DeepSeek-R1 model, created using a process called distillation, which preserves much of the original model’s reasoning power while reducing computational demands. These distilled models are fine-tuned using data generated by DeepSeek-R1. Examples include DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B, and DeepSeek-R1-Distill-Llama-8B, amongst others.