Llama 3.2 90B vs GPT 4o: Image Analysis Comparison

Diksha Kumari Last Updated : 24 Nov, 2024
20 min read

A picture speaks a million words! We come across countless images every day while scrolling through social media or browsing the web. Some of them make us think, some make us laugh, and some mesmerize us, making us wonder what’s the story behind them. Large language models (LLMs) can help us better understand images, explaining their visual setting, context, and meaning. When it comes to image analysis, Llama 3.2 90B Vision and GPT-4o are two powerful LLMs with the potential to understand and analyze complex visual information, delivering highly accurate outputs. In this blog, we’ll compare the outputs of  Llama 3.2 90B vs GPT-4o to uncover which one comes out on top.

Image Analysis with Llama 3.2 90B vs GPT 4o

Get ready to explore an exciting head-to-head comparison of the Llama 3.2 90B vs GPT-4o models across diverse image types. From the raw beauty of wildlife photography to the intricate details of technical diagrams, we’ll see how these models interpret and analyze visuals.

We will be specifically analyzing outputs of these models on these 10 image categories:

  1. Wildlife Photography
  2. Medical Images
  3. Natural Landscape
  4. Technical Diagram
  5. Natural Phenomenon
  6. Food Photography
  7. Infographics
  8. Sports Photography
  9. Cartoons
  10. Architectural Designs

Let’s begin with Wildlife Photography and uncover how these models interpret the beauty and dynamics of the animal kingdom.

1. Wildlife Photography

Llama 3.2 90B vs GPT 4o: Image Analysis Comparison

Using the above image, let me show you how to provide a prompt for image analysis in Llama 3.2 90B and GPT-4.0.  Let’s start with Llama 3.2 90B

Image Analysis Using Llama 3.2 90B

1. Go to https://build.nvidia.com/meta/llama-3.2-90b-vision-instruct. You can access it directly for image analysis, without having to log in.

Llama 3.2 90B Image Analysis

2. At the bottom left, you will see an option to upload images. Here, you can upload your image in .png, .jpg, or .jpeg format.

3. Enter the prompt in the ‘Type text here’ section of the same box.

Image Analysis with Llama  3.2 90B

Now, for our example, we will use the following prompt:

“Describe the animal in the image, including details about its posture, movement, and expressions. Also, highlight the environment depicted in the image.”

Llama 3.2 90B Image Analysis
  1. After typing in your prompt, simply click on Send and wait for the response.

You will see the response in the top-left box on the screen.

Now let’s try the same prompt on GPT 4o.

Image Analysis Using GPT 4o

  1. Visit https://openai.com/index/chatgpt/.
  2. Select try ChatGPT and log in using your Gmail account. You will receive 10 free questions for ChatGPT4o every day.
  3. Click on the paperclip icon and attach the same wildlife photo. 
  4. Type in the same prompt used earlier in Llama 3.2 90B and click on Upward arrow.
GPT 4o Image Analysis

You will get the following response.

Also Read: 3 Ways to Use GPT 4o Like a Pro with Canvas

Now let’s compare the responses we got from both these models.

Llama 3.2 90BGPT 4o
“The image depicts a tiger walking on a dirt path in a wooded area.

The tiger is walking towards the camera, its head turned to the left. It has orange fur with black stripes, and its body is facing the left side of the image. The tiger’s posture suggests that it is alert and aware of its surroundings. Its movement appears slow and deliberate as if it is stalking its prey or navigating through the forest.

The environment depicted in the image is a wooded area with a dirt path running through it. The path is surrounded by trees and greenery, creating a sense of seclusion and isolation. In the background, there are several trees and rocks visible, which add to the natural and rustic atmosphere of the image.

Overall, the image presents a serene and peaceful scene, with the tiger being the main focus of attention. The surroundings create a sense of harmony between the animal and its environment, highlighting the beauty and majesty of nature.”
“The animal in the image is a tiger, identifiable by its distinctive orange coat with black stripes and white underside.
Description:
Posture and Movement: The tiger is standing on all four legs and appears to be in mid-stride, with its body slightly angled and its tail curved upward. This suggests it is actively walking or preparing to move further.
Expression: The tiger’s facial expression is calm yet focused, with its eyes partially closed, possibly due to sunlight or a moment of reflection.
Environment:
The setting is a naturalistic outdoor area with patches of green grass, soil, and rocks. There are scattered trees and shrubs in the background, providing a semi-wild, forest-like ambience. The lighting is bright, indicating it might be daytime, with sunlight casting shadows around the tiger.”

Comparative Analysis

Llama 3.2 offers a broad understanding of the image, focusing on the harmony between the tiger and its environment with a factual storytelling approach. In contrast, GPT-4.0 excels in capturing specific micro-details, such as the tiger’s tail position, facial expression, and the interplay of lighting, resulting in more engaging and well-structured responses aligned with the prompt. I would prefer GPT 4o’s output in this case.

Now let’s analyze some other types of images. With each image, we will be comparing the responses of both models.

2. Medical Images

Prompt: What injury does this medical image depict, and how can it be diagnosed?

Llama 3.2 90BGPT 4o
“Step 1: Identify the type of medical image
This is an X-ray of the lumbar spine, showing the vertebrae in the lower back.
Step 2: Identify the injury
The X-ray shows a fracture of the L1 vertebra, which is located in the lower back. The fracture is a compression fracture, which means that the vertebra has been compressed and is no longer its normal height.
Step 3: Diagnose the injury
The diagnosis of this injury is a compression fracture of the L1 vertebra. This type of fracture is typically caused by osteoporosis, which is a condition that weakens the bones and makes them more susceptible to fractures.
Step 4: Conclusion
The medical image depicts a compression fracture of the L1 vertebra, which is a common injury in people with osteoporosis. The diagnosis of this injury can be made using an X-ray, which shows the fracture and the compression of the vertebra.


Answer: Compression fracture of the L1 vertebra.”

“The medical image shows X-rays of the lumbar spine, likely indicating an injury or abnormality in the vertebral column.

Possible Injuries Depicted:

Spinal Fractures:
Visible misalignment or damage to the vertebrae could suggest a compression fracture or a traumatic injury.
Common in cases of severe trauma or osteoporosis.
Degenerative Changes:
Narrowing between vertebral spaces may point to disc degeneration or herniation.
Bone spurs or irregularities could indicate arthritis.
Post-Surgical Observation:
The presence of screws or rods in the image suggests this could be a post-surgical X-ray of spinal stabilization, possibly following a fracture or scoliosis correction.

Diagnosis:

Clinical Assessment:
Symptoms such as back pain, reduced mobility, or neurological deficits (e.g., numbness or weakness) guide the diagnosis.
Imaging Studies:
X-ray: Used to identify fractures, misalignments, or degenerative changes.
MRI or CT Scan: Recommended for detailed evaluation of soft tissues, discs, and nerves.
Neurological Tests:
To check for nerve compression or spinal cord injury, which might accompany structural damage.
History and Context:
Injury type (trauma, fall, or chronic condition) combined with imaging findings confirms the diagnosis.

Would you like a detailed explanation of a specific injury or condition visible in such X-rays?”

Comparative Analysis

Llama 3.2 90B focuses on precision, clearly identifying the injury and its cause. Its response is ideal for readers seeking concise and direct answers. In contrast, GPT-4.0 offers an engaging and conversational response with a broader narrative style. It explores multiple possibilities and diagnostic methods, making it comprehensive but less precise. GPT-4.0’s approach suits those who want detailed insights and contextual explanations. For this image, I would prefer the response of Llama 3.2 90B.

3. Natural Landscape

Llama 3.2 90B vs GPT 4o: Image Analysis Comparison

Prompt: Describe the landscape image, focusing on the terrain, weather, and time of day or night. Highlight the interplay of colors and lighting to bring the scene to life.

Llama 3.2 90BGPT 4o
“The image features a sun-drenched mountain range adorned with vibrant green grass and trees. The transparent blue sky is dotted with a few wispy clouds, while the meadow boasts sprawling yellow and green grassland and almost endless trees. Several small wooden buildings are scattered in front of the mountains. The sun appears to be at its highest point for the day. The sky and mountain ranges blend seamlessly into a stunning backdrop. The image appears to be a high-quality photo rather than a painting.”“The landscape in the image showcases a breathtaking alpine setting with a rich interplay of natural elements, vibrant colors, and serene lighting.

Terrain:

The foreground features a lush, rolling meadow dotted with scattered wooden cottages, hinting at a peaceful, rural environment.
The terrain transitions into a dense arrangement of evergreen trees, adding depth and contrast to the expansive grassy plains.
In the background, towering, rugged mountain peaks dominate the scene, their rocky textures sharply outlined against the sky, suggesting the presence of a majestic mountain range.

Weather and Time of Day:

The weather appears to be clear and sunny, with a bright blue sky punctuated by a few soft, white clouds. This suggests a warm summer day, perfect for outdoor exploration. The sunlight bathes the landscape, highlighting the greenery and casting gentle shadows that add dimension and texture to the scene. The lighting implies it is mid-morning or early afternoon when the sun is high but not directly overhead.

Interplay of Colors:

The vivid greens of the meadows and forests create a harmonious base, blending naturally with the muted browns and greys of the mountain rocks.
The deep blue sky and the subtle white clouds complement the greenery, creating a vibrant contrast that brings the entire scene to life.
The wooden cottages, in earthy tones of brown, add a rustic charm and human element to the otherwise untouched natural setting.

Lighting:

The sunlight enhances the colors of the scene, making the greens more vibrant and the mountains’ textures more pronounced.
The shadows of the trees and cottages stretch gently, adding depth and perspective to the rolling terrain.

This idyllic alpine landscape perfectly balances tranquility and grandeur with its vivid colors, crisp lighting, and harmonious blend of natural and man-made elements. It evokes a sense of peace and wonder, inviting the viewer to immerse themselves in its beauty.”

Comparative Analysis

GPT-4.0 showcases a stronger contextual understanding by capturing intricate details such as terrain, lighting, weather, and the interplay of colors between elements, offering a richer depiction of the scene. It also excels in creativity. In contrast, Llama 3.2 focuses more on the primary components, lacking the vividness that makes descriptions engaging. I would prefer the response of GPT4 o here.

4. Technical Diagrams

Prompt: Explain this circuit diagram and identify all the symbols of the electrical components used.

Llama 3.2 90BGPT 4o
“Circuit Diagram Explanation

The accompanying circuit diagram illustrates a fundamental electrical setup, comprising a power source, a switch, and a load. The diagram shows:

A DC power source, denoted by the (+) and (-) symbols, which generates an electric potential difference to drive the circuit.
A switch, represented by the pole and throw symbols (|Il), which controls the flow of current to the load.
A load, depicted by the symbol (.) connected between two wires, which consumes the electrical energy provided by the power source.

Electrical Component Symbols
The symbols used in the diagram are:

DC Power Source: (+) and (-)
Switch: |Il
Load: (.)
Wire: Solid lines
Ammeter (measures current): A
Battery: coiled line
Ground: downward-pointing arrow
Monopolar measurements: horizontal downward-pointing arrow with a plus sign (+ or -)

What to Note

Crucial information about a circuit includes the direction of the current, the type of current (AC or DC), and the potential difference (voltage) generated by the power source. The diagram also includes symbols for an ammeter, measuring current, and a ground, indicating where the circuit is earthed.

This diagram and accompanying symbols provide a clear and concise representation of a simple electrical circuit, straddling fundamental electrical concepts such as potential difference, current flow, and circuit components.”
“The circuit diagram shown represents a simple closed electrical circuit. Below is an explanation of the circuit and the identification of the electrical components:

Explanation of the Circuit

Current Flow: The circuit demonstrates the flow of electric current (denoted as III) through the components in a loop.
Voltage Source: The circuit is powered by a battery, which provides the potential difference necessary to drive the current.
Resistor: A resistor (denoted by the zig-zag line) opposes the flow of current, consuming energy and possibly dissipating heat.
Ammeter: An ammeter (denoted by the circle with an “A” inside) is connected in series to measure the current flowing through the circuit.

Key/Switch: The symbol “K” represents a switch that can open or close the circuit. In this diagram, it appears to be closed, allowing current to flow.

Direction of Current: The arrows in the diagram indicate the direction of conventional current flow (from positive to negative terminal of the battery).

Identification of Electrical Components

Battery (Symbol: +H−+H-+H−):
Represents the voltage source.
Supplies energy to the circuit.
Resistor (Symbol: Zig-zag line):
Opposes the flow of current.
Converts electrical energy into heat.
Ammeter (Symbol: Circle with “A”):
Measures the current flowing through the circuit.
Always connected in series.
Key/Switch (Symbol: KKK, open or closed line with a dot):
Allows the circuit to be opened or closed.
A closed switch completes the circuit, enabling current flow.
Wires/Conductors (Lines with arrows):
Connect the components to form a complete circuit.
Provide a path for current flow.
Current Direction (Arrow with III):
Indicates the direction of conventional current (positive to negative).

Summary

This is a basic electrical circuit where the current flows from the battery through the resistor and ammeter, with the flow controlled by the switch. It is a fundamental representation of how electrical components are connected and work together in a circuit.”

Comparative Analysis

Neither of the models has identified the bulb symbol ( electronic component shown in circuit diagram). Llama 3.2 90B explains concepts in detail but struggles to accurately identify some electronic components, such as the switch, battery, and load. In contrast, GPT-4o recognises most of the components apart from the bulb. It delivers clearer and more accurate responses making it my choice for this image.

5. Natural Phenomenon

Llama 3.2 90B vs GPT 4o: Image Analysis Comparison

Prompt: What natural phenomenon is depicted in this image, and what are the factors that cause it?

Llama 3.2 90BGPT 4o
“The image depicts the aurora borealis, also known as the northern lights, which is a natural phenomenon that occurs when charged particles from the sun interact with the Earth’s magnetic field and atmosphere. The factors that cause the aurora borealis include:

Solar winds: The sun emits solar winds, which are streams of charged particles that travel through space.
Earth’s magnetic field: When the solar winds reach the Earth, they are deflected by the planet’s magnetic field, which acts as a shield against the charged particles.
Atmospheric gases: The deflected solar winds collide with the Earth’s atmosphere, specifically with gases such as oxygen and nitrogen.
Excitation and emission: The collision of the solar winds with the atmospheric gases excites the atoms and molecules, causing them to emit light at specific wavelengths.
Altitude and density: The color and intensity of the aurora depend on the altitude and density of the atmospheric gases where the collision occurs.

The combination of these factors results in the spectacular display of the aurora borealis, which can be seen in the polar regions of the Northern Hemisphere.”
“The image depicts the Aurora Borealis, commonly known as the Northern Lights, a stunning natural phenomenon.

What Causes the Aurora Borealis?

Solar Wind: The phenomenon begins with the sun emitting charged particles (solar wind) during solar flares or coronal mass ejections.
Earth’s Magnetic Field: These charged particles interact with Earth’s magnetosphere, a protective magnetic shield that directs them towards the polar regions.
Atmospheric Collision: As the charged particles enter the Earth’s atmosphere, they collide with gas molecules such as oxygen and nitrogen.
Light Emission: These collisions excite the gas molecules, causing them to release energy in the form of colorful light emissions.
Green and Yellow: Oxygen molecules at lower altitudes.
Red: Oxygen at higher altitudes.
Blue and Purple: Nitrogen molecules.
Polar Regions: The phenomenon is most visible near the poles because the magnetic field lines converge there, channeling the charged particles.

Why is it Special?

The interplay of colors, patterns, and motion creates an ethereal display that varies depending on solar activity, geographic location, and atmospheric conditions. This makes auroras a breathtaking spectacle and a bucket-list experience for many.”

Comparative Analysis

Llama 3.2 90B focuses more on scientific factors (e.g., density and emission) and is better suited for technical or academic audiences. Its response provides strong technical depth but lacks a detailed visual focus. On the other hand, GPT-4.0 delivers a more descriptive and engaging explanation, painting a vivid picture of the aurora’s beauty while explaining the reasons behind the different colors in the Aurora Borealis. GPT-4.0 strikes a balance by emphasizing both scientific accuracy and visual appeal.I would prefer the response of Llama 3.2 90B because of this scientific clarity.

6. Food Photography

Llama 3.2 90B vs GPT 4o: Image Analysis Comparison

Prompt: Identify the food in the image and list its main ingredients. Provide step-by-step instructions for preparing it, including details about techniques, ingredient measurements, and tips to enhance its flavor and presentation. Ensure the explanation is clear, engaging, and highlights what makes the dish special.

Llama 3.2 90BGPT 4o
“Pancakes with Strawberries and Caramel Sauce

Description
These pancakes are a delicious breakfast or brunch option, featuring fluffy pancakes, sweet strawberries, and a rich caramel sauce.

Ingredients

1 cup all-purpose flour
2 tablespoons sugar
2 teaspoons baking powder
1/4 teaspoon salt
1 cup milk
1 large egg
2 tablespoons unsalted butter, melted
2 cups sliced strawberries
1/2 cup caramel sauce (homemade or store-bought)
Whipped cream or yogurt (optional)

Instructions

Whisk together flour, sugar, baking powder, and salt.
In a separate bowl, combine milk, egg, and melted butter.
Add wet ingredients to dry ingredients and stir until just combined.
Heat a non-stick skillet or griddle over medium heat.
Drop batter by 1/4 cupfuls onto the skillet or griddle.
Cook for 2-3 minutes, until bubbles appear on the surface and edges start to dry.
Flip and cook for an additional 1-2 minutes, until golden brown.
Serve pancakes warm with sliced strawberries and drizzle with caramel sauce.
Top with whipped cream or yogurt, if desired.

Tips

Use fresh strawberries for the best flavor and texture.
Make caramel sauce from scratch by melting sugar and heavy cream over low heat, stirring constantly.
Add a pinch of salt to balance the sweetness of the caramel sauce and strawberries.
Experiment with different toppings, such as chopped nuts, chocolate chips, or fresh berries.
Make pancakes ahead of time and freeze for up to 2 months.
Simply toast or microwave when ready to serve.”
“The dish in the image is a stack of pancakes topped with caramel sauce, fresh strawberries, nuts, powdered sugar, and a sprig of mint, served alongside whipped cream. Pancakes are a classic breakfast or brunch treat, loved for their fluffy texture and versatility.

Main Ingredients:

For the Pancakes:

1 cup (120g) all-purpose flour
2 tablespoons granulated sugar
1 teaspoon baking powder
1/2 teaspoon baking soda
1/4 teaspoon salt
3/4 cup (180ml) milk
1 egg
2 tablespoons unsalted butter (melted) or vegetable oil
1 teaspoon vanilla extract

For Toppings:

Fresh strawberries (sliced)
Caramel sauce
Chopped nuts (e.g., walnuts or pecans)Powdered sugar (for dusting)Whipped creamFresh mint leaves (optional, for garnish)

Preparation Instructions:

Step 1: Prepare the Batter

Mix Dry Ingredients: In a large bowl, whisk together the flour, sugar, baking powder, baking soda, and salt.
Combine Wet Ingredients: In another bowl, whisk the milk, egg, melted butter, and vanilla extract.
Mix Together: Gradually pour the wet ingredients into the dry ingredients. Mix gently until just combined (some lumps are fine). Overmixing can make the pancakes tough.

Step 2: Cook the Pancakes

Heat a non-stick skillet or griddle over medium heat.
Lightly grease it with butter or oil.
Pour about 1/4 cup of batter onto the skillet for each pancake. Spread lightly into a circle if needed.
Cook for 2-3 minutes or until bubbles form on the surface and the edges look set.
Flip and cook the other side for 1-2 minutes until golden brown.
Repeat with the remaining batter.

Step 3: Prepare the Toppings

Warm the caramel sauce slightly to make it pourable.
Slice the strawberries and chop the nuts for garnish.

Step 4: Assemble and Serve

Stack the pancakes on a plate (around 3-5 per stack).
Drizzle the warm caramel sauce over the pancakes, allowing it to cascade down the sides.
Add sliced strawberries and nuts on top.
Dust with powdered sugar and garnish with a sprig of mint for a touch of elegance.
Serve with a side of whipped cream for dipping.

Tips for Enhanced Flavor and Presentation:

Buttermilk Substitute: Replace regular milk with buttermilk for extra fluffy pancakes and a subtle tang.
Caramel Sauce Tip: Make your own caramel sauce by simmering sugar, butter, and cream for a richer, homemade flavor.
Texture Boost: Toast the nuts lightly for added crunch and a deeper flavor.
Presentation: Use a white plate to contrast the vibrant colors of the toppings and drizzle extra caramel sauce around the edges for an artistic touch.

This dish stands out for its balance of warm, fluffy pancakes and the sweet, nutty, and fruity toppings, making it a visually stunning and delicious treat!”

Comparative Analysis

Llama 3.2 focuses on practical recipe creation and preparation guidance, making it ideal for readers who want straightforward instructions to recreate the dish. In contrast, GPT-4o goes beyond functionality, offering visually rich descriptions that highlight the dish’s aesthetic and culinary artistry, creating a more engaging and immersive experience for readers. Additionally, GPT-4.0 provides separate, detailed instructions for each component of the dish, enabling greater customization and flexibility. So, i would take up the response of GPT 4o.

7. Infographics

For this example, I am using Canadian Natural Resources Ltd.’s stock data from the past year. I’ve sourced it from Google Finance, a website hosted by Google that provides real-time financial data.

Prompt: Explain the company’s stock graph for a beginner, highlighting key trends, patterns, and significant price changes, and provide insights for potential investors.

Llama 3.2 90BGPT 4o
“The stock graph for Canadian Natural Resources Ltd. shows a significant increase in the stock price over the past year, with some fluctuations. Here’s a breakdown of the key trends, patterns, and significant price changes:
Overall Trend: The stock price has increased by approximately 33% over the past year, indicating a strong upward trend.
Key Patterns:
Uptrend: The stock price has been consistently increasing over the past year, with some minor corrections.
Volatility: The stock price has experienced some volatility, with significant price swings in both directions.
Support and Resistance: The stock price has formed a strong support level around $30 and a resistance level around $35.
Significant Price Changes:
Recent Rally: The stock price has experienced a significant rally over the past few months, with a gain of over 10%.
Correction: The stock price experienced a correction in the middle of the year, with a decline of around 5%.
Breakout: The stock price broke out above the resistance level of $35 in the recent rally, indicating a potential continuation of the uptrend.
Insights for Potential Investors:
Strong Fundamentals: The company’s strong financial performance and increasing demand for its products suggest a positive outlook for the stock.
Volatility: The stock’s volatility may make it suitable for investors who are comfortable with taking on more risk.
Support and Resistance: The strong support and resistance levels may provide opportunities for investors to buy or sell the stock at favorable prices.

Overall, the stock graph for Canadian Natural Resources Ltd. suggests a strong upward trend with some volatility. Potential investors should consider the company’s fundamentals, volatility, and support and resistance levels when making investment decisions.”
“Analysis of the Stock Graph (Canadian Natural Resources Ltd):

1. Key Trends and Patterns:

Current Stock Price: The stock closed at $33.78, showing an increase of 0.33% (+$0.11) from the previous close of $33.67.
Daily Volatility: The stock displayed minor fluctuations throughout the day, ranging approximately between $33.50 and $33.75 before a sharp upward trend towards the close.
Notable Spike: Near the end of the trading day (around 4:30 PM), the stock experienced a noticeable rise, likely due to increased buying interest or positive news.

2. Significant Price Changes:

Morning Movement: After the market opened, the stock showed initial volatility, with some quick upward spikes followed by brief declines.
Stabilization: For the majority of the day, the stock hovered in a stable range, showing no dramatic changes.
End-of-Day Rally: The most significant movement occurred late in the trading day, pushing the stock price above its previous closing value.

3. Insights for Potential Investors:

Consistency: The overall price movement suggests a relatively stable stock with low volatility, making it potentially appealing to risk-averse investors.
End-of-Day Activity: The sharp rise toward the close could indicate positive sentiment or institutional buying, which is worth monitoring in the coming days.
Short-Term View: For day traders, this stock may offer limited opportunities due to its minimal intraday fluctuations.
Long-Term Consideration: It’s essential to review the company’s fundamentals, historical performance, and market trends before making an investment decision.

Would you like a deeper analysis of this company’s historical trends or broader market indicators?”

Comparative Analysis

This chart is an intra-day chart for the stock price of Canadian Natural Resources. Llama 3.2 90B fails to focus just on the image and instead gives us stock analysis for the past years. In contrast, GPT-4o provides precise day-specific analysis along with actionable insights for both short-term and long-term investors, making it ideal for beginners or casual investors. Therefore, I would prefer GPT 4o.

8. Sports Photography

Llama 3.2 90B vs GPT 4o: Image Analysis Comparison

Prompt: Identify the sport depicted in the image. Name five popular international players of this sport and the countries they represent

Llama 3.2 90BGPT 4o
“The sport depicted in the image is soccer. The following are five popular international soccer players along with the countries they represent:

Lionel Messi – Argentina
Cristiano Ronaldo – Portugal
Kylian Mbappé – France
Neymar Jr. – Brazil
Robert Lewandowski – Poland”
“The sport depicted in the image is football (soccer).

Five Popular International Football Players and Their Countries:
Lionel Messi – Argentina
Cristiano Ronaldo – Portugal
Neymar Jr. – Brazil
Kylian Mbappé – France
Kevin De Bruyne – Belgium”

Comparative Analysis

Both Llama 3.2 90B and GPT 4o provide a very similar analysis in this case. Both of them are able to identify the sport’s name and list down the popular international players. One name differs in the output but that can be subjective. I will call it a draw as is often the case with football.

9. Cartoons

Llama 3.2 90B vs GPT 4o: Image Analysis Comparison

Prompt: Identify the character in the image and list its movies with their release years.

Llama 3.2 90BGPT 4o
“The character is Bob, a Minion from the Despicable Me franchise. The films that include the Minion characters are:

Despicable Me (2010)
Despicable Me 2 (2013)
Minions (2015)
Despicable Me 3 (2017)
Minions: The Rise of Gru (2022)”
“The characters in the image are Minions, the iconic yellow creatures from the Despicable Me franchise.

Movies Featuring Minions and Their Release Years:
Despicable Me (2010)
Despicable Me 2 (2013)
Minions (2015)
Despicable Me 3 (2017)
Minions: The Rise of Gru (2022)

Would you like more information about these movies?”

Comparative Analysis

Llama 3.2 90B performs better in specificity (naming “Bob”) making it more precise for identification but did not recognize the other two characters in the image. On the other hand, GPT 4o gives a broader perception (mentions minions as a group) and adds creativity through a conversational tone and descriptive language. Since Llama 3.2 90B was able to find one character, I would go with it.

10. Architectural Designs

Prompt: Describe the architectural style depicted in the image. Mention key features, materials, and design elements that define it.

Llama 3.2 90BGPT 4o
“The architectural style depicted in the image is Islamic architecture, specifically Ottoman architecture. The key features of this style include:

Domed ceilings: The image shows a large dome with intricate designs and patterns, which is a characteristic feature of Islamic architecture.
Arches: The image features pointed arches, which are commonly used in Islamic architecture to create a sense of grandeur and elegance.
Minarets: Although not visible in the image, minarets are typically present in Islamic architecture and are used for calling Muslims to prayer.
Calligraphy: The image features Arabic calligraphy, which is a common element in Islamic architecture and is used to decorate buildings and monuments.
Geometric patterns: The image features geometric patterns, such as stars and circles, which are commonly used in Islamic architecture to create intricate designs and motifs.
Tile work: The image features colorful tiles, which are often used in Islamic architecture to decorate buildings and monuments.
Marble: The image features marble columns and floors, which are commonly used in Islamic architecture for their beauty and durability.

In terms of materials, Islamic architecture often employs:
Stone: Stone is a common material used in Islamic architecture, particularly for building exterior walls and structures.
Brick: Brick is also commonly used in Islamic architecture, particularly for building interior walls and decorative elements.
Tile: Tile is a popular material used in Islamic architecture for decorating buildings and monuments.
Marble: Marble is often used in Islamic architecture for its beauty and durability, particularly for building decorative elements such as columns and floors.

Design elements that define Islamic architecture include:
Symmetry: Islamic architecture often employs symmetry to create a sense of balance and harmony.
Geometric patterns: Geometric patterns are a common element in Islamic architecture and are used to create intricate designs and motifs.
Arabesque: Arabesque is a decorative element that features intricate, curved lines and is commonly used in Islamic architecture to decorate buildings and monuments.
Calligraphy: Calligraphy is a common element in Islamic architecture and is used to decorate buildings and monuments with Arabic script.

Overall, the image depicts a beautiful example of Islamic architecture, specifically Ottoman architecture, with its characteristic domed ceilings, arches, minarets, calligraphy, geometric patterns, and tile work.”
“The architectural style depicted in the image is Islamic architecture, characterized by its intricate and ornate design elements, spiritual ambience, and attention to symmetry and geometry. Below is a detailed description of its key features, materials, and design elements:

Key Features:

1. Geometric Patterns and Symmetry:
The dome and arches prominently feature geometric designs, emphasizing symmetry and repetition, which are hallmarks of Islamic architecture.
The ceiling showcases intricate star-like patterns with golden and blue accents that create a sense of harmony and elegance.

2. Arches and Windows:
The pointed arches with stained glass windows reflect a blend of functionality and aesthetics, allowing natural light to filter through while maintaining a spiritual atmosphere.
The tall windows are framed with ornate calligraphy, often featuring verses from the Quran or other religious motifs.

3. Dome Design:
The central dome is decorated with vivid blue and gold patterns, creating a sense of grandeur and spirituality.
Domes in Islamic architecture are symbolic, representing the vastness of the heavens.

4. Calligraphy:
Arabic calligraphy is visible on the walls and ceiling, blending art with spiritual expression.
The use of calligraphy reflects Islamic traditions, where words are a central form of decoration.

5. Color Palette:
Dominated by shades of blue, gold, and white, which evoke a sense of peace and tranquility.
Blue is a common color in Islamic art, symbolizing the infinite nature of creation.

6. Flooring and Carpeting:
The floor is covered with a beautifully patterned blue carpet, designed with symmetry and precision, reflecting traditional mosque interiors.

Materials:

Stained Glass: Used in the windows to create colorful reflections and enhance the ambience.
Marble and Stone: Commonly used in Islamic architecture for structural elements like columns, arches, and walls.
Ceramic Tiles: Likely used for the intricate patterns and designs on the dome and walls.

Design Elements:

Spiritual Geometry: Reflecting divine order and infinite complexity.
Ornamentation without Figures: Islamic tradition avoids depicting human or animal forms, focusing instead on abstract patterns, calligraphy, and natural motifs.

This design epitomizes the beauty of Islamic architecture, blending functionality, artistry, and spiritual symbolism to create an awe-inspiring space.”

Comparative Analysis

Llama 3.2 90B precisely identifies the architectural style as Ottoman, while GPT-4.0 generalizes it as Islamic architecture. Llama demonstrates a deeper understanding of specific elements, focusing on key features, materials, and design details, making it ideal for readers seeking a straightforward and accurate explanation. On the other hand, GPT-4.0 provides an engaging and holistic description, blending factual details with insights into the spiritual and artistic significance of the features. This comparison highlights Llama’s precision in technical identification versus GPT’s ability to offer a broader contextual and interpretive narrative. I would prefer Llama 3.2 90B for this image.

Final Score

Llama 3.2 90B : 4

GPT 4o: 5

Draw : 1

Conclusion

Both Llama 3.2 90B and GPT-4o excel in distinct areas of image analysis. Llama 3.2 focuses on precision and functionality, offering factual and straightforward descriptions ideal for technical or academic contexts (Exception: Circuit diagram). It provides strong technical depth but lacks creative engagement. In contrast, GPT-4o excels in visually rich and imaginative descriptions. While Llama 3.2 90B suits readers seeking clarity and concise explanations, GPT-4o engages a broader audience with its conversational tone. Together, these models cater to diverse needs, showcasing the versatility of AI in image analysis and interpretation.

Frequently Asked Questions

Q1. What is the difference between Llama 3.2 90B vs GPT-4o?

A. Llama 3.2 90B vision model is an open-source model. With its 90 billion parameters, it excels in tasks such as image understanding, visual reasoning, and image-text retrieval, positioning it as a strong alternative to GPT-4o, whose exact parameter count remains unknown.

Q2. What is the maximum image size in GPT-4 and Llama 3.2 90B vision?

A. The maximum input image size is 20 MB in GPT-4. The Llama 3.2 90B Vision model supports high-resolution image inputs up to 1120×1120 pixels.

Q3. What makes the architecture of Llama 3.2 90B unique?

A. Llama 3.2 90B’s unique architecture integrates text and image processing, enabling advanced visual reasoning and image-text retrieval. It has 90 billion parameters, a vision encoder for high-resolution images, and supports up to 128,000 tokens.

Q4. What image formats are supported in GPT 4o and Llama 3.2 90B?

A. GPT-4 supports PNG (.png), JPEG (.jpeg and .jpg), and non-animated GIF (.gif), while Llama 3.2 90B Vision only supports png, .jpg, or .jpeg format.

Q5. Can these models handle medical images?

A. Llama 3.2 90B and GPT-4o offer precise, step-by-step explanations for identifying medical conditions. However, they may occasionally produce hallucinated outputs, so human evaluation is essential.

As an Instructional Designer at Analytics Vidhya, Diksha has experience creating dynamic educational content on the latest technologies and trends in data science. With a knack for crafting engaging, cutting-edge content, Diksha empowers learners to navigate and excel in the evolving tech landscape, ensuring educational excellence in this rapidly advancing field.

Responses From Readers

Clear

Congratulations, You Did It!
Well Done on Completing Your Learning Journey. Stay curious and keep exploring!

We use cookies essential for this site to function well. Please click to help us improve its usefulness with additional cookies. Learn about our use of cookies in our Privacy Policy & Cookies Policy.

Show details