GPT-4o vs Gemini: Comparing Two Powerful Multimodal AI Models

Himanshi Singh | Last Updated: 15 May, 2024
7 min read

Introduction

Since its release, GPT-4o has been getting huge attention for its multimodal capabilities. GPT-4o is known for its advanced language processing skills and has been enhanced to interpret and generate visual content. However, we shouldn’t overlook Gemini, a model that was highly praised for its multimodal abilities long before GPT-4o arrived. Gemini excels at combining image recognition with strong language understanding, making it a worthy competitor to GPT-4o.

In this article, we’ll compare GPT-4o vs Gemini by examining how well they perform in different tasks. By looking at their performance, we aim to determine which model is superior. This comparison is crucial because the ability to work with both text and images is valuable for many applications, such as automatic content creation and data analysis.

GPT-4o vs Gemini

Let’s compare GPT-4o and Gemini to see which one performs better.

Calculate Sum of Numbers

A very basic task for a multimodal LLM is to correctly identify the text and numbers written in a given image. So here we’ve provided an image containing some numbers and asked our two competitors, GPT-4o and Gemini, to calculate their sum. Let’s see who’s going to win this round.
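For reference, once the numbers have been read out of the image (whether by the model itself or by a separate OCR step), the summing itself is trivial. Here is a minimal post-OCR sketch in Python; the `extracted` string is hypothetical, standing in for whatever text the model reads from the image:

```python
import re

def sum_numbers(text: str) -> int:
    """Sum every integer found in a block of extracted text."""
    return sum(int(n) for n in re.findall(r"-?\d+", text))

# Hypothetical OCR output from an image listing a few numbers
extracted = "12 7 30 1"
print(sum_numbers(extracted))  # 50
```

The hard part for the models is the reading, not the arithmetic, which is why this makes a good baseline test.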

Calculate Sum of Numbers

GPT-4o

GPT-4o

As you can see, GPT-4o provided the correct output. Easy peasy.

Gemini

Gemini

It’s not clear what Gemini made of the given prompt. Although the task was pretty simple, Gemini couldn’t understand the context.

Result

GPT-4o won!

Code the Game Shown in the Attached Image in Python

In this round, we’ve provided an image of a tic-tac-toe game, but we haven’t mentioned this specifically in the prompt. The task for our models is to first identify the game and then write Python code to implement it.

GPT-4o vs Gemini

GPT-4o

def print_board(board):
    for row in board:
        print(" | ".join(row))
        print("-" * 9)

def check_winner(board):
    # Check rows
    for row in board:
        if row[0] == row[1] == row[2] != " ":
            return row[0]
    # Check columns
    for col in range(3):
        if board[0][col] == board[1][col] == board[2][col] != " ":
            return board[0][col]
    # Check diagonals
    if board[0][0] == board[1][1] == board[2][2] != " ":
        return board[0][0]
    if board[0][2] == board[1][1] == board[2][0] != " ":
        return board[0][2]
    return None

def is_board_full(board):
    for row in board:
        if " " in row:
            return False
    return True

def main():
    board = [
        [" ", " ", "O"],
        ["O", "X", " "],
        [" ", " ", " "]
    ]
    current_player = "X"  # X's turn to play
    print("Initial Board:")
    print_board(board)
    while True:
        row = int(input(f"Player {current_player}, enter the row (0, 1, 2): "))
        col = int(input(f"Player {current_player}, enter the column (0, 1, 2): "))
        if board[row][col] == " ":
            board[row][col] = current_player
        else:
            print("Cell already taken! Try again.")
            continue
        print_board(board)
        winner = check_winner(board)
        if winner:
            print(f"Player {winner} wins!")
            break
        if is_board_full(board):
            print("It's a draw!")
            break
        current_player = "X" if current_player == "O" else "O"

if __name__ == "__main__":
    main()

Output:

Output

The code provided by GPT-4o gave the above output. Great job identifying the game correctly and drawing a grid as well. Although one of the “O”s is misplaced in the output, we otherwise received a fully functional tic-tac-toe game.

Gemini

def display_board(board):
  """Displays the current board state"""
  for row in board:
    for cell in row:
      print(cell, end=" ")
    print()

def is_valid_move(board, row, col):
  """Checks if a move is valid (i.e. the cell is empty)"""
  return board[row][col] == " "

def make_move(board, player, row, col):
  """Places a player's mark on the board"""
  board[row][col] = player

def is_winner(board, player):
  """Checks if a player has won"""
  # Check rows
  for row in board:
    if all(cell == player for cell in row):
      return True
  # Check columns
  for col in range(len(board[0])):
    if all(board[row][col] == player for row in range(len(board))):
      return True
  # Check diagonals
  if all(board[row][row] == player for row in range(len(board))):
    return True
  if all(board[row][len(board) - row - 1] == player for row in range(len(board))):
    return True
  return False

def is_board_full(board):
  """Checks if the board is full"""
  for row in board:
    for cell in row:
      if cell == " ":
        return False
  return True

def main():
  """Main game loop"""
  board = [[" " for _ in range(3)] for _ in range(3)]
  current_player = "X"
  while True:
    display_board(board)
    # Get player move
    while True:
      row = int(input("Player {} enter row (1-3): ".format(current_player))) - 1
      col = int(input("Player {} enter column (1-3): ".format(current_player))) - 1
      if is_valid_move(board, row, col):
        break
      else:
        print("Invalid move. Please try again.")
    # Make move and check for winner
    make_move(board, current_player, row, col)
    if is_winner(board, current_player):
      display_board(board)
      print("Player {} wins!".format(current_player))
      break
    # Check for tie
    if is_board_full(board):
      display_board(board)
      print("It's a tie!")
      break
    # Switch player
    current_player = "O" if current_player == "X" else "X"

if __name__ == "__main__":
  main()

Output:

Output

When we ran the code provided by Gemini, it had clearly identified the game, but as you can see, we did not get any grid, which made it difficult to play. Below you can check how the game looks without the grid.

Output

Result

GPT-4o won!

Generate Python Code to Recreate Bar Chart using Matplotlib

In this round, we’ve given an image of a bar chart to both models. They need to analyze the chart and generate the Python code required to recreate it using Matplotlib, ensuring that the code produces the same bar chart when run. Let’s see which model wins this round.

Generate Python Code to Recreate Bar Chart using Matplotlib

GPT-4o

import matplotlib.pyplot as plt
import numpy as np
# Data
quarters = ['Q1', 'Q2', 'Q3', 'Q4']
sales_A = [10, 15, 17, 12]
sales_B = [10, 15, 20, 15]
# Plotting
fig, ax = plt.subplots()
bar_width = 0.35
index = np.arange(len(quarters))
bar1 = ax.bar(index, sales_A, bar_width, label='A')
bar2 = ax.bar(index, sales_B, bar_width, bottom=sales_A, label='B')
ax.set_xlabel('Quarter')
ax.set_ylabel('Sales')
ax.set_title('Sales by Product & Quarter')
ax.set_xticks(index)
ax.set_xticklabels(quarters)
ax.legend()
plt.show()

Output:

Output

Gemini

import matplotlib.pyplot as plt
# Data
products = ["A", "B"]
sales_q1 = [30, 25]
sales_q2 = [40, 35]
sales_q3 = [20, 30]
sales_q4 = [10, 15]
# Create a bar chart
fig, ax = plt.subplots()
width = 0.35  # the width of the bars
index = range(len(products))  # the x-axis positions for the bars
bar1 = plt.bar(index, sales_q1, width, label='Q1')
bar2 = plt.bar([p + width for p in index], sales_q2, width, label='Q2')
bar3 = plt.bar([p + width * 2 for p in index], sales_q3, width, label='Q3')
bar4 = plt.bar([p + width * 3 for p in index], sales_q4, width, label='Q4')
# Add labels and title
plt.xlabel('Products')
plt.ylabel('Sales')
plt.title('Sales by Product & Quarter')
# X-axis labels
plt.xticks([p + 1.5 * width for p in index], products)
# Legend
plt.legend()
plt.show()

Output:

GPT-4o vs Gemini
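Worth noting: the two solutions read the chart differently. GPT-4o’s `bottom=` argument produces stacked bars, while Gemini’s offset x-positions produce grouped bars, with different data to boot. A minimal sketch contrasting the two layouts (illustrative data only, not taken from the article’s chart):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs headless
import matplotlib.pyplot as plt

quarters = ["Q1", "Q2", "Q3", "Q4"]
sales_A = [10, 15, 17, 12]
sales_B = [10, 15, 20, 15]
x = range(len(quarters))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Stacked: B sits on top of A, as in GPT-4o's code (bottom=sales_A)
ax1.bar(x, sales_A, label="A")
ax1.bar(x, sales_B, bottom=sales_A, label="B")
ax1.set_title("Stacked")

# Grouped: bars drawn side by side at offset positions, as in Gemini's code
w = 0.35
ax2.bar([i - w / 2 for i in x], sales_A, w, label="A")
ax2.bar([i + w / 2 for i in x], sales_B, w, label="B")
ax2.set_title("Grouped")

for ax in (ax1, ax2):
    ax.set_xticks(list(x))
    ax.set_xticklabels(quarters)
    ax.legend()

plt.tight_layout()
```

Judging which model “wins” here therefore depends on whether the source chart was stacked or grouped; per the article’s result, GPT-4o’s reading matched.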

Result

GPT-4o won!

Explain Code and Provide the Output

Here, we’ve given an image input to both models. They have to understand the code written in the provided screenshot and also provide its output. Let’s see how they perform on this test.

Explain Code and Provide the Output

GPT-4o

GPT-4o provided a very long explanation; here’s a summary along with the output:

GPT-4o summary
GPT-4o vs Gemini

Gemini

Gemini gave the explanation below, but no output for the code.

GPT-4o vs Gemini

The point goes to GPT-4o for properly understanding the prompt and providing the correct output as well.

Result

GPT-4o won!

Identify Buttons and Input Fields in the Given Design

In this prompt, the models were asked for a detailed analysis of a user interface (UI) design to locate and describe interactive elements such as buttons and input fields. The goal is to specify what each element is, its purpose, and any associated labels or features.

GPT-4o

GPT-4o vs Gemini

It’s impressive how accurately GPT-4o can identify items in a design, with a clear understanding of each button, checkbox, and textbox.

Gemini

GPT-4o vs Gemini

Gemini got the input fields correct, but showed some uncertainty about the submit button, which was square in shape.

Result

GPT-4o won!

GPT-4o vs Gemini: Final Verdict

| Task | Winner |
| --- | --- |
| Calculating the sum of numbers from an image. | GPT-4o |
| Writing Python code for a tic-tac-toe game based on an image. | GPT-4o |
| Creating Python code to recreate a bar chart from an image. | GPT-4o |
| Explaining code from a screenshot and providing the output. | GPT-4o |
| Identifying buttons and input fields in a user interface design. | GPT-4o |

In this head-to-head comparison, GPT-4o clearly outperformed Gemini. GPT-4o consistently provided accurate and detailed results across all tasks, from calculating sums and coding games to generating bar charts and analyzing UI designs. It showed a strong ability to understand and process both text and images effectively.

Gemini, on the other hand, struggled in several areas. While it performed adequately in some tasks, it often failed to provide detailed explanations or accurate coding. Its performance was inconsistent, highlighting its limitations compared to GPT-4o.

Overall, GPT-4o proved to be the more reliable and versatile model. Its superior performance across multiple tasks makes it the clear winner in this comparison. If you need a model that can handle both text and images with high accuracy, GPT-4o is the better choice.

I’m a data lover who enjoys finding hidden patterns and turning them into useful insights. As the Manager - Content and Growth at Analytics Vidhya, I help data enthusiasts learn, share, and grow together. 

Thanks for stopping by my profile - hope you found something you liked :)

Responses From Readers

Caio Firme

Thanks for this comparative analysis. I have just one question: did you use Gemini Ultra 1.0 or Gemini 1.0 Pro in your analysis?

partial_andy405

Thanks for the article. It was very interesting. I do think it would be useful to compare GPT-4.0 with Gemini Ultra for a more comprehensive evaluation. Also, it’s worth you clarifying at the start of the article which Gemini model was used and its limitations compared to the Ultra version.
