It’s Lunar New Year in China, and the world is celebrating, thanks to the launch of one amazing model after another by Chinese companies. Alibaba, too, recently launched Qwen2.5-Max – a model that surpasses giants from OpenAI, DeepSeek, and Meta’s Llama. Packed with advanced reasoning and image and video generation, this model is set to shake up the GenAI world. In this blog, we will compare the performance of Qwen2.5-Max, DeepSeek-R1, and Kimi k1.5 on several fronts to find the best LLM at present!
Let’s begin comparing Qwen2.5-Max, DeepSeek-R1, and Kimi k1.5, starting with their technical details. For this, we will compare the benchmark performances and features of these three models.
Based on the available data, here is how Qwen2.5-Max performs against DeepSeek-R1 and Kimi k1.5 on various standard benchmark tests.
Each of the three models has its own web chat interface which offers different features. Here’s how Qwen2.5-Max performs against its peers when it comes to features:
Feature | Qwen2.5-Max | DeepSeek-R1 | Kimi k1.5 |
---|---|---|---|
Image Analysis | No | Yes | Yes |
Web Interface | Yes | Yes | Yes |
Image Generation | Yes | No | No |
Web Search | No | Yes | Yes |
Artifacts | Yes | No | No |
Document Upload | Single file | Multiple files | Multiple files |
Common Phrases | No | No | Yes |
Now let’s compare the currently reigning top 3 Chinese models – Qwen2.5-Max, DeepSeek-R1, and Kimi k1.5 – based on their applications. For this, we will be testing their performance on three different tasks involving reasoning, multi-step processing, and coding.
We will give the same prompt to all three models and compare their responses, giving them scores of 0, 0.5, and 1 based on their performance. At the end, we’ll tally the scores to find out which of them is the best. So let’s get started.
Prompt: “Prove mathematically that Earth is round”
Qwen2.5-Max | DeepSeek-R1 | Kimi k1.5 |
---|---|---|
Using geometry, gravity, satellite motion, and historical experiments, we have shown mathematically that the Earth is round. The curvature of the horizon, uniformity of gravity, and consistency of satellite orbits all support this conclusion. The Earth is a sphere (or more precisely, an oblate spheroid). | By calculating Earth’s circumference using angular differences in shadows and distances between two locations, Eratosthenes mathematically proved Earth’s spherical shape. Modern measurements and observations (e.g., horizon curvature, lunar eclipses) further confirm this. | The consistency of measured distances and angles with spherical geometry (e.g., d=Rθ) proves the Earth is round. Flat-Earth models fail to explain these relationships. |
Qwen2.5-Max | DeepSeek-R1 | Kimi k1.5 |
---|---|---|
The model provided some key observations supporting the claim. It then covered various parameters like the geometry of a sphere, horizon curvature, and rotational and centrifugal forces, many of which were not required. It did cover “Eratosthenes’ Experiment” as one of those parameters. | The model explained the theory in brief, along with all the mathematical calculations needed to prove the given task. It also gave some additional proofs and key observations to establish its result. The model used Eratosthenes’ Experiment to establish the proof. | This model gave the clearest and most concise response. It proved the hypothesis with a small mathematical calculation, using the measurements from Eratosthenes’ Experiment without explicitly mentioning it. |
The most relevant proof of Earth’s curvature is the one involving Eratosthenes’ method (circa 240 BCE), since it is one of the oldest and most comprehensive proofs, resting on a mathematical analysis of the angular difference between two places. All three models used that approach in one way or another.
Qwen2.5-Max provided 8 different ways to prove that the Earth is round without properly explaining any of them. DeepSeek-R1 took Eratosthenes’ method and explained its theory and mathematics in clear, concise terms. Kimi k1.5 used the simplest approach, building on the required method without even explicitly mentioning it.
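For reference, here is the back-of-the-envelope arithmetic behind Eratosthenes’ method that all three responses lean on. The figures below are the commonly quoted historical approximations (a 7.2° shadow-angle difference and roughly 800 km between Syene and Alexandria), not numbers taken from any model’s output:

$$\theta \approx 7.2^\circ = \frac{360^\circ}{50}, \qquad d \approx 800\ \text{km}$$

$$C = \frac{360^\circ}{\theta} \times d \approx 50 \times 800\ \text{km} = 40{,}000\ \text{km}, \qquad R = \frac{C}{2\pi} \approx 6{,}366\ \text{km}$$

This lands within about 2% of the modern equatorial circumference of roughly 40,075 km, and it is the same relationship Kimi k1.5 writes as d = Rθ: shadow angles that grow linearly with distance are exactly what a sphere predicts and what a flat surface cannot explain.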
Prompt: “Summarise the lesson in 1 line, create a flowchart to explain the process happening in the lesson, and then translate the summary into French.
🔗 Lesson”
Qwen2.5-Max | DeepSeek-R1 | Kimi k1.5 |
---|---|---|
The summary was concise and laid out the topics covered in the lesson. | The summary of the lesson was crisp, concise, and to the point. | The summary covered all the topics and was quite simple, yet a bit long compared to the others. |
The flowchart covered all essential headings and their subheadings, as required. | The flowchart covered all essential headings but included more content than required in the sub-headings. | Instead of a flowchart of the lesson itself, the model generated a flowchart of the process covered in the lesson. Overall, this flowchart was clear and crisp. |
I wanted a simple, crisp, one-line summary of the lesson, which both DeepSeek-R1 and Qwen2.5-Max delivered. As for the flowchart, while the design and crispness of Kimi k1.5’s result were exactly what I asked for, it lacked details about the flow of the lesson. The flowchart by DeepSeek-R1 was a bit content-heavy, while Qwen2.5-Max gave a good flowchart covering all the essentials.
Prompt: “Write an HTML code for a wordle kind of an app”
Note: Before you enter your prompt in Qwen2.5-Max, click on Artifacts; this way, you will be able to visualize the output of your code within the chat interface.
Qwen2.5-Max:
DeepSeek-R1:
Kimi k1.5:
Qwen2.5-Max | DeepSeek-R1 | Kimi k1.5 |
---|---|---|
The model generated the code quickly, and the app itself looks a lot like the actual “Wordle” app. Instead of listing the alphabet at the bottom, it gave us the option to directly enter our 5 letters, which it would then automatically update on the board. | The model took some time to generate the code, but the output was great! What it generated was almost the same as the actual “Wordle” app. We can select the letters we wish to guess, and it places our selection into the word. | The model generated the code quickly enough, but the output was a distorted version of the actual “Wordle” app. The word board did not appear, nor did all the letters; in fact, the Enter and Delete keys almost overlapped the letters. |
With its Artifacts feature, it was super easy to analyze the code right there. | The only issue was that I had to copy the code and run it in a different interface. | Here too, I had to run the code in a different interface to visualize the output. |
Firstly, I wanted the generated app to be as similar to the actual Wordle app as possible. Secondly, I wanted to put minimum effort into testing the generated code. The result generated by DeepSeek-R1 was the closest to the ask, while Qwen2.5-Max’s fairly good result was the easiest to test.
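To make the coding task concrete, here is a minimal, self-contained sketch of the kind of page such a prompt tends to produce. This is not any model’s actual output; the hard-coded target word, the styling, and the simplified duplicate-letter handling are all placeholder choices:

```html
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Mini Wordle sketch</title>
  <style>
    /* A 6-row by 5-column board of letter tiles */
    #board { display: grid; grid-template-rows: repeat(6, 3rem); gap: 4px; width: max-content; margin: 1rem auto; }
    .row   { display: grid; grid-template-columns: repeat(5, 3rem); gap: 4px; }
    .tile  { border: 2px solid #ccc; display: flex; align-items: center; justify-content: center;
             font: bold 1.5rem sans-serif; text-transform: uppercase; }
    .correct { background: #6aaa64; color: #fff; }  /* right letter, right spot */
    .present { background: #c9b458; color: #fff; }  /* right letter, wrong spot */
    .absent  { background: #787c7e; color: #fff; }  /* letter not in the word */
    #entry   { display: block; margin: 0 auto; font-size: 1.2rem; text-transform: uppercase; }
  </style>
</head>
<body>
  <div id="board"></div>
  <input id="entry" maxlength="5" placeholder="Type a 5-letter guess, press Enter">
  <script>
    const TARGET = "crane";  // placeholder answer; a real app would pick a random word
    const board = document.getElementById("board");
    const entry = document.getElementById("entry");
    let guesses = 0;

    // Build the empty 6x5 board up front.
    for (let r = 0; r < 6; r++) {
      const row = document.createElement("div");
      row.className = "row";
      for (let c = 0; c < 5; c++) {
        const tile = document.createElement("div");
        tile.className = "tile";
        row.appendChild(tile);
      }
      board.appendChild(row);
    }

    // On Enter, write the guess into the next row and colour each tile.
    // (Duplicate letters are handled naively here, unlike the real game.)
    entry.addEventListener("keydown", (e) => {
      if (e.key !== "Enter" || guesses >= 6) return;
      const guess = entry.value.toLowerCase();
      if (guess.length !== 5) return;
      const tiles = board.children[guesses].children;
      for (let i = 0; i < 5; i++) {
        tiles[i].textContent = guess[i];
        if (guess[i] === TARGET[i])         tiles[i].classList.add("correct");
        else if (TARGET.includes(guess[i])) tiles[i].classList.add("present");
        else                                tiles[i].classList.add("absent");
      }
      guesses++;
      entry.value = "";
      if (guess === TARGET) entry.disabled = true;
    });
  </script>
</body>
</html>
```

Saving this as an .html file and opening it in a browser is all the testing it needs, which is exactly the manual step that Qwen2.5-Max’s Artifacts feature saves you from.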
Qwen2.5-Max is an amazing LLM that gives models like DeepSeek-R1 and Kimi k1.5 tough competition. Its responses were comparable across all the tasks. Although it currently lacks the ability to analyze images or search the web, once those features go live, Qwen2.5-Max will be an unbeatable model. It already offers video generation capabilities that even GPT-4o doesn’t have yet. Moreover, its interface is quite intuitive, with features like Artifacts that make it simpler to run code within the same platform. All in all, Qwen2.5-Max by Alibaba is an all-round LLM that is here to redefine how we work with LLMs!
Q. What is Qwen2.5-Max?
A. Qwen2.5-Max is Alibaba’s latest multimodal LLM, optimized for text, image, and video generation and pretrained on over 20 trillion tokens.
Q. How does Qwen2.5-Max compare with DeepSeek-R1 and Kimi k1.5?
A. Compared to DeepSeek-R1 and Kimi k1.5, it excels in reasoning, multimodal content creation, and programming support, making it a strong competitor in the Chinese AI ecosystem.
Q. Is Qwen2.5-Max open-source?
A. No, Qwen2.5-Max is a closed-source model, while DeepSeek-R1 and Kimi k1.5 are open-source.
Q. Can Qwen2.5-Max generate images and videos?
A. Yes! The Qwen2.5-Max model supports image and video generation.
Q. Do DeepSeek-R1 and Kimi k1.5 support web search?
A. Yes, both DeepSeek-R1 and Kimi k1.5 support real-time web search, whereas Qwen2.5-Max currently lacks web search capabilities. This gives DeepSeek-R1 and Kimi k1.5 an edge in retrieving the latest online information.
Q. Which model should I choose?
A. Depending on your use case, choose:
– Qwen2.5-Max: If you need multimodal capabilities (text, images, video) and advanced AI reasoning.
– DeepSeek-R1: If you want the flexibility of an open-source model, superior question-answering performance, and web search integration.
– Kimi k1.5: If you need efficient document handling, STEM-based problem-solving, and real-time web access.