It’s Lunar New Year in China, and the world is celebrating, thanks to the launch of one amazing model after another by Chinese companies. Alibaba, too, recently launched Qwen2.5-Max – a model that surpasses giants from OpenAI, DeepSeek, and Meta’s Llama. Packed with advanced reasoning and image and video generation, this model is set to shake up the GenAI world. In this blog, we will compare the performance of Qwen2.5-Max, DeepSeek-R1, and Kimi k1.5 on several fronts to find the best LLM at present!
Let’s begin comparing Qwen2.5-Max, DeepSeek-R1, and Kimi k1.5, starting with their technical details. For this, we will compare the benchmark performances and features of these three models.
Based on the available data, here is how Qwen2.5-Max performs against DeepSeek-R1 and Kimi k1.5 on various standard benchmark tests.
Each of the three models has its own web chat interface which offers different features. Here’s how Qwen2.5-Max performs against its peers when it comes to features:
Feature | Qwen2.5-Max | DeepSeek-R1 | Kimi k1.5 |
---|---|---|---|
Image Analysis | No | Yes | Yes |
Web Interface | Yes | Yes | Yes |
Image Generation | Yes | No | No |
Web Search | No | Yes | Yes |
Artifacts | Yes | No | No |
Document Upload | Single file | Multiple files | Multiple files |
Common Phrases | No | No | Yes |
Now let’s compare the currently reigning top 3 Chinese models – Qwen2.5-Max, DeepSeek-R1, and Kimi k1.5 – based on their applications. For this, we will be testing their performance on three different tasks involving reasoning, multi-step processing, and coding.
We will give the same prompt to all three models and compare their responses, giving them scores of 0, 0.5, and 1 based on their performance. At the end, we’ll tally the scores to find out which of them is the best. So let’s get started.
Prompt: “Prove mathematically that Earth is round”
Qwen2.5-Max | DeepSeek-R1 | Kimi k1.5 |
---|---|---|
Using geometry, gravity, satellite motion, and historical experiments, we have shown mathematically that the Earth is round. The curvature of the horizon, uniformity of gravity, and consistency of satellite orbits all support this conclusion. The Earth is a sphere (or more precisely, an oblate spheroid). | By calculating Earth’s circumference using angular differences in shadows and distances between two locations, Eratosthenes mathematically proved Earth’s spherical shape. Modern measurements and observations (e.g., horizon curvature, lunar eclipses) further confirm this. | The consistency of measured distances and angles with spherical geometry (e.g., d=Rθ) proves the Earth is round. Flat-Earth models fail to explain these relationships. |
Qwen2.5-Max | DeepSeek-R1 | Kimi k1.5 |
---|---|---|
The model provided some key observations supporting the claim. It then covered various parameters like the geometry of a sphere, horizon curvature, and rotational and centrifugal forces, many of which were not required. It did cover “Eratosthenes’ Experiment” as one of those parameters. | The model explained the theory in brief, along with all the mathematical calculations needed to prove the given task. It also gave some additional proofs and key observations to establish its result. The model used Eratosthenes’ Experiment to establish the proof. | This model gave the clearest and most concise response. It proved the hypothesis with a small mathematical calculation, using the measurements from Eratosthenes’ Experiment without explicitly mentioning it. |
The most relevant proof of Earth’s curvature is the one involving Eratosthenes’ method (circa 240 BCE), since it is one of the oldest and most comprehensive proofs, resting on a mathematical analysis of the angular difference between two places. All three models used that approach in one way or another.
Qwen2.5-Max provided 8 different ways to prove that the Earth is round without properly explaining any of them. DeepSeek-R1 took Eratosthenes’ method and explained its theory and mathematics in clear, concise terms. Kimi k1.5 used the simplest approach, building on the required method without even explicitly mentioning it.
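For reference, here is the back-of-the-envelope arithmetic behind Eratosthenes’ method that all three responses lean on. The figures below are the commonly quoted historical approximations (a 7.2° shadow-angle difference and roughly 800 km between Syene and Alexandria), not numbers taken from any model’s output:

$$\theta \approx 7.2^\circ = \frac{360^\circ}{50}, \qquad d \approx 800\ \text{km}$$

$$C = \frac{360^\circ}{\theta} \times d \approx 50 \times 800\ \text{km} = 40{,}000\ \text{km}, \qquad R = \frac{C}{2\pi} \approx 6{,}366\ \text{km}$$

This lands within about 2% of the modern equatorial circumference of roughly 40,075 km, and it is the same relationship Kimi k1.5 writes as d = Rθ: shadow angles that grow linearly with distance are exactly what a sphere predicts and what a flat surface cannot explain.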
Prompt: “Summarise the lesson in 1 line, create a flowchart to explain the process happening in the lesson, and then translate the summary into French.
🔗 Lesson”
Qwen2.5-Max | DeepSeek-R1 | Kimi k1.5 |
---|---|---|
The summary was concise and laid out the topics covered in the lesson. | The summary of the lesson was crisp, concise, and to the point. | The summary covered all the topics and was quite simple, yet a bit long compared to the others. |
The flowchart covered all essential headings and their subheadings, as required. | The flowchart covered all essential headings but included more content than required in the sub-headings. | Instead of a flowchart of the lesson itself, the model generated a flowchart of the process covered in the lesson. Overall, this flowchart was clear and crisp. |
I wanted a simple, crisp, one-line summary of the lesson, which both DeepSeek-R1 and Qwen2.5-Max delivered. As for the flowchart, while the design and crispness of Kimi k1.5’s result were exactly what I asked for, it lacked details about the flow of the lesson. The flowchart by DeepSeek-R1 was a bit content-heavy, while Qwen2.5-Max gave a good flowchart covering all the essentials.
Prompt: “Write an HTML code for a wordle kind of an app”
Note: Before you enter your prompt in Qwen2.5-Max, click on Artifacts; this way, you will be able to visualize the output of your code within the chat interface.
Qwen2.5-Max:
DeepSeek-R1:
Kimi k1.5:
Qwen2.5-Max | DeepSeek-R1 | Kimi k1.5 |
---|---|---|
The model generated the code quickly, and the app itself looks a lot like the actual “Wordle” app. Instead of listing the alphabet at the bottom, it gave us the option to directly enter our 5 letters, which it would then automatically update on the board. | The model took some time to generate the code, but the output was great! What it generated was almost the same as the actual “Wordle” app. We can select the letters we wish to guess, and it places our selection into the word. | The model generated the code quickly enough, but the output was a distorted version of the actual “Wordle” app. The word board did not appear, nor did all the letters; in fact, the Enter and Delete keys almost overlapped the letters. |
With its Artifacts feature, it was super easy to analyze the code right there. | The only issue was that I had to copy the code and run it in a different interface. | Here too, I had to run the code in a different interface to visualize the output. |
Firstly, I wanted the generated app to be as similar to the actual Wordle app as possible. Secondly, I wanted to put minimum effort into testing the generated code. The result generated by DeepSeek-R1 was the closest to the ask, while Qwen2.5-Max’s fairly good result was the easiest to test.
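To make the coding task concrete, here is a minimal, self-contained sketch of the kind of page such a prompt tends to produce. This is not any model’s actual output; the hard-coded target word, the styling, and the simplified duplicate-letter handling are all placeholder choices:

```html
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Mini Wordle sketch</title>
  <style>
    /* A 6-row by 5-column board of letter tiles */
    #board { display: grid; grid-template-rows: repeat(6, 3rem); gap: 4px; width: max-content; margin: 1rem auto; }
    .row   { display: grid; grid-template-columns: repeat(5, 3rem); gap: 4px; }
    .tile  { border: 2px solid #ccc; display: flex; align-items: center; justify-content: center;
             font: bold 1.5rem sans-serif; text-transform: uppercase; }
    .correct { background: #6aaa64; color: #fff; }  /* right letter, right spot */
    .present { background: #c9b458; color: #fff; }  /* right letter, wrong spot */
    .absent  { background: #787c7e; color: #fff; }  /* letter not in the word */
    #entry   { display: block; margin: 0 auto; font-size: 1.2rem; text-transform: uppercase; }
  </style>
</head>
<body>
  <div id="board"></div>
  <input id="entry" maxlength="5" placeholder="Type a 5-letter guess, press Enter">
  <script>
    const TARGET = "crane";  // placeholder answer; a real app would pick a random word
    const board = document.getElementById("board");
    const entry = document.getElementById("entry");
    let guesses = 0;

    // Build the empty 6x5 board up front.
    for (let r = 0; r < 6; r++) {
      const row = document.createElement("div");
      row.className = "row";
      for (let c = 0; c < 5; c++) {
        const tile = document.createElement("div");
        tile.className = "tile";
        row.appendChild(tile);
      }
      board.appendChild(row);
    }

    // On Enter, write the guess into the next row and colour each tile.
    // (Duplicate letters are handled naively here, unlike the real game.)
    entry.addEventListener("keydown", (e) => {
      if (e.key !== "Enter" || guesses >= 6) return;
      const guess = entry.value.toLowerCase();
      if (guess.length !== 5) return;
      const tiles = board.children[guesses].children;
      for (let i = 0; i < 5; i++) {
        tiles[i].textContent = guess[i];
        if (guess[i] === TARGET[i])         tiles[i].classList.add("correct");
        else if (TARGET.includes(guess[i])) tiles[i].classList.add("present");
        else                                tiles[i].classList.add("absent");
      }
      guesses++;
      entry.value = "";
      if (guess === TARGET) entry.disabled = true;
    });
  </script>
</body>
</html>
```

Saving this as an .html file and opening it in a browser is all the testing it needs, which is exactly the manual step that Qwen2.5-Max’s Artifacts feature saves you from.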
Qwen2.5-Max is an amazing LLM that gives models like DeepSeek-R1 and Kimi k1.5 tough competition. Its responses were comparable across all the tasks. Although it currently lacks the ability to analyze images or search the web, once those features go live, Qwen2.5-Max will be an unbeatable model. It already offers video generation capabilities that even GPT-4o doesn’t have yet. Moreover, its interface is quite intuitive, with features like Artifacts that make it simpler to run code within the same platform. All in all, Qwen2.5-Max by Alibaba is an all-round LLM that is here to redefine how we work with LLMs!
Q. What is Qwen2.5-Max?
A. Qwen2.5-Max is Alibaba’s latest multimodal LLM, optimized for text, image, and video generation and pretrained on over 20 trillion tokens.
Q. How does Qwen2.5-Max compare with DeepSeek-R1 and Kimi k1.5?
A. Compared to DeepSeek-R1 and Kimi k1.5, it excels in reasoning, multimodal content creation, and programming support, making it a strong competitor in the Chinese AI ecosystem.
Q. Is Qwen2.5-Max open-source?
A. No, Qwen2.5-Max is a closed-source model, while DeepSeek-R1 and Kimi k1.5 are open-source.
Q. Can Qwen2.5-Max generate images and videos?
A. Yes! The Qwen2.5-Max model supports image and video generation.
Q. Do DeepSeek-R1 and Kimi k1.5 support web search?
A. Yes, both DeepSeek-R1 and Kimi k1.5 support real-time web search, whereas Qwen2.5-Max currently lacks web search capabilities. This gives DeepSeek-R1 and Kimi k1.5 an edge in retrieving the latest online information.
Q. Which model should I choose?
A. Depending on your use case, choose:
– Qwen2.5-Max: If you need multimodal capabilities (text, images, video) and advanced AI reasoning.
– DeepSeek-R1: If you want the flexibility of an open-source model, superior question-answering performance, and web search integration.
– Kimi k1.5: If you need efficient document handling, STEM-based problem-solving, and real-time web access.