After DeepSeek, Kimi k1.5 Outshines OpenAI o1

Harsh Mishra Last Updated : 30 Jan, 2025
7 min read

Recent advancements in reinforcement learning (RL) with large language models (LLMs) have led to the development of Kimi k1.5, a Chinese AI model that promises to reshape the landscape of generative AI reasoning. This article explores the key features, innovations, and implications of Kimi k1.5, drawing insights from the research paper.

What is Kimi k1.5?

Kimi k1.5 represents a significant step forward in scaling reinforcement learning with LLMs. Unlike traditional models that rely on complex methods like Monte Carlo tree search, it adopts a more streamlined approach, focusing on autoregressive prediction and reinforcement learning techniques. The model is designed to handle multimodal tasks, excelling particularly in benchmarks such as MathVista and LiveCodeBench.

Key Features of Kimi k1.5

Kimi k1.5 is a cutting-edge large language model (LLM) that integrates reinforcement learning (RL) to enhance its reasoning capabilities. Here are the key features:

  • Reinforcement Learning Integration: Kimi k1.5 learns from interactions and feedback, allowing it to adapt and explore solutions dynamically.
  • Streamlined Framework: The model simplifies traditional methods by focusing on autoregressive prediction combined with effective RL strategies, improving training efficiency.
  • Multimodal Capabilities: It excels in tasks that involve both text and visual data, performing well in benchmarks like MathVista and LiveCodeBench.
  • State-of-the-Art Performance: Kimi k1.5 achieves impressive scores across various reasoning benchmarks, showcasing its competitive edge in problem-solving.

Kimi k1.5 Training

The training process of Kimi k1.5 is a comprehensive and multi-stage approach designed to enhance its reasoning capabilities through reinforcement learning (RL) and multimodal integration. Here’s a breakdown of the training process:

1. Pretraining Stage

  • Data Collection: It is pretrained on a diverse and high-quality multimodal corpus, which includes text from various domains (English, Chinese, coding, mathematics, and knowledge) and visual data.
  • Quality Control: A rigorous filtering process ensures that the training data is relevant and diverse, enhancing the model’s foundational knowledge.

2. Supervised Fine-Tuning (SFT)

  • Vanilla SFT: After pretraining, the model undergoes a vanilla-supervised fine-tuning phase where it learns from a curated dataset of approximately 1 million examples across different tasks.
  • Long-CoT SFT: This phase focuses on long-chain of thought (CoT) reasoning, where the model is trained to generate detailed reasoning paths for complex problems.
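The fine-tuning data itself has not been released, but conceptually each long-CoT sample pairs a problem with an explicit step-by-step reasoning trace and a final answer. A minimal hypothetical example in Python (the field names are illustrative, not from the paper):

# Hypothetical long-CoT SFT sample; field names are illustrative only.
long_cot_sample = {
    "prompt": "A rectangle's perimeter is 36 cm and its length is twice its width. Find its area.",
    # Explicit reasoning path the model is trained to reproduce
    "reasoning": (
        "Let the width be w, so the length is 2w. "
        "Perimeter: 2(w + 2w) = 6w = 36, so w = 6 and the length is 12. "
        "Area = 6 * 12 = 72."
    ),
    "answer": "72 cm^2",
}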

3. Reinforcement Learning (RL)

  • RL Prompt Set Curation: A well-constructed prompt set is essential for effective RL training. The prompts are designed to cover a wide range of difficulties and domains, ensuring diverse coverage and accurate evaluability.
  • Training with RL: The model is trained using a policy model that learns to generate solutions through a sequence of reasoning steps. The training involves sampling thoughts and final answers in an autoregressive manner, guided by a reward model that evaluates the correctness of the responses.
  • Policy Optimization: Kimi k1.5 employs a variant of online mirror descent for policy optimization, allowing the model to refine its reasoning strategies iteratively.
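The paper's optimizer is a variant of online policy mirror descent; the exact loss is not reproduced here, but the core idea, raising expected reward while staying close (in KL divergence) to the previous policy iterate, can be sketched as follows. The interfaces (policy.sample, ref_policy.log_prob, reward_fn) and the simple REINFORCE-style estimator are assumptions for illustration, not Moonshot AI's implementation.

import torch

def mirror_descent_step(policy, ref_policy, reward_fn, optimizer, prompts, tau=0.1):
    # Sample reasoning traces and final answers autoregressively from the current policy.
    responses, logprobs = policy.sample(prompts)
    with torch.no_grad():
        ref_logprobs = ref_policy.log_prob(prompts, responses)
        rewards = reward_fn(prompts, responses)  # e.g. 1.0 if the final answer is correct, else 0.0

    # KL-regularized objective: maximize E[reward] - tau * KL(pi_theta || pi_ref)
    kl = logprobs - ref_logprobs
    loss = -(rewards * logprobs - tau * kl).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()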

4. Partial Rollouts

To handle long contexts efficiently, Kimi k1.5 uses a partial rollout technique. This method lets the model work with lengthy reasoning trajectories by saving unfinished portions for continuation in subsequent iterations, optimizing computational efficiency.
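A minimal sketch of the bookkeeping this implies: generation stops at a per-iteration token budget, unfinished trajectories are saved, and later iterations resume them instead of regenerating from the prompt. The class and the model.generate interface below are assumptions for illustration only.

from collections import deque

class PartialRolloutBuffer:
    """Stores unfinished reasoning trajectories so later iterations can
    continue them rather than regenerate from scratch (illustrative only)."""
    def __init__(self):
        self.pending = deque()

    def rollout(self, model, prompt, budget=2048):
        # Resume a saved partial trajectory if one exists, otherwise start from the prompt.
        prefix = self.pending.popleft() if self.pending else prompt
        text, finished = model.generate(prefix, max_new_tokens=budget)  # hypothetical API
        if not finished:
            self.pending.append(text)  # save the unfinished portion for a later iteration
            return None                # nothing complete to score this round
        return text                    # finished trajectory, ready for reward evaluation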

5. Length Penalty and Sampling Strategies

A length penalty is introduced to encourage concise reasoning, preventing the model from generating excessively long responses. Additionally, curriculum and prioritized sampling strategies are employed to focus on easier tasks initially and then progressively tackle more challenging problems.
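One way to realize such a penalty is to compare each response's length to the shortest and longest responses sampled for the same prompt; the constants and exact form below are a simplification for illustration, not the paper's precise formula.

def length_reward(response_len, min_len, max_len, correct):
    # Shorter responses get a bonus (up to +0.5), longer ones a penalty (down to -0.5);
    # incorrect responses are never rewarded for being short.
    if max_len == min_len:
        return 0.0
    lam = 0.5 - (response_len - min_len) / (max_len - min_len)
    return lam if correct else min(0.0, lam)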

6. Evaluation and Iteration

Throughout the training process, Kimi k1.5 is evaluated against various benchmarks to assess its performance. The model undergoes iterative updates based on feedback from these evaluations, continuously improving its reasoning capabilities.

Kimi k1.5 System Overview

As explained above, here is the training architecture of Kimi k1.5:

Kimi K1.5 System Overview
Source: Kimi k1.5

Kimi k1.5 Partial Rollout

Kimi K1.5 Partial Rollout
Source: Kimi k1.5

Kimi k1.5 Benchmarking

Kimi k1.5 was rigorously evaluated on a range of challenging tasks to assess its reasoning capabilities. The results demonstrate its state-of-the-art performance across various domains.

Key Findings

  • Math Whiz: Kimi k1.5 scored 77.5 on AIME 2024, surpassing OpenAI o1 (74.4) and OpenAI o1-mini (63.6). On MATH-500 it scored 96.2, ahead of OpenAI o1’s 94.8.
  • Coding: Kimi k1.5 demonstrated strong coding abilities, scoring 94 on Codeforces, matching OpenAI o1 and exceeding o1-mini and QwQ-72B Preview.
  • Vision: Kimi k1.5 showcased impressive visual reasoning, scoring 74.9 on MathVista_test, surpassing QVQ-72B (71.4) and OpenAI o1-mini (71).
  • General Knowledge: Kimi k1.5 demonstrated broad knowledge across domains, scoring 87.4 on MMLU (EM), outperforming models like GPT-4o (87.2).

Reasoning Strategies

  • Kimi k1.5 leverages both short and long chains of thought to tackle problems, demonstrating adaptability in its reasoning approach.

Comparison
Source: Kimi k1.5

Kimi k1.5 Key Innovations 

Long Context Scaling

One of the standout features of Kimi k1.5 is its ability to process an extended context of up to 128,000 tokens. This capability allows the model to handle complex reasoning tasks more efficiently by reusing partial rollouts, which conserves computational resources while enhancing performance.

Chain of Thought Reasoning

It effectively combines long Chain of Thought (CoT) and short CoT reasoning strategies. This dual approach enables the model to engage in deep reasoning when necessary while maintaining efficiency for simpler tasks.

Reinforcement Learning Pipeline

The RL pipeline for Kimi k1.5 is meticulously designed:

  • Prompt Curation: Diverse prompts covering various domains ensure comprehensive training.
  • Supervised Fine-Tuning: Initial training focuses on detailed reasoning paths, allowing the model to learn coherent step-by-step logic.
  • Policy Optimization: Techniques like online policy mirror descent help optimize the model’s performance while preventing overfitting.

Performance Metrics

It has demonstrated remarkable performance across multiple benchmarks:

  • It outperforms short-CoT models such as GPT-4o and Claude 3.5 Sonnet by wide margins, up to 550% on some benchmarks.
  • On specific benchmarks, it scores 77.5 on AIME 2024 for math tasks and reaches the 94th percentile on Codeforces coding challenges.

Handling Multimodal Data

Its architecture allows it to process both text and visual data effectively. The model employs various strategies for handling different types of data, including real-world images and synthetic data, enhancing its versatility across tasks requiring diverse skill sets.

DeepSeek R1 vs Kimi k1.5

DeepSeek R1 and Kimi k1.5 represent two distinct approaches to large language model development, each with its own strengths. While both aim to achieve advanced reasoning capabilities, they differ significantly in their underlying architectures and training methodologies. These differences lead to variations in how they handle complex tasks, particularly those requiring extensive context or dynamic problem-solving. The following sections delve into these key distinctions, exploring how Kimi k1.5’s innovative design choices set it apart from DeepSeek R1.

1. Architectural Differences

  • Kimi k1.5:
    • Utilizes a streamlined architecture that integrates reinforcement learning (RL) with autoregressive prediction, allowing for efficient processing of multimodal tasks.
    • Capable of handling an extended context of up to 128,000 tokens, which enhances its ability to manage complex reasoning tasks.
  • DeepSeek R1:
    • Builds on the DeepSeek-V3 Mixture-of-Experts base and is a text-only reasoning model, so it does not handle the visual inputs that Kimi k1.5’s multimodal design supports.
    • Elicits reasoning primarily through reinforcement-learning rewards on chain-of-thought outputs, rather than through the streamlined long-context (128,000-token) RL pipeline that Kimi k1.5 emphasizes.

2. Training Methodologies

  • Kimi k1.5:
    • Follows a comprehensive multi-stage training process that includes pretraining on a diverse multimodal corpus, supervised fine-tuning, and a robust RL pipeline.
    • Incorporates innovative techniques such as partial rollouts and length penalties to optimize training efficiency and encourage concise reasoning.
  • DeepSeek R1:
    • Relies chiefly on large-scale reinforcement learning (GRPO) applied to its base model, with a comparatively light supervised fine-tuning stage and a text-only training corpus.
    • Does not report long-context RL infrastructure such as partial rollouts, which Kimi k1.5 uses to train efficiently on lengthy reasoning trajectories.

To know more: Kimi k1.5 vs DeepSeek R1: Battle of the Best Chinese LLMs

How to Access Kimi k1.5?

Here is how to access and use Kimi k1.5 through its API.

API Access of Kimi k1.5

  • Log in to KIMI’s management console
  • Register an account with your phone number
  • Click on API Key management
  • Click on Create New and enter a name
  • The API Key looks like sk-xxxxxxxxxxx

Here’s an example of calling Kimi k1.5:

from openai import OpenAI

# Initialize the client against Moonshot AI's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="YOUR_KIMI_KEY",
    base_url="https://api.moonshot.ai/v1",
)

# A single user message asking a simple geometry question.
messages = [
    {
        "role": "user",
        "content": "The lengths of the two legs of a right triangle are 3 cm and 4 cm respectively. Find the length of the hypotenuse of this right triangle.",
    },
]

This code initializes a Kimi (Moonshot AI) API client using your API key and base URL, then prepares a user message asking for the hypotenuse of a 3-4-5 right triangle. It’s ready to send this message to the Kimi API for processing.

# Send the request with streaming enabled so a long, step-by-step answer
# can be read as it is generated.
stream = client.chat.completions.create(
    model="kimi-k1.5-preview",
    messages=messages,
    temperature=0.3,
    stream=True,
    max_tokens=8192,
)

It sends the prepared message to the Kimi API using the specified model, temperature, and token limit, and sets up a streaming response to handle potentially long outputs. It’s designed to receive a step-by-step or chunked answer from Kimi.

# Print each streamed chunk of text as it arrives.
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta and delta.content:
        print(delta.content, end="")

It iterates through the streamed response from the Kimi API. For each chunk of the response, it checks if there’s new text content (chunk.choices[0].delta.content). If so, it prints that text to the console, effectively displaying the model’s response in real time as it’s generated. 
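If you don't need the output streamed token by token, the same request can also be made without streaming; the snippet below assumes the same client, messages, and model name as above.

# Non-streaming variant: wait for the full completion, then print it.
response = client.chat.completions.create(
    model="kimi-k1.5-preview",
    messages=messages,
    temperature=0.3,
    max_tokens=8192,
)
print(response.choices[0].message.content)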

Also Read: Kimi k1.5 vs OpenAI o1: Which is a Better Reasoning Model?

Conclusion

Kimi k1.5 signifies a pivotal advancement in generative AI reasoning models by simplifying reinforcement learning design while achieving state-of-the-art performance across multiple domains. Its innovative approaches to scaling context length and integrating multimodal data position it as a leading model in the field. As we move forward, the implications of such advancements will likely extend beyond academic research into practical applications across industries, fostering a new era of intelligent systems capable of complex reasoning.

Stay tuned to Analytics Vidhya Blog for more such awesome content!

Harsh Mishra is an AI/ML Engineer who spends more time talking to Large Language Models than actual humans. Passionate about GenAI, NLP, and making machines smarter (so they don’t replace him just yet). When not optimizing models, he’s probably optimizing his coffee intake. 🚀☕
