After DeepSeek, Kimi k1.5 Outshines OpenAI o1

Harsh Mishra Last Updated : 30 Jan, 2025
7 min read

Recent advancements in reinforcement learning (RL) with large language models (LLMs) have led to the development of Kimi k1.5, a Chinese AI model that promises to reshape the landscape of generative AI reasoning. This article explores the key features, innovations, and implications of Kimi k1.5, drawing insights from the research paper.

What is Kimi k1.5?

Kimi k1.5 represents a significant step forward in scaling reinforcement learning with LLMs. Unlike traditional models that rely on complex methods like Monte Carlo tree search, it adopts a more streamlined approach, focusing on autoregressive prediction and reinforcement learning techniques. The model is designed to handle multimodal tasks, excelling particularly in benchmarks such as MathVista and LiveCodeBench.

Key Features of Kimi k1.5

Kimi k1.5 is a cutting-edge large language model (LLM) that integrates reinforcement learning (RL) to enhance its reasoning capabilities. Here are the key features:

  • Reinforcement Learning Integration: Kimi k1.5 learns from interactions and feedback, allowing it to adapt and explore solutions dynamically.
  • Streamlined Framework: The model simplifies traditional methods by focusing on autoregressive prediction combined with effective RL strategies, improving training efficiency.
  • Multimodal Capabilities: It excels in tasks that involve both text and visual data, performing well in benchmarks like MathVista and LiveCodeBench.
  • State-of-the-Art Performance: Kimi k1.5 achieves impressive scores across various reasoning benchmarks, showcasing its competitive edge in problem-solving.

Kimi k1.5 Training

The training process of Kimi k1.5 is a comprehensive and multi-stage approach designed to enhance its reasoning capabilities through reinforcement learning (RL) and multimodal integration. Here’s a breakdown of the training process:

1. Pretraining Stage

  • Data Collection: It is pretrained on a diverse and high-quality multimodal corpus, which includes text from various domains (English, Chinese, coding, mathematics, and knowledge) and visual data.
  • Quality Control: A rigorous filtering process ensures that the training data is relevant and diverse, enhancing the model’s foundational knowledge.

2. Supervised Fine-Tuning (SFT)

  • Vanilla SFT: After pretraining, the model undergoes a vanilla-supervised fine-tuning phase where it learns from a curated dataset of approximately 1 million examples across different tasks.
  • Long-CoT SFT: This phase focuses on long-chain of thought (CoT) reasoning, where the model is trained to generate detailed reasoning paths for complex problems.
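The fine-tuning data itself has not been released, but conceptually each long-CoT sample pairs a problem with an explicit step-by-step reasoning trace and a final answer. A minimal hypothetical example in Python (the field names are illustrative, not from the paper):

# Hypothetical long-CoT SFT sample; field names are illustrative only.
long_cot_sample = {
    "prompt": "A rectangle's perimeter is 36 cm and its length is twice its width. Find its area.",
    # Explicit reasoning path the model is trained to reproduce
    "reasoning": (
        "Let the width be w, so the length is 2w. "
        "Perimeter: 2(w + 2w) = 6w = 36, so w = 6 and the length is 12. "
        "Area = 6 * 12 = 72."
    ),
    "answer": "72 cm^2",
}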

3. Reinforcement Learning (RL)

  • RL Prompt Set Curation: A well-constructed prompt set is essential for effective RL training. The prompts are designed to cover a wide range of difficulties and domains, ensuring diverse coverage and accurate evaluability.
  • Training with RL: The model is trained using a policy model that learns to generate solutions through a sequence of reasoning steps. The training involves sampling thoughts and final answers in an autoregressive manner, guided by a reward model that evaluates the correctness of the responses.
  • Policy Optimization: Kimi k1.5 employs a variant of online mirror descent for policy optimization, allowing the model to refine its reasoning strategies iteratively.
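The paper's optimizer is a variant of online policy mirror descent; the exact loss is not reproduced here, but the core idea, raising expected reward while staying close (in KL divergence) to the previous policy iterate, can be sketched as follows. The interfaces (policy.sample, ref_policy.log_prob, reward_fn) and the simple REINFORCE-style estimator are assumptions for illustration, not Moonshot AI's implementation.

import torch

def mirror_descent_step(policy, ref_policy, reward_fn, optimizer, prompts, tau=0.1):
    # Sample reasoning traces and final answers autoregressively from the current policy.
    responses, logprobs = policy.sample(prompts)
    with torch.no_grad():
        ref_logprobs = ref_policy.log_prob(prompts, responses)
        rewards = reward_fn(prompts, responses)  # e.g. 1.0 if the final answer is correct, else 0.0

    # KL-regularized objective: maximize E[reward] - tau * KL(pi_theta || pi_ref)
    kl = logprobs - ref_logprobs
    loss = -(rewards * logprobs - tau * kl).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()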

4. Partial Rollouts

To handle long contexts efficiently, Kimi k1.5 uses a partial rollout technique. This method lets the model work with lengthy reasoning trajectories by saving unfinished portions for continuation in subsequent iterations, optimizing computational efficiency.
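A minimal sketch of the bookkeeping this implies: generation stops at a per-iteration token budget, unfinished trajectories are saved, and later iterations resume them instead of regenerating from the prompt. The class and the model.generate interface below are assumptions for illustration only.

from collections import deque

class PartialRolloutBuffer:
    """Stores unfinished reasoning trajectories so later iterations can
    continue them rather than regenerate from scratch (illustrative only)."""
    def __init__(self):
        self.pending = deque()

    def rollout(self, model, prompt, budget=2048):
        # Resume a saved partial trajectory if one exists, otherwise start from the prompt.
        prefix = self.pending.popleft() if self.pending else prompt
        text, finished = model.generate(prefix, max_new_tokens=budget)  # hypothetical API
        if not finished:
            self.pending.append(text)  # save the unfinished portion for a later iteration
            return None                # nothing complete to score this round
        return text                    # finished trajectory, ready for reward evaluation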

5. Length Penalty and Sampling Strategies

A length penalty is introduced to encourage concise reasoning, preventing the model from generating excessively long responses. Additionally, curriculum and prioritized sampling strategies are employed to focus on easier tasks initially and then progressively tackle more challenging problems.
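One way to realize such a penalty is to compare each response's length to the shortest and longest responses sampled for the same prompt; the constants and exact form below are a simplification for illustration, not the paper's precise formula.

def length_reward(response_len, min_len, max_len, correct):
    # Shorter responses get a bonus (up to +0.5), longer ones a penalty (down to -0.5);
    # incorrect responses are never rewarded for being short.
    if max_len == min_len:
        return 0.0
    lam = 0.5 - (response_len - min_len) / (max_len - min_len)
    return lam if correct else min(0.0, lam)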

6. Evaluation and Iteration

Throughout the training process, Kimi k1.5 is evaluated against various benchmarks to assess its performance. The model undergoes iterative updates based on feedback from these evaluations, continuously improving its reasoning capabilities.

Kimi k1.5 System Overview

As explained above, here is the training architecture of Kimi k1.5:

Kimi K1.5 System Overview
Source: Kimi k1.5

Kimi k1.5 Partial Rollout

Kimi K1.5 Partial Rollout
Source: Kimi k1.5

Kimi k1.5 Benchmarking

Kimi k1.5 was rigorously evaluated on a range of challenging tasks to assess its reasoning capabilities. The results demonstrate its state-of-the-art performance across various domains.

Key Findings

  • Math Whiz: Kimi k1.5 scored 77.5 on AIME 2024, surpassing OpenAI o1 (74.4) and OpenAI o1-mini (63.6). On MATH-500 it scored 96.2, ahead of OpenAI o1’s 94.8.
  • Coding: Kimi k1.5 demonstrated strong coding abilities, scoring 94 on Codeforces, matching OpenAI o1 and exceeding o1-mini and QwQ-72B Preview.
  • Vision: Kimi k1.5 showcased impressive visual reasoning, scoring 74.9 on MathVista_test, surpassing QVQ-72B (71.4) and OpenAI o1-mini (71).
  • General Knowledge: Kimi k1.5 demonstrated broad knowledge across domains, scoring 87.4 on MMLU (EM), outperforming models like GPT-4o (87.2).

Reasoning Strategies

  • Kimi k1.5 leverages both short and long chains of thought to tackle problems, demonstrating adaptability in its reasoning approach.

Comparison
Source: Kimi k1.5

Kimi k1.5 Key Innovations 

Long Context Scaling

One of the standout features of Kimi k1.5 is its ability to process an extended context of up to 128,000 tokens. This capability allows the model to handle complex reasoning tasks more efficiently by reusing partial rollouts, which conserves computational resources while enhancing performance.

Chain of Thought Reasoning

It effectively combines long Chain of Thought (CoT) and short CoT reasoning strategies. This dual approach enables the model to engage in deep reasoning when necessary while maintaining efficiency for simpler tasks.

Reinforcement Learning Pipeline

The RL pipeline for Kimi k1.5 is meticulously designed:

  • Prompt Curation: Diverse prompts covering various domains ensure comprehensive training.
  • Supervised Fine-Tuning: Initial training focuses on detailed reasoning paths, allowing the model to learn coherent step-by-step logic.
  • Policy Optimization: Techniques like online policy mirror descent help optimize the model’s performance while preventing overfitting.

Performance Metrics

It has demonstrated remarkable performance across multiple benchmarks:

  • It outperforms short-CoT models such as GPT-4o and Claude 3.5 Sonnet by wide margins, up to 550% on some benchmarks.
  • On specific benchmarks, it scores 77.5 on AIME 2024 for math tasks and reaches the 94th percentile on Codeforces coding challenges.

Handling Multimodal Data

Its architecture allows it to process both text and visual data effectively. The model employs various strategies for handling different types of data, including real-world images and synthetic data, enhancing its versatility across tasks requiring diverse skill sets.

DeepSeek R1 vs Kimi k1.5

DeepSeek R1 and Kimi k1.5 represent two distinct approaches to large language model development, each with its own strengths. While both aim to achieve advanced reasoning capabilities, they differ significantly in their underlying architectures and training methodologies. These differences lead to variations in how they handle complex tasks, particularly those requiring extensive context or dynamic problem-solving. The following sections delve into these key distinctions, exploring how Kimi k1.5’s innovative design choices set it apart from DeepSeek R1.

1. Architectural Differences

  • Kimi k1.5:
    • Utilizes a streamlined architecture that integrates reinforcement learning (RL) with autoregressive prediction, allowing for efficient processing of multimodal tasks.
    • Capable of handling an extended context of up to 128,000 tokens, which enhances its ability to manage complex reasoning tasks.
  • DeepSeek R1:
    • Builds on the DeepSeek-V3 Mixture-of-Experts base and is a text-only reasoning model, so it does not handle the visual inputs that Kimi k1.5’s multimodal design supports.
    • Elicits reasoning primarily through reinforcement-learning rewards on chain-of-thought outputs, rather than through the streamlined long-context (128,000-token) RL pipeline that Kimi k1.5 emphasizes.

2. Training Methodologies

  • Kimi k1.5:
    • Follows a comprehensive multi-stage training process that includes pretraining on a diverse multimodal corpus, supervised fine-tuning, and a robust RL pipeline.
    • Incorporates innovative techniques such as partial rollouts and length penalties to optimize training efficiency and encourage concise reasoning.
  • DeepSeek R1:
    • Relies chiefly on large-scale reinforcement learning (GRPO) applied to its base model, with a comparatively light supervised fine-tuning stage and a text-only training corpus.
    • Does not report long-context RL infrastructure such as partial rollouts, which Kimi k1.5 uses to train efficiently on lengthy reasoning trajectories.

To know more: Kimi k1.5 vs DeepSeek R1: Battle of the Best Chinese LLMs

How to Access Kimi k1.5?

Here is how to access and use Kimi k1.5 through its API.

API Access of Kimi k1.5

  • Log in to KIMI’s management console
  • Register an account with your phone number
  • Click on API Key management
  • Click on Create New and enter a name
  • The API Key looks like sk-xxxxxxxxxxx

Here’s an example of calling Kimi k1.5:

from openai import OpenAI

# Initialize the client against Moonshot AI's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="YOUR_KIMI_KEY",
    base_url="https://api.moonshot.ai/v1",
)

# A single user message asking a simple geometry question.
messages = [
    {
        "role": "user",
        "content": "The lengths of the two legs of a right triangle are 3 cm and 4 cm respectively. Find the length of the hypotenuse of this right triangle.",
    },
]

This code initializes a Kimi (Moonshot AI) API client using your API key and base URL, then prepares a user message asking for the hypotenuse of a 3-4-5 right triangle. It’s ready to send this message to the Kimi API for processing.

# Send the request with streaming enabled so a long, step-by-step answer
# can be read as it is generated.
stream = client.chat.completions.create(
    model="kimi-k1.5-preview",
    messages=messages,
    temperature=0.3,
    stream=True,
    max_tokens=8192,
)

It sends the prepared message to the Kimi API using the specified model, temperature, and token limit, and sets up a streaming response to handle potentially long outputs. It’s designed to receive a step-by-step or chunked answer from Kimi.

# Print each streamed chunk of text as it arrives.
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta and delta.content:
        print(delta.content, end="")

It iterates through the streamed response from the Kimi API. For each chunk of the response, it checks if there’s new text content (chunk.choices[0].delta.content). If so, it prints that text to the console, effectively displaying the model’s response in real time as it’s generated. 
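If you don't need the output streamed token by token, the same request can also be made without streaming; the snippet below assumes the same client, messages, and model name as above.

# Non-streaming variant: wait for the full completion, then print it.
response = client.chat.completions.create(
    model="kimi-k1.5-preview",
    messages=messages,
    temperature=0.3,
    max_tokens=8192,
)
print(response.choices[0].message.content)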

Also Read: Kimi k1.5 vs OpenAI o1: Which is a Better Reasoning Model?

Conclusion

Kimi k1.5 signifies a pivotal advancement in generative AI reasoning models by simplifying reinforcement learning design while achieving state-of-the-art performance across multiple domains. Its innovative approaches to scaling context length and integrating multimodal data position it as a leading model in the field. As we move forward, the implications of such advancements will likely extend beyond academic research into practical applications across industries, fostering a new era of intelligent systems capable of complex reasoning.

Stay tuned to Analytics Vidhya Blog for more such awesome content!

Harsh Mishra is an AI/ML Engineer who spends more time talking to Large Language Models than actual humans. Passionate about GenAI, NLP, and making machines smarter (so they don’t replace him just yet). When not optimizing models, he’s probably optimizing his coffee intake. 🚀☕
