The race for dominance in code-focused language models is heating up, and Hugging Face has entered the arena with a strong contender: OlympicCoder-7B, part of its Open-R1 initiative. Designed to excel at competitive programming, the model is fine-tuned on a Chain-of-Thought-enhanced Codeforces dataset, and it has already outperformed Claude 3.7 Sonnet on the IOI benchmark. But does this mean Hugging Face’s 7B model truly beats Claude 3.7? In this blog, we’ll examine OlympicCoder-7B’s benchmark scores, explore the reasoning approach behind the model, and demonstrate how to use it.
Hugging Face runs a community-driven project called the Open-R1 initiative, aimed at building open, high-quality reasoning models. This initiative has led to the development of two code-specialized models: OlympicCoder-7B and OlympicCoder-32B.
OlympicCoder-7B is built on Qwen2.5-Coder-7B-Instruct, an open-source model from Alibaba Cloud. What sets it apart is its fine-tuning on the CodeForces-CoTs dataset, which includes thousands of competitive programming problems from Codeforces. The addition of Chain-of-Thought (CoT) reasoning lets the model break complex problems into logical steps, helping it go beyond syntactic code generation to genuine logical problem-solving.
Constructing the CodeForces-CoTs dataset for OlympicCoder-7B involved distilling nearly 100,000 high-quality samples from DeepSeek-R1. Each sample includes a problem statement, a thought process, and a verified solution in both C++ and Python. This dual-language setup makes the model more robust and adaptable across coding environments. The dataset wasn’t a simple scrape of Codeforces; it was designed to reflect how expert human coders think and write code.
A major issue in training and evaluating code models is code verifiability. Many existing datasets contain unverified or incorrect code, which can confuse models during training. To combat this, Hugging Face applied a rigorous filtering process in CodeForces-CoTs, ensuring only working, high-quality samples were used.
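To get a feel for the data, you can pull the dataset straight from the Hugging Face Hub. Here is a minimal sketch using the datasets library; the repo id open-r1/codeforces-cots is real, but the subset name and field layout below are assumptions, so check the dataset card for the exact schema.

from datasets import load_dataset

# "solutions" is an assumed subset name; see the dataset card for the
# available configurations.
ds = load_dataset("open-r1/codeforces-cots", "solutions", split="train")
print(ds)             # row count and column names
print(ds[0].keys())   # fields such as the problem statement, CoT trace, and solution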
OlympicCoder-7B was evaluated on the IOI benchmark. Inspired by the International Olympiad in Informatics (IOI), this benchmark tests the model’s ability to handle real-world competitive programming problems. It emphasizes logical reasoning, constraint satisfaction, and optimality.
The benchmark chart compares the performance of ten different models on the 2024 IOI problems, with the final score reflecting how well each model performed across 50 competitive programming tasks. On this benchmark, OlympicCoder-7B finished ahead of Claude 3.7 Sonnet.
This performance affirms OlympicCoder-7B’s capability as a strong reasoning model in the open-source domain.
Now that we are familiar with Hugging Face’s OlympicCoder, let’s test it out on Google Colab.
Before we get started, we need a Hugging Face access token. You can generate one from your Hugging Face account settings, under Access Tokens.
Now that we have the access token, let’s open a Jupyter environment and get started. Make sure to set the runtime type to T4 GPU.
First, you need to install the transformers and accelerate libraries from PyPI (Python Package Index).
!pip install transformers accelerate
Add your access token to Colab secrets, or log in by running the following command:
!huggingface-cli login
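Alternatively, if you stored the token in Colab’s secrets panel, you can log in programmatically. A small sketch, assuming you saved it under the (hypothetical) name HF_TOKEN:

from google.colab import userdata
from huggingface_hub import login

# Reads the token from Colab secrets and authenticates this session.
login(token=userdata.get("HF_TOKEN"))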
Import the necessary libraries.
import torch
from transformers import pipeline
The model gets downloaded in 4 shards and is approximately 15 GB in size.
pipe = pipeline("text-generation", model="open-r1/OlympicCoder-7B", torch_dtype=torch.bfloat16, device_map="auto")
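Since a T4 has limited VRAM, it’s worth confirming where accelerate placed the weights. A quick check, assuming the pipeline above loaded successfully:

# hf_device_map is populated when device_map="auto" is used.
print(pipe.model.hf_device_map)
print(pipe.model.dtype)   # should report torch.bfloat16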
Let’s prompt the model to generate prime numbers up to 100 by including the prompt in the messages list with the role set to “user.” Additionally, you can choose to add a system prompt, such as “You are a C++ Developer,” to guide the model’s behavior.
messages = [
    {"role": "user", "content": "Write a Python program that prints prime numbers up to 100"}
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=8000, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
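One gotcha: when you pass a plain string prompt, the text-generation pipeline includes the prompt in generated_text by default. To print only the model’s completion, slice the prompt off (or pass return_full_text=False to the pipeline call):

# Keep only the newly generated tokens.
completion = outputs[0]["generated_text"][len(prompt):]
print(completion)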
I just copy-pasted the Python code generated by the model and got all the prime numbers as output.
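For reference, the generated solution looked broadly like this sieve-based sketch (illustrative, not the model’s verbatim output):

# Sieve of Eratosthenes: collects all primes up to n.
def primes_up_to(n):
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [i for i, flag in enumerate(is_prime) if flag]

print(primes_up_to(100))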
It’s worth noting that generation takes a while. Unfortunately, I couldn’t test the model with more prompts, as producing outputs in Colab is slow.
If you have a powerful GPU on your computer, you can try running OlympicCoder-7B in the LM Studio application. LM Studio is an application that lets you run LLMs locally on your machine. First, follow these steps to download LM Studio and start using the model:
1. Go to the LM Studio website: https://lmstudio.ai/
2. Download the application according to your operating system.
3. Search for OlympicCoder-7B and download the model locally. (4.68 GB)
Note: Due to hardware limitations on my machine, I won’t be running inference using LM Studio.
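For readers whose hardware can handle it, LM Studio exposes an OpenAI-compatible local server once a model is loaded (default port 1234). Here is a minimal sketch of querying it; the model identifier below is an assumption, so copy the exact one LM Studio shows for your download.

from openai import OpenAI

# Any non-empty api_key works for the local server.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
response = client.chat.completions.create(
    model="olympiccoder-7b",  # assumed identifier, use the one from LM Studio
    messages=[{"role": "user", "content": "Write a Python program that prints prime numbers up to 100"}],
)
print(response.choices[0].message.content)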
Hugging Face has shared several lessons from training OlympicCoder that could benefit the broader AI community.
These insights are valuable for anyone interested in building or fine-tuning code reasoning models.
Hugging Face has also been advancing the broader Open-R1 ecosystem with ongoing developments.
OlympicCoder-7B excels in practical scenarios ranging from competitive programming practice to algorithm education and well-reasoned code generation.
Working with OlympicCoder-7B was an insightful experience. Setting it up via Google Colab was straightforward, though inference speed was limited by hardware constraints. The model generated well-reasoned, accurate code, often accompanied by comments or explanations. The use of a chain of thought was visible in how the model tackled problem statements step by step. I found its ability to produce both functional code and logical breakdowns particularly helpful when working on algorithmic prompts.
I also explored its local deployment through LM Studio, though hardware limitations on my machine prevented full testing. Still, the experience affirmed that OlympicCoder is ready for local experimentation and integration into advanced workflows for those with the right hardware.
OlympicCoder-7B, as part of Hugging Face’s Open-R1 initiative, represents a major step toward open, powerful code reasoning models. Its strong showing on the IOI benchmark, robust dataset training using CoT strategies, and real-world applicability make it a valuable tool for developers, researchers, educators, and competitive programmers alike.
It bridges the gap between code generation and problem-solving, offering not just outputs, but insight. With further community support and continued updates, OlympicCoder has the potential to become a foundational model for code reasoning in the open-source AI ecosystem.
Q. What is the IOI benchmark?
A. The IOI benchmark measures a model’s ability to solve competitive programming problems, and is often used to evaluate reasoning and coding capabilities.
Q. What is Qwen?
A. Qwen is a series of large language models developed by Alibaba Cloud, including specialized versions for coding, mathematics, and other tasks.
Q. Which base model was OlympicCoder-32B fine-tuned from?
A. OlympicCoder-32B was fine-tuned from Qwen/Qwen2.5-Coder-32B-Instruct.
Q. What is CodeForces-CoTs?
A. It is the dataset used for training the OlympicCoder-7B model, comprising decontaminated Codeforces data with Chain-of-Thought (CoT) reasoning.