Mistral AI has launched its latest and most efficient small language model (SLM) – Mistral Small 3. It’s a 24-billion-parameter language model designed for high efficiency and low latency. The model aims to deliver robust performance across various AI tasks while maintaining rapid response times. Here’s all you need to know about Mistral Small 3 – its features, applications, how to access it, and how it compares with Qwen2.5, Llama-3.3, and more.
Mistral Small 3 is a latency-optimized language model that balances performance and efficiency. Despite its 24B parameter size, it competes with larger models like Llama 3.3 70B Instruct and Qwen2.5 32B Instruct, offering comparable capabilities with significantly reduced computational demands.
Small 3, launched as a base model, allows developers to train it further using reinforcement learning or reinforcement fine-tuning. It features a 32,000-token context window and generates responses at 150 tokens per second. This design makes it suitable for applications requiring swift and accurate language processing.
Mistral Small 3 has been evaluated across several key benchmarks to assess its performance in various domains. Let’s see how this new model performs against gpt-4o-mini, Llama 3.3 70B Instruct, Qwen2.5 32B Instruct, and Gemma 2 27B.
Also Read: Phi 4 vs GPT 4o-mini: Which is Better?
1. Massive Multitask Language Understanding (MMLU) Pro (5-shot)
MMLU-Pro is a more challenging variant of the MMLU benchmark, which evaluates a model’s proficiency across a wide range of subjects, including humanities, sciences, and mathematics. In the 5-shot setting, where the model is provided with five examples before being tested, Mistral Small 3 achieved an accuracy exceeding 81%. This performance is notable, especially considering that Mistral 7B Instruct, an earlier model, scored 60.1% in a similar 5-shot scenario.
2. Graduate-Level Google-Proof Q&A (GPQA) Main
GPQA assesses a model’s ability to answer difficult, graduate-level science questions that are designed to be hard to solve with a quick web search. Mistral Small 3 outperformed Qwen2.5-32B-Instruct, gpt-4o-mini, and Gemma-2 on GPQA, proving its strong capability in handling demanding question-answering tasks.
3. HumanEval
The HumanEval benchmark measures a model’s coding abilities by requiring it to generate correct code solutions for a given set of programming problems. Mistral Small 3’s performance on this test nearly matches that of Llama-3.3-70B-Instruct.
4. Math Instruct
Math Instruct evaluates a model’s proficiency in solving mathematical problems and following mathematical instructions. Despite its small size and latency-optimized design, Mistral Small 3 shows promising results on this test as well.
Mistral Small 3 demonstrated performance on par with larger models such as Llama 3.3 70B Instruct, while running more than three times faster on the same hardware. It outperformed most of the compared models, particularly in language understanding and reasoning tasks. These results establish Mistral Small 3 as a competitive entry in the landscape of AI language models.
Also Read: Qwen2.5-VL Vision Model: Features, Applications, and More
Mistral Small 3 is versatile and well-suited for a wide range of applications. Here are some real-life use cases of Mistral Small 3 across industries:
– Fraud detection in financial services.
– AI-driven patient triage in healthcare.
– On-device command and control in robotics, automotive, and manufacturing.
– Virtual customer service assistants for businesses.
– Sentiment and feedback analysis for brand reputation monitoring.
– Automated quality control in industrial applications.
Mistral Small 3 is available under the Apache 2.0 license, allowing developers to integrate and customize the model within their applications. As per official reports, the model can be downloaded from Mistral AI’s official website or accessed through the following platforms:
– Hugging Face
– Together AI
– Ollama
– Kaggle
– Fireworks AI
Here’s how you can access and utilize the Mistral-Small-24B model on Kaggle:
First, install the kagglehub package:
pip install kagglehub
Then run the following code to download the model and generate a sample response:
from transformers import AutoModelForCausalLM, AutoTokenizer
import kagglehub

# Download the model weights from Kaggle
model_name = kagglehub.model_download("mistral-ai/mistral-small-24b/transformers/mistral-small-24b-base-2501")

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # load weights in the checkpoint's native precision
    device_map="auto"     # automatically place layers on available devices
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to Mistral AI company"

# Tokenize the input
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate text
generation_output = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.7,  # Controls randomness (higher = more random)
    top_p=0.9,        # Nucleus sampling (higher = more diverse)
    do_sample=True    # Enables sampling
)

# Decode the generated output
generated_text = tokenizer.decode(generation_output[0], skip_special_tokens=True)
print("Generated Text (Base Model):")
print(generated_text)
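A note on the loading arguments: device_map="auto" lets the Accelerate library distribute the model across whatever GPUs (and, if needed, CPU memory) are available, while torch_dtype="auto" keeps the weights in the precision stored in the checkpoint. Since this is a 24B-parameter model, expect to need a high-memory GPU, or a quantized build, to run it comfortably.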
You can integrate the Small 3 model into your existing applications using Together AI’s OpenAI-compatible APIs. Additionally, Mistral AI offers deployment options via La Plateforme, providing market-leading availability, speed, and quality control.
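As a minimal sketch of what that integration could look like, here’s the openai Python client pointed at Together AI’s OpenAI-compatible endpoint. The base URL follows Together AI’s documented convention, and the model ID shown is an assumption – check Together AI’s model catalog for the exact name:

from openai import OpenAI

# Point the standard OpenAI client at Together AI's OpenAI-compatible endpoint
client = OpenAI(
    api_key="YOUR_TOGETHER_API_KEY",
    base_url="https://api.together.xyz/v1",
)

response = client.chat.completions.create(
    model="mistralai/Mistral-Small-24B-Instruct-2501",  # assumed model ID
    messages=[{"role": "user", "content": "Give me a short introduction to Mistral AI."}],
    max_tokens=100,
)

print(response.choices[0].message.content)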
Mistral AI also plans to launch it soon on NVIDIA NIM, Amazon SageMaker, Groq, Databricks, and Snowflake.
Now, let’s try out the capabilities of this new Mistral model. For this, we’ll run 3 different prompts on Mistral Small 3 in the Together AI Playground and see what responses we get.
Prompt: “Given a dataset containing customer transactions with columns ‘CustomerID’, ‘ProductID’, ‘PurchaseAmount’, and ‘PurchaseDate’, write a Python function to:
1. Calculate the total spend for each customer.
2. Find the customer who spent the most.
3. Determine the most popular product based on the total sales amount.
You can assume that the data is a Pandas DataFrame.”
Output:
import pandas as pd

def analyze_customer_transactions(df):
    # Calculate the total spend for each customer
    total_spend_per_customer = df.groupby('CustomerID')['PurchaseAmount'].sum().reset_index()
    total_spend_per_customer.columns = ['CustomerID', 'TotalSpend']

    # Find the customer who spent the most
    max_spender = total_spend_per_customer.loc[total_spend_per_customer['TotalSpend'].idxmax()]
    max_spender_customer_id = max_spender['CustomerID']
    max_spender_amount = max_spender['TotalSpend']

    # Determine the most popular product based on the total sales amount
    total_sales_per_product = df.groupby('ProductID')['PurchaseAmount'].sum().reset_index()
    total_sales_per_product.columns = ['ProductID', 'TotalSales']
    most_popular_product = total_sales_per_product.loc[total_sales_per_product['TotalSales'].idxmax()]
    most_popular_product_id = most_popular_product['ProductID']
    most_popular_product_sales = most_popular_product['TotalSales']

    return total_spend_per_customer, max_spender_customer_id, max_spender_amount, most_popular_product_id, most_popular_product_sales

# Example usage:
# df = pd.DataFrame({
#     'CustomerID': [1, 2, 1, 3, 2, 1],
#     'ProductID': [101, 102, 101, 103, 102, 101],
#     'PurchaseAmount': [100, 150, 200, 50, 250, 300],
#     'PurchaseDate': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06']
# })
# total
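The model’s example usage is cut off at the end of its response, but the function itself runs as-is. Here’s a quick way to exercise it on the sample data from the commented-out example:

# Recreate the sample DataFrame from the model's commented-out example
df = pd.DataFrame({
    'CustomerID': [1, 2, 1, 3, 2, 1],
    'ProductID': [101, 102, 101, 103, 102, 101],
    'PurchaseAmount': [100, 150, 200, 50, 250, 300],
    'PurchaseDate': ['2023-01-01', '2023-01-02', '2023-01-03',
                     '2023-01-04', '2023-01-05', '2023-01-06']
})

totals, top_customer, top_amount, top_product, top_sales = analyze_customer_transactions(df)
print(totals)                     # total spend per customer
print(top_customer, top_amount)  # customer 1, who spent 600
print(top_product, top_sales)    # product 101, with 600 in total sales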
Prompt: “Solve this problem step by step: A tank can be filled by two pipes in 10 minutes and 15 minutes, respectively. If both pipes are opened together, how long will it take to fill the tank?”
Output:
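For reference, here’s how the answer works out, so you can check the model’s response against it: the first pipe fills 1/10 of the tank per minute and the second fills 1/15, so together they fill 1/10 + 1/15 = 1/6 of the tank per minute – meaning the tank fills in 6 minutes. A quick sanity check in Python:

from fractions import Fraction

# Combined fill rate of both pipes, in tanks per minute
rate = Fraction(1, 10) + Fraction(1, 15)
print(1 / rate)  # 6 -> the full tank takes 6 minutes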
Prompt: “Analyze the sentiment of the following customer review:
‘I was really excited about this product, but it broke within two days. Customer service was unhelpful, and I regret my purchase.'”
Output:
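Since Small 3 supports JSON output (see the features list in the FAQs below), you can also request the sentiment in machine-readable form. A minimal sketch using the same Together AI setup as before – the model ID is again an assumption, and a production version would need error handling in case the model wraps its JSON in extra text:

import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_TOGETHER_API_KEY", base_url="https://api.together.xyz/v1")

review = ("I was really excited about this product, but it broke within two days. "
          "Customer service was unhelpful, and I regret my purchase.")

response = client.chat.completions.create(
    model="mistralai/Mistral-Small-24B-Instruct-2501",  # assumed model ID
    messages=[{
        "role": "user",
        "content": 'Analyze the sentiment of this review. Reply with JSON only, in the form '
                   '{"sentiment": "...", "reasons": ["..."]}. Review: ' + review,
    }],
)

result = json.loads(response.choices[0].message.content)
print(result["sentiment"])  # expected: "negative"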
Mistral Small 3 represents a significant advancement in AI model development, offering a blend of efficiency, speed, and performance. Its small size and low latency make it suitable for deployment on devices with limited computational resources, such as a single RTX 4090 GPU or a MacBook with 32GB RAM. Moreover, its open-source availability under the Apache 2.0 license encourages widespread adoption and customization. On the whole, Mistral Small 3 seems to be a valuable tool for developers and organizations aiming to implement high-performance AI solutions with reduced computational overhead.
Q. What is Mistral Small 3?
A. Mistral Small 3 is a 24-billion-parameter language model optimized for low-latency, high-efficiency AI tasks.
Q. How does Mistral Small 3 compare with larger models?
A. Mistral Small 3 competes with larger models like Llama 3.3 70B Instruct and Qwen2.5 32B Instruct, offering similar performance but with significantly lower computational requirements.
Q. How can I access Mistral Small 3?
A. You can access Mistral Small 3 through:
– Mistral AI’s official website (for downloading the model).
– Platforms like Hugging Face, Together AI, Ollama, Kaggle, and Fireworks AI (for cloud-based usage).
– La Plateforme by Mistral AI for enterprise-grade deployment.
– APIs from Together AI and other providers for seamless integration.
Q. What are the key features of Mistral Small 3?
A. Here are the key features of Mistral Small 3:
– 32,000-token context window for handling long conversations.
– 150 tokens per second processing speed.
– Multilingual support (English, French, Spanish, German, Chinese, etc.).
– Function calling and JSON output support for structured AI applications.
– Optimized for low-latency inference on consumer GPUs.
Q. What are some real-life use cases of Mistral Small 3?
A. Here are some real-life use cases of Mistral Small 3:
– Fraud detection in financial services.
– AI-driven patient triage in healthcare.
– On-device command and control in robotics, automotive, and manufacturing.
– Virtual customer service assistants for businesses.
– Sentiment and feedback analysis for brand reputation monitoring.
– Automated quality control in industrial applications.
Q. Can Mistral Small 3 be fine-tuned and used commercially?
A. Yes, Small 3 can be fine-tuned using reinforcement learning or reinforcement fine-tuning to adapt it for specific industries or tasks. It is released under the Apache 2.0 license, allowing free usage, modification, and commercial applications without major restrictions.