Last year, the DeepSeek LLM made waves with its impressive 67 billion parameters, trained on an expansive dataset of 2 trillion tokens of English and Chinese text. By open-sourcing both its 7B/67B Base and Chat models, DeepSeek set a new benchmark for research collaboration and galvanized the AI community. Now, what if I told you there is an open-source AI with 685 billion parameters that outperforms almost every model in the AI space? Sounds intriguing, right? DeepSeek is taking a giant leap forward with the release of DeepSeek V3, pushing the boundaries of AI innovation even further. It is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.
The most impressive part: it achieved remarkable cost efficiency, with a reported training cost of just $5.5M!
Released under a permissive license, DeepSeek V3 empowers developers to download, modify, and integrate the model into diverse applications, including commercial ones. Its versatility spans a range of text-based tasks such as coding, translating, and generating essays or emails from descriptive prompts, making it a powerful tool for developers and businesses alike.
DeepSeek V3 also outperforms both openly available and closed AI models in several key domains. In competitive programming on Codeforces, it outshines rivals including Meta’s Llama 3.1 405B, OpenAI’s GPT-4o, and Alibaba’s Qwen 2.5 72B. The model also excels in the Aider Polyglot benchmark (2nd spot on the leaderboard), demonstrating a strong ability to generate new code that integrates seamlessly with existing projects.
The biggest leap forward yet:
Did you know that with 685 billion parameters (671B of main-model weights plus 14B of Multi-Token Prediction (MTP) module weights), DeepSeek V3 could probably remember how many beers you chugged in 2017? Impressive, right? According to its creators, training DeepSeek V3 cost about $5.5 million. Compare that with OpenAI: its CEO, Sam Altman, mentioned that training GPT-4 cost over $100 million. This stark contrast highlights DeepSeek V3’s remarkable cost efficiency, achieving cutting-edge performance at a fraction of the expense and making it a game-changer in the AI landscape.
DeepSeek-V3 also looks to be the stronger model at only 2.8M GPU-hours (~11x less compute) compared to Llama 3 405B, which used 30.8M GPU-hours.
DeepSeek (Chinese AI co) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M).
— Andrej Karpathy (@karpathy) December 26, 2024
For reference, this level of capability is supposed to require clusters of closer to 16K GPUs, the ones being… https://t.co/EW7q2pQ94B
DeepSeek V3 represents a monumental leap in AI architecture and training efficiency, pushing the boundaries of large-scale language models. This open-source model not only delivers state-of-the-art performance but does so with remarkable efficiency and scalability. Here’s what makes DeepSeek V3 a standout innovation:
DeepSeek V3 builds on proven frameworks from its predecessor, DeepSeek V2, adopting the Multi-Head Latent Attention (MLA) and the cutting-edge DeepSeekMoE architecture. These innovations ensure efficient inference and cost-effective training. Moreover, DeepSeek V3 employs an auxiliary-loss-free load-balancing strategy, eliminating the typical performance trade-offs associated with load-balancing mechanisms.
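To build intuition for the MoE idea, here is a minimal top-k routing sketch in NumPy. All names and shapes here are invented for illustration; DeepSeek's actual implementation additionally uses MLA and the auxiliary-loss-free balancing described above:

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token through its top-k experts and mix their outputs.

    x: (d,) token hidden state; gate_w: (d, n_experts) router weights;
    experts: list of (d, d) matrices standing in for expert FFNs.
    """
    logits = x @ gate_w                      # router score per expert
    topk = np.argsort(logits)[-k:]           # pick the k highest-scoring experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()                 # softmax over the selected experts
    # Only the chosen experts run, so compute scales with k, not n_experts
    return sum(w * (experts[i] @ x) for w, i in zip(weights, topk))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, experts, k=2)
print(y.shape)  # (8,)
```

Because only the top-k experts run for each token, compute grows with k rather than with the total number of experts — which is how a 671B-parameter model can activate only 37B parameters per token.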
The model also integrates a Multi-Token Prediction (MTP) objective, enhancing its ability to predict multiple tokens simultaneously. This not only boosts performance but also enables speculative decoding, significantly accelerating inference speeds.
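As a rough sketch of the speculative-decoding idea (not DeepSeek's actual algorithm), a cheap draft model proposes a few tokens ahead and the full model verifies them, keeping the longest agreeing prefix; `draft_next` and `target_next` below are toy stand-ins for real models:

```python
def speculative_decode(draft_next, target_next, prefix, n_draft=4):
    """Toy greedy speculative decoding: a cheap draft model proposes tokens,
    the target model checks them and keeps the agreed prefix."""
    # Draft model proposes n_draft tokens autoregressively
    proposed, ctx = [], list(prefix)
    for _ in range(n_draft):
        t = draft_next(ctx)
        proposed.append(t)
        ctx.append(t)
    # Target model verifies each proposed token in order
    accepted, ctx = [], list(prefix)
    for t in proposed:
        if target_next(ctx) == t:
            accepted.append(t)
            ctx.append(t)
        else:
            break
    # On a mismatch, emit the target's own token so decoding still advances
    if len(accepted) < n_draft:
        accepted.append(target_next(ctx))
    return accepted

# Toy "models": target counts up by 1; draft agrees until 3, then guesses 99
target = lambda ctx: ctx[-1] + 1
draft = lambda ctx: ctx[-1] + 1 if ctx[-1] < 3 else 99
print(speculative_decode(draft, target, [1], n_draft=4))  # [2, 3, 4]
```

When the draft agrees with the target, several tokens are accepted per verification pass, which is where the inference speed-up comes from.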
DeepSeek V3 is pre-trained on an expansive dataset of 14.8 trillion diverse, high-quality tokens (for context, 1 million tokens is around 750,000 words), a scale that far surpasses its predecessors. This pre-training uses an FP8 mixed-precision training framework, marking the first validated application of FP8 training at such an ultra-large scale.
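For intuition on what FP8 trades away, here is a toy simulation of E4M3-style rounding (3 mantissa bits, max normal value ~448) in NumPy; it only illustrates the precision loss of the number format, not DeepSeek's actual training kernels:

```python
import numpy as np

def fake_fp8_e4m3(x):
    """Simulate FP8 (E4M3-style) rounding: keep ~3 mantissa bits and clamp
    to the format's max normal value. Real FP8 happens in hardware; this
    sketch only shows the rounding error involved."""
    max_e4m3 = 448.0
    scale = np.max(np.abs(x)) / max_e4m3        # per-tensor scaling factor
    scaled = x / scale
    # Snap each value to the nearest representable step for its exponent
    exp = np.floor(np.log2(np.abs(scaled) + 1e-30))
    step = 2.0 ** (exp - 3)                     # spacing of 3-mantissa-bit values
    quantized = np.round(scaled / step) * step
    return np.clip(quantized, -max_e4m3, max_e4m3) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)
w8 = fake_fp8_e4m3(w)
rel_err = np.abs(w8 - w).mean() / np.abs(w).mean()
print(f"mean relative error: {rel_err:.4f}")
```

The error stays at the low-percent level per tensor, which is why careful scaling makes FP8 viable, and why getting it stable across 14.8T tokens is the hard part.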
DeepSeek V3 integrates an innovative knowledge distillation pipeline, leveraging reasoning capabilities from DeepSeek R1 series models. This pipeline incorporates advanced verification and reflection patterns into the model, dramatically improving its reasoning performance. Additionally, the output style and length are meticulously controlled to ensure versatility and consistency across tasks.
Extensive evaluations confirm that DeepSeek V3 outperforms all open-source models and rivals leading closed-source AI systems. Despite its massive scale and complexity, the training process was exceptionally stable, with no irrecoverable loss spikes or rollbacks throughout the entire cycle.
DeepSeek V3 is a testament to the power of innovation and collaboration, offering developers and researchers a powerful, scalable, and cost-effective tool to tackle a wide range of challenges in AI and beyond. Its open-source nature ensures accessibility, paving the way for breakthroughs in coding, reasoning, and multi-modal applications.
Here are the links to download:
| Model | Total Parameters | Context Length | Download |
|---|---|---|---|
| DeepSeek-V3-Base | 671B | 128K | HuggingFace |
| DeepSeek-V3 | 671B | 128K | HuggingFace |
This evaluation highlights DeepSeek-V3’s superior capabilities in handling complex reasoning, advanced mathematics, and competitive programming tasks.
Also here is the Open Ended Generation Evaluation:
| Model | Arena-Hard | AlpacaEval 2.0 |
|---|---|---|
| DeepSeek-V2.5-0905 | 76.2 | 50.5 |
| Qwen2.5-72B-Instruct | 81.2 | 49.1 |
| LLaMA-3.1 405B | 69.3 | 40.5 |
| GPT-4o-0513 | 80.4 | 51.1 |
| Claude-Sonnet-3.5-1022 | 85.2 | 52.0 |
| DeepSeek-V3 | 85.5 | 70.0 |
You can also refer to this to understand the evaluation better:
Link to the DeepSeek V3 GitHub
Here are the Aider Polyglot Benchmark results, which evaluate models on their ability to complete tasks correctly. The evaluation is divided into two output formats: diff (emitting edits as diffs) and whole (rewriting the entire file).
This ultimately reflects the versatility and specialized strengths of different AI systems in completing benchmark tasks.
If you prefer not to use the chat UI and want to directly work with the model, there’s an alternative for you. The model, DeepSeek-V3, has all its weights released on Hugging Face. You can access the SafeTensor files there.
Model Size and Hardware Requirements:
Firstly, the model is massive, with 671 billion parameters, making it challenging to run on standard consumer-grade hardware. If your hardware isn’t powerful enough, it’s recommended to use the DeepSeek platform for direct access, or to wait for a Hugging Face Space if one becomes available.
If you have sufficient hardware, you can run the model locally using DeepSeek-Infer Demo, SGLang, LMDeploy, TensorRT-LLM, or vLLM, with support for AMD GPUs and Huawei Ascend NPUs.
Convert the model to a quantized version to reduce memory requirements, which is particularly helpful for lower-end systems.
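A quick back-of-envelope calculation shows why precision matters so much here: weight memory scales linearly with bytes per parameter (this ignores activations, KV cache, and framework overhead):

```python
# Rough weight-memory estimate for DeepSeek V3's 671B parameters.
PARAMS_TOTAL = 671e9
BYTES_PER_PARAM = {"FP8": 1, "BF16": 2, "FP32": 4}

for fmt, b in BYTES_PER_PARAM.items():
    gb = PARAMS_TOTAL * b / 1e9
    print(f"{fmt}: ~{gb:.0f} GB just for the weights")
```

At FP8 the weights alone are ~671 GB; at BF16 that doubles to ~1.3 TB, which is why the FP8-to-BF16 conversion below is only worthwhile on very large multi-GPU setups.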
Here’s how you can convert FP8 weights to BF16:
cd inference
python fp8_cast_bf16.py --input-fp8-hf-path /path/to/fp8_weights --output-bf16-hf-path /path/to/bf16_weights
Hugging Face’s transformers library does not directly support the model yet. To set it up, you’ll need to:
Clone the DeepSeek AI GitHub repository:
git clone https://github.com/deepseek-ai/DeepSeek-V3.git
Install the required dependencies:
cd DeepSeek-V3/inference
pip install -r requirements.txt
Download the Hugging Face checkpoints and run the model locally.
Refer to this: DeepSeek V3
Recommendation:
You can also use the llm-deepseek plugin to access the model:
# Step 1: Install the llm package
!pip install llm
# Step 2: Install the DeepSeek plugin
!llm install llm-deepseek
# Step 3: Set the DeepSeek API key
!llm keys set deepseek
# Paste your API key when prompted
# Step 4: Run the DeepSeek Chat Model
!llm -m deepseek-chat 'five great names for a pet ocelot'
# Step 5: Run the DeepSeek Coder Model
!llm -m deepseek-coder 'how to reverse a linked list in python'
On the DeepSeek platform, you can explore the web and engage in deep reasoning to solve complex problems, similar to OpenAI’s approach.
Here are some prompts and outputs:
Prompt: Generate an SVG of a kid and a football
Here’s a simple SVG representation of a kid playing with a football. You can
copy and paste this code into an HTML file or an SVG viewer to see the
image:
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 200" width="200"
height="200">
<!-- Kid's head -->
<circle cx="100" cy="70" r="20" fill="#FFCC99" />
<!-- Kid's body -->
<rect x="90" y="90" width="20" height="40" fill="#4CAF50" />
<!-- Kid's arms -->
<line x1="80" y1="100" x2="60" y2="110" stroke="#4CAF50" stroke-width="5" />
<line x1="120" y1="100" x2="140" y2="110" stroke="#4CAF50" stroke-width="5" />
<!-- Kid's legs -->
<line x1="95" y1="130" x2="85" y2="160" stroke="#4CAF50" stroke-width="5" />
<line x1="105" y1="130" x2="115" y2="160" stroke="#4CAF50" stroke-width="5" />
<!-- Football -->
<ellipse cx="140" cy="140" rx="10" ry="5" fill="#8B4513" />
<line x1="135" y1="140" x2="145" y2="140" stroke="#FFFFFF" stroke-width="2" />
<line x1="140" y1="137" x2="140" y2="143" stroke="#FFFFFF" stroke-width="2" />
</svg>
Explanation:
The kid is represented with simple shapes: a circle for the head, a rectangle
for the body, and lines for the arms and legs.
The football is an ellipse with two white lines to mimic the classic football
design.
You can adjust the colours, sizes, and positions to customize the image
further.
Let me know if you'd like to enhance or modify this!
Prompt: Create the classic snake game
Creating the classic Snake game involves using HTML, CSS, and JavaScript. Below is a simple implementation:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Snake Game</title>
<style>
body {
display: flex;
justify-content: center;
align-items: center;
height: 100vh;
margin: 0;
background-color: #000;
}
canvas {
border: 1px solid #fff;
}
</style>
</head>
<body>
<canvas id="gameCanvas" width="400" height="400"></canvas>
<script src="snake.js"></script>
</body>
</html>
JavaScript (snake.js)
const canvas = document.getElementById("gameCanvas");
const ctx = canvas.getContext("2d");
const gridSize = 20;
const tileCount = canvas.width / gridSize;
let snake = [{ x: 10, y: 10 }];
let direction = { x: 0, y: 0 };
let food = { x: Math.floor(Math.random() * tileCount), y:
Math.floor(Math.random() * tileCount) };
let score = 0;
function gameLoop() {
update();
draw();
setTimeout(gameLoop, 100);
}
function update() {
const head = { x: snake[0].x + direction.x, y: snake[0].y + direction.y };
// Check for collision with walls or itself
if (head.x < 0 || head.x >= tileCount || head.y < 0 || head.y >= tileCount
|| snake.some(segment => segment.x === head.x && segment.y === head.y)) {
resetGame();
return;
}
snake.unshift(head);
// Check if snake eats the food
if (head.x === food.x && head.y === food.y) {
score++;
food = { x: Math.floor(Math.random() * tileCount), y:
Math.floor(Math.random() * tileCount) };
} else {
snake.pop();
}
}
function draw() {
// Clear the canvas
ctx.fillStyle = "#000";
ctx.fillRect(0, 0, canvas.width, canvas.height);
// Draw the snake
ctx.fillStyle = "#0F0";
snake.forEach(segment => ctx.fillRect(segment.x * gridSize, segment.y *
gridSize, gridSize, gridSize));
// Draw the food
ctx.fillStyle = "#F00";
ctx.fillRect(food.x * gridSize, food.y * gridSize, gridSize, gridSize);
// Draw the score
ctx.fillStyle = "#FFF";
ctx.font = "20px Arial";
ctx.fillText("Score: " + score, 10, 30);
}
function resetGame() {
snake = [{ x: 10, y: 10 }];
direction = { x: 0, y: 0 };
score = 0;
food = { x: Math.floor(Math.random() * tileCount), y:
Math.floor(Math.random() * tileCount) };
}
window.addEventListener("keydown", e => {
switch (e.key) {
case "ArrowUp":
if (direction.y === 0) direction = { x: 0, y: -1 };
break;
case "ArrowDown":
if (direction.y === 0) direction = { x: 0, y: 1 };
break;
case "ArrowLeft":
if (direction.x === 0) direction = { x: -1, y: 0 };
break;
case "ArrowRight":
if (direction.x === 0) direction = { x: 1, y: 0 };
break;
}
});
gameLoop();
How It Works:
Canvas Setup: The game is drawn on a <canvas> element.
Snake Movement: The snake is represented as an array of segments. The head
moves based on the direction, and the tail follows.
Food: A random position is generated for the food. When the snake eats the
food, it grows, and the score increases.
Collision Detection: The game checks if the snake collides with the walls or
itself, and resets if it does.
Controls: Arrow keys are used to change the snake's direction.
How to Run:
Save the HTML and JavaScript code into files (index.html and snake.js).
Open the index.html file in a browser.
Use the arrow keys to control the snake.
Let me know if you'd like to add more features or improve the game!
Prompt: How to clone the git repository?
First, it thinks for about 6 seconds and then gives the output:
Cloning a Git repository involves creating a copy of an existing repository
from a remote location (such as GitHub, GitLab, or Bitbucket) to your local
machine. This copy includes all the project files, commit history, and
branches.
Here are the steps to clone a Git repository:
1. Install Git
Ensure that Git is installed on your machine. If it's not installed, download
and install it from the official Git website.
2. Open Terminal or Command Prompt
Open your terminal (Linux/Mac) or command prompt (Windows).
3. Navigate to the Desired Directory
Use the cd command to navigate to the directory where you want to clone the
repository.
cd /path/to/directory
4. Clone the Repository
Use the git clone command followed by the URL of the repository.
git clone https://github.com/username/repository.git
Replace https://github.com/username/repository.git with the actual URL of the
repository you want to clone.
5. Enter the Cloned Directory
After cloning, navigate into the repository directory.
cd repository
6. Verify the Clone
You can verify that the repository was cloned correctly by checking the
remote URL.
git remote -v
This should display the URL of the remote repository.
Example
# Navigate to the desired directory
cd ~/Documents
# Clone the repository
git clone https://github.com/username/repository.git
# Enter the repository directory
cd repository
# Verify the remote URL
git remote -v
Optional: Cloning Using SSH
If you have set up SSH keys, you can clone the repository using an SSH URL:
git clone [email protected]:username/repository.git
Optional: Shallow Clone
If you only need a part of the history (e.g., the latest commit), you can
perform a shallow clone:
git clone --depth 1 https://github.com/username/repository.git
This is useful for large repositories where you don't need the full history.
Conclusion
You have now successfully cloned a Git repository to your local machine. You
can start working on the project, make changes, and commit them as needed.
All the outputs are up to the mark, and the output speed was quite impressive. Additionally, DeepSeek allows you to ask reasoning questions to the model, making it a versatile and efficient tool for complex problem-solving and in-depth analysis.
We will give you a full-fledged comparison of this model in the upcoming articles!
DeepSeek V3 stands as a monumental achievement in the evolution of large-scale AI models, combining unprecedented scale with unmatched efficiency. With its innovative architecture, cost-effective training, and impressive 685 billion parameters, DeepSeek V3 redefines what’s possible in the AI space. The model’s ability to excel in diverse benchmarks, outperforming both open-source and closed-source competitors, highlights its extraordinary capabilities.
Not only does DeepSeek V3 deliver state-of-the-art performance in tasks like coding, reasoning, and mathematical problem-solving, but it also democratizes access to cutting-edge AI with its open-source availability. Developers, researchers, and businesses alike can leverage its immense power, supported by a permissive license that fosters innovation and collaboration.
By achieving exceptional results with a training cost of just $5.5 million, DeepSeek V3 proves that scalability and efficiency can coexist, setting a new standard for the future of AI development. This release marks a significant leap forward, not just for DeepSeek, but for the entire AI community, paving the way for breakthroughs in machine learning, natural language processing, and beyond.