How Did DeepSeek Train AI 30 Times Cheaper?

Himanshi Singh | Last Updated: 28 Jan, 2025
5 min read

DeepSeek is everywhere right now – on Twitter, LinkedIn, and in conversations across the AI world. People can’t stop talking about how this company managed to do the “impossible.” While AI training is usually expensive and resource-hungry, DeepSeek found a way to train its models at just 1/30th the usual cost. These days, everything claims to be “state-of-the-art,” but DeepSeek is proving that being the “best” isn’t enough anymore. It’s about pushing boundaries and achieving what others thought was impossible.

What’s adding to the hype? The DeepSeek app has gone viral. It’s not just performing well – it’s sitting at the top of app store charts, beating even big names like ChatGPT. This viral image has been circulating all over the internet:

[Image: DeepSeek app at the top of the App Store charts]
Source: Apple App Store

So, how did DeepSeek pull this off? Let’s break down their secret in the simplest way possible.

1. No Fancy Chips, Just Smart Optimizations

Many assumed that export restrictions from the US on advanced AI chips would limit DeepSeek’s capabilities. However, they proved that great software can compensate for hardware limitations. Instead of relying on the latest high-end GPUs like the NVIDIA H100, they optimized the hardware they had—likely the NVIDIA H800, which has lower chip-to-chip bandwidth.

DeepSeek engineers focused on low-level code optimizations to make memory usage as efficient as possible. Their improvements ensured that performance was not hindered by chip limitations. In essence, they maximized what they had instead of waiting for better hardware.
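
The article doesn’t spell out which low-level tricks DeepSeek used, but here is a minimal sketch of one generic memory-saving optimization of this kind: running the forward pass in lower precision (bfloat16) so activations take roughly half the memory of float32. This is an illustrative example only, not DeepSeek’s actual training code.

```python
import torch

# Illustrative only: one generic way to stretch limited GPU memory is to
# run the forward pass in bfloat16 while keeping master weights in float32.

model = torch.nn.Linear(4096, 4096)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
data = torch.randn(8, 4096)
target = torch.randn(8, 4096)

# autocast computes activations in bfloat16 (half the memory of float32),
# while the model's parameters stay in full precision.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    output = model(data)

# Compute the loss in float32 for numerical stability, then update.
loss = torch.nn.functional.mse_loss(output.float(), target)
loss.backward()
optimizer.step()
```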

Key takeaway: They didn’t bypass restrictions; they simply made their existing resources work smarter.

In short: No need for expensive hardware—just efficient software.

2. Training Only the Important Parts

Training AI models usually involves updating everything, even parts that don’t contribute much. This leads to a massive waste of resources. DeepSeek tackled this problem head-on by training only the necessary parts of the model.

DeepSeek’s models use a Mixture-of-Experts design, so each token activates only a small set of “experts” rather than the whole network. A technique called Auxiliary-Loss-Free Load Balancing keeps the work spread evenly across those experts: instead of depending on additional loss functions to balance the workload, a bias term dynamically steers tokens toward the right (and less busy) parts of the model.

How Does It Work?

  • Each token (piece of text) is sent to a small set of experts, instead of engaging the entire model.
  • The system monitors workload and adjusts the bias term to prevent some experts from being overloaded while others remain underutilized.
  • This dynamic adjustment allows for efficient resource usage without extra computational overhead.
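
Here is a minimal sketch of this idea, assuming a hypothetical router with 8 experts and a simple bias-update rule; the names, sizes, and update rule are illustrative, not DeepSeek’s actual implementation.

```python
import torch

# A minimal sketch of auxiliary-loss-free load balancing for a
# Mixture-of-Experts router. Expert count, token count, and the
# bias-update rule are illustrative assumptions.

num_experts = 8        # hypothetical number of experts
top_k = 2              # experts activated per token
update_speed = 0.001   # how fast the bias reacts to imbalance

# Token-to-expert affinity scores would come from the learned router;
# random numbers stand in for them here.
scores = torch.rand(1024, num_experts)          # (tokens, experts)

# One bias per expert. It only influences which experts are picked,
# not how their outputs are weighted, so no extra loss term is needed.
bias = torch.zeros(num_experts)

# Route each token to the k experts with the highest biased score.
chosen = torch.topk(scores + bias, k=top_k, dim=-1).indices

# Count how many tokens each expert received in this step.
load = torch.bincount(chosen.flatten(), minlength=num_experts).float()

# Push the bias down for overloaded experts and up for underloaded ones,
# steering future tokens toward the idle experts.
bias -= update_speed * torch.sign(load - load.mean())
```

The key design choice is that the bias only affects expert selection, not the gating weights, which is why no auxiliary balancing loss is required.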

Results

  • Only about 5% of the model’s parameters are activated and updated for each token.
  • This translates into roughly 95% less GPU time than comparable models from companies like Meta.
  • Faster training at significantly lower costs, without losing accuracy.

In short: Train only what’s needed, save big on costs.

3. Faster and Cheaper AI with Compression

Running AI models for inference (generating outputs) is memory-intensive and costly. DeepSeek overcame this by using an innovative technique called Low-Rank Key-Value (KV) Joint Compression.

The KV cache stores the key-value pairs used by the attention mechanism, but keeping them at full size takes up a lot of memory. DeepSeek found a way to compress these key-value pairs efficiently, reducing storage without sacrificing performance.

How Does It Work?

  • The model compresses key and value vectors using a down-projection matrix, reducing their size while preserving essential information.
  • During inference, only the compressed version is stored, significantly reducing memory requirements.
  • When needed, the compressed data is expanded back with minimal loss of accuracy.
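
A minimal sketch of the idea follows, with made-up dimensions and random projection matrices; in the real model the projections are learned during training and the sizes are DeepSeek’s, not shown here.

```python
import torch

# A minimal sketch of low-rank key-value compression: project the hidden
# state down to a small latent, cache only the latent, and project back
# up to full-size keys and values when attention needs them.

d_model, d_latent = 4096, 512                    # latent is much smaller

w_down = torch.randn(d_model, d_latent) * 0.02   # shared down-projection
w_up_k = torch.randn(d_latent, d_model) * 0.02   # up-projection for keys
w_up_v = torch.randn(d_latent, d_model) * 0.02   # up-projection for values

hidden = torch.randn(1, 128, d_model)            # (batch, tokens, d_model)

# Compress once: this small latent is all that goes into the KV cache.
kv_latent = hidden @ w_down                      # (1, 128, d_latent)

# At attention time, keys and values are reconstructed from the latent.
keys = kv_latent @ w_up_k                        # (1, 128, d_model)
values = kv_latent @ w_up_v                      # (1, 128, d_model)

# The saving: cache d_latent numbers per token instead of 2 * d_model.
print(f"cached per token: {d_latent} vs {2 * d_model} uncompressed")
```

Caching a few hundred numbers per token instead of two full-width vectors is where the memory saving comes from.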

Benefits

  • Lower memory usage: DeepSeek stores a much smaller amount of data without losing performance.
  • Faster inference: Less data to process means quicker responses.
  • Reduced costs: Less hardware is required to run the model efficiently.

In short: Smaller memory, faster results, lower costs.

4. Smarter Learning with Reinforcement Learning

DeepSeek also improved model learning efficiency through reinforcement learning. Instead of relying solely on traditional training methods, they focused on tasks that have clear, verifiable answers, such as math and coding problems.

How Does It Work?

  • The AI is given tasks that are hard to solve but easy to verify (e.g., coding challenges or math problems).
  • If the model produces the correct result, it is rewarded and learns to reinforce those patterns.
  • If it makes mistakes, adjustments are made to improve performance in future iterations.
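
A minimal sketch of what such verifiable rewards could look like is below; the function names, reward values, and the `solve` convention are illustrative assumptions, not DeepSeek’s actual reward setup.

```python
# Rule-based rewards for verifiable tasks: the answer is checked directly,
# so no human labeling is needed to score the model's output.

def math_reward(model_answer: str, correct_answer: str) -> float:
    """1.0 if the model's final answer matches exactly, else 0.0."""
    return 1.0 if model_answer.strip() == correct_answer.strip() else 0.0

def code_reward(program: str, test_cases: list[tuple[int, int]]) -> float:
    """Fraction of test cases passed by a generated program defining `solve`."""
    namespace = {}
    try:
        exec(program, namespace)          # run the generated code
    except Exception:
        return 0.0                        # broken code earns no reward
    passed = 0
    for test_input, expected in test_cases:
        try:
            if namespace["solve"](test_input) == expected:
                passed += 1
        except Exception:
            pass                          # a crashing test case earns nothing
    return passed / len(test_cases)

# Example: the model is rewarded only when its program gives correct outputs.
program = "def solve(x):\n    return x * 2"
print(code_reward(program, [(3, 6), (10, 20)]))   # -> 1.0
print(math_reward("42", "42"))                    # -> 1.0
```

Because the reward comes from checking the answer itself, feedback is immediate and measurable, which is what makes this kind of training cheap to scale.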

This method allowed DeepSeek to improve accuracy with fewer resources by focusing only on challenges that provided immediate, measurable feedback.

In short: Smarter training through trial and error.

Also Read: How is DeepSeek Making Money?

Why is DeepSeek a Big Deal?

DeepSeek’s success comes down to four powerful yet straightforward ideas:

  • Training only what matters: Focusing on the most important parts of the model to reduce computation.
  • Smart memory compression: Using less storage without losing performance.
  • Efficient hardware use: Getting the most out of available resources instead of relying on cutting-edge chips.
  • Learning from verifiable feedback: Reinforcing the model on tasks whose answers can be checked automatically.

These strategies didn’t just cut costs—they gave DeepSeek the ability to test, experiment, and innovate faster than their competitors.

What makes their story so compelling is that it’s not about having unlimited resources. It’s about making the best use of what’s available. DeepSeek has proven that groundbreaking AI doesn’t have to come with an outrageous price tag. Their approach is a blueprint for how companies can think smarter, not harder, when it comes to AI. By focusing on efficiency, they’ve opened the door for others to rethink how AI models are trained and deployed.

As AI continues to evolve, DeepSeek has demonstrated that efficiency isn’t just important—it’s the real game-changer.

Check out our detailed articles on how DeepSeek works and how it compares with similar models.

Stay tuned to Analytics Vidhya Blog for more such awesome content!


