Are 100K+ GPUs for Grok 3 worth it?

Anu Madan Last Updated : 21 Feb, 2025
9 min read

With 3.3M+ people watching the launch, Elon Musk and his team introduced the world to “Grok 3”, the most capable and powerful model by x.AI to date. The company, which started in 2023 and shipped its previous model (Grok 2) in 2024, is now challenging models from top companies like OpenAI, Google, and Meta that have been in the AI race for the last 5-7 years. All thanks to over 100K NVIDIA H100 GPUs! But DeepSeek, which also started in 2023, achieved o3-mini-level capabilities with just a fraction of the GPUs that Grok 3 used. In this blog, we will explore whether Grok 3 was worth utilizing 100K+ NVIDIA H100 GPUs.

What is the NVIDIA H100 GPU?

The NVIDIA H100 GPU is a high-performance processor built for AI training, inference, and high-performance computing (HPC). As the successor to the A100, it delivers faster processing, better efficiency, and improved scalability, making it a critical tool for modern AI applications. AI companies and research institutions, including OpenAI, Google, Meta, Tesla, and AWS, rely on the H100 for developing cutting-edge AI solutions.

Also Read: Intel’s Gaudi 3: Setting New Standards with 40% Faster AI Acceleration than Nvidia H100

Why Do AI Companies Need It?

There are several reasons why major tech and AI companies around the world are investing in the NVIDIA H100 Chips:

  1. AI Training & Inference: The H100 is behind many advanced AI models like GPT-4, Grok 3, and Gemini, as it minimizes training time and improves inference performance.
  2. High-Speed Processing: Equipped with 80GB of HBM3 memory and a 3 TB/s bandwidth, along with NVLink (900 GB/s), the H100 ensures rapid data movement and seamless multi-GPU operations.
  3. Optimized for AI: Featuring FP8 & TF32 precision with its Transformer Engine, it accelerates deep learning tasks while maintaining efficiency and accuracy.
  4. Cloud & HPC Applications: Widely used by cloud providers such as AWS, Google Cloud, and Microsoft Azure, the H100 supports large-scale AI workloads and enterprise applications.
  5. Cost & Energy Efficiency: Built for high performance per watt, it reduces operational costs while maximizing computational power, making it a sustainable choice for AI infrastructure.
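A quick back-of-the-envelope sketch in Python shows what these per-GPU numbers add up to at 100K-GPU scale. The function name and the simple-aggregation assumption (plain multiplication, no interconnect or topology effects) are ours for illustration:

```python
# Back-of-the-envelope aggregate capacity of an H100 cluster, using the
# per-GPU figures listed above (80 GB of HBM3, ~3 TB/s memory bandwidth).
# Simple multiplication only -- no interconnect or topology effects.
def cluster_totals(num_gpus: int, mem_gb: float = 80, bw_tbs: float = 3.0):
    """Return (total HBM3 in TB, total memory bandwidth in PB/s)."""
    total_mem_tb = num_gpus * mem_gb / 1000   # GB -> TB
    total_bw_pbs = num_gpus * bw_tbs / 1000   # TB/s -> PB/s
    return total_mem_tb, total_bw_pbs

mem_tb, bw_pbs = cluster_totals(100_000)
print(f"100K H100s: {mem_tb:,.0f} TB of HBM3, {bw_pbs:,.0f} PB/s aggregate bandwidth")
# 100K H100s: 8,000 TB of HBM3, 300 PB/s aggregate bandwidth
```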

What Can 100K H100 GPUs Do?

100,000 H100 GPUs can break down massive problems (like training sophisticated AI models or running complex simulations) into many small tasks, and work on them all at once. This extraordinary parallel processing power means tasks that would normally take a very long time can be completed incredibly fast.

Imagine a simple task that takes 10 days to complete on a single H100 GPU. Now, let’s convert 10 days to seconds:

10 days = 10 × 24 × 3600 seconds = 864,000 seconds

If the task scales perfectly, with 100,000 GPUs the time required would be:

Time = 864,000 seconds ÷ 100,000 = 8.64 seconds

So a job that would have taken 10 days on one GPU could, in theory, be completed in less than 10 seconds with 100K GPUs working together!
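The arithmetic above can be sketched as a small Python function. Note that real jobs never scale perfectly, since communication overhead and serial steps cap the speedup, so treat this as a theoretical best case:

```python
# Ideal (perfectly linear) scaling of the 10-day job described above.
# Real jobs never scale perfectly -- communication overhead and serial
# steps cap the speedup -- so this is a theoretical best case.
def ideal_time_seconds(single_gpu_days: float, num_gpus: int) -> float:
    """Time if the work splits evenly across GPUs with zero overhead."""
    total_seconds = single_gpu_days * 24 * 3600
    return total_seconds / num_gpus

print(ideal_time_seconds(10, 1))        # 864000.0 (10 days on one GPU)
print(ideal_time_seconds(10, 100_000))  # 8.64 seconds on 100K GPUs
```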

Why Did Grok 3 Need 100K H100?

Grok 3 is the successor to Grok 2, a model that did add features like image generation on top of text. As a whole, however, Grok 2 was subpar compared to the top models from OpenAI, Google, and Meta. For Grok 3, Elon Musk’s x.AI therefore wanted to catch up with, or in fact beat, all the existing competitors in the field. So x.AI went big: it built a data center with over 100K GPUs and later expanded it to 200K. That is how, in less than a year, the company was able to create Grok 3 – a model capable of advanced reasoning, enhanced thinking, and deep research.

The performance difference between Grok 2 and Grok 3 clearly indicates this leap.

| Benchmark | Grok 2 mini (High) | Grok 3 (mini) |
|---|---|---|
| Math (AIME ’24) | 72 | 80 |
| Science (GPQA) | 68 | 78 |
| Coding (LCB Oct–Feb) | 72 | 80 |

Grok 2 vs Grok 3 Performance

Almost a 10-point jump across all major benchmarks, including Math, Science, and Coding! Impressive, right? But is it impressive enough to justify the computing power of 100K H100 GPUs?
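The jumps can be tallied quickly from the table above (scores as reported; benchmark labels lightly normalized):

```python
# Tallying the Grok 2 -> Grok 3 jumps from the benchmark table above.
scores = {
    "Math (AIME '24)":      (72, 80),
    "Science (GPQA)":       (68, 78),
    "Coding (LCB Oct-Feb)": (72, 80),
}
jumps = {name: grok3 - grok2 for name, (grok2, grok3) in scores.items()}
avg = sum(jumps.values()) / len(jumps)

for name, jump in jumps.items():
    print(f"{name}: +{jump}")
print(f"Average jump: {avg:.1f} points")  # Average jump: 8.7 points
```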

Also Read: Grok 3 is Here! And What It Can Do Will Blow Your Mind!

Grok 3 Comparison with DeepSeek-R1

When DeepSeek-R1 was launched, it took the world by storm! All major AI companies could feel the heat due to their falling stock prices and decreasing user base as people flocked towards the open source marvel that challenged OpenAI’s best of the best! But to do this, did DeepSeek-R1 use 100K GPUs?

Well, not even a fraction of it! DeepSeek-R1 was fine-tuned on top of the DeepSeek-V3 base model, and DeepSeek-V3 was trained on just 2,048 NVIDIA H800 GPUs. (The H800 is a China-specific variant of NVIDIA’s H100, designed to comply with U.S. export restrictions, chiefly through reduced interconnect bandwidth.) This essentially means that DeepSeek-R1 was trained using roughly 2% of the GPU count behind Grok 3.
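The roughly-2% figure follows directly from the GPU counts (this counts chips only and ignores per-chip differences between the H800 and the H100):

```python
# GPU-count comparison implied above. Chips only -- ignores per-chip
# differences between the H800 and the H100.
deepseek_gpus = 2_048    # H800s used to train the DeepSeek-V3 base model
grok3_gpus = 100_000     # H100s in x.AI's initial cluster
fraction = deepseek_gpus / grok3_gpus
print(f"DeepSeek used {fraction:.1%} of Grok 3's GPU count")  # 2.0%
```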

As per the benchmarks, Grok 3 is significantly better than DeepSeek-R1 across all major fronts.

Grok 3 vs DeepSeek-R1 Performance

But is it true? Is Grok 3 truly better than DeepSeek-R1 and the rest of the other models as the benchmarks claim? Were 100K H100 GPUs really worth it?

Also Read: Grok 3 vs DeepSeek R1: Which is Better?

Value Check: Grok 3 vs Other Leading Models

We will test Grok 3 against top models, including o1, DeepSeek-R1, and Gemini, across various tasks to see how it performs. In each test, I will compare Grok 3 with a different model based on the outputs I receive from the two. I will evaluate the models on three different tasks:

  1. Deep Search
  2. Advanced Reasoning
  3. Image Analysis

I will then select the one that I find better based on the outputs. 

Test 1: Deep Search

Models: Grok 3 and Gemini 1.5 Pro with Deep Research

Prompt: “Give me a detailed report on the latest LLMs comparing them on all the available benchmarks.”

Results:

By Grok 3:

Report

By Gemini 1.5 Pro with Deep Research:

Report

Review:

| Criteria | Grok 3 (Deep Search) | Gemini 1.5 Pro with Deep Research | Which is Better? |
|---|---|---|---|
| Coverage of LLMs | Focuses on 5 models (Grok 3, GPT-4o, Claude 3.5, DeepSeek-R1, and Gemini 2.0 Pro). | Covers a wider range of models, including Grok 3, GPT-4o, Gemini Flash 2.0, Mistral, Mixtral, Llama 3, Command R+, and others. | Gemini |
| Benchmark Variety | Math (AIME, MATH-500), Science (GPQA), Coding (HumanEval), and Chatbot Arena ELO score. | Includes all major benchmarks, plus multilingual, tool-use, and general reasoning evaluations. | Gemini |
| Depth of Performance Analysis | Detailed benchmark-specific scores but lacks efficiency and deployment insights. | Provides broader performance analysis, covering both raw scores and real-world usability. | Gemini |
| Efficiency Metrics (Context, Cost, Latency, etc.) | Not covered. | Includes API pricing, context window size, and inference latency. | Gemini |
| Real-World Applications | Focuses only on benchmark numbers. | Covers practical use cases like AI assistants, business productivity, and enterprise tools. | Gemini |

Clearly, on each criterion, the report generated by Gemini 1.5 Pro with Deep Research was better, more inclusive, and more comprehensive in covering the details around LLM benchmarks.

Test 2: Advanced Reasoning

Models: Grok 3 and o1

Prompt: “If a wormhole and a black hole suddenly come near Earth from two opposing sides, what would happen?”

Results:

Response by Grok 3:

Is 100K+ GPUs for Grok 3 worth it? | output by Grok 3

Response by o1:

Is 100K+ GPUs worth it? | output by o1

Review:

| Criteria | Grok 3 (Think) | o1 | Which is Better? |
|---|---|---|---|
| Black Hole Effects | Simplified explanation, focusing on event horizon and spaghettification. | Detailed explanation of tidal forces, orbital disruption, and radiation. | o1 |
| Wormhole Effects | Briefly mentions stability and travel potential. | Discusses stability, gravitational influence, and theoretical properties. | o1 |
| Gravitational Impact on Earth | Mentions gravitational pull but lacks in-depth analysis. | Explains how the black hole dominates with stronger tidal forces. | o1 |
| Interplay Between Both | Speculates about a possible link between the black hole and wormhole. | Describes gravitational tug-of-war and possible wormhole collapse. | o1 |
| Potential for Earth’s Survival | Suggests the wormhole could be an escape route but is highly speculative. | Clearly states that survival is highly unlikely due to the black hole’s forces. | o1 |
| Scientific Depth | More general and practical, less detailed on physics. | Provides a structured, theoretical discussion on spacetime effects. | o1 |
| Conclusion | Black hole dominates, and wormhole adds minor chaos. | Earth is destroyed by black hole forces; the wormhole’s role is uncertain. | o1 |

The result generated by o1 is better as it is more detailed, scientific, and well-structured compared to the result given by Grok 3.

Also Read: Grok 3 vs o3-mini: Which Model is Better?

Test 3: Image Analysis

Models: Grok 3 and DeepSeek-R1

Prompt: “What is the win probability of each team based on the image?”

100K+ H100 NVIDIA GPUs for Grok 3

Results:

Response by Grok 3:

output by Grok 3

Response by DeepSeek-R1:

output by DeepSeek-R1

Review:

| Criteria | Grok 3 | DeepSeek-R1 | Which is Better? |
|---|---|---|---|
| Win Probability (Afghanistan) | 55-60% | 70% | DeepSeek-R1 |
| Win Probability (Pakistan) | 40-45% | 30% | Grok 3 |
| Key Factors Considered | Includes historical trends, required run rate, team strengths, and pitch conditions. | Focuses on the final-over situation (9 runs needed, 2 wickets left). | Grok 3 |
| Assumptions Made | Considers Pakistan’s ability to chase 316 and Afghanistan’s bowling attack. | Assumes Afghanistan will successfully chase the target. | Grok 3 |
| Overall Conclusion | Afghanistan has a slight edge, but Pakistan has a reasonable chance depending on their chase. | Afghanistan is in a strong position, and Pakistan needs quick wickets. | Grok 3 |

Although the result given by DeepSeek-R1 was more accurate, Grok 3 gave a brilliant assessment of the match based on the image.

Final Result: Grok 3 lost 2 out of 3 tasks when pitted against its competitors.

100K H100 GPUs: Was It Worth It?

Now that we’ve seen how Grok 3 performs against competitors in various tasks, the real question remains: Was the massive investment in over 100K H100 GPUs justified?

While Grok 3 has demonstrated significant improvements over its predecessor and outperforms some models in specific areas, it fails to dominate consistently across the board. Other models, such as DeepSeek-R1 and OpenAI’s o1, achieved similar or superior results while using significantly fewer computational resources.

Energy Utilization

Beyond the financial investment, powering and cooling a data center with 100K+ H100 GPUs comes with a massive energy burden. Each H100 GPU consumes up to 700W of power under full load. That means:

  • 100K GPUs x 700W = 70 megawatts (MW) of power consumption at peak usage.
  • That’s roughly equivalent to the electricity consumption of a small city!
  • Factor in cooling requirements and the total energy consumption increases significantly.
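The bullet-point arithmetic can be sketched as follows. Note that the 90-day run length in the energy line is a hypothetical duration chosen purely for illustration, not a reported figure:

```python
# Peak power of the GPU fleet alone (excluding cooling, CPUs, networking),
# using the 700 W per-H100 figure above. The 90-day run length below is a
# hypothetical duration for illustration only.
def fleet_power_mw(num_gpus: int, watts_per_gpu: float = 700) -> float:
    """Peak fleet power in megawatts."""
    return num_gpus * watts_per_gpu / 1_000_000  # W -> MW

peak_mw = fleet_power_mw(100_000)
print(f"{peak_mw:.0f} MW at full load")                    # 70 MW at full load
print(f"{peak_mw * 24 * 90 / 1000:.1f} GWh over 90 days")  # 151.2 GWh over 90 days
```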

Grok 3’s energy-intensive approach may not be the most sustainable. OpenAI and Google are now focusing on smaller, more efficient architectures and energy-optimized training techniques, while x.AI has chosen brute-force computation.

Scalability and Efficiency Considerations

Training AI models at scale is an expensive endeavor—not just in terms of hardware but also power consumption and operational costs.

By comparison, companies like OpenAI and Google optimize their training pipelines by employing mixture-of-experts (MoE) models, retrieval-augmented generation (RAG), and fine-tuning techniques to maximize efficiency while minimizing compute costs.

Meanwhile, open-source communities are demonstrating that high-quality AI models can be built with significantly fewer resources. DeepSeek-R1, which challenges industry leaders despite its base model being trained on just 2,048 H800 GPUs, is a prime example of this.

Hence, the development of a model like Grok 3 raises major questions:

  • Can x.AI sustain the financial and environmental costs of running a 200K-GPU infrastructure long-term?
  • Could x.AI have achieved similar results with better data curation, training optimizations, or parameter efficiency rather than brute-forcing with GPUs?
  • Would investing in more efficient architectures have yielded better results?
  • How sustainable is this approach in the long run, given the increasing costs and competition in the AI space?

Conclusion

Grok 3 marks a significant leap for x.AI, demonstrating notable improvements over its predecessor. However, despite its 100K+ H100 GPU infrastructure, it failed to consistently outperform competitors like DeepSeek-R1, o1, and Gemini 1.5 Pro, which achieved comparable results with far fewer resources.

Beyond performance, the energy and financial costs of such massive GPU usage raise concerns about long-term sustainability. While x.AI prioritized raw power, rivals are achieving efficiency through optimized architectures and smarter training strategies.

So, were the 100K GPUs worth it? We don’t think so, at this point. If Grok 3 can’t consistently dominate, x.AI may need to rethink whether brute-force computation is the best path forward in the AI race.


Frequently Asked Questions

Q1. What is Grok 3?

A. Grok 3 is x.AI’s latest LLM, capable of tasks like advanced reasoning, deep research, and coding.

Q2. Why did x.AI use 100K GPUs for Grok 3?

A. x.AI used 100K+ NVIDIA H100 GPUs to accelerate Grok 3’s training and improve its reasoning, research, and problem-solving abilities.

Q3. What is the cost of training Grok 3 on 100K GPUs?

A. The estimated cost of acquiring and running 100K GPUs runs into billions of dollars in hardware alone, plus substantial energy consumption and maintenance costs.

Q4. How does Grok 3 compare to DeepSeek-R1 in efficiency?

A. DeepSeek-R1 was trained on just 2,048 GPUs but achieved competitive results. This shows that efficient AI training techniques can rival brute-force computation.

Q5. Are 100K GPUs necessary for training AI models?

A. While more GPUs speed up training, AI companies like OpenAI and Google use optimized architectures, mixture-of-experts (MoE), and retrieval-augmented generation (RAG) to achieve similar results with fewer GPUs.

Q6. What are the limitations of Grok 3 despite using 100K GPUs?

A. Despite using massive computational resources, Grok 3 did not consistently outperform competitors. Moreover, it struggled in tasks like advanced reasoning and deep search analysis.

Q7. Was the investment in 100K GPUs for Grok 3 worth it?

A. While Grok 3 is a powerful AI model, the high cost, energy consumption, and performance inconsistencies suggest that a more efficient approach may have been a better strategy.

Anu Madan has 5+ years of experience in content creation and management. Having worked as a content creator, reviewer, and manager, she has created several courses and blogs. Currently, she is working on creating and strategizing content curation and design around Generative AI and other upcoming technologies.
