With 3.3M+ people watching the launch, Elon Musk and his team introduced the world to “Grok 3”, the most capable and powerful model by x.AI to date. The company, which started in 2023 and released its previous model (Grok 2) in 2024, is now challenging models by top companies like OpenAI, Google, and Meta that have been in the AI race for the last 5-7 years. All thanks to over 100K H100 NVIDIA GPUs! But DeepSeek, which also started its work in 2023, achieved o3-mini level capabilities with just a fraction of the GPUs that Grok 3 used! In this blog, we will explore whether Grok 3 was worth utilizing 100K+ H100 NVIDIA GPUs.
The NVIDIA H100 GPU is a high-performance processor built for AI training, inference, and high-performance computing (HPC). As the successor to the A100, it delivers faster processing, better efficiency, and improved scalability, making it a critical tool for modern AI applications. Leading AI companies and research institutions, including OpenAI, Google, Meta, Tesla, and AWS, rely on the NVIDIA H100 to develop cutting-edge AI solutions.
Also Read: Intel’s Gaudi 3: Setting New Standards with 40% Faster AI Acceleration than Nvidia H100
There are several reasons why major tech and AI companies around the world are investing in the NVIDIA H100 Chips:
100,000 H100 GPUs can break down massive problems (like training sophisticated AI models or running complex simulations) into many small tasks, and work on them all at once. This extraordinary parallel processing power means tasks that would normally take a very long time can be completed incredibly fast.
Imagine a simple task that takes 10 days to complete on a single H100 GPU. Now, let’s convert 10 days to seconds:
10 days = 10 × 24 × 3600 = 864,000 seconds
If the task scales perfectly, with 100,000 GPUs the time required would be:
Time = 864,000 seconds ÷ 100,000 = 8.64 seconds
So a job that would have taken 10 days on one GPU could, in theory, be completed in less than 10 seconds with 100K GPUs working together!
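To make the ideal-scaling arithmetic above concrete, here is a minimal Python sketch. The 10-day single-GPU runtime is the figure from the example; the serial_fraction parameter is an added assumption included only to show why real workloads never reach this ideal (Amdahl’s law), since communication and synchronization overheads always leave some work that cannot be parallelized.

```python
# Idealized scaling estimate for splitting one job across many GPUs.
# The 10-day example is from the text; serial_fraction is an illustrative
# assumption showing why perfect scaling never happens in practice.

def ideal_runtime_seconds(single_gpu_days: float, num_gpus: int) -> float:
    """Runtime if the work splits perfectly across num_gpus (no overhead)."""
    return single_gpu_days * 24 * 3600 / num_gpus

def amdahl_runtime_seconds(single_gpu_days: float, num_gpus: int,
                           serial_fraction: float) -> float:
    """Amdahl's law: the serial fraction of the work cannot be parallelized."""
    total = single_gpu_days * 24 * 3600
    return total * (serial_fraction + (1 - serial_fraction) / num_gpus)

print(ideal_runtime_seconds(10, 1))               # 864000.0 s (10 days)
print(ideal_runtime_seconds(10, 100_000))         # 8.64 s in the ideal case
print(amdahl_runtime_seconds(10, 100_000, 0.01))  # ~8648.6 s if just 1% is serial
```

Even with only 1% of the work being inherently serial, the theoretical 8.64-second runtime balloons to well over two hours, which is why real training runs still take days or weeks on huge clusters.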
Grok 3 is the successor to Grok 2, a model that did offer features like image generation on top of text but was, on the whole, subpar compared to the top models from OpenAI, Google, and Meta. For Grok 3, Elon Musk’s x.AI wanted to catch up with, and in fact beat, all the existing competitors in the field. So x.AI went big: it built a data center with over 100K GPUs and later expanded it to 200K GPUs. As a result, in less than a year, it has been able to create Grok 3 – a model capable of advanced reasoning, enhanced thinking, and deep research.
The performance difference between Grok 3 and Grok 2 clearly indicates this leap.
| Benchmark | Grok 2 mini (High) | Grok 3 (mini) |
|---|---|---|
| Math (AIME ’24) | 72 | 80 |
| Science (GPQA) | 68 | 78 |
| Coding (LCB Oct–Feb) | 72 | 80 |
Almost a 10-point jump across all major benchmarks, including Math, Science, and Coding! Impressive, right? But is it impressive enough to justify the computing power of 100K H100 GPUs?
Also Read: Grok 3 is Here! And What It Can Do Will Blow Your Mind!
When DeepSeek-R1 was launched, it took the world by storm! All major AI companies could feel the heat due to their falling stock prices and decreasing user base as people flocked towards the open source marvel that challenged OpenAI’s best of the best! But to do this, did DeepSeek-R1 use 100K GPUs?
Well, not even a fraction of it! DeepSeek-R1 was fine-tuned on top of the DeepSeek-V3 base model, and DeepSeek-V3 was trained on just 2,048 NVIDIA H800 GPUs. (The H800 is a China-specific variant of NVIDIA’s H100, designed to comply with U.S. export restrictions, with reduced chip-to-chip interconnect bandwidth.) This essentially means that DeepSeek-R1 was trained using roughly 2% of the GPUs that went into Grok 3.
As per the benchmarks, Grok 3 is significantly better than DeepSeek-R1 across all major fronts.
But is it true? Is Grok 3 truly better than DeepSeek-R1 and the rest of the other models as the benchmarks claim? Were 100K H100 GPUs really worth it?
Also Read: Grok 3 vs DeepSeek R1: Which is Better?
We will test Grok 3 against top models, including o1, DeepSeek-R1, and Gemini, to see how it performs on a variety of tasks. In each test, I will compare Grok 3 with a different model, judging them on the outputs I receive. I will be evaluating the models on three different tasks: generating a deep research report, advanced reasoning, and image-based analysis.
For each task, I will then pick the model whose output I find better.
Models: Grok 3 and Gemini 1.5 Pro with Deep Research
Prompt: “Give me a detailed report on the latest LLMs comparing them on all the available benchmarks.”
By Grok 3:
By Gemini 1.5 Pro with Deep Research:
| Criteria | Grok 3 (DeepSearch) | Gemini 1.5 Pro with Deep Research | Which is Better? |
|---|---|---|---|
| Coverage of LLMs | Focuses on 5 models (Grok 3, GPT-4o, Claude 3.5, DeepSeek-R1, and Gemini 2.0 Pro). | Covers a wider range of models, including Grok 3, GPT-4o, Gemini Flash 2.0, Mistral, Mixtral, Llama 3, Command R+, and others. | Gemini |
| Benchmark Variety | Math (AIME, MATH-500), Science (GPQA), Coding (HumanEval), and Chatbot Arena ELO score. | Includes all major benchmarks, plus multilingual, tool use, and general reasoning. | Gemini |
| Depth of Performance Analysis | Detailed benchmark-specific scores but lacks efficiency and deployment insights. | Provides broader performance analysis, covering both raw scores and real-world usability. | Gemini |
| Efficiency Metrics (Context, Cost, Latency, etc.) | Not covered. | Includes API pricing, context window size, and inference latency. | Gemini |
| Real-World Applications | Focuses only on benchmark numbers. | Covers practical use cases like AI assistants, business productivity, and enterprise tools. | Gemini |
Clearly, on every criterion, the report generated by Gemini 1.5 Pro with Deep Research was better: more inclusive and more comprehensive in its coverage of LLM benchmarks.
Models: Grok 3 and o1
Prompt: “If a wormhole and a black hole suddenly come near Earth from two opposing sides, what would happen?”
Response by Grok 3:
Response by o1:
| Criteria | Grok 3 (Think) | o1 | Which is Better? |
|---|---|---|---|
| Black Hole Effects | Simplified explanation, focusing on event horizon and spaghettification. | Detailed explanation of tidal forces, orbital disruption, and radiation. | o1 |
| Wormhole Effects | Briefly mentions stability and travel potential. | Discusses stability, gravitational influence, and theoretical properties. | o1 |
| Gravitational Impact on Earth | Mentions gravitational pull but lacks in-depth analysis. | Explains how the black hole dominates with stronger tidal forces. | o1 |
| Interplay Between Both | Speculates about a possible link between the black hole and wormhole. | Describes gravitational tug-of-war and possible wormhole collapse. | o1 |
| Potential for Earth’s Survival | Suggests the wormhole could be an escape route but is highly speculative. | Clearly states that survival is highly unlikely due to black hole’s forces. | o1 |
| Scientific Depth | More general and practical, less detailed on physics. | Provides a structured, theoretical discussion on spacetime effects. | o1 |
| Conclusion | Black hole dominates, and wormhole adds minor chaos. | Earth is destroyed by black hole forces. Wormhole’s role is uncertain. | o1 |
The result generated by o1 is better as it is more detailed, scientific, and well-structured compared to the result given by Grok 3.
Also Read: Grok 3 vs o3-mini: Which Model is Better?
Models: Grok 3 and DeepSeek-R1
Prompt: “What is the win probability of each team based on the image?”
Response by Grok 3:
Response by DeepSeek-R1:
| Criteria | Grok 3 | DeepSeek-R1 | Which is Better? |
|---|---|---|---|
| Win Probability (Afghanistan) | 55-60% | 70% | DeepSeek-R1 |
| Win Probability (Pakistan) | 40-45% | 30% | Grok 3 |
| Key Factors Considered | Includes historical trends, required run rate, team strengths, and pitch conditions. | Focuses on the final-over situation (9 runs needed, 2 wickets left). | Grok 3 |
| Assumptions Made | Considers Pakistan’s ability to chase 316 and Afghanistan’s bowling attack. | Assumes Afghanistan will successfully chase the target. | Grok 3 |
| Overall Conclusion | Afghanistan has a slight edge, but Pakistan has a reasonable chance depending on their chase. | Afghanistan is in a strong position, and Pakistan needs quick wickets. | Grok 3 |
Although the result given by DeepSeek-R1 was more accurate, Grok 3 gave a brilliant assessment of the match based on the image.
Now that we’ve seen how Grok 3 performs against competitors in various tasks, the real question remains: Was the massive investment in over 100K H100 GPUs justified?
While Grok 3 has demonstrated significant improvements over its predecessor and outperforms some models in specific areas, it does not consistently dominate across the board. Other models, such as DeepSeek-R1 and OpenAI’s o1, achieved similar or superior results while using significantly fewer computational resources.
Beyond the financial investment, powering and cooling a data center with 100K+ H100 GPUs comes with a massive energy burden. Each H100 GPU consumes up to 700W of power under full load. That means the GPUs alone can draw on the order of 70 megawatts at peak (100,000 × 700 W = 70 MW), before even accounting for cooling and networking overhead.
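As a rough back-of-the-envelope sketch of that energy burden: the 700 W per-GPU figure is from the paragraph above, while the PUE, electricity price, and training duration below are illustrative assumptions, not reported numbers.

```python
# Back-of-the-envelope power and electricity-cost estimate for a 100K-GPU cluster.
# 700 W per H100 is from the article; PUE, price per kWh, and training duration
# are illustrative assumptions only.

NUM_GPUS = 100_000
WATTS_PER_GPU = 700        # peak draw of an H100 under full load
PUE = 1.3                  # assumed data-center overhead (cooling, networking)
PRICE_PER_KWH = 0.08       # assumed electricity price in USD
TRAINING_DAYS = 90         # assumed length of a training run

peak_mw = NUM_GPUS * WATTS_PER_GPU / 1e6        # GPUs alone: ~70 MW
facility_mw = peak_mw * PUE                     # including overhead: ~91 MW
energy_mwh = facility_mw * 24 * TRAINING_DAYS   # energy over the whole run
cost_usd = energy_mwh * 1_000 * PRICE_PER_KWH   # MWh -> kWh -> dollars

print(f"Peak GPU draw: {peak_mw:.0f} MW")
print(f"Facility draw at PUE {PUE}: {facility_mw:.0f} MW")
print(f"Energy over {TRAINING_DAYS} days: {energy_mwh:,.0f} MWh")
print(f"Estimated electricity cost: ${cost_usd:,.0f}")
```

Under these assumed numbers, the electricity bill alone for a single 90-day run comes to roughly $15–16 million, before any hardware or maintenance costs.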
Grok 3’s energy-intensive approach may not be the most sustainable. OpenAI and Google are now focusing on smaller, more efficient architectures and energy-optimized training techniques, while x.AI has chosen brute-force computation.
Training AI models at scale is an expensive endeavor—not just in terms of hardware but also power consumption and operational costs.
By comparison, companies like OpenAI and Google optimize their training pipelines by employing mixture-of-experts (MoE) models, retrieval-augmented generation (RAG), and fine-tuning techniques to maximize efficiency while minimizing compute costs.
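To illustrate why a mixture-of-experts layer is cheaper per token than a dense layer of the same total size, here is a toy top-k gating router in plain NumPy. The layer sizes, expert count, and gating rule are simplified assumptions for illustration only and do not describe any particular company’s implementation.

```python
import numpy as np

# Toy mixture-of-experts layer: a gate picks the top-k experts per token,
# so only a fraction of the layer's parameters are used for each token.
# Sizes and k are illustrative assumptions, not a production configuration.

rng = np.random.default_rng(0)
D, H, NUM_EXPERTS, TOP_K = 64, 256, 8, 2

gate_w = rng.normal(size=(D, NUM_EXPERTS))
experts = [(rng.normal(size=(D, H)), rng.normal(size=(H, D)))
           for _ in range(NUM_EXPERTS)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a single token (shape [D]) through its top-k experts."""
    logits = x @ gate_w
    top = np.argsort(logits)[-TOP_K:]                          # chosen expert indices
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen experts
    out = np.zeros_like(x)
    for w, idx in zip(weights, top):
        w_in, w_out = experts[idx]
        out += w * (np.maximum(x @ w_in, 0) @ w_out)           # simple ReLU MLP expert
    return out

token = rng.normal(size=D)
y = moe_forward(token)
print(y.shape, f"{TOP_K}/{NUM_EXPERTS} experts active for this token")
```

Because only 2 of the 8 experts run for each token here, only a quarter of the expert parameters are active per token; that sparsity is the basic source of MoE’s compute savings during training and inference.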
Meanwhile, open-source communities are demonstrating that high-quality AI models can be built with significantly fewer resources. DeepSeek-R1, which challenges industry leaders despite being trained on just 2,048 H800 GPUs, is a prime example of this.
Hence, the development of a model like Grok 3 raises major concerns: the sheer financial cost of the hardware, the energy needed to power and cool it, and whether brute-force computation delivers a performance edge large enough to justify either.
Grok 3 marks a significant leap for x.AI, demonstrating notable improvements over its predecessor. However, despite its 100K+ H100 GPU infrastructure, it failed to consistently outperform competitors like DeepSeek-R1, o1, and Gemini 1.5 Pro, which achieved comparable results with far fewer resources.
Beyond performance, the energy and financial costs of such massive GPU usage raise concerns about long-term sustainability. While x.AI prioritized raw power, rivals are achieving efficiency through optimized architectures and smarter training strategies.
So, were the 100K GPUs worth it? We don’t think so, at this point. If Grok 3 can’t consistently dominate, x.AI may need to rethink whether brute-force computation is the best path forward in the AI race.
Q. What is Grok 3?
A. Grok 3 is x.AI’s latest LLM, capable of tasks like advanced reasoning, deep research, and coding.
Q. Why did x.AI use over 100K NVIDIA H100 GPUs for Grok 3?
A. x.AI used 100K+ NVIDIA H100 GPUs to accelerate Grok 3’s training and improve its reasoning, research, and problem-solving abilities.
Q. How much does it cost to train a model on 100K GPUs?
A. The estimated cost of training and running 100K GPUs includes millions of dollars in hardware, energy consumption, and maintenance costs.
Q. How many GPUs did DeepSeek-R1 use?
A. DeepSeek-R1 was trained on just 2,048 GPUs but achieved competitive results. This shows that efficient AI training techniques can rival brute-force computation.
Q. Do more GPUs automatically mean a better model?
A. While more GPUs speed up training, AI companies like OpenAI and Google use optimized architectures, mixture-of-experts (MoE), and retrieval-augmented generation (RAG) to achieve similar results with fewer GPUs.
Q. Did Grok 3 outperform its competitors?
A. Despite using massive computational resources, Grok 3 did not consistently outperform competitors. Moreover, it struggled in tasks like advanced reasoning and deep search analysis.
Q. So, were the 100K GPUs worth it?
A. While Grok 3 is a powerful AI model, the high cost, energy consumption, and performance inconsistencies suggest that a more efficient approach may have been a better strategy.