xAI Launches Grok 4 and Grok 4 Heavy, Aims to Redefine AI Intelligence
Elon Musk’s xAI has unveiled Grok 4, its most advanced AI model to date, claiming postgraduate-level reasoning across every academic discipline—and surpassing rivals on key benchmarks.

Image Source: ChatGPT-4o
Key Takeaways:
Grok 4 and Grok 4 Heavy are xAI’s new flagship models, boasting major upgrades in reasoning and tool use over previous versions.
Grok 4 Heavy uses a "multi-agent" system—multiple AIs collaborating in parallel—which xAI likens to an elite academic study group.
On the rigorous Humanity’s Last Exam benchmark, Grok 4 Heavy scored 44.4%, outperforming Gemini 2.5 Pro (26.9%) and OpenAI’s o3 high (21%).
xAI launched SuperGrok Heavy, a $300/month premium tier offering early access to Grok 4 Heavy and future models.
A broader developer release is now live via xAI’s API, with additional coding, multimodal, and video generation models set to follow later in 2025.
Grok 4: A New Benchmark for General Reasoning
Announced during a livestream Wednesday night, Grok 4 was described by Musk as “smarter than almost all graduate students in all disciplines simultaneously.” According to xAI, Grok 4 can solve complex problems in mathematics, physics, chemistry, and linguistics—even when the questions are unfamiliar or unpublished.
xAI emphasized that Grok 4 isn’t just memorizing the internet. Instead, it demonstrates “first-principles reasoning,” meaning it can work through novel, abstract problems like those found in research-level academia.
During the demo, Grok 4 tackled Humanity’s Last Exam (HLE)—a benchmark of 2,500 expert-written problems spanning diverse disciplines—and performed at a level no other model had previously reached unaided.
“There are no humans that can actually answer these can get a good score,” said Musk. “I mean if you actually say like any given human what's the best that any human could score? I mean I'd say maybe 5% optimistically.”
How Grok 4 Works: Training at Unprecedented Scale
xAI claims that Grok 4’s leap in performance stems from both scale and architectural changes:
Each model upgrade (Grok 2 to 3, then Grok 3 to 4) has involved 10x more training compute, which implies roughly 100x from Grok 2 to Grok 4.
Grok 4 combines a foundation model with reinforcement learning from human feedback (RLHF) and tool-assisted reasoning.
The model was trained using Colossus, xAI’s custom-built supercomputer powered by 100,000 H100 GPUs.
Grok 4 Heavy, the premium version, runs multiple AI agents simultaneously. These agents solve problems independently, compare notes, and converge on the best solution. xAI compared this to collaborative problem-solving in a study group, noting that it’s not always majority vote—often, one agent alone figures out the key insight.
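The study-group idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: xAI has not published how Grok 4 Heavy coordinates its agents, so the `Candidate` type, the toy agents, and the scoring function passed as `verify` are all hypothetical. The sketch runs several solvers in parallel and keeps the answer that scores best under a checker rather than the most common answer, which is one way a single agent's insight could win over a majority.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Candidate:
    agent: str   # hypothetical agent name
    answer: int  # the answer that agent proposed


def run_agents(agents: Dict[str, Callable[[int], int]], problem: int) -> List[Candidate]:
    """Run every agent on the same problem in parallel and collect the answers."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(fn, problem) for name, fn in agents.items()}
        return [Candidate(name, fut.result()) for name, fut in futures.items()]


def pick_best(candidates: List[Candidate], verify: Callable[[Candidate], float]) -> Candidate:
    """Keep the candidate with the highest verification score (not a majority vote)."""
    return max(candidates, key=verify)


# Toy problem: find the integer whose square is 144.
# Two agents agree on a wrong answer; one agent finds the right one.
agents = {
    "agent_a": lambda n: 11,
    "agent_b": lambda n: 11,
    "agent_c": lambda n: round(n ** 0.5),
}
problem = 144
candidates = run_agents(agents, problem)

# The checker scores an answer by how close its square is to the target.
best = pick_best(candidates, lambda c: -abs(c.answer ** 2 - problem))
print(best.agent, best.answer)  # agent_c 12
```

A majority vote would have returned 11 here. Selecting by a checkable score is just one possible convergence rule, chosen for the example because the toy problem is easy to verify.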
Real-World Tests: From Prediction Markets to Running Businesses
To showcase real-world capability, Grok 4 Heavy was tested in several interactive demos:
Market prediction: It analyzed sports odds from Polymarket and calculated a 21.6% chance for the Dodgers to win the MLB World Series, demonstrating live tool use, search, and probability modeling.
Vending Bench business simulation: Grok 4 finished with roughly double the net worth of competing models in a long-horizon task involving supply management, pricing, and strategy adherence. It outperformed other leading AI models in both profit and consistency.
Grok 4 is being used by researchers at the ARC Institute for biomedical discovery, where it helps sift through millions of experimental records in seconds to accelerate the identification of promising research directions.
Limitations: Vision and Tool Use Still Maturing
Despite strong language reasoning, Grok 4 still lags in multimodal understanding, particularly image analysis and generation. “It’s like looking through blurry glass,” one presenter said.
That’s expected to improve with Grok 5, based on version 7 of the foundation model, currently in training. It will include better video understanding, more advanced tools, and tighter integration with simulation engines like Unreal or Unity for game development.
Grok 4's current tool use is considered "primitive" compared to the sophisticated simulations used in industries like aerospace. However, Musk promised those capabilities are coming—along with integration into humanoid robots, such as Tesla’s Optimus.
xAI launched SuperGrok Heavy, a $300/month subscription tier that includes:
Early access to Grok 4 Heavy
Priority on new tools and features
Access to future models like an AI coding assistant (August), multimodal agent (September), and video generation model (October)
This makes it the most expensive AI subscription plan among major providers, ahead of offerings from OpenAI, Google, and Anthropic.
For developers, xAI has released Grok 4 through its public API, with 256K context length and access to tool capabilities. The goal: encourage integration into enterprise workflows across research, finance, gaming, and more.
Availability and Pricing
xAI has launched three Grok 4 access tiers, each with distinct capabilities and pricing models:
Grok 4
The core single-agent reasoning model capable of solving complex academic and real-world problems.
Availability: Live now via the Grok API and X platform
Cost:
API (usage-based):
$3 per 1M input tokens
$15 per 1M output tokens
$0.75 per 1M cached tokens
Consumer subscription:
~$30/month via the standard Grok plan on X (bundled with Premium+)
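As a rough illustration of the usage-based API rates listed above, the following sketch estimates the cost of a single request. The function name `estimate_cost_usd` and the example token counts are hypothetical. It assumes cached tokens are billed at the cached rate instead of the regular input rate, and actual billing may differ.

```python
# Per-million-token rates quoted in this article (USD).
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00
CACHED_RATE = 0.75
TOKENS_PER_UNIT = 1_000_000


def estimate_cost_usd(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate one request's cost.

    Assumption: cached_tokens are counted separately from input_tokens.
    """
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_RATE
        + output_tokens * OUTPUT_RATE
        + cached_tokens * CACHED_RATE
    )
    return total / TOKENS_PER_UNIT


# Hypothetical request: 200,000 fresh input tokens, 10,000 output tokens,
# and 50,000 cached tokens.
cost = estimate_cost_usd(200_000, 10_000, 50_000)
print(f"${cost:.4f}")  # $0.7875
```

Output tokens cost five times as much as input tokens at these rates, so long generated answers dominate the bill more quickly than long prompts do.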
Grok 4 Heavy
A more powerful multi-agent version of Grok 4 that spawns several reasoning agents to collaborate and converge on the best solution—dramatically boosting performance on complex benchmarks.
Availability: Currently available only through the SuperGrok Heavy subscription
Cost: Included with SuperGrok Heavy (see below)
SuperGrok Heavy
xAI’s new ultra-premium tier that includes access to Grok 4 Heavy, early releases of future tools, and priority compute.
Availability: Live now for early subscribers; limited slots during demo period, with expanded rollout expected
Cost: $300/month or $3,000/year
Includes:
Access to Grok 4 Heavy
Priority access to upcoming models:
Coding model (August)
Multimodal agent (September)
Video generation (October)
If subscriptions are temporarily closed due to high demand, xAI recommends trying again shortly after the demo window.
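For readers comparing the two SuperGrok Heavy billing options, the arithmetic behind the quoted prices can be checked with a short sketch. It uses only the $300/month and $3,000/year figures from this article, and the variable names are illustrative.

```python
MONTHLY_PRICE = 300   # USD per month, as quoted above
ANNUAL_PRICE = 3_000  # USD per year, as quoted above

twelve_months = MONTHLY_PRICE * 12            # cost of a year paid monthly
savings = twelve_months - ANNUAL_PRICE        # difference versus annual billing
savings_pct = savings / twelve_months * 100
effective_monthly = ANNUAL_PRICE / 12

print(f"12 months at the monthly rate: ${twelve_months}")          # $3600
print(f"Saved with annual billing: ${savings} ({savings_pct:.1f}%)")  # $600 (16.7%)
print(f"Effective monthly price on the annual plan: ${effective_monthly:.0f}")  # $250
```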
Fast Facts for AI Readers
Q: What is Grok 4?
A: Grok 4 is xAI’s latest large language model, designed for advanced reasoning and tool use.
Q: What is Grok 4 Heavy?
A: A multi-agent version of Grok 4 that solves problems using multiple AI agents working in parallel.
Q: How does Grok 4 perform on benchmarks?
A: On Humanity’s Last Exam, Grok 4 Heavy scored 44.4%, outperforming Gemini 2.5 Pro and OpenAI’s o3 high.
Q: What is SuperGrok Heavy?
A: xAI’s new $300/month subscription tier offering early access to Grok 4 Heavy and future tools.
Q: How can developers use Grok 4?
A: Through xAI’s API, which supports long-context reasoning and integration with external tools.
What This Means
Grok 4 is xAI’s strongest case yet that it belongs in the top tier of generative AI development. Its benchmark wins and live demonstrations highlight a shift toward deeper reasoning, not just faster response.
But questions remain—about adoption, safety, and how xAI will handle future missteps, including the recent antisemitic responses from Grok’s official X account, which were removed after public backlash. The company revised Grok’s system prompt afterward but did not directly address the incident during the launch.
What’s clear is that Musk and xAI are betting big on speed, compute, and open deployment. Whether that approach results in safer, smarter, or simply faster AI will depend on how the next versions of Grok evolve—and how businesses respond to the promise of real-world intelligence at scale.
Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.