
A realistic visualization of Claude Sonnet 4.5 as an AI agent, coding and managing tasks across multiple screens, highlighting Anthropic’s breakthrough in coding, reasoning, and real-world computer use. Image Source: ChatGPT-5
Anthropic Launches Claude Sonnet 4.5 With Breakthrough Coding and Agentic AI Capabilities
Key Takeaways: Claude Sonnet 4.5 Sets a New Standard for AI Coding and Agents
Anthropic released Claude Sonnet 4.5, describing it as the world’s most advanced AI coding model, with major improvements in reasoning, math, and long-horizon tasks.
The update includes Claude Code checkpoints, a refreshed terminal interface, a native VS Code extension, and file creation inside Claude apps.
A new Claude Agent SDK gives developers access to the same infrastructure powering Anthropic’s agentic tools.
Benchmarks show leading results: 61.4% on OSWorld for real-world computer use and state-of-the-art performance on SWE-bench Verified for software coding.
Customers including Cursor, Netflix, and Replit report transformative improvements in code generation, debugging, and developer productivity.
Released under AI Safety Level 3 (ASL-3) protections, Claude Sonnet 4.5 introduces stronger defenses against risks like prompt injection and misuse of sensitive content.
A bonus research preview, “Imagine with Claude,” available to Max users for five days, lets them watch Claude generate software in real time.
Claude Sonnet 4.5: Frontier Model for Coding and Computer Use
Anthropic has announced the release of Claude Sonnet 4.5, calling it the strongest AI model yet for coding, agent creation, and advanced computer use. The company said the model delivers “substantial gains in reasoning and math,” while sustaining focus on complex, multi-step tasks for extended periods.
The release builds on previous Claude iterations with sweeping upgrades across Claude Code, Claude apps, and the Claude API.
New features include:
Checkpoints in Claude Code to save progress and instantly roll back.
A refreshed terminal interface and native VS Code extension.
Context editing and expanded memory tools in the Claude API, enabling longer runs and greater complexity.
Direct file creation and code execution within Claude apps, enabling spreadsheets, slides, and documents to be generated inside conversations.
Wider availability of the Claude for Chrome extension, now accessible to Max waitlist users.
Launch of the Claude Agent SDK, giving developers access to the infrastructure behind Claude Code for building advanced AI agents.
Pricing for Claude Sonnet 4.5 remains the same as Claude Sonnet 4, at $3 per million input tokens and $15 per million output tokens. The model is available immediately worldwide through the Claude API and apps.
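To make the pricing concrete, here is a minimal sketch of the arithmetic. It reads the $3/$15 figure as $3 per million input tokens and $15 per million output tokens. The function name and the example workload are hypothetical, chosen only for illustration.

```python
# Rates as quoted in the article, read as input/output prices (an assumption).
INPUT_USD_PER_MILLION = 3.00
OUTPUT_USD_PER_MILLION = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed per-million rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Hypothetical workload: 200,000 input tokens and 20,000 output tokens.
example_cost = estimate_cost_usd(200_000, 20_000)
print(f"${example_cost:.2f}")  # prints $0.90
```

Because output tokens cost five times as much as input tokens at these rates, long generated responses tend to dominate the bill even when prompts are large.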
Benchmark Performance: Coding, Reasoning, and Long-Horizon Tasks
Anthropic highlighted strong results across industry-standard benchmarks for Claude Sonnet 4.5.
On SWE-bench Verified, which measures real-world software engineering accuracy, the model scored 77.2% — rising to 82.0% with parallel test-time compute. This marks the highest performance across leading frontier models, ahead of Claude Opus 4.1 (74.5%), Claude Sonnet 4 (72.7%), GPT-5 Codex (74.5%), and Gemini 2.5 Pro (67.2%).
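The article does not describe how parallel test-time compute works. One common form of the idea is best-of-n sampling: generate several independent attempts, discard those that fail a check, and keep the highest-scoring survivor. The sketch below illustrates that general pattern only. It is not Anthropic's evaluation procedure, and every name in it (best_of_n, generate, passes_tests, score) is hypothetical.

```python
import random
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def best_of_n(
    generate: Callable[[random.Random], T],
    passes_tests: Callable[[T], bool],
    score: Callable[[T], float],
    n: int,
    seed: int = 0,
) -> Optional[T]:
    """Sample n candidates, filter out failures, return the best survivor."""
    rng = random.Random(seed)
    # The attempts are independent of one another, so a real system
    # could produce them in parallel rather than in this loop.
    candidates = [generate(rng) for _ in range(n)]
    viable = [c for c in candidates if passes_tests(c)]
    if not viable:
        return None  # no attempt survived the filter
    return max(viable, key=score)


# Toy stand-ins: a "candidate" is an integer, even numbers "pass the tests",
# and a larger value counts as a better candidate.
result = best_of_n(
    generate=lambda rng: rng.randint(0, 100),
    passes_tests=lambda c: c % 2 == 0,
    score=float,
    n=8,
    seed=1,
)
print(result)
```

The trade-off is cost: each extra attempt consumes additional compute, which is why results with and without this technique are reported separately.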
On OSWorld, a benchmark testing real-world computer use, Claude Sonnet 4.5 achieved 61.4%, compared with 42.2% for Sonnet 4 just four months earlier.
The model also shows significant advances in reasoning and math. On AIME 2025, a high school math competition, it scored a perfect 100% with Python tools and 87% without tools, compared to 70.5% for Sonnet 4. It also achieved 83.4% on graduate-level reasoning (GPQA Diamond) and 89.1% on multilingual Q&A (MMMLU).
In specialized domains, Claude Sonnet 4.5 demonstrated further gains. On financial analysis benchmarks, it reached a 72% win rate with extended context and 68% at standard context, compared with 60% for Opus 4.1 and 50% for Sonnet 4. In legal reasoning, it scored 65% with extended context, up from 55% for Opus 4.1 and 50% for Sonnet 4. In medicine, the model achieved 61% with extended context, compared with 53% for Opus 4.1 and 50% for Sonnet 4.
Beyond benchmarks, Claude Sonnet 4.5 demonstrated the ability to maintain focus for more than 30 hours on extended tasks, with improved domain-specific performance across finance, law, medicine, and STEM.
In a demo video accompanying the release, Anthropic shows Claude working directly in a browser: navigating sites, filling spreadsheets, and completing tasks in real time.
By the Numbers: Claude Sonnet 4.5 Performance Highlights
77.2% on SWE-bench Verified — and 82.0% with parallel test-time compute, the highest across major models.
61.4% on OSWorld, up from 42.2% for Claude Sonnet 4 four months ago.
100% (Python) and 87% (no tools) on AIME 2025 high school math, compared with 70.5% for Sonnet 4.
83.4% on graduate-level reasoning (GPQA Diamond), up from 76.1% for Sonnet 4.
72% in finance, 65% in law, and 61% in medicine with extended context, outperforming earlier Claude models across all three domains.
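Benchmark gains can be read two ways: as percentage points gained, or as improvement relative to the earlier score. The short sketch below computes both for two of the before-and-after pairs quoted above. The helper name gains is hypothetical, and the scores are copied from the article.

```python
# (Claude Sonnet 4 score, Claude Sonnet 4.5 score), as quoted in the article.
scores = {
    "OSWorld": (42.2, 61.4),
    "GPQA Diamond": (76.1, 83.4),
}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent), rounded to 0.1."""
    points = after - before
    relative = points / before * 100
    return round(points, 1), round(relative, 1)


for name, (before, after) in scores.items():
    points, relative = gains(before, after)
    print(f"{name}: +{points} points (+{relative}% relative)")
```

For OSWorld this works out to a gain of 19.2 points, or about 45.5% relative to Sonnet 4. For GPQA Diamond it is 7.3 points, or about 9.6%. The relative figure looks larger when the starting score is low, so the two measures are best quoted together.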
Industry Endorsements: Early Customers Report Major Gains
Several enterprise customers have already tested Claude Sonnet 4.5 and reported transformative improvements.
Cursor CEO Michael Truell said: “We're seeing state-of-the-art coding performance from Claude Sonnet 4.5, with significant improvements on longer horizon tasks. It reinforces why many developers using Cursor choose Claude for solving their most complex problems.”
Netflix Tech Lead Eric Wendelin noted: “Claude Sonnet 4.5 is excellent at software development tasks, learning our codebase patterns to deliver precise implementations. It handles everything from debugging to architecture with deep contextual understanding, transforming our development velocity.”
Replit President Michele Catasta praised its editing capabilities: “Claude Sonnet 4.5's edit capabilities are exceptional — we went from 9% error rate on Sonnet 4 to 0% on our internal code editing benchmark. Higher tool success at lower cost is a major leap for agentic coding. Claude Sonnet 4.5 balances creativity and control perfectly.”
Safety and Alignment: AI Safety Level 3 Protections
Anthropic said Claude Sonnet 4.5 is its “most aligned frontier model” to date, citing improvements in addressing concerning behaviors such as sycophancy, deception, power-seeking, and delusional responses.
To mitigate risks, the model is released under AI Safety Level 3 (ASL-3) protections, including:
Classifier filters designed to detect dangerous content related to chemical, biological, radiological, and nuclear (CBRN) weapons.
Enhanced defenses against prompt injection attacks, a growing risk for agentic AI systems.
Ongoing work to reduce false positives from these classifiers, with misclassifications reduced tenfold compared to earlier releases.
The release is accompanied by a detailed system card including new evaluations using mechanistic interpretability techniques.
Claude Agent SDK: Infrastructure for Building AI Agents
Alongside the model, Anthropic is releasing the Claude Agent SDK, offering developers access to the same infrastructure used to build Claude Code. The SDK addresses core challenges of long-running AI tasks, including:
Memory management across extended processes.
Permission systems balancing autonomy with user oversight.
Coordination between subagents working toward shared goals.
Anthropic said the SDK can be applied to “a very wide variety of tasks, not just coding,” giving developers a foundation to build advanced agentic applications.
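To illustrate the idea of a permission system that balances autonomy with user oversight, here is a small conceptual sketch. It does not use the Claude Agent SDK, and it is not the SDK's API. The class PermissionGate, its fields, and the toy tools are all hypothetical. The pattern shown is simply that low-risk tools run automatically, while everything else goes through an approval callback.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class PermissionGate:
    """Hypothetical gate: auto-allow listed tools, ask before running others."""
    auto_allowed: set[str]
    approve: Callable[[str, dict], bool]  # stands in for a user prompt
    log: list[tuple[str, bool]] = field(default_factory=list)

    def run(self, tool_name: str, args: dict, tool: Callable[..., Any]) -> Any:
        # A tool runs if it is pre-approved or the approver says yes.
        allowed = tool_name in self.auto_allowed or self.approve(tool_name, args)
        self.log.append((tool_name, allowed))  # audit trail for oversight
        if not allowed:
            return f"denied: {tool_name}"
        return tool(**args)


# Toy setup: reads are pre-approved, and the approver rejects everything else.
gate = PermissionGate(
    auto_allowed={"read_file"},
    approve=lambda name, args: False,
)

read_result = gate.run(
    "read_file", {"path": "notes.txt"}, lambda path: f"contents of {path}"
)
delete_result = gate.run(
    "delete_file", {"path": "notes.txt"}, lambda path: f"deleted {path}"
)
print(read_result)
print(delete_result)
print(gate.log)
```

In this toy run the read goes through and the delete is refused, with both decisions recorded in the log. A real agent framework would replace the approval callback with an actual user prompt or a policy engine.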
Additional Resources: Claude Sonnet 4.5 System Card, Documentation, and Developer Tools
Claude Code updates are available to all users. Claude Developer Platform updates, including the Claude Agent SDK, are open to all developers. Code execution and file creation are included on all paid plans in the Claude apps.
For full technical details and evaluation results, see Anthropic’s system card, model page, and documentation. Additional insights can be found in the company’s engineering posts and cybersecurity research post.
Research Preview: “Imagine with Claude”
As a limited-time experiment, Anthropic is also offering “Imagine with Claude,” a research preview available to Max subscribers for five days. The tool allows users to watch Claude generate software in real time with no prewritten functionality, showcasing the adaptability of the new model.
Max subscribers can try “Imagine with Claude” at claude.ai/imagine.
Q&A: Claude Sonnet 4.5 Explained
Q: What is Claude Sonnet 4.5?
A: It is Anthropic’s latest frontier AI model, designed for coding, agent creation, and advanced computer use, with major upgrades across reasoning and math.
Q: How does it perform compared to previous versions?
A: It outperforms Claude Sonnet 4, scoring 61.4% on OSWorld (up from 42.2%) and posting state-of-the-art coding results on SWE-bench Verified, with stronger long-task focus and domain knowledge.
Q: What’s new in Claude Code and apps?
A: New features include checkpoints and a native VS Code extension in Claude Code, file creation in the Claude apps, and wider access to the Claude for Chrome extension for Max waitlist users.
Q: What is the Claude Agent SDK?
A: It is a developer toolkit offering the infrastructure behind Claude Code, enabling the creation of advanced AI agents across multiple domains.
Q: How is safety being addressed?
A: The model is released under AI Safety Level 3 protections, with new safeguards against prompt injection and misuse, along with a tenfold reduction in false positives from its content classifiers.
What This Means: Claude Sonnet 4.5 and the Next Phase of Agentic AI
The release of Claude Sonnet 4.5 signals a pivotal shift in how AI systems are being designed, deployed, and integrated into real-world workflows. By combining advanced reasoning, domain knowledge, and long-horizon coding capabilities with direct computer use, Anthropic is positioning Claude not just as a conversational model, but as a foundation for agentic AI — systems capable of autonomously carrying out complex tasks across digital environments.
For developers, the addition of the Claude Agent SDK is especially significant. Until now, much of the infrastructure behind successful agentic tools has been proprietary. By opening access to memory management systems, permission protocols, and subagent coordination frameworks, Anthropic is effectively lowering the barrier to building powerful, domain-specific AI agents. This could accelerate innovation in fields ranging from enterprise software and scientific research to financial modeling and operations.
For businesses and institutions, the release raises both opportunities and risks. On one hand, tools like Claude Code and Claude for Chrome promise major productivity gains by embedding AI directly into coding, document creation, and browser-based workflows. On the other, the expansion of agentic capabilities intensifies concerns about data privacy, security, and misuse — particularly as models gain greater autonomy in navigating and acting within digital systems.
For the AI industry as a whole, Claude Sonnet 4.5 intensifies the competitive landscape of frontier models. By maintaining price parity with Sonnet 4 while delivering substantial performance gains, Anthropic is challenging rivals to match both capability and accessibility. The model’s release under AI Safety Level 3 (ASL-3) protections also underscores a broader shift: the release of frontier AI is inseparable from robust governance frameworks and alignment commitments.
Looking ahead, the question will be less about whether AI can code or reason — and more about how these models are deployed as agents of work, shaping productivity, creativity, and decision-making at scale. With Claude Sonnet 4.5, Anthropic has taken a decisive step toward that future, setting a high bar for capability while tying its progress to a safety-first philosophy.
Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.