AiNews.com
Posts
Anthropic Launches Claude 4: Major Upgrades in Coding, Reasoning, and AI Agents

Anthropic Launches Claude 4: Major Upgrades in Coding, Reasoning, and AI Agents

Alicia Shapiro
May 23, 2025 • Estimated Reading Time: 10 minutes

A male software developer sits at a clean, modern desk in a sunlit office, focused on two large monitors. The left screen displays Python code in Visual Studio Code, while the right screen shows a Claude Code interface offering AI-generated suggestions. The developer wears glasses and a navy button-up shirt, typing on a wireless keyboard. A white coffee mug and yellow sticky notes are placed neatly on the desk, contributing to a tidy, professional workspace. The scene conveys a realistic, in-progress coding session with AI assistance.

Image Source: ChatGPT-4o

Anthropic Launches Claude 4: Major Upgrades in Coding, Reasoning, and AI Agents

Anthropic has launched the latest generation of its Claude AI models, introducing Claude Opus 4 and Claude Sonnet 4 with significant gains in software development, advanced reasoning, and AI agent workflows.

The models are designed to support both rapid interactions and extended, tool-assisted thinking. Claude Opus 4 is now positioned as the top-performing coding model in the world, while Sonnet 4 offers stronger coding and reasoning capabilities, with improved precision in following instructions.

Expanded Capabilities and Tool Use

Both Claude 4 models include new support for tool use during extended thinking, a feature in beta that enables the models to alternate between reasoning and external tools like web search. These tools can now be used in parallel, allowing for more efficient processing and task management.

Developers who provide Claude access to local files can take advantage of improved memory capabilities, with the models able to extract, retain, and save key information. This allows for more continuity across interactions and helps build what Anthropic describes as “tacit knowledge” over time.

Claude Code Now Generally Available

Claude Code is now officially available, expanding the reach of Claude’s AI capabilities across more of the developer workflow—from terminals and integrated development environments (IDEs) to automated background tasks.

IDE Integrations

New beta extensions for Visual Studio Code and JetBrains IDEs bring Claude Code directly into the coding environment. Once installed, Claude can:

Propose edits that appear inline within your files
Streamline code review and tracking
Integrate naturally into your existing editor interface

To get started, developers simply run Claude Code from the terminal within their IDE, making installation quick and familiar.

Claude Code SDK and GitHub Integration

Beyond IDEs, Anthropic is also releasing a new Claude Code SDK, allowing developers to build their own AI-powered agents and applications using the same foundational technology behind Claude Code.

Installation is simple: developers can activate the GitHub app by running the /install-github-app command from within Claude Code.

These tools are designed to make Claude not just a coding assistant, but a fully integrated part of the software development lifecycle—capable of editing, testing, debugging, and improving code collaboratively, all in real time.

Four New API Tools for Developers

The update introduces four new features on the Anthropic API:

A code execution tool
An MCP connector (for external system integration)
A Files API for handling documents
A prompt caching feature for up to one hour of reuse

These tools are designed to help developers build more capable, autonomous AI agents.

Opus 4: Focused Power for Complex Tasks

Claude Opus 4 sets a new standard in long-duration performance, solving complex, multi-step tasks that demand sustained attention. It leads industry benchmarks with:

72.5% on SWE-bench (a benchmark for software engineering)
43.2% on Terminal-bench

A bar chart showing model performance on SWE-bench Verified, a benchmark for software engineering tasks. Claude Opus 4 and Sonnet 4 top the chart with 72.5% and 72.7% accuracy, respectively. Sonnet 3.7 trails behind at 62.3%, followed by OpenAI Codex-1 (72.1%), OpenAI o3 (69.1%), GPT-4.1 (54.6%), and Gemini 2.5 Pro (63.2%). The chart highlights Claude 4’s leadership in real-world coding accuracy.

Claude 4 Sets New Standard in Software Engineering Accuracy (SWE-bench Verified). Image Source: Anthropic

A benchmark comparison table showing performance across seven AI tasks for Claude Opus 4, Claude Sonnet 4, Claude Sonnet 3.7, OpenAI o3, GPT-4.1, and Gemini 2.5 Pro. Claude 4 models consistently score highest or near-highest in categories including agentic coding (SWE-bench), terminal coding, graduate-level reasoning, tool use, multilingual Q&A, visual reasoning, and high school math. Claude Opus 4 and Sonnet 4 lead on most measures, especially in software-related tasks.

Claude 4 Leads Across Key AI Benchmarks in Coding, Reasoning, and Multimodal Tasks. Image Source: Anthropic

Why Companies Are Backing Opus 4

Several companies have validated Claude Opus 4’s capabilities in real-world development environments:

Cursor called it a state-of-the-art tool for coding, especially in understanding and navigating complex codebases.
Replit reported that Opus 4 delivered more precise suggestions and dramatically better handling of multi-file edits.
Block praised it as the first model that improves code quality during editing and debugging, noting consistent reliability in their internal agent, codenamed “goose.”
Rakuten tested the model on a demanding open-source refactoring task that ran for 7 hours without interruption. Opus 4 maintained full performance throughout, marking a significant leap in long-duration AI performance.
Cognition emphasized that Opus 4 solved complex, high-stakes tasks where previous models had failed—successfully executing critical actions without needing manual correction.

Together, these endorsements highlight Opus 4’s ability to support longer, more complex software workflows with a level of accuracy and resilience not seen in earlier models.

Sonnet 4: Balance of Speed and Precision

Claude Sonnet 4 improves on version 3.7 with enhanced reasoning, instruction-following, and usability. It scores 72.7% on SWE-bench, offering top-tier performance for daily coding needs with faster response times and more control. Although it doesn't reach Opus 4’s performance across all areas, it offers a well-balanced combination of power and efficiency.

Why Sonnet 4 Is Gaining Industry Support

Claude Sonnet 4 is also seeing early adoption for its speed, precision, and adaptability across a range of development tasks:

GitHub plans to use Sonnet 4 in its upcoming version of Copilot, citing its strong performance in agentic scenarios where models must reason through tasks autonomously.
iGent found it excelled at autonomous app development, with substantial improvements in problem-solving and codebase navigation. Navigation errors dropped from 20% to nearly zero.
Sourcegraph described Sonnet 4 as a notable leap forward in software development, staying on track longer and offering deeper understanding of code problems with more elegant solutions.
Manus highlighted its improvements in handling complex instructions, delivering clearer reasoning and more polished outputs.
Augment Code noted higher success rates and more careful code edits, making it their preferred model for complex tasks that require surgical precision.

These endorsements suggest that while Sonnet 4 isn’t as powerful as Opus 4 in every domain, it offers a practical and efficient option for a wide range of coding use cases.

Better Memory and Safer Agent Behavior

Claude 4 models show a 65% reduction in shortcut-seeking behavior on tasks where agents might typically exploit loopholes. This leads to more reliable and transparent performance in agentic workflows.

Opus 4 also demonstrates superior long-term memory. When connected to local files, it can create detailed internal guides—such as a “Navigation Guide” used during a seven-hour autonomous task involving a complex game environment.

Thinking Summaries and Developer Mode

A new summarization feature condenses Claude’s longer thought processes into readable overviews, used in about 5% of cases. For developers needing full access to these chains of thought—for debugging or prompt engineering—Anthropic now offers a Developer Mode with raw output access.

Availability and Pricing

Both models are available now through:

Anthropic’s API
Amazon Bedrock
Google Cloud’s Vertex AI

Claude Sonnet 4 is included in free plans, while Claude Opus 4 is available with Pro, Max, Team, and Enterprise plans. Pricing remains consistent with earlier versions:

Opus 4: $15 (input) / $75 (output) per million tokens
Sonnet 4: $3 (input) / $15 (output) per million tokens

Built for Collaboration—with Safety at the Core

These models represent a significant step toward more capable virtual collaborators—able to maintain context, stay focused on long-running tasks, and contribute meaningfully to complex projects. To support safe deployment at this scale, Claude 4 models have undergone extensive testing and evaluation. They also include built-in safeguards aligned with higher AI Safety Levels, such as ASL-3, which help minimize risk and ensure more reliable, responsible behavior in real-world use.

What This Means

The launch of Claude 4 marks a shift toward more capable, context-aware AI systems that can handle longer, more complex workflows with greater reliability. These updates aren’t just about faster responses—they reflect a broader evolution in AI from reactive assistants to active collaborators.

With extended reasoning, memory, and tool use, the new models are built to take on work that once required sustained human focus—like debugging large codebases, responding to technical feedback, or autonomously developing new features. That opens new possibilities for teams looking to scale development, streamline operations, or prototype faster without sacrificing precision.

For companies, this means:

More integrated AI agents that can operate within existing tools and platforms
Fewer handoffs and interruptions in software workflows
Safer and more stable outputs, thanks to improvements in memory and reasoning safeguards

The release also signals how AI providers like Anthropic are addressing real concerns around long-term task performance and responsible deployment. By pairing raw model power with stricter safety testing and flexible usage controls, Claude 4 aims to support both innovation and trust.

As businesses explore how to build with AI rather than just around it, these models set a new bar for what collaboration with a virtual agent can look like—persistent, reliable, and increasingly aligned with human goals.

As the role of AI shifts from assistant to teammate, Claude 4 models offer a clearer view of what responsible, high-impact collaboration with machines can truly look like.

Editor’s Note: This article was created by Alicia Shapiro, CMO of AiNews.com, with writing, image, and idea-generation support from ChatGPT, an AI assistant. However, the final perspective and editorial choices are solely Alicia Shapiro’s. Special thanks to ChatGPT for assistance with research and editorial support in crafting this article.