On the evening of April 16, 2026, Anthropic announced the release of its latest large model, Claude Opus 4.7. The model is available across all Claude products, the official API, and the three major cloud platforms operated by Amazon, Google, and Microsoft. Pricing is unchanged from the previous-generation Opus 4.6: $5 per million input tokens and $25 per million output tokens.

Core Upgrades for Claude Opus 4.7

Opus 4.7 performs better on complex software engineering tasks. It runs long tasks more stably and consistently, follows user instructions more strictly during execution, and verifies its own output before reporting results.

On the multimodal side, the model now accepts images with a longest edge of up to 2,576 pixels (approximately 3.75 megapixels), more than three times what previous Claude models supported. Across benchmarks, Opus 4.7 places in the top tier overall, with strong results in coding, reasoning, and multi-domain tasks.
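To make the resolution limit concrete, here is a minimal client-side sketch that computes the dimensions an image would need in order to fit within a 2,576-pixel longest edge while preserving its aspect ratio. The function name and the decision to downscale on the client are assumptions made for illustration; the article does not say how the API itself treats larger images.

```python
# Longest-edge limit quoted in the article for Opus 4.7.
MAX_LONG_EDGE = 2576


def fit_longest_edge(width: int, height: int, max_edge: int = MAX_LONG_EDGE) -> tuple[int, int]:
    """Return (width, height) scaled so the longest edge is at most max_edge.

    Hypothetical helper: images already within the limit are returned unchanged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    # Scale both sides by the same factor to keep the aspect ratio.
    new_width = max(1, round(width * max_edge / longest))
    new_height = max(1, round(height * max_edge / longest))
    return new_width, new_height


if __name__ == "__main__":
    for size in [(5152, 2898), (3000, 4000), (1000, 800)]:
        print(size, "->", fit_longest_edge(*size))
```

A 5152 x 2898 screenshot, for example, would be halved to 2576 x 1449, which is about 3.73 megapixels and in line with the roughly 3.75-megapixel figure quoted above.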

Regarding memory, Opus 4.7 improves the file-system-based memory mechanism, which lets it retain key notes across sessions during long tasks. In third-party evaluations such as GDPval-AA and Finance Agent, Opus 4.7 achieved state-of-the-art scores.

New Features and Changes

Opus 4.7 introduces an xhigh (extra-high) effort level, positioned between high and max. It gives users finer control over the trade-off between reasoning depth and response latency on difficult problems. In Claude Code, the default effort level has been raised to xhigh for all plans.

The API adds a "Task Budget" feature (in public beta) that lets developers set an approximate token budget, so the model knows where to spend more and where to economize during long tasks. Claude Code adds the /ultrareview command, a dedicated code review pass that reads through changes carefully to identify bugs and design issues.

However, two changes in Opus 4.7 affect token usage. First, it adopts an updated tokenizer that improves how the model processes text, at the cost of a higher token count for the same input: roughly 1.0 to 1.35 times the previous count, depending on content type. Second, the model thinks more at higher effort levels, especially in the later turns of agentic scenarios.
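The cost effect of the tokenizer change can be estimated with simple arithmetic. The sketch below uses the prices and the 1.0 to 1.35 multiplier quoted in this article; the function names are illustrative, and the extra thinking tokens are not modeled because the article gives no figure for them.

```python
# Prices quoted in the article (USD per million tokens).
INPUT_USD_PER_MTOK = 5.0
OUTPUT_USD_PER_MTOK = 25.0

# Token-count multiplier range quoted for the updated tokenizer.
TOKENIZER_FACTOR_RANGE = (1.0, 1.35)


def cost_usd(input_tokens: float, output_tokens: float) -> float:
    """Cost of one request at the listed per-million-token prices."""
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


def input_cost_range(old_input_tokens: int) -> tuple[float, float]:
    """Best-case and worst-case input cost for text that used to
    tokenize to old_input_tokens under the previous tokenizer."""
    low, high = TOKENIZER_FACTOR_RANGE
    return (cost_usd(old_input_tokens * low, 0),
            cost_usd(old_input_tokens * high, 0))


if __name__ == "__main__":
    low, high = input_cost_range(1_000_000)
    print(f"1M old-tokenizer input tokens: ${low:.2f} to ${high:.2f}")
```

Under these assumptions, input that previously cost $5.00 would cost between $5.00 and $6.75, even though the listed price per token is unchanged.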

Performance Benchmark Comparison

According to benchmark data, Opus 4.7 scored 64.3% on the SWE-bench Pro programming test, up from 53.4% for Opus 4.6, a single-generation gain of nearly 11 percentage points that puts it ahead of GPT-5.4 (57.7%) and Gemini 3.1 Pro (54.2%). In visual reasoning, the CharXiv score rose from 69.1% to 82.1%, helped by the newly supported 2,576-pixel longest edge. On MCP-Atlas, an evaluation of tool calling at scale, Opus 4.7 reached 77.3%, exceeding GPT-5.4 (68.1%) and Gemini 3.1 Pro (73.9%).

However, on the agentic search evaluation BrowseComp, Opus 4.7's score fell from 83.7% to 79.3%, behind both GPT-5.4 (89.3%) and Gemini 3.1 Pro (85.9%).
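The figures above can be cross-checked with a few lines of code. This sketch only restates the scores quoted in this section (it reads "Gemini" as Gemini 3.1 Pro, and leaves out scores the article does not give); the helper names are illustrative.

```python
# Scores (percent) as quoted in this article; missing entries were not reported.
SCORES = {
    "SWE-bench Pro": {"Claude Opus 4.7": 64.3, "Claude Opus 4.6": 53.4,
                      "GPT-5.4": 57.7, "Gemini 3.1 Pro": 54.2},
    "CharXiv": {"Claude Opus 4.7": 82.1, "Claude Opus 4.6": 69.1},
    "MCP-Atlas": {"Claude Opus 4.7": 77.3, "GPT-5.4": 68.1,
                  "Gemini 3.1 Pro": 73.9},
    "BrowseComp": {"Claude Opus 4.7": 79.3, "Claude Opus 4.6": 83.7,
                   "GPT-5.4": 89.3, "Gemini 3.1 Pro": 85.9},
}


def generation_delta(benchmark: str):
    """Opus 4.7 minus Opus 4.6 in percentage points, or None if either is missing."""
    row = SCORES[benchmark]
    if "Claude Opus 4.7" not in row or "Claude Opus 4.6" not in row:
        return None
    return round(row["Claude Opus 4.7"] - row["Claude Opus 4.6"], 1)


def leader(benchmark: str) -> str:
    """Model with the highest quoted score on the benchmark."""
    row = SCORES[benchmark]
    return max(row, key=row.get)


if __name__ == "__main__":
    for name in SCORES:
        print(f"{name}: delta={generation_delta(name)}, leader={leader(name)}")
```

Run as a script, it prints the generation-over-generation change and the leading model for each benchmark, which makes the BrowseComp regression easy to spot next to the gains elsewhere.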

Detailed Comparison of Four Models

| Feature | Claude Opus 4.7 | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro |
| --- | --- | --- | --- | --- |
| Release Date | April 16, 2026 | March 6, 2026 | February 5, 2026 | February 19, 2026 |
| Developer | Anthropic | OpenAI | Anthropic | Google |
| Core Features | Enhanced complex software engineering, higher resolution image support, self-verifying output | Native computer usage capability, thought process preview, 1M token context | 1M token context window, adaptive thinking, agent task persistence | Three-layer thinking mode, 2M token context, strengthened core reasoning |
| Coding Ability | SWE-bench Pro: 64.3% | SWE-bench Pro: 57.7% | SWE-bench Pro: 53.4% | SWE-bench Pro: 54.2% |
| Multimodal Capability | Supports 2,576-pixel image processing (~3.75 MP) | Improved visual perception and document parsing | Standard image processing capability | Powerful multimodal understanding capability |
| Context Window | 200K tokens / 1M tokens (beta) | Up to 1M tokens | 200K tokens / 1M tokens (beta) | Up to 2M tokens |
| Pricing | Input $5/MTok, Output $25/MTok | Not specified (usually billed by usage) | Input $5/MTok, Output $25/MTok | Tiered pricing, same as previous generation |
| Special Features | xhigh mode, Task Budget, /ultrareview command | Thought process preview, native computer control, tool search | Adaptive thinking, compressed API, 128K output tokens | Three-layer thinking mode, Deep Think technology downgraded |
| Security Features | Project Glasswing cybersecurity protection | Continues existing security protections and introduces new open-source evaluations | Overall security is good | Hallucination control AA-Omniscience Index reaches 30 |

User Feedback and Industry Impact

User reviews of Opus 4.7 are somewhat polarized. Most users acknowledge the improvement in coding ability but have many complaints about its copywriting and conversational style. Some users also pointed out that the visual improvements touted in the official announcement come with significantly higher token consumption: in a test on the same design draft, Opus 4.7 used more than three times as many input tokens as Opus 4.6.

In long-context retrieval, Opus 4.6 scored 78.3%, while Opus 4.7 fell to 32.2%. Anthropic explained that the new model reports that information is missing rather than hallucinating an answer as before. User tests, however, show that it can still miss information that is clearly present in the context.

Conclusion

Claude Opus 4.7 is a significant advance for Anthropic in complex software engineering and multimodal processing, and it surpasses its major competitors on programming benchmarks. However, the higher token consumption and the regressions in certain areas, such as long-context retrieval, show that this is not a painless upgrade. For coding-heavy workloads, Opus 4.7 offers significant value; for broader applications, users may need to weigh costs against benefits.

With the successive releases of GPT-5.4, Gemini 3.1 Pro, and Claude Opus 4.7, the 2026 large model race has entered a heated stage, with vendors seeking the best balance between specialized capabilities, cost control, and user experience.