Claude Opus 4.8 Released: Leads 6 of 7 Benchmarks, Same Pricing

Anthropic released Claude Opus 4.8 on May 28. Compared with its predecessor, Opus 4.7, the upgrade is relatively modest, but it improves on several core benchmarks. Pricing is unchanged, and the release brings several new features.

Benchmark Comparison: Won 6, Lost 1

Opus 4.8 was compared against its predecessor Opus 4.7, OpenAI's GPT-5.5, and Google's Gemini 3.1 Pro. Here are the results:

Benchmark	Opus 4.8	Opus 4.7	GPT-5.5	Gemini 3.1 Pro
SWE-Bench Pro (Agentic coding)	69.2%	64.3%	58.6%	54.2%
Terminal-Bench 2.1 (Terminal coding)	74.6%	66.1%	78.2%	70.3%
HLE without tools (Multidisciplinary reasoning)	49.8%	46.9%	41.4%	44.4%
HLE with tools (Multidisciplinary reasoning)	57.9%	54.7%	52.2%	51.4%
OSWorld-Verified (Computer use)	83.4%	82.8%	78.7%	76.2%
GDPVal-AA (Knowledge work, Elo rating)	1890	1753	1769	1314
Finance Agent v2 (Financial analysis)	53.9%	51.5%	51.8%	43.0%
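The "won 6, lost 1" tally can be checked directly from the table. The short Python sketch below copies the scores from the table (the variable and function names are illustrative only), finds the strongest other model on each benchmark, and reports Opus 4.8's margin over it. Note that GDPVal-AA margins are in rating points, while the rest are in percentage points.

```python
# Scores copied from the comparison table. GDPVal-AA is a rating; the rest are percentages.
SCORES = {
    "SWE-Bench Pro":      {"Opus 4.8": 69.2, "Opus 4.7": 64.3, "GPT-5.5": 58.6, "Gemini 3.1 Pro": 54.2},
    "Terminal-Bench 2.1": {"Opus 4.8": 74.6, "Opus 4.7": 66.1, "GPT-5.5": 78.2, "Gemini 3.1 Pro": 70.3},
    "HLE (no tools)":     {"Opus 4.8": 49.8, "Opus 4.7": 46.9, "GPT-5.5": 41.4, "Gemini 3.1 Pro": 44.4},
    "HLE (with tools)":   {"Opus 4.8": 57.9, "Opus 4.7": 54.7, "GPT-5.5": 52.2, "Gemini 3.1 Pro": 51.4},
    "OSWorld-Verified":   {"Opus 4.8": 83.4, "Opus 4.7": 82.8, "GPT-5.5": 78.7, "Gemini 3.1 Pro": 76.2},
    "GDPVal-AA":          {"Opus 4.8": 1890, "Opus 4.7": 1753, "GPT-5.5": 1769, "Gemini 3.1 Pro": 1314},
    "Finance Agent v2":   {"Opus 4.8": 53.9, "Opus 4.7": 51.5, "GPT-5.5": 51.8, "Gemini 3.1 Pro": 43.0},
}


def margins(scores, model="Opus 4.8"):
    """Return {benchmark: (best_rival, margin)}; a negative margin means `model` lost."""
    out = {}
    for bench, row in scores.items():
        rivals = {name: s for name, s in row.items() if name != model}
        best = max(rivals, key=rivals.get)  # strongest competing model on this benchmark
        out[bench] = (best, round(row[model] - rivals[best], 1))
    return out


result = margins(SCORES)
for bench, (rival, margin) in result.items():
    print(f"{bench:<20} vs {rival:<15} {margin:+.1f}")

wins = sum(1 for _, m in result.values() if m > 0)
print(f"Won {wins}, lost {len(result) - wins}")
```

The only negative margin is Terminal-Bench 2.1 (GPT-5.5, -3.6). On several benchmarks the closest challenger is Opus 4.7 itself, which is why the generation-over-generation gains look small.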

On SWE-Bench Pro, Opus 4.8 scored 69.2%, 4.9 percentage points higher than Opus 4.7 and 10.6 points ahead of GPT-5.5. Among the percentage-scored benchmarks, this was Opus 4.8's widest lead over both GPT-5.5 and Gemini 3.1 Pro.

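The only stumble was Terminal-Bench 2.1, where GPT-5.5 won with 78.2% to Opus 4.8's 74.6%, a gap of 3.6 percentage points. That is despite Opus 4.8 gaining 8.5 points over Opus 4.7 here, its largest generational improvement in the table. If you primarily use AI for terminal-based coding, GPT-5.5 may still be the better choice.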

Humanity's Last Exam (HLE) is run in two settings: without tools and with tools. Opus 4.8 scored highest in both, but its 49.8% without tools shows how difficult the test is: even the strongest model answered only about half the questions correctly.

GDPVal-AA (knowledge work) showed the widest spread between models: Opus 4.8 scored 1890, while Gemini 3.1 Pro managed only 1314, a 576-point gap (Opus 4.8's rating is about 44% higher).
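Because GDPVal-AA reports ratings rather than percentages, a point gap is easier to read as a head-to-head preference rate than as a percentage difference. The sketch below assumes the standard Elo convention (a logistic curve on a 400-point scale); whether GDPVal-AA uses exactly that scale is an assumption, and `expected_win_rate` is a hypothetical helper name.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (assumes a 400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings from the GDPVal-AA row of the comparison table.
opus_48 = 1890
rivals = {"Opus 4.7": 1753, "GPT-5.5": 1769, "Gemini 3.1 Pro": 1314}

for name, rating in rivals.items():
    print(f"Opus 4.8 vs {name}: {expected_win_rate(opus_48, rating):.1%}")
```

Under that assumption, the 1890 vs. 1314 gap corresponds to Opus 4.8 being preferred in roughly 96.5% of head-to-head comparisons with Gemini 3.1 Pro, while its edge over GPT-5.5 (121 points) is closer to two out of three.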

Pricing and Modes

Regular-mode pricing is identical to Opus 4.7: $5 per million input tokens and $25 per million output tokens. At the exchange rate at the time of writing (1 USD ≈ 6.79 CNY), that is roughly ¥34 per million input tokens and ¥170 per million output tokens.
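To turn per-million-token prices into the cost of an actual request, scale each price by the token count. The sketch below uses only the regular-mode list prices and exchange rate quoted above; the 200k-input / 20k-output workload is a made-up example, and any discounts are ignored.

```python
# Regular-mode list prices quoted in the article, in USD per million tokens.
INPUT_USD_PER_MTOK = 5.0
OUTPUT_USD_PER_MTOK = 25.0
USD_TO_CNY = 6.79  # exchange rate quoted in the article; it will drift over time


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at regular-mode list prices (no discounts applied)."""
    return (input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


# Hypothetical workload: 200k input tokens and 20k output tokens.
usd = request_cost_usd(200_000, 20_000)
print(f"${usd:.2f} ≈ ¥{usd * USD_TO_CNY:.2f}")
```

For this workload the input side costs $1.00 and the output side $0.50: output tokens are five times pricier per token, so output-heavy tasks dominate the bill.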

The new fast mode runs 2.5x faster and costs $10 per million input tokens and $50 per million output tokens, double the regular-mode rate. Anthropic says this makes fast mode three times cheaper than its previous pricing.
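The fast-mode trade-off is simple arithmetic: twice the per-token price in exchange for 2.5x speed. The sketch below treats the 2.5x figure as a uniform reduction in wall-clock time, which is a simplifying assumption; the `Mode` class and the job size are illustrative, not part of any Anthropic API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    input_usd_per_mtok: float
    output_usd_per_mtok: float
    speedup: float  # speed relative to regular mode (assumed to apply uniformly)


REGULAR = Mode("regular", 5.0, 25.0, 1.0)
FAST = Mode("fast", 10.0, 50.0, 2.5)


def cost_usd(mode: Mode, input_tokens: int, output_tokens: int) -> float:
    """List-price cost of a job in the given mode."""
    return (input_tokens * mode.input_usd_per_mtok + output_tokens * mode.output_usd_per_mtok) / 1_000_000


def wall_clock_s(mode: Mode, regular_seconds: float) -> float:
    """Time for a job that takes `regular_seconds` in regular mode (linear-speedup assumption)."""
    return regular_seconds / mode.speedup


# Hypothetical job: 200k input tokens, 20k output tokens, 100 s in regular mode.
for mode in (REGULAR, FAST):
    print(f"{mode.name:<8} ${cost_usd(mode, 200_000, 20_000):.2f}  {wall_clock_s(mode, 100):.0f} s")
```

For this hypothetical job, fast mode costs $3.00 instead of $1.50 and finishes in about 40 s instead of 100 s. Whether that is worth paying depends on how much the waiting time costs you.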

There is also a new effort control. Users can choose among "high" (the default), "extra," and "max." On coding tasks, high effort uses about as many tokens as Opus 4.7's default setting but delivers better results. For maximum performance, extra and max are available at the cost of more tokens.

New Feature: Dynamic Workflows

Alongside Opus 4.8, Claude Code is getting a feature called "Dynamic Workflows." Claude can launch hundreds of parallel sub-agents in a single session to handle large-scale tasks, such as migrating a codebase of hundreds of thousands of lines and then running tests to verify the result. The feature is currently available only on Enterprise, Team, and Max plans.
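This article does not describe how Dynamic Workflows schedules its sub-agents, so the following is only a generic fan-out/fan-in sketch of the pattern described above, written with Python's `asyncio`: split the job into chunks, run a bounded number of workers in parallel, then run one verification pass at the end. `migrate_chunk`, `run_tests`, and the `max_parallel` cap are hypothetical placeholders, not Claude Code APIs.

```python
import asyncio


async def migrate_chunk(chunk_id: int) -> str:
    """Hypothetical stand-in for one sub-agent migrating one slice of the codebase."""
    await asyncio.sleep(0.01)  # simulate work
    return f"chunk-{chunk_id}: migrated"


async def run_tests(results: list[str]) -> bool:
    """Hypothetical verification step that runs once every sub-agent has finished."""
    await asyncio.sleep(0.01)
    return all(r.endswith("migrated") for r in results)


async def dynamic_workflow(num_chunks: int, max_parallel: int = 100) -> tuple[list[str], bool]:
    limit = asyncio.Semaphore(max_parallel)  # cap how many sub-agents run at once

    async def worker(i: int) -> str:
        async with limit:
            return await migrate_chunk(i)

    # Fan out: one task per chunk; gather returns results in input order.
    results = await asyncio.gather(*(worker(i) for i in range(num_chunks)))
    # Fan in: verify the combined result in a single pass.
    return list(results), await run_tests(list(results))


results, ok = asyncio.run(dynamic_workflow(num_chunks=300))
print(len(results), ok)
```

The semaphore is the key design choice: it lets the workflow spawn one task per chunk while still bounding how many run concurrently, and the test step only starts after every chunk has reported back.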

Honesty Improvements

Anthropic specifically highlighted Opus 4.8's improvements in "honesty." Its evaluations show that Opus 4.8 is roughly four times less likely than its predecessor to let flaws in its own code go unremarked. In other words, it is more willing to say "there's a problem here" than to pretend everything is fine.

In practice, this improvement may matter more than benchmark scores. Testers on Hacker News (HN) also noted that Opus 4.8 is better at keeping context and style consistent across long sessions, which is particularly useful in scenarios where "voice, taste, and technical execution all have to happen side-by-side."

Community Reaction

The HN post received 1,165 points and 144 comments. Reactions were mixed: some called it a "minor upgrade," others mocked Anthropic for "describing their own models as if they're discovering new species in the wild."

One particularly interesting comment compared the Opus series to iPhone updates: "every year since 2018 they say thinnest and fastest, but it's mostly the same and everyone buys it anyway." Others pointed out that every company cherry-picks the benchmarks it wins when releasing a model, calling it a "benchmark arms race."

To Anthropic's credit, its blog post described this release as a "modest but tangible improvement," a refreshingly honest take.

What's Next?

Anthropic mentioned two things: Claude Mythos is coming soon, positioned above Opus; and Project Glasswing is developing a new class of models with intelligence beyond Opus.

For most users, Opus 4.8 is available right now, the price hasn't gone up, and performance has improved. It's a no-brainer upgrade.