Anthropic released Claude Opus 4.8 yesterday morning at the same price as Opus 4.7 - $5 per million input tokens and $25 per million output - alongside a re-engineered fast mode that runs at roughly 2.5x the steady-state token rate for three times less than previous fast-mode pricing. The headline benchmark deltas, lined up against Opus 4.7 and the contemporaneous GPT-5.5 release, put Opus 4.8 at the top of the Artificial Analysis Intelligence Index v4.0 at 61.4 (Opus 4.7 was 57.3, GPT-5.5 xhigh 60.2), and at 84% on Online-Mind2Web for browser-agent end-to-end task completion - Anthropic's framing calls this a meaningful jump over both 4.7 and GPT-5.5 and the largest single-release move on that benchmark in the post-CUA era. Anthropic's reported evaluations also claim Opus 4.8 is roughly four times less likely than Opus 4.7 to let a flaw in code it has written pass unremarked, a hallucination-suppression signal corroborated by AA-Omniscience: Opus 4.8 is third on the index at +27, behind Gemini 3.1 Pro Preview (+33) and ahead of Opus 4.7 (+26).
The headline architectural / capability shift ships with two product-level companions. Dynamic workflows, in research preview for Claude Code on Enterprise, Team, and Max plans, lets Claude plan a task, fork hundreds of parallel subagents inside a single Claude Code session, then verify outputs against the project's existing test suite before reporting back - the model card calls out codebase-scale migrations across hundreds of thousands of lines from kickoff to merge as a worked example. Effort control, available on every plan, lets the user pick how hard Claude thinks per turn. The Messages API now accepts system entries inside the messages array, so an agent harness can update permissions, token budgets, or environment context mid-task without breaking the prompt cache or routing through a synthetic user turn - a quiet but pragmatic concession to long-running agents.
Beyond the in-product changes, the alignment readout is the more interesting paragraph. Anthropic's alignment team frames Opus 4.8 as reaching new highs on prosocial traits (supporting user autonomy, acting in the user's best interest) and reports rates of misaligned behavior - deception, cooperation with misuse - substantially below Opus 4.7 and roughly at parity with Claude Mythos Preview, the unreleased cybersecurity-tier model currently in restricted use through Project Glasswing. Anthropic states it expects to bring Mythos-class models to all customers in the coming weeks once stronger cyber safeguards are in place - that is the first concrete public signal about the next intelligence tier above Opus.
Reactions were measured. Simon Willison's read, published within hours, highlighted Anthropic's deliberately understated framing - the release notes call Opus 4.8 'a modest but tangible improvement' - and pulled the system card line that Opus 4.8 achieved the lowest hallucination rate of the six models tested mainly by abstaining on questions about which it was uncertain. Cursor (Michael Truell) reports Opus 4.8 exceeds prior Opus models at every effort level on CursorBench with meaningfully more efficient tool calling. Cognition CEO Scott Wu, quoted in the Anthropic release, says Opus 4.8 fixes the comment-verbosity and tool-calling issues that surfaced in Opus 4.7 and translates directly into faster capability gains for engineers building on Devin. Databricks' Hanlin Tang reports a 61% reduction in token cost on multimodal reasoning over PDFs and diagrams in Genie, suggesting the price-per-quality math is meaningfully better despite list price being unchanged.
- Simon Willison: refreshing to see an AI lab honestly describe a release as a minor incremental improvement. Highlights system card: lowest incorrect rate achieved by abstaining on uncertainty, not by answering more.
- Artificial Analysis: Opus 4.8 (max) reaches 61.4 on the Intelligence Index, ahead of GPT-5.5 xhigh at 60.2; tops GDPval-AA at 1890 Elo; still underperforms on IFBench at 62%.
- Latent Space AINews: positions release as part of Anthropic's flippening narrative against OpenAI alongside the $65B Series H; flags compute parity (5GW Amazon, 5GW Google/Broadcom, SpaceX Colossus) as the structural change.
- TechCrunch: frames as the final private fundraise before a highly anticipated IPO, with Opus 4.8 as product anchor for the IPO story.