Claude Mythos Preview

Description

Claude Mythos Preview is an unreleased frontier model announced on April 7, 2026 through Project Glasswing, Anthropic's first public system card for a model that was intentionally withheld from release due to its capabilities. Mythos represents a qualitative leap in both coding ability and offensive security capabilities, achieving benchmark results that substantially exceed all prior Claude models while simultaneously demonstrating unprecedented alignment scores.

Benchmark Performance

Benchmark Mythos Score Previous Best (Opus 4) Notes
SWE-bench Verified 93.9% ~72% Near-human software engineering performance
Firefox 147 exploits 181 working exploits 2 (Opus) Full browser exploitation capability
Control flow hijack 10/10 patched targets 0/10 (Opus) Complete bypass of patched vulnerabilities
Alignment evaluation Best by significant margin -- Described as most aligned model Anthropic has produced

Offensive Security Capabilities

The most striking aspect of Mythos is its offensive security performance. In controlled evaluations, Mythos generated 181 working exploits against Firefox 147 -- a fully patched, current-release browser -- compared to just 2 from Claude Opus 4. It achieved full control flow hijack on all 10 patched target programs in the evaluation suite, meaning it could take complete control of program execution despite the presence of security patches.

Beyond the benchmark suite, Mythos independently discovered a 23-year-old Linux kernel vulnerability that had evaded detection by the entire security research community. It also found and produced a working proof-of-concept for a Vim CVE in under 2 minutes, demonstrating the ability to find real-world vulnerabilities at superhuman speed.

Why It Was Withheld

Anthropic chose not to release Mythos, making it the first model for which they published a system card explaining the decision to withhold. The reasoning centered on the offensive security capabilities: while the model's coding and alignment improvements are desirable, the ability to generate hundreds of browser exploits and discover kernel vulnerabilities represents a capability that could cause serious harm if widely available. The Project Glasswing system card frames this as a case study in responsible scaling, where capability improvements cross a threshold requiring new deployment safeguards.

Implications for Claude Code

If Mythos were deployed as the backing model for claude-code, its 93.9% SWE-bench score would dramatically improve autonomous coding accuracy. The yolo-classifier and permission-pipeline would need significant hardening to account for a model capable of discovering and potentially exploiting vulnerabilities in the code it is asked to work on. The three-layer-verification system's threat model would need to expand to include a model that understands offensive security at an expert level.

Key claims

Relations

Sources

src-20260409-09a1b2325b23