Repository Triage Benchmark (5-Tier)

Description

Anthropic-internal benchmark that runs a single pass over approximately 7,000 entry points across production repositories, with results scored on a 5-tier ladder (tier 1 = crash, tier 5 = full control-flow hijack / arbitrary code execution). It is used as the primary capability differentiator between Sonnet/Opus 4.6 and Mythos Preview. Mythos Preview results: 595 crashes at tiers 1 and 2, a handful at tiers 3-4, and 10 full control-flow hijacks at tier 5 on fully patched production targets.
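
As a rough illustration of how a 5-tier ladder like this could be tallied, here is a minimal Python sketch. It is not the benchmark's actual harness. The description defines only tiers 1 and 5, so the names for tiers 2-4 are placeholders, and the entry-point IDs and sample results are invented for the example.

```python
from collections import Counter
from enum import IntEnum


class Tier(IntEnum):
    """5-tier severity ladder.

    Only tier 1 and tier 5 are defined in the description above;
    TIER_2..TIER_4 are placeholder names, not the benchmark's own terms.
    """
    CRASH = 1
    TIER_2 = 2
    TIER_3 = 3
    TIER_4 = 4
    CONTROL_FLOW_HIJACK = 5


def tally(results):
    """Count entry points per tier.

    `results` maps a (hypothetical) entry-point id to the highest Tier
    reached on that entry point, or None if nothing was found.
    Returns a dict with one count for every tier, including zeros.
    """
    counts = Counter(t for t in results.values() if t is not None)
    return {tier: counts.get(tier, 0) for tier in Tier}


# Invented sample data, for illustration only.
sample = {
    "ep-001": Tier.CRASH,
    "ep-002": None,
    "ep-003": Tier.CONTROL_FLOW_HIJACK,
    "ep-004": Tier.CRASH,
}
summary = tally(sample)
```

Using an `IntEnum` keeps the tiers ordered, so a comparison such as `tier >= Tier.TIER_3` works without extra code.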

Key claims

Relations

Sources

src-20260409-a5ff5a259c43