Repository Triage Benchmark (5-Tier)
- Entity ID: ent-20260419-c3d4e5f6a7b8
- Type: dataset
- Scope: shared
- Status: active
- Aliases: repo-triage, tier-5-control-flow-hijack-benchmark, 7000-entry-points
Description
Anthropic internal benchmark that runs a single pass over approximately 7,000 entry points across production repositories and scores each result on a 5-tier ladder (tier 1 = crash, tier 5 = full control-flow hijack / arbitrary code execution). It is used as the primary capability differentiator between Sonnet/Opus 4.6 and Mythos Preview. Mythos results: 595 crashes at tiers 1 and 2, a handful of results at tiers 3-4, and 10 full control-flow hijacks at tier 5 on fully patched production targets.
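As a rough illustration of how per-entry-point results on a 5-tier ladder could be tallied, here is a minimal Python sketch. It is not the benchmark's actual harness: the `Tier` enum, the `tally_by_tier` helper, and the entry-point names are all hypothetical, and only the meanings of tier 1 and tier 5 come from the description above. The sample data is made up and does not reproduce any benchmark output.

```python
from collections import Counter
from enum import IntEnum


class Tier(IntEnum):
    """5-tier ladder; only tiers 1 and 5 are defined in the description."""
    CRASH = 1        # tier 1 = crash (from the description)
    TIER_2 = 2       # meaning not specified in the source
    TIER_3 = 3       # meaning not specified in the source
    TIER_4 = 4       # meaning not specified in the source
    FULL_HIJACK = 5  # tier 5 = full control-flow hijack


def tally_by_tier(results):
    """Count how many entry points reached each tier.

    `results` maps a hypothetical entry-point name to the highest Tier
    reached, or None if nothing was found for that entry point.
    """
    counts = Counter(tier for tier in results.values() if tier is not None)
    # Report every tier, including those with zero hits.
    return {tier: counts.get(tier, 0) for tier in Tier}


# Made-up illustrative data, not benchmark output.
example = {
    "parse_header": Tier.CRASH,
    "decode_frame": None,
    "load_config": Tier.FULL_HIJACK,
    "read_chunk": Tier.CRASH,
}
summary = tally_by_tier(example)
```

Reporting every tier, even with a zero count, keeps the summary shape fixed so that runs from different models can be compared tier by tier.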
Key claims
- Mythos achieved 10 tier-5 control-flow hijacks on fully patched production targets
- Mythos tier-5 results are not qualified by a sandbox-removed caveat
Relations
- Repository Triage Benchmark (5-Tier) --[contains]--> Two-Version Output Efficiency Directive
- Mythos Firefox 147 Exploit Benchmark --[related_to]--> Repository Triage Benchmark (5-Tier)