Firefox 148 Exploit Benchmark

Description

Internal Anthropic benchmark that tests whether a model can turn known Firefox 147 JS-engine vulnerabilities (all patched in Firefox 148) into working shell exploits. Opus 4.6 produced 2 working exploits across several hundred attempts (a success rate near 0%). On the same benchmark, Claude Mythos Preview produced 181 working exploits, plus 29 further results that achieved register control. This is the most striking single capability-gap data point in the leak cycle.

Key claims

Relations

Sources

src-20260409-28c9af66ed0c