Harness Underperformance Gap (77% vs 93%)
- Entity ID: ent-20260419-g1a0000000a8
- Type: concept
- Scope: shared
- Status: active
- Aliases: Claude Code vs Cursor harness gap, 16-point harness gap
Description
Leaked benchmark data showing that Opus 4.6 scores 77% on agent benchmarks when run through Claude Code's native harness, versus 93% when the same model is run through Cursor's harness. The 16-percentage-point gap is attributed to tool reference expansion and stop sequence sampling bugs, in which Capybara samples a stop token with ~10% probability on encountering
Key claims
- The same Opus 4.6 model scores 16 pp lower under the Claude Code harness than under the Cursor harness
- Harness quality, not model capability, causes systematic Claude Code underperformance
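The arithmetic behind these claims can be sketched in a few lines of Python. The first function separates the absolute gap (percentage points) from the relative shortfall against the higher score. The second is an illustration only: it assumes each trigger for the stop-sequence bug is an independent event with a fixed 10% stop probability, which the description does not state, and it makes no attempt to reproduce the 16 pp figure. The function names `harness_gap` and `survival_probability` are hypothetical.

```python
def harness_gap(native_score: float, other_score: float) -> tuple[float, float]:
    """Return (gap in percentage points, relative shortfall in percent)."""
    gap_pp = other_score - native_score
    # Shortfall relative to the better-scoring harness.
    relative_shortfall = 100.0 * gap_pp / other_score
    return gap_pp, relative_shortfall


def survival_probability(stop_prob: float, encounters: int) -> float:
    """Chance that no spurious stop token is sampled across `encounters`
    triggers, ASSUMING the triggers are independent (not stated in the source)."""
    return (1.0 - stop_prob) ** encounters


gap_pp, rel = harness_gap(77.0, 93.0)
print(f"gap: {gap_pp:.0f} pp, relative shortfall: {rel:.1f}%")

# How a ~10% per-trigger stop probability compounds over a long agent run.
for n in (1, 5, 10):
    print(f"{n} trigger(s): {survival_probability(0.10, n):.3f} chance of no early stop")
```

Under the independence assumption, a per-trigger stop probability of 10% leaves roughly a 35% chance of completing a run with ten triggers, which illustrates how a small sampling bug could matter on multi-step agent tasks.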
Relations
- Harness Underperformance Gap (77% vs 93%) --[observes]--> Fennec
- Harness Underperformance Gap (77% vs 93%) --[supports]--> Release Velocity Paradox