Harness Underperformance Gap (77% vs 93%)

Entity ID: ent-20260419-g1a0000000a8
Type: concept
Scope: shared
Status: active
Aliases: Claude Code vs Cursor harness gap, 16-point harness gap

Description

Leaked benchmark data show Opus 4.6 scoring 77% on agent benchmarks when run through Claude Code's native harness, versus 93% through Cursor's harness with the same model. The 16-percentage-point gap is attributed to tool reference expansion and stop sequence sampling bugs: Capybara samples a stop token with ~10% probability when it encounters ... tags at the prompt tail, which terminates the response early. Cursor's harness does not trigger this pattern.
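The arithmetic behind the description can be sketched as follows. This is a minimal illustration, not a reconstruction of either harness: the function names are hypothetical, and the assumption that early stops are independent across encounters with a fixed ~10% probability is ours, not a claim from the source.

```python
def percentage_point_gap(lower_score: float, higher_score: float) -> float:
    """Absolute gap, in percentage points, between two scores given in percent."""
    return higher_score - lower_score


def no_early_stop_probability(p_stop: float, encounters: int) -> float:
    """Probability that no stop token is sampled across `encounters` trigger points.

    Assumes each encounter is independent with the same stop probability
    (an illustrative assumption, not something the source states).
    """
    if not 0.0 <= p_stop <= 1.0:
        raise ValueError("p_stop must be between 0 and 1")
    if encounters < 0:
        raise ValueError("encounters must be non-negative")
    return (1.0 - p_stop) ** encounters


# Scores reported in the description above.
claude_code_score = 77.0
cursor_score = 93.0

gap_pp = percentage_point_gap(claude_code_score, cursor_score)
relative_shortfall = gap_pp / cursor_score  # gap as a fraction of the Cursor score

print(f"gap: {gap_pp:.0f} pp, relative shortfall: {relative_shortfall:.1%}")

# How a ~10% per-encounter stop probability compounds over a task
# (the encounter counts are hypothetical).
for n in (1, 2, 5):
    print(n, round(no_early_stop_probability(0.10, n), 3))
```

Under the independence assumption, the chance of finishing a task without an early stop shrinks geometrically with the number of trigger points, which is one way a modest per-encounter probability could produce a large end-to-end gap.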

Key claims

The same Opus 4.6 model scores 16 pp lower under the Claude Code harness than under the Cursor harness
Harness quality, not model capability, causes systematic Claude Code underperformance

Relations

Harness Underperformance Gap (77% vs 93%) --[observes]--> Fennec
Harness Underperformance Gap (77% vs 93%) --[supports]--> Release Velocity Paradox

Sources

src-20260409-28c9af66ed0c