Tokenizer-Effort-Cache Cost-Multiplier Model

Description

Analytical framing (Finout, Vellum) that treats the effective cost of Opus 4.7 as the product of three multipliers: tokenizer overhead (1.0-1.35x), effort allocation (xhigh uses roughly 2x the thinking tokens of high), and cache behavior (cache-friendly vs. cache-hostile prompts). Heavy auto-mode xhigh usage with poor caching compounds to 2-3x the cost of Opus 4.6 at high effort with stable prompts. The model recasts two insights from the leak, SYSTEM_PROMPT_DYNAMIC_BOUNDARY and the compression stages, as cost-control levers.
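The multiplicative structure can be sketched in a few lines of Python. This is an illustrative sketch, not the Finout or Vellum formulation: the class and field names are hypothetical, the cache factor of 1.1 is an assumed placeholder (the source gives no numeric cache multiplier), and applying the ~2x thinking-token ratio to the whole bill is a simplifying upper bound, since thinking tokens are only part of total spend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostScenario:
    """One workload, expressed as three multipliers against a baseline of 1.0."""

    tokenizer: float  # source range: 1.0 to 1.35
    effort: float     # ~2.0 for xhigh vs high; applied to the whole bill (assumption)
    cache: float      # hypothetical value; the source gives no number

    def multiplier(self) -> float:
        # The model is a plain product, so the three factors compound.
        return self.tokenizer * self.effort * self.cache


# Baseline: 4.6 at high effort with stable, cache-friendly prompts.
baseline = CostScenario(tokenizer=1.0, effort=1.0, cache=1.0)

# Heavy xhigh use, worst-case tokenizer overhead, cache-hostile prompts.
# The 1.1 cache factor is an assumed placeholder, not a measured figure.
hostile = CostScenario(tokenizer=1.35, effort=2.0, cache=1.1)

# Relative cost of the hostile scenario against the baseline.
relative_cost = hostile.multiplier() / baseline.multiplier()
```

With these assumed inputs, `relative_cost` falls inside the 2-3x band quoted above. Because the factors multiply, lowering any single one (for example, keeping the prompt prefix stable to improve cache behavior) scales the whole bill down proportionally.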

Key claims

Relations

Sources

src-20260423-22e662f6932a