Cache Economics: The Load-Bearing Infrastructure
Prompt caching is not an optimization in Claude Code — it is the cost model itself. The entire architecture (forked agents, background daemons, tool registration order, MCP partitioning) is designed around preserving cache prefix hits. Breaking cache alignment doesn't just waste money — it can multiply costs by 10-20x.
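To make the 10-20x figure concrete, here is a rough arithmetic sketch. It assumes cached prefix tokens are billed at 0.1x the base input price and cache writes at 1.25x. Those multipliers and the function names are assumptions for illustration and do not come from this page.

```python
# Assumed price multipliers relative to the base input-token price.
# Illustrative assumptions only; they are not stated on this page.
CACHE_READ_MULTIPLIER = 0.1    # tokens served from an existing cache entry
CACHE_WRITE_MULTIPLIER = 1.25  # tokens written into the cache on a miss


def relative_input_cost(prompt_tokens: int, hit_rate: float) -> float:
    """Input cost of one request, in units of base-price tokens."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    cached = prompt_tokens * hit_rate
    uncached = prompt_tokens - cached
    return cached * CACHE_READ_MULTIPLIER + uncached * CACHE_WRITE_MULTIPLIER


def miss_penalty(prompt_tokens: int, warm_hit_rate: float, cold_hit_rate: float) -> float:
    """How many times more a cold request costs than a warm one."""
    warm = relative_input_cost(prompt_tokens, warm_hit_rate)
    cold = relative_input_cost(prompt_tokens, cold_hit_rate)
    return cold / warm
```

Under these assumed multipliers, a fully cold 100,000-token prompt costs 12.5 times as much as a fully warm one, which falls inside the 10-20x range quoted above.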
The Core Mechanism
Claude Code achieves 92% overall prompt cache prefix reuse. This is not accidental; it requires:
1. Byte-identical API request prefixes across all agent forks
2. Cache-aware tool registration (built-in tools first, MCP tools after, never interleaved)
3. CacheSafeParams objects that enforce cache key identity
4. Only four safely overridable parameters (abortController, skipTranscript, skipCacheWrite, canUseTool)
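
The requirements above can be sketched as a small guard around fork creation. This is a minimal illustration, not the actual CacheSafeParams implementation: the field names, `prefix_key`, and `fork_request` are hypothetical, and only the four override names are taken from the text.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any

# The four override names come from the text; everything else is hypothetical.
SAFE_OVERRIDES = frozenset(
    {"abortController", "skipTranscript", "skipCacheWrite", "canUseTool"}
)


@dataclass(frozen=True)
class CacheSafeParams:
    """Hypothetical stand-in for the fields that determine the cached prefix."""
    system_prompt: str
    tool_names: tuple[str, ...]
    model: str

    def prefix_key(self) -> str:
        # Serialize deterministically so equal params produce identical bytes.
        payload = json.dumps(
            {"system": self.system_prompt,
             "tools": list(self.tool_names),
             "model": self.model},
            sort_keys=True,
            separators=(",", ":"),
        ).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def fork_request(parent: CacheSafeParams, **overrides: Any) -> dict[str, Any]:
    """Build a fork's request, rejecting overrides that could alter the prefix."""
    unsafe = set(overrides) - SAFE_OVERRIDES
    if unsafe:
        raise ValueError(f"overrides would break cache identity: {sorted(unsafe)}")
    return {"prefix_key": parent.prefix_key(), "runtime": dict(overrides)}
```

In this sketch every fork inherits the parent's prefix key unchanged, and an attempt to pass anything outside the allow-list (for example a hypothetical `effort="low"`) fails loudly instead of silently producing a different prefix.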
Known Cache Bugs (as of v2.1.88)
| Bug | Impact | Status |
|---|---|---|
| cch=00000 billing sentinel | Cache breaks for all subsequent requests when billing token echoes into conversation | Active |
| --resume cache tax | 10-20x cost on first resumed turn — only system prompt cached | Active |
| Cross-terminal /effort cascade | Changing effort in one terminal invalidates the cache in all others via the global settings.json | Active; detectable only with --debug |
| Autocompact death loop | 1,279 sessions had 50+ consecutive failures, burning 250K API calls/day | Fixed in v2.1.88 (MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3) |
| PR #18143 effort:low on fork | Cache hit rate dropped 92.7% → 61%, cache writes spiked 45x | Reverted |
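
The autocompact fix in the table amounts to a cap on consecutive failures. A minimal sketch of that kind of circuit breaker follows; the constant's name and value are quoted from the table, while `AutocompactGuard` and `compact_with_guard` are hypothetical names used only for illustration.

```python
from typing import Callable

MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3  # name and value quoted in the table


class AutocompactGuard:
    """Hypothetical circuit breaker: stop retrying after too many failures in a row."""

    def __init__(self, limit: int = MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES) -> None:
        self.limit = limit
        self.consecutive_failures = 0

    def should_attempt(self) -> bool:
        return self.consecutive_failures < self.limit

    def record(self, succeeded: bool) -> None:
        # A success resets the streak; a failure extends it.
        self.consecutive_failures = 0 if succeeded else self.consecutive_failures + 1


def compact_with_guard(compact: Callable[[], bool],
                       guard: AutocompactGuard,
                       max_turns: int) -> int:
    """Call `compact` once per turn until the guard trips; return calls made."""
    calls = 0
    for _ in range(max_turns):
        if not guard.should_attempt():
            break
        calls += 1
        guard.record(compact())
    return calls
```

With a compaction step that always fails, a session that would otherwise retry on all 50 turns makes only 3 calls before the guard stops it.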
Cache Partitioning Strategy
MCP tools are sorted alphabetically and appended after all built-in tools. The two groups never interleave. This preserves the expensive built-in prefix cache when MCP servers change. Enterprise Jira MCP alone adds ~17,000 tokens; five active servers add 55,000+ tokens before the first user message.
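The partitioning rule can be illustrated with a short sketch. The tool names below are made up, and `order_tools` and `shared_prefix_len` are hypothetical helpers; the only behavior taken from the text is that built-ins come first and MCP tools are sorted alphabetically and appended.

```python
def order_tools(builtin_tools: list[str], mcp_tools: list[str]) -> list[str]:
    """Built-ins keep their fixed order; MCP tools are sorted and appended."""
    return list(builtin_tools) + sorted(mcp_tools)


def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Number of leading entries two tool lists have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical tool names, for illustration only.
builtins = ["Bash", "Read", "Edit"]
before = order_tools(builtins, ["jira_search", "github_pr"])
after = order_tools(builtins, ["slack_post"])
```

Here `before` and `after` still share all three built-in entries as a common prefix even though the MCP set changed completely, which is the property that keeps the built-in portion of the cache reusable.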
Tool documentation deduplication saves tokens by stripping repeated tool descriptions via a lightweight LLM side-query: "Has the user seen this tool's description in the last N turns?"
The 5-Minute Cliff
The prompt cache expires after 5 minutes of inactivity. This creates a natural tension:
- SleepTool (KAIROS) must balance sleep duration against cache expiry
- Background tasks must complete within the cache window or accept cold-start costs
- Session resumption always breaks cache (different code path for message replay)
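
The sleep-versus-expiry trade-off can be sketched as a small planning helper. This assumes the 5-minute window is measured from the most recent request; the function names and the 15-second safety margin are hypothetical and are not described in the source.

```python
CACHE_TTL_SECONDS = 5 * 60  # the 5-minute expiry described above


def cache_remaining(seconds_since_last_request: float) -> float:
    """Seconds left before the cached prefix is assumed to expire."""
    return max(0.0, CACHE_TTL_SECONDS - seconds_since_last_request)


def wakes_warm(sleep_seconds: float,
               seconds_since_last_request: float,
               safety_margin: float = 15.0) -> bool:
    """True if a sleep of this length should end inside the cache window."""
    remaining = cache_remaining(seconds_since_last_request)
    return sleep_seconds + safety_margin <= remaining


def clamp_sleep(requested_seconds: float,
                seconds_since_last_request: float,
                safety_margin: float = 15.0) -> float:
    """Shorten a sleep to fit the cache window when that is still possible.

    If the window has already closed (or nearly so), the cold start is
    unavoidable, so the requested duration is returned unchanged.
    """
    budget = cache_remaining(seconds_since_last_request) - safety_margin
    if budget <= 0:
        return requested_seconds
    return min(requested_seconds, budget)
```

For example, one minute after the last request a 10-minute sleep would be clamped to 225 seconds in this sketch, while the same request made after the window has closed is left at 10 minutes because the cold-start cost is already sunk.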
Why This Matters
At Claude Code's scale (4% of all public GitHub commits), even small changes in cache efficiency have large cost implications. The autocompact death loop wasted 250K API calls/day and was documented for 3 weeks before a fix shipped, because at $2.5B ARR it was "a rounding error." For individual users, though, a 10-20x cost spike from the /effort cascade or the --resume bug is a real bill.
Related Entities
- cache-economics: the entity page
- forked-agent-pattern: cache-sharing primitive
- handleStopHooks: background scheduling within cache window
- mcp-integration: cache-aware tool registration
- billing-sentinel-bug: cch=00000 cache break
- resume-cache-tax: --resume cost spike
- effort-cascade: cross-terminal cache invalidation