Cache Economics: The Load-Bearing Infrastructure

Prompt caching is not an optimization in Claude Code — it is the cost model itself. The entire architecture (forked agents, background daemons, tool registration order, MCP partitioning) is designed around preserving cache prefix hits. Breaking cache alignment doesn't just waste money — it can multiply costs by 10-20x.
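The size of that multiplier can be sanity-checked with simple arithmetic. The sketch below is illustrative only: it assumes cached prefix tokens are billed at 0.1x the base input price and freshly written cache tokens at 1.25x, and the function names are hypothetical. Check current pricing before relying on the numbers.

```python
# Assumed price multipliers relative to the base input-token price.
# These are illustrative assumptions, not values taken from Claude Code.
CACHE_READ_MULTIPLIER = 0.10   # tokens served from an existing cache entry
CACHE_WRITE_MULTIPLIER = 1.25  # tokens written to the cache for the first time


def turn_cost(prefix_tokens: int, new_tokens: int, prefix_cached: bool) -> float:
    """Cost of one request, measured in base-price input tokens."""
    if prefix_cached:
        prefix_cost = prefix_tokens * CACHE_READ_MULTIPLIER
    else:
        # A miss means the whole prefix is re-processed and re-written.
        prefix_cost = prefix_tokens * CACHE_WRITE_MULTIPLIER
    # Tokens appended this turn are new either way, so they are always written.
    return prefix_cost + new_tokens * CACHE_WRITE_MULTIPLIER


def miss_penalty(prefix_tokens: int, new_tokens: int) -> float:
    """How many times more a cache miss costs than a hit for the same turn."""
    hit = turn_cost(prefix_tokens, new_tokens, prefix_cached=True)
    miss = turn_cost(prefix_tokens, new_tokens, prefix_cached=False)
    return miss / hit
```

Under these assumptions, a long prefix with a small new turn approaches a 12.5x penalty on a miss, which sits inside the 10-20x range quoted above.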

The Core Mechanism

Claude Code achieves 92% overall prompt cache prefix reuse. This is not accidental; it requires:
1. Byte-identical API request prefixes across all agent forks
2. Cache-aware tool registration (built-in tools first, MCP tools after, never interleaved)
3. CacheSafeParams objects that enforce cache key identity
4. Only four safely overridable parameters (abortController, skipTranscript, skipCacheWrite, canUseTool)
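One way to picture the fork rule is a frozen parameter object plus an allow-list of overrides. This is a minimal sketch, not Claude Code's actual implementation: only the name CacheSafeParams and the four override names come from the text above, while the fields, the `cache_key` method, and the `fork` function are assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import dataclass

# The four overrides named in the text; none of them affects the request prefix.
SAFE_OVERRIDES = frozenset(
    {"abortController", "skipTranscript", "skipCacheWrite", "canUseTool"}
)


@dataclass(frozen=True)
class CacheSafeParams:
    """Hypothetical stand-in: everything that feeds the cached prefix."""
    system_prompt: str
    tools: tuple
    messages: tuple

    def cache_key(self) -> str:
        # Serialize deterministically so equal params give byte-identical prefixes.
        payload = json.dumps(
            [self.system_prompt, list(self.tools), list(self.messages)],
            separators=(",", ":"),
        ).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def fork(parent: CacheSafeParams, **overrides):
    """Return (params, options) for a child agent, rejecting unsafe overrides."""
    unsafe = set(overrides) - SAFE_OVERRIDES
    if unsafe:
        raise ValueError(f"override would change the cache key: {sorted(unsafe)}")
    # The child reuses the parent's frozen params untouched, so the key matches.
    return parent, dict(overrides)
```

In this sketch, `fork(parent, skipTranscript=True)` keeps the parent's cache key, while an attempt such as `fork(parent, effort="low")` raises before any request is built.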

Known Cache Bugs (as of v2.1.88)

Bug	Impact	Status
cch=00000 billing sentinel	Cache breaks for all subsequent requests when billing token echoes into conversation	Active
--resume cache tax	10-20x cost on first resumed turn because only the system prompt is cached	Active
Cross-terminal /effort cascade	Changing effort in one terminal nukes cache in all others via global settings.json	Active; detectable only with --debug
Autocompact death loop	1,279 sessions had 50+ consecutive failures, burning 250K API calls/day	Fixed in v2.1.88 (MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3)
PR #18143 effort:low on fork	Cache hit rate dropped 92.7% → 61%, cache writes spiked 45x	Reverted
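The autocompact fix in the table amounts to a consecutive-failure circuit breaker. The sketch below shows the general pattern under stated assumptions: only the constant name and its value of 3 come from the table, and the class and function are hypothetical.

```python
MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3  # value quoted in the table above


class AutocompactBreaker:
    """Hypothetical breaker: stop retrying after N failures in a row."""

    def __init__(self, limit: int = MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES):
        self.limit = limit
        self.consecutive_failures = 0

    def should_attempt(self) -> bool:
        return self.consecutive_failures < self.limit

    def record(self, succeeded: bool) -> None:
        # Any success resets the streak; a failure extends it.
        self.consecutive_failures = 0 if succeeded else self.consecutive_failures + 1


def run_autocompact(attempts, breaker: AutocompactBreaker) -> int:
    """Run compaction callables until the breaker trips; return calls made."""
    calls = 0
    for compact in attempts:
        if not breaker.should_attempt():
            break
        calls += 1
        breaker.record(compact())
    return calls
```

With a breaker like this, a session whose compaction always fails makes 3 calls rather than 50 or more.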

Cache Partitioning Strategy

MCP tools are sorted alphabetically and appended after all built-in tools. The two groups never interleave. This preserves the expensive built-in prefix cache when MCP servers change. Enterprise Jira MCP alone adds ~17,000 tokens; five active servers add 55,000+ tokens before the first user message.
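The effect of this ordering is easy to see with a toy tool list. In the sketch below the built-in names, the MCP tool names, and all function names are hypothetical placeholders. The `naive_tool_list` variant is included only as a contrast to show what interleaving would cost.

```python
# Hypothetical built-in tools in a fixed registration order.
BUILTIN_TOOLS = ("Bash", "Edit", "Read", "Write")


def build_tool_list(mcp_tools):
    """Built-ins first in fixed order, then MCP tools sorted alphabetically."""
    return list(BUILTIN_TOOLS) + sorted(mcp_tools)


def naive_tool_list(mcp_tools):
    """Contrast case: one case-insensitive sort that interleaves both groups."""
    return sorted(list(BUILTIN_TOOLS) + list(mcp_tools), key=str.lower)


def shared_prefix_len(a, b) -> int:
    """Number of leading entries two tool lists have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


before = ["jira__search", "github__pr"]  # MCP tools before a server change
after = ["slack__post", "jira__search"]  # MCP tools after the change

partitioned = shared_prefix_len(build_tool_list(before), build_tool_list(after))
interleaved = shared_prefix_len(naive_tool_list(before), naive_tool_list(after))
```

Here the partitioned lists still share all four built-in entries after the MCP change, while the interleaved lists share only the first two.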

Tool documentation deduplication saves tokens by stripping repeated tool descriptions via a lightweight LLM side-query: "Has the user seen this tool's description in the last N turns?"

The 5-Minute Cliff

The prompt cache expires after 5 minutes of inactivity. This creates a natural tension:
- SleepTool (KAIROS) must balance sleep duration against cache expiry
- Background tasks must complete within the cache window or accept cold-start costs
- Session resumption always breaks cache (different code path for message replay)
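The sleep-versus-expiry trade-off can be written as a small budget calculation. This is a sketch under assumptions, not SleepTool's actual logic: the function names and the 30-second safety margin are hypothetical, and it assumes the expiry timer restarts on each request that reads the prefix.

```python
CACHE_TTL_SECONDS = 5 * 60  # the 5-minute expiry described above


def warm_sleep_budget(idle_seconds: float = 0.0,
                      safety_margin_seconds: float = 30.0) -> float:
    """Longest sleep that should still wake before the cache expires."""
    return max(0.0, CACHE_TTL_SECONDS - safety_margin_seconds - idle_seconds)


def wakes_warm(sleep_seconds: float,
               idle_seconds: float = 0.0,
               safety_margin_seconds: float = 30.0) -> bool:
    """True if a sleep of this length is expected to end with a warm cache."""
    return sleep_seconds <= warm_sleep_budget(idle_seconds, safety_margin_seconds)
```

A caller could use `wakes_warm` to choose between shortening a sleep to stay inside the window and accepting a cold start on wake-up.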

Why This Matters

At Claude Code's scale (4% of all public GitHub commits), even small changes in cache efficiency have massive cost implications. The autocompact death loop wasted 250K API calls/day and was documented for 3 weeks before a fix shipped because, at $2.5B ARR, it was "a rounding error." For individual users, however, a 10-20x cost spike from the /effort cascade or the resume bug is a real bill.