Cache Economics
Every design decision in claude-code bends toward prompt cache preservation. The cache is not an optimization; it is the cost model. Breaking it multiplies costs by 5-10x.
Why Cache Is Load-Bearing
Claude Code's prompt cache stores the static portion of the system prompt (see system-prompt-assembly). When the cache hits, the API reuses pre-computed key-value pairs for the cached prefix and charges full price only for the new tokens. When it breaks, the entire prompt must be reprocessed from scratch.
The forked-agent-pattern achieves 92% prefix reuse across concurrent forks by keeping byte-identical API request prefixes. Spawning 5 parallel agents costs nearly the same as spawning 1. Breaking cache alignment would make multi-agent operations 5x more expensive.
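The hit-versus-miss arithmetic can be sketched as follows. This is a minimal illustration, not the actual billing logic: the `CachePricing` shape, the `requestCost` helper, the 0.1 cache-read multiplier, and the 100,000-token prompt are all assumptions chosen for the example. Only the 92% prefix share comes from the text above.

```typescript
// Hypothetical pricing, expressed relative to the base input-token price.
// These multipliers are illustrative assumptions, not figures from the source.
interface CachePricing {
  cacheReadMultiplier: number; // relative cost of a token served from the cache
  uncachedMultiplier: number;  // relative cost of a token processed from scratch
}

// Cost of one request, measured in "base input token" units.
function requestCost(
  prefixTokens: number,
  newTokens: number,
  cacheHit: boolean,
  pricing: CachePricing
): number {
  // On a hit the prefix is billed at the cache-read rate; on a miss at the full rate.
  const prefixRate = cacheHit ? pricing.cacheReadMultiplier : pricing.uncachedMultiplier;
  return prefixTokens * prefixRate + newTokens * pricing.uncachedMultiplier;
}

// Assumed example: a 100,000-token prompt with a 92% shared prefix.
const pricing: CachePricing = { cacheReadMultiplier: 0.1, uncachedMultiplier: 1 };
const hitCost = requestCost(92_000, 8_000, true, pricing);
const missCost = requestCost(92_000, 8_000, false, pricing);
const missPenalty = missCost / hitCost;
```

Under these assumed numbers a cache miss costs a little under six times as much as a hit, which is the kind of multiplier that makes cache alignment a cost-model concern rather than a tuning detail.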
What Bends Toward Cache
| Design Decision | Cache Motivation |
|---|---|
| Plan Mode tool retention | Keep tool list identical between turns |
| Fork child inheritance | Byte-identical prefixes for cache sharing |
| DANGEROUS_ naming | Force developers to justify cache breaks |
| system-reminders in user messages | Avoid invalidating system prompt cache |
| assembleToolPool alphabetical sort | Stable tool ordering across changes |
| SYSTEM_PROMPT_DYNAMIC_BOUNDARY | Explicit split between cached/uncached |
| promptCacheBreakDetection.ts | Hash all request components to detect breaks |
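To illustrate one row of the table, the sketch below shows why a deterministic sort keeps the serialized tool list stable regardless of registration order. It is a hedged stand-in: `ToolSpec`, `sortToolsByName`, and `serializeTools` are hypothetical names, and the real assembleToolPool implementation is not shown in the source.

```typescript
// Hypothetical tool shape for illustration only.
interface ToolSpec {
  name: string;
  description: string;
}

// Sort by name using plain code-unit comparison so the order does not depend on locale.
function sortToolsByName(tools: readonly ToolSpec[]): ToolSpec[] {
  return tools.slice().sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
}

// Serialize the sorted list; identical tool sets yield identical strings,
// no matter the order in which the tools were registered.
function serializeTools(tools: readonly ToolSpec[]): string {
  return JSON.stringify(sortToolsByName(tools));
}
```

Two sessions that register the same tools in a different order then produce the same request bytes for the tool section, so the cached prefix remains valid.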
Cache Break Detection
promptCacheBreakDetection.ts hashes all API request components between turns. When a cache break is detected, it logs field-level diffs identifying which component changed. This monitoring drives architectural changes: the frequency of "per BQ [date]" comments in the source implies that BigQuery data drives every decision.
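The hash-and-diff idea can be sketched roughly as below. This is not the contents of promptCacheBreakDetection.ts: the function names, the component field names (`system`, `tools`, `model`), and the small FNV-1a-style hash are all assumptions made to keep the example self-contained.

```typescript
type ComponentHashes = Record<string, string>;

// FNV-1a-style 32-bit hash over UTF-16 code units. A stand-in for whatever
// hash the real detector uses, which the source does not specify.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return ("00000000" + (hash >>> 0).toString(16)).slice(-8);
}

// Hash each request component separately so a change can be traced to one field.
function hashComponents(components: Record<string, unknown>): ComponentHashes {
  const hashes: ComponentHashes = {};
  for (const field of Object.keys(components)) {
    hashes[field] = fnv1a(String(JSON.stringify(components[field])));
  }
  return hashes;
}

// Return the sorted names of fields whose hash differs between two turns.
function changedFields(prev: ComponentHashes, next: ComponentHashes): string[] {
  const seen: Record<string, boolean> = {};
  for (const field of Object.keys(prev)) seen[field] = true;
  for (const field of Object.keys(next)) seen[field] = true;
  return Object.keys(seen).filter((field) => prev[field] !== next[field]).sort();
}

// Hypothetical request components for two consecutive turns.
const previousTurn = hashComponents({ system: "static prompt", tools: ["Read", "Edit"], model: "m" });
const currentTurn = hashComponents({ system: "static prompt", tools: ["Edit", "Read"], model: "m" });
const changed = changedFields(previousTurn, currentTurn);
```

Here only the tool order differs between the two turns, so `changed` names the `tools` field as the component responsible for the break.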
The --resume Cache Tax
Since v2.1.69, the --resume path has produced a triple cache-prefix discrepancy, causing 10-20x token consumption per request on resumed sessions. The resume path serializes the conversation differently from live sessions, which breaks cache alignment.
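A small sketch shows why a serialization difference is so costly: prefix caching can only reuse bytes up to the first point of divergence. The key-order difference used here is a hypothetical example of "serializing differently", not the actual discrepancy in the resume path, and `sharedPrefixLength` is an illustrative helper.

```typescript
// Length of the common prefix of two serialized requests, in UTF-16 code units.
function sharedPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a.charCodeAt(i) === b.charCodeAt(i)) i++;
  return i;
}

// Same message, but with the keys emitted in a different order (assumed scenario).
const liveSerialized = JSON.stringify([{ role: "user", content: "hi" }]);
const resumedSerialized = JSON.stringify([{ content: "hi", role: "user" }]);
const shared = sharedPrefixLength(liveSerialized, resumedSerialized);
```

Although both strings encode the same conversation, they agree only on the first three characters, so almost nothing after the opening bracket could be served from a prefix cache.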
The omitClaudeMd Optimization
Read-only agents (Explore, Plan) are spawned with omitClaudeMd: true. Anthropic's fleet data estimates that this saves 5-15 GTok/week by avoiding the cost of loading CLAUDE.md three separate times for agents that only read code.
Key Claims
clm-20260409-d38a90870cce: Every design decision bends toward prompt cache preservation
clm-20260409-7defb956a391: Resumed sessions incur 10-20x cost from triple cache miss
Sources
src-20260409-a14e9e98c3cd: Internals
src-20260409-cbf9b6837f5f: Round 10: Quality Gap