Cache Economics

Cache Economics is the principle that every design decision in Claude Code bends toward preserving the prompt cache. The cache is not an optimization; it is the cost model. Breaking it multiplies costs by 5-10x.

Why Cache Is Load-Bearing

Claude Code's prompt cache stores the static portion of the system prompt (see system-prompt-assembly). When the cache hits, the API reuses the previously computed key-value (attention) state for the cached prefix. It bills those prefix tokens at a steep discount and charges full price only for the new tokens that follow. When the cache breaks, the entire prompt must be reprocessed from scratch.
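
The following minimal TypeScript cost sketch for a single request makes the 5-10x figure concrete. The price multipliers are assumptions modeled on Anthropic's published prompt-caching rates: writes to the default 5-minute cache cost about 1.25x the base input price, and cache reads cost about 0.1x. The token counts are hypothetical, and `costOnHit`/`costOnMiss` are illustrative names rather than Claude Code functions.

```typescript
// Relative input-token prices. ASSUMPTION: modeled on Anthropic's published
// prompt-caching pricing (cache write ~1.25x base, cache read ~0.1x base).
const BASE_INPUT = 1.0;
const CACHE_WRITE = 1.25;
const CACHE_READ = 0.1;

interface TurnTokens {
  prefixTokens: number; // stable, cacheable prefix (tools, system prompt, history)
  newTokens: number;    // tokens after the cached prefix (new input, tool results)
}

// Cost of one request when the prefix is served from the cache.
function costOnHit(t: TurnTokens): number {
  return t.prefixTokens * CACHE_READ + t.newTokens * BASE_INPUT;
}

// Cost of one request when the prefix misses: it is reprocessed at full
// price and written back to the cache for the next turn.
function costOnMiss(t: TurnTokens): number {
  return t.prefixTokens * CACHE_WRITE + t.newTokens * BASE_INPUT;
}

// Hypothetical turn: a 50k-token prefix followed by 2k new tokens.
const turn: TurnTokens = { prefixTokens: 50_000, newTokens: 2_000 };
const ratio = costOnMiss(turn) / costOnHit(turn);
console.log(`cache miss costs ${ratio.toFixed(1)}x a cache hit`); // "cache miss costs 9.2x a cache hit"
```

In a long session the cached prefix comes to dominate the request, and the ratio approaches CACHE_WRITE / CACHE_READ = 12.5x. Output tokens are unaffected by caching and are left out.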

The forked-agent-pattern achieves 92% prefix reuse across concurrent forks by keeping API request prefixes byte-identical. Because the shared prefix is served from cache, spawning 5 parallel agents costs nearly the same as spawning 1. Breaking cache alignment would make such multi-agent operations roughly 5x more expensive.
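
The following simplified TypeScript illustration shows why byte-identical prefixes matter for forks. It treats cache reuse as a character-level longest common prefix. This is a simplification, because the real API matches cached prefixes at cache breakpoints. `buildFork`, the directive format, and the prefix text are all hypothetical.

```typescript
// HYPOTHETICAL illustration of byte-identical fork prefixes. The prefix string
// stands in for the parent's serialized system prompt, tools, and history.

// Length of the longest common prefix of two strings, in UTF-16 code units.
function commonPrefixLength(a: string, b: string): number {
  const n = Math.min(a.length, b.length);
  let i = 0;
  while (i < n && a.charCodeAt(i) === b.charCodeAt(i)) i++;
  return i;
}

// Each fork reuses the parent's prefix verbatim and appends only its own directive.
function buildFork(prefix: string, directive: string): string {
  return prefix + "\n[fork directive] " + directive;
}

const parentPrefix = "SYSTEM PROMPT + TOOLS + HISTORY ".repeat(100);
const forks = ["A", "B", "C", "D", "E"].map((d) => buildFork(parentPrefix, d));

// Every fork shares at least the full parent prefix with every other fork,
// so one cached copy of that prefix can serve all of them.
const minShared = Math.min(...forks.map((f) => commonPrefixLength(forks[0], f)));

// A fork that regenerates its prefix with one early difference (here a
// hypothetical timestamp) shares nothing with its siblings.
const drifted = "t=1 " + buildFork(parentPrefix, "F");
const driftedShared = commonPrefixLength(forks[0], drifted);

console.log(minShared >= parentPrefix.length); // true
console.log(driftedShared);                    // 0
```

A single differing byte near the start of the request is enough to forfeit the entire shared prefix. For this reason, forks inherit the parent's prefix rather than rebuilding it.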

What Bends Toward Cache

Design Decision	Cache Motivation
Plan Mode tool retention	Keep tool list identical between turns
Fork child inheritance	Byte-identical prefixes for cache sharing
`DANGEROUS_` naming	Force developers to justify cache breaks
system-reminders in user messages	Avoid invalidating system prompt cache
`assembleToolPool` alphabetical sort	Stable tool ordering across changes
`SYSTEM_PROMPT_DYNAMIC_BOUNDARY`	Explicit split between cached/uncached
`promptCacheBreakDetection.ts`	Hash all request components to detect breaks
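
Two of the rows above can be sketched in a few lines of TypeScript: stable tool ordering and the static/dynamic split. This is not Claude Code's `assembleToolPool` or its use of `SYSTEM_PROMPT_DYNAMIC_BOUNDARY`. The helpers `sortToolsStable` and `splitSystemPrompt`, the marker string, and the tool descriptions are hypothetical.

```typescript
// HYPOTHETICAL sketch: names and the marker value are illustrative, not Claude Code's.
interface ToolDef {
  name: string;
  description: string;
}

// Sort by name with plain code-unit comparison (not localeCompare), so the
// serialized tool list depends on neither registration order nor locale.
function sortToolsStable(tools: readonly ToolDef[]): ToolDef[] {
  return [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
}

// Plays the role of SYSTEM_PROMPT_DYNAMIC_BOUNDARY; the string value is made up.
const DYNAMIC_BOUNDARY_MARKER = "<<dynamic-boundary>>";

// Everything before the marker is static and cacheable. Everything after it
// (for example, a working directory) may change and is sent uncached.
function splitSystemPrompt(prompt: string): { cached: string; dynamic: string } {
  const i = prompt.indexOf(DYNAMIC_BOUNDARY_MARKER);
  if (i === -1) return { cached: prompt, dynamic: "" };
  return {
    cached: prompt.slice(0, i),
    dynamic: prompt.slice(i + DYNAMIC_BOUNDARY_MARKER.length),
  };
}

const bash: ToolDef = { name: "Bash", description: "run a shell command" };
const read: ToolDef = { name: "Read", description: "read a file" };
const grep: ToolDef = { name: "Grep", description: "search file contents" };

// Two registration orders serialize to the same bytes after sorting.
const orderA = JSON.stringify(sortToolsStable([read, bash, grep]));
const orderB = JSON.stringify(sortToolsStable([grep, read, bash]));
console.log(orderA === orderB); // true

const { cached, dynamic } = splitSystemPrompt(
  `You are an agent.\n${DYNAMIC_BOUNDARY_MARKER}\ncwd: /tmp/project`,
);
console.log(JSON.stringify({ cached, dynamic }));
```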

Cache Break Detection

`promptCacheBreakDetection.ts` hashes every component of the API request on each turn and compares the hashes against those from the previous turn. When it detects a cache break, it logs field-level diffs identifying which component changed. This monitoring drives architectural changes. Source comments of the form "per BQ [date]" appear frequently, which suggests that BigQuery telemetry drives these decisions.
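
The following is a minimal sketch of per-component hashing and diffing in the spirit of `promptCacheBreakDetection.ts`. It is not that file's code. The component names, the use of 32-bit FNV-1a, and the helper names (`hashComponents`, `detectCacheBreak`) are assumptions chosen to keep the example short and dependency-free.

```typescript
// HYPOTHETICAL sketch of per-component hashing for cache-break detection.
// Component names and the hash choice are illustrative assumptions.

// 32-bit FNV-1a over UTF-16 code units: short and dependency-free.
function fnv1a(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return ("00000000" + (h >>> 0).toString(16)).slice(-8);
}

// Hash each top-level request component separately so a break can be
// attributed to a specific field rather than to "the request".
function hashComponents(components: Record<string, unknown>): Map<string, string> {
  const hashes = new Map<string, string>();
  for (const key of Object.keys(components).sort()) {
    hashes.set(key, fnv1a(JSON.stringify(components[key]) ?? "undefined"));
  }
  return hashes;
}

// Return the sorted names of components whose hash changed, appeared, or disappeared.
function detectCacheBreak(prev: Map<string, string>, next: Map<string, string>): string[] {
  const keys = new Set(Array.from(prev.keys()).concat(Array.from(next.keys())));
  return Array.from(keys).filter((k) => prev.get(k) !== next.get(k)).sort();
}

const turn1 = hashComponents({ system: "You are an agent.", tools: ["Bash", "Read"], model: "model-x" });
const turn2 = hashComponents({ system: "You are an agent.", tools: ["Read", "Bash"], model: "model-x" });

// Reordering the tool list alone is enough to break the cached prefix.
console.log(JSON.stringify(detectCacheBreak(turn1, turn2))); // ["tools"]
```

Hashing each component separately is what makes a field-level diff possible. A single hash over the whole request could only report that something changed.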

The --resume Cache Tax

Since v2.1.69, the `--resume` path has produced a triple cache-prefix discrepancy, causing 10-20x token consumption per request in resumed sessions. The resume path serializes the conversation differently from a live session, so the resumed request's prefix no longer matches the cached one.
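
The following hypothetical TypeScript illustration shows how two serializers can disagree at the byte level while agreeing on meaning. The source does not say which fields differ on the resume path. The key-order difference below is invented purely to show the mechanism, and `serializeLive`, `serializeResumed`, and `canonicalJson` are illustrative names.

```typescript
// HYPOTHETICAL illustration; the real live and resume serializers are not
// described in the source, and the key-order difference here is invented.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Live path: builds the wire object as { role, content }.
function serializeLive(m: ChatMessage): string {
  return JSON.stringify({ role: m.role, content: m.content });
}

// Resume path: rehydrates from a transcript and builds { content, role }.
function serializeResumed(m: ChatMessage): string {
  return JSON.stringify({ content: m.content, role: m.role });
}

// Canonical form: object keys sorted recursively, so any path emits the same bytes.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => canonicalJson(v)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const fields = Object.keys(obj)
      .filter((k) => obj[k] !== undefined)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`);
    return `{${fields.join(",")}}`;
  }
  return JSON.stringify(value) ?? "null";
}

const msg: ChatMessage = { role: "user", content: "fix the failing test" };
const live = serializeLive(msg);
const resumed = serializeResumed(msg);

console.log(live === resumed); // false: same meaning, different bytes
console.log(canonicalJson(JSON.parse(live)) === canonicalJson(JSON.parse(resumed))); // true
```

Sorting keys is one common way to make serialization byte-stable, and it follows the same idea as the alphabetical tool sort. The source does not say whether the resume path is fixed this way.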

The `omitClaudeMd` Optimization

Read-only agents (Explore, Plan) are spawned with `omitClaudeMd: true`. Based on fleet data, Anthropic estimates this saves 5-15 GTok (billion tokens) per week by avoiding three separate loads of CLAUDE.md for agents that only read code.
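
The following minimal sketch shows how an `omitClaudeMd` flag could keep CLAUDE.md out of a read-only agent's context. It also shows the back-of-envelope arithmetic behind a weekly-token estimate. Only the `omitClaudeMd` name and the Explore/Plan agent names come from the text. Everything else is hypothetical, including `AgentType`, `buildAgentContext`, `weeklyTokensSaved`, and all numeric inputs.

```typescript
// HYPOTHETICAL sketch: only `omitClaudeMd` and the Explore/Plan agent names come
// from the text; the types, helpers, and numbers are illustrative.
type AgentType = "Explore" | "Plan" | "Other";

interface SpawnOptions {
  omitClaudeMd: boolean;
}

// Agents that only read code are spawned without CLAUDE.md in their context.
const READ_ONLY_AGENTS: ReadonlySet<AgentType> = new Set<AgentType>(["Explore", "Plan"]);

function spawnOptionsFor(agent: AgentType): SpawnOptions {
  return { omitClaudeMd: READ_ONLY_AGENTS.has(agent) };
}

function buildAgentContext(agent: AgentType, basePrompt: string, claudeMd: string): string {
  return spawnOptionsFor(agent).omitClaudeMd ? basePrompt : `${basePrompt}\n\n${claudeMd}`;
}

// Back-of-envelope: weekly savings = read-only spawns per week x CLAUDE.md tokens per spawn.
function weeklyTokensSaved(spawnsPerWeek: number, claudeMdTokens: number): number {
  return spawnsPerWeek * claudeMdTokens;
}

// Hypothetical inputs: 2.5M read-only spawns per week and a 2,000-token CLAUDE.md.
console.log(weeklyTokensSaved(2_500_000, 2_000) / 1e9); // 5 (GTok/week)
```

These made-up inputs put the result at the low end of the 5-15 GTok/week range. The point is only that a modest per-spawn saving, multiplied by fleet-scale spawn counts, reaches billions of tokens.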

Key Claims

clm-20260409-d38a90870cce: Every design decision bends toward prompt cache preservation
clm-20260409-7defb956a391: Resumed sessions incur 10-20x cost from triple cache miss

Sources

src-20260409-a14e9e98c3cd — Internals
src-20260409-cbf9b6837f5f — Round 10: Quality Gap