Cache Economics

Cache economics is the principle that every design decision in claude-code bends toward preserving the prompt cache. The cache is not an optimization; it is the cost model. Breaking it multiplies costs by 5-10x.

Why Cache Is Load-Bearing

Claude Code's prompt cache stores the static portion of the system prompt (see system-prompt-assembly). When the cache hits, the API reuses pre-computed key-value pairs for the cached prefix, so the cached tokens are billed at a steep discount and only the new tokens are processed at full price. When the cache breaks, the entire prompt must be reprocessed from scratch.

The forked-agent-pattern achieves 92% prefix reuse across concurrent forks by keeping API request prefixes byte-identical. As a result, spawning 5 parallel agents costs nearly the same as spawning 1. Breaking cache alignment would make the same operation roughly 5x more expensive, since each fork would reprocess the full prompt.
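The fork arithmetic can be sketched as a rough cost model. This is an illustration only: the cache-read multiplier, the 100,000-token prompt size, and the function names are assumptions, not values or code from claude-code. Only the 92% reuse figure comes from the text above.

```typescript
// Assumed relative price of a cached input token versus an uncached one.
// Illustrative only; not a figure taken from the source.
const CACHE_READ_MULTIPLIER = 0.1;

/**
 * Relative input cost of one request with `promptTokens` tokens,
 * of which the fraction `reusedFraction` is served from the cache.
 */
function requestCost(promptTokens: number, reusedFraction: number): number {
  const cached = promptTokens * reusedFraction;
  const fresh = promptTokens - cached;
  return fresh + cached * CACHE_READ_MULTIPLIER;
}

/**
 * Total cost of `forks` requests: the first pays full price and
 * warms the cache, the rest reuse the shared prefix.
 */
function forkCost(promptTokens: number, forks: number, reusedFraction: number): number {
  if (forks < 1) return 0;
  const first = requestCost(promptTokens, 0);
  const rest = (forks - 1) * requestCost(promptTokens, reusedFraction);
  return first + rest;
}

// Hypothetical 100,000-token prompt, 5 forks.
const alignedCost = forkCost(100_000, 5, 0.92); // prefixes stay byte-identical
const brokenCost = forkCost(100_000, 5, 0); // every fork misses the cache
console.log({ alignedCost, brokenCost, ratio: brokenCost / alignedCost });
```

Under this model each additional fork pays full price only for its unshared tail, so the total grows slowly with the fork count as long as the prefixes stay aligned.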

What Bends Toward Cache

Design Decision | Cache Motivation
Plan Mode tool retention | Keep tool list identical between turns
Fork child inheritance | Byte-identical prefixes for cache sharing
DANGEROUS_ naming | Force developers to justify cache breaks
system-reminders in user messages | Avoid invalidating system prompt cache
assembleToolPool alphabetical sort | Stable tool ordering across changes
SYSTEM_PROMPT_DYNAMIC_BOUNDARY | Explicit split between cached/uncached
promptCacheBreakDetection.ts | Hash all request components to detect breaks
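
Two of the entries above, stable tool ordering and an explicit cached/uncached split, can be illustrated with a minimal sketch. The `ToolSpec` shape, the marker string, and both function names are hypothetical; the source names `assembleToolPool` and `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` but does not show their implementation.

```typescript
// Hypothetical tool shape; the real tool type is not shown in the source.
interface ToolSpec {
  name: string;
  description: string;
}

// Assumed marker value; only the constant's name appears in the source.
const DYNAMIC_BOUNDARY = "<<DYNAMIC_BOUNDARY>>";

/** Return a copy of the tools sorted by name, independent of registration order. */
function sortToolsByName(tools: ToolSpec[]): ToolSpec[] {
  // Plain code-unit comparison avoids locale-dependent ordering.
  return [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
}

/** Split a prompt into the cacheable static part and the per-turn dynamic part. */
function splitAtBoundary(prompt: string): { cached: string; dynamic: string } {
  const index = prompt.indexOf(DYNAMIC_BOUNDARY);
  if (index === -1) return { cached: prompt, dynamic: "" };
  return {
    cached: prompt.slice(0, index),
    dynamic: prompt.slice(index + DYNAMIC_BOUNDARY.length),
  };
}

// The same tools registered in two different orders serialize identically.
const read: ToolSpec = { name: "Read", description: "read a file" };
const bash: ToolSpec = { name: "Bash", description: "run a command" };
const orderA = JSON.stringify(sortToolsByName([read, bash]));
const orderB = JSON.stringify(sortToolsByName([bash, read]));
console.log(orderA === orderB);
```

Sorting by a deterministic key means that adding or reordering tool registrations elsewhere in the code cannot silently change the bytes of the request prefix.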

Cache Break Detection

promptCacheBreakDetection.ts hashes all API request components between turns. When it detects a cache break, it logs field-level diffs identifying which component changed. This monitoring drives architectural changes: the frequency of source comments of the form "per BQ [date]" suggests that BigQuery data drives these decisions.
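
The hash-and-diff idea can be sketched as follows. This is not the code in promptCacheBreakDetection.ts: the request field names are assumptions, and an FNV-1a-style hash over UTF-16 code units stands in for whatever hash the real module uses, purely to keep the sketch dependency-free.

```typescript
// Hypothetical request shape; the field names used below are assumptions.
type RequestComponents = Record<string, unknown>;

/** FNV-1a-style 32-bit hash over UTF-16 code units, returned as hex. */
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

/** Hash every top-level field of a request by its JSON serialization. */
function fingerprint(request: RequestComponents): Record<string, string> {
  const hashes: Record<string, string> = {};
  for (const key of Object.keys(request)) {
    hashes[key] = fnv1a(JSON.stringify(request[key]) ?? "undefined");
  }
  return hashes;
}

/** Names of fields that were added, removed, or changed between two turns. */
function changedFields(prev: Record<string, string>, next: Record<string, string>): string[] {
  const keys = new Set(Object.keys(prev).concat(Object.keys(next)));
  return Array.from(keys).filter((key) => prev[key] !== next[key]).sort();
}

// Two consecutive turns in which only the tool list changed.
const turn1 = fingerprint({ system: "You are helpful.", tools: ["Bash", "Read"] });
const turn2 = fingerprint({ system: "You are helpful.", tools: ["Bash", "Edit", "Read"] });
console.log(changedFields(turn1, turn2));
```

Comparing per-field hashes rather than one hash of the whole request is what makes a field-level diff possible: the log can name the component that broke the cache instead of only reporting that something changed.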

The --resume Cache Tax

Since v2.1.69, the --resume path has produced a triple cache-prefix discrepancy, causing 10-20x token consumption per request on resumed sessions. The resume path serializes the conversation differently from live sessions, which breaks cache alignment.
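
To see why a serialization difference is so costly, consider how little of a prefix survives an early divergence. The sketch below is a character-level stand-in for prefix matching (real caches match on tokens), and the key-order difference is an invented example; the source does not describe the actual discrepancies in the --resume path.

```typescript
/** Length of the longest common prefix of two strings, in UTF-16 code units. */
function sharedPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a.charCodeAt(i) === b.charCodeAt(i)) i++;
  return i;
}

// Two hypothetical serializations of the same one-message conversation.
// Only the key order differs, yet the strings diverge almost immediately.
const liveRequest = JSON.stringify([{ role: "user", content: "hello" }]);
const resumedRequest = JSON.stringify([{ content: "hello", role: "user" }]);

const shared = sharedPrefixLength(liveRequest, resumedRequest);
console.log(`${shared} of ${liveRequest.length} characters shared`);
```

Because a cache hit requires an identical prefix, everything after the first differing position must be reprocessed, no matter how similar the rest of the request is.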

The omitClaudeMd Optimization

Read-only agents (Explore, Plan) are spawned with omitClaudeMd: true. According to Anthropic's fleet data, this saves an estimated 5-15 GTok/week by avoiding the cost of loading CLAUDE.md three separate times for agents that only read code.
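
A minimal sketch of how such a flag might gate context assembly is shown below. Only the `omitClaudeMd` flag name comes from the text above; the `SpawnOptions` interface, the `buildAgentContext` function, and the section strings are hypothetical.

```typescript
// Only the `omitClaudeMd` flag name comes from the source; the rest is hypothetical.
interface SpawnOptions {
  omitClaudeMd?: boolean;
}

/** Assemble the context sections for a sub-agent, skipping CLAUDE.md when requested. */
function buildAgentContext(
  baseSections: string[],
  claudeMd: string,
  options: SpawnOptions = {},
): string[] {
  const sections = [...baseSections]; // copy so the caller's array is not mutated
  if (!options.omitClaudeMd) sections.push(claudeMd);
  return sections;
}

const baseSections = ["system prompt", "tool definitions"];
const exploreContext = buildAgentContext(baseSections, "# CLAUDE.md contents", { omitClaudeMd: true });
const defaultContext = buildAgentContext(baseSections, "# CLAUDE.md contents");
console.log(exploreContext.length, defaultContext.length);
```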

Key Claims

Sources