Cost and Token Economics

How Claude Code tracks, estimates, and optimizes token usage and cost. The system operates on three levels: real-time cost tracking per session, token estimation for threshold decisions, and architectural optimization for prompt cache economics.

Cost tracking

The cost tracker (src/cost-tracker.ts) accumulates per-model usage after every API response.

Pricing tiers

| Tier | Models | Input/MTok | Output/MTok | Cache Write/MTok | Cache Read/MTok |
|---|---|---|---|---|---|
| Sonnet | Claude Sonnet 4.6 | $3 | $15 | $3.75 | $0.30 |
| Opus standard | Opus 4.5/4.6 | $5 | $25 | $6.25 | $0.50 |
| Opus legacy | Opus 4/4.1 | $15 | $75 | $18.75 | $1.50 |
| Opus fast | Opus 4.6 fast mode | $30 | $150 | $37.50 | $3.00 |
| Haiku 3.5 | Haiku 3.5 | $0.80 | $4 | $1.00 | $0.08 |
| Haiku 4.5 | Haiku 4.5 | $1 | $5 | $1.25 | $0.10 |

Web search: $0.01 per request across all tiers. Unknown models default to Opus standard pricing.
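The table above can be applied mechanically to a usage record. The sketch below is illustrative: the `Pricing` shape, tier keys, and `costUSD` signature are hypothetical, not the actual src/cost-tracker.ts API — only the rates and the Opus-standard fallback come from the text.

```typescript
// Rates are USD per million tokens, taken from the pricing table above.
// All names here are illustrative, not the real cost-tracker API.
interface Pricing {
  input: number;
  output: number;
  cacheWrite: number;
  cacheRead: number;
}

const PRICING: Record<string, Pricing> = {
  sonnet: { input: 3, output: 15, cacheWrite: 3.75, cacheRead: 0.3 },
  opusStandard: { input: 5, output: 25, cacheWrite: 6.25, cacheRead: 0.5 },
  opusLegacy: { input: 15, output: 75, cacheWrite: 18.75, cacheRead: 1.5 },
  opusFast: { input: 30, output: 150, cacheWrite: 37.5, cacheRead: 3 },
  haiku35: { input: 0.8, output: 4, cacheWrite: 1, cacheRead: 0.08 },
  haiku45: { input: 1, output: 5, cacheWrite: 1.25, cacheRead: 0.1 },
};

const WEB_SEARCH_COST = 0.01; // flat per request, all tiers

function costUSD(
  tier: string,
  usage: {
    inputTokens: number;
    outputTokens: number;
    cacheCreationInputTokens: number;
    cacheReadInputTokens: number;
    webSearchRequests: number;
  },
): number {
  // Unknown models fall back to Opus standard pricing.
  const p = PRICING[tier] ?? PRICING.opusStandard;
  const tokenCost =
    usage.inputTokens * p.input +
    usage.outputTokens * p.output +
    usage.cacheCreationInputTokens * p.cacheWrite +
    usage.cacheReadInputTokens * p.cacheRead;
  return tokenCost / 1_000_000 + usage.webSearchRequests * WEB_SEARCH_COST;
}
```

For example, a Sonnet turn consuming 1M uncached input tokens costs $3, while the same million tokens served from cache costs $0.30 — the 10x spread that makes prefix reuse the dominant cost lever.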

What's tracked

Per model: inputTokens, outputTokens, cacheReadInputTokens, cacheCreationInputTokens, webSearchRequests, costUSD, contextWindow, maxOutputTokens.

Session costs persist to project config and restore on --resume, so cost display is accurate across session resumption.

Token estimation

The token estimation service (src/services/tokenEstimation.ts) provides three accuracy levels:

| Method | Speed | Accuracy | Cost |
|---|---|---|---|
| API counting (countTokensWithAPI) | Slow | Exact | API call |
| Haiku fallback | Medium | Good | Cheap API call |
| Rough estimation (4 bytes/token) | Instant | ~2x variance | Free |

The rough estimation is file-type-aware: JSON uses 2 bytes/token (higher token density); everything else uses 4. Images and documents are a fixed 2000 tokens.
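The byte heuristic above is simple enough to sketch directly. The function name and signature are illustrative, not the actual tokenEstimation.ts API; only the ratios and the fixed attachment cost come from the text.

```typescript
// Sketch of the file-type-aware rough estimate described above.
const FIXED_ATTACHMENT_TOKENS = 2000; // images and documents

function roughEstimateTokens(byteLength: number, fileType: string): number {
  if (fileType === "image" || fileType === "document") {
    return FIXED_ATTACHMENT_TOKENS;
  }
  // JSON is token-dense (heavy punctuation), so ~2 bytes/token;
  // everything else uses the ~4 bytes/token heuristic.
  const bytesPerToken = fileType === "json" ? 2 : 4;
  return Math.ceil(byteLength / bytesPerToken);
}
```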

These estimates drive threshold decisions throughout the system — auto-compact triggers, context window budgets, tool result persistence.

Prompt cache economics

Prompt caching is not an optimization — it's the cost model. Claude Code achieves 92% overall prompt cache prefix reuse. The architecture enforces this:

Cache-preserving patterns

  1. Tool registration order — built-in tools sorted alphabetically as a contiguous prefix, MCP tools as a sorted suffix. Adding/removing an MCP server invalidates only the suffix.
  2. Fork agent cache sharing — sub-agents inherit the parent's exact system prompt and tool array, producing byte-identical API request prefixes. The child gets cache hits from the parent's cached prefix.
  3. Micro-compaction with cache edits — removes old tool results via the API's cache_edits mechanism without invalidating the cached prefix.
  4. One-shot agent optimization — Explore and Plan agents skip agentId/SendMessage/usage trailer, saving ~135 chars × 34M runs/week.
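Pattern 1 above can be sketched as a pure ordering function — the names are illustrative, not the actual registration code, but the invariant is the one described: a stable sorted prefix of built-ins, with MCP tools confined to a sorted suffix.

```typescript
// Cache-preserving tool ordering (sketch): built-in tools form a sorted,
// contiguous prefix and MCP tools a sorted suffix, so adding or removing
// an MCP server invalidates only the suffix of the cached prompt prefix.
function orderTools(builtIn: string[], mcp: string[]): string[] {
  const prefix = [...builtIn].sort(); // byte-identical across sessions
  const suffix = [...mcp].sort();     // changes only with MCP config
  return [...prefix, ...suffix];
}
```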

Cache-breaking risks

| Risk | Impact | Mitigation |
|---|---|---|
| --resume | 10-20x cost on first turn (only the system prompt is cached) | Known limitation |
| /effort cascade | Changing effort in one terminal nukes the cache in all others | --debug detection only |
| MCP server changes | Invalidates the MCP suffix cache | Partition keeps the built-in prefix intact |
| Auto-compact | Rewrites message history, breaking the cache | Forked agent reuses the parent cache for the summarization itself |
| 5-minute TTL | Cache expires after inactivity | SleepTool must balance sleep vs. the cache window |

Scale impact

At Claude Code's scale (4% of all public GitHub commits), even small cache efficiency changes have massive cost implications. The autocompact death loop wasted ~250K API calls/day globally. A PR that dropped cache hit rate from 92.7% to 61% was immediately reverted.

Tool result storage economics

Large tool outputs are persisted to disk rather than kept in conversation context:

  - maxResultSizeChars = 30,000 — threshold for disk persistence
  - MAX_PERSISTED_SIZE = 64MB — absolute limit
  - Images resized to a 20MB maximum

This prevents memory bloat during long sessions with hundreds of file reads, and keeps the conversation context budget for actual reasoning.
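The size thresholds above imply a simple placement decision. This sketch uses the constants from the text; the function name is hypothetical, and the "rejected" branch is an assumption about what the absolute limit means (the source does not say how over-limit results are handled).

```typescript
// Sketch of the size-based tool-result placement decision.
const maxResultSizeChars = 30_000;           // keep smaller results inline
const MAX_PERSISTED_SIZE = 64 * 1024 * 1024; // 64MB absolute limit

type Placement = "inline" | "disk" | "rejected";

function placeToolResult(sizeChars: number): Placement {
  if (sizeChars > MAX_PERSISTED_SIZE) return "rejected"; // assumed behavior
  if (sizeChars > maxResultSizeChars) return "disk";
  return "inline";
}
```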

Context window management

The auto-compact system manages context budget through three tiers:

  1. Microcompaction — continuous trimming of old tool results (zero API cost)
  2. Session memory compaction — uses extracted session memory as summary (zero API cost)
  3. Full compaction — LLM-generated structured summary (API cost, but uses cache-sharing fork)

Threshold: effectiveContextWindow minus a 13K buffer. For a 200K-context model, this triggers at ~167K tokens.
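For the stated numbers to line up (a 200K-context model triggering near 167K), effectiveContextWindow must already be smaller than the raw context window — presumably because some budget is reserved for model output. The sketch below assumes a ~20K output reserve purely to make the arithmetic consistent; the real derivation of effectiveContextWindow may differ.

```typescript
// Auto-compact threshold arithmetic (sketch). OUTPUT_RESERVE is an
// assumption chosen so a 200K-context model triggers near the stated
// ~167K; only the 13K buffer is given by the source.
const COMPACT_BUFFER = 13_000;
const OUTPUT_RESERVE = 20_000; // assumed reservation for model output

function autoCompactThreshold(contextWindow: number): number {
  const effectiveContextWindow = contextWindow - OUTPUT_RESERVE;
  return effectiveContextWindow - COMPACT_BUFFER;
}
```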