query.ts

The single most important file in the claude-code codebase: 1,729 lines built around a while(true) async generator that governs every turn of the agent. It is the core agentic loop.

The Async Generator Design

export async function* query(params: QueryParams): AsyncGenerator<
  StreamEvent | RequestStartEvent | Message | TombstoneMessage | ToolUseSummaryMessage,
  Terminal
>

The generator pattern has four non-obvious consequences:

1. Streaming is native — tokens flow through yield, not callbacks
2. Interruption is clean — a single AbortController cancels the whole generator
3. Budget control is trivial — check maxBudget at each iteration boundary
4. Tool calls are tail-recursive — tool_use -> tool_result -> continue is just another iteration

The Continue Site Pattern

The loop separates immutable QueryParams, which never change mid-loop, from mutable State, which is rebuilt every iteration. State updates use atomic reassignment:

state = {
  ...state,
  messages: newMessages,
  turnCount: nextTurnCount,
  transition: { reason: 'next_turn' }
}

A partial state update mid-iteration would mean corrupted conversation history and wasted API calls. The atomic reassignment eliminates that failure mode entirely.
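A runnable sketch of that split (the field names beyond those shown in the excerpt are assumptions):

```typescript
// Immutable params vs. per-iteration state; the successor state is built
// in full, then swapped in with one assignment.
interface QueryParams {
  readonly model: string;
  readonly maxTurns: number;
}

interface State {
  readonly messages: readonly string[];
  readonly turnCount: number;
  readonly transition: { reason: string };
}

function nextState(state: State, newMessages: string[]): State {
  // Either every field advances together or none do — there is no
  // intermediate point where messages and turnCount disagree.
  return {
    ...state,
    messages: newMessages,
    turnCount: state.turnCount + 1,
    transition: { reason: 'next_turn' },
  };
}
```

The caller then writes `state = nextState(state, msgs)` at a single continue site, so a thrown error mid-iteration leaves the previous state intact rather than a half-updated one.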

6-Stage Per-Turn Pipeline

| Stage | Lines | Function |
|---|---|---|
| 1. Pre-Request Compaction | 365-548 | Five compaction-pipeline tiers fire before any API call |
| 2. API Call & Streaming | 659-863 | streaming-tool-executor begins parallel tool execution during generation |
| 3. Error Recovery Cascade | 1062-1256 | 3-stage recovery for 413 errors and output-token exhaustion |
| 4. Stop Hooks & Token Budget | 1267-1355 | User hooks run (hooks-system); diminishing-returns detection |
| 5. Tool Execution | 1363-1520 | Streaming and batch tool results merged |
| 6. Post-Tool & Transition | 1547-1727 | Skills/memory harvested; MCP tools refreshed; state update |
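The per-turn control flow reduces to running these stages in order inside one loop iteration. A hypothetical skeleton (stage bodies and the `Stage` type are stand-ins, not the real code):

```typescript
// One loop iteration = six stages run in sequence against a shared context.
type Stage = (ctx: { log: string[] }) => void;

const pipeline: Stage[] = [
  (c) => c.log.push('compact'),    // 1. pre-request compaction
  (c) => c.log.push('stream'),     // 2. API call & streaming
  (c) => c.log.push('recover'),    // 3. error recovery cascade
  (c) => c.log.push('hooks'),      // 4. stop hooks & token budget
  (c) => c.log.push('tools'),      // 5. tool execution
  (c) => c.log.push('transition'), // 6. post-tool & state transition
];

function runTurn(): string[] {
  const ctx = { log: [] as string[] };
  for (const stage of pipeline) stage(ctx);
  return ctx.log;
}
```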

9 Typed Exit Paths

The Terminal return type is a typed enum with 9 distinct exit reasons, each mapped to a specific line range:

| Exit Reason | Line(s) | When |
|---|---|---|
| completed | 1264/1357 | Model finished normally |
| blocking_limit | 646 | Exceeded blocking budget |
| aborted_streaming | 1051 | User interrupted during streaming |
| aborted_tools | 1515 | User interrupted during tool execution |
| prompt_too_long | 1175/1182 | Context too large after all recovery |
| image_error | 977/1175 | Image processing failed |
| model_error | 996 | API error after retries |
| hook_stopped | 1520 | User hook blocked continuation |
| max_turns | 1711 | Turn budget exhausted |
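In TypeScript terms, a typed exit enum like this is naturally a discriminated union on the reason field. A sketch of what the Terminal shape might look like (the exact payloads are assumptions; only the reason names come from the table):

```typescript
// Each exit path is a variant; switching on `reason` is exhaustive.
type Terminal =
  | { reason: 'completed' }
  | { reason: 'blocking_limit' }
  | { reason: 'aborted_streaming' }
  | { reason: 'aborted_tools' }
  | { reason: 'prompt_too_long' }
  | { reason: 'image_error' }
  | { reason: 'model_error'; error: string } // payload assumed for illustration
  | { reason: 'hook_stopped' }
  | { reason: 'max_turns' };

function describe(t: Terminal): string {
  switch (t.reason) {
    case 'model_error':
      return `API error: ${t.error}`; // narrowing gives access to the variant's payload
    default:
      return t.reason;
  }
}
```

The benefit over a string return is that adding a tenth exit path forces every `switch` over Terminal to acknowledge it at compile time.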

Diminishing Returns Detection

const isDiminishing = (
  continuationCount >= 3 &&
  deltaSinceLastCheck < 500 &&
  lastDeltaTokens < 500
)

After 3 consecutive continuations producing fewer than 500 tokens each, the loop stops. This prevents the most dangerous pattern in agentic loops: the model repeatedly saying "let me try one more fix" while burning tokens and producing nothing.
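The check above is trivially extracted into a pure predicate (thresholds taken from the excerpt; the function name is ours):

```typescript
// True when the model has continued >= 3 times while each continuation
// produced fewer than 500 new tokens — the "one more fix" spiral.
function isDiminishing(
  continuationCount: number,
  deltaSinceLastCheck: number,
  lastDeltaTokens: number,
): boolean {
  return continuationCount >= 3 && deltaSinceLastCheck < 500 && lastDeltaTokens < 500;
}
```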

Error Recovery Cascade

Built on a single principle: start with the cheapest option, escalate to expensive only as a last resort.

Prompt-too-long (413) errors:

1. Context Collapse drain — cost: 0; flushes pre-computed reductions
2. Reactive Compact — cost: 1 API call; summarizes the entire history
3. Surface the error — the user sees an error only if both fail
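The cheapest-first ladder generalizes to "try each recovery in cost order, stop at the first success." A generic sketch (the `Recovery` shape and step names are illustrative stand-ins, not the real code):

```typescript
// Run recoveries in ascending cost order; the first one that succeeds wins,
// so the expensive tiers are never paid for unless the cheap ones fail.
type Recovery = { name: string; run: () => boolean };

function recover413(steps: Recovery[]): string {
  for (const step of steps) {
    if (step.run()) return step.name;
  }
  return 'surface_error'; // every tier failed: show the user
}
```

Usage mirrors the two tiers above: `recover413([contextCollapse, reactiveCompact])` returns the name of whichever tier fixed the context, or `'surface_error'` if neither did.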

Max-output-token errors:

1. Silent cap escalation — cost: 0; bumps 8K to 64K (ESCALATED_MAX_TOKENS)
2. Resume injection — cost: 1 API call; "Your previous response was truncated" (up to 3 attempts)
3. Recovery exhaustion — completes with whatever is available
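That ladder can be sketched as a bounded retry with an escalated cap. Only the constant name ESCALATED_MAX_TOKENS and the 8K-to-64K bump come from the text; `recoverOutput` and its `attempt` callback are assumptions for illustration:

```typescript
const ESCALATED_MAX_TOKENS = 64_000; // bumped from the 8K default per the text

// attempt() stands in for "re-issue the request at this token cap";
// it returns true if the response completed without truncation.
function recoverOutput(attempt: (maxTokens: number) => boolean): string {
  if (attempt(ESCALATED_MAX_TOKENS)) return 'cap_escalated'; // tier 1: just raise the cap
  for (let i = 0; i < 3; i++) {
    // tier 2: resume injection ("Your previous response was truncated"), bounded at 3
    if (attempt(ESCALATED_MAX_TOKENS)) return 'resumed';
  }
  return 'exhausted'; // tier 3: complete with whatever is available
}
```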

Supervised by QueryEngine.ts

QueryEngine.ts (1,295 lines) manages the session layer above query.ts. Its most interesting design choice is asymmetric transcript persistence: user messages are saved with await (blocking, required for --resume), while assistant messages are fire-and-forget (non-critical, reduces latency).
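The asymmetry is simple to express: the same save routine, awaited in one path and detached in the other. A sketch with assumed names (`persistUser`, `persistAssistant`, and the in-memory `save` are illustrative):

```typescript
// Stand-in for the transcript store; in reality this would be disk I/O.
const saved: string[] = [];
const save = async (msg: string): Promise<void> => {
  saved.push(msg);
};

// User messages: blocking. --resume must be able to see every user turn,
// so the loop does not proceed until the write lands.
async function persistUser(msg: string): Promise<void> {
  await save(msg);
}

// Assistant messages: fire-and-forget. Losing one on a crash is acceptable,
// and not awaiting keeps the write off the turn's critical path.
function persistAssistant(msg: string): void {
  void save(msg).catch(() => {
    /* non-critical: swallow and move on */
  });
}
```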
