query.ts

The single most important file in the claude-code codebase: 1,729 lines built around a while(true) async generator that governs every turn of the agent. It is the core agentic loop.

The Async Generator Design

export async function* query(params: QueryParams): AsyncGenerator<
  StreamEvent | RequestStartEvent | Message | TombstoneMessage | ToolUseSummaryMessage,
  Terminal
>

The generator pattern has four non-obvious consequences:

1. Streaming is native — tokens flow through yield, not callbacks
2. Interruption is clean — a single AbortController cancels the whole generator
3. Budget control is trivial — check maxBudget at each iteration boundary
4. Tool calls are tail-recursive — tool_use -> tool_result -> continue is just another iteration

The Continue Site Pattern

The loop separates immutable QueryParams, which never change mid-loop, from mutable State, which is rebuilt every iteration. State updates use atomic reassignment:

state = {
  ...state,
  messages: newMessages,
  turnCount: nextTurnCount,
  transition: { reason: 'next_turn' }
}

A partial state update mid-iteration would mean corrupted conversation history and wasted API calls. The atomic reassignment eliminates that failure mode entirely.
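A runnable sketch of that split (the field names beyond those shown in the excerpt are assumptions):

```typescript
// Immutable params vs. per-iteration state; the successor state is built
// in full, then swapped in with one assignment.
interface QueryParams {
  readonly model: string;
  readonly maxTurns: number;
}

interface State {
  readonly messages: readonly string[];
  readonly turnCount: number;
  readonly transition: { reason: string };
}

function nextState(state: State, newMessages: string[]): State {
  // Either every field advances together or none do — there is no
  // intermediate point where messages and turnCount disagree.
  return {
    ...state,
    messages: newMessages,
    turnCount: state.turnCount + 1,
    transition: { reason: 'next_turn' },
  };
}
```

The caller then writes `state = nextState(state, msgs)` at a single continue site, so a thrown error mid-iteration leaves the previous state intact rather than a half-updated one.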

6-Stage Per-Turn Pipeline

| Stage | Lines | Function |
|---|---|---|
| 1. Pre-Request Compaction | 365-548 | Five compaction-pipeline tiers fire before any API call |
| 2. API Call & Streaming | 659-863 | streaming-tool-executor begins parallel tool execution during generation |
| 3. Error Recovery Cascade | 1062-1256 | 3-stage recovery for 413 errors and output-token exhaustion |
| 4. Stop Hooks & Token Budget | 1267-1355 | User hooks run (hooks-system); diminishing-returns detection |
| 5. Tool Execution | 1363-1520 | Streaming and batch tool results merged |
| 6. Post-Tool & Transition | 1547-1727 | Skills/memory harvested; MCP tools refreshed; state update |
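The per-turn control flow reduces to running these stages in order inside one loop iteration. A hypothetical skeleton (stage bodies and the `Stage` type are stand-ins, not the real code):

```typescript
// One loop iteration = six stages run in sequence against a shared context.
type Stage = (ctx: { log: string[] }) => void;

const pipeline: Stage[] = [
  (c) => c.log.push('compact'),    // 1. pre-request compaction
  (c) => c.log.push('stream'),     // 2. API call & streaming
  (c) => c.log.push('recover'),    // 3. error recovery cascade
  (c) => c.log.push('hooks'),      // 4. stop hooks & token budget
  (c) => c.log.push('tools'),      // 5. tool execution
  (c) => c.log.push('transition'), // 6. post-tool & state transition
];

function runTurn(): string[] {
  const ctx = { log: [] as string[] };
  for (const stage of pipeline) stage(ctx);
  return ctx.log;
}
```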

9 Typed Exit Paths

The Terminal return type is a typed enum with 9 distinct exit reasons, each mapped to a specific line range:

| Exit Reason | Line(s) | When |
|---|---|---|
| completed | 1264/1357 | Model finished normally |
| blocking_limit | 646 | Exceeded blocking budget |
| aborted_streaming | 1051 | User interrupted during streaming |
| aborted_tools | 1515 | User interrupted during tool execution |
| prompt_too_long | 1175/1182 | Context too large after all recovery |
| image_error | 977/1175 | Image processing failed |
| model_error | 996 | API error after retries |
| hook_stopped | 1520 | User hook blocked continuation |
| max_turns | 1711 | Turn budget exhausted |
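In TypeScript terms, a typed exit enum like this is naturally a discriminated union on the reason field. A sketch of what the Terminal shape might look like (the exact payloads are assumptions; only the reason names come from the table):

```typescript
// Each exit path is a variant; switching on `reason` is exhaustive.
type Terminal =
  | { reason: 'completed' }
  | { reason: 'blocking_limit' }
  | { reason: 'aborted_streaming' }
  | { reason: 'aborted_tools' }
  | { reason: 'prompt_too_long' }
  | { reason: 'image_error' }
  | { reason: 'model_error'; error: string } // payload assumed for illustration
  | { reason: 'hook_stopped' }
  | { reason: 'max_turns' };

function describe(t: Terminal): string {
  switch (t.reason) {
    case 'model_error':
      return `API error: ${t.error}`; // narrowing gives access to the variant's payload
    default:
      return t.reason;
  }
}
```

The benefit over a string return is that adding a tenth exit path forces every `switch` over Terminal to acknowledge it at compile time.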

Diminishing Returns Detection

const isDiminishing = (
  continuationCount >= 3 &&
  deltaSinceLastCheck < 500 &&
  lastDeltaTokens < 500
)

After 3 consecutive continuations producing fewer than 500 tokens each, the loop stops. This prevents the most dangerous pattern in agentic loops: the model repeatedly saying "let me try one more fix" while burning tokens and producing nothing.
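The check above is trivially extracted into a pure predicate (thresholds taken from the excerpt; the function name is ours):

```typescript
// True when the model has continued >= 3 times while each continuation
// produced fewer than 500 new tokens — the "one more fix" spiral.
function isDiminishing(
  continuationCount: number,
  deltaSinceLastCheck: number,
  lastDeltaTokens: number,
): boolean {
  return continuationCount >= 3 && deltaSinceLastCheck < 500 && lastDeltaTokens < 500;
}
```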

Error Recovery Cascade

Built on a single principle: start with the cheapest option, escalate to expensive only as a last resort.

Prompt-too-long (413) errors:

1. Context Collapse drain — cost: 0; flushes pre-computed reductions
2. Reactive Compact — cost: 1 API call; summarizes the entire history
3. Surface the error — the user sees an error only if both fail
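The cheapest-first ladder generalizes to "try each recovery in cost order, stop at the first success." A generic sketch (the `Recovery` shape and step names are illustrative stand-ins, not the real code):

```typescript
// Run recoveries in ascending cost order; the first one that succeeds wins,
// so the expensive tiers are never paid for unless the cheap ones fail.
type Recovery = { name: string; run: () => boolean };

function recover413(steps: Recovery[]): string {
  for (const step of steps) {
    if (step.run()) return step.name;
  }
  return 'surface_error'; // every tier failed: show the user
}
```

Usage mirrors the two tiers above: `recover413([contextCollapse, reactiveCompact])` returns the name of whichever tier fixed the context, or `'surface_error'` if neither did.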

Max-output-token errors:

1. Silent cap escalation — cost: 0; bumps 8K to 64K (ESCALATED_MAX_TOKENS)
2. Resume injection — cost: 1 API call; "Your previous response was truncated" (up to 3 attempts)
3. Recovery exhaustion — completes with whatever is available
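That ladder can be sketched as a bounded retry with an escalated cap. Only the constant name ESCALATED_MAX_TOKENS and the 8K-to-64K bump come from the text; `recoverOutput` and its `attempt` callback are assumptions for illustration:

```typescript
const ESCALATED_MAX_TOKENS = 64_000; // bumped from the 8K default per the text

// attempt() stands in for "re-issue the request at this token cap";
// it returns true if the response completed without truncation.
function recoverOutput(attempt: (maxTokens: number) => boolean): string {
  if (attempt(ESCALATED_MAX_TOKENS)) return 'cap_escalated'; // tier 1: just raise the cap
  for (let i = 0; i < 3; i++) {
    // tier 2: resume injection ("Your previous response was truncated"), bounded at 3
    if (attempt(ESCALATED_MAX_TOKENS)) return 'resumed';
  }
  return 'exhausted'; // tier 3: complete with whatever is available
}
```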

Supervised by QueryEngine.ts

QueryEngine.ts (1,295 lines) manages the session layer above query.ts. Its most interesting design choice is asymmetric transcript persistence: user messages are saved with await (blocking, required for --resume), while assistant messages are fire-and-forget (non-critical, reduces latency).
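The asymmetry is simple to express: the same save routine, awaited in one path and detached in the other. A sketch with assumed names (`persistUser`, `persistAssistant`, and the in-memory `save` are illustrative):

```typescript
// Stand-in for the transcript store; in reality this would be disk I/O.
const saved: string[] = [];
const save = async (msg: string): Promise<void> => {
  saved.push(msg);
};

// User messages: blocking. --resume must be able to see every user turn,
// so the loop does not proceed until the write lands.
async function persistUser(msg: string): Promise<void> {
  await save(msg);
}

// Assistant messages: fire-and-forget. Losing one on a crash is acceptable,
// and not awaiting keeps the write off the turn's critical path.
function persistAssistant(msg: string): void {
  void save(msg).catch(() => {
    /* non-critical: swallow and move on */
  });
}
```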
