# query.ts

The single most important file in the claude-code codebase: a 1,729-line `while (true)` async generator that governs every turn of the agent. It is the core agentic loop.
## The Async Generator Design

```typescript
export async function* query(params: QueryParams): AsyncGenerator<
  StreamEvent | RequestStartEvent | Message | TombstoneMessage | ToolUseSummaryMessage,
  Terminal
>
```
The generator pattern has four non-obvious consequences:
1. Streaming is native — tokens flow through yield, not callbacks
2. Interruption is clean — a single AbortController cancels the whole generator
3. Budget control is trivial — check maxBudget at each iteration boundary
4. Tool calls are tail-recursive — tool_use -> tool_result -> continue is just another iteration
## The Continue Site Pattern
The loop separates immutable QueryParams (never change mid-loop) from mutable State (updated every iteration). State updates use atomic reassignment:
```typescript
state = {
  ...state,
  messages: newMessages,
  turnCount: nextTurnCount,
  transition: { reason: 'next_turn' }
}
```
A partial state update mid-iteration would mean corrupted conversation history and wasted API calls. The atomic reassignment eliminates that failure mode entirely.
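The params/state split can be sketched with two types and a pure transition function. The field names echo the snippet above, but the type shapes here are assumptions, not the real `query.ts` definitions:

```typescript
// Immutable for the life of the loop: never reassigned mid-conversation.
interface QueryParams {
  readonly model: string;
  readonly maxTurns: number;
}

// Replaced wholesale each iteration; readonly fields forbid partial mutation.
interface State {
  readonly messages: readonly string[];
  readonly turnCount: number;
  readonly transition: { reason: string };
}

// Build the entire next state first, then swap it in with one assignment.
// newMessages is the full updated history, computed before the swap.
function nextState(state: State, newMessages: string[]): State {
  return {
    ...state,
    messages: newMessages,
    turnCount: state.turnCount + 1,
    transition: { reason: "next_turn" },
  };
}
```

Because the old state object is never mutated, an error thrown while computing the next state leaves the previous, consistent state untouched.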
## 6-Stage Per-Turn Pipeline
| Stage | Lines | Function |
|---|---|---|
| 1. Pre-Request Compaction | 365-548 | Five compaction-pipeline tiers fire before any API call |
| 2. API Call & Streaming | 659-863 | streaming-tool-executor begins parallel tool execution during generation |
| 3. Error Recovery Cascade | 1062-1256 | 3-stage recovery for 413 errors and output-token exhaustion |
| 4. Stop Hooks & Token Budget | 1267-1355 | User hooks-system run; diminishing returns detection |
| 5. Tool Execution | 1363-1520 | Streaming and batch tool results merged |
| 6. Post-Tool & Transition | 1547-1727 | Skills/memory harvested; MCP tools refreshed; state update |
## 9 Typed Exit Paths
The Terminal return type is a typed enum with 9 distinct exit reasons, each mapped to a specific line range:
| Exit Reason | Line(s) | When |
|---|---|---|
| `completed` | 1264/1357 | Model finished normally |
| `blocking_limit` | 646 | Exceeded blocking budget |
| `aborted_streaming` | 1051 | User interrupted during streaming |
| `aborted_tools` | 1515 | User interrupted during tool execution |
| `prompt_too_long` | 1175/1182 | Context too large after all recovery |
| `image_error` | 977/1175 | Image processing failed |
| `model_error` | 996 | API error after retries |
| `hook_stopped` | 1520 | User hook blocked continuation |
| `max_turns` | 1711 | Turn budget exhausted |
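A natural encoding of these nine exits is a discriminated union on `reason`. The reasons below come straight from the table; the exact payload shape (and whether any variant carries extra detail) is an assumption:

```typescript
// Hypothetical shape of the Terminal return type; reasons match the table above.
type Terminal =
  | { reason: "completed" }
  | { reason: "blocking_limit" }
  | { reason: "aborted_streaming" }
  | { reason: "aborted_tools" }
  | { reason: "prompt_too_long" }
  | { reason: "image_error"; detail?: string }
  | { reason: "model_error"; detail?: string }
  | { reason: "hook_stopped" }
  | { reason: "max_turns" };

// An exhaustive switch: the compiler flags any reason left unhandled.
function describe(t: Terminal): string {
  switch (t.reason) {
    case "completed": return "Model finished normally";
    case "blocking_limit": return "Exceeded blocking budget";
    case "aborted_streaming": return "User interrupted during streaming";
    case "aborted_tools": return "User interrupted during tool execution";
    case "prompt_too_long": return "Context too large after all recovery";
    case "image_error": return "Image processing failed";
    case "model_error": return "API error after retries";
    case "hook_stopped": return "User hook blocked continuation";
    case "max_turns": return "Turn budget exhausted";
  }
}
```

The payoff of a typed enum over a string: adding a tenth exit reason turns every non-exhaustive consumer into a compile error rather than a silent fallthrough.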
## Diminishing Returns Detection
```typescript
const isDiminishing = (
  continuationCount >= 3 &&
  deltaSinceLastCheck < 500 &&
  lastDeltaTokens < 500
)
```
After 3 consecutive continuations producing fewer than 500 tokens each, the loop stops. This prevents the most dangerous pattern in agentic loops: the model repeatedly saying "let me try one more fix" while burning tokens and producing nothing.
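One plausible way to maintain the counters feeding that check, sketched under assumptions: `ReturnsTracker` and `update` are hypothetical names, and the reset-on-productive-turn policy is inferred, not confirmed from the source.

```typescript
// Hypothetical bookkeeping behind the diminishing-returns check.
interface ReturnsTracker {
  continuationCount: number; // consecutive low-output continuations
  lastDeltaTokens: number;   // tokens produced by the previous continuation
}

// Fold one continuation's token delta into the tracker.
// A productive turn (>= 500 tokens) resets the consecutive count.
function update(t: ReturnsTracker, deltaSinceLastCheck: number): ReturnsTracker {
  return deltaSinceLastCheck < 500
    ? { continuationCount: t.continuationCount + 1, lastDeltaTokens: deltaSinceLastCheck }
    : { continuationCount: 0, lastDeltaTokens: deltaSinceLastCheck };
}

// Mirrors the isDiminishing condition from the snippet above.
function isDiminishing(t: ReturnsTracker, deltaSinceLastCheck: number): boolean {
  return t.continuationCount >= 3 && deltaSinceLastCheck < 500 && t.lastDeltaTokens < 500;
}
```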
## Error Recovery Cascade
Built on a single principle: start with the cheapest option, escalate to expensive only as a last resort.
Prompt-too-long (413) errors:
1. Context Collapse drain — cost: 0; flushes pre-computed reductions
2. Reactive Compact — cost: 1 API call; summarizes entire history
3. Surface the error — user sees it only if both fail
Max-output-token errors:
1. Silent cap escalation — cost: 0; bumps 8K to 64K (ESCALATED_MAX_TOKENS)
2. Resume injection — cost: 1 API call; "Your previous response was truncated" (up to 3 attempts)
3. Recovery exhaustion — completes with whatever is available
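The max-output-token cascade can be sketched as a step function that always returns the next cheapest remaining option. `ESCALATED_MAX_TOKENS` is named in the source; everything else here (`Attempt`, `nextRecovery`, the 3-attempt constant) is an illustrative assumption:

```typescript
const ESCALATED_MAX_TOKENS = 64_000; // cap after the free escalation (8K -> 64K)
const MAX_RESUME_ATTEMPTS = 3;       // cap on paid "response was truncated" retries

interface Attempt {
  maxTokens: number;      // current output-token cap
  resumeAttempts: number; // resume injections tried so far
}

// Returns the next, more expensive recovery step, or null when exhausted
// (at which point the loop completes with whatever output it already has).
function nextRecovery(a: Attempt): Attempt | null {
  if (a.maxTokens < ESCALATED_MAX_TOKENS) {
    return { ...a, maxTokens: ESCALATED_MAX_TOKENS };      // step 1: cost 0
  }
  if (a.resumeAttempts < MAX_RESUME_ATTEMPTS) {
    return { ...a, resumeAttempts: a.resumeAttempts + 1 }; // step 2: 1 API call each
  }
  return null;                                             // step 3: exhausted
}
```

The ordering is the whole design: the free cap bump is always tried before any step that spends an API call.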
## Supervised by QueryEngine.ts
QueryEngine.ts (1,295 lines) manages the session above query.ts. Its most interesting design choice is asymmetric transcript persistence: user messages are saved with `await` (blocking, needed for `--resume`), while assistant messages are written fire-and-forget (non-critical, so skipping the wait reduces latency).
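The asymmetry amounts to one `await` versus a detached promise. A minimal sketch, assuming an in-memory stand-in for the transcript file; none of these function names are the real QueryEngine.ts API:

```typescript
const log: string[] = []; // stand-in for the on-disk transcript

async function persist(line: string): Promise<void> {
  await new Promise((r) => setTimeout(r, 0)); // simulated write latency
  log.push(line);
}

// User messages block: --resume must be able to read them back.
async function onUserMessage(text: string): Promise<void> {
  await persist(`user: ${text}`);
}

// Assistant messages are fire-and-forget: latency wins over durability,
// and a failed write is swallowed rather than surfaced to the turn.
function onAssistantMessage(text: string): void {
  void persist(`assistant: ${text}`).catch(() => {
    /* non-critical; drop the error */
  });
}
```

The trade-off is deliberate: losing an assistant line costs a little transcript fidelity, but losing a user line would make a resumed session diverge from what the user actually said.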
## Key Claims
- clm-20260409-c0bf3fcef119: 6-stage per-turn pipeline
- clm-20260409-98b860325f16: 9 typed exit paths
- clm-20260409-cb0d1bd0e4db: Parallel read-only tool execution during generation
## Sources
- src-20260409-6913a0b93c8b: Round 7: The Deepest Architecture Yet