Five-Layer Platform Architecture

Claude Code's architecture is organized into five layers, each with distinct responsibilities and trust boundaries. This layering explains many otherwise-puzzling design choices: why the 46K-line QueryEngine monolith exists, why safety is its own layer rather than being spread across the others, and why tools are registered in a specific order.

The Layers

Layer | Responsibility | Key Components
----- | -------------- | --------------
1. Core Agent Loop | Message processing, API calls, tool execution | query.ts (~30 lines of core loop), tt() (3,167-line orchestrator)
2. Tool Layer | Interface between LLM intentions and real world | 40+ tools, buildTool factory, StreamingToolExecutor
3. Safety Layer | Permission checks, risk classification, deny rules | Permission pipeline (4-layer stack), auto-mode classifier, bash security validator (2,592 lines)
4. Platform Services | Background daemons, memory, MCP, sessions | handleStopHooks, auto-dream, extract memories, session persistence, MCP client
5. Interface Layer | Terminal UI, IDE bridge, voice | React+Ink renderer, yoga-layout, bridge system (v2 WebSocket), voice system

Why This Matters

The agent loop is tiny. The core message loop in query.ts is ~30 lines: an async generator that yields tokens, processes tool calls, and checks budgets. The 46K-line QueryEngine monolith exists because the interesting complexity (retries, cache management, rate limits, streaming state) has to be co-located so that the interactions between these concerns can be reasoned about in one place.
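To make the "tiny loop" claim concrete, here is a minimal sketch of an async-generator agent loop. It is not the code in query.ts: every type and name below (Message, ModelTurn, agentLoop, tokenBudget) is hypothetical, and for brevity it yields whole messages rather than individual tokens.

```typescript
// Hypothetical types for illustration; none come from the Claude Code source.
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ToolCall = { name: string; input: string };
type ModelTurn = { text: string; toolCalls: ToolCall[]; tokensUsed: number };

type Model = (history: Message[]) => Promise<ModelTurn>;
type Tools = Record<string, (input: string) => Promise<string>>;

// The loop: call the model, yield its reply, run any requested tools,
// feed the results back, and stop on "no tool calls" or budget exhaustion.
async function* agentLoop(
  model: Model,
  tools: Tools,
  history: Message[],
  tokenBudget: number,
): AsyncGenerator<Message> {
  let spent = 0;
  while (true) {
    const turn = await model(history);
    spent += turn.tokensUsed;

    const reply: Message = { role: "assistant", content: turn.text };
    history.push(reply);
    yield reply;

    // Budget check and natural termination live in the same place.
    if (turn.toolCalls.length === 0 || spent >= tokenBudget) return;

    for (const call of turn.toolCalls) {
      const tool = tools[call.name];
      const output = tool ? await tool(call.input) : `unknown tool: ${call.name}`;
      const result: Message = { role: "tool", content: output };
      history.push(result);
      yield result;
    }
  }
}
```

Note what the sketch leaves out: retries, cache management, rate limiting, and streaming state. Those are the concerns the text attributes to the surrounding monolith, which is why the loop itself can stay this small.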

Safety is a testable, bypassable layer. The permission pipeline, auto-mode classifier, and bash security validator form a distinct layer that can be tested independently, bypassed for trusted contexts (--dangerously-skip-permissions), and audited separately. This is deliberate — distributing safety checks across layers makes them harder to verify.
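The value of keeping safety in one layer can be illustrated with a small sketch of a permission pipeline as a pure function. The check names, the regular expression, and the skipPermissions parameter are assumptions made for this example (the parameter merely stands in for the effect of --dangerously-skip-permissions); the real pipeline's rules and structure are not shown in this document.

```typescript
type Decision = "allow" | "deny" | "ask";
type ToolRequest = { tool: string; command: string };

// A check returns a decision, or undefined to defer to the next check.
type Check = (req: ToolRequest) => Decision | undefined;

// Hypothetical deny rule: block one obviously destructive shell pattern.
const denyRules: Check = (req) =>
  req.tool === "bash" && /\brm\s+-rf\s+\//.test(req.command) ? "deny" : undefined;

// Hypothetical risk classifier: read-only tools need no prompt.
const readOnlyAllow: Check = (req) =>
  req.tool === "read" || req.tool === "grep" ? "allow" : undefined;

// The whole layer is one function with no I/O, so it can be unit-tested
// and audited without starting the agent loop.
function checkPermission(
  req: ToolRequest,
  checks: Check[],
  skipPermissions = false,
): Decision {
  if (skipPermissions) return "allow"; // trusted context: bypass the layer
  for (const check of checks) {
    const decision = check(req);
    if (decision !== undefined) return decision;
  }
  return "ask"; // nothing matched: fall back to asking the user
}
```

Because the first matching check wins, ordering deny rules ahead of allow rules is itself a reviewable decision that lives in one list rather than being scattered across tool implementations.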

Background services share a single scheduling point. All nine background services funnel through handleStopHooks, executing during the gap between user messages. This coupling is the cost of cache preservation — background agents must share the parent's prompt cache prefix.
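A single scheduling point might look roughly like the sketch below. The class and method names are hypothetical, and running the services sequentially is an assumption of the example; this document does not describe how handleStopHooks orders or isolates its services.

```typescript
type BackgroundService = { name: string; run: () => Promise<void> };

// Hypothetical scheduler: every background service registers here and
// runs only when the agent's turn ends, in the gap between user messages.
class StopHookScheduler {
  private services: BackgroundService[] = [];

  register(service: BackgroundService): void {
    this.services.push(service);
  }

  // Runs each service in registration order. A failing service is
  // recorded by name and does not prevent the remaining ones from running.
  async handleStop(): Promise<string[]> {
    const failed: string[] = [];
    for (const service of this.services) {
      try {
        await service.run();
      } catch {
        failed.push(service.name);
      }
    }
    return failed;
  }
}
```

The coupling the text describes is visible here: every service shares one entry point and one moment in the turn lifecycle, so adding a new service means reasoning about everything else that runs in that same gap.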

Architectural Decisions That Follow

  1. Model as planner — no external DAG, routing, or planning layer. The LLM drives the loop directly.
  2. Grep over RAG — agentic search (read files, grep for patterns) over vector retrieval. More accurate, more secure, no staleness.
  3. Cache as load-bearing infrastructure — the five-layer architecture is shaped by cache economics. Tool registration order, background scheduling, and agent forking are all cache-preservation mechanisms.
  4. Dead-code elimination via compilation — Bun builds eliminate unused code paths, keeping the shipped binary focused.
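Decision 3 ties tool registration order to cache preservation. One way to see why order matters: a prompt cache keyed on a prefix can only be reused if the serialized tool list is byte-identical from request to request. The sketch below forces a deterministic order by sorting on name. The ToolDef shape, the buildToolPrefix helper, and the sort-by-name rule are all assumptions for illustration; the actual ordering Claude Code uses is not specified here.

```typescript
type ToolDef = { name: string; description: string };

// Serialize tool definitions in a deterministic order so that the
// resulting prompt prefix does not depend on registration order.
function buildToolPrefix(tools: ToolDef[]): string {
  const ordered = [...tools].sort((a, b) =>
    a.name < b.name ? -1 : a.name > b.name ? 1 : 0,
  );
  return ordered.map((t) => `${t.name}: ${t.description}`).join("\n");
}
```

With this helper, registering the same tools in a different order yields the same prefix string, and adding a tool whose name sorts last extends the prefix without changing the bytes that precede it. Any scheme with those two properties would serve the same purpose.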