The Architecture and Engineering of Claude Code Internals

Tags

internal-docs architecture

Content

Summary of Key Concepts, Ideas, and Insights

Synthesized from 17 sources in the NotebookLM notebook covering the March 2026 Claude Code source leak.


1. The Leak: How It Happened


2. Overall Architecture: A Platform Runtime, Not a Wrapper

Claude Code is described as "closer to VS Code's extension host or Emacs's Lisp core than to a typical AI wrapper."

Five-Layer Architecture

Layer                     | Contents
1 -- Entrypoints          | CLI / Desktop / Web / SDK / IDE Extensions
2 -- Runtime              | REPL loop / Query executor / Hook system / State manager
3 -- Engine               | QueryEngine / Context coordinator / Model manager / Compact
4 -- Tools & Capabilities | 100+ tools / Plugin / MCP / Skill / Agent / Command
5 -- Infrastructure       | Auth / Storage / Cache / Analytics / Bridge transport

The Bridge layer abstracts transport so the same QueryEngine powers every surface (terminal, desktop, web, IDE extensions, SDK).

The Agent Loop: Async Generator, Not State Machine

Most agent frameworks use state machines or event emitters. Claude Code instead drives its agent loop with an async generator, so the caller pulls events one at a time.
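
A minimal sketch of an agent loop written as an async generator, in Python for illustration only. The Event shape, fake_model, and run_tool are hypothetical stand-ins and are not taken from the Claude Code source.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class Event:
    kind: str      # "text", "tool_call", "tool_result", or "done"
    payload: str


async def fake_model(history: list[str]) -> Event:
    """Hypothetical model stub: request one tool call, then answer in text."""
    await asyncio.sleep(0)  # stand-in for a network round trip
    if not any(item.startswith("tool_result:") for item in history):
        return Event("tool_call", "ls")
    return Event("text", "The directory has 2 files.")


async def run_tool(command: str) -> str:
    """Hypothetical tool stub."""
    await asyncio.sleep(0)
    return f"output of {command}"


async def agent_loop(prompt: str, max_turns: int = 8) -> AsyncIterator[Event]:
    """Yield events as they occur; the caller drives the loop by iterating."""
    history = [f"user:{prompt}"]
    for _ in range(max_turns):
        event = await fake_model(history)
        yield event
        if event.kind == "tool_call":
            result = await run_tool(event.payload)
            history.append(f"tool_result:{result}")
            yield Event("tool_result", result)
        else:
            history.append(f"assistant:{event.payload}")
            break
    yield Event("done", "")


async def collect_kinds(prompt: str) -> list[str]:
    """Drain the generator and return the kinds of the events it produced."""
    return [event.kind async for event in agent_loop(prompt)]
```

Because the consumer pulls each event, it can stop iterating at any point, and no explicit state enum or listener registry is needed.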

Split Query Engine

The query engine is split into an outer loop and an inner loop. The separation means the outer loop can abort on cost without the inner loop knowing about budgets.
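
One way to picture the split, as a sketch under assumptions: inner_turns and run_with_budget are hypothetical names, and per-turn cost is simplified to a single number. The inner generator knows nothing about budgets; the outer loop simply stops consuming it.

```python
import asyncio
from typing import AsyncIterator


async def inner_turns(costs_usd: list[float]) -> AsyncIterator[float]:
    """Hypothetical inner loop: yields the cost of each completed turn.

    It has no notion of a budget.
    """
    for cost in costs_usd:
        await asyncio.sleep(0)  # stand-in for one model round trip
        yield cost


async def run_with_budget(costs_usd: list[float], max_usd: float) -> tuple[int, float]:
    """Hypothetical outer loop: stops consuming once spend reaches the budget."""
    spent = 0.0
    turns = 0
    gen = inner_turns(costs_usd)
    try:
        async for cost in gen:
            spent += cost
            turns += 1
            if spent >= max_usd:
                break  # abort: the inner loop is never asked for another turn
    finally:
        await gen.aclose()  # finalize the inner generator promptly
    return turns, spent
```

With four turns costing $0.25 each and a $0.50 budget, the outer loop stops after the second turn and the remaining turns are never started.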


3. System Prompt Architecture

Cache Boundary Design

The system prompt is split by a SYSTEM_PROMPT_DYNAMIC_BOUNDARY marker:

Section                            | Cache Scope        | TTL    | Cost
Identity, rules, tool descriptions | Global (all users) | 1 hour | 10% on hit
CLAUDE.md, git status, memory      | Per-session        | 5 min  | 10% on hit
Latest messages                    | Sliding window     | 5 min  | 10% on hit

Real sessions achieve 96% cache hit rates. At Opus 4.6 pricing, a 100K-token uncached prompt costs $0.50; cached, $0.05.
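
The arithmetic behind those figures, as a worked sketch. The $5 per million input tokens rate is inferred from the $0.50 per 100K figure above, and the blended number assumes the 96% hit rate applies per token and ignores any cache-write surcharge; both are assumptions, not values from the source.

```python
def prompt_cost_usd(tokens: int, usd_per_mtok: float, hit_rate: float = 0.0,
                    cached_multiplier: float = 0.10) -> float:
    """Cost of one prompt when a fraction `hit_rate` of its tokens is read from cache.

    Cached tokens are billed at `cached_multiplier` times the normal rate.
    """
    base = tokens / 1_000_000 * usd_per_mtok
    return base * (hit_rate * cached_multiplier + (1.0 - hit_rate))


uncached = prompt_cost_usd(100_000, 5.0)                 # no cache hits
fully_cached = prompt_cost_usd(100_000, 5.0, hit_rate=1.0)
blended = prompt_cost_usd(100_000, 5.0, hit_rate=0.96)   # assumed per-token hit rate
```

Under these assumptions the blended cost of a 100K-token prompt comes to about $0.068, compared with $0.50 uncached.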

The 80/28 Breakdown

The codebase contains ~80 prompts, of which 28 are system prompts assembled into four injection categories.

Tool Pool Ordering

Built-in tools are sorted alphabetically and placed first as a stable cache prefix. MCP tools are sorted alphabetically and appended after them. The two groups never interleave, so adding or removing MCP servers never invalidates the expensive built-in prefix cache.
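
A small sketch of why grouped ordering keeps the prefix stable. The function names and the MCP tool name "Analyzer" are hypothetical, chosen so that a single global sort would place the MCP tool ahead of the built-ins.

```python
def build_tool_pool(builtin: list[str], mcp: list[str]) -> list[str]:
    """Sorted built-ins first, then sorted MCP tools; the groups never interleave."""
    return sorted(builtin) + sorted(mcp)


def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Number of leading entries two tool pools have in common."""
    count = 0
    for left, right in zip(a, b):
        if left != right:
            break
        count += 1
    return count


BUILTIN = ["GrepTool", "BashTool", "FileReadTool"]

without_mcp = build_tool_pool(BUILTIN, [])
with_mcp = build_tool_pool(BUILTIN, ["Analyzer"])  # hypothetical MCP tool name

# For contrast: one global sort lets the MCP tool land in front of the built-ins.
naive_with_mcp = sorted(BUILTIN + ["Analyzer"])
```

With grouped ordering the three built-in entries remain a shared prefix after the MCP tool is added; with the naive global sort the shared prefix drops to zero.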

Tool Documentation Deduplication

An LLM side-query runs before each tool invocation to check if the tool's docs were already provided in recent turns. If so, the description is stripped to save context tokens.

Two-Tier Prompt: Internal vs. External

Anthropic internal engineers (USER_TYPE === 'ant') receive a materially different system prompt:

Directive     | External Customers  | Internal Employees
Verbosity     | "Short and concise" | "Err on the side of more explanation"
Verification  | Absent              | "Before reporting complete, verify it actually completed"
Collaboration | Absent              | "You're a collaborator, not just an executor"

Internal users also get access to a Verification Agent that double-checks work, addressing a known 29-30% false-claims rate in the current model.

Undercover Mode

When Anthropic employees work on public or open-source repos, an injected prompt section instructs Claude to hide that it is an AI, scrub internal codenames, and omit Co-Authored-By attribution.


4. Tool System

40+ Built-in Tools

Key tools include BashTool, PowerShellTool, FileReadTool, FileEditTool, FileWriteTool, GlobTool, GrepTool, AgentTool, WebFetchTool, WebSearchTool, NotebookEditTool, SkillTool, TaskTools, and more.

StreamingToolExecutor

Bash AST Parser Flaw

The Bash tool's security parser performs prefix-only checking rather than full AST decomposition. 52/52 tested attack patterns are exploitable via shell operator chaining (&&, ||, ;, |).
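
To illustrate the class of flaw, here is a simplified sketch contrasting a prefix-only check with a per-segment check. It is not the actual parser: the allow-list is hypothetical, and the segment splitter handles only the four operators named above (no command substitution, redirection, or backgrounding).

```python
import shlex

ALLOWED_PREFIXES = [("git", "status"), ("echo",)]  # hypothetical allow-list
OPERATORS = {"&&", "||", ";", "|"}


def prefix_only_allows(command: str) -> bool:
    """Flawed check: looks only at how the whole command line starts."""
    tokens = command.split()
    return any(tuple(tokens[:len(p)]) == p for p in ALLOWED_PREFIXES)


def split_segments(command: str) -> list[list[str]]:
    """Split a command line into segments at shell operators, honoring quotes."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    segments: list[list[str]] = []
    current: list[str] = []
    for token in lexer:
        if token in OPERATORS:
            segments.append(current)
            current = []
        else:
            current.append(token)
    segments.append(current)
    return segments


def per_segment_allows(command: str) -> bool:
    """Stricter check: every chained segment must match the allow-list."""
    return all(
        any(tuple(segment[:len(p)]) == p for p in ALLOWED_PREFIXES)
        for segment in split_segments(command)
    )
```

For `git status && rm -rf /`, the prefix-only check approves the whole line because it starts with an allowed command, while the per-segment check rejects it because the `rm` segment matches nothing on the allow-list. A quoted operator, as in `echo "a && b"`, stays inside one segment and is still allowed.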


5. Permission Model: ADVISOR

A 4-layer auto-approval pipeline evaluating actions against a reversibility x blast-radius framework:

  1. Fast-path rules -- glob/regex patterns. If matched, allow/deny instantly.
  2. acceptEdits simulation -- if the tool call would pass in file-edit mode, skip heavy classification.
  3. Read-only whitelist -- Read, Glob, Grep, LSP, WebSearch modify no state -> unconditionally allow.
  4. ML safety classifier -- a two-stage Sonnet 4.6 classifier:
       Stage 1: fast yes/no filter, tuned to minimize false negatives at the cost of more false positives.
       Stage 2: chain-of-thought reasoning on the same prompt body (~100% cache hit from Stage 1).

Reasoning-blind by design: Claude's own response text and tool outputs are stripped before the classifier sees anything. The agent cannot persuade the classifier to bypass a block. Denials are returned as tool results instructing the model to find a safer path (not session kills). 3 consecutive or 20 total denials trigger human escalation.
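
The layered evaluation and the escalation thresholds can be sketched as follows. This is a sketch under assumptions: the names are hypothetical, fast-path rules are modeled as globs over a single target string, deny rules are checked before allow rules (the source does not state an order), and the classifier is a stub callable rather than a model call.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Callable

READ_ONLY_TOOLS = {"Read", "Glob", "Grep", "LSP", "WebSearch"}


@dataclass
class ToolCall:
    tool: str
    target: str  # file path or command text


def decide(call: ToolCall,
           allow_globs: list[str],
           deny_globs: list[str],
           passes_accept_edits: Callable[[ToolCall], bool],
           classifier: Callable[[ToolCall], bool]) -> str:
    """Return "allow" or "deny", trying the cheapest layer first."""
    # Layer 1: fast-path rules (deny-before-allow is an assumption).
    if any(fnmatch(call.target, pattern) for pattern in deny_globs):
        return "deny"
    if any(fnmatch(call.target, pattern) for pattern in allow_globs):
        return "allow"
    # Layer 2: acceptEdits simulation.
    if passes_accept_edits(call):
        return "allow"
    # Layer 3: read-only tools modify no state.
    if call.tool in READ_ONLY_TOOLS:
        return "allow"
    # Layer 4: safety classifier (stubbed; True means the call is safe).
    return "allow" if classifier(call) else "deny"


class DenialTracker:
    """Escalate to a human after 3 consecutive or 20 total denials."""

    def __init__(self, max_consecutive: int = 3, max_total: int = 20) -> None:
        self.max_consecutive = max_consecutive
        self.max_total = max_total
        self.consecutive = 0
        self.total = 0

    def record(self, decision: str) -> bool:
        """Record a decision; return True when a human should be asked."""
        if decision == "deny":
            self.consecutive += 1
            self.total += 1
        else:
            self.consecutive = 0
        return (self.consecutive >= self.max_consecutive
                or self.total >= self.max_total)
```

Ordering the layers from cheapest to most expensive means the classifier only runs for calls that no rule, simulation, or read-only check could settle.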


6. Context Compression: Four Tiers of Managed Degradation

Tier | Mechanism       | When
1    | autoCompact     | Context approaching limit (13K buffer)
2    | apiMicrocompact | API-native
3    | reactiveCompact | After API returns HTTP 413 (prompt too long)
4    | snip            | Emergency: discard non-critical content

Known Bug: 250,000 Wasted API Calls/Day

Before the fix, 1,279 sessions had 50+ consecutive compaction failures (up to 3,272 per session), wasting ~250K API calls/day globally. Fix: MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3.
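
The fix amounts to a circuit breaker on consecutive failures. A minimal sketch, assuming a hypothetical AutoCompactBreaker class and a compact callable that raises on failure; only the constant name and its value come from the source, and resetting the counter on success is inferred from the word "consecutive".

```python
from typing import Callable

MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3


class AutoCompactBreaker:
    """Stop retrying compaction after too many failures in a row."""

    def __init__(self, limit: int = MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES) -> None:
        self.limit = limit
        self.failures = 0

    @property
    def open(self) -> bool:
        """True once the failure limit has been reached."""
        return self.failures >= self.limit

    def attempt(self, compact: Callable[[], None]) -> bool:
        """Run `compact()` unless the breaker is open; return True on success."""
        if self.open:
            return False  # skip the call entirely instead of retrying forever
        try:
            compact()
        except Exception:
            self.failures += 1
            return False
        self.failures = 0  # a success resets the consecutive count
        return True
```

With this guard, a session whose compaction always fails makes three attempts and then stops, instead of thousands.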


7. CLAUDE.md System


8. Auto-Memory: Claude Writing Its Own Notes


9. Auto-Dream: Background Memory Consolidation

Auto-Dream is inspired by the UC Berkeley "Sleep-time Compute" paper (arXiv:2504.13171). The system prompt literally reads: "You are performing a dream."

Four-Gate Trigger (cheapest checks first)

Gate | Check                                | Cost
1    | >= 24 hours since last consolidation | Trivial (timestamp)
2    | >= 5 sessions accumulated            | Low (scan file list)
3    | >= 10 minutes since last scan        | Trivial (timestamp)
4    | Acquire filesystem lock              | Medium (prevents concurrent dreams)
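
The gate sequence can be sketched as a chain of early returns. The function and parameter names are hypothetical; the thresholds and gate order follow the table above, and the session count and lock are passed in as callables so that later gates do no work when an earlier gate fails.

```python
import time
from typing import Callable, Optional

DAY_SECONDS = 24 * 60 * 60
SCAN_INTERVAL_SECONDS = 10 * 60
MIN_SESSIONS = 5


def should_dream(last_consolidated_at: float,
                 last_scan_at: float,
                 count_sessions: Callable[[], int],
                 try_lock: Callable[[], bool],
                 now: Optional[float] = None) -> bool:
    """Return True only if all four gates pass, checked in table order."""
    now = time.time() if now is None else now
    if now - last_consolidated_at < DAY_SECONDS:      # gate 1: timestamp
        return False
    if count_sessions() < MIN_SESSIONS:               # gate 2: scan file list
        return False
    if now - last_scan_at < SCAN_INTERVAL_SECONDS:    # gate 3: timestamp
        return False
    return try_lock()                                 # gate 4: filesystem lock
```

Because each gate returns early, the lock is only contended when the time and session thresholds have already been met.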

Four Phases

  1. Orient: Survey memory directory, read index, skim topic headings.
  2. Gather Signal: Targeted grep of daily logs (never reads transcripts end-to-end). Targets user corrections, explicit saves, recurring themes, architectural decisions.
  3. Consolidate: Convert relative timestamps to absolute dates, delete contradicted facts, merge duplicates, resolve conflicts.
  4. Prune & Index: Rewrite topic files, rebuild MEMORY.md under 200-line cap, remove stale entries.

One documented test processed 913 accumulated sessions in ~8-9 minutes without blocking the active terminal.


10. Multi-Agent System

Three Execution Models

Model              | Isolation                           | Use Case
InProcessTeammate  | AsyncLocalStorage                   | Same terminal, lightweight
LocalAgentTask     | Async background process            | Non-blocking parallel sub-agent
RemoteAgentTask    | Cloud container (CCR)               | Full cloud-sandboxed execution
Worktree isolation | Git worktree (own branch/directory) | Parallel code editing without conflicts
DreamTask          | Background-only                     | Memory consolidation

Cache Inheritance

Subagents are spawned via a fork-join pattern inheriting the parent's CacheSafeParams. They produce byte-identical API request prefixes, achieving up to 92% prompt-cache reuse across the swarm. Spawning 5 parallel agents costs nearly the same as spawning 1.

Atomic Task Claiming

File locking in a current_tasks/ directory prevents race conditions without a central coordinator. To prevent recursion, a <fork_boilerplate_tag> marker stops forked agents from forking again.
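
A common way to implement coordinator-free claiming is exclusive file creation. The sketch below assumes one lock file per task named "<task_id>.lock"; that layout, the function names, and the task IDs are hypothetical, and only the current_tasks/ directory name comes from the source.

```python
import os
import tempfile


def try_claim(tasks_dir: str, task_id: str, agent_id: str) -> bool:
    """Claim a task by atomically creating its lock file.

    O_CREAT | O_EXCL makes creation fail if the file already exists, so at
    most one agent can win the claim.
    """
    os.makedirs(tasks_dir, exist_ok=True)
    lock_path = os.path.join(tasks_dir, f"{task_id}.lock")
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(agent_id)  # record the owner for debugging
    return True


def demo() -> list[bool]:
    """Two agents race for the same task; a third claims a different one."""
    with tempfile.TemporaryDirectory() as root:
        tasks_dir = os.path.join(root, "current_tasks")
        return [
            try_claim(tasks_dir, "task-001", "agent-a"),
            try_claim(tasks_dir, "task-001", "agent-b"),
            try_claim(tasks_dir, "task-002", "agent-b"),
        ]
```

The filesystem arbitrates the race, so no agent needs to ask a central process which tasks are free.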

omitClaudeMd Optimization

Read-only agents (explore, plan) are spawned with omitClaudeMd: true, saving 5-15 GTok/week internally.


11. Speculative Execution Engine

A fully dark-launched feature, gated behind the tengu_speculation flag in GrowthBook:

  1. When Claude finishes responding and generates a suggestion, it silently forks a background API call and begins executing the predicted prompt immediately.
  2. Execution runs in an overlay filesystem -- writes copy originals to overlay first; the real codebase is never modified.
  3. If accepted: overlay files copy back, speculated messages inject into conversation.
  4. If rejected: overlay is destroyed, API call aborted.
  5. Recursive pipelining: successful speculation immediately predicts and starts executing the next suggestion (isPipelined = true).

Permission tiers: Read/Glob/Grep run freely; Edit/Write redirect to overlay; Bash only if already auto-approved; everything else denied.
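
The overlay behavior in steps 2 to 4 can be sketched as a copy-on-write directory. This is an illustration under assumptions: the Overlay class and its methods are hypothetical, the overlay is a plain temporary directory, and files are handled as whole-text reads and writes.

```python
import os
import shutil
import tempfile


class Overlay:
    """Copy-on-write overlay: edits land in a scratch dir until accepted."""

    def __init__(self, root: str) -> None:
        self.root = root
        self.scratch = tempfile.mkdtemp(prefix="overlay-")
        self.touched: set[str] = set()

    def _materialize(self, rel: str) -> str:
        """On first touch, copy the original (if any) into the overlay."""
        dest = os.path.join(self.scratch, rel)
        if rel not in self.touched:
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            src = os.path.join(self.root, rel)
            if os.path.exists(src):
                shutil.copyfile(src, dest)
            self.touched.add(rel)
        return dest

    def read(self, rel: str) -> str:
        """Read the overlay copy if the file was touched, else the original."""
        base = self.scratch if rel in self.touched else self.root
        with open(os.path.join(base, rel), encoding="utf-8") as handle:
            return handle.read()

    def write(self, rel: str, text: str) -> None:
        """Write to the overlay only; the real tree is never modified here."""
        with open(self._materialize(rel), "w", encoding="utf-8") as handle:
            handle.write(text)

    def accept(self) -> None:
        """Copy every touched file back into the real tree, then clean up."""
        for rel in sorted(self.touched):
            target = os.path.join(self.root, rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copyfile(os.path.join(self.scratch, rel), target)
        self.discard()

    def discard(self) -> None:
        """Throw the overlay away; the real tree is left as it was."""
        shutil.rmtree(self.scratch, ignore_errors=True)
        self.touched.clear()
```

Rejection is cheap because it only deletes the scratch directory, which is what makes it safe to start work before the user has confirmed the suggestion.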


12. KAIROS: The Always-On Daemon (Unreleased)

Named after the Greek concept of "opportune timing" (vs. chronos/sequential time).

Community reaction: "KAIROS is Claude as a hyper-agentic OpenClaw. Always-on. Proactive. Responsive."


13. Anti-Distillation System (Three Layers)

Responding to documented industrial-scale distillation campaigns by DeepSeek, Moonshot AI, and MiniMax (16M exchanges via ~24,000 fraudulent accounts):

Layer 1: Fake Tool Injection

Layer 2: Connector-Text Summarization (Chain-of-Thought Hiding)

Layer 3: (Soft deterrent via the fake tools above)


14. Known Bugs and Engineering Issues

Cache-Destroying Bugs

File Descriptor Leak

Zero-Test Culture

Frustration Detection via Regex


15. Four Generalizable Engineering Patterns

Identified by community analysis as extractable for any agent framework:

  1. System prompt engineering with tool constraints: Describe tool risk levels, reversibility checks, and output format specs directly in the prompt. Claude Code's Bash tool description is 1,558 tokens specifically for this.
  2. Multi-agent atomic task claiming: File-lock-based claiming with typed task IDs prevents duplicate work without a central coordinator.
  3. Staged context compression: Several mechanisms at different trigger thresholds (the four tiers in section 6) prevent both premature summarization and catastrophic context loss.
  4. Dream-based memory consolidation: Append-only logs during sessions + periodic consolidation subagent. Never re-read transcripts end-to-end; always grep for specific patterns.

16. Easter Eggs


17. Community Impact