Auto-Memory
Auto-memory is a background extraction process in which Claude reads its own ongoing conversation and writes structured notes to a project-local memory directory. CLAUDE.md is a file the user writes as standing instructions; MEMORY.md is a file Claude writes for itself. The feature shipped in v2.1.32 (late February 2026).
How It Works
The extraction runs as a forked subagent, started via runForkedAgent() with querySource: 'session_memory'. The fork is granted exactly one tool: FileEdit, scoped to the memory directory. All other tools are denied. A sequential() guard prevents overlapping extractions.
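The two restrictions described above (a single permitted tool and no overlapping runs) can be illustrated with a minimal Python sketch. The names `is_allowed` and `SequentialGuard` are hypothetical and are not taken from the source. The sketch also assumes that an overlapping request is skipped rather than queued, which the source does not specify.

```python
import threading
from pathlib import Path
from typing import Callable


def is_allowed(tool: str, target: Path, memory_dir: Path) -> bool:
    """Permit only FileEdit calls whose target resolves inside memory_dir."""
    if tool != "FileEdit":
        return False
    resolved = target.resolve()
    root = memory_dir.resolve()
    return resolved == root or root in resolved.parents


class SequentialGuard:
    """Refuse to start a job while a previous one is still running.

    Assumption: overlapping requests are dropped, not queued.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def run(self, job: Callable[[], None]) -> bool:
        # Non-blocking acquire: fail fast if an extraction is in flight.
        if not self._lock.acquire(blocking=False):
            return False
        try:
            job()
            return True
        finally:
            self._lock.release()
```

Resolving both paths before comparing them means that a target such as `memory/../secrets.md` is rejected even though it begins with the memory directory's path.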
Trigger Cadence
| Trigger | Threshold | Rationale |
|---|---|---|
| First extraction | ~10,000 tokens | Wait for enough context to extract meaningful patterns |
| Subsequent | ~5,000 tokens or 3 tool calls (whichever first) | Capture evolving understanding without burning compute |
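The cadence in the table can be written as a small decision function. This is a sketch only: the thresholds in the source are approximate, and the `ExtractionState` type and `should_extract` name are hypothetical.

```python
from dataclasses import dataclass

# Approximate thresholds taken from the table above.
FIRST_EXTRACTION_TOKENS = 10_000
SUBSEQUENT_TOKENS = 5_000
SUBSEQUENT_TOOL_CALLS = 3


@dataclass
class ExtractionState:
    has_extracted: bool = False      # has any extraction run this session?
    tokens_since_last: int = 0       # tokens since the last extraction (or session start)
    tool_calls_since_last: int = 0   # tool calls since the last extraction


def should_extract(state: ExtractionState) -> bool:
    """Return True when the next extraction is due."""
    if not state.has_extracted:
        # First extraction waits for enough context to accumulate.
        return state.tokens_since_last >= FIRST_EXTRACTION_TOKENS
    # Later extractions fire on whichever threshold is reached first.
    return (
        state.tokens_since_last >= SUBSEQUENT_TOKENS
        or state.tool_calls_since_last >= SUBSEQUENT_TOOL_CALLS
    )
```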
Short "fix this typo" sessions may produce one extraction or none. Deep architecture discussions produce many.
The 10-Section Summary Template
Each extraction follows a consistent template, capped at ~12,000 tokens in total and ~2,000 tokens per section:
| Section | What it captures |
|---|---|
| Session Title | Auto-generated 5-10 word description |
| Current State | Active work, pending tasks, immediate next steps |
| Task Specification | What the user asked to build; design decisions |
| Files and Functions | Important files, what they contain, why relevant |
| Workflow | Bash commands and their typical order |
| Errors & Corrections | Errors encountered and fixes; failed approaches |
| Codebase Documentation | Important components and how they work together |
| Learnings | What worked, what didn't, what to avoid |
| Key Results | Exact outputs if the user requested specific artifacts |
| Worklog | Chronological record of actions taken |
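Ten sections at ~2,000 tokens each would come to ~20,000 tokens, which is more than the ~12,000 total, so the two figures read most naturally as independent limits that must both hold. The check below is a sketch under that interpretation; the function name and the idea of passing pre-computed token counts are assumptions made for illustration.

```python
PER_SECTION_CAP = 2_000   # approximate per-section limit from the text
TOTAL_CAP = 12_000        # approximate overall limit from the text


def within_caps(section_tokens: dict[str, int]) -> bool:
    """Check a summary against both the per-section and the total cap."""
    per_section_ok = all(n <= PER_SECTION_CAP for n in section_tokens.values())
    total_ok = sum(section_tokens.values()) <= TOTAL_CAP
    return per_section_ok and total_ok
```

Under this reading, a summary in which every section sits at its individual cap fails the total cap, so some sections have to stay well below 2,000 tokens.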
Memory File Architecture
The memory directory uses a two-layer index structure:
~/.claude/projects/<slug>/memory/
├── MEMORY.md # Index — always loaded (first 200 lines only)
├── user_role.md # Topic files loaded on-demand
├── debugging.md
├── project_auth.md
└── reference_linear.md
MEMORY.md is injected into every session context but only the first 200 lines are loaded. Topic files are loaded on-demand when Claude judges them relevant. The index acts as a pointer structure, not a full dump.
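The 200-line limit on the index can be sketched as follows. The function name is hypothetical, and the behaviour for a missing file (returning an empty string) is an assumption.

```python
from itertools import islice
from pathlib import Path

MAX_INDEX_LINES = 200  # limit stated in the text


def load_memory_index(memory_dir: Path, max_lines: int = MAX_INDEX_LINES) -> str:
    """Return at most the first max_lines lines of MEMORY.md."""
    index = memory_dir / "MEMORY.md"
    if not index.is_file():
        return ""  # assumption: a missing index contributes nothing
    with index.open(encoding="utf-8") as fh:
        # islice stops reading after max_lines, so a long index is never fully loaded.
        return "".join(islice(fh, max_lines))
```

Anything past line 200 is invisible at load time, which is why the index works best as a short list of pointers to topic files.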
Four Semantic Types
| Type | Description | Example |
|---|---|---|
| User | Who the user is | "Senior Go engineer, new to React" |
| Feedback | How to behave | "Don't mock the database in tests" |
| Project | What's happening | "Merge freeze after March 5 for mobile release" |
| Reference | Where to look | "Pipeline bugs tracked in Linear project INGEST" |
Retrieval: LLM Reasoning, Not Embeddings
The memory system deliberately rejects vector search. Claude calls ls() to list memory files, reasons about which are relevant based on filenames and the current task, then calls read_file() on the selected files. The source architecture notes: "choose regex over embeddings for search, Markdown files over databases for memory."
This trades retrieval sophistication for simplicity and interpretability. The design bet is that, for structured, human-readable files, an LLM reasoning about filename semantics outperforms opaque vector chunk matching.
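The list, reason, read flow can be sketched with the model's judgement replaced by a pluggable selector. Everything here is illustrative: `retrieve_memories`, `Selector`, and `keyword_selector` are hypothetical names, and the keyword matcher is only a toy stand-in for LLM reasoning, not a description of how Claude chooses files.

```python
from pathlib import Path
from typing import Callable, Iterable

# A selector sees the task and the candidate filenames and returns the
# names it considers relevant. In the real system this role is played by
# the model; here it is any callable.
Selector = Callable[[str, list[str]], Iterable[str]]


def retrieve_memories(memory_dir: Path, task: str, select: Selector) -> dict[str, str]:
    """List topic files, let the selector pick, then read only the picks."""
    names = sorted(p.name for p in memory_dir.glob("*.md") if p.name != "MEMORY.md")
    # Ignore any name the selector returns that is not an actual file.
    chosen = [n for n in select(task, names) if n in names]
    return {n: (memory_dir / n).read_text(encoding="utf-8") for n in chosen}


def keyword_selector(task: str, names: list[str]) -> list[str]:
    """Toy selector: match words in the filename against words in the task."""
    words = set(task.lower().split())
    return [n for n in names if words & set(n.removesuffix(".md").lower().split("_"))]
```

Because selection works on filenames alone, descriptive names such as project_auth.md carry most of the retrieval signal.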
Context Loading Order
The system prompt is assembled from six layers on every query, not just at session start. CLAUDE.md is reloaded every turn and supports up to 40,000 characters. Four of the layers are listed here, in load order:
1. ~/.claude/CLAUDE.md (global user rules)
2. ./CLAUDE.md (project root)
3. .claude/rules/*.md (modular rules, alphabetical)
4. ~/.claude/projects/<slug>/memory/MEMORY.md (auto-memory index, 200 lines max)
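The load order of the four file layers named above can be sketched as a single function that is called on every query. The function names are hypothetical. Applying the 40,000-character limit to each CLAUDE.md file separately is an assumption, since the text does not say whether the limit is per file or combined, and the remaining layers of the system prompt are not modelled.

```python
from pathlib import Path

CLAUDE_MD_MAX_CHARS = 40_000    # limit stated in the text (scope assumed per file)
MEMORY_INDEX_MAX_LINES = 200    # limit stated in the text


def read_if_present(path: Path) -> str:
    """Read a file, or return an empty string when it does not exist."""
    return path.read_text(encoding="utf-8") if path.is_file() else ""


def assemble_file_layers(home: Path, project: Path, slug: str) -> list[str]:
    """Return the file-based layers in load order, skipping empty ones."""
    layers = [
        read_if_present(home / ".claude" / "CLAUDE.md")[:CLAUDE_MD_MAX_CHARS],
        read_if_present(project / "CLAUDE.md")[:CLAUDE_MD_MAX_CHARS],
    ]
    # Modular rules are loaded in alphabetical order.
    rules_dir = project / ".claude" / "rules"
    layers += [read_if_present(p) for p in sorted(rules_dir.glob("*.md"))]
    # Only the first 200 lines of the auto-memory index are included.
    index = read_if_present(home / ".claude" / "projects" / slug / "memory" / "MEMORY.md")
    layers.append("".join(index.splitlines(keepends=True)[:MEMORY_INDEX_MAX_LINES]))
    return [layer for layer in layers if layer]
```

Re-reading the files on each call, rather than caching them at session start, is what lets edits to CLAUDE.md take effect on the next turn.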
Hidden Token Cost
The extractMemories mechanism fires a background Opus API call per turn to extract session memories. This fire-and-forget call doubles effective token consumption: a user who sees 13M tokens reported has actually used 26M. The extraction cost is not shown to the user.
Feature Gate
Auto-memory is rolled out via the tengu_onyx_plover GrowthBook gate. Force on/off:
export CLAUDE_CODE_DISABLE_AUTO_MEMORY=0 # force enable
export CLAUDE_CODE_DISABLE_AUTO_MEMORY=1 # force disable
If ~/.claude/projects/<your-project>/memory/ exists, you are enrolled.
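The override semantics and the enrollment check can be sketched as below. The function names are hypothetical. Treating an unset variable, or any value other than 0 or 1, as "defer to the rollout gate" is an assumption that the source does not confirm.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def auto_memory_enabled(gate_enabled: bool, env: Optional[Mapping[str, str]] = None) -> bool:
    """Resolve the effective setting from the override variable and the gate."""
    env = os.environ if env is None else env
    override = env.get("CLAUDE_CODE_DISABLE_AUTO_MEMORY")
    if override == "1":
        return False   # force disable
    if override == "0":
        return True    # force enable
    return gate_enabled  # assumption: otherwise the rollout gate decides


def memory_dir(slug: str, home: Optional[Path] = None) -> Path:
    """Build the per-project memory directory path used in this document."""
    return (home or Path.home()) / ".claude" / "projects" / slug / "memory"
```

With these helpers, `memory_dir("<slug>").is_dir()` is the enrollment check described above.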
Relationship to Auto-Dream
Auto-memory writes continuously, one session at a time. Over weeks, this produces scattered, redundant, and sometimes contradictory notes. Auto-dream is the consolidation layer: a background process that reads accumulated transcripts and rewrites memory into clean, non-redundant, topic-organized files. The two systems are designed as a pair: auto-memory appends, auto-dream consolidates.
Key Claims
- clm-20260409-2000cb25955a: Extraction cadence is 10K tokens initially, then 5K tokens or 3 tool calls
- clm-20260409-86ce9df0fb73: Memory retrieval uses LLM reasoning over filenames, not vector search
Sources
- src-20260409-a14e9e98c3cd: Internals: Auto-Memory, Auto-Dream, and Agent Teams