Auto-Memory

Auto-memory is a background extraction process where Claude reads its own ongoing conversation and writes structured notes to a project-local memory directory. Where CLAUDE.md is a file the user writes as standing instructions, MEMORY.md is a file Claude writes for itself. Shipped in v2.1.32 (late February 2026).

How It Works

The extraction runs as a forked subagent, launched via runForkedAgent() with querySource: 'session_memory'. The fork is granted exactly one tool: FileEdit, scoped to the memory directory; all other tools are denied. A sequential() guard prevents overlapping extractions.
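A minimal sketch of what such a sequential() guard can look like. The name comes from the text above, but this promise-chain implementation is illustrative, not the actual Claude Code source:

```typescript
// Hypothetical sketch of a sequential() guard: each call is chained onto the
// previous one's completion, so two extractions can never run concurrently.
function sequential<T>(fn: (arg: T) => Promise<void>): (arg: T) => Promise<void> {
  let tail: Promise<void> = Promise.resolve();
  return (arg: T) => {
    // Swallow the previous run's error so one failure doesn't break the chain.
    tail = tail.catch(() => {}).then(() => fn(arg));
    return tail;
  };
}
```

Each wrapped call returns the promise for its own turn in the queue, so callers can still await their specific extraction while ordering is preserved.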

Trigger Cadence

| Trigger | Threshold | Rationale |
| --- | --- | --- |
| First extraction | ~10,000 tokens | Wait for enough context to extract meaningful patterns |
| Subsequent | ~5,000 tokens or 3 tool calls (whichever comes first) | Capture evolving understanding without burning compute |

Short "fix this typo" sessions may produce one extraction or none. Deep architecture discussions produce many.
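The cadence above can be sketched as a simple predicate. The field and constant names are assumptions for illustration; only the thresholds come from the text:

```typescript
// Counters reset after each extraction; names are illustrative.
interface SessionState {
  tokensSinceExtraction: number;
  toolCallsSinceExtraction: number;
  extractionCount: number;
}

const FIRST_THRESHOLD = 10_000;   // wait for meaningful context first
const SUBSEQUENT_TOKENS = 5_000;  // then extract more eagerly...
const SUBSEQUENT_TOOL_CALLS = 3;  // ...or after 3 tool calls, whichever comes first

function shouldExtract(s: SessionState): boolean {
  if (s.extractionCount === 0) {
    return s.tokensSinceExtraction >= FIRST_THRESHOLD;
  }
  return (
    s.tokensSinceExtraction >= SUBSEQUENT_TOKENS ||
    s.toolCallsSinceExtraction >= SUBSEQUENT_TOOL_CALLS
  );
}
```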

The 10-Section Summary Template

Each extraction follows a consistent template, capped at ~12,000 tokens total (~2,000 per section):

| Section | What it captures |
| --- | --- |
| Session Title | Auto-generated 5-10 word description |
| Current State | Active work, pending tasks, immediate next steps |
| Task Specification | What the user asked to build; design decisions |
| Files and Functions | Important files, what they contain, why relevant |
| Workflow | Bash commands and their typical order |
| Errors & Corrections | Errors encountered and fixes; failed approaches |
| Codebase Documentation | Important components and how they work together |
| Learnings | What worked, what didn't, what to avoid |
| Key Results | Exact outputs if the user requested specific artifacts |
| Worklog | Chronological record of actions taken |
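The ~2,000-token-per-section cap can be sketched as a simple truncation pass. The chars-per-token ratio here is a crude assumption for illustration, not the real tokenizer:

```typescript
// Rough heuristic: ~4 characters per token. Not the actual tokenizer.
const CHARS_PER_TOKEN = 4;
const SECTION_TOKEN_CAP = 2_000; // per-section cap; ~12,000 tokens across 10 sections (some shared headroom)

function capSection(text: string): string {
  const maxChars = SECTION_TOKEN_CAP * CHARS_PER_TOKEN;
  // Leave short sections untouched; truncate oversized ones with a marker.
  return text.length <= maxChars ? text : text.slice(0, maxChars) + "\n[truncated]";
}
```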

Memory File Architecture

The memory directory uses a two-layer index structure:

~/.claude/projects/<slug>/memory/
├── MEMORY.md          # Index — always loaded (first 200 lines only)
├── user_role.md       # Topic files loaded on-demand
├── debugging.md
├── project_auth.md
└── reference_linear.md

MEMORY.md is injected into every session context but only the first 200 lines are loaded. Topic files are loaded on-demand when Claude judges them relevant. The index acts as a pointer structure, not a full dump.
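The 200-line cap on the index can be sketched as a pure function over the file's contents (the function name is illustrative):

```typescript
// Keep only the first `cap` lines of the memory index; topic files
// referenced by the index are loaded separately, on demand.
function capIndex(content: string, cap = 200): string {
  return content.split("\n").slice(0, cap).join("\n");
}
```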

Four Semantic Types

| Type | Description | Example |
| --- | --- | --- |
| User | Who the user is | "Senior Go engineer, new to React" |
| Feedback | How to behave | "Don't mock the database in tests" |
| Project | What's happening | "Merge freeze after March 5 for mobile release" |
| Reference | Where to look | "Pipeline bugs tracked in Linear project INGEST" |
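The four types map naturally onto a discriminated union; the shape below is an assumed schema for illustration, with examples taken from the table above:

```typescript
// Assumed note schema; the four type tags come from the text above.
type MemoryType = "user" | "feedback" | "project" | "reference";

interface MemoryNote {
  type: MemoryType;
  text: string;
}

const notes: MemoryNote[] = [
  { type: "user",      text: "Senior Go engineer, new to React" },
  { type: "feedback",  text: "Don't mock the database in tests" },
  { type: "project",   text: "Merge freeze after March 5 for mobile release" },
  { type: "reference", text: "Pipeline bugs tracked in Linear project INGEST" },
];
```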

Retrieval: LLM Reasoning, Not Embeddings

The memory system deliberately rejects vector search. Claude calls ls() to list memory files, reasons about which are relevant based on filenames and the current task, then calls read_file() on the selected files. The source architecture notes: "choose regex over embeddings for search, Markdown files over databases for memory."

This trades retrieval sophistication for simplicity and interpretability — an LLM reasoning about filename semantics outperforms opaque vector chunk matching for structured, human-readable files.
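The ls-then-read loop can be sketched as a filter over filenames, where the relevance judgment stands in for the LLM's reasoning (everything here is illustrative, not the real tool surface):

```typescript
// 1. ls(): list candidate topic files by name only.
// 2. Reason about relevance from the filename (no embeddings involved).
// 3. The caller then read_file()s each selected file.
function selectMemoryFiles(
  filenames: string[],
  isRelevant: (name: string) => boolean,
): string[] {
  // MEMORY.md is the always-loaded index, so it is never re-selected here.
  return filenames.filter((name) => name !== "MEMORY.md" && isRelevant(name));
}
```

For a task touching authentication, a predicate like `(n) => n.includes("auth")` would select only `project_auth.md` from the directory tree shown earlier.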

Context Loading Order

The system prompt is assembled from layered sources on every query, not just at session start. CLAUDE.md is reloaded every turn and supports up to 40,000 characters:

  1. ~/.claude/CLAUDE.md (global user rules)
  2. ./CLAUDE.md (project root)
  3. .claude/rules/*.md (modular rules, alphabetical)
  4. ~/.claude/projects/<slug>/memory/MEMORY.md (auto-memory index, 200 lines max)
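The assembly with its two caps (40,000 characters for CLAUDE.md, 200 lines for the memory index) can be sketched as follows; layer contents are passed in as strings and the function name is an assumption:

```typescript
const CLAUDE_MD_CHAR_CAP = 40_000;
const MEMORY_INDEX_LINE_CAP = 200;

// Concatenate layers in order, applying each file's cap before joining.
function assembleContext(layers: { name: string; content: string }[]): string {
  return layers
    .map(({ name, content }) => {
      let body = content;
      if (name.endsWith("CLAUDE.md")) body = body.slice(0, CLAUDE_MD_CHAR_CAP);
      if (name.endsWith("MEMORY.md"))
        body = body.split("\n").slice(0, MEMORY_INDEX_LINE_CAP).join("\n");
      return `# ${name}\n${body}`;
    })
    .join("\n\n");
}
```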

Hidden Token Cost

Each extraction is a fire-and-forget background Opus API call made by the extractMemories mechanism, and its cost is never surfaced to the user. In aggregate these hidden calls can roughly double effective token consumption: a session that shows 13M tokens of visible usage may actually have consumed 26M.
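The doubling reduces to a one-line accounting sketch; the 1:1 overhead ratio is the article's claim, not a measured constant:

```typescript
// overheadRatio = hidden extraction tokens per visible token (assumed 1.0 here).
function effectiveTokens(visibleTokens: number, overheadRatio = 1.0): number {
  return Math.round(visibleTokens * (1 + overheadRatio));
}
// effectiveTokens(13_000_000) === 26_000_000
```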

Feature Gate

Auto-memory is rolled out via the tengu_onyx_plover GrowthBook gate. Force on/off:

export CLAUDE_CODE_DISABLE_AUTO_MEMORY=0   # force enable
export CLAUDE_CODE_DISABLE_AUTO_MEMORY=1   # force disable

If ~/.claude/projects/<your-project>/memory/ exists, you are enrolled.
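One way the override could interact with the gate; the environment variable name comes from the text above, but the precedence logic is an assumption:

```typescript
// Explicit override beats the GrowthBook gate; otherwise fall through.
function autoMemoryEnabled(
  env: Record<string, string | undefined>,
  gateOn: boolean,
): boolean {
  const override = env["CLAUDE_CODE_DISABLE_AUTO_MEMORY"];
  if (override === "0") return true;   // force enable
  if (override === "1") return false;  // force disable
  return gateOn;                       // defer to the feature gate
}
```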

Relationship to Auto-Dream

Auto-memory writes continuously, one session at a time. Over weeks, this produces scattered, redundant, contradictory notes. auto-dream is the consolidation layer — a background process that reads accumulated transcripts and rewrites memory into clean, non-redundant, topic-organized files. The two systems are designed as a pair: auto-memory appends, auto-dream consolidates.

Key Claims

Sources