Auto-Compact
- Entity ID: ent-20260410-cf8d68784cd0
- Type: service
- Scope: shared
- Status: active
- Aliases: auto-compact, context compression, compaction
Description
Auto-compact is the context compression system that prevents Claude Code conversations from exceeding the model's context window. When token usage crosses a threshold (effective context window minus 13K tokens), auto-compact triggers a summarization pipeline that replaces older conversation turns with a structured summary — preserving critical information while freeing token budget for new work.
The system has three tiers: microcompaction (continuous trimming of old tool results), session-memory compaction (zero-API-cost compression using extracted session memory), and full compaction (API-based summarization). A circuit breaker stops retrying after 3 consecutive failures.
How it works
Trigger threshold
The trigger calculation (src/services/compact/autoCompact.ts:62-91):
effective_window = context_window - reserved_for_summary (20K)
threshold = effective_window - buffer (13K)
For a 200K-context model: 200K - 20K = 180K effective, threshold at 167K tokens. When the conversation crosses 167K, compaction triggers automatically.
The 13K buffer exists so there's room for the model to generate a response and for new tool results to arrive before hitting the hard wall.
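The calculation above can be sketched in a few lines. This is an illustrative reconstruction, not the actual code in autoCompact.ts; the constant and function names are assumptions.

```typescript
// Hypothetical sketch of the auto-compact trigger math described above.
// The real constants live in src/services/compact/autoCompact.ts.
const RESERVED_FOR_SUMMARY = 20_000; // tokens held back for the summary output
const TRIGGER_BUFFER = 13_000;       // headroom for a response + incoming tool results

function compactThreshold(contextWindow: number): number {
  const effectiveWindow = contextWindow - RESERVED_FOR_SUMMARY;
  return effectiveWindow - TRIGGER_BUFFER;
}

function shouldAutoCompact(usedTokens: number, contextWindow: number): boolean {
  return usedTokens >= compactThreshold(contextWindow);
}
```

For a 200K-context model this reproduces the 167K figure quoted above.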
Three tiers of compression
Tier 1: Microcompaction (src/services/compact/microCompact.ts)
Continuous, lightweight trimming that runs before full compaction is needed. Two modes:
- Time-based (lines 422-530): If enough time has passed since the last assistant message, content-clear all but the most recent N compactable tool results (file reads, bash output, grep results, etc.).
- Cache-aware (lines 305-399): Uses the API's cache_edits to remove tool results from the cached prefix without invalidating the cache — the cheapest possible compression.
Only specific tool results are compactable: file read, bash, grep, glob, web search, web fetch, file edit, and file write. Other tool results (like agent responses) are preserved.
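The time-based mode can be sketched as "clear everything compactable except the newest N results." The tool names, message shape, and placeholder text below are assumptions for illustration; the real logic is in microCompact.ts.

```typescript
// Illustrative sketch of time-based microcompaction: content-clear all but the
// most recent `keepRecent` compactable tool results, leaving other messages intact.
const COMPACTABLE_TOOLS = new Set([
  "file_read", "bash", "grep", "glob",
  "web_search", "web_fetch", "file_edit", "file_write",
]);

interface ToolResultMsg {
  tool: string;
  content: string;
}

function microCompact(messages: ToolResultMsg[], keepRecent: number): ToolResultMsg[] {
  // Collect indices of compactable tool results, oldest first
  const compactable = messages
    .map((m, i) => ({ m, i }))
    .filter(({ m }) => COMPACTABLE_TOOLS.has(m.tool));
  // Everything except the trailing `keepRecent` entries gets its content cleared
  const toClear = new Set(
    compactable.slice(0, Math.max(0, compactable.length - keepRecent)).map(({ i }) => i),
  );
  return messages.map((m, i) =>
    toClear.has(i) ? { ...m, content: "[tool result cleared by microcompaction]" } : m,
  );
}
```

Note how non-compactable results (e.g. agent responses) pass through untouched, matching the preservation rule above.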
Tier 2: Session-memory compaction (src/services/compact/sessionMemoryCompact.ts:514-630)
A zero-API-cost alternative to full compaction. Instead of calling Claude to summarize, it uses the already-extracted session memory file as the summary and keeps a configurable number of recent messages.
Key parameters (controlled via GrowthBook tengu_sm_compact gate):
- minTokens / minTextBlockMessages: minimum recent context to preserve
- maxTokens: hard cap on kept messages
- Tracks lastSummarizedMessageId to only compress new messages on subsequent compactions
The trade-off: cheaper than full compaction but produces less precise summaries since the session memory wasn't written specifically for this purpose.
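The message-selection step can be sketched as a backward walk over the history, keeping a recent suffix bounded by the min/max token parameters. This is a hedged reconstruction; the field names mirror the parameters listed above but the real selection in sessionMemoryCompact.ts also honors minTextBlockMessages and lastSummarizedMessageId.

```typescript
// Sketch of tier-2 suffix selection: keep recent messages until minTokens is
// satisfied, then keep adding only while maxTokens would not be exceeded.
interface Msg {
  id: string;
  tokens: number;
}

function selectKeptSuffix(messages: Msg[], minTokens: number, maxTokens: number): Msg[] {
  const kept: Msg[] = [];
  let total = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const next = total + messages[i].tokens;
    // Below minTokens we keep unconditionally; above it, stop at the hard cap
    if (total >= minTokens && next > maxTokens) break;
    kept.unshift(messages[i]);
    total = next;
  }
  return kept;
}
```

The session memory file would then be prepended as the summary in place of everything before the kept suffix.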
Tier 3: Full compaction (src/services/compact/compact.ts:387-763)
The heavyweight path. Calls Claude to produce a structured summary of the conversation.
The summarization prompt
The compaction prompt (src/services/compact/prompt.ts:61-143) instructs Claude to produce a 9-section summary:
- Primary Request and Intent — what the user is trying to accomplish
- Key Technical Concepts — domain knowledge from the conversation
- Files and Code Sections — with full code snippets preserved
- Errors and Fixes — what went wrong and how it was resolved
- Problem Solving — approaches tried, decisions made
- All User Messages — non-tool-result user messages (verbatim preservation)
- Pending Tasks — what still needs to be done
- Current Work — precise details of work in progress right before summary
- Optional Next Step — with direct quotes from conversation
The model writes an <analysis> block (which gets stripped) then a <summary> block (which becomes the actual compressed message). Tools are explicitly disabled during summarization (NO_TOOLS_PREAMBLE) — only text output is allowed.
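Extracting the summary from that output shape is straightforward to sketch. The function below is an assumption about how the blocks are parsed, not the actual code; a production parser would also handle truncated or malformed output.

```typescript
// Minimal sketch of the described output handling: discard the <analysis>
// block and return the contents of <summary>, which becomes the compressed message.
function extractSummary(modelOutput: string): string | null {
  const match = modelOutput.match(/<summary>([\s\S]*?)<\/summary>/);
  return match ? match[1].trim() : null;
}
```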
Forked agent pattern
Full compaction reuses the parent conversation's prompt cache to avoid cache_creation cost (compact.ts:1179-1248). It spawns a single-turn "forked" agent that shares the cached prefix, generates the summary, and exits. If cache sharing fails, it falls back to a streaming path.
Message grouping
Messages are grouped by API round — each group is bounded by assistant message IDs (src/services/compact/grouping.ts:22-63). This ensures tool_use/tool_result pairs are never split. When compaction needs to drop messages (e.g., for prompt-too-long retry), it drops the oldest complete groups.
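One plausible sketch of that grouping: start a new group at each assistant message, so the tool results it triggers stay in its group. The types and role names below are assumptions; the real grouping in grouping.ts keys on assistant message IDs.

```typescript
// Sketch of grouping by API round: an assistant message opens a round, and the
// tool results that follow it stay attached, so tool_use/tool_result pairs
// are never split when whole groups are dropped.
interface ChatMsg {
  role: "user" | "assistant" | "tool_result";
  id: string;
}

function groupByApiRound(messages: ChatMsg[]): ChatMsg[][] {
  const groups: ChatMsg[][] = [];
  let current: ChatMsg[] = [];
  for (const msg of messages) {
    // A new assistant message closes the previous round
    if (msg.role === "assistant" && current.length > 0) {
      groups.push(current);
      current = [];
    }
    current.push(msg);
  }
  if (current.length > 0) groups.push(current); // trailing partial round
  return groups;
}
```

Dropping from the front of the returned array then removes the oldest complete rounds, as described above.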
Image handling
Before sending to the summarization API, images in the conversation are replaced with [image] placeholders (compact.ts:145-200). This prevents the compaction API call itself from hitting prompt-too-long errors on image-heavy conversations.
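The replacement itself is a simple map over content blocks. The block shapes below are illustrative, not the real message schema:

```typescript
// Sketch of image stripping before the summarization call: every image block
// becomes a small "[image]" text placeholder, shrinking the prompt dramatically.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; data: string };

function stripImages(blocks: ContentBlock[]): ContentBlock[] {
  return blocks.map((b) =>
    b.type === "image" ? { type: "text", text: "[image]" } : b,
  );
}
```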
Partial compaction
Two directions (compact.ts:772-1097):
- from — summarize recent messages (suffix), preserve old cached prefix
- up_to — summarize old messages (prefix), preserve recent context
Direction choice affects cache behavior: prefix-preserving keeps the prompt cache warm, suffix-preserving is used when recent context is more expendable.
Circuit breaker
src/services/compact/autoCompact.ts:256-350
The circuit breaker tracks consecutiveFailures in AutoCompactTrackingState. After 3 consecutive compaction failures, auto-compact stops trying for the rest of the session. The counter resets to 0 on any success.
The state is threaded through the query loop — each turn receives the previous turn's failure count and passes its updated count to the next turn. This means the circuit breaker survives across turns without global mutable state.
Purpose: prevent burning API calls when context is irrecoverably over the limit (prompt_too_long errors that compaction can't fix because the conversation is simply too large).
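The threading pattern reduces to a pure state-transition function, sketched below. The interface name matches the description; the function names are assumptions.

```typescript
// Hedged sketch of the circuit breaker: state flows turn-to-turn through the
// query loop rather than living in a global, per the description above.
const MAX_CONSECUTIVE_FAILURES = 3;

interface AutoCompactTrackingState {
  consecutiveFailures: number;
}

// Each turn derives the next state from the previous turn's state
function nextState(
  prev: AutoCompactTrackingState,
  compactionSucceeded: boolean,
): AutoCompactTrackingState {
  return {
    consecutiveFailures: compactionSucceeded ? 0 : prev.consecutiveFailures + 1,
  };
}

// Once open, auto-compact stops trying for the rest of the session
function breakerOpen(state: AutoCompactTrackingState): boolean {
  return state.consecutiveFailures >= MAX_CONSECUTIVE_FAILURES;
}
```

Because `nextState` is pure, the breaker survives across turns purely through the values passed along the loop.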
The consolidation lock
src/services/autoDream/consolidationLock.ts
A file-based lock (.consolidate-lock in the memory directory) that prevents concurrent compaction/consolidation across multiple Claude Code sessions in the same repo. Uses:
- File mtime = last consolidation timestamp (one stat() call per turn to check)
- File content = PID of the lock holder
Acquisition logic (lines 46-84): if the lock's mtime is less than 1 hour old AND the PID is alive, the lock is held — back off. If stale or dead PID, acquire by writing current PID and verifying via re-read.
Rollback restores the prior mtime (or deletes the file if no prior lock existed).
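The acquisition logic can be sketched with Node's fs primitives. This is a simplified reconstruction of the described behavior (mtime freshness check, PID liveness probe, write-then-verify); it omits the rollback path and is not the actual consolidationLock.ts code.

```typescript
// Sketch of the file-based consolidation lock: mtime = last-consolidation
// timestamp, file content = holder PID. Stale or dead-PID locks are stolen.
import { existsSync, readFileSync, statSync, writeFileSync } from "node:fs";

const LOCK_TTL_MS = 60 * 60 * 1000; // locks older than 1 hour are stale

function pidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0 probes existence without sending a signal
    return true;
  } catch {
    return false;
  }
}

function tryAcquireLock(lockPath: string): boolean {
  if (existsSync(lockPath)) {
    const ageMs = Date.now() - statSync(lockPath).mtimeMs;
    const holder = Number(readFileSync(lockPath, "utf8").trim());
    if (ageMs < LOCK_TTL_MS && pidAlive(holder)) return false; // held — back off
  }
  writeFileSync(lockPath, String(process.pid));
  // Verify via re-read in case another session raced us to the write
  return readFileSync(lockPath, "utf8").trim() === String(process.pid);
}
```

The re-read verification narrows, but does not fully eliminate, the race window; the cheap per-turn check is just the single stat() on mtime.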
Post-compact cleanup
src/services/compact/postCompactCleanup.ts:31-76
After compaction, the system clears cached state that referenced pre-compaction messages:
- Microcompact tool registry
- Context-collapse state
- User context cache and memory file cache
- System prompt sections
- Classifier approvals
- Speculative checks
- Beta tracing state
- Session messages cache
Intentionally NOT cleared: skill content (sentSkillNames). Re-injecting 4K+ tokens of skill content post-compact would be a cache_creation cost. Since skills are referenced in invoked_skills and the SkillTool schema, context is preserved without re-injection.
What depends on it
- QueryEngine — calls auto-compact between turns to manage context
- Agent lifecycle — sub-agents inherit compaction settings; worktree agents have their own compaction state
- Session persistence — compaction rewrites the message history, affecting what gets persisted
- Prompt cache — compaction strategy is designed around cache economics (forked agent, partial compaction directions, microcompact cache_edits)
Design trade-offs
| Decision | Trade-off |
|---|---|
| 13K buffer (not larger) | Smaller buffer = more usable context, but less room for large tool results between compaction trigger and hard wall |
| 9-section summary format | Structured = reliable extraction, but rigid — unusual conversations may not fit the template well |
| Forked agent for summarization | Cache reuse saves money, but adds complexity and a fallback path |
| Three-tier approach | Microcompact delays full compaction (cheaper), but adds three code paths to maintain |
| Circuit breaker at 3 | Stops waste fast, but three transient errors in a row permanently disable auto-compact for the rest of the session |
| Session memory as summary | Zero API cost, but summary quality depends on memory extraction quality — wasn't written for this purpose |
Key claims
- Microcompaction handles the common case (large tool results aging out) without any API cost
- Session-memory compaction provides a zero-API-cost full compaction path
- The consolidation lock prevents race conditions across concurrent sessions
- Post-compact cleanup preserves skill state to avoid expensive re-injection
Relations
- rel-20260410-compact-query: QueryEngine --[triggers]--> auto-compact
- rel-20260410-compact-cache: auto-compact --[optimizes-for]--> prompt cache economics
- rel-20260410-compact-micro: microcompact --[delays]--> full compaction
- rel-20260410-compact-lock: consolidation lock --[prevents]--> concurrent compaction
Sources
src-20260409-e9925330d110, source code at src/services/compact/