Auto-Compact

Description

Auto-compact is the context compression system that prevents Claude Code conversations from exceeding the model's context window. When token usage crosses a threshold (effective context window minus 13K tokens), auto-compact triggers a summarization pipeline that replaces older conversation turns with a structured summary — preserving critical information while freeing token budget for new work.

The system has three tiers: microcompaction (continuous trimming of old tool results), session-memory compaction (zero-API-cost compression using extracted session memory), and full compaction (API-based summarization). A circuit breaker stops retrying after 3 consecutive failures.

How it works

Trigger threshold

The trigger calculation (src/services/compact/autoCompact.ts:62-91):

effective_window = context_window - reserved_for_summary (20K)
threshold = effective_window - buffer (13K)

For a 200K-context model: 200K - 20K = 180K effective, threshold at 167K tokens. When the conversation crosses 167K, compaction triggers automatically.

The 13K buffer exists so there's room for the model to generate a response and for new tool results to arrive before hitting the hard wall.
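A minimal sketch of this arithmetic; the constant names are illustrative, not the actual identifiers in autoCompact.ts:

```typescript
// Hypothetical constants matching the figures described above.
const RESERVED_FOR_SUMMARY = 20_000; // tokens held back for the summary itself
const TRIGGER_BUFFER = 13_000;       // room for a response + incoming tool results

function shouldAutoCompact(usedTokens: number, contextWindow: number): boolean {
  const effectiveWindow = contextWindow - RESERVED_FOR_SUMMARY;
  const threshold = effectiveWindow - TRIGGER_BUFFER;
  return usedTokens >= threshold;
}

// For a 200K-context model the threshold lands at 167K tokens:
shouldAutoCompact(166_999, 200_000); // false — still under threshold
shouldAutoCompact(167_000, 200_000); // true — compaction triggers
```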

Three tiers of compression

Tier 1: Microcompaction (src/services/compact/microCompact.ts)

Continuous, lightweight trimming that runs before full compaction is needed. Two modes:

- Time-based (lines 422-530): if enough time has passed since the last assistant message, content-clear all but the most recent N compactable tool results (file reads, bash output, grep results, etc.)
- Cache-aware (lines 305-399): uses the API's cache_edits to remove tool results from the cached prefix without invalidating the cache — the cheapest possible compression.

Only specific tool results are compactable: file read, bash, grep, glob, web search, web fetch, file edit, and file write. Other tool results (like agent responses) are preserved.
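The time-based mode can be sketched roughly as follows; the `ToolResult` shape, tool name strings, and placeholder text are assumptions for illustration:

```typescript
// Illustrative sketch: content-clear all but the most recent N compactable
// tool results, leaving non-compactable results (e.g. agent responses) intact.
type ToolResult = { tool: string; content: string };

const COMPACTABLE = new Set([
  "file_read", "bash", "grep", "glob",
  "web_search", "web_fetch", "file_edit", "file_write",
]);

function microCompact(results: ToolResult[], keepRecent: number): ToolResult[] {
  // Indices of compactable results, oldest first.
  const compactable = results
    .map((r, i) => ({ r, i }))
    .filter(({ r }) => COMPACTABLE.has(r.tool));
  // Everything except the last `keepRecent` compactable results gets cleared.
  const toClear = new Set(
    compactable
      .slice(0, Math.max(0, compactable.length - keepRecent))
      .map(({ i }) => i),
  );
  return results.map((r, i) =>
    toClear.has(i) ? { ...r, content: "[cleared by microcompaction]" } : r,
  );
}
```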

Tier 2: Session-memory compaction (src/services/compact/sessionMemoryCompact.ts:514-630)

A zero-API-cost alternative to full compaction. Instead of calling Claude to summarize, it uses the already-extracted session memory file as the summary and keeps a configurable number of recent messages.

Key parameters (controlled via the GrowthBook tengu_sm_compact gate):

- minTokens / minTextBlockMessages: minimum recent context to preserve
- maxTokens: hard cap on kept messages
- Tracks lastSummarizedMessageId to only compress new messages on subsequent compactions

The trade-off: cheaper than full compaction but produces less precise summaries since the session memory wasn't written specifically for this purpose.
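Under those parameters, the tier-2 flow might look like the following sketch; the message shape, return shape, and the character-based token estimate are all assumptions:

```typescript
// Hedged sketch of session-memory compaction: the pre-extracted session memory
// stands in as the summary, and a tail of recent messages is kept under a cap.
type Msg = { id: string; text: string; tokens: number };

function sessionMemoryCompact(
  messages: Msg[],
  sessionMemory: string,
  maxTokens: number,                 // hard cap on kept messages
  lastSummarizedMessageId?: string,  // boundary from the previous compaction
): { kept: Msg[]; summary: Msg; lastSummarizedMessageId: string } {
  // Only compress messages newer than the previous compaction boundary.
  const start = lastSummarizedMessageId
    ? messages.findIndex((m) => m.id === lastSummarizedMessageId) + 1
    : 0;
  const fresh = messages.slice(start);

  // Keep the most recent messages that fit under the hard token cap.
  const kept: Msg[] = [];
  let budget = maxTokens;
  for (let i = fresh.length - 1; i >= 0; i--) {
    if (fresh[i].tokens > budget) break;
    budget -= fresh[i].tokens;
    kept.unshift(fresh[i]);
  }

  const summarized = fresh.slice(0, fresh.length - kept.length);
  return {
    kept,
    // The session memory file itself becomes the summary message; the token
    // count here is a rough chars/4 estimate, purely for illustration.
    summary: {
      id: "summary",
      text: sessionMemory,
      tokens: Math.ceil(sessionMemory.length / 4),
    },
    lastSummarizedMessageId:
      summarized[summarized.length - 1]?.id ?? lastSummarizedMessageId ?? "",
  };
}
```

No API call happens anywhere in this path, which is the whole point of the tier.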

Tier 3: Full compaction (src/services/compact/compact.ts:387-763)

The heavyweight path. Calls Claude to produce a structured summary of the conversation.

The summarization prompt

The compaction prompt (src/services/compact/prompt.ts:61-143) instructs Claude to produce a 9-section summary:

  1. Primary Request and Intent — what the user is trying to accomplish
  2. Key Technical Concepts — domain knowledge from the conversation
  3. Files and Code Sections — with full code snippets preserved
  4. Errors and Fixes — what went wrong and how it was resolved
  5. Problem Solving — approaches tried, decisions made
  6. All User Messages — non-tool-result user messages (verbatim preservation)
  7. Pending Tasks — what still needs to be done
  8. Current Work — precise details of work in progress right before summary
  9. Optional Next Step — with direct quotes from conversation

The model writes an <analysis> block (which gets stripped) then a <summary> block (which becomes the actual compressed message). Tools are explicitly disabled during summarization (NO_TOOLS_PREAMBLE) — only text output is allowed.
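A plausible sketch of that post-processing step; only the tag names come from the description above, and the regex-based extraction is an assumption about the real code:

```typescript
// Sketch: discard the <analysis> scratchpad, keep the <summary> body, which
// becomes the compressed message that replaces the older conversation turns.
function extractSummary(modelOutput: string): string {
  const match = modelOutput.match(/<summary>([\s\S]*?)<\/summary>/);
  if (!match) throw new Error("summarization output missing <summary> block");
  return match[1].trim();
}

extractSummary("<analysis>reasoning</analysis><summary>1. Primary Request…</summary>");
// → "1. Primary Request…"
```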

Forked agent pattern

Full compaction reuses the parent conversation's prompt cache to avoid cache_creation cost (compact.ts:1179-1248). It spawns a single-turn "forked" agent that shares the cached prefix, generates the summary, and exits. If cache sharing fails, it falls back to a streaming path.

Message grouping

Messages are grouped by API round — each group is bounded by assistant message IDs (src/services/compact/grouping.ts:22-63). This ensures tool_use/tool_result pairs are never split. When compaction needs to drop messages (e.g., for prompt-too-long retry), it drops the oldest complete groups.
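The grouping invariant can be illustrated as follows; the message shapes are simplified assumptions:

```typescript
// Illustrative grouping by API round: each assistant message closes a group,
// so tool_use/tool_result pairs always land in the same group.
type Message = { role: "user" | "assistant"; id?: string };

function groupByApiRound(messages: Message[]): Message[][] {
  const groups: Message[][] = [];
  let current: Message[] = [];
  for (const m of messages) {
    current.push(m);
    if (m.role === "assistant") { // assistant message ID bounds the group
      groups.push(current);
      current = [];
    }
  }
  if (current.length > 0) groups.push(current); // trailing partial round
  return groups;
}

// Dropping the oldest complete groups never splits a tool_use/tool_result pair:
function dropOldestGroups(groups: Message[][], n: number): Message[][] {
  return groups.slice(n);
}
```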

Image handling

Before sending to the summarization API, images in the conversation are replaced with [image] placeholders (compact.ts:145-200). This prevents the compaction API call itself from hitting prompt-too-long errors on image-heavy conversations.
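A minimal sketch of that substitution, assuming a simplified content-block shape:

```typescript
// Sketch: swap image content blocks for a "[image]" text placeholder before
// the summarization call, so image-heavy conversations fit in the prompt.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; source: { data: string } };

function replaceImages(blocks: ContentBlock[]): ContentBlock[] {
  return blocks.map((b) =>
    b.type === "image" ? { type: "text", text: "[image]" } : b,
  );
}
```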

Partial compaction

Two directions (compact.ts:772-1097):

- from — summarize recent messages (suffix), preserve old cached prefix
- up_to — summarize old messages (prefix), preserve recent context

Direction choice affects cache behavior: prefix-preserving keeps the prompt cache warm, suffix-preserving is used when recent context is more expendable.
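The two directions can be sketched as a single split function; the index-based boundary is an assumption about the real implementation:

```typescript
// Sketch of the two partial-compaction directions. "up_to" summarizes the
// prefix and keeps recent context (suffix); "from" summarizes the suffix and
// keeps the cached prefix warm.
type Direction = "from" | "up_to";

function splitForCompaction<T>(
  messages: T[],
  boundary: number,
  direction: Direction,
): { toSummarize: T[]; toKeep: T[] } {
  return direction === "up_to"
    ? { toSummarize: messages.slice(0, boundary), toKeep: messages.slice(boundary) }
    : { toSummarize: messages.slice(boundary), toKeep: messages.slice(0, boundary) };
}
```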

Circuit breaker

src/services/compact/autoCompact.ts:256-350

The circuit breaker tracks consecutiveFailures in AutoCompactTrackingState. After 3 consecutive compaction failures, auto-compact stops trying for the rest of the session. The counter resets to 0 on any success.

The state is threaded through the query loop — each turn receives the previous turn's failure count and passes its updated count to the next turn. This means the circuit breaker survives across turns without global mutable state.

Purpose: prevent burning API calls when context is irrecoverably over the limit (prompt_too_long errors that compaction can't fix because the conversation is simply too large).
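A sketch of the threaded state, assuming the 3-failure limit described above; the function names are hypothetical:

```typescript
// Minimal sketch of the threaded circuit-breaker state: each turn receives the
// previous turn's count and returns an updated one, with no global mutable state.
const MAX_CONSECUTIVE_FAILURES = 3;

interface AutoCompactTrackingState {
  consecutiveFailures: number;
}

function recordCompactionResult(
  state: AutoCompactTrackingState,
  succeeded: boolean,
): AutoCompactTrackingState {
  // Any success resets the counter to 0; any failure increments it.
  return { consecutiveFailures: succeeded ? 0 : state.consecutiveFailures + 1 };
}

function isTripped(state: AutoCompactTrackingState): boolean {
  return state.consecutiveFailures >= MAX_CONSECUTIVE_FAILURES;
}
```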

The consolidation lock

src/services/autoDream/consolidationLock.ts

A file-based lock (.consolidate-lock in the memory directory) that prevents concurrent compaction/consolidation across multiple Claude Code sessions in the same repo. Uses:

- File mtime = last consolidation timestamp (one stat() call per turn to check)
- File content = PID of the lock holder

Acquisition logic (lines 46-84): if the lock's mtime is less than 1 hour old AND the PID is alive, the lock is held — back off. If stale or dead PID, acquire by writing current PID and verifying via re-read.

Rollback restores the prior mtime (or deletes the file if no prior lock existed).
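The acquisition logic might be sketched like this; the staleness check and verify-by-re-read follow the description above, while the helper names and file handling details are assumptions:

```typescript
// Hedged sketch of lock acquisition: a lock whose mtime is under 1 hour old
// AND whose recorded PID is alive is considered held; anything else is stale
// and can be taken over by writing our PID and verifying via re-read.
import * as fs from "fs";

const LOCK_TTL_MS = 60 * 60 * 1000; // 1 hour

function pidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0: existence check, sends nothing
    return true;
  } catch {
    return false;
  }
}

function tryAcquireLock(lockPath: string): boolean {
  try {
    const stat = fs.statSync(lockPath);
    const holder = parseInt(fs.readFileSync(lockPath, "utf8"), 10);
    const fresh = Date.now() - stat.mtimeMs < LOCK_TTL_MS;
    if (fresh && pidAlive(holder)) return false; // held — back off
  } catch {
    // No lock file yet: fall through and acquire.
  }
  fs.writeFileSync(lockPath, String(process.pid));
  // Verify via re-read in case another session raced us to the write.
  return fs.readFileSync(lockPath, "utf8") === String(process.pid);
}
```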

Post-compact cleanup

src/services/compact/postCompactCleanup.ts:31-76

After compaction, the system clears cached state that referenced pre-compaction messages:

- Microcompact tool registry
- Context-collapse state
- User context cache and memory file cache
- System prompt sections
- Classifier approvals
- Speculative checks
- Beta tracing state
- Session messages cache

Intentionally NOT cleared: skill content (sentSkillNames). Re-injecting 4K+ tokens of skill content post-compact would be a cache_creation cost. Since skills are referenced in invoked_skills and the SkillTool schema, context is preserved without re-injection.

What depends on it

Design trade-offs

- 13K buffer (not larger): Smaller buffer = more usable context, but less room for large tool results between compaction trigger and hard wall
- 9-section summary format: Structured = reliable extraction, but rigid — unusual conversations may not fit the template well
- Forked agent for summarization: Cache reuse saves money, but adds complexity and a fallback path
- Three-tier approach: Microcompact delays full compaction (cheaper), but adds three code paths to maintain
- Circuit breaker at 3: Stops waste fast, but if a transient error caused failures 1-2, failure 3 permanently gives up for the session
- Session memory as summary: Zero API cost, but summary quality depends on memory extraction quality — wasn't written for this purpose

Key claims

Relations

Sources

src-20260409-e9925330d110, source code at src/services/compact/