Context Window Management: The Five-Tier Compaction Cascade

Claude Code's most complex subsystem is context window management — a five-tier cascade of increasingly aggressive strategies that keep conversations within the token budget while preserving the most important context.

The Problem

Claude Code sessions can run for hours with hundreds of tool calls. Each file read, grep result, and bash output adds to the context. Without management, the context window fills and the session dies. The challenge: compress without losing the information the agent needs to continue working.

The Five Tiers

| Tier | Mechanism | When | Aggressiveness |
|------|-----------|------|----------------|
| 1 | Auto-Compact | Context approaches limit (effective_window - 13K buffer) | Moderate: summarizes conversation, preserves file references |
| 2 | API Microcompact | API-native context_management beta header | Low: API handles pruning |
| 3 | Reactive Compact | After API returns context-too-large error | High: emergency response |
| 4 | Snip | Emergency; all else failed | Maximum: discard non-critical content |
| 5 | Session Memory | Background; Sonnet child agent summarizes for future compaction | Preparatory: builds summaries proactively |
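The tier ordering above can be sketched as a dispatch function. This is a minimal illustration, not Claude Code's actual code; the names (`ContextState`, `select_tier`, `AUTOCOMPACT_BUFFER`) are hypothetical, though the 13K buffer value comes from the table.

```python
from dataclasses import dataclass

AUTOCOMPACT_BUFFER = 13_000  # tier 1 fires this many tokens before the limit

@dataclass
class ContextState:
    tokens_used: int
    effective_window: int
    api_error: bool = False          # API returned context-too-large
    api_microcompact: bool = False   # context_management beta enabled
    compaction_failed: bool = False  # tiers 1-3 have all failed

def select_tier(state: ContextState) -> str:
    """Pick the most aggressive-enough compaction tier for the current state."""
    if state.compaction_failed:
        return "snip"                  # tier 4: discard non-critical content
    if state.api_error:
        return "reactive-compact"      # tier 3: emergency response
    if state.tokens_used >= state.effective_window - AUTOCOMPACT_BUFFER:
        if state.api_microcompact:
            return "api-microcompact"  # tier 2: API-side pruning
        return "auto-compact"          # tier 1: summarize the conversation
    return "none"  # tier 5 (session memory) runs in the background regardless
```

Note that tier 5 never appears in the dispatch: it is preparatory work that runs whether or not a compaction is triggered.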

How Compaction Works

The compact() function:

1. Strips images (high token cost, low information density after initial viewing)
2. Calls a compression API to summarize the conversation
3. Restores file references and skill state from preservedSegment boundaries

Separately, the /compact command supports selective preservation: /compact keep the database schema and the authentication logic
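The three steps can be sketched as follows. This is an illustrative reconstruction, assuming a message list keyed by `id` and `type`; `summarize()` is a stand-in for the real compression API call, and all names are hypothetical.

```python
def summarize(messages, keep=""):
    """Stand-in for the compression API; real compaction calls a model here."""
    text = f"[summary of {len(messages)} messages"
    if keep:
        text += f", preserving: {keep}"
    return {"type": "summary", "id": "summary", "text": text + "]"}

def compact(messages, preserved_segments, keep_instructions=""):
    """Sketch of the compaction steps described above."""
    # 1. Strip images: high token cost, low value after initial viewing
    text_only = [m for m in messages if m["type"] != "image"]
    # 2. Summarize the conversation (honoring any /compact keep ... request)
    summary = summarize(text_only, keep=keep_instructions)
    # 3. Restore messages inside preservedSegment boundaries verbatim
    return [summary] + [m for m in text_only if m["id"] in preserved_segments]
```

The key property is that preserved segments bypass summarization entirely: they are re-attached verbatim after the summary, rather than trusted to survive compression.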

The Autocompact Death Loop

The most dramatic failure in compaction history: 1,279 sessions hit 50+ consecutive autocompact failures (worst case: 3,272 retries), wasting 250,000 API calls per day globally. Root cause: compaction was retried without limit. Fix: MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3. The bug was documented on March 10, 2026; the fix shipped on March 31. That is three weeks in production with a known 250K-calls-per-day waste.
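The fix is a simple retry cap. A minimal sketch of the post-fix loop, assuming a hypothetical `try_compact` callable (the constant name is from the source; everything else is illustrative):

```python
MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3  # the shipped fix

def autocompact_with_limit(try_compact, state):
    """Retry autocompact, but give up after the failure cap instead of
    looping forever (the pre-fix behavior that caused the death loop)."""
    failures = 0
    while failures < MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES:
        if try_compact(state):
            return True   # compaction succeeded
        failures += 1
    return False  # bail out; a more aggressive tier takes over
```

Without the cap, a deterministic failure (e.g. a context the compression API can never shrink) retries forever, which is exactly how one session reached 3,272 attempts.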

The 150K Threshold Problem

With the upgrade to 1M context windows, the autocompact trigger, still fixed at 150K tokens, now fires at only 15% utilization. Users have massive context available, but the system compacts as if the window were much smaller. This is documented as a known issue, not a design choice.
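The arithmetic behind the problem, plus one hypothetical remedy (a trigger scaled to the window rather than fixed). The 150K constant is from the source; the proportional variant and its 92% default are assumptions for illustration.

```python
FIXED_TRIGGER = 150_000  # current behavior: fires at 150K regardless of window

def utilization_at_trigger(window: int, trigger: int = FIXED_TRIGGER) -> float:
    """Fraction of the window actually used when autocompact fires."""
    return trigger / window

def proportional_trigger(window: int, percent: int = 92) -> int:
    """Hypothetical fix: trigger at a fixed percentage of the window."""
    return window * percent // 100
```

At a 200K window the fixed trigger fires at 75% utilization, which is reasonable; at 1M it fires at 15%, discarding context the user paid for.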

Compaction as Attack Vector

A subtle security concern: the compaction step is a rewrite of the conversation history. A sophisticated prompt injection could craft content that, when compressed, produces a summarized version with different meaning — effectively laundering malicious instructions through the summarization step. The security team's assessment: "compaction laundering" is a theoretical but non-trivial attack vector.

Design Tensions

  1. Pareto tradeoffs at every tier — each compaction level trades information loss for continuation ability
  2. Token cost of compaction itself — calling the compression API costs tokens; aggressive compaction can cost more than the tokens it saves
  3. Background vs. foreground — auto-compact runs in the background (forked agent), but reactive compact blocks the main loop
  4. Selective preservation is user-controlled — the user decides what to keep via /compact keep ..., but most users don't know this exists
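Tension #2 reduces to simple break-even arithmetic. A hedged sketch, with all parameter names and the cost model assumed for illustration: the compression call pays a one-time cost (it reads the whole context as input and writes the summary as output), while the freed space saves input tokens on every subsequent turn.

```python
def compaction_pays_off(context_tokens: int, summary_tokens: int,
                        remaining_turns: int) -> bool:
    """Does compacting now save more tokens than it costs?
    Illustrative cost model; ignores caching and per-turn additions."""
    one_time_cost = context_tokens + summary_tokens   # input + output of the call
    per_turn_saving = context_tokens - summary_tokens # smaller prompt each turn
    return per_turn_saving * remaining_turns > one_time_cost
```

Under this model, compacting a 100K context down to 10K only breaks even after two further turns; compacting near the end of a session is a guaranteed net loss.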