Context Window Management: The Five-Tier Compaction Cascade

Claude Code's most complex subsystem is context window management — a five-tier cascade of increasingly aggressive strategies that keep conversations within the token budget while preserving the most important context.

The Problem

Claude Code sessions can run for hours with hundreds of tool calls. Each file read, grep result, and bash output adds to the context. Without management, the context window fills and the session dies. The challenge: compress without losing the information the agent needs to continue working.

The Five Tiers

| Tier | Mechanism | When | Aggressiveness |
|------|-----------|------|----------------|
| 1 | Auto-Compact | Context approaches limit (effective_window - 13K buffer) | Moderate: summarizes conversation, preserves file references |
| 2 | API Microcompact | API-native context_management beta header | Low: API handles pruning |
| 3 | Reactive Compact | After API returns context-too-large error | High: emergency response |
| 4 | Snip | Emergency; all else failed | Maximum: discard non-critical content |
| 5 | Session Memory | Background; Sonnet child agent summarizes for future compaction | Preparatory: builds summaries proactively |
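The tier ordering above can be sketched as a dispatch function. This is a minimal illustration, not Claude Code's actual code; the names (`ContextState`, `select_tier`, `AUTOCOMPACT_BUFFER`) are hypothetical, though the 13K buffer value comes from the table.

```python
from dataclasses import dataclass

AUTOCOMPACT_BUFFER = 13_000  # tier 1 fires this many tokens before the limit

@dataclass
class ContextState:
    tokens_used: int
    effective_window: int
    api_error: bool = False          # API returned context-too-large
    api_microcompact: bool = False   # context_management beta enabled
    compaction_failed: bool = False  # tiers 1-3 have all failed

def select_tier(state: ContextState) -> str:
    """Pick the most aggressive-enough compaction tier for the current state."""
    if state.compaction_failed:
        return "snip"                  # tier 4: discard non-critical content
    if state.api_error:
        return "reactive-compact"      # tier 3: emergency response
    if state.tokens_used >= state.effective_window - AUTOCOMPACT_BUFFER:
        if state.api_microcompact:
            return "api-microcompact"  # tier 2: API-side pruning
        return "auto-compact"          # tier 1: summarize the conversation
    return "none"  # tier 5 (session memory) runs in the background regardless
```

Note that tier 5 never appears in the dispatch: it is preparatory work that runs whether or not a compaction is triggered.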

How Compaction Works

The compact() function:

1. Strips images (high token cost, low information density after initial viewing)
2. Calls a compression API to summarize the conversation
3. Restores file references and skill state from preservedSegment boundaries

Separately, the /compact command supports selective preservation: /compact keep the database schema and the authentication logic
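The three steps can be sketched as follows. This is an illustrative reconstruction, assuming a message list keyed by `id` and `type`; `summarize()` is a stand-in for the real compression API call, and all names are hypothetical.

```python
def summarize(messages, keep=""):
    """Stand-in for the compression API; real compaction calls a model here."""
    text = f"[summary of {len(messages)} messages"
    if keep:
        text += f", preserving: {keep}"
    return {"type": "summary", "id": "summary", "text": text + "]"}

def compact(messages, preserved_segments, keep_instructions=""):
    """Sketch of the compaction steps described above."""
    # 1. Strip images: high token cost, low value after initial viewing
    text_only = [m for m in messages if m["type"] != "image"]
    # 2. Summarize the conversation (honoring any /compact keep ... request)
    summary = summarize(text_only, keep=keep_instructions)
    # 3. Restore messages inside preservedSegment boundaries verbatim
    return [summary] + [m for m in text_only if m["id"] in preserved_segments]
```

The key property is that preserved segments bypass summarization entirely: they are re-attached verbatim after the summary, rather than trusted to survive compression.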

The Autocompact Death Loop

The most dramatic failure in compaction history: 1,279 sessions hit 50+ consecutive autocompact failures (worst case: 3,272 retries), wasting 250,000 API calls per day globally. Root cause: compaction was retried without limit. Fix: MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3. The bug was documented on March 10, 2026; the fix shipped on March 31. That is three weeks in production with a known 250K-calls-per-day waste.
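The fix is a simple retry cap. A minimal sketch of the post-fix loop, assuming a hypothetical `try_compact` callable (the constant name is from the source; everything else is illustrative):

```python
MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3  # the shipped fix

def autocompact_with_limit(try_compact, state):
    """Retry autocompact, but give up after the failure cap instead of
    looping forever (the pre-fix behavior that caused the death loop)."""
    failures = 0
    while failures < MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES:
        if try_compact(state):
            return True   # compaction succeeded
        failures += 1
    return False  # bail out; a more aggressive tier takes over
```

Without the cap, a deterministic failure (e.g. a context the compression API can never shrink) retries forever, which is exactly how one session reached 3,272 attempts.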

The 150K Threshold Problem

With the upgrade to 1M context windows, the autocompact trigger, still fixed at 150K tokens, now fires at only 15% utilization. Users have massive context available, but the system compacts as if the window were much smaller. This is documented as a known issue, not a design choice.
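The arithmetic behind the problem, plus one hypothetical remedy (a trigger scaled to the window rather than fixed). The 150K constant is from the source; the proportional variant and its 92% default are assumptions for illustration.

```python
FIXED_TRIGGER = 150_000  # current behavior: fires at 150K regardless of window

def utilization_at_trigger(window: int, trigger: int = FIXED_TRIGGER) -> float:
    """Fraction of the window actually used when autocompact fires."""
    return trigger / window

def proportional_trigger(window: int, percent: int = 92) -> int:
    """Hypothetical fix: trigger at a fixed percentage of the window."""
    return window * percent // 100
```

At a 200K window the fixed trigger fires at 75% utilization, which is reasonable; at 1M it fires at 15%, discarding context the user paid for.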

Compaction as Attack Vector

A subtle security concern: the compaction step is a rewrite of the conversation history. A sophisticated prompt injection could craft content that, when compressed, produces a summarized version with different meaning — effectively laundering malicious instructions through the summarization step. The security team's assessment: "compaction laundering" is a theoretical but non-trivial attack vector.

Design Tensions

  1. Pareto tradeoffs at every tier — each compaction level trades information loss for continuation ability
  2. Token cost of compaction itself — calling the compression API costs tokens; aggressive compaction can cost more than the tokens it saves
  3. Background vs. foreground — auto-compact runs in the background (forked agent), but reactive compact blocks the main loop
  4. Selective preservation is user-controlled — the user decides what to keep via /compact keep ..., but most users don't know this exists
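Tension #2 reduces to simple break-even arithmetic. A hedged sketch, with all parameter names and the cost model assumed for illustration: the compression call pays a one-time cost (it reads the whole context as input and writes the summary as output), while the freed space saves input tokens on every subsequent turn.

```python
def compaction_pays_off(context_tokens: int, summary_tokens: int,
                        remaining_turns: int) -> bool:
    """Does compacting now save more tokens than it costs?
    Illustrative cost model; ignores caching and per-turn additions."""
    one_time_cost = context_tokens + summary_tokens   # input + output of the call
    per_turn_saving = context_tokens - summary_tokens # smaller prompt each turn
    return per_turn_saving * remaining_turns > one_time_cost
```

Under this model, compacting a 100K context down to 10K only breaks even after two further turns; compacting near the end of a session is a guaranteed net loss.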