Sleep-Time Compute

Description

The UC Berkeley paper "Sleep-time Compute: Beyond Inference Scaling at Test-time" (arXiv:2504.13171, April 2025) demonstrated that LLMs can pre-compute inferences about their context during idle time, reducing the test-time compute needed to reach a given accuracy by roughly 5x. The core insight is that a model can do useful work between user interactions, amortizing expensive reasoning across periods when no one is waiting on a response.

Claude Code's auto-dream system implements sleep-time compute but inverts the direction: where the paper's approach pre-computes answers to predicted future queries, auto-dream consolidates past memory -- merging, pruning, and reorganizing the accumulated knowledge from prior sessions so that future sessions boot faster and with higher-quality context. The paper looks forward; auto-dream looks backward.
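The backward-looking consolidation described above can be sketched as a merge-and-prune pass over accumulated memory. This is a minimal illustration, not Claude Code's actual implementation; the names (`MemoryEntry`, `consolidate`, `staleAfter`) are assumptions.

```typescript
// Hypothetical sketch of auto-dream-style consolidation: merge entries
// that share a topic and prune entries not touched recently, so the
// next session boots with a smaller, higher-signal memory set.
// All names here are illustrative, not Claude Code's API.

interface MemoryEntry {
  topic: string;
  content: string;
  lastAccessed: number; // session index of last access
}

function consolidate(
  entries: MemoryEntry[],
  currentSession: number,
  staleAfter = 10, // assumed pruning horizon, in sessions
): MemoryEntry[] {
  const byTopic = new Map<string, MemoryEntry>();
  for (const e of entries) {
    // Prune: drop entries unused for more than `staleAfter` sessions.
    if (currentSession - e.lastAccessed > staleAfter) continue;
    const existing = byTopic.get(e.topic);
    if (existing) {
      // Merge: fold same-topic entries together, keeping the most
      // recent access time.
      existing.content += "\n" + e.content;
      existing.lastAccessed = Math.max(existing.lastAccessed, e.lastAccessed);
    } else {
      byTopic.set(e.topic, { ...e });
    }
  }
  return [...byTopic.values()];
}
```

A real system would also reorganize and summarize the merged content with the model itself; this sketch captures only the structural merge/prune step.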

The paper's key finding that consolidation requires sufficient accumulated data maps directly to auto-dream's minSessions: 5 threshold -- the system does not trigger consolidation until at least 5 sessions have accumulated since the last dream, ensuring there is enough signal to justify the compute cost.
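The gating logic implied by that threshold can be sketched in a few lines. The `minSessions` value comes from the text above; the surrounding names (`DreamState`, `shouldDream`) are assumptions for illustration.

```typescript
// Illustrative trigger gate for a dream/consolidation pass.
// Only `minSessions: 5` is documented; other names are hypothetical.

interface DreamState {
  sessionsSinceLastDream: number;
  minSessions: number; // 5 in auto-dream's configuration
}

function shouldDream(state: DreamState): boolean {
  // Spend idle-time compute only once enough sessions have accumulated
  // to provide signal worth the cost of consolidating.
  return state.sessionsSinceLastDream >= state.minSessions;
}
```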

Key claims

Relations

Sources