Background Daemon Architecture

Claude Code is not a simple REPL; it behaves like a daemon with an autonomous lifecycle. The handleStopHooks function, which runs at the end of every query loop iteration, turns the idle time between user inputs into a dense background scheduling window. Nine background services run during these gaps, most of them as forked child agents that share the parent's prompt cache prefix.

The Philosophy: Idle Time Is Compute

Every moment the user spends thinking, typing, or reviewing output is compute time Claude Code can use. Rather than waiting passively, handleStopHooks dispatches background work: memory extraction, documentation maintenance, session summarization, context compaction, and speculative pre-computation. This is the same philosophy as UC Berkeley's Sleep-time Compute paper: use idle time to improve the next interaction.
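The dispatch pattern described above can be sketched as a loop over gated services that starts each enabled one without awaiting it. This is a minimal sketch under stated assumptions: `BackgroundService`, `TurnContext`, and `dispatchStopHooks` are hypothetical names, the gates are modeled as simple set lookups even though some (such as `EXTRACT_MEMORIES`) may be build-time flags, and the `run` bodies are placeholders for the forked-agent work.

```typescript
// Hypothetical sketch: BackgroundService, TurnContext, and dispatchStopHooks are
// illustrative names, not Claude Code's real internals.

interface TurnContext {
  featureFlags: ReadonlySet<string>; // gate names such as "tengu_session_memory"
  userType: string;                  // e.g. "ant" for internal users
  autoMemoryEnabled: boolean;        // the user-facing auto-memory toggle
}

interface BackgroundService {
  name: string;
  isEnabled(ctx: TurnContext): boolean; // gate check, evaluated every turn
  run(ctx: TurnContext): Promise<void>; // the background work itself
}

// Runs at the end of a query loop iteration. Enabled services are started but
// never awaited, so control returns to the user immediately.
function dispatchStopHooks(
  services: readonly BackgroundService[],
  ctx: TurnContext,
  onError: (name: string, err: unknown) => void = () => {},
): string[] {
  const started: string[] = [];
  for (const svc of services) {
    if (!svc.isEnabled(ctx)) continue;
    started.push(svc.name);
    // Deferring run() to a microtask turns even a synchronous throw into a
    // rejected promise, which is routed to onError instead of the caller.
    Promise.resolve()
      .then(() => svc.run(ctx))
      .catch((err: unknown) => onError(svc.name, err));
  }
  return started;
}

// Example services, gated on names taken from the service table.
const exampleServices: BackgroundService[] = [
  {
    name: "extractMemories",
    isEnabled: (c) => c.featureFlags.has("EXTRACT_MEMORIES") && c.autoMemoryEnabled,
    run: async () => { /* fork a child agent to extract memories */ },
  },
  {
    name: "sessionMemory",
    isEnabled: (c) => c.featureFlags.has("tengu_session_memory"),
    run: async () => { /* fork a child agent to summarize the session */ },
  },
  {
    name: "magicDocs",
    isEnabled: (c) => c.userType === "ant",
    run: async () => { /* fork a child agent to update docs */ },
  },
];

// Only the session-memory gate is on here, so started is ["sessionMemory"].
const started = dispatchStopHooks(exampleServices, {
  featureFlags: new Set(["tengu_session_memory"]),
  userType: "external",
  autoMemoryEnabled: true,
});
```

Not awaiting the services is what keeps the prompt responsive; the cost is that failures surface only through the error callback.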

The Nine Background Services

| Service               | Gate                                  | Public?       | Purpose
| --------------------- | ------------------------------------- | ------------- | -------
| Extract Memories      | EXTRACT_MEMORIES + auto-memory toggle | Yes           | Persistent knowledge extraction from conversations
| Session Memory        | tengu_session_memory                  | Yes           | Session summarization for compaction
| Auto-Compact          | Default on                            | Yes           | Context window management; compacts at effective_window - 13K tokens
| Auto-Dream            | tengu_onyx_plover                     | Yes           | Memory consolidation during inactivity
| Cron Scheduler        | AGENT_TRIGGERS + tengu_kairos_cron    | Yes           | Scheduled task execution
| Away Summary          | No gate                               | Yes           | Re-orientation when the user returns
| Prevent Sleep (macOS) | No gate                               | Yes           | Runs caffeinate while background tasks are active
| Speculative Execution | tengu_speculation                     | No (Ant-only) | Pre-executes the predicted next input
| Magic Docs            | USER_TYPE === 'ant'                   | No (Ant-only) | Auto-maintained living documentation
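The Auto-Compact trigger point reduces to a one-line comparison. This is a minimal sketch: the 13K buffer comes from the table, while the function names, the `>=` comparison, and the example window size are assumptions. How the effective window itself is derived is not covered here, so it is taken as an input.

```typescript
// Hypothetical sketch of the Auto-Compact trigger: compact once usage reaches
// effective_window - 13K tokens. Names and the >= comparison are assumptions.
const AUTO_COMPACT_BUFFER_TOKENS = 13_000;

function autoCompactThreshold(effectiveWindowTokens: number): number {
  return effectiveWindowTokens - AUTO_COMPACT_BUFFER_TOKENS;
}

function shouldAutoCompact(
  tokensUsed: number,
  effectiveWindowTokens: number,
  enabled = true, // the table lists Auto-Compact as on by default
): boolean {
  return enabled && tokensUsed >= autoCompactThreshold(effectiveWindowTokens);
}

// With an illustrative 180K-token effective window, compaction starts at 167K.
console.log(autoCompactThreshold(180_000)); // 167000
```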

The Cache Constraint

All background services share one hard architectural rule: every forked agent must preserve the parent's prompt cache prefix. Only four parameters can safely be overridden, and each is either purely client-side or limited to cache-write markers: abortController (never sent to the API), skipTranscript (client-side), skipCacheWrite (adjusts only the cache_control markers), and canUseTool (a client-side permission check).
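One way to make the rule hard to break is to encode it in types, so that a fork can vary only the four client-side knobs. In this minimal sketch the four override names come from the text; everything else (`CacheCriticalParams` and its fields, including `effort`, plus `forkAgentConfig` and `cacheKey`) is a hypothetical stand-in for the real request shape.

```typescript
// Hypothetical shape of the API-facing request: all of it is visible to the
// prompt cache, so a fork must reuse it unchanged.
interface CacheCriticalParams {
  model: string;
  systemPrompt: string;
  tools: readonly string[];
  messages: readonly string[]; // the parent's conversation prefix
  effort?: "low" | "medium" | "high";
}

// The only four knobs a fork may change (names from the text).
interface ForkOverrides {
  abortController?: { abort(): void };        // e.g. an AbortController; never sent to the API
  skipTranscript?: boolean;                   // client-side only
  skipCacheWrite?: boolean;                   // only adjusts cache_control markers
  canUseTool?: (toolName: string) => boolean; // client-side permission check
}

interface ForkedAgentConfig {
  api: Readonly<CacheCriticalParams>;
  client: ForkOverrides;
}

// There is deliberately no parameter for a different model, prompt, tool list,
// or effort level: the API-facing half is a frozen copy of the parent's.
function forkAgentConfig(
  parent: CacheCriticalParams,
  overrides: ForkOverrides = {},
): ForkedAgentConfig {
  return { api: Object.freeze({ ...parent }), client: { ...overrides } };
}

// Stand-in for the prefix the cache matches on: a fork whose key differs from
// the parent's would miss the parent's cached prefix.
function cacheKey(p: CacheCriticalParams): string {
  return JSON.stringify([p.model, p.systemPrompt, p.tools, p.messages, p.effort ?? null]);
}
```

With this shape, `forkAgentConfig(parent, { effort: "low" })` fails to compile with an excess-property error, so the kind of change described in the next paragraph is rejected before it ships.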

When PR #18143 set effort: 'low' on a forked agent, the cache hit rate dropped from 92.7% to 61% and cache writes rose 45x; the PR was reverted. The practical consequence is that every background behavior must be built from what can be done without touching any API parameter.

The economic consequence: five parallel background agents cost far less than five independent requests, because they all read the same cached prefix instead of paying full price for it. With 92% overall prefix reuse, measured savings were $4.85 (an 81% reduction) per representative task.
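A back-of-the-envelope model shows why prefix reuse dominates the bill. This sketch assumes Anthropic's published prompt-caching multipliers (cache reads at 0.1x and 5-minute cache writes at 1.25x the base input price), assumes every cache miss is written back, and ignores output tokens; the function names and token counts are illustrative.

```typescript
// Price multipliers relative to the base (uncached) input price.
const CACHE_READ = 0.1;
const CACHE_WRITE = 1.25; // 5-minute cache write

// Blended input price per token for a given hit rate, assuming every miss is
// written to the cache. 1.0 would mean "as expensive as no caching at all".
function blendedInputCost(hitRate: number): number {
  if (hitRate < 0 || hitRate > 1) throw new RangeError("hitRate must be in [0, 1]");
  return hitRate * CACHE_READ + (1 - hitRate) * CACHE_WRITE;
}

// Input cost, in base-price tokens, of n forks that each send the parent's
// prefix plus a private suffix billed at the base price.
function forksInputCost(
  n: number,
  prefixTokens: number,
  suffixTokens: number,
  prefixCached: boolean,
): number {
  const prefixRate = prefixCached ? CACHE_READ : 1.0;
  return n * (prefixTokens * prefixRate + suffixTokens);
}

console.log(blendedInputCost(0.92).toFixed(3));        // 0.192
console.log(forksInputCost(5, 100_000, 2_000, true));  // 60000
console.log(forksInputCost(5, 100_000, 2_000, false)); // 510000
```

Under these assumptions a 92% hit rate gives a blended input price of about 0.19x base, roughly an 81% reduction, which lands close to the measured figure; treat that as a sanity check, not a reconstruction of the measurement. The same model puts a 61% hit rate at about 0.55x, and five cached forks of a 100K-token prefix (60K base-token units) cost less than a single uncached one (102K).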

The Startup Path

startBackgroundHousekeeping initializes all background tasks at session start. Expensive operations (cleanup of old versions and of message files) are deferred until 10 minutes after launch and run only once the user has been idle for at least 1 minute. This keeps background work from slowing startup or competing with active use.
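The deferral rule reduces to two time checks. This is a minimal sketch: the 10-minute and 1-minute constants come from the text, while the function names, the injected clock, and the choice to re-check once per minute after the initial delay are assumptions.

```typescript
// Constants from the text; function names are hypothetical.
const STARTUP_DELAY_MS = 10 * 60 * 1000; // wait 10 minutes after launch
const MIN_IDLE_MS = 60 * 1000;           // require 1 minute of user idleness

// Pure predicate: both conditions must hold before expensive cleanup runs.
function canRunExpensiveHousekeeping(
  nowMs: number,
  launchedAtMs: number,
  lastInteractionAtMs: number,
): boolean {
  const pastStartupDelay = nowMs - launchedAtMs >= STARTUP_DELAY_MS;
  const userIdle = nowMs - lastInteractionAtMs >= MIN_IDLE_MS;
  return pastStartupDelay && userIdle;
}

// Waits out the startup delay, then re-checks once per idle window until the
// user has been idle long enough (the retry interval is an assumption).
function scheduleExpensiveHousekeeping(
  launchedAtMs: number,
  getLastInteractionAtMs: () => number,
  task: () => void,
  now: () => number = Date.now,
): void {
  const attempt = () => {
    if (canRunExpensiveHousekeeping(now(), launchedAtMs, getLastInteractionAtMs())) {
      task();
    } else {
      setTimeout(attempt, MIN_IDLE_MS);
    }
  };
  setTimeout(attempt, Math.max(0, launchedAtMs + STARTUP_DELAY_MS - now()));
}
```

Keeping the decision in a pure predicate makes the timing rules testable without real timers; the scheduler only decides when to ask again.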

Design Tensions

  1. Scheduling depends on user interaction. Background tasks trigger only while the user is actively using Claude Code. If the user hasn't opened it for 48 hours, auto-dream won't run until the next session. This is deliberate (the cache constraint requires an active session), but it ties maintenance to usage.

  2. Single scheduling point. All background pipelines funnel through handleStopHooks. If the function isn't reached (crash mid-turn, premature exit), no background work runs.

  3. Background compute doubles token cost. The hidden cost of extractMemories is that it doubles per-turn token consumption (26M vs 13M tokens in measured traces). Users experience this as unexplained cost variance.
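The second tension follows from where dispatch sits in the loop. In this hypothetical sketch (the function and parameter names are illustrative), background work is scheduled only after a turn completes normally, so a turn that throws never reaches it.

```typescript
// Hypothetical illustration: background work is scheduled from exactly one
// place, after the turn body returns normally.
function runQueryLoopIteration(
  runTurn: () => void,
  handleStopHooks: () => void,
): void {
  runTurn();         // a crash here propagates out of the iteration...
  handleStopHooks(); // ...so this single scheduling point is never reached
}

let backgroundScheduled = false;
try {
  runQueryLoopIteration(
    () => { throw new Error("crash mid-turn"); },
    () => { backgroundScheduled = true; },
  );
} catch (_e) {
  // The turn failed; the stop hooks did not run.
}
console.log(backgroundScheduled); // false
```

Moving the call into a `finally` block would cover thrown errors, but not a process that is killed or exits abruptly.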