Speculative Execution (Full)

Entity ID: ent-20260410-9fa6f1e52365
Type: service
Scope: shared
Status: active
Aliases: speculation, speculative-execution

Description

The most aggressive background feature in claude-code. After the model finishes responding, a prompt suggestion system predicts the user's next input (2-12 words), and the speculation engine pre-executes that predicted prompt in a copy-on-write overlay filesystem before the user presses Enter. If the user accepts the suggestion (Tab or Enter), the speculated messages and file changes are injected into the live session immediately. If the user types something different, the overlay is deleted and the background API call is aborted.

The feature is gated to Anthropic employees (USER_TYPE === 'ant') and toggled via the speculationEnabled config flag (default: true). Analytics are emitted under the tengu_speculation event name.

Prediction mechanism

Prediction is handled by the prompt suggestion subsystem in promptSuggestion.ts. After each assistant turn, a forked agent (see forked-agent-pattern) is spawned with a dedicated suggestion prompt. The prompt instructs the model to predict what the user would "naturally type next" in 2-12 words, matching the user's style. The fork runs with all tools denied (its canUseTool callback returns deny for every call) and reuses the parent conversation as its cache prefix to achieve high cache hit rates.

Suggestions are filtered through 13 heuristic guards before being shown. The table lists 12 of them:

Filter	Blocks
`done`	Bare "done"
`meta_text`	"nothing found", "no suggestion", bare "silence"
`meta_wrapped`	Text wrapped in parens/brackets
`error_message`	API error strings that leaked into output
`prefixed_label`	`Label: ...` format
`too_few_words`	Single words (except allowed set: yes, ok, push, commit, continue, etc.)
`too_many_words`	More than 12 words
`too_long`	100+ characters
`multiple_sentences`	Contains sentence breaks
`has_formatting`	Newlines or markdown
`evaluative`	"thanks", "looks good", "perfect", etc.
`claude_voice`	"Let me...", "I'll...", "Here's...", etc.
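The guards can be sketched as one function that returns the name of the first filter rejecting a suggestion, or null if it passes. This is a minimal illustration, not the code in promptSuggestion.ts: the function name rejectReason, the check order, the word lists, and every regex are assumptions, and the meta_text and error_message guards are omitted.

```typescript
// Hypothetical sketch: word lists and patterns are illustrative assumptions.
const ALLOWED_SINGLE_WORDS = new Set(["yes", "ok", "push", "commit", "continue"]);
const EVALUATIVE = /^(thanks|thank you|looks good|perfect)\b/i;
const CLAUDE_VOICE = /^(let me|i'll|here's)\b/i;

// Returns the first matching filter name, or null if the suggestion may be shown.
function rejectReason(raw: string): string | null {
  const text = raw.trim();
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  if (/^done$/i.test(text)) return "done";
  if (/^[(\[].*[)\]]$/.test(text)) return "meta_wrapped";
  if (/^\w[\w ]*:\s/.test(text)) return "prefixed_label";
  if (/[\n`*#]/.test(text)) return "has_formatting"; // crude markdown check
  if (text.length >= 100) return "too_long";
  if (words.length > 12) return "too_many_words";
  if (words.length < 2 && !ALLOWED_SINGLE_WORDS.has(text.toLowerCase())) return "too_few_words";
  if (/[.!?]\s+\S/.test(text)) return "multiple_sentences";
  if (EVALUATIVE.test(text)) return "evaluative";
  if (CLAUDE_VOICE.test(text)) return "claude_voice";
  return null;
}
```

Ordering matters in a design like this: the cheap shape checks run first so that, for example, a bare "done" is reported as `done` rather than `too_few_words`.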

The suggestion is shown as autocomplete text in the prompt input. The user accepts by pressing Tab (sets acceptedAt) or Enter (submits directly).

Overlay filesystem

When a suggestion passes filters and speculation is enabled, startSpeculation() creates a copy-on-write overlay directory at:

~/.claude/tmp/speculation/<pid>/<speculation_id>/

The overlay isolates all file mutations during speculative execution:

First write to a file: the original is copied from the real working directory into the overlay path. The relative path is added to writtenPathsRef.
Subsequent writes: go directly to the overlay copy (already present).
Reads of previously-written files: redirected to the overlay copy so the speculated agent sees its own modifications.
Reads of unmodified files: pass through to the real filesystem (no rewrite).
Writes outside cwd: unconditionally denied (speculation_write_outside_root).

Path rewriting is implemented in the canUseTool callback passed to runForkedAgent. The callback inspects file_path, path, or notebook_path on tool inputs, computes the relative path from cwd, and rewrites the input to point at the overlay directory. The requireCanUseTool override on the subagent context ensures this callback runs for every tool invocation, even when hooks would normally auto-approve.
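A minimal sketch of the copy-on-write redirection described above, using synchronous Node fs calls. redirectPath and its parameters are hypothetical; writtenPaths stands in for writtenPathsRef, skipping the copy when the original file does not exist is an assumption, and letting reads outside cwd pass through unchanged is also an assumption the text does not state.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

type Rewrite = { ok: true; path: string } | { ok: false; reason: string };

// Hypothetical helper: maps a tool's file path onto the overlay when needed.
function redirectPath(
  cwd: string,
  overlayDir: string,
  writtenPaths: Set<string>,
  filePath: string,
  isWrite: boolean,
): Rewrite {
  const rel = path.relative(cwd, path.resolve(cwd, filePath));
  const outside = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (outside) {
    // Writes outside cwd are denied; reads pass through (assumption).
    return isWrite
      ? { ok: false, reason: "speculation_write_outside_root" }
      : { ok: true, path: filePath };
  }
  const overlayPath = path.join(overlayDir, rel);
  if (isWrite) {
    if (!writtenPaths.has(rel)) {
      // First write: seed the overlay copy from the real file, if it exists.
      fs.mkdirSync(path.dirname(overlayPath), { recursive: true });
      const original = path.join(cwd, rel);
      if (fs.existsSync(original)) fs.copyFileSync(original, overlayPath);
      writtenPaths.add(rel);
    }
    return { ok: true, path: overlayPath };
  }
  // Reads see the overlay copy only if this speculation already wrote the file.
  return { ok: true, path: writtenPaths.has(rel) ? overlayPath : path.join(cwd, rel) };
}
```

Keying writtenPaths by the cwd-relative path means an absolute path and a relative path to the same file resolve to the same overlay entry.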

Tool permission tiers during speculation

Tool category	Tools	Behavior
Safe read-only	Read, Glob, Grep, ToolSearch, LSP, TaskGet, TaskList	Always allowed, reads from overlay if file was previously written
Write tools	Edit, Write, NotebookEdit	Redirected to overlay; speculation halts if the permission mode is not `acceptEdits` or `bypassPermissions`
Bash (read-only)	`git status`, `ls`, etc.	Allowed if `checkReadOnlyConstraints` passes
Bash (mutating)	`rm`, `git commit`, etc.	Denied; speculation halts at this boundary
All other tools	WebFetch, AgentTool, etc.	Denied; speculation halts

When speculation halts at a boundary (bash, edit permission, or denied tool), the boundary type, tool name, and detail are recorded in CompletionBoundary. The speculation remains active with partial results, which are still usable on acceptance.
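The permission tiers reduce to one decision per tool call: allow, redirect to the overlay, or halt and record a boundary. In this hypothetical sketch, decide and the boundary type strings other than 'complete' are illustrative, and isReadOnlyBash is a placeholder for checkReadOnlyConstraints, whose real signature is not shown here.

```typescript
// Hypothetical boundary shape, loosely modeled on CompletionBoundary.
type Boundary =
  | { type: "complete" }
  | { type: "bash" | "edit" | "denied_tool"; toolName: string; detail: string };

type Decision =
  | { action: "allow" }
  | { action: "redirect" } // write goes to the overlay
  | { action: "halt"; boundary: Boundary };

const READ_ONLY = new Set(["Read", "Glob", "Grep", "ToolSearch", "LSP", "TaskGet", "TaskList"]);
const WRITE_TOOLS = new Set(["Edit", "Write", "NotebookEdit"]);

function decide(
  toolName: string,
  mode: string,
  isReadOnlyBash: (command: string) => boolean,
  command = "",
): Decision {
  if (READ_ONLY.has(toolName)) return { action: "allow" };
  if (WRITE_TOOLS.has(toolName)) {
    return mode === "acceptEdits" || mode === "bypassPermissions"
      ? { action: "redirect" }
      : { action: "halt", boundary: { type: "edit", toolName, detail: `mode=${mode}` } };
  }
  if (toolName === "Bash") {
    return isReadOnlyBash(command)
      ? { action: "allow" }
      : { action: "halt", boundary: { type: "bash", toolName, detail: command } };
  }
  // Everything else (WebFetch, AgentTool, ...) stops speculation.
  return { action: "halt", boundary: { type: "denied_tool", toolName, detail: "not allowed during speculation" } };
}
```

Returning the boundary instead of throwing keeps the messages produced so far, which matches the note above that partial results remain usable on acceptance.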

Cache reuse

Speculation achieves high cache reuse by inheriting CacheSafeParams from the parent query context. The forked agent sends byte-identical system prompt, tool definitions, model selection, and conversation prefix. Key design constraints that protect cache alignment:

No parameter overrides: PR #18143 attempted to set effortValue:'low' on the suggestion fork, causing a 45x spike in cache writes (hit rate dropped from 92.7% to 61%). The code now carries an explicit warning: "DO NOT override any API parameter that differs from the parent request."
No cache writes from the suggestion fork: skipCacheWrite: true avoids creating new cache entries for ephemeral, fire-and-forget calls.
Prompt cache break detection (promptCacheBreakDetection.ts) explicitly excludes speculation/prompt_suggestion/session_memory forks from tracking, as they are short-lived agents with fresh agentId values each time.
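A toy model of why overrides break cache alignment: if the cacheable prefix is treated as every API-visible parameter, a fork that reuses the parent's parameters verbatim lands on the same key, while a single override (such as an effort value) does not. The CacheSafeParams fields, prefixKey, and forkRequest below are hypothetical, and hashing JSON is only a stand-in for the server-side prompt cache.

```typescript
import { createHash } from "node:crypto";

// Hypothetical field set; the real CacheSafeParams type is not reproduced here.
interface CacheSafeParams {
  systemPrompt: string;
  tools: string[];
  model: string;
  messages: string[];
  effortValue?: string;
}

// Toy cache key: any difference in API-visible parameters changes the key.
function prefixKey(p: CacheSafeParams): string {
  return createHash("sha256").update(JSON.stringify(p)).digest("hex");
}

// The fork passes the parent's params through untouched and adds only a
// client-side flag that is not part of the cached prefix.
function forkRequest(parent: CacheSafeParams): { params: CacheSafeParams; skipCacheWrite: boolean } {
  return { params: parent, skipCacheWrite: true };
}
```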

Acceptance flow

When the user accepts a suggestion that has an active speculation (speculation.status === 'active'):

handleSpeculationAccept is called from PromptInput.tsx via onSubmit.
prepareMessagesForInjection filters speculated messages: strips thinking blocks, removes tool_use/tool_result pairs where the result was an error or interruption, and drops standalone interrupt messages.
acceptSpeculation copies overlay files back to the real filesystem via copyOverlayToMain, then deletes the overlay.
Clean messages are injected into the conversation via setMessages.
File state from speculated reads is merged into the parent's readFileState cache.
If speculation completed fully (boundary.type === 'complete'), no follow-up API call is needed (queryRequired: false). If it hit a boundary, a follow-up query runs with the injected messages.
Time saved is accumulated in speculationSessionTimeSavedMs and recorded to the transcript as a speculation-accept entry.
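The message filtering in prepareMessagesForInjection can be sketched as follows. The block shapes loosely follow Anthropic Messages API content types (text, thinking, tool_use, tool_result with is_error); the isInterrupt flag, the function name prepareForInjection, and treating an unanswered tool_use as interrupted are assumptions.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "tool_use"; id: string; name: string }
  | { type: "tool_result"; tool_use_id: string; is_error?: boolean };

// isInterrupt is a hypothetical marker for standalone interrupt messages.
interface Msg {
  role: "user" | "assistant";
  content: Block[];
  isInterrupt?: boolean;
}

function prepareForInjection(messages: Msg[]): Msg[] {
  const answered = new Set<string>(); // tool_use ids that received a result
  const failed = new Set<string>(); // tool_use ids whose result was an error
  for (const m of messages) {
    for (const b of m.content) {
      if (b.type !== "tool_result") continue;
      answered.add(b.tool_use_id);
      if (b.is_error) failed.add(b.tool_use_id);
    }
  }
  const keep = (b: Block): boolean => {
    switch (b.type) {
      case "thinking":
        return false; // thinking blocks are stripped
      case "tool_use":
        // Drop the pair if the result errored or never arrived (interrupted).
        return answered.has(b.id) && !failed.has(b.id);
      case "tool_result":
        return !failed.has(b.tool_use_id);
      default:
        return true;
    }
  };
  return messages
    .filter((m) => !m.isInterrupt)
    .map((m) => ({ ...m, content: m.content.filter(keep) }))
    .filter((m) => m.content.length > 0);
}
```

Removing both halves of a failed pair keeps the injected transcript valid, since every remaining tool_use still has its matching tool_result.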

For Anthropic employees, a feedback message is injected, for example: [ANT-ONLY] Speculated 3 tool uses · 1,234 tokens · +2.1s saved (5.4s this session).

Rollback on misprediction

As soon as the user types in the prompt input (any keystroke), abortSpeculation is called:

The AbortController is aborted, canceling the in-flight API call.
The overlay directory is removed via safeRemoveOverlay (rm -rf with retries).
Speculation state resets to IDLE_SPECULATION_STATE.
The tengu_speculation event is logged with outcome: 'aborted' and abort_reason: 'user_typed'.

The user can also abort via Escape or Ctrl+C (both call abortSpeculation). No speculated changes reach the real filesystem on abort.
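The abort path can be sketched with a standard AbortController and Node's fs.rmSync, whose maxRetries option retries on transient errors such as EBUSY when recursive is true. The state shape, IDLE (standing in for IDLE_SPECULATION_STATE), abortSpeculationSketch, and the injected logEvent callback are hypothetical.

```typescript
import * as fs from "node:fs";

// Hypothetical state shape; IDLE stands in for IDLE_SPECULATION_STATE.
interface SpeculationState {
  status: "idle" | "active";
  abort?: AbortController;
  overlayDir?: string;
}
const IDLE: SpeculationState = { status: "idle" };

type LogEvent = (name: string, data: Record<string, unknown>) => void;

function abortSpeculationSketch(state: SpeculationState, reason: string, logEvent: LogEvent): SpeculationState {
  if (state.status !== "active") return state;
  // Cancels any request that was started with this controller's signal.
  state.abort?.abort();
  // Recursive, forced delete of the overlay; nothing reaches the real tree.
  if (state.overlayDir) {
    fs.rmSync(state.overlayDir, { recursive: true, force: true, maxRetries: 3, retryDelay: 100 });
  }
  logEvent("tengu_speculation", { outcome: "aborted", abort_reason: reason });
  return IDLE;
}
```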

Recursive pipelining

When a speculation completes fully (boundary.type === 'complete'), generatePipelinedSuggestion immediately generates the next suggestion using the augmented context (original conversation + speculated messages). If a pipelined suggestion is ready when the user accepts, handleSpeculationAccept promotes it to the visible prompt suggestion and starts a new speculation with isPipelined = true. This creates a chain: predict-execute-predict-execute, staying multiple steps ahead of the user.
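The predict-execute chain can be modeled as two pure state transitions: on full completion, predict again from the augmented context; on acceptance, promote the pipelined suggestion and mark the next run as pipelined. PipelineState, onComplete, onAccept, and the Predict callback are hypothetical names, and the actual model call is replaced by a plain function.

```typescript
interface PipelineState {
  suggestion: string | null; // currently visible suggestion
  pipelined: string | null; // next suggestion, prepared in advance
  isPipelined: boolean; // whether the next speculation came from `pipelined`
}

type Predict = (context: string[]) => string;

// Full completion: predict from original conversation + speculated messages.
function onComplete(s: PipelineState, conversation: string[], speculated: string[], predict: Predict): PipelineState {
  return { ...s, pipelined: predict([...conversation, ...speculated]) };
}

// Acceptance: promote a ready pipelined suggestion, otherwise start fresh.
function onAccept(s: PipelineState): PipelineState {
  return s.pipelined
    ? { suggestion: s.pipelined, pipelined: null, isPipelined: true }
    : { suggestion: null, pipelined: null, isPipelined: false };
}
```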

Safety limits

Limit	Value	Enforcement
Max turns per speculation	20	`MAX_SPECULATION_TURNS` passed to `runForkedAgent`
Max messages per speculation	100	`MAX_SPECULATION_MESSAGES`, checked in `onMessage` callback
Writes outside cwd	Blocked	Relative path check in `canUseTool`
Non-read-only bash	Blocked	`checkReadOnlyConstraints` validation
Edit without permission	Blocked	Permission mode check before allowing write tools
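The message cap can be enforced inside the onMessage callback, as the table indicates. In this sketch createMessageCollector is hypothetical, and whether the real check trips at exactly 100 or on the 101st message is an assumption.

```typescript
const MAX_SPECULATION_MESSAGES = 100;

// Hypothetical onMessage factory: collects speculated messages and aborts the
// forked agent once the cap is reached; later messages are ignored.
function createMessageCollector<M>(controller: AbortController) {
  const messages: M[] = [];
  const onMessage = (message: M): void => {
    if (controller.signal.aborted) return;
    messages.push(message);
    if (messages.length >= MAX_SPECULATION_MESSAGES) controller.abort();
  };
  return { messages, onMessage };
}
```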

Error handling

Speculation is designed to fail open. If handleSpeculationAccept throws, the error is logged but the system falls back to normal query flow (queryRequired: true). The overlay is cleaned up, speculation state resets, and the user's message is processed as if no speculation existed.
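The fail-open behavior amounts to a try/catch around acceptance that converts any failure into queryRequired: true. The wrapper name acceptOrFallBack and its callbacks are hypothetical, and the sketch is synchronous for brevity.

```typescript
interface AcceptResult {
  queryRequired: boolean;
}

// Hypothetical wrapper: errors are logged, speculation is torn down, and the
// caller falls back to a normal query as if no speculation had run.
function acceptOrFallBack(
  accept: () => AcceptResult,
  cleanup: () => void,
  logError: (err: unknown) => void,
): AcceptResult {
  try {
    return accept();
  } catch (err) {
    logError(err);
    cleanup(); // e.g. remove the overlay and reset speculation state
    return { queryRequired: true };
  }
}
```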

Telemetry

The tengu_speculation event captures:

speculation_id, outcome (accepted/aborted/error), duration_ms
suggestion_length, tools_executed, completed (boolean)
boundary_type, boundary_tool, boundary_detail
time_saved_ms, message_count, is_pipelined
error_type, error_message, error_phase (start/accept) on failure
abort_reason on abort

Session-level time saved is aggregated in totalSpeculationTimeSavedMs via transcript speculation-accept entries, visible in the /stats command for Anthropic employees.

Key claims

clm-20260410-spec-overlay-cow: File writes during speculation use a copy-on-write overlay at ~/.claude/tmp/speculation/<pid>/<id>/; the real codebase is never modified until the user accepts.
clm-20260410-spec-cache-reuse: Forked agent inherits byte-identical CacheSafeParams from the parent to maximize prompt cache hits. Overriding any API parameter (even effort) can cause 45x cache write spikes.
clm-20260410-spec-pipeline: On completion, speculation immediately generates the next prediction and starts executing it (isPipelined = true), attempting to stay multiple steps ahead.
clm-20260410-spec-ant-only: Feature is gated to USER_TYPE === 'ant' with a speculationEnabled config toggle (default true). Not available to external users.
clm-20260410-spec-prediction: Prediction prompt generates 2-12 word suggestions matching user style, filtered through 13 heuristic guards before display.
clm-20260410-spec-fail-open: On any error during acceptance, the system falls back to normal query flow; speculated work is discarded and the user's input is processed normally.

Relations

forked-agent-pattern -- speculation uses runForkedAgent with CacheSafeParams sharing for cache-aligned execution
cache-economics -- speculation's cost model depends entirely on prompt cache hits; breaking cache alignment makes it uneconomical
bash-security -- speculative bash classifier checks (startSpeculativeClassifierCheck) run in parallel with permission setup; speculation itself uses checkReadOnlyConstraints for bash gating
growthbook -- prompt suggestion enablement gated by tengu_chomp_inflection feature flag
permission-pipeline -- speculation checks the permission mode (acceptEdits/bypassPermissions) before allowing write tools

Sources

src-20260409-e9925330d110
src/services/PromptSuggestion/speculation.ts -- core speculation engine (992 lines)
src/services/PromptSuggestion/promptSuggestion.ts -- prediction/suggestion generation and filtering
src/utils/forkedAgent.ts -- shared forked agent infrastructure
src/components/PromptInput/PromptInput.tsx -- UI integration, accept/abort handling
src/screens/REPL.tsx -- handleSpeculationAccept integration, clearSpeculativeChecks on turn end
src/state/AppStateStore.ts -- SpeculationState, CompletionBoundary, SpeculationResult types
src/utils/config.ts -- speculationEnabled config field
src/utils/stats.ts -- totalSpeculationTimeSavedMs aggregation from transcript entries
src/components/Settings/Config.tsx -- ant-only speculation toggle in settings UI
src/tools/BashTool/bashPermissions.ts -- speculative bash classifier check lifecycle
src/services/tools/toolExecution.ts -- startSpeculativeClassifierCheck invocation before tool execution