Speculative Execution (Full)

Description

Speculative execution is the most aggressive background feature in claude-code. After the model finishes responding, a prompt suggestion system predicts the user's next input (2-12 words), and the speculation engine pre-executes that predicted prompt against a copy-on-write overlay filesystem before the user presses Enter. When the user accepts the suggestion (Tab or Enter), the speculated messages and file changes are injected into the live session immediately. When the user types something different, the overlay is deleted and the background API call is aborted. The feature is gated to Anthropic employees (USER_TYPE === 'ant') and toggled via the speculationEnabled config flag (default: true). Analytics are emitted under the tengu_speculation event name.

Prediction mechanism

Prediction is handled by the prompt suggestion subsystem in promptSuggestion.ts. After each assistant turn, a forked agent is spawned with a dedicated suggestion prompt that instructs the model to predict what the user would "naturally type next" in 2-12 words, matching the user's style. The agent runs with all tools denied (its canUseTool callback returns deny for every call) and uses the parent conversation as the cache prefix to achieve high cache hit rates.

Suggestions are filtered through 13 heuristic guards before being shown:

| Filter | Blocks |
| --- | --- |
| done | Bare "done" |
| meta_text | "nothing found", "no suggestion", bare "silence" |
| meta_wrapped | Text wrapped in parens/brackets |
| error_message | API error strings that leaked into output |
| prefixed_label | Label: ... format |
| too_few_words | Single words (except allowed set: yes, ok, push, commit, continue, etc.) |
| too_many_words | More than 12 words |
| too_long | 100+ characters |
| multiple_sentences | Contains sentence breaks |
| has_formatting | Newlines or markdown |
| evaluative | "thanks", "looks good", "perfect", etc. |
| claude_voice | "Let me...", "I'll...", "Here's...", etc. |
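
The filter chain can be pictured as an ordered list of named predicates, where the first match rejects the suggestion. The sketch below covers only a subset of the filters; the regular expressions, the contents of the allowed single-word set, and the names `FILTERS` and `rejectSuggestion` are assumptions made for illustration, not the patterns used in promptSuggestion.ts.

```typescript
// Illustrative subset of the suggestion filters. Filter names mirror the table;
// the exact patterns and the allowed-word set are assumptions.
const ALLOWED_SINGLE_WORDS = new Set(["yes", "ok", "push", "commit", "continue"]);

type Filter = { name: string; blocks: (s: string) => boolean };

const wordCount = (s: string): number =>
  s.trim().split(/\s+/).filter(Boolean).length;

const FILTERS: Filter[] = [
  { name: "done", blocks: (s) => s.trim().toLowerCase() === "done" },
  { name: "meta_wrapped", blocks: (s) => /^[(\[].*[)\]]$/.test(s.trim()) },
  { name: "prefixed_label", blocks: (s) => /^\w+:\s/.test(s) },
  {
    name: "too_few_words",
    blocks: (s) =>
      wordCount(s) < 2 && !ALLOWED_SINGLE_WORDS.has(s.trim().toLowerCase()),
  },
  { name: "too_many_words", blocks: (s) => wordCount(s) > 12 },
  { name: "too_long", blocks: (s) => s.length >= 100 },
  { name: "has_formatting", blocks: (s) => /[\n*`#]/.test(s) },
  { name: "claude_voice", blocks: (s) => /^(let me|i'll|here's)\b/i.test(s.trim()) },
];

// Returns the name of the first filter that rejects the suggestion,
// or null if the suggestion passes every filter in this subset.
function rejectSuggestion(suggestion: string): string | null {
  const hit = FILTERS.find((f) => f.blocks(suggestion));
  return hit ? hit.name : null;
}
```

Returning the filter name rather than a boolean makes it straightforward to attach the rejection reason to analytics, which is one plausible reason the filters are named.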

The suggestion is shown as autocomplete text in the prompt input. The user accepts by pressing Tab (sets acceptedAt) or Enter (submits directly).

Overlay filesystem

When a suggestion passes filters and speculation is enabled, startSpeculation() creates a copy-on-write overlay directory at:

~/.claude/tmp/speculation/<pid>/<speculation_id>/

The overlay isolates all file mutations during speculative execution:

  1. First write to a file: the original is copied from the real working directory into the overlay path. The relative path is added to writtenPathsRef.
  2. Subsequent writes: go directly to the overlay copy (already present).
  3. Reads of previously-written files: redirected to the overlay copy so the speculated agent sees its own modifications.
  4. Reads of unmodified files: pass through to the real filesystem (no rewrite).
  5. Writes outside cwd: unconditionally denied (speculation_write_outside_root).
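
The copy-on-write rules above can be sketched with in-memory maps standing in for the real working directory and the overlay directory. The `OverlayFs` class and its method names are hypothetical; the actual engine operates on files on disk, and the `commit` and `discard` methods correspond only loosely to the acceptance and rollback behavior described later in this document.

```typescript
// In-memory stand-in for the real directories; keys are paths relative to cwd.
// Hypothetical sketch: the real engine copies files on disk.
class OverlayFs {
  private readonly main: Map<string, string>;
  private readonly overlay = new Map<string, string>();
  readonly writtenPaths = new Set<string>();

  constructor(main: Map<string, string>) {
    this.main = main;
  }

  // Reads prefer the overlay copy so the speculated agent sees its own edits;
  // unmodified files pass through to the main tree.
  read(relPath: string): string | undefined {
    return this.writtenPaths.has(relPath)
      ? this.overlay.get(relPath)
      : this.main.get(relPath);
  }

  // The first write copies the original into the overlay; later writes reuse that copy.
  write(relPath: string, update: (current: string) => string): void {
    if (!this.writtenPaths.has(relPath)) {
      this.overlay.set(relPath, this.main.get(relPath) ?? "");
      this.writtenPaths.add(relPath);
    }
    this.overlay.set(relPath, update(this.overlay.get(relPath) ?? ""));
  }

  // Acceptance: copy every overlay file back to the main tree, then clear the overlay.
  commit(): void {
    for (const p of this.writtenPaths) {
      this.main.set(p, this.overlay.get(p) ?? "");
    }
    this.discard();
  }

  // Misprediction: drop the overlay without touching the main tree.
  discard(): void {
    this.overlay.clear();
    this.writtenPaths.clear();
  }
}
```

Because the main tree is only modified in `commit`, a discarded speculation leaves the real files untouched no matter how many writes it performed.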

Path rewriting is implemented in the canUseTool callback passed to runForkedAgent. The callback inspects file_path, path, or notebook_path on tool inputs, computes the relative path from cwd, and rewrites the input to point at the overlay directory. The requireCanUseTool override on the subagent context ensures this callback runs for every tool invocation, even when hooks would normally auto-approve.
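
A minimal sketch of the rewrite step for write tools, assuming POSIX-style paths and a simplified result shape. The function name `rewriteWriteInput` and the `RewriteResult` type are hypothetical, and the copy of the original file into the overlay is omitted; only the three input keys and the speculation_write_outside_root reason come from the description above.

```typescript
import { posix as path } from "node:path";

// Hypothetical, simplified result shape for this sketch.
type RewriteResult =
  | { behavior: "allow"; input: Record<string, unknown> }
  | { behavior: "deny"; reason: string };

// Input keys that may carry a file path, as listed in the text.
const PATH_KEYS = ["file_path", "path", "notebook_path"] as const;

// Rewrites a write tool's path input so that it points into the overlay directory.
// Paths that resolve outside cwd are denied.
function rewriteWriteInput(
  input: Record<string, unknown>,
  cwd: string,
  overlayDir: string,
  writtenPaths: Set<string>,
): RewriteResult {
  const key = PATH_KEYS.find((k) => typeof input[k] === "string");
  if (!key) return { behavior: "allow", input };

  // Resolve against cwd first so relative inputs and ".." segments are normalized.
  const rel = path.relative(cwd, path.resolve(cwd, input[key] as string));
  if (rel === ".." || rel.startsWith("../")) {
    return { behavior: "deny", reason: "speculation_write_outside_root" };
  }

  writtenPaths.add(rel);
  return {
    behavior: "allow",
    input: { ...input, [key]: path.join(overlayDir, rel) },
  };
}
```

Checking for `..` as a whole leading path segment, rather than as a string prefix, avoids rejecting a legitimate file in cwd whose name merely begins with two dots.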

Tool permission tiers during speculation

| Tool category | Tools | Behavior |
| --- | --- | --- |
| Safe read-only | Read, Glob, Grep, ToolSearch, LSP, TaskGet, TaskList | Always allowed, reads from overlay if file was previously written |
| Write tools | Edit, Write, NotebookEdit | Redirected to overlay; stopped if permission mode is not acceptEdits or bypassPermissions |
| Bash (read-only) | git status, ls, etc. | Allowed if checkReadOnlyConstraints passes |
| Bash (mutating) | rm, git commit, etc. | Denied; speculation halts at this boundary |
| All other tools | WebFetch, AgentTool, etc. | Denied; speculation halts |
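
The tiers above amount to a small decision function. The sketch below is one way to express it; `classifyTool`, the `SpeculationDecision` type, and the boundary labels are hypothetical names, and the read-only bash check is passed in as a predicate because the behavior of checkReadOnlyConstraints is not described here.

```typescript
// Hypothetical decision type; boundary labels are illustrative.
type SpeculationDecision =
  | { kind: "allow" }
  | { kind: "redirect_to_overlay" }
  | { kind: "halt"; boundary: "bash" | "edit" | "denied_tool" };

const SAFE_READ_ONLY = new Set([
  "Read", "Glob", "Grep", "ToolSearch", "LSP", "TaskGet", "TaskList",
]);
const WRITE_TOOLS = new Set(["Edit", "Write", "NotebookEdit"]);
const WRITE_MODES = new Set(["acceptEdits", "bypassPermissions"]);

// Maps a tool call to the tier behavior from the table.
// isReadOnlyBash stands in for the real read-only constraint check.
function classifyTool(
  toolName: string,
  permissionMode: string,
  isReadOnlyBash: () => boolean,
): SpeculationDecision {
  if (SAFE_READ_ONLY.has(toolName)) return { kind: "allow" };
  if (WRITE_TOOLS.has(toolName)) {
    return WRITE_MODES.has(permissionMode)
      ? { kind: "redirect_to_overlay" }
      : { kind: "halt", boundary: "edit" };
  }
  if (toolName === "Bash") {
    return isReadOnlyBash()
      ? { kind: "allow" }
      : { kind: "halt", boundary: "bash" };
  }
  // Everything else (WebFetch, AgentTool, ...) stops the speculation.
  return { kind: "halt", boundary: "denied_tool" };
}
```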

When speculation halts at a boundary (bash, edit permission, or denied tool), the boundary type, tool name, and detail are recorded in CompletionBoundary. The speculation remains active with partial results, which are still usable on acceptance.

Cache reuse

Speculation achieves high cache reuse by inheriting CacheSafeParams from the parent query context. The forked agent sends byte-identical system prompt, tool definitions, model selection, and conversation prefix. Key design constraints that protect cache alignment:

Acceptance flow

When the user accepts a suggestion that has an active speculation (speculation.status === 'active'):

  1. handleSpeculationAccept is called from PromptInput.tsx via onSubmit.
  2. prepareMessagesForInjection filters speculated messages: strips thinking blocks, removes tool_use/tool_result pairs where the result was an error or interruption, and drops standalone interrupt messages.
  3. acceptSpeculation copies overlay files back to the real filesystem via copyOverlayToMain, then deletes the overlay.
  4. Clean messages are injected into the conversation via setMessages.
  5. File state from speculated reads is merged into the parent's readFileState cache.
  6. If speculation completed fully (boundary.type === 'complete'), no follow-up API call is needed (queryRequired: false). If it hit a boundary, a follow-up query runs with the injected messages.
  7. Time saved is accumulated in speculationSessionTimeSavedMs and recorded to the transcript as a speculation-accept entry.
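
Step 2 is the least obvious part of this flow, so here is a rough sketch of the message clean-up under a simplified message shape. The `Block` and `Message` types, the `isInterrupt` flag, and the function name `prepareForInjection` are assumptions; in particular, interrupted tool results are modeled here as ordinary errors, which may not match how the real prepareMessagesForInjection detects them.

```typescript
// Simplified, hypothetical message shapes for this sketch.
type Block =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "tool_use"; id: string; name: string }
  | { type: "tool_result"; tool_use_id: string; is_error?: boolean };

type Message = {
  role: "user" | "assistant";
  content: Block[];
  isInterrupt?: boolean; // assumed marker for standalone interrupt messages
};

function prepareForInjection(messages: Message[]): Message[] {
  // Collect ids of tool calls whose result was an error.
  const failed = new Set<string>();
  for (const m of messages) {
    for (const b of m.content) {
      if (b.type === "tool_result" && b.is_error) failed.add(b.tool_use_id);
    }
  }

  const keep = (b: Block): boolean => {
    if (b.type === "thinking") return false;
    if (b.type === "tool_use") return !failed.has(b.id);
    if (b.type === "tool_result") return !failed.has(b.tool_use_id);
    return true;
  };

  return messages
    .filter((m) => !m.isInterrupt)
    .map((m) => ({ ...m, content: m.content.filter(keep) }))
    .filter((m) => m.content.length > 0); // drop messages left empty by filtering
}
```

Removing the tool_use and its tool_result together keeps the transcript well-formed: a tool call without a matching result (or the reverse) would otherwise be left dangling in the injected conversation.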

For Anthropic employees, a feedback message is injected, for example: [ANT-ONLY] Speculated 3 tool uses · 1,234 tokens · +2.1s saved (5.4s this session).

Rollback on misprediction

When the user types anything (any keystroke in the prompt input), abortSpeculation is called immediately:

  1. The AbortController is aborted, canceling the in-flight API call.
  2. The overlay directory is removed via safeRemoveOverlay (rm -rf with retries).
  3. Speculation state resets to IDLE_SPECULATION_STATE.
  4. The tengu_speculation event is logged with outcome: 'aborted' and abort_reason: 'user_typed'.

The user can also abort via Escape or Ctrl+C (both call abortSpeculation). No speculated changes reach the real filesystem on abort.
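
The abort sequence can be sketched as a single state transition. The state shape below is a simplification, an in-memory map stands in for the overlay directory, and the `log` callback stands in for the analytics call; only the names abortSpeculation, IDLE_SPECULATION_STATE, and the logged field values come from the description above.

```typescript
// Simplified, hypothetical state shape; the real SpeculationState has more fields.
type SpeculationState =
  | { status: "idle" }
  | { status: "active"; abortController: AbortController; overlay: Map<string, string> };

const IDLE_SPECULATION_STATE: SpeculationState = { status: "idle" };

type AbortEvent = { outcome: "aborted"; abort_reason: string };

// Cancels the in-flight request, discards the overlay, logs the outcome,
// and returns the idle state. Aborting an idle speculation is a no-op.
function abortSpeculation(
  state: SpeculationState,
  reason: string,
  log: (event: AbortEvent) => void,
): SpeculationState {
  if (state.status !== "active") return state;
  state.abortController.abort(); // cancels the in-flight API call
  state.overlay.clear(); // stand-in for removing the overlay directory
  log({ outcome: "aborted", abort_reason: reason });
  return IDLE_SPECULATION_STATE;
}
```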

Recursive pipelining

When a speculation completes fully (boundary.type === 'complete'), generatePipelinedSuggestion immediately generates the next suggestion using the augmented context (original conversation + speculated messages). If a pipelined suggestion is ready when the user accepts, handleSpeculationAccept promotes it to the visible prompt suggestion and starts a new speculation with isPipelined = true. This creates a chain: predict-execute-predict-execute, staying multiple steps ahead of the user.
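
The hand-off between a completed speculation and the next suggestion can be sketched as two small transitions. The `Pipeline` type and both function names are hypothetical, and messages are reduced to plain strings; the sketch only illustrates the ordering (predict from the augmented context, then promote on accept).

```typescript
// Hypothetical bookkeeping for the visible and pre-computed suggestions.
type Pipeline = { visible: string | null; pipelined: string | null };

// After a speculation completes fully, predict the next prompt from the
// original conversation plus the speculated messages.
function onSpeculationComplete(
  pipeline: Pipeline,
  conversation: string[],
  speculated: string[],
  predictNext: (context: string[]) => string | null,
): Pipeline {
  return { ...pipeline, pipelined: predictNext([...conversation, ...speculated]) };
}

// On accept, promote the pipelined suggestion (if any) to the visible slot and
// report whether a new, pipelined speculation should be started for it.
function onAccept(pipeline: Pipeline): {
  pipeline: Pipeline;
  startPipelinedSpeculation: boolean;
} {
  if (pipeline.pipelined === null) {
    return { pipeline: { visible: null, pipelined: null }, startPipelinedSpeculation: false };
  }
  return {
    pipeline: { visible: pipeline.pipelined, pipelined: null },
    startPipelinedSpeculation: true,
  };
}
```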

Safety limits

| Limit | Value | Enforcement |
| --- | --- | --- |
| Max turns per speculation | 20 | MAX_SPECULATION_TURNS passed to runForkedAgent |
| Max messages per speculation | 100 | MAX_SPECULATION_MESSAGES, checked in onMessage callback |
| Writes outside cwd | Blocked | Relative path check in canUseTool |
| Non-read-only bash | Blocked | checkReadOnlyConstraints validation |
| Edit without permission | Blocked | Permission mode check before allowing write tools |
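
As an illustration of the message cap, the sketch below builds an onMessage-style callback that aborts once the limit is reached. The factory name `makeOnMessage`, the use of plain strings for messages, and the choice of `>=` for the comparison are assumptions; only the constant name and its value come from the table.

```typescript
const MAX_SPECULATION_MESSAGES = 100;

// Returns a callback that records each message and aborts the speculation
// once the cap is reached. Whether the real check is >= or > is an assumption.
function makeOnMessage(
  controller: AbortController,
  messages: string[],
): (message: string) => void {
  return (message: string): void => {
    messages.push(message);
    if (messages.length >= MAX_SPECULATION_MESSAGES) controller.abort();
  };
}
```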

Error handling

Speculation is designed to fail open. If handleSpeculationAccept throws, the error is logged but the system falls back to normal query flow (queryRequired: true). The overlay is cleaned up, speculation state resets, and the user's message is processed as if no speculation existed.
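
The fail-open behavior is essentially a try/catch around acceptance. The sketch below is synchronous for brevity, and the wrapper name `acceptOrFallBack` and its callback parameters are hypothetical; the real handleSpeculationAccept is likely asynchronous and does more work.

```typescript
type AcceptResult = { queryRequired: boolean };

// Runs the accept step; on any error, logs it, cleans up, and falls back to
// the normal query flow by reporting that a query is required.
function acceptOrFallBack(
  accept: () => AcceptResult,
  cleanup: () => void, // stand-in for overlay removal and state reset
  logError: (err: unknown) => void,
): AcceptResult {
  try {
    return accept();
  } catch (err) {
    logError(err);
    cleanup();
    return { queryRequired: true };
  }
}
```

The important property is that a failure in the speculation path can cost the user time but never their message: the worst case is an ordinary, non-speculated query.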

Telemetry

The tengu_speculation event captures:

Session-level time saved is aggregated in totalSpeculationTimeSavedMs via transcript speculation-accept entries, visible in the /stats command for Anthropic employees.

Key claims

Relations

Sources

src-20260409-e9925330d110

- src/services/PromptSuggestion/speculation.ts -- core speculation engine (992 lines)
- src/services/PromptSuggestion/promptSuggestion.ts -- prediction/suggestion generation and filtering
- src/utils/forkedAgent.ts -- shared forked agent infrastructure
- src/components/PromptInput/PromptInput.tsx -- UI integration, accept/abort handling
- src/screens/REPL.tsx -- handleSpeculationAccept integration, clearSpeculativeChecks on turn end
- src/state/AppStateStore.ts -- SpeculationState, CompletionBoundary, SpeculationResult types
- src/utils/config.ts -- speculationEnabled config field
- src/utils/stats.ts -- totalSpeculationTimeSavedMs aggregation from transcript entries
- src/components/Settings/Config.tsx -- ant-only speculation toggle in settings UI
- src/tools/BashTool/bashPermissions.ts -- speculative bash classifier check lifecycle
- src/services/tools/toolExecution.ts -- startSpeculativeClassifierCheck invocation before tool execution