Speculative Execution (Full)
- Entity ID: ent-20260410-9fa6f1e52365
- Type: service
- Scope: shared
- Status: active
- Aliases: speculation, speculative-execution
Description
The most aggressive background feature in claude-code. After the model finishes responding, a prompt suggestion system predicts the user's next input (2-12 words), and the speculation engine pre-executes that predicted prompt in a copy-on-write overlay filesystem before the user presses Enter. When the user accepts the suggestion (Tab or Enter), speculated messages and file changes are injected instantly into the live session. When the user types something different, the overlay is deleted and the background API call is aborted. The feature is gated to Anthropic employees only (USER_TYPE === 'ant') and toggled via the speculationEnabled config flag (default: true). Analytics are emitted under the tengu_speculation event name.
Prediction mechanism
Prediction is handled by the prompt suggestion subsystem in promptSuggestion.ts. After each assistant turn, a forked agent (see forked-agent-pattern) is spawned with a dedicated suggestion prompt. The prompt instructs the model to predict what the user would "naturally type next" in 2-12 words, matching the user's style. The fork runs with all tools denied (its canUseTool callback returns deny for every request) and uses the parent conversation as its cache prefix to achieve a high cache hit rate.
Suggestions are filtered through 13 heuristic guards before being shown:
| Filter | Blocks |
|---|---|
| done | Bare "done" |
| meta_text | "nothing found", "no suggestion", bare "silence" |
| meta_wrapped | Text wrapped in parens/brackets |
| error_message | API error strings that leaked into output |
| prefixed_label | Label: ... format |
| too_few_words | Single words (except allowed set: yes, ok, push, commit, continue, etc.) |
| too_many_words | More than 12 words |
| too_long | 100+ characters |
| multiple_sentences | Contains sentence breaks |
| has_formatting | Newlines or markdown |
| evaluative | "thanks", "looks good", "perfect", etc. |
| claude_voice | "Let me...", "I'll...", "Here's...", etc. |
The suggestion is shown as autocomplete text in the prompt input. The user accepts by pressing Tab (sets acceptedAt) or Enter (submits directly).
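As an illustration of how such a filter chain might be organized, the sketch below implements a handful of the guards from the table. It is not the code in promptSuggestion.ts: the function name rejectSuggestion, the regular expressions, and the contents of the single-word allow list are assumptions made for this example.

```typescript
// Hypothetical sketch: filter names follow the table above, but every regex
// and the allow-list contents are illustrative assumptions.
type FilterName =
  | "done"
  | "meta_wrapped"
  | "prefixed_label"
  | "has_formatting"
  | "too_long"
  | "too_many_words"
  | "too_few_words";

// Assumed subset of the allowed single-word suggestions.
const ALLOWED_SINGLE_WORDS = new Set(["yes", "ok", "push", "commit", "continue"]);

// Each entry pairs a filter name with a predicate that returns true when the
// suggestion should be blocked. Filters are evaluated in order.
const FILTERS: Array<[FilterName, (s: string) => boolean]> = [
  ["done", (s) => s.toLowerCase() === "done"],
  ["meta_wrapped", (s) => /^[(\[].*[)\]]$/.test(s)],
  ["prefixed_label", (s) => /^[A-Za-z ]+:\s/.test(s)],
  ["has_formatting", (s) => /\n|\*\*|`/.test(s)],
  ["too_long", (s) => s.length >= 100],
  ["too_many_words", (s) => s.split(/\s+/).length > 12],
  [
    "too_few_words",
    (s) => {
      const words = s.split(/\s+/);
      return words.length < 2 && !ALLOWED_SINGLE_WORDS.has(words[0].toLowerCase());
    },
  ],
];

// Returns the name of the first filter that blocks the suggestion,
// or null when the suggestion passes every filter in this sketch.
function rejectSuggestion(raw: string): FilterName | null {
  const s = raw.trim();
  for (const [name, blocks] of FILTERS) {
    if (blocks(s)) return name;
  }
  return null;
}
```

Returning the filter name rather than a boolean makes it easy to log which guard suppressed a suggestion.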
Overlay filesystem
When a suggestion passes filters and speculation is enabled, startSpeculation() creates a copy-on-write overlay directory at:
~/.claude/tmp/speculation/<pid>/<speculation_id>/
The overlay isolates all file mutations during speculative execution:
- First write to a file: the original is copied from the real working directory into the overlay path. The relative path is added to writtenPathsRef.
- Subsequent writes: go directly to the overlay copy (already present).
- Reads of previously-written files: redirected to the overlay copy so the speculated agent sees its own modifications.
- Reads of unmodified files: pass through to the real filesystem (no rewrite).
- Writes outside cwd: unconditionally denied (speculation_write_outside_root).
Path rewriting is implemented in the canUseTool callback passed to runForkedAgent. The callback inspects file_path, path, or notebook_path on tool inputs, computes the relative path from cwd, and rewrites the input to point at the overlay directory. The requireCanUseTool override on the subagent context ensures this callback runs for every tool invocation, even when hooks would normally auto-approve.
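The rewriting rules can be sketched roughly as follows. This is a simplified model, not the callback in speculation.ts: rewriteForOverlay, its parameter list, and the result shape are hypothetical, the copy of the original file on first write is omitted, and letting reads outside cwd pass through is an assumption the text does not confirm.

```typescript
import * as path from "node:path";

type ToolInput = { file_path?: string; path?: string; notebook_path?: string };
type Rewrite =
  | { behavior: "allow"; input: ToolInput }
  | { behavior: "deny"; reason: string };

// The three input keys the text says are inspected.
const PATH_KEYS = ["file_path", "path", "notebook_path"] as const;

// Hypothetical helper. `writtenPaths` plays the role of writtenPathsRef.
function rewriteForOverlay(
  input: ToolInput,
  cwd: string,
  overlayDir: string,
  writtenPaths: Set<string>,
  isWrite: boolean,
): Rewrite {
  const key = PATH_KEYS.find((k) => typeof input[k] === "string");
  if (!key) return { behavior: "allow", input }; // nothing to rewrite

  const rel = path.relative(cwd, path.resolve(cwd, input[key]!));
  const outsideRoot =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (outsideRoot) {
    // Writes outside cwd are denied; passing reads through is an assumption.
    return isWrite
      ? { behavior: "deny", reason: "speculation_write_outside_root" }
      : { behavior: "allow", input };
  }

  // Record the write. A real implementation would also copy the original
  // file into the overlay the first time a path is written.
  if (isWrite) writtenPaths.add(rel);

  // Reads of files never written during speculation hit the real filesystem.
  if (!writtenPaths.has(rel)) return { behavior: "allow", input };

  // Writes, and reads of previously written files, are pointed at the overlay.
  return {
    behavior: "allow",
    input: { ...input, [key]: path.join(overlayDir, rel) },
  };
}
```

Keying the bookkeeping on the path relative to cwd means the same entry serves both the overlay lookup and a later copy back to the working directory.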
Tool permission tiers during speculation
| Tool category | Tools | Behavior |
|---|---|---|
| Safe read-only | Read, Glob, Grep, ToolSearch, LSP, TaskGet, TaskList | Always allowed, reads from overlay if file was previously written |
| Write tools | Edit, Write, NotebookEdit | Redirected to overlay; stopped if permission mode is not acceptEdits or bypassPermissions |
| Bash (read-only) | git status, ls, etc. | Allowed if checkReadOnlyConstraints passes |
| Bash (mutating) | rm, git commit, etc. | Denied; speculation halts at this boundary |
| All other tools | WebFetch, AgentTool, etc. | Denied; speculation halts |
When speculation halts at a boundary (bash, edit permission, or denied tool), the boundary type, tool name, and detail are recorded in CompletionBoundary. The speculation remains active with partial results, which are still usable on acceptance.
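The tiers in the table could be expressed as a single decision function along these lines. The sketch is illustrative only: classifyToolUse, the boundary type strings, the "default" permission mode, and the tool name "Bash" are assumptions, and isReadOnlyBash is an injected stand-in for checkReadOnlyConstraints, whose real signature is not shown here.

```typescript
type PermissionMode = "default" | "acceptEdits" | "bypassPermissions";

// Hypothetical shape for the recorded boundary (type, tool name, detail).
type Boundary = {
  type: "bash" | "edit" | "denied_tool";
  toolName: string;
  detail: string;
};
type Decision = { allow: true } | { allow: false; boundary: Boundary };

const READ_ONLY_TOOLS = new Set([
  "Read", "Glob", "Grep", "ToolSearch", "LSP", "TaskGet", "TaskList",
]);
const WRITE_TOOLS = new Set(["Edit", "Write", "NotebookEdit"]);

function classifyToolUse(
  toolName: string,
  mode: PermissionMode,
  detail: string, // bash command or file path, kept for the boundary record
  isReadOnlyBash: (command: string) => boolean,
): Decision {
  // Tier 1: safe read-only tools are always allowed.
  if (READ_ONLY_TOOLS.has(toolName)) return { allow: true };

  // Tier 2: write tools need an edit-accepting permission mode.
  if (WRITE_TOOLS.has(toolName)) {
    return mode === "acceptEdits" || mode === "bypassPermissions"
      ? { allow: true }
      : { allow: false, boundary: { type: "edit", toolName, detail } };
  }

  // Tier 3: bash is allowed only when the command is read-only.
  if (toolName === "Bash") {
    return isReadOnlyBash(detail)
      ? { allow: true }
      : { allow: false, boundary: { type: "bash", toolName, detail } };
  }

  // Everything else halts speculation.
  return { allow: false, boundary: { type: "denied_tool", toolName, detail } };
}
```

A denied decision carries the boundary record with it, so the caller can store it and keep the partial results produced before the halt.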
Cache reuse
Speculation achieves high cache reuse by inheriting CacheSafeParams from the parent query context. The forked agent sends byte-identical system prompt, tool definitions, model selection, and conversation prefix. Key design constraints that protect cache alignment:
- No parameter overrides: PR #18143 attempted to set effortValue:'low' on the suggestion fork, causing a 45x spike in cache writes (hit rate dropped from 92.7% to 61%). The code now carries an explicit warning: "DO NOT override any API parameter that differs from the parent request."
- skipCacheWrite: true on the suggestion fork: avoids writing new cache entries for ephemeral fire-and-forget calls.
- Prompt cache break detection (promptCacheBreakDetection.ts) explicitly excludes speculation/prompt_suggestion/session_memory forks from tracking, as they are short-lived agents with fresh agentId values each time.
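To make the alignment constraint concrete, the sketch below builds a fork request by spreading the parent's parameters unchanged and appending only the fork-specific prompt. The field names inside CacheSafeParams, the ForkRequest type, and both functions are hypothetical; only the four categories of shared data and the skipCacheWrite flag come from the description above.

```typescript
// Hypothetical field names for the four things the parent and fork share.
interface CacheSafeParams {
  systemPrompt: string;
  tools: string[];
  model: string;
  messages: string[]; // conversation prefix, simplified to strings
}

interface ForkRequest extends CacheSafeParams {
  skipCacheWrite: boolean;
}

// Build the fork by copying the parent verbatim and appending one message.
// Nothing inherited from the parent is overridden.
function buildForkRequest(parent: CacheSafeParams, forkPrompt: string): ForkRequest {
  return {
    ...parent,
    messages: [...parent.messages, forkPrompt],
    skipCacheWrite: true,
  };
}

// True when the fork would present the same cacheable prefix as the parent:
// identical system prompt, tools and model, and the parent's messages as a prefix.
function sharesCachePrefix(parent: CacheSafeParams, fork: ForkRequest): boolean {
  const same = (a: unknown, b: unknown) => JSON.stringify(a) === JSON.stringify(b);
  return (
    same(parent.systemPrompt, fork.systemPrompt) &&
    same(parent.tools, fork.tools) &&
    same(parent.model, fork.model) &&
    same(parent.messages, fork.messages.slice(0, parent.messages.length))
  );
}
```

A check like sharesCachePrefix could serve as a unit-test guard against the kind of regression described for the effort override, since any changed parameter makes it return false.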
Acceptance flow
When the user accepts a suggestion that has an active speculation (speculation.status === 'active'):
- handleSpeculationAccept is called from PromptInput.tsx via onSubmit.
- prepareMessagesForInjection filters speculated messages: strips thinking blocks, removes tool_use/tool_result pairs where the result was an error or interruption, and drops standalone interrupt messages.
- acceptSpeculation copies overlay files back to the real filesystem via copyOverlayToMain, then deletes the overlay.
- Clean messages are injected into the conversation via setMessages.
- File state from speculated reads is merged into the parent's readFileState cache.
- If speculation completed fully (boundary.type === 'complete'), no follow-up API call is needed (queryRequired: false). If it hit a boundary, a follow-up query runs with the injected messages.
- Time saved is accumulated in speculationSessionTimeSavedMs and recorded to the transcript as a speculation-accept entry.
For Anthropic employees, a feedback message is injected: [ANT-ONLY] Speculated 3 tool uses · 1,234 tokens · +2.1s saved (5.4s this session).
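The message clean-up step in the flow above might look something like the following. The Block and Message shapes, the isInterrupt flag, and the function name prepareForInjection are simplified assumptions for illustration; the real prepareMessagesForInjection operates on the application's own message types, and interruption is modelled here only through the is_error flag.

```typescript
// Simplified, hypothetical message shapes.
type Block =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "tool_use"; id: string; name: string }
  | { type: "tool_result"; tool_use_id: string; is_error?: boolean };

type Message = {
  role: "user" | "assistant";
  content: Block[];
  isInterrupt?: boolean; // assumed marker for standalone interrupt messages
};

function prepareForInjection(messages: Message[]): Message[] {
  // Collect ids of tool calls whose result failed, so that both the
  // tool_use and its tool_result can be removed together.
  const failed = new Set<string>();
  for (const m of messages) {
    for (const b of m.content) {
      if (b.type === "tool_result" && b.is_error) failed.add(b.tool_use_id);
    }
  }

  return messages
    .filter((m) => !m.isInterrupt) // drop standalone interrupt messages
    .map((m) => ({
      ...m,
      content: m.content.filter((b) => {
        if (b.type === "thinking") return false; // strip thinking blocks
        if (b.type === "tool_use") return !failed.has(b.id);
        if (b.type === "tool_result") return !failed.has(b.tool_use_id);
        return true;
      }),
    }))
    .filter((m) => m.content.length > 0); // drop messages left empty
}
```

Removing both halves of a failed pair keeps every remaining tool_use matched with its tool_result.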
Rollback on misprediction
When the user types anything (any keystroke in the prompt input), abortSpeculation is called immediately:
- The AbortController is aborted, canceling the in-flight API call.
- The overlay directory is removed via safeRemoveOverlay (rm -rf with retries).
- Speculation state resets to IDLE_SPECULATION_STATE.
- The tengu_speculation event is logged with outcome: 'aborted' and abort_reason: 'user_typed'.
The user can also abort via Escape or Ctrl+C (both call abortSpeculation). No speculated changes reach the real filesystem on abort.
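A minimal sketch of this abort sequence is shown below. The state shape, the IDLE constant, and the injected removeOverlay and logEvent callbacks are hypothetical stand-ins (for IDLE_SPECULATION_STATE, safeRemoveOverlay, and the analytics logger respectively). Overlay removal is modelled as a synchronous callback for brevity, whereas a real rm -rf with retries would likely be asynchronous.

```typescript
// Hypothetical, reduced speculation state.
type SpeculationState =
  | { status: "idle" }
  | {
      status: "active";
      id: string;
      abortController: AbortController;
      overlayDir: string;
    };

const IDLE: SpeculationState = { status: "idle" };

function abortSpeculation(
  state: SpeculationState,
  removeOverlay: (dir: string) => void,
  logEvent: (name: string, payload: Record<string, unknown>) => void,
  reason: string = "user_typed",
): SpeculationState {
  if (state.status !== "active") return state; // nothing to abort

  state.abortController.abort(); // cancels the in-flight API call
  removeOverlay(state.overlayDir); // discards all speculated file changes
  logEvent("tengu_speculation", {
    speculation_id: state.id,
    outcome: "aborted",
    abort_reason: reason,
  });
  return IDLE; // caller stores this as the new state
}
```

Because the overlay is the only place speculated writes live, deleting it is sufficient to roll back; nothing in the working directory needs to be restored.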
Recursive pipelining
When a speculation completes fully (boundary.type === 'complete'), generatePipelinedSuggestion immediately generates the next suggestion using the augmented context (original conversation + speculated messages). If a pipelined suggestion is ready when the user accepts, handleSpeculationAccept promotes it to the visible prompt suggestion and starts a new speculation with isPipelined = true. This creates a chain: predict-execute-predict-execute, staying multiple steps ahead of the user.
Safety limits
| Limit | Value | Enforcement |
|---|---|---|
| Max turns per speculation | 20 | MAX_SPECULATION_TURNS passed to runForkedAgent |
| Max messages per speculation | 100 | MAX_SPECULATION_MESSAGES, checked in onMessage callback |
| Writes outside cwd | Blocked | Relative path check in canUseTool |
| Non-read-only bash | Blocked | checkReadOnlyConstraints validation |
| Edit without permission | Blocked | Permission mode check before allowing write tools |
Error handling
Speculation is designed to fail open. If handleSpeculationAccept throws, the error is logged but the system falls back to normal query flow (queryRequired: true). The overlay is cleaned up, speculation state resets, and the user's message is processed as if no speculation existed.
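The fail-open behavior amounts to a try/catch around acceptance that always yields a usable result. In the sketch below, acceptWithFallback and its three callbacks are hypothetical; cleanup stands for overlay removal plus state reset and is assumed to be safe to call whether or not acceptance succeeded.

```typescript
type AcceptResult = { queryRequired: boolean };

function acceptWithFallback(
  accept: () => AcceptResult, // stand-in for the acceptance logic
  cleanup: () => void, // assumed idempotent: remove overlay, reset state
  logError: (err: unknown) => void,
): AcceptResult {
  try {
    return accept();
  } catch (err) {
    // Fail open: log the error and let the normal query path handle the input.
    logError(err);
    return { queryRequired: true };
  } finally {
    cleanup(); // runs on both the success and the failure path
  }
}
```

The caller never sees the exception; it only sees queryRequired: true, which is the same signal it would act on if no speculation had existed.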
Telemetry
The tengu_speculation event captures:
- speculation_id, outcome (accepted/aborted/error), duration_ms
- suggestion_length, tools_executed, completed (boolean)
- boundary_type, boundary_tool, boundary_detail
- time_saved_ms, message_count, is_pipelined
- error_type, error_message, error_phase (start/accept) on failure
- abort_reason on abort
Session-level time saved is aggregated in totalSpeculationTimeSavedMs via transcript speculation-accept entries, visible in the /stats command for Anthropic employees.
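One way to model the event payload is a discriminated union keyed on outcome, as sketched below. The field names come from the list above, but the field types, the optionality of the boundary fields, and the choice to attach time_saved_ms only to accepted events are assumptions; describeEvent is a hypothetical helper included only to show the narrowing.

```typescript
// Fields assumed to be present on every event.
interface SpeculationEventBase {
  speculation_id: string;
  duration_ms: number;
  suggestion_length: number;
  tools_executed: number;
  completed: boolean;
  message_count: number;
  is_pipelined: boolean;
  boundary_type?: string;
  boundary_tool?: string;
  boundary_detail?: string;
}

// Outcome-specific fields, modelled as a discriminated union (an assumption).
type SpeculationEvent =
  | (SpeculationEventBase & { outcome: "accepted"; time_saved_ms: number })
  | (SpeculationEventBase & { outcome: "aborted"; abort_reason: string })
  | (SpeculationEventBase & {
      outcome: "error";
      error_type: string;
      error_message: string;
      error_phase: "start" | "accept";
    });

// Switching on `outcome` narrows the type, so each branch can only read
// the fields that belong to that outcome.
function describeEvent(e: SpeculationEvent): string {
  switch (e.outcome) {
    case "accepted":
      return `accepted, saved ${e.time_saved_ms}ms`;
    case "aborted":
      return `aborted (${e.abort_reason})`;
    case "error":
      return `error in ${e.error_phase}: ${e.error_type}`;
  }
}
```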
Key claims
- clm-20260410-spec-overlay-cow: File writes during speculation use a copy-on-write overlay at ~/.claude/tmp/speculation/<pid>/<id>/; the real codebase is never modified until the user accepts.
- clm-20260410-spec-cache-reuse: Forked agent inherits byte-identical CacheSafeParams from the parent to maximize prompt cache hits. Overriding any API parameter (even effort) can cause 45x cache write spikes.
- clm-20260410-spec-pipeline: On completion, speculation immediately generates the next prediction and starts executing it (isPipelined = true), attempting to stay multiple steps ahead.
- clm-20260410-spec-ant-only: Feature is gated to USER_TYPE === 'ant' with a speculationEnabled config toggle (default true). Not available to external users.
- clm-20260410-spec-prediction: Prediction prompt generates 2-12 word suggestions matching user style, filtered through 13 heuristic guards before display.
- clm-20260410-spec-fail-open: On any error during acceptance, the system falls back to normal query flow; speculated work is discarded and the user's input is processed normally.
Relations
- forked-agent-pattern -- speculation uses runForkedAgent with CacheSafeParams sharing for cache-aligned execution
- cache-economics -- speculation's cost model depends entirely on prompt cache hits; breaking cache alignment makes it uneconomical
- bash-security -- speculative bash classifier checks (startSpeculativeClassifierCheck) run in parallel with permission setup; speculation itself uses checkReadOnlyConstraints for bash gating
- growthbook -- prompt suggestion enablement gated by tengu_chomp_inflection feature flag
- permission-pipeline -- speculation checks the permission mode (acceptEdits/bypassPermissions) before allowing write tools
Sources
src-20260409-e9925330d110
- src/services/PromptSuggestion/speculation.ts -- core speculation engine (992 lines)
- src/services/PromptSuggestion/promptSuggestion.ts -- prediction/suggestion generation and filtering
- src/utils/forkedAgent.ts -- shared forked agent infrastructure
- src/components/PromptInput/PromptInput.tsx -- UI integration, accept/abort handling
- src/screens/REPL.tsx -- handleSpeculationAccept integration, clearSpeculativeChecks on turn end
- src/state/AppStateStore.ts -- SpeculationState, CompletionBoundary, SpeculationResult types
- src/utils/config.ts -- speculationEnabled config field
- src/utils/stats.ts -- totalSpeculationTimeSavedMs aggregation from transcript entries
- src/components/Settings/Config.tsx -- ant-only speculation toggle in settings UI
- src/tools/BashTool/bashPermissions.ts -- speculative bash classifier check lifecycle
- src/services/tools/toolExecution.ts -- startSpeculativeClassifierCheck invocation before tool execution