Retry System

What it is

Claude Code's retry system retries failed API calls up to 10 times with exponential backoff, refreshes OAuth tokens automatically, runs a 90-second watchdog timer, and falls back to a smaller model under sustained overload. It is the resilience layer between the agent loop and the Anthropic API.
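A minimal sketch of retrying with exponential backoff, for illustration only. The retry count of 10 comes from the description above; the base delay, the delay cap, and the names backoffDelayMs and withRetry are assumptions and are not taken from the Claude Code source. Token refresh and the watchdog timer are left out.

```typescript
// Only MAX_RETRIES reflects the text; the two delay constants are assumed values.
const MAX_RETRIES = 10;
const BASE_DELAY_MS = 500; // assumed
const MAX_DELAY_MS = 32000; // assumed

// Delay before retry number `attempt` (1-based): base * 2^(attempt - 1), capped.
function backoffDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** (attempt - 1), MAX_DELAY_MS);
}

// Hypothetical wrapper: one initial attempt plus up to MAX_RETRIES retries.
// `sleep` is injectable so callers (and tests) can avoid real waiting.
async function withRetry<T>(
  call: () => Promise<T>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
    if (attempt > 0) {
      await sleep(backoffDelayMs(attempt));
    }
    try {
      return await call();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError; // all attempts failed
}
```

A production version would usually add random jitter to each delay and retry only errors known to be transient; both are omitted here to keep the sketch short.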

Why it exists

API calls fail for many reasons: rate limits (HTTP 429), API overload (HTTP 529), authentication expiry, network issues, and context-too-large errors. Without a robust retry system, any failure would terminate the user's session. The retry system ensures the agent loop survives transient failures and degrades gracefully under sustained load.

The model fallback (Opus to Sonnet after 3 consecutive HTTP 529 errors) is particularly important: during peak usage, Opus may be overloaded. Rather than failing, the system automatically drops to a smaller model to keep the session alive. This mirrors the error recovery cascade in query.ts.
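The fallback rule can be sketched as a small counter over consecutive HTTP 529 responses. The threshold of 3 comes from the text above; the class name ModelFallback, the placeholder model strings, and the choice to stay on the fallback model once it has triggered are assumptions, not confirmed behavior of Claude Code.

```typescript
const OVERLOADED_STATUS = 529;
const FALLBACK_THRESHOLD = 3; // from the text: 3 consecutive HTTP 529 errors

// Hypothetical helper that decides which model the next attempt should use.
class ModelFallback {
  private primary: string;
  private fallback: string;
  private consecutiveOverloads = 0;
  private fellBack = false; // assumption: the fallback is sticky once triggered

  constructor(primary: string, fallback: string) {
    this.primary = primary;
    this.fallback = fallback;
  }

  // Record the HTTP status of the latest response and return the model
  // to use for the next attempt.
  record(status: number): string {
    if (status === OVERLOADED_STATUS) {
      this.consecutiveOverloads += 1;
    } else {
      this.consecutiveOverloads = 0; // any other status breaks the streak
    }
    if (this.consecutiveOverloads >= FALLBACK_THRESHOLD) {
      this.fellBack = true;
    }
    return this.fellBack ? this.fallback : this.primary;
  }
}

// Usage: the model names below are placeholders, not real model identifiers.
const selector = new ModelFallback("opus", "sonnet");
selector.record(529);
selector.record(529);
const nextModel = selector.record(529); // third consecutive 529 triggers the fallback
```

Counting only consecutive errors means an isolated 529 between successful calls never triggers the switch; only a sustained streak does.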

What depends on it

Trade-offs and limitations

Key claims

Relationships

Evidence