Telemetry and Observability

Claude Code has extensive internal observability: 1,000+ telemetry event types, frustration detection, feature-flag-controlled experiments, and model quality metrics. This infrastructure serves both product development and safety monitoring.

Telemetry Architecture

All telemetry events are prefixed with tengu_ (the primary analytics namespace, also possibly a model codename). Events cover:

- Tool grants and denials
- YOLO mode decisions
- Permission changes
- Model switches
- Cache hit/miss rates
- Session duration and turn counts
- Background task execution
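A minimal sketch of what a namespaced emitter for such events might look like. Only the tengu_ prefix comes from the source; the event name, `emit` function, and queue-based transport are illustrative assumptions, not Claude Code's actual API:

```typescript
// Hypothetical sketch of a tengu_-namespaced telemetry emitter.
// Event names and the queue transport are assumptions, not Claude Code's API.
type TelemetryEvent = {
  name: `tengu_${string}`; // all events share the tengu_ prefix
  properties: Record<string, string | number | boolean>;
  timestamp: number;
};

const queue: TelemetryEvent[] = [];

function emit(
  name: `tengu_${string}`,
  properties: TelemetryEvent["properties"] = {},
): void {
  queue.push({ name, properties, timestamp: Date.now() });
}

// Example: record a permission decision (event name is hypothetical).
emit("tengu_tool_permission", { tool: "Bash", decision: "granted" });
```

The template-literal type `` `tengu_${string}` `` makes the namespace convention a compile-time guarantee rather than a code-review rule.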

The telemetry pipeline sends events to Anthropic's analytics infrastructure. The GrowthBook feature flag system polls for updates every 6 hours for external users, or every 20 minutes inside Anthropic.

Frustration Detection

userPromptKeywords.ts uses a regex to detect profanity, insults, and expressions of frustration. A match fires a telemetry event; it does NOT change Claude's in-session behavior.

Why a regex instead of LLM inference: zero cost and effectively zero latency. At Claude Code's scale, adding 300-500 ms of sentiment analysis to every prompt would degrade the very experience it is trying to measure.
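A sketch of the pattern-match-then-emit shape this describes. The word list, event name, and function names below are illustrative; the actual contents of userPromptKeywords.ts are not reproduced here:

```typescript
// Illustrative only: the real word list in userPromptKeywords.ts is not shown here.
const FRUSTRATION_PATTERN = /\b(ugh|wtf|useless|stop doing that)\b/i;

// Returns the matched keyword (the string that gets sent upstream) or null.
function detectFrustration(prompt: string): string | null {
  const match = FRUSTRATION_PATTERN.exec(prompt);
  if (!match) return null;
  // Fire-and-forget telemetry; the session itself is unaffected.
  emitTelemetry("tengu_frustration_detected", { keyword: match[1] });
  return match[1];
}

function emitTelemetry(name: string, props: Record<string, string>): void {
  // Placeholder transport; the real pipeline batches events upstream.
}
```

Note that the matched keyword itself is included in the event payload, which is exactly the privacy concern raised below.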

What it enables: Correlation of frustration spikes with model versions, features, and task types. When Capybara v8 was deployed and false claims jumped to 29-30%, frustration telemetry would have been a leading indicator.

Privacy concern: The specific words that trigger the regex are sent upstream. This is not explained in Anthropic's privacy documentation.

Model Quality Metrics

Feature Flag System

79+ GrowthBook runtime flags control feature rollout. Examples:

- tengu_onyx_plover — auto-dream, auto-memory
- tengu_amber_flint — agent teams
- tengu_speculation — speculative execution
- tengu_chomp_inflection — prompt suggestion (public prediction display)
- tengu_session_memory — session summarization
- tengu_kairos_cron — scheduled tasks
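A sketch of runtime flag gating behind a polled cache. The flag name comes from the list above; the cache and `isEnabled` helper are simplified assumptions rather than the real GrowthBook client:

```typescript
// Hypothetical sketch: runtime flags served from a periodically refreshed cache.
const flagCache = new Map<string, boolean>();

function isEnabled(flag: string): boolean {
  // Default off until the first poll populates the cache.
  return flagCache.get(flag) ?? false;
}

// In production a timer would refresh the cache on the documented cadence
// (every 20 minutes internally, every 6 hours externally). Here we simulate
// one poll result landing:
flagCache.set("tengu_session_memory", true);

if (isEnabled("tengu_session_memory")) {
  // ...run the gated session-summarization path
}
```

Defaulting unknown flags to off means a failed poll degrades to the baseline experience instead of enabling half-shipped features.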

Additionally, 9 build-time flags (EXTRACT_MEMORIES, AGENT_TRIGGERS, etc.) control what gets compiled. These are binary: a feature is either compiled in or stripped from the build entirely.
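A sketch of how a build-time flag differs from a runtime one. The EXTRACT_MEMORIES name comes from the source; the function and the bundler wiring are assumptions, and the constant is hard-coded here so the sketch runs standalone:

```typescript
// In a real build this constant would be injected at bundle time
// (e.g. via a bundler's define/replace step); hard-coded for the sketch.
const EXTRACT_MEMORIES: boolean = true;

// Hypothetical gated feature.
function maybeExtractMemories(transcript: string): string[] {
  // Because the flag is a compile-time constant, a minifier deletes whichever
  // branch is unreachable — a disabled feature leaves no code in the bundle,
  // unlike a runtime flag, which ships both paths.
  if (!EXTRACT_MEMORIES) return [];
  return transcript.split("\n").filter((line) => line.startsWith("remember:"));
}
```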

The Quality Regression Crisis

The Feb-March 2026 regression demonstrated what happens when behavioral engineering breaks down:

- Thinking depth collapsed 67-75%
- The read:edit ratio collapsed from 6.6:1 to 2.0:1 (Claude stopped reading code before editing)
- Three streaming hang bugs appeared
- The @MODEL_LAUNCH counterweights lost effectiveness
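Of these signals, the read:edit ratio is the cheapest to instrument, since it falls straight out of per-session tool-call counts. A sketch of that computation (the types and function are illustrative):

```typescript
// Sketch: derive a read:edit ratio from per-session tool-call counts.
// A collapse from ~6.6:1 toward ~2.0:1 would surface here.
type ToolCounts = { reads: number; edits: number };

function readEditRatio({ reads, edits }: ToolCounts): number {
  return edits === 0 ? Infinity : reads / edits;
}

console.log(readEditRatio({ reads: 33, edits: 5 }).toFixed(1)); // "6.6"
console.log(readEditRatio({ reads: 10, edits: 5 }).toFixed(1)); // "2.0"
```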

Telemetry should have caught this earlier, but the regression was in reasoning quality rather than in the quantitative metrics that are easy to instrument.