Telemetry System
- Entity ID: ent-20260410-613817e31735
- Type: service
- Scope: shared
- Status: active
- Aliases: analytics, telemetry, GrowthBook, event logging, observability
Description
The telemetry system (src/services/analytics/) provides event logging, feature flags, and observability for Claude Code. It uses a 3-layer architecture: a public event API with queue-before-sink pattern, a routing sink that dispatches to backends, and backend-specific exporters for Datadog and OpenTelemetry. GrowthBook provides feature flags and A/B testing.
Architecture
Layer 1: Public API (index.ts)
logEvent(name, metadata) and logEventAsync() are the only public entry points. Events accumulate in eventQueue[] until attachAnalyticsSink() is called during initialization; the queue then drains asynchronously via queueMicrotask.
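The queue-before-sink pattern can be sketched as follows. This is a minimal illustration, not the code in index.ts: the AnalyticsSink and QueuedEvent shapes, and the choice to make a second attach a no-op, are assumptions.

```typescript
// Metadata deliberately excludes strings, mirroring the documented type.
type Metadata = { [key: string]: boolean | number | undefined }

// Hypothetical sink shape; the real interface may differ.
interface AnalyticsSink {
  logEvent(name: string, metadata: Metadata): void
}

interface QueuedEvent {
  name: string
  metadata: Metadata
}

const eventQueue: QueuedEvent[] = []
let sink: AnalyticsSink | null = null

function logEvent(name: string, metadata: Metadata): void {
  if (sink === null) {
    // No sink yet: hold the event until initialization attaches one.
    eventQueue.push({ name, metadata })
    return
  }
  sink.logEvent(name, metadata)
}

function attachAnalyticsSink(newSink: AnalyticsSink): void {
  if (sink !== null) return // assumption: attaching twice is a no-op
  sink = newSink
  if (eventQueue.length === 0) return
  // Take ownership of the queued events, then replay them in a microtask
  // so that attaching the sink does not block the caller.
  const queued = eventQueue.splice(0, eventQueue.length)
  queueMicrotask(() => {
    for (const event of queued) newSink.logEvent(event.name, event.metadata)
  })
}
```

In this sketch, an event logged after the sink attaches but before the microtask runs reaches the sink ahead of the replayed events; whether the real implementation preserves strict ordering is not stated in the source.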
Type safety for PII prevention: the metadata type is { [key: string]: boolean | number | undefined }, which deliberately excludes strings. Logging a string (which could contain code or file paths) requires an explicit cast to AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS (a never type used as a compile-time documentation gate).
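A small sketch of how such a gate behaves at compile time. The recorder function, event name, and model value are hypothetical stand-ins; only the metadata shape and the alias name come from the description above.

```typescript
// Strings are excluded on purpose: only booleans, numbers, or undefined type-check.
type LogEventMetadata = { [key: string]: boolean | number | undefined }

// A `never` alias whose name forces the author to state what they verified.
type AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS = never

// Hypothetical in-memory recorder standing in for the real logEvent.
const recorded: { name: string; metadata: LogEventMetadata }[] = []
function logEvent(name: string, metadata: LogEventMetadata): void {
  recorded.push({ name, metadata })
}

// Compiles: numbers and booleans need no ceremony.
logEvent('tengu_example_event', { durationMs: 120, success: true })

// Would not compile: string is not assignable to boolean | number | undefined.
// logEvent('tengu_example_event', { path: '/home/user/secret.ts' })

// Compiles only with the explicit, greppable cast. Because `never` is
// assignable to every type, the cast satisfies the metadata type.
const modelName = 'example-model' // hypothetical value known not to be code or a path
logEvent('tengu_example_event', {
  model: modelName as AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS,
})
```

The cast has no runtime effect; its value is that every string reaching analytics carries a reviewable marker at the call site.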
Layer 2: Routing sink (sink.ts)
initializeAnalyticsSink() creates and attaches the sink. Events are routed to:
- Datadog: only if the tengu_log_datadog_events GrowthBook gate is enabled, the Datadog sink has not been killed, and the first-party API is in use
- 1P event logging: always enabled, via OpenTelemetry
Metadata keys matching _PROTO_* are stripped before events reach Datadog, because PII-tagged values are restricted to privileged 1P columns. Event sampling is handled by shouldSampleEvent(), which checks the GrowthBook dynamic config tengu_event_sampling_config.
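The routing rule can be illustrated with a short sketch. The Backend interface, the routeEvent function, and the convention that a null Datadog backend means "gate disabled" are assumptions made for illustration; sampling is omitted because the shape of tengu_event_sampling_config is not described here.

```typescript
type Metadata = { [key: string]: boolean | number | undefined }

const PROTO_PREFIX = '_PROTO_'

// Return a copy of the metadata without any _PROTO_* keys.
function stripProtoKeys(metadata: Metadata): Metadata {
  const result: Metadata = {}
  for (const [key, value] of Object.entries(metadata)) {
    if (!key.startsWith(PROTO_PREFIX)) result[key] = value
  }
  return result
}

// Hypothetical backend shape used only in this sketch.
interface Backend {
  send(name: string, metadata: Metadata): void
}

// Hypothetical router: Datadog is optional and gets stripped metadata,
// while the 1P backend always receives the full payload.
function routeEvent(
  name: string,
  metadata: Metadata,
  backends: { datadog: Backend | null; firstParty: Backend },
): void {
  if (backends.datadog !== null) {
    backends.datadog.send(name, stripProtoKeys(metadata))
  }
  backends.firstParty.send(name, metadata)
}
```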
Layer 3: Backends
Datadog (datadog.ts):
- Endpoint: https://http-intake.logs.us5.datadoghq.com/api/v2/logs
- Client token: pubbbf48e6d78dae54bceaa4acf463299bf
- Service name: claude-code
- DEFAULT_FLUSH_INTERVAL_MS = 15000 (15s batch flush)
- MAX_BATCH_SIZE = 100
- NETWORK_TIMEOUT_MS = 5000
- NUM_USER_BUCKETS = 30 — privacy-preserving user bucketing via SHA-256 hash
- Event allowlist: DATADOG_ALLOWED_EVENTS (~40 event names, all tengu_* or chrome_bridge_*)
- MCP tool names normalized to 'mcp' for cardinality reduction
- Only fires in production, only for first-party API provider
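Two of the Datadog-side rules above, user bucketing and the event allowlist, can be sketched as follows. The exact derivation of the bucket from the hash is an assumption (here: the first 32 bits of the SHA-256 digest modulo NUM_USER_BUCKETS), and the two allowlisted event names are hypothetical placeholders, not entries from the real list.

```typescript
import { createHash } from 'node:crypto'

const NUM_USER_BUCKETS = 30

// Assumption: the bucket is the first 32 bits of SHA-256(userId), modulo the
// bucket count. The source only states that bucketing uses a SHA-256 hash.
function getUserBucket(userId: string): number {
  const digest = createHash('sha256').update(userId).digest()
  return digest.readUInt32BE(0) % NUM_USER_BUCKETS
}

// Hypothetical entries; the real DATADOG_ALLOWED_EVENTS holds ~40 names.
const DATADOG_ALLOWED_EVENTS = new Set<string>([
  'tengu_example_started',
  'chrome_bridge_example_connected',
])

// Only allowlisted event names are forwarded to Datadog.
function shouldForwardToDatadog(eventName: string): boolean {
  return DATADOG_ALLOWED_EVENTS.has(eventName)
}
```

Reporting a bucket number instead of a user ID keeps per-user cardinality out of Datadog while still allowing rough "how many distinct buckets were affected" estimates.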
OpenTelemetry 1P (firstPartyEventLogger.ts + exporter):
- Uses @opentelemetry/sdk-logs with BatchLogRecordProcessor
- Custom FirstPartyEventLoggingExporter (implements LogRecordExporter)
- Event types: ClaudeCodeInternalEvent, GrowthbookExperimentEvent (protobuf)
- Failed events stored in ~/.claude/telemetry/ as JSONL files (1p_failed_events.*)
- BATCH_UUID = randomUUID() per process run
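Persisting failed events as JSONL means one JSON object per line. The sketch below shows only the serialization and parsing side; the FailedEvent shape is hypothetical, and skipping malformed lines (for example a line truncated by a crash mid-write) is a design assumption rather than documented behavior.

```typescript
// Hypothetical record shape; the real events are protobuf-derived.
interface FailedEvent {
  eventName: string
  timestampMs: number
  attempts: number
}

// Serialize events as JSONL: one JSON document per line, newline-terminated.
function toJsonl(events: FailedEvent[]): string {
  return events.map(event => JSON.stringify(event) + '\n').join('')
}

// Parse JSONL back into events, skipping blank or malformed lines.
function parseJsonl(text: string): FailedEvent[] {
  const events: FailedEvent[] = []
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue
    try {
      events.push(JSON.parse(line) as FailedEvent)
    } catch {
      // Assumption: a corrupt line is dropped instead of failing the whole file.
    }
  }
  return events
}
```

Because each line is independent, new failures can be appended to the file without rewriting earlier entries.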
GrowthBook (growthbook.ts)
Feature flags and A/B testing via GrowthBook SDK.
User attributes for targeting: id, sessionId, deviceID, platform, apiBaseUrlHost, organizationUUID, accountUUID, userType, subscriptionType, rateLimitTier, firstTokenTime, email, appVersion, github.
- Re-initialization on auth change (tracked via clientCreatedWithAuth)
- Env overrides: CLAUDE_INTERNAL_FC_OVERRIDES (ant-only, JSON object)
- Exposure dedup: loggedExposures Set prevents duplicate experiment logs
- onGrowthBookRefresh(listener): callback for long-lived objects that bake feature values
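Exposure deduplication and refresh callbacks are both small bookkeeping patterns. In the sketch below, logExposureOnce, notifyRefresh, and the unsubscribe function returned by onGrowthBookRefresh are assumptions introduced for illustration; the source only names the dedup set and the onGrowthBookRefresh(listener) entry point.

```typescript
// Experiment keys that have already produced an exposure log in this process.
const loggedExposures = new Set<string>()

// Hypothetical helper: emit an exposure log at most once per experiment key.
function logExposureOnce(experimentKey: string, log: (key: string) => void): boolean {
  if (loggedExposures.has(experimentKey)) return false
  loggedExposures.add(experimentKey)
  log(experimentKey)
  return true
}

type RefreshListener = () => void
const refreshListeners = new Set<RefreshListener>()

// Register a listener for feature refreshes. Returning an unsubscribe
// function is an assumption of this sketch.
function onGrowthBookRefresh(listener: RefreshListener): () => void {
  refreshListeners.add(listener)
  return () => {
    refreshListeners.delete(listener)
  }
}

// Hypothetical: invoked after new feature values arrive, so long-lived
// objects can rebuild anything derived from the old values.
function notifyRefresh(): void {
  refreshListeners.forEach(listener => listener())
}
```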
Event enrichment (metadata.ts)
- sanitizeToolNameForAnalytics(toolName): MCP tools become 'mcp_tool'
- isToolDetailsLoggingEnabled(): checks OTEL_LOG_TOOL_DETAILS=1
- Detailed logging allowed only for: Cowork (local-agent), claude.ai connectors, official MCP registry URLs
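A minimal sketch of the two helpers, under stated assumptions: MCP tools are detected here by an 'mcp__' name prefix, which the source does not confirm, and the environment is passed in as a parameter (the documented function takes no arguments) so the example stays self-contained.

```typescript
// Assumption: MCP tool names are recognizable by this prefix.
const MCP_PREFIX = 'mcp__'

// Collapse every MCP tool name into one label so user-defined tool names
// never reach analytics; built-in tool names pass through unchanged.
function sanitizeToolNameForAnalytics(toolName: string): string {
  return toolName.startsWith(MCP_PREFIX) ? 'mcp_tool' : toolName
}

// Detailed tool logging is opt-in via OTEL_LOG_TOOL_DETAILS=1.
// The env parameter is a testing convenience of this sketch.
function isToolDetailsLoggingEnabled(env: Record<string, string | undefined>): boolean {
  return env['OTEL_LOG_TOOL_DETAILS'] === '1'
}
```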
Lazy loading
Telemetry modules are lazy-loaded to minimize startup time:
- OpenTelemetry (~400KB + protobuf) loaded via await import() in init.ts
- gRPC exporters (~700KB) further lazy-loaded within instrumentation
- Total deferred: ~1.1MB of code
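Deferred loading of this kind is usually paired with memoization so the heavy module is fetched at most once. The lazy helper and the stand-in loader below are hypothetical; in real code the loader body would be a dynamic import such as () => import('some-heavy-module').

```typescript
// Wrap an async loader so it runs at most once; later calls share the promise.
function lazy<T>(loader: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined
  return () => {
    if (cached === undefined) cached = loader()
    return cached
  }
}

// Hypothetical stand-in for a heavy telemetry module.
let loadCount = 0
const loadTelemetry = lazy(async () => {
  loadCount += 1
  return { emit: (name: string): string => `emitted:${name}` }
})
```

Nothing is loaded until loadTelemetry() is first called, and concurrent callers await the same promise instead of triggering duplicate loads.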
Trade-offs
- Queue-before-sink: events logged before the sink attaches are held rather than dropped, but the queue grows unbounded until then, and a crash before sink attachment loses every queued event.
- No strings in metadata: effective PII prevention, but it makes legitimate text data harder to log. The never-cast workaround is ugly but intentional.
- Datadog allowlist: only ~40 event names are forwarded, which prevents accidental PII leaks but requires a manual allowlist update for each new event.
- 15s batch flush: reduces network overhead, but up to the last 15s of events may be lost on a crash.
- Lazy loading: defers ~1.1MB of code past startup, but early events (before telemetry init) may lack tracing context.
Depends on
- GrowthBook SDK: feature flags and experiment tracking
- @opentelemetry/sdk-logs: log export
- Datadog Logs API: event ingestion
Key claims
- No-string metadata type prevents accidental PII logging at compile time
- Datadog allowlist restricts forwarding to ~40 approved event names
- Failed OTel events are persisted to disk as JSONL for retry
- GrowthBook re-initializes on auth change to pick up org-specific flags
- ~1.1MB of telemetry code is lazy-loaded after startup
Relations
- used_by cost-tracker (OpenTelemetry counters)
- used_by service-layer (all services log through analytics)
- depends_on GrowthBook SDK
- depends_on OpenTelemetry SDK
Sources
src-20260409-a5fc157bc756, source code analysis of src/services/analytics/