Telemetry System

Description

The telemetry system (src/services/analytics/) provides event logging, feature flags, and observability for Claude Code. It uses a three-layer architecture: a public event API with a queue-before-sink pattern, a routing sink that dispatches to backends, and backend-specific exporters for Datadog and OpenTelemetry. GrowthBook provides feature flags and A/B testing.

Architecture

Layer 1: Public API (index.ts)

logEvent(name, metadata) and logEventAsync() are the only public entry points. Events accumulate in eventQueue[] until attachAnalyticsSink() is called during initialization. The queue is then drained asynchronously via queueMicrotask.
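The queue-before-sink pattern can be sketched as follows. This is a minimal illustration, not the actual index.ts: only logEvent, attachAnalyticsSink, eventQueue, and the use of queueMicrotask come from the description above, while the AnalyticsSink and QueuedEvent shapes and the idempotent-attach behaviour are assumptions.

```typescript
// Hypothetical sketch of queue-before-sink. The AnalyticsSink and QueuedEvent
// shapes are assumptions made for illustration.
type Metadata = { [key: string]: boolean | number | undefined };

interface AnalyticsSink {
  logEvent(name: string, metadata: Metadata): void;
}

interface QueuedEvent {
  name: string;
  metadata: Metadata;
}

const eventQueue: QueuedEvent[] = [];
let sink: AnalyticsSink | null = null;

function logEvent(name: string, metadata: Metadata): void {
  if (sink === null) {
    // No sink yet: buffer in memory instead of dropping the event.
    eventQueue.push({ name, metadata });
    return;
  }
  sink.logEvent(name, metadata);
}

function attachAnalyticsSink(newSink: AnalyticsSink): void {
  if (sink !== null) return; // assumption: attaching twice is a no-op
  sink = newSink;
  if (eventQueue.length === 0) return;

  // Take ownership of the buffered events, then drain them in a microtask so
  // that attaching the sink does not block initialization.
  const pending = eventQueue.splice(0, eventQueue.length);
  queueMicrotask(() => {
    for (const event of pending) {
      newSink.logEvent(event.name, event.metadata);
    }
  });
}
```

In this sketch, an event logged after the sink attaches but before the microtask runs reaches the sink ahead of the buffered events, so consumers should not rely on strict ordering across the attach boundary.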

Type safety for PII prevention: the metadata type is { [key]: boolean | number | undefined }, deliberately excluding strings. Logging a string (which could contain code or file paths) requires an explicit cast to AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS (a never type that acts as a compile-time documentation gate).
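A short sketch of how such a gate behaves at compile time. The marker type name is taken from the description above and is defined here as never, as stated there; the logEvent body, the recorded array, and the event and field names are placeholders invented for the example.

```typescript
// Marker type: `never` is assignable to every type, so a value cast to it
// passes the metadata type check. The long name documents the author's claim.
type AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS = never;

type LogEventMetadata = { [key: string]: boolean | number | undefined };

// Placeholder implementation that just records what it was given.
const recorded: { name: string; metadata: LogEventMetadata }[] = [];

function logEvent(name: string, metadata: LogEventMetadata): void {
  recorded.push({ name, metadata });
}

// Compiles: only booleans and numbers.
logEvent("tengu_example", { durationMs: 120, success: true });

// Does not compile if uncommented: string is not assignable to
// boolean | number | undefined.
// logEvent("tengu_example", { model: "example-model" });

// Compiles only because of the explicit, self-documenting cast.
const modelName = "example-model";
logEvent("tengu_example", {
  model: modelName as AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS,
});
```

The cast changes nothing at runtime; the string is stored as-is. Its only effect is to force the author to write the assertion where reviewers can see it.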

Layer 2: Routing sink (sink.ts)

initializeAnalyticsSink() creates and attaches the sink. Events are routed to:
- Datadog: if the tengu_log_datadog_events GrowthBook gate is enabled, not killed, and using first-party API
- 1P event logging: always enabled, via OpenTelemetry

_PROTO_* metadata keys are stripped before events are sent to Datadog (PII-tagged values are restricted to privileged 1P columns). Event sampling is handled by shouldSampleEvent(), which checks the GrowthBook dynamic config tengu_event_sampling_config.
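The two steps above might look roughly like this. Everything here is a sketch under assumptions: stripProtoKeys and sampleDecision are hypothetical names, the shape of the sampling config (a per-event sample_rate between 0 and 1) is invented for illustration, and the rule that unconfigured events are always logged is a guess rather than documented behaviour.

```typescript
type EventMetadata = Record<string, unknown>;

// Drop every key with the _PROTO_ prefix before handing metadata to a
// general-access backend.
function stripProtoKeys(metadata: EventMetadata): EventMetadata {
  const result: EventMetadata = {};
  for (const key of Object.keys(metadata)) {
    if (!key.startsWith("_PROTO_")) {
      result[key] = metadata[key];
    }
  }
  return result;
}

// Hypothetical config shape; the real tengu_event_sampling_config may differ.
type SamplingConfig = { [eventName: string]: { sample_rate: number } };

// Returns true when the event should be logged. `random` is injectable so the
// decision can be made deterministic in tests.
function sampleDecision(
  eventName: string,
  config: SamplingConfig,
  random: () => number = Math.random,
): boolean {
  const rate = config[eventName]?.sample_rate;
  if (rate === undefined) return true; // assumption: no entry means always log
  if (rate <= 0) return false;
  if (rate >= 1) return true;
  return random() < rate;
}
```

For example, stripProtoKeys({ duration_ms: 5, _PROTO_email: "a@example.com" }) keeps only duration_ms.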

Layer 3: Backends

Datadog (datadog.ts):
- Endpoint: https://http-intake.logs.us5.datadoghq.com/api/v2/logs
- Client token: pubbbf48e6d78dae54bceaa4acf463299bf
- Service name: claude-code
- DEFAULT_FLUSH_INTERVAL_MS = 15000 (15s batch flush)
- MAX_BATCH_SIZE = 100
- NETWORK_TIMEOUT_MS = 5000
- NUM_USER_BUCKETS = 30: privacy-preserving user bucketing via SHA-256 hash
- Allowed events whitelist: DATADOG_ALLOWED_EVENTS (~40 event names, all tengu_* or chrome_bridge_*)
- MCP tool names normalized to 'mcp' for cardinality reduction
- Only fires in production, only for first-party API provider
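Two of the items above, user bucketing and tool-name normalization, lend themselves to a short sketch. NUM_USER_BUCKETS and the use of SHA-256 come from the list; how the digest is reduced to a bucket (first four bytes modulo the bucket count) and the mcp__ prefix used to recognise MCP tools are assumptions, and userBucket and normalizeToolName are hypothetical names.

```typescript
import { createHash } from "node:crypto";

const NUM_USER_BUCKETS = 30;

// Map a user identifier to one of NUM_USER_BUCKETS buckets. Only the bucket
// number is reported, so individual users are not identifiable from it.
// Assumption: the first 4 bytes of the digest are reduced modulo the count.
function userBucket(userId: string): number {
  const digest = createHash("sha256").update(userId).digest();
  return digest.readUInt32BE(0) % NUM_USER_BUCKETS;
}

// Collapse all MCP tool names into a single tag value so that user-defined
// tool names do not inflate tag cardinality.
// Assumption: MCP tools are recognisable by an "mcp__" prefix.
function normalizeToolName(toolName: string): string {
  return toolName.startsWith("mcp__") ? "mcp" : toolName;
}
```

The same user ID always maps to the same bucket, which keeps per-bucket metrics stable across sessions without ever sending the ID itself.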

OpenTelemetry 1P (firstPartyEventLogger.ts + exporter):
- Uses @opentelemetry/sdk-logs with BatchLogRecordProcessor
- Custom FirstPartyEventLoggingExporter (implements LogRecordExporter)
- Event types: ClaudeCodeInternalEvent, GrowthbookExperimentEvent (protobuf)
- Failed events stored in ~/.claude/telemetry/ as JSONL files (1p_failed_events.*)
- BATCH_UUID = randomUUID() per process run

GrowthBook (growthbook.ts)

Feature flags and A/B testing via GrowthBook SDK.

User attributes for targeting: id, sessionId, deviceID, platform, apiBaseUrlHost, organizationUUID, accountUUID, userType, subscriptionType, rateLimitTier, firstTokenTime, email, appVersion, github.

Event enrichment (metadata.ts)

Lazy loading

Telemetry modules are lazy-loaded to minimize startup time:
- OpenTelemetry (~400KB + protobuf) loaded via await import() in init.ts
- gRPC exporters (~700KB) further lazy-loaded within instrumentation
- Total deferred: ~1.1MB of code
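A minimal sketch of the deferred-import technique follows. The lazyModule helper, loadHeavyModule, and initTelemetry are hypothetical names, and node:os is only a stand-in for a heavy dependency such as the OpenTelemetry SDK; the actual init.ts is not shown in the source and may be structured differently.

```typescript
// Wrap a dynamic import so the module is loaded at most once, on first use.
function lazyModule<T>(loader: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | null = null;
  return () => {
    if (cached === null) {
      cached = loader(); // the import starts here, not at program startup
    }
    return cached;
  };
}

// Counts how many times the underlying import was actually triggered.
let importCount = 0;

// Stand-in for a heavy telemetry dependency.
const loadHeavyModule = lazyModule(() => {
  importCount += 1;
  return import("node:os");
});

// Callers pay the load cost only when telemetry is first initialized.
async function initTelemetry(): Promise<string> {
  const os = await loadHeavyModule();
  return os.platform();
}
```

Caching the promise rather than the resolved module means concurrent callers share one in-flight import instead of each starting their own.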

Trade-offs

  1. Queue-before-sink: events logged before the sink attaches are buffered rather than dropped, but the queue grows unbounded until then. Because it lives only in memory, a crash before sink attachment loses every queued event.
  2. No strings in metadata: effective PII prevention, but it makes legitimate text data harder to log. The never-cast workaround is ugly but intentional.
  3. Datadog allowlist: only ~40 event names are forwarded, which prevents accidental PII leaks but requires a manual allowlist update for each new event.
  4. 15s batch flush: reduces network overhead, but up to the last 15s of events may be lost on a crash.
  5. Lazy loading: defers ~1.1MB of code at startup, but early events (before telemetry init) may lack tracing context.
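Trade-off 4 is easier to see with a small batcher in front of it. The sketch below reuses the DEFAULT_FLUSH_INTERVAL_MS and MAX_BATCH_SIZE values listed for the Datadog backend; the EventBatcher class itself is hypothetical and is not taken from datadog.ts.

```typescript
const DEFAULT_FLUSH_INTERVAL_MS = 15000;
const MAX_BATCH_SIZE = 100;

// Buffers events and sends them either when the batch is full or when the
// flush interval elapses, whichever comes first.
class EventBatcher<T> {
  private buffer: T[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly send: (batch: T[]) => void;
  private readonly maxBatchSize: number;
  private readonly flushIntervalMs: number;

  constructor(
    send: (batch: T[]) => void,
    maxBatchSize: number = MAX_BATCH_SIZE,
    flushIntervalMs: number = DEFAULT_FLUSH_INTERVAL_MS,
  ) {
    this.send = send;
    this.maxBatchSize = maxBatchSize;
    this.flushIntervalMs = flushIntervalMs;
  }

  add(event: T): void {
    this.buffer.push(event);
    if (this.buffer.length >= this.maxBatchSize) {
      this.flush();
    } else if (this.timer === null) {
      // The first event of a new batch arms the timer.
      this.timer = setTimeout(() => this.flush(), this.flushIntervalMs);
    }
  }

  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.send(batch);
  }

  // Number of events that would be lost if the process died right now.
  get pending(): number {
    return this.buffer.length;
  }
}
```

Anything counted by pending exists only in memory, which is why a crash can lose up to one flush interval of events. Calling flush() from a shutdown hook narrows that window for clean exits but not for hard crashes.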

Depends on

Key claims

Relations

Sources

src-20260409-a5fc157bc756, source code analysis of src/services/analytics/