Token Estimation Service

Description

The token estimation service (src/services/tokenEstimation.ts) provides token counting at three accuracy levels: exact (via API), cheap (via Haiku), and rough (local heuristic). It handles different content block types (text, images, documents, tool results, thinking blocks) and supports platform-specific counting for Bedrock and Vertex AI.

Three estimation methods

1. API-based (exact)

countTokensWithAPI(content) and countMessagesTokensWithAPI(messages, tools) use anthropic.beta.messages.countTokens() for exact counts.

Constants for the counting call: TOKEN_COUNT_THINKING_BUDGET = 1024, TOKEN_COUNT_MAX_TOKENS = 2048.
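A minimal sketch of the exact-counting call. The `TokenCountClient` interface here is illustrative, not the real SDK type; it stands in for a client exposing `anthropic.beta.messages.countTokens()`, so the function can be exercised with a stub.

```typescript
// Illustrative client interface standing in for the Anthropic SDK's
// beta.messages.countTokens endpoint (field names are assumptions).
interface TokenCountClient {
  countTokens(req: {
    model: string;
    messages: unknown[];
    tools?: unknown[];
  }): Promise<{ input_tokens: number }>;
}

// Constants from the text; how they are threaded into the request is
// an implementation detail not shown here.
const TOKEN_COUNT_THINKING_BUDGET = 1024;
const TOKEN_COUNT_MAX_TOKENS = 2048;

// Exact count for a full message list (plus optional tool definitions).
async function countMessagesTokensWithAPI(
  client: TokenCountClient,
  model: string,
  messages: unknown[],
  tools?: unknown[],
): Promise<number> {
  const result = await client.countTokens({ model, messages, tools });
  return result.input_tokens;
}
```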

2. Haiku fallback (cheap)

countTokensViaHaikuFallback(messages, tools) uses a cheap model when the primary counting API is unavailable. Returns inputTokens + cacheCreationTokens + cacheReadTokens.

Model selection logic:

- Vertex global region: Sonnet (Haiku not available)
- Bedrock with thinking: Sonnet (Haiku 3.5 doesn't support thinking)
- Vertex with thinking: Sonnet
- Default: getSmallFastModel() (Haiku)
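The selection rules above can be sketched as a single function. Platform flags and model identifiers here are placeholders, and `getSmallFastModel()` stands in for the real helper.

```typescript
type Platform = "vertex" | "bedrock" | "firstParty";

// Assumed default small/fast model; the real helper may consult config.
function getSmallFastModel(): string {
  return "claude-3-5-haiku";
}

// Pick the model used for the Haiku-fallback counting call.
function selectCountingModel(opts: {
  platform: Platform;
  region?: string;
  thinkingEnabled: boolean;
}): string {
  const sonnet = "claude-sonnet"; // placeholder identifier
  if (opts.platform === "vertex" && opts.region === "global") {
    return sonnet; // Haiku not available in the global region
  }
  if (opts.platform === "bedrock" && opts.thinkingEnabled) {
    return sonnet; // Haiku 3.5 doesn't support thinking
  }
  if (opts.platform === "vertex" && opts.thinkingEnabled) {
    return sonnet;
  }
  return getSmallFastModel();
}
```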

3. Rough estimation (local, no API)

roughTokenCountEstimation(content, bytesPerToken=4) returns Math.round(content.length / bytesPerToken).

File-type-aware variant: roughTokenCountEstimationForFileType(content, ext) uses bytesPerTokenForFileType():

- json, jsonl, jsonc -> 2 bytes/token (JSON has many short tokens)
- Everything else -> 4 bytes/token
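The local heuristics reduce to a few lines; this sketch follows the formulas and divisors given above.

```typescript
// Character count divided by an assumed bytes-per-token divisor.
function roughTokenCountEstimation(content: string, bytesPerToken = 4): number {
  return Math.round(content.length / bytesPerToken);
}

// JSON-family files tokenize more densely, so they get a smaller divisor.
function bytesPerTokenForFileType(ext: string): number {
  return ["json", "jsonl", "jsonc"].includes(ext.toLowerCase()) ? 2 : 4;
}

function roughTokenCountEstimationForFileType(content: string, ext: string): number {
  return roughTokenCountEstimation(content, bytesPerTokenForFileType(ext));
}
```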

Message-level estimation

roughTokenCountEstimationForMessages(messages) handles by content block type:

| Block type | Estimation method |
| --- | --- |
| text | Rough estimation (4 bytes/token) |
| image / document | Hardcoded 2000 tokens (IMAGE_MAX_TOKEN_SIZE) |
| tool_result | Recurses into content blocks |
| tool_use | name + JSON.stringify(input) |
| thinking | Rough estimation on block.thinking |
| redacted_thinking | Rough estimation on block.data |
| Other (server_tool_use, web_search_tool_result) | JSON.stringify the block |
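The dispatch in the table above can be sketched as a switch over block types. Block shapes are loose approximations of the Anthropic content-block format, not the service's real types.

```typescript
const IMAGE_MAX_TOKEN_SIZE = 2000;

// Loose block shape for illustration; the real code uses SDK types.
type ContentBlock = { type: string; [key: string]: any };

function estimateBlockTokens(block: ContentBlock): number {
  const rough = (s: string) => Math.round(s.length / 4);
  switch (block.type) {
    case "text":
      return rough(block.text ?? "");
    case "image":
    case "document":
      return IMAGE_MAX_TOKEN_SIZE; // fixed per-image constant
    case "tool_result": {
      // tool_result content may be a string or nested blocks; recurse.
      const content = block.content;
      if (typeof content === "string") return rough(content);
      return (content ?? []).reduce(
        (sum: number, b: ContentBlock) => sum + estimateBlockTokens(b),
        0,
      );
    }
    case "tool_use":
      return rough((block.name ?? "") + JSON.stringify(block.input ?? {}));
    case "thinking":
      return rough(block.thinking ?? "");
    case "redacted_thinking":
      return rough(block.data ?? "");
    default:
      // server_tool_use, web_search_tool_result, etc.
      return rough(JSON.stringify(block));
  }
}
```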

Bedrock-specific

countTokensWithBedrock() uses @aws-sdk/client-bedrock-runtime CountTokensCommand, dynamically imported to avoid bundling the AWS SDK when not needed.

Preprocessing

stripToolSearchFieldsFromMessages() removes caller from tool_use blocks and tool_reference from tool_result content before sending for API counting. These fields are added by the tool system but aren't part of the actual API request, so they shouldn't be counted.
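The stripping step can be sketched as a pure transformation. Message and block shapes are simplified; only the field names (caller, tool_reference) come from the text.

```typescript
type Block = { type: string; content?: any; [k: string]: any };
type Message = { role: string; content: Block[] | string };

// Remove tool-system bookkeeping fields that aren't part of the real
// API request, so they don't inflate the count.
function stripToolSearchFieldsFromMessages(messages: Message[]): Message[] {
  return messages.map((msg) => {
    if (typeof msg.content === "string") return msg;
    return {
      ...msg,
      content: msg.content.map((block) => {
        if (block.type === "tool_use") {
          const { caller, ...rest } = block; // drop caller from tool_use
          return rest;
        }
        if (block.type === "tool_result" && Array.isArray(block.content)) {
          return {
            ...block,
            // drop tool_reference from nested tool_result content
            content: block.content.map(
              ({ tool_reference, ...inner }: any) => inner,
            ),
          };
        }
        return block;
      }),
    };
  });
}
```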

VCR support

All API counting calls are wrapped with withTokenCountVCR() for test recording/replay, enabling deterministic testing without actual API calls.

Trade-offs

  1. Three accuracy levels — rough estimation is instant but can be off by 2x for JSON-heavy content; API counting is exact but costs an API call. The three tiers let callers choose the right tradeoff.
  2. Hardcoded image tokens — 2000 tokens per image/document is a rough constant. Actual image token cost varies with resolution, but this is good enough for compaction threshold decisions.
  3. 4 bytes/token default — conservative for English text (~3.5 bytes/token typical) but reasonable. JSON at 2 bytes/token reflects its higher token density.
  4. Dynamic AWS SDK import — avoids bloating the bundle for non-Bedrock users but adds a delay on first Bedrock counting call.


Sources

src-20260409-a5fc157bc756, source code analysis of src/services/tokenEstimation.ts