Token Estimation Service

Description

The token estimation service (src/services/tokenEstimation.ts) provides token counting at three accuracy levels: exact (via API), cheap (via Haiku), and rough (local heuristic). It handles different content block types (text, images, documents, tool results, thinking blocks) and supports platform-specific counting for Bedrock and Vertex AI.

Three estimation methods

1. API-based (exact)

countTokensWithAPI(content) and countMessagesTokensWithAPI(messages, tools) use anthropic.beta.messages.countTokens() for exact counts.

Constants for the counting call: TOKEN_COUNT_THINKING_BUDGET = 1024, TOKEN_COUNT_MAX_TOKENS = 2048.
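A minimal sketch of the exact-counting call. The `TokenCountClient` interface here is illustrative, not the real SDK type; it stands in for a client exposing `anthropic.beta.messages.countTokens()`, so the function can be exercised with a stub.

```typescript
// Illustrative client interface standing in for the Anthropic SDK's
// beta.messages.countTokens endpoint (field names are assumptions).
interface TokenCountClient {
  countTokens(req: {
    model: string;
    messages: unknown[];
    tools?: unknown[];
  }): Promise<{ input_tokens: number }>;
}

// Constants from the text; how they are threaded into the request is
// an implementation detail not shown here.
const TOKEN_COUNT_THINKING_BUDGET = 1024;
const TOKEN_COUNT_MAX_TOKENS = 2048;

// Exact count for a full message list (plus optional tool definitions).
async function countMessagesTokensWithAPI(
  client: TokenCountClient,
  model: string,
  messages: unknown[],
  tools?: unknown[],
): Promise<number> {
  const result = await client.countTokens({ model, messages, tools });
  return result.input_tokens;
}
```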

2. Haiku fallback (cheap)

countTokensViaHaikuFallback(messages, tools) uses a cheap model when the primary counting API is unavailable. Returns inputTokens + cacheCreationTokens + cacheReadTokens.

Model selection logic:

- Vertex global region: Sonnet (Haiku not available)
- Bedrock with thinking: Sonnet (Haiku 3.5 doesn't support thinking)
- Vertex with thinking: Sonnet
- Default: getSmallFastModel() (Haiku)
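The selection rules above can be sketched as a single function. Platform flags and model identifiers here are placeholders, and `getSmallFastModel()` stands in for the real helper.

```typescript
type Platform = "vertex" | "bedrock" | "firstParty";

// Assumed default small/fast model; the real helper may consult config.
function getSmallFastModel(): string {
  return "claude-3-5-haiku";
}

// Pick the model used for the Haiku-fallback counting call.
function selectCountingModel(opts: {
  platform: Platform;
  region?: string;
  thinkingEnabled: boolean;
}): string {
  const sonnet = "claude-sonnet"; // placeholder identifier
  if (opts.platform === "vertex" && opts.region === "global") {
    return sonnet; // Haiku not available in the global region
  }
  if (opts.platform === "bedrock" && opts.thinkingEnabled) {
    return sonnet; // Haiku 3.5 doesn't support thinking
  }
  if (opts.platform === "vertex" && opts.thinkingEnabled) {
    return sonnet;
  }
  return getSmallFastModel();
}
```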

3. Rough estimation (local, no API)

roughTokenCountEstimation(content, bytesPerToken=4) returns Math.round(content.length / bytesPerToken).

File-type-aware variant: roughTokenCountEstimationForFileType(content, ext) uses bytesPerTokenForFileType():

- json, jsonl, jsonc -> 2 bytes/token (JSON has many short tokens)
- Everything else -> 4 bytes/token
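The local heuristics reduce to a few lines; this sketch follows the formulas and divisors given above.

```typescript
// Character count divided by an assumed bytes-per-token divisor.
function roughTokenCountEstimation(content: string, bytesPerToken = 4): number {
  return Math.round(content.length / bytesPerToken);
}

// JSON-family files tokenize more densely, so they get a smaller divisor.
function bytesPerTokenForFileType(ext: string): number {
  return ["json", "jsonl", "jsonc"].includes(ext.toLowerCase()) ? 2 : 4;
}

function roughTokenCountEstimationForFileType(content: string, ext: string): number {
  return roughTokenCountEstimation(content, bytesPerTokenForFileType(ext));
}
```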

Message-level estimation

roughTokenCountEstimationForMessages(messages) handles by content block type:

| Block type | Estimation method |
| --- | --- |
| text | Rough estimation (4 bytes/token) |
| image / document | Hardcoded 2000 tokens (IMAGE_MAX_TOKEN_SIZE) |
| tool_result | Recurses into content blocks |
| tool_use | name + JSON.stringify(input) |
| thinking | Rough estimation on block.thinking |
| redacted_thinking | Rough estimation on block.data |
| Other (server_tool_use, web_search_tool_result) | JSON.stringify the block |
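The dispatch in the table above can be sketched as a switch over block types. Block shapes are loose approximations of the Anthropic content-block format, not the service's real types.

```typescript
const IMAGE_MAX_TOKEN_SIZE = 2000;

// Loose block shape for illustration; the real code uses SDK types.
type ContentBlock = { type: string; [key: string]: any };

function estimateBlockTokens(block: ContentBlock): number {
  const rough = (s: string) => Math.round(s.length / 4);
  switch (block.type) {
    case "text":
      return rough(block.text ?? "");
    case "image":
    case "document":
      return IMAGE_MAX_TOKEN_SIZE; // fixed per-image constant
    case "tool_result": {
      // tool_result content may be a string or nested blocks; recurse.
      const content = block.content;
      if (typeof content === "string") return rough(content);
      return (content ?? []).reduce(
        (sum: number, b: ContentBlock) => sum + estimateBlockTokens(b),
        0,
      );
    }
    case "tool_use":
      return rough((block.name ?? "") + JSON.stringify(block.input ?? {}));
    case "thinking":
      return rough(block.thinking ?? "");
    case "redacted_thinking":
      return rough(block.data ?? "");
    default:
      // server_tool_use, web_search_tool_result, etc.
      return rough(JSON.stringify(block));
  }
}
```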

Bedrock-specific

countTokensWithBedrock() uses @aws-sdk/client-bedrock-runtime CountTokensCommand, dynamically imported to avoid bundling the AWS SDK when not needed.

Preprocessing

stripToolSearchFieldsFromMessages() removes caller from tool_use blocks and tool_reference from tool_result content before sending for API counting. These fields are added by the tool system but aren't part of the actual API request, so they shouldn't be counted.
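The stripping step can be sketched as a pure transformation. Message and block shapes are simplified; only the field names (caller, tool_reference) come from the text.

```typescript
type Block = { type: string; content?: any; [k: string]: any };
type Message = { role: string; content: Block[] | string };

// Remove tool-system bookkeeping fields that aren't part of the real
// API request, so they don't inflate the count.
function stripToolSearchFieldsFromMessages(messages: Message[]): Message[] {
  return messages.map((msg) => {
    if (typeof msg.content === "string") return msg;
    return {
      ...msg,
      content: msg.content.map((block) => {
        if (block.type === "tool_use") {
          const { caller, ...rest } = block; // drop caller from tool_use
          return rest;
        }
        if (block.type === "tool_result" && Array.isArray(block.content)) {
          return {
            ...block,
            // drop tool_reference from nested tool_result content
            content: block.content.map(
              ({ tool_reference, ...inner }: any) => inner,
            ),
          };
        }
        return block;
      }),
    };
  });
}
```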

VCR support

All API counting calls are wrapped with withTokenCountVCR() for test recording/replay, enabling deterministic testing without actual API calls.

Trade-offs

  1. Three accuracy levels — rough estimation is instant but can be off by 2x for JSON-heavy content; API counting is exact but costs an API call. The three tiers let callers choose the right tradeoff.
  2. Hardcoded image tokens — 2000 tokens per image/document is a rough constant. Actual image token cost varies with resolution, but this is good enough for compaction threshold decisions.
  3. 4 bytes/token default — conservative for English text (~3.5 bytes/token typical) but reasonable. JSON at 2 bytes/token reflects its higher token density.
  4. Dynamic AWS SDK import — avoids bloating the bundle for non-Bedrock users but adds a delay on first Bedrock counting call.


Sources

src-20260409-a5fc157bc756, source code analysis of src/services/tokenEstimation.ts