Token Estimation Service
- Entity ID: ent-20260410-e12d8d8d45d2
- Type: service
- Scope: shared
- Status: active
- Aliases: token counting, token estimation, tokenEstimation.ts
Description
The token estimation service (src/services/tokenEstimation.ts) provides token counting at three accuracy levels: exact (via API), cheap (via Haiku), and rough (local heuristic). It handles different content block types (text, images, documents, tool results, thinking blocks) and supports platform-specific counting for Bedrock and Vertex AI.
Three estimation methods
1. API-based (exact)
countTokensWithAPI(content) and countMessagesTokensWithAPI(messages, tools) use anthropic.beta.messages.countTokens() for exact counts.
Constants for the counting call: TOKEN_COUNT_THINKING_BUDGET = 1024, TOKEN_COUNT_MAX_TOKENS = 2048.
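A sketch of the message-level variant. The wrapper name and constants come from this doc; the source routes through the beta namespace, while the GA countTokens endpoint shown here has the same shape. The model choice and the placement of the thinking config are assumptions:
```typescript
import Anthropic from "@anthropic-ai/sdk";

const TOKEN_COUNT_THINKING_BUDGET = 1024;

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the env

async function countMessagesTokensWithAPI(
  messages: Anthropic.MessageParam[],
  tools?: Anthropic.Tool[],
): Promise<number> {
  const result = await anthropic.messages.countTokens({
    model: "claude-sonnet-4-5", // model choice is an assumption
    messages,
    tools,
    // Count with thinking enabled so thinking blocks are priced in
    // (placement of the budget constant is an assumption).
    thinking: { type: "enabled", budget_tokens: TOKEN_COUNT_THINKING_BUDGET },
  });
  return result.input_tokens;
}
```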
2. Haiku fallback (cheap)
countTokensViaHaikuFallback(messages, tools) uses a cheap model when the primary counting API is unavailable. Returns inputTokens + cacheCreationTokens + cacheReadTokens.
Model selection logic (a sketch follows the list):
- Vertex global region: Sonnet (Haiku not available)
- Bedrock with thinking: Sonnet (Haiku 3.5 doesn't support thinking)
- Vertex with thinking: Sonnet
- Default: getSmallFastModel() (Haiku)
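A sketch of that selection, assuming hypothetical platform flags; getSonnetModel is invented here, and only getSmallFastModel() is named in the source:
```typescript
// Hypothetical helpers with placeholder model IDs.
const getSonnetModel = () => "claude-sonnet-4-5";
const getSmallFastModel = () => "claude-3-5-haiku-latest";

function pickCountingModel(opts: {
  platform: "anthropic" | "bedrock" | "vertex";
  vertexRegion?: string;
  thinkingEnabled: boolean;
}): string {
  // Haiku is not available in Vertex's global region.
  if (opts.platform === "vertex" && opts.vertexRegion === "global") {
    return getSonnetModel();
  }
  // Haiku 3.5 doesn't support thinking on Bedrock or Vertex.
  if (
    opts.thinkingEnabled &&
    (opts.platform === "bedrock" || opts.platform === "vertex")
  ) {
    return getSonnetModel();
  }
  return getSmallFastModel(); // Haiku
}
```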
3. Rough estimation (local, no API)
roughTokenCountEstimation(content, bytesPerToken=4) — Math.round(content.length / bytesPerToken).
File-type-aware variant: roughTokenCountEstimationForFileType(content, ext) uses bytesPerTokenForFileType() (see the sketch after this list):
- json, jsonl, jsonc -> 2 bytes/token (JSON has many short tokens)
- Everything else -> 4 bytes/token
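Both estimators reduce to a few lines. In this sketch, roughTokenCountEstimation is taken from the formula above; the other two bodies are assumptions:
```typescript
function roughTokenCountEstimation(content: string, bytesPerToken = 4): number {
  return Math.round(content.length / bytesPerToken);
}

function bytesPerTokenForFileType(ext: string): number {
  // JSON-family files tokenize into many short tokens, so ~2 bytes/token.
  return ["json", "jsonl", "jsonc"].includes(ext) ? 2 : 4;
}

function roughTokenCountEstimationForFileType(content: string, ext: string): number {
  return roughTokenCountEstimation(content, bytesPerTokenForFileType(ext));
}
```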
Message-level estimation
roughTokenCountEstimationForMessages(messages) handles by content block type:
| Block type | Estimation method |
|---|---|
| text | Rough estimation (4 bytes/token) |
| image / document | Hardcoded 2000 tokens (IMAGE_MAX_TOKEN_SIZE) |
| tool_result | Recurses into content blocks |
| tool_use | Rough estimation on name + JSON.stringify(input) |
| thinking | Rough estimation on block.thinking |
| redacted_thinking | Rough estimation on block.data |
| Other (server_tool_use, web_search_tool_result) | Rough estimation on JSON.stringify of the block |
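A condensed sketch of that dispatch, reusing roughTokenCountEstimation from the earlier sketch. The block shapes follow the Anthropic Messages API; the implementation itself is an assumption:
```typescript
const IMAGE_MAX_TOKEN_SIZE = 2000;

function estimateBlock(block: any): number {
  switch (block.type) {
    case "text":
      return roughTokenCountEstimation(block.text);
    case "image":
    case "document":
      return IMAGE_MAX_TOKEN_SIZE; // fixed cost regardless of resolution
    case "tool_result":
      // Recurse into nested content blocks; string results are estimated directly.
      return typeof block.content === "string"
        ? roughTokenCountEstimation(block.content)
        : (block.content ?? []).reduce(
            (sum: number, b: any) => sum + estimateBlock(b),
            0,
          );
    case "tool_use":
      return roughTokenCountEstimation(block.name + JSON.stringify(block.input));
    case "thinking":
      return roughTokenCountEstimation(block.thinking);
    case "redacted_thinking":
      return roughTokenCountEstimation(block.data);
    default: // server_tool_use, web_search_tool_result, ...
      return roughTokenCountEstimation(JSON.stringify(block));
  }
}

function roughTokenCountEstimationForMessages(messages: any[]): number {
  return messages.reduce((sum, m) => {
    const blocks =
      typeof m.content === "string" ? [{ type: "text", text: m.content }] : m.content;
    return sum + blocks.reduce((s: number, b: any) => s + estimateBlock(b), 0);
  }, 0);
}
```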
Bedrock-specific
countTokensWithBedrock() uses @aws-sdk/client-bedrock-runtime CountTokensCommand, dynamically imported to avoid bundling the AWS SDK when not needed.
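A sketch of that path. The dynamic import is the point: the AWS SDK only loads when a Bedrock count is actually requested. The exact CountTokensCommand payload shape here is an assumption:
```typescript
async function countTokensWithBedrock(
  modelId: string,
  messages: unknown[],
): Promise<number> {
  // Loaded lazily so non-Bedrock users never pull in the AWS SDK.
  const { BedrockRuntimeClient, CountTokensCommand } = await import(
    "@aws-sdk/client-bedrock-runtime"
  );
  const client = new BedrockRuntimeClient({});
  const response = await client.send(
    new CountTokensCommand({
      modelId,
      input: { converse: { messages } } as any, // payload shape assumed
    }),
  );
  return response.inputTokens ?? 0;
}
```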
Preprocessing
stripToolSearchFieldsFromMessages() removes caller from tool_use blocks and tool_reference from tool_result content before sending for API counting. These fields are added by the tool system but aren't part of the actual API request, so they shouldn't be counted.
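A sketch of that preprocessing pass; which fields live where follows the description above, while the implementation itself is an assumption:
```typescript
function stripToolSearchFieldsFromMessages(messages: any[]): any[] {
  return messages.map((message) => {
    if (typeof message.content === "string") return message;
    return {
      ...message,
      content: message.content.map((block: any) => {
        if (block.type === "tool_use") {
          // `caller` is added by the tool system, not sent to the API.
          const { caller, ...rest } = block;
          return rest;
        }
        if (block.type === "tool_result" && Array.isArray(block.content)) {
          return {
            ...block,
            // Drop `tool_reference` from nested tool_result content.
            content: block.content.map(({ tool_reference, ...inner }: any) => inner),
          };
        }
        return block;
      }),
    };
  });
}
```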
VCR support
All API counting calls are wrapped with withTokenCountVCR() for test recording/replay, enabling deterministic testing without actual API calls.
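A usage sketch building on the countMessagesTokensWithAPI sketch above; the cassette-name argument and the signature of withTokenCountVCR are assumptions:
```typescript
// Assumed signature: records the first live call, replays it in later test runs.
declare function withTokenCountVCR<T>(
  cassette: string,
  fn: () => Promise<T>,
): Promise<T>;

const count = await withTokenCountVCR("token-count/basic", () =>
  countMessagesTokensWithAPI(messages, tools),
);
```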
Trade-offs
- Three accuracy levels — rough estimation is instant but can be off by 2x for JSON-heavy content; API counting is exact but costs an API call. The three tiers let callers choose the right tradeoff.
- Hardcoded image tokens — 2000 tokens per image/document is a rough constant. Actual image token cost varies with resolution, but this is good enough for compaction threshold decisions.
- 4 bytes/token default — conservative for English text (~3.5 bytes/token typical) but reasonable. JSON at 2 bytes/token reflects its higher token density.
- Dynamic AWS SDK import — avoids bloating the bundle for non-Bedrock users but adds a delay on first Bedrock counting call.
Depends on
- Anthropic API — messages.countTokens() for exact counting
- @aws-sdk/client-bedrock-runtime — Bedrock-specific counting (dynamic import)
Key claims
- Three accuracy levels: exact (API), cheap (Haiku), rough (local heuristic, 4 bytes/token)
- JSON files use 2 bytes/token; everything else uses 4 bytes/token
- Images/documents are estimated at a fixed 2000 tokens
- Bedrock SDK is dynamically imported to avoid bundle bloat
Relations
- used_by: auto-compact (threshold calculations)
- used_by: cost-tracker (token counting for cost estimation)
- related_to: cache-economics
Sources
- src-20260409-a5fc157bc756, source code analysis of src/services/tokenEstimation.ts