ToolSearch System
- Entity ID: ent-20260409-49515b9f7b7b
- Type: service
- Scope: shared
- Status: active
- Aliases: ToolSearch, lazy tool loading, tool discovery
Description
The ToolSearch System implements lazy-loading tool discovery, which sharply reduces context window consumption by replacing eager tool loading with on-demand retrieval. Including full schema definitions for all available tools in every API request consumed approximately 77,000 tokens. ToolSearch instead maintains a lightweight index of tool names and brief descriptions that costs about 8,700 tokens, a reduction of roughly 89%. When the model identifies a relevant tool, the full schema is loaded on demand via BM25 similarity search. The feature is accessed through the tool_search_tool_regex_20251119 beta header.
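The split between a compact index and on-demand schemas can be sketched as follows. This is a minimal illustration, not the actual implementation: the `Tool` and `LazyToolRegistry` names, the field layout, and the sample schema are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    """Hypothetical tool record: a name, a brief description, a full schema."""
    name: str
    description: str
    schema: dict


class LazyToolRegistry:
    """Illustrative registry that exposes a cheap index and loads schemas on demand."""

    def __init__(self, tools: list[Tool]) -> None:
        self._tools = {tool.name: tool for tool in tools}

    def index(self) -> list[dict]:
        # Lightweight view: names and brief descriptions only, no schemas.
        return [
            {"name": t.name, "description": t.description}
            for t in self._tools.values()
        ]

    def load(self, names: list[str]) -> list[dict]:
        # Full definitions for the requested tools; unknown names are skipped.
        return [
            {
                "name": name,
                "description": self._tools[name].description,
                "input_schema": self._tools[name].schema,
            }
            for name in names
            if name in self._tools
        ]


registry = LazyToolRegistry([
    Tool(
        name="Read",
        description="Read a file",  # hypothetical description
        schema={"type": "object", "properties": {"path": {"type": "string"}}},
    ),
])
```

Only `index()` would be sent with every request in lazy mode; `load()` would run when a search selects specific tools.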
Activation Threshold
ToolSearch activates once 10% of the context window has been consumed. Before that point, all tools are loaded eagerly in the traditional manner. Once consumption reaches 10%, the system switches to lazy-loading mode, replacing full tool definitions with the compact index. The threshold balances two concerns: early in a conversation, having every tool immediately available improves first-response quality; later, conserving context space for actual conversation content matters more.
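The threshold rule can be expressed as a small mode selector. This is a sketch under stated assumptions: the function name, the mode strings, and the use of token counts as the measure of consumption are hypothetical, and the comparison treats exactly 10% as lazy mode.

```python
THRESHOLD_PERCENT = 10  # activation threshold from the text above


def select_tool_mode(tokens_used: int, context_window: int) -> str:
    """Return "eager" below the threshold and "lazy" at or above it."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    if tokens_used < 0:
        raise ValueError("tokens_used must be non-negative")
    # Integer arithmetic avoids floating-point edge cases at exactly 10%.
    if tokens_used * 100 >= THRESHOLD_PERCENT * context_window:
        return "lazy"
    return "eager"
```

For example, with a hypothetical 200,000-token window, `select_tool_mode(10_000, 200_000)` yields `"eager"` and `select_tool_mode(20_000, 200_000)` yields `"lazy"`.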
Token Economics
| Mode | Token Cost | Tools Available |
|---|---|---|
| Eager loading (< 10% context) | ~77,000 tokens | All tools with full schemas |
| Lazy loading (>= 10% context) | ~8,700 tokens | Lightweight index + on-demand schema loading |
| Savings | ~68,300 tokens | ~89% reduction in tool definition overhead |
These savings matter because tool definitions compete with conversation content for the same context window. In long sessions, especially those involving the compaction-pipeline, the roughly 68,000 tokens saved can mean the difference between maintaining useful context and hitting the compaction threshold prematurely.
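The figures in the table follow directly from the two token costs. The short calculation below reproduces them; the constant and function names are illustrative only.

```python
EAGER_TOKENS = 77_000  # approximate cost of full schemas, from the table
LAZY_TOKENS = 8_700    # approximate cost of the lightweight index


def token_savings(eager: int, lazy: int) -> tuple[int, float]:
    """Return the absolute saving and the saving as a fraction of the eager cost."""
    saved = eager - lazy
    return saved, saved / eager


saved, fraction = token_savings(EAGER_TOKENS, LAZY_TOKENS)
print(f"saved {saved} tokens ({fraction:.1%} of the eager cost)")
```

The unrounded reduction is about 88.7%, which the table reports as roughly 89%.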
BM25 Similarity Matching
When the model needs a tool in lazy-loading mode, it calls the ToolSearch tool with a natural language query. The system uses BM25 similarity scoring to rank tools against the query and returns the full schema definitions for the top matches. BM25 was chosen over embedding-based similarity because it requires no external model calls, adds negligible latency, and works well for the relatively small tool corpus (40+ tools).
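To illustrate why BM25 needs no external model, here is a minimal Okapi BM25 ranker over a tiny tool index. It is a sketch, not ToolSearch's actual scorer: the tokenizer, the `k1` and `b` values (conventional defaults), the `NotebookEdit` tool name, and all descriptions are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(
    query: str, docs: dict[str, str], k1: float = 1.5, b: float = 0.75
) -> list[tuple[str, float]]:
    """Rank tools by BM25 score of the query against name plus description."""
    if not docs:
        return []
    tokenized = {name: tokenize(f"{name} {desc}") for name, desc in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(terms) for terms in tokenized.values()) / n_docs

    # Document frequency: in how many tool entries each term appears.
    doc_freq: Counter = Counter()
    for terms in tokenized.values():
        doc_freq.update(set(terms))

    query_terms = tokenize(query)
    scores = []
    for name, terms in tokenized.items():
        term_freq = Counter(terms)
        score = 0.0
        for term in query_terms:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            length_norm = k1 * (1 - b + b * len(terms) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append((name, score))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical index entries for illustration.
index = {
    "NotebookEdit": "Edit cells in a Jupyter notebook",
    "Read": "Read a file from the local filesystem",
    "Grep": "Search file contents with a regular expression",
}
ranked = bm25_rank("notebook jupyter", index)
```

Everything here is term counting and arithmetic over the index itself, which is consistent with the low-latency argument above. Because the matching is lexical, a query only scores against tools whose name or description contains its terms.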
The search supports multiple query forms:
- Direct selection: select:Read,Edit,Grep fetches specific tools by name.
- Keyword search: notebook jupyter returns the tools that rank highest for those keywords.
- Name-required search: +slack send requires "slack" in the tool name and ranks by remaining terms.
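The three query forms above could be distinguished with a small parser like the one below. This is a hedged sketch: the `ParsedQuery` structure, the mode labels, and the case-folding of search terms are assumptions, and the real parsing rules may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:
    """Hypothetical parsed form of a ToolSearch query."""
    mode: str  # "select", "name_required", or "keyword"
    names: list[str] = field(default_factory=list)     # exact names for select:
    required: list[str] = field(default_factory=list)  # +terms that must be in the name
    terms: list[str] = field(default_factory=list)     # remaining ranking terms


def parse_query(query: str) -> ParsedQuery:
    query = query.strip()
    prefix = "select:"
    if query.startswith(prefix):
        # Direct selection: comma-separated tool names, kept case-sensitive.
        names = [n.strip() for n in query[len(prefix):].split(",") if n.strip()]
        return ParsedQuery(mode="select", names=names)

    required: list[str] = []
    terms: list[str] = []
    for token in query.split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:].lower())
        else:
            terms.append(token.lower())
    mode = "name_required" if required else "keyword"
    return ParsedQuery(mode=mode, required=required, terms=terms)
```

With this sketch, `parse_query("+slack send")` produces a name-required query whose `required` list is `["slack"]` and whose ranking terms are `["send"]`.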
Interaction with MCP and Plugins
Tools registered by MCP servers and the plugin-system are included in the ToolSearch index alongside built-in tools. As the tool count grows (through MCP connections and plugin installations), the token savings from lazy loading become even more pronounced. Without ToolSearch, a session with 100+ MCP tools would consume a substantial fraction of the context window just for tool definitions.
Integration
ToolSearch interacts with the system-prompt-assembly system, which conditionally includes either full tool definitions or the lightweight index based on context consumption. The anti-distillation-defenses benefit from ToolSearch because the lazy-loading mechanism means that decoy tools are not always visible, making tool enumeration harder. The cache-economics system treats the tool definition section as a cache-stable prefix when using eager loading, improving prompt cache hit rates.
Key claims
- none yet
Relations
- none yet
Sources
src-20260409-0929a9552e6b