ToolSearch System

Description

The ToolSearch System implements lazy-loading tool discovery, which reduces context window consumption by replacing eager tool loading with on-demand retrieval. Instead of including full schema definitions for all available tools in every API request (approximately 77,000 tokens), ToolSearch maintains a lightweight index of tool names and brief descriptions that costs approximately 8,700 tokens, an 89% reduction. When the model identifies a relevant tool, the full schema is loaded on demand via BM25 similarity search, accessed through the tool_search_tool_regex_20251119 beta header.

Activation Threshold

ToolSearch activates at the 10% context window consumption threshold. During the initial portion of a conversation (before 10% of the context window is used), all tools are loaded eagerly in the traditional manner. Once the conversation grows past 10%, the system switches to lazy-loading mode, replacing full tool definitions with the compact index. This threshold balances two concerns: early in a conversation, having all tools immediately available improves first-response quality; later, conserving context space for actual conversation content becomes more important.
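
The switch described above can be sketched as a small decision function. This is a minimal illustration, not the actual implementation: the names ToolLoadingMode and select_loading_mode are hypothetical, and only the 10% threshold comes from the text.

```python
from enum import Enum

ACTIVATION_THRESHOLD = 0.10  # 10% of the context window, per the text above


class ToolLoadingMode(Enum):  # hypothetical name, for illustration only
    EAGER = "eager"  # full schemas for every tool
    LAZY = "lazy"    # compact index plus on-demand schema loading


def select_loading_mode(tokens_used: int, context_window: int) -> ToolLoadingMode:
    """Return the tool-loading mode for the current context consumption."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    fraction_used = tokens_used / context_window
    # Below the threshold all tools are loaded eagerly; at or above it,
    # the compact index replaces the full definitions.
    if fraction_used < ACTIVATION_THRESHOLD:
        return ToolLoadingMode.EAGER
    return ToolLoadingMode.LAZY
```

In this sketch the boundary case (exactly 10% consumed) falls into lazy mode, matching the ">= 10% context" row in the Token Economics table.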

Token Economics

| Mode | Token Cost | Tools Available |
| --- | --- | --- |
| Eager loading (< 10% context) | ~77,000 tokens | All tools with full schemas |
| Lazy loading (>= 10% context) | ~8,700 tokens | Lightweight index + on-demand schema loading |
| Savings | ~68,300 tokens | ~89% reduction in tool definition overhead |

These savings matter because tool definitions compete with conversation content for the same context window. In long sessions, especially those involving the compaction-pipeline, the roughly 68,300 tokens saved can mean the difference between maintaining useful context and hitting the compaction threshold prematurely.
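
The savings figures above follow directly from the two token costs. A short check, using only the numbers quoted in this section (the function name tool_overhead_savings is illustrative):

```python
EAGER_TOKENS = 77_000  # full schemas for all tools (approximate, from the text)
LAZY_TOKENS = 8_700    # lightweight index (approximate, from the text)


def tool_overhead_savings(eager: int, lazy: int) -> tuple[int, float]:
    """Return (tokens saved, fraction of the eager cost saved)."""
    saved = eager - lazy
    return saved, saved / eager


saved, fraction = tool_overhead_savings(EAGER_TOKENS, LAZY_TOKENS)
print(f"{saved:,} tokens saved ({fraction:.1%})")  # 68,300 tokens saved (88.7%)
```

The exact ratio is 88.7%, which the section rounds to 89%.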

BM25 Similarity Matching

When the model needs a tool in lazy-loading mode, it calls the ToolSearch tool with a natural language query. The system uses BM25 similarity scoring to rank tools against the query and returns the full schema definitions for the top matches. BM25 was chosen over embedding-based similarity because it requires no external model calls, adds negligible latency, and works well for the relatively small tool corpus (40+ tools).
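
To make the ranking step concrete, here is a minimal sketch of Okapi BM25 scoring over a name-plus-description index. It is an illustration under stated assumptions, not the system's actual code: the k1 and b values are common defaults rather than values from the source, the tokenizer is deliberately naive, and the NotebookEdit entry and all descriptions in the sample index are hypothetical.

```python
import math
import re
from collections import Counter

K1 = 1.5  # term-frequency saturation (common default, assumed)
B = 0.75  # length normalization (common default, assumed)


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, index: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank tools in `index` (name -> brief description) against `query`."""
    if not index:
        return []
    docs = {name: tokenize(f"{name} {desc}") for name, desc in index.items()}
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    # Document frequency: number of tools whose text contains each term.
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    query_terms = set(tokenize(query))

    scores: dict[str, float] = {}
    for name, tokens in docs.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf + K1 * (1 - B + B * len(tokens) / avg_len)
            score += idf * tf * (K1 + 1) / norm
        if score > 0:
            scores[name] = score
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical index entries; the descriptions are invented for the example.
TOOL_INDEX = {
    "Read": "Read a file from the local filesystem",
    "Edit": "Replace text in an existing file",
    "NotebookEdit": "Edit a cell in a Jupyter notebook",
}
matches = bm25_rank("notebook jupyter", TOOL_INDEX)
```

Because scoring needs only term counts over the index, it runs locally with no model call, which is the property the paragraph above cites as the reason for preferring BM25 on a small corpus.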

The search supports multiple query forms:
- Direct selection: select:Read,Edit,Grep fetches specific tools by name.
- Keyword search: notebook jupyter returns the best-ranked matches for those terms.
- Name-required search: +slack send requires "slack" in the tool name and ranks by the remaining terms.
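
The three query forms can be told apart with a small parser. The sketch below is one plausible reading of the syntax shown above; ParsedQuery and parse_query are hypothetical names, and details such as whitespace handling or multiple + terms are assumptions rather than documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:  # hypothetical structure, for illustration only
    mode: str  # "select", "required", or "keyword"
    names: list[str] = field(default_factory=list)     # exact tool names (select)
    required: list[str] = field(default_factory=list)  # terms that must be in the name
    terms: list[str] = field(default_factory=list)     # terms used for ranking


def parse_query(query: str) -> ParsedQuery:
    """Classify a ToolSearch query into one of the three forms."""
    query = query.strip()
    if query.startswith("select:"):
        raw_names = query[len("select:"):].split(",")
        names = [name.strip() for name in raw_names if name.strip()]
        return ParsedQuery(mode="select", names=names)

    words = query.split()
    # A leading "+" marks a term that must appear in the tool name.
    required = [w[1:].lower() for w in words if w.startswith("+") and len(w) > 1]
    terms = [w for w in words if not w.startswith("+")]
    if required:
        return ParsedQuery(mode="required", required=required, terms=terms)
    return ParsedQuery(mode="keyword", terms=terms)
```

For example, parse_query("+slack send") yields mode "required" with required=["slack"] and terms=["send"]; a caller would first filter the index to tools whose name contains "slack" and then rank that subset by the remaining terms.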

Interaction with MCP and Plugins

Tools registered by MCP servers and the plugin-system are included in the ToolSearch index alongside built-in tools. As the tool count grows (through MCP connections and plugin installations), the token savings from lazy loading become even more pronounced. Without ToolSearch, a session with 100+ MCP tools would consume a substantial fraction of the context window just for tool definitions.

Integration

ToolSearch interacts with the system-prompt-assembly system, which conditionally includes either full tool definitions or the lightweight index based on context consumption. The anti-distillation-defenses benefit from ToolSearch because the lazy-loading mechanism means that decoy tools are not always visible, making tool enumeration harder. The cache-economics system treats the tool definition section as a cache-stable prefix when using eager loading, improving prompt cache hit rates.

Key claims

Relations

Sources

src-20260409-0929a9552e6b