Grep Over RAG
The defining architectural decision in Claude Code: abandoning RAG with Voyage embeddings in favor of agentic, grep-based search. Boris Cherny, Claude Code's creator, confirmed this on X in January 2026.
The Decision
Early Claude Code used a vector DB with Voyage embeddings and standard RAG. The team switched to agentic search — the model calling grep, glob, and ls iteratively — before the February 2025 launch.
Boris Cherny:
"Early versions of Claude Code used RAG + a local vector db, but we found pretty quickly that agentic search generally works better. It is also simpler and doesn't have the same issues around security, privacy, staleness, and reliability."
Four Structural Reasons
| Reason | RAG | Agentic Search |
|---|---|---|
| Accuracy | Cosine similarity introduces noise on identifiers | Exact string matching finds handleAuthError precisely |
| Security | Requires storing embeddings on external infrastructure | No external storage — reads files on demand from user's machine |
| Staleness | Index drifts out of sync with code changes | Always reads current file state |
| Preprocessing | Indexing step required before first use | None: claude . in a new repo works immediately |
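To make the accuracy and staleness rows concrete, here is a minimal Python sketch of an index-free search tool. It is not Claude Code's implementation; the function name `grep`, its parameters, and the default glob are assumptions chosen for illustration. It walks the tree at call time and applies an exact regex, so there is no index to build or to fall out of date.

```python
import re
from pathlib import Path


def grep(root: str, pattern: str, glob: str = "**/*.py") -> list[tuple[str, int, str]]:
    """Return (relative path, line number, stripped line) for every regex match under root.

    Hypothetical helper for illustration only; not part of any real tool's API.
    """
    regex = re.compile(pattern)
    root_path = Path(root)
    hits = []
    for path in sorted(root_path.glob(glob)):
        if not path.is_file():
            continue
        # Files are read at call time, so results reflect what is on disk right now.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((path.relative_to(root_path).as_posix(), lineno, line.strip()))
    return hits
```

A call such as `grep(".", r"\bhandleAuthError\b", glob="**/*.ts")` would match that identifier exactly and skip near misses like `handleAuthErrors`, which is the kind of distinction the table's accuracy row describes.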
The Broader Design Philosophy
The source architecture notes: "choose regex over embeddings for search, Markdown files over databases for memory."
This philosophy extends beyond code search to auto-memory: memory retrieval uses LLM reasoning over filenames rather than vector similarity. Claude calls ls() to list memory files, reasons about relevance, then reads selected files. The LLM outperforms opaque vector chunk matching for structured, human-readable files.
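The list, reason, read flow described above can be sketched roughly as follows. Everything here is an assumption for illustration: the function names, the `.md` layout, and especially `keyword_picker`, which is a crude keyword-overlap stand-in for the step where the model reasons about which filenames are relevant.

```python
from pathlib import Path
from typing import Callable


def list_memory_files(memory_dir: str) -> list[str]:
    """Equivalent of the ls step: return the memory filenames only, no contents."""
    return sorted(p.name for p in Path(memory_dir).glob("*.md"))


def keyword_picker(task: str, filenames: list[str]) -> list[str]:
    """Stand-in for LLM reasoning: keep files whose name shares a word with the task."""
    words = {w.lower() for w in task.split()}
    picked = []
    for name in filenames:
        stem_tokens = set(Path(name).stem.lower().replace("_", "-").split("-"))
        if words & stem_tokens:
            picked.append(name)
    return picked


def recall(
    memory_dir: str,
    task: str,
    pick: Callable[[str, list[str]], list[str]] = keyword_picker,
) -> dict[str, str]:
    """List filenames, let the picker choose, then read only the chosen files."""
    names = list_memory_files(memory_dir)
    chosen = pick(task, names)
    return {name: (Path(memory_dir) / name).read_text(encoding="utf-8") for name in chosen}
```

In a real agent the `pick` argument would be a model call that sees the task and the filename list. The point of the sketch is only that selection happens over readable names, not over embedded chunks.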
Competitive Implication
From the Aakash Gupta X post:
"The creator of a tool generating $1B+ in ARR is telling you that a model using basic Unix search commands beat a sophisticated retrieval pipeline. The RAG stack didn't lose on some edge case. It lost on the metric that actually matters for developer tools: does the output feel right."
The Formal Distinction
Claude Code did not abandon retrieval. It abandoned pre-indexed retrieval. Agentic search is retrieval — the model deciding what to retrieve, when, in what order, following clues iteratively rather than querying a pre-built index. It is RAG where the model controls the retrieval loop.
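A small Python sketch of what "the model controls the retrieval loop" could look like. This is an illustration under stated assumptions, not Claude Code's code: `run_search`, the in-memory `FILES`, the two toy tools, and `scripted_policy` (a hard-coded stand-in for the model's decisions) are all hypothetical.

```python
from typing import Callable, Optional

Action = tuple[str, str]            # (tool name, argument)
Step = tuple[str, str, object]      # (tool name, argument, tool result)

# Toy in-memory "repository" so the sketch is self-contained.
FILES = {
    "src/auth.ts": "export function handleAuthError(e) { logout(); }",
    "src/session.ts": "export function logout() { clearToken(); }",
}

TOOLS: dict[str, Callable[[str], object]] = {
    # grep: paths whose text contains the exact substring
    "grep": lambda needle: [path for path, text in FILES.items() if needle in text],
    # read: full contents of one file
    "read": lambda path: FILES[path],
}


def scripted_policy(transcript: list[Step]) -> Optional[Action]:
    """Stand-in for the model: pick the next tool call from what was seen so far."""
    if not transcript:
        return ("grep", "handleAuthError")
    last_tool, _, last_result = transcript[-1]
    if last_tool == "grep" and last_result:
        return ("read", last_result[0])          # open the first hit
    if last_tool == "read" and "logout()" in last_result and len(transcript) < 4:
        return ("grep", "function logout")       # follow the clue found in the file
    return None                                  # enough context gathered


def run_search(policy, tools, max_steps: int = 8) -> list[Step]:
    """Let the policy choose each retrieval step until it stops or the budget runs out."""
    transcript: list[Step] = []
    for _ in range(max_steps):
        action = policy(transcript)
        if action is None:
            break
        tool_name, arg = action
        transcript.append((tool_name, arg, tools[tool_name](arg)))
    return transcript
```

Each step depends on the result of the previous one: the second grep is issued only because reading the first file revealed a call to `logout()`. A pre-built index answers one query at a time and has no comparable feedback path.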
Key Claims
clm-20260409-285074076579: Claude Code abandoned RAG for agentic grep after testing both
Sources
src-20260409-cbf9b6837f5f: Round 10 (Quality Gap, CVE, Security)