Claude Code Leak Round 31 — VILA Paper, Proxy Token Injection, v2.1.116/117, Three-Stage Pipeline (2026-04-21 to 2026-04-23)
- Source ID: src-20260423-542f02260352
- Kind: analysis
- Scope: shared
- Origin: community-analysis/round-31
- Raw path: sources/raw/claude-code-leak-round-31-vila-paper-proxy-token-injection-v2-1-116-117-three-st__src-20260423-542f02260352.md
- Status: active
Tags
community-analysis leak-round round-31
Content
Claude Code Leak Analysis — Round 31: April 21–23, 2026
Executive Summary
This installment covers the six most important new threads emerging in the April 21–23 window: the first peer-reviewed academic paper on the leaked codebase (VILA Lab, MBZUAI), a proxy-verified server-side token injection at v2.1.100+, a newly confirmed git-status cache-bust destroying prompt caching, the v2.1.116/117 security and performance patches, the full three-stage message pipeline architecture from Southbridge Research, and the 20-days-post-leak community reassessment on Reddit. Together, these threads push the codebase understanding significantly deeper than any single prior installment, especially on the token economics and the precise layered architecture.
1. The VILA Lab Academic Paper: Most Authoritative Architecture Document to Date
On April 16, 2026, researchers from the VILA Lab at Mohamed bin Zayed University of Artificial Intelligence published "Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems," the first peer-reviewed academic analysis of the leaked v2.1.88 source.[^1] This is the most rigorous and comprehensive architectural document produced from the leak, combining direct source-level analysis with a comparative study against OpenClaw, an open-source agent system.[^1]
5 Values → 13 Principles → Implementation
The paper's central contribution is tracing the architecture from five human values through thirteen design principles to specific source files.[^1] This framework is not inferred from behavior — it is derived directly from source code, the principle hierarchy definition, and internal commentary:
| Value | Key Principles | Source Evidence |
|---|---|---|
| Human Decision Authority | Deny-first escalation, graduated trust spectrum | permissions.ts, conversationRecovery.ts |
| Safety, Security, Privacy | Defense in depth, reversibility-weighted risk | shouldUseSandbox.ts, yoloClassifier.ts |
| Reliable Execution | Context as scarce resource, append-only state | query.ts:365–453, sessionStorage.ts |
| Capability Amplification | Minimal scaffolding, composable extensibility | tools.ts, AgentTool.tsx |
| Contextual Adaptability | Transparent file-based memory, externalized policy | claudemd.ts, types/hooks.ts |
The paper also applies a sixth evaluative lens — long-term human capability preservation — which it identifies as conspicuously absent from Anthropic's stated design values.[^1] Anthropic's own internal study of 132 engineers and researchers documents a "paradox of supervision" where AI assistance risks atrophying the skills needed to supervise the AI, and independent research found that developers in AI-assisted conditions scored 17% lower on code comprehension tests.[^1]
The 98.4% / 1.6% Ratio — Now Formally Established
The paper formally establishes and documents the codebase ratio that community analysis had been estimating: 98.4% of the Claude Code codebase is deterministic operational infrastructure; 1.6% is AI decision logic.[^2] The agent loop itself is a simple while (true) cycle. All architectural complexity lives in the surrounding systems: the five-layer compaction pipeline, seven-mode permission system with ML classifier, 27-event hook pipeline, four-mechanism extensibility layer, and append-only session persistence.[^1]
The Seven-Component Architecture — Source-File Mapped
The paper provides the first source-file-mapped description of all seven components and their exact interaction topology:[^1]
- User — submits prompts, approves permissions
- Interfaces — Interactive CLI, Headless CLI (claude -p), Agent SDK, IDE/Desktop/Browser — all feed the same queryLoop() function in query.ts, with only the rendering layer varying
- Agent loop — queryLoop() async generator, query.ts
- Permission system — deny-first rule evaluation (permissions.ts), ML auto-classifier (yoloClassifier.ts), hook interception (types/hooks.ts)
- Tools — up to 54 built-in tools: 19 unconditional + 35 conditional on feature flags/user type, assembled by assembleToolPool() in tools.ts, merged with MCP-provided tools
- State & persistence — append-only JSONL session transcripts (sessionStorage.ts), global prompt history (history.ts), subagent sidechain files
- Execution environment — shell execution with optional sandboxing (shouldUseSandbox.ts), 42 tool subdirectories, MCP server connections across 8+ transport variants
The critical QueryEngine clarification the paper provides: the class is a conversation wrapper for non-interactive surfaces, not the execution engine itself. The actual shared code path is queryLoop() in query.ts, which both QueryEngine.submitMessage() and the interactive CLI call directly — QueryEngine is bypassed by the interactive CLI entirely.[^1]
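The shape of that shared path can be sketched as a plain async generator. This is a hypothetical reconstruction from the paper's description, not the leaked code: the event and turn types are illustrative assumptions, and the point is only the structure — an unconditional while (true) cycle driven by the presence or absence of tool calls.

```typescript
// Illustrative sketch of the queryLoop() shape described above.
// Type names (AgentEvent, Turn) and the callModel/runTool parameters
// are assumptions; the real query.ts is far richer.
type AgentEvent =
  | { kind: "assistant_text"; text: string }
  | { kind: "tool_use"; name: string }
  | { kind: "done" };

interface Turn { text: string; toolUses: { name: string; input: unknown }[] }

async function* queryLoop(
  callModel: (history: Turn[]) => Promise<Turn>,   // stands in for the streaming API call
  runTool: (name: string, input: unknown) => Promise<string>,
): AsyncGenerator<AgentEvent> {
  const history: Turn[] = [];
  while (true) {
    const turn = await callModel(history);
    history.push(turn);
    if (turn.text) yield { kind: "assistant_text", text: turn.text };
    if (turn.toolUses.length === 0) break;          // stop condition: no tool calls
    for (const use of turn.toolUses) {
      yield { kind: "tool_use", name: use.name };
      // In the real loop, the tool result is appended to history and
      // becomes the next "user" turn fed back to the model.
      await runTool(use.name, use.input);
    }
  }
  yield { kind: "done" };
}
```

On this reading, both the interactive CLI and QueryEngine.submitMessage() drive a generator like this, differing only in how they render the yielded events.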
Seven Safety Layers — Definitive Enumeration
The paper is the first analysis to formally enumerate all seven independent safety layers, each of which can independently block a request:[^1]
- Tool pre-filtering — blanket-denied tools removed from the model's view before any API call, in tools.ts
- Deny-first rule evaluation — deny rules override allow rules regardless of specificity, in permissions.ts
- Permission mode constraints — the active mode determines baseline handling for requests matching no explicit rule
- Auto-mode ML classifier — two-stage fast filter + chain-of-thought evaluation in yoloClassifier.ts
- Shell sandboxing — approved shell commands may still execute inside a restricted sandbox, in shouldUseSandbox.ts
- Non-restoration on resume — session-scoped permissions are deliberately not restored on resume or fork, in conversationRecovery.ts
- Hook-based interception — PreToolUse and PermissionRequest hooks, in types/hooks.ts
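Layers 2 and 3 compose in a specific order that the list above implies: deny rules are checked first, allow rules second, and the mode default applies only when nothing matches. A minimal sketch, assuming a simplified rule shape (the real permissions.ts format is richer):

```typescript
// Deny-first evaluation sketch. Rule and Verdict shapes are illustrative
// assumptions, not the leaked definitions.
type Verdict = "allow" | "deny" | "ask";
interface Rule { pattern: RegExp; effect: "allow" | "deny" }

function evaluatePermission(rules: Rule[], toolCall: string, modeDefault: Verdict): Verdict {
  // A matching deny rule wins regardless of how specific any allow rule is.
  if (rules.some(r => r.effect === "deny" && r.pattern.test(toolCall))) return "deny";
  if (rules.some(r => r.effect === "allow" && r.pattern.test(toolCall))) return "allow";
  // No rule matched: the active permission mode supplies the baseline verdict.
  return modeDefault;
}
```

The key property: a broad allow rule (say, all Bash commands) can never shadow a narrow deny rule, because the deny scan runs to completion before any allow rule is consulted.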
The paper notes that this is a "defense in depth" architecture specifically so that a compromised or adversarially manipulated model cannot override sandboxing through its reasoning — the model's only interface to the outside world is the tool_use structured protocol, which the harness validates before execution.[^1]
Context as Bottleneck: Beyond the Five-Layer Pipeline
The five-layer compaction pipeline (Budget Reduction → Snip → Microcompact → Context Collapse → Auto-Compact) at query.ts:365–453 is now widely known.[^1] The paper documents five additional context-conservation mechanisms that exist beyond the pipeline:[^1]
- CLAUDE.md lazy loading — base hierarchy loaded at session start, but nested directory files and conditional rules loaded only when the agent reads files in those directories
- Deferred tool schemas (ToolSearch) — tools start as name-only placeholders; full schemas injected only when the model requests them, reducing initial token consumption by ~85%[^3]
- Subagent summary-only return — subagents return only summary text to the parent, never their full conversation history
- Per-tool-result budget — individual tool results capped at a configurable size
- Subagent sidechain transcripts — each subagent's full conversation stored in a separate .jsonl file at sessionStorage.ts:247, preventing subagent content from inflating parent context
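The deferred-schema mechanism in the list above is simple to model: the tool pool initially exposes names only, and a full schema is attached to a tool definition only when the model asks for it. A sketch under that assumption (the ToolDef shape and function names are ours, not the leaked code's):

```typescript
// Deferred tool schemas: placeholders first, full JSON schemas on demand.
// ToolDef, initialToolList, and expandSchema are illustrative names.
interface ToolDef { name: string; schema?: object }

function initialToolList(all: Map<string, object>): ToolDef[] {
  // First API call carries only names — no schemas yet.
  return [...all.keys()].map(name => ({ name }));
}

function expandSchema(tools: ToolDef[], requested: string, all: Map<string, object>): ToolDef[] {
  // Inject the full schema only for the tool the model actually requested.
  return tools.map(t => (t.name === requested ? { ...t, schema: all.get(requested) } : t));
}
```

Since full JSON schemas dominate the token weight of a tool definition, shipping placeholders first is where the reported ~85% reduction in initial tool-token consumption would come from.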
Open Design Questions the Paper Formally Identifies
The VILA Lab paper is the first analysis to frame six structured open questions for future agent systems, derived from the codebase gaps:[^1]
- Observability-evaluation gap — 78% of AI failures are invisible in current tooling
- Cross-session memory substrate — CLAUDE.md file-based memory does not scale beyond individual projects
- Harness boundary evolution — where/when/what/with whom the harness boundary should expand
- Horizon scaling — adapting the session-bound architecture to scientific program-length work
- Governance interfaces — EU AI Act compliance mechanisms not present in current architecture
- Long-term human capability preservation — the sixth evaluative lens, where the architecture has the least coverage
2. The v2.1.100 Phantom Token Injection: Server-Side, Proxy-Verified
The most practically significant billing discovery since the extractMemories doubler (Round 18) emerged through proxy-layer investigation of v2.1.100 and v2.1.101. A developer routed identical API calls through an HTTP proxy across three versions:[^4]
| Version | Content-Length (bytes) | cache_creation_input_tokens | Total |
|---|---|---|---|
| v2.1.98 | 169,514 | 49,726 | 49,726 |
| v2.1.100 | 168,536 (−978 B) | 69,922 (+20,196) | 69,922 |
| v2.1.101 | 171,903 | ~72,000 | ~72,000 |
v2.1.100 sends 978 fewer bytes than v2.1.98 but is billed 20,196 more tokens.[^4] This was confirmed across 40+ sessions, with a clean bimodal distribution: one cluster around 50K tokens (pre-v2.1.100), one around 71K (v2.1.100+).[^5] The extra tokens are classified as cache_creation_input_tokens, meaning they enter the model's actual context window and compete with user instructions, CLAUDE.md content, and conversation history for the effective context budget.[^5]
The cause remains unconfirmed by Anthropic. Community speculation points to expanded session memory features in v2.1.100 (summary injection or additional tool schema expansion), expanded safety classifier context, or a server-side routing change tied to the User-Agent version string.[^4] The issue remains open as of April 23 (GitHub #46917).[^4] The workaround confirmed by multiple users: downgrade via npx claude-code@2.1.98 or use npx @anthropic-ai/claude-code for the latest version that avoids the issue.[^6]
The context window impact is the less-discussed consequence: 20K invisible tokens consumed before any user content means approximately 50 pages of code that could have been in context are displaced.[^5] For users who have carefully crafted CLAUDE.md instructions, those instructions are now diluted by unknown server-side content that cannot be audited through any currently available tool.
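As a sanity check, the table's headline deltas reduce to simple arithmetic on the reported captures — fewer bytes on the wire, more tokens billed, which is what rules out a client-side explanation:

```typescript
// Re-deriving the v2.1.98 → v2.1.100 deltas from the proxy capture above.
const v98 = { contentLengthBytes: 169_514, cacheCreationTokens: 49_726 };
const v100 = { contentLengthBytes: 168_536, cacheCreationTokens: 69_922 };

const byteDelta = v100.contentLengthBytes - v98.contentLengthBytes;     // negative: fewer bytes sent
const tokenDelta = v100.cacheCreationTokens - v98.cacheCreationTokens;  // positive: more tokens billed
```

byteDelta works out to −978 and tokenDelta to +20,196 — the client sent slightly less while being billed substantially more, so the injection must happen server-side.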
3. The Git-Status Cache-Bust: The Largest Unpatched Cost Bug
Independent of the v2.1.100 phantom token issue, a separate cache invalidation bug was discovered and documented on HackerNews (April 12) and filed as GitHub issue #47098.[^7] The finding: every git commit invalidates Claude Code's prompt cache entirely, because git status is embedded in the second cache block ({system-prompt | ~/.claude/claude.md | git-status}).[^8]
The cache structure has three blocks:
```
Block 1: {tools | claude-version}
Block 2: {system-prompt | ~/.claude/claude.md | git-status}   ← busted by every commit
Block 3: {skills | ./claude.md | user-prompt}
```
When git status changes (which it does on every commit, every staged file, every checkout), block 2 is invalidated, which cascades to block 3. The result: new sessions after any git activity start with a completely cold cache, costing full cache_creation_input_tokens for the entire system prompt and CLAUDE.md.[^9] For a typical project context of 15–20K tokens, this adds $0.0015–$0.002 per session cold start at Opus rates — small in isolation, but a major driver of the March 23 rate acceleration.[^10]
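The cascade follows from prompt caching being prefix-based: the cache is valid only up to the first block whose content changed, so a block-2 change necessarily invalidates block 3 as well. A minimal model (the block strings are stand-ins for the real assembled content):

```typescript
// Prefix cache validity: count how many leading blocks are unchanged.
// Everything from the first changed block onward must be rewritten.
function validPrefixBlocks(cached: string[], current: string[]): number {
  let n = 0;
  while (n < cached.length && n < current.length && cached[n] === current[n]) n += 1;
  return n; // blocks [0, n) still hit the cache
}
```

With the three-block layout above, a git-status change in block 2 leaves only block 1 cache-valid, even though block 3 is byte-identical.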
Workaround (community-verified): CLAUDE_CODE_DISABLE_GIT_INSTRUCTIONS=1 claude "Hello" before starting work.[^8] This pre-warms the cache with skills and CLAUDE.md cached before any git activity, then subsequent calls in the same session hit the cache normally. Alternatively, includeGitInstructions: false in settings.json permanently removes git status from the cache block.[^9] The "Hello" priming call costs ~6K cache_write tokens once, then delivers 10–16K cache_read savings on every subsequent call.[^8]
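The economics of the priming call can be roughed out using Anthropic's published prompt-caching multipliers (cache writes bill at 1.25x the base input rate, cache reads at 0.1x) together with the community figures above. All numbers are estimates in base-rate token units; the multipliers are the one assumption imported from outside this document:

```typescript
// Rough break-even for the "Hello" priming call.
const primeCost = 6_000 * 1.25;                 // one-time cache_write for the priming call
const perCallSaving = (13_000 * 9) / 10;        // midpoint of 10-16K tokens read at 0.1x instead of re-sent at 1x

const callsToBreakEven = Math.ceil(primeCost / perCallSaving);
```

Under these assumptions the one-time cost (~7,500 units) is recovered by the savings of the very first subsequent call (~11,700 units), so the workaround pays for itself immediately.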
4. v2.1.115/116/117: Three Critical Patches in Five Days
Three releases between April 17–21 addressed previously undocumented security and correctness bugs:[^11][^12][^13]
v2.1.115/116: Sandbox rm/rmdir Dangerous-Path Bypass — Patched
The most consequential security fix: sandbox auto-allow was bypassing the dangerous-path check for rm and rmdir.[^14] This means that when sandbox auto-allow mode was active (the default in many team setups), rm and rmdir could silently target /home, the user's home directory, or other critical system paths without triggering a permission prompt. This is directly related to the long-standing rm -rf issues documented as far back as GitHub issue #6608 (August 2025).[^15] The fix in v2.1.116 ensures rm/rmdir targeting critical paths always trigger a permission prompt regardless of sandbox mode.[^14]
v2.1.116: /resume 67% Faster on Large Sessions
/resume on sessions 40 MB and above is up to 67% faster.[^14] The underlying cause was that dead fork entries — abandoned fork branches from multi-agent sessions — were being replayed in full on every resume. With many dead-fork entries (common in heavy agent-teams usage), resume was O(n) in dead entries.[^12] The fix indexes live entries only. Related: the thinking-signature resume problem documented in GitHub #42260 (25% of resume payload being invisible encrypted thinking signatures) remains open and unaddressed.[^16]
v2.1.117: CLAUDE_CODE_FORK_SUBAGENT=1 for External Builds
CLAUDE_CODE_FORK_SUBAGENT=1 enables forked subagents on external builds.[^17] Previously, the fork-subagent feature (which duplicates the parent's entire conversational context into child agents for parallel exploration of different solution trajectories) was production-internal only.[^18] The env var gates it for third-party integrations. Agent frontmatter mcpServers are now also loaded for main-thread agent sessions via --agent.[^13]
v2.1.117: Opus 4.7 Context Window Fix
A significant correctness bug was patched: Opus 4.7 sessions were computing context percentages against a 200K window instead of the native 1M, causing autocompact to fire when sessions were only 20% full.[^13] This was a direct consequence of the 200K pricing boundary (above which premium pricing applies) being reused as the context window ceiling in an internal constant — the same confusion documented in GitHub #23432.[^19] The fix correctly uses Opus 4.7's 1M window for percentage calculations and autocompact threshold.
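The bug class here is worth making concrete: a pricing boundary reused as a capacity constant. A sketch of the before/after behavior, with constant names of our own invention:

```typescript
// Illustrative reconstruction of the v2.1.117 fix: the 200K premium-pricing
// boundary must not double as the context-window ceiling.
const PREMIUM_PRICING_BOUNDARY = 200_000;  // tokens above this bill at premium rates
const OPUS_4_7_CONTEXT_WINDOW = 1_000_000; // the model's native window

function contextPercent(usedTokens: number, windowTokens: number): number {
  return (usedTokens / windowTokens) * 100;
}

// Buggy: a 200K session measured against the pricing boundary reads as
// 100% full and trips autocompact...
const buggyPercent = contextPercent(200_000, PREMIUM_PRICING_BOUNDARY);
// ...while against the native window the same session is only 20% full.
const fixedPercent = contextPercent(200_000, OPUS_4_7_CONTEXT_WINDOW);
```

This is the "20% full" symptom in the report: sessions one-fifth of the way through the real window were being compacted as if exhausted.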
5. Southbridge Research: Three-Stage Message Pipeline — Primary Architecture
Southbridge Research published a detailed data structures analysis focusing on the three-stage message transformation pipeline that underlies all of Claude Code's streaming behavior.[^20] This is distinct from the VILA Lab paper's control-flow analysis — it documents the data representation layer.
The Three-Representation Message System
The core innovation is that Claude Code maintains three simultaneous representations of every message:[^20]
```typescript
// Stage 1: CLI Internal Representation
interface CliMessage {
  type: "user" | "assistant" | "attachment" | "progress"
  uuid: string                      // CLI-specific tracking
  timestamp: string
  message?: APICompatibleMessage    // only for user/assistant
  attachment?: AttachmentContent    // only for attachment
  progress?: ProgressUpdate         // only for progress
}

// Stage 2: API Wire Format
interface APIMessage {
  role: "user" | "assistant"
  content: string | ContentBlock[]
  // no CLI-specific fields — clean API contract
}

// Stage 3: Streaming Accumulator
interface StreamAccumulator {
  partial: Partial<APIMessage>
  deltas: ContentBlockDelta[]
  buffers: Map<string, string>      // tool_use_id → accumulating JSON
}
```
The separation between Stage 1 and Stage 2 is architecturally critical: CliMessage carries UUID tracking, timestamps, and progress state that the API must never see. The APICompatibleMessage field within CliMessage holds only the clean API-format content. This is what allows Claude Code to update progress indicators and render partial results while maintaining a correct, minimal API payload.[^20]
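A minimal sketch of that boundary, with content simplified to plain strings and the conversion function an assumption of ours (the leaked code's actual mapping is richer):

```typescript
// Stage 1 → Stage 2: CLI-only entries (progress, attachments) never reach
// the wire, and for user/assistant entries only the embedded API-compatible
// payload survives. Shapes are simplified from the interfaces above.
interface APIMessage { role: "user" | "assistant"; content: string }
interface CliMessage {
  type: "user" | "assistant" | "attachment" | "progress";
  uuid: string;        // CLI-specific tracking — must not leak to the API
  timestamp: string;
  message?: APIMessage; // present only for user/assistant entries
}

function toWireFormat(log: CliMessage[]): APIMessage[] {
  return log.flatMap(m => (m.message ? [m.message] : []));
}
```

The discipline enforced by the type split is visible here: there is no code path by which uuid, timestamp, or progress state can end up in the API payload, because APIMessage simply has no field to hold them.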
ContentBlock Polymorphism: 9 Types, Platform-Specific
The ContentBlock discriminated union has 9 variants, several of which are platform-specific and not documented in the public API reference:[^20]
- TextBlock, ImageBlock, ToolUseBlock, ToolResultBlock, ThinkingBlock — standard
- DocumentBlock, VideoBlock — platform-specific extensions
- GuardContentBlock — the safety classifier's content annotation type
- ReasoningBlock, CachePointBlock — the prompt cache insertion point type
CachePointBlock is the most operationally significant: it marks where the prompt cache boundary is inserted in the assembled context. Its position in the content array determines which prefix is cached, which is why block ordering (tools → system prompt → git status → CLAUDE.md → user content) governs the cache invalidation hierarchy.[^20]
6. Proxy-Layer Investigations: What Claude Code Actually Sends
Multiple independent developers have now published complete proxy capture analyses, providing primary-source evidence of the actual API payload structure.[^21][^22][^23]
The Justacuriousengineer Analysis (March 30, 2026)
A Substack deep-dive intercepted and published the full request structure for a session with agents, rules, skills, and CLAUDE.md.[^21] Key findings:
- The assembled request has a fixed structure: deferred tool list → skill list → CLAUDE.md → project rules → memories → user message
- Skills use lazy tool invocation: the first API call carries only skill names; when Claude invokes the Skill tool, it receives the full skill markdown as a tool result that then persists in conversation context for the remainder of the session[^21]
- This means skills are not pre-loaded — they cost tokens only when invoked, but once invoked, their full text remains in context permanently for the session
- CLAUDE.md content appears in a <system-reminder> block at the start of the first user message — not in the API system field[^3]
The Kangraemin MITM Analysis (March 18, 2026)
A macOS desktop app ("Claude Inspector") was built to intercept real-time traffic.[^22] Findings that contradict or refine common assumptions:
- CLAUDE.md is sent on every single request, not just the first — prepended as a system-reminder block on every turn's user message[^22]
- MCP tools use lazy-loaded schemas: built-in tools ship with full JSON schemas on every request; MCP tools start as name-only placeholders, schemas injected only on first use[^22]
- Screenshot cost: a single screenshot adds hundreds of kilobytes to the request payload via base64 encoding[^22]
- Skills vs. commands are handled fundamentally differently: local commands (/clear, /mcp) result in only the output reaching the model; skills (/commit, invoked skills) inject full prompt text that persists for the rest of the session[^22]
The JSONL Token Undercount: Every Community Tool Is Working with Wrong Numbers
A preliminary but methodologically rigorous finding from Gille AI (February 23, GitHub #28197): every tool that reads JSONL session logs for token accounting is working with bad data.[^24] The root cause: Claude Code writes JSONL entries during streaming, when input token counts haven't been finalized. The usage.input_tokens field is set to a streaming placeholder value (0 or 1) and never updated after the request completes.
The measurement across two full days of Opus 4.6 usage:
- 75% of all JSONL entries have usage.input_tokens of 0 or 1 (placeholder values)
- Input tokens undercounted by 100–174×
- Output tokens undercounted by 10–17×
- Tools affected: ccusage, ccaudit, claude-code-log, all JSONL-based auditors[^24]
The accurate source is the statusbar context JSON (piped to statusline scripts on every status update), which maintains cumulative totals from finalized API responses. This is completely separate from the JSONL log path — same process, same API calls, entirely different recording mechanisms.[^24]
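The placeholder pattern described in the issue is easy to detect in any session log. A sketch of such a scan, assuming the usage.input_tokens field path from the report (the scanner itself is ours, not a community tool):

```typescript
// Scan JSONL lines for streaming-placeholder usage entries: any entry whose
// usage.input_tokens is 0 or 1 was written before the request finalized and
// cannot be trusted for token accounting.
interface UsageEntry { usage?: { input_tokens?: number } }

function placeholderShare(lines: string[]): number {
  const entries = lines
    .map(l => { try { return JSON.parse(l) as UsageEntry; } catch { return null; } })
    .filter((e): e is UsageEntry => e !== null && e.usage !== undefined);
  if (entries.length === 0) return 0;
  const placeholders = entries.filter(e => (e.usage!.input_tokens ?? 0) <= 1).length;
  return placeholders / entries.length;
}
```

Against the Gille AI measurement, a scan like this would report roughly 0.75 — three in four entries carrying placeholder counts.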
7. The 20-Day Retrospective: Community Reassessment
On April 20, r/LocalLLaMA published a community retrospective thread: "20 days post-Claude Code leak: Did the accidental 'open sourcing' actually change anything?"[^25] The top comment captured the community's settled view succinctly:
"What struck me most about the leak wasn't the actual code, but rather the realization of how much of the impressive functionality comes down to orchestration."
The thread's consensus across 847 comments:[^25]
- The harness is the product. The model is powerful but generic. The 98.4% operational infrastructure — permission gates, context management, streaming execution, recovery logic — is what makes Claude Code genuinely different from curl | model.
- The architectural patterns are now widely borrowed but not widely understood. Fork-subagent, sidechain transcripts, deny-first layered permissions, and the five-layer compaction pipeline are being replicated, but without the design reasoning behind them, implementations tend to get the surface pattern right and the invariants wrong.
- Anthropic's DMCA campaign is failing silently. The community reports that the GitHub mirrors have collectively accumulated 370,000+ stars and forks across surviving repos, with new mirrors appearing faster than takedowns can process them. Archive.org snapshots and torrent seedings are now the canonical preservation mechanism.
- Nothing changed for users. No pricing adjustment, no transparency improvements, no acknowledgment of the phantom token issue or the git cache-bust. The tool continues to improve rapidly in features while billing transparency remains zero.
8. Thinking Signature Resume Tax: Still Unresolved
GitHub issue #42260 (filed April 1) remains open with no Anthropic response as of April 23.[^16] The core finding: when resuming a long conversation, thinking block signatures from prior turns are replayed as input tokens, accounting for ~25% of the entire resume payload despite being invisible to users.
From a documented 24-hour brainstorming session (480 messages, 33 turns):
| Component | Est. tokens |
|---|---|
| User text messages | ~15,500 |
| Assistant text | ~39,500 |
| Tool results | ~52,900 |
| Tool use inputs | ~9,000 |
| Thinking signatures | ~38,800 |
| Total on resume | ~156,000 |
The 54 thinking blocks across 33 turns average 3,835 characters per signature, with a maximum of 13,184 characters.[^16] The thinking fields are empty strings (stripped for local storage) — only the encrypted signature field survives, but that signature must be replayed to the API for extended thinking to function correctly. This makes long extended-thinking sessions increasingly expensive to resume, precisely in the workloads where resume is most needed.[^16]
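The ~25% figure follows directly from the table: the component estimates sum to ~155.7K (reported as ~156,000), of which signatures are just under a quarter:

```typescript
// Re-deriving the signature share from the 24-hour session table above.
const resumeTokens = {
  userText: 15_500,
  assistantText: 39_500,
  toolResults: 52_900,
  toolUseInputs: 9_000,
  thinkingSignatures: 38_800,
};

const total = Object.values(resumeTokens).reduce((a, b) => a + b, 0);
const signatureSharePct = Math.round((resumeTokens.thinkingSignatures / total) * 100);
```

total comes to 155,700 and signatureSharePct to 25, consistent with the "~25% of the entire resume payload" claim.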
Combined with the v2.1.100 phantom tokens (+20K per request) and the git-status cache-bust (full cold start after any commit), a worst-case resumption of a long extended-thinking session costs approximately:

- ~156K tokens of replayed history (including ~39K of thinking signatures)
- +20K phantom tokens (v2.1.100+)
- +15–20K cold cache rebuild (post-commit)

That is ~191–196K tokens before the user types a single character in the resumed session.
9. New Tooling Ecosystem: Proxy and Audit Layer
The community has assembled a mature proxy and audit tooling layer around Claude Code, most of it emerging from the leak investigation.[^26][^27][^28]
| Tool | Type | What It Reveals |
|---|---|---|
| llm-interceptor (formerly claude-code-inspector) | MITM proxy, cross-platform | Full request/response bodies, system prompts, tool defs, streaming SSE |
| agent-super-spy | LLM proxy + HTTP MITM + LLMetry | Full traffic analysis including OpenTelemetry tracing |
| ccaudit (kmcheung12/ccaudit) | JSONL TUI explorer | Session/exchange/category token breakdown from local logs |
| ccusage (ryoppippi/ccusage) | JSONL CLI analyzer | Daily/monthly/session reports, 5-hour billing block tracking |
| claude-code-log (daaain/claude-code-log) | JSONL → HTML converter | Session navigation, message filtering, token tracking |
| cc-trace (Claude Plugin skill) | mitmproxy skill | Interactive setup guide for traffic capture within Claude Code itself |
| claude-code-proxy (seifghazi) | In-flight visualizer | Captures and visualizes requests as they happen |
Critical caveat: all tools using JSONL as their data source (ccaudit, ccusage, claude-code-log) are working with undercounted token data (100–174× undercount on inputs).[^24] Tools using proxy interception (llm-interceptor, agent-super-spy) capture actual billed values. For accurate billing analysis, proxy-layer tools are required.
10. Version Trajectory: v2.1.113–117 Feature Map
| Version | Date | Key Change |
|---|---|---|
| v2.1.113 | Apr 17 | Native binary spawn (per-platform optional dependency) instead of bundled JS; sandbox.network.deniedDomains |
| v2.1.114 | Apr 17 | Fix: crash in permission dialog when agent-teams teammate requested tool permission |
| v2.1.115 | Apr 18 | Fix: sandbox auto-allow bypassing dangerous-path check for rm/rmdir |
| v2.1.116 | Apr 20 | /resume 67% faster on 40MB+ sessions; handles dead-fork entries efficiently |
| v2.1.117 | Apr 21 | CLAUDE_CODE_FORK_SUBAGENT=1 external builds; agent frontmatter mcpServers for --agent; Opus 4.7 context window fix (1M not 200K); model selection persists across restarts |
The native binary spawn change in v2.1.113 is architecturally significant: Claude Code no longer runs bundled JavaScript directly but spawns a native binary (pre-compiled, per-platform).[^11] This aligns the public distribution with the Zig-layer attestation architecture exposed in the leak and makes shimming or republishing a modified CLI significantly harder without breaking attestation.[^29]
11. Open Questions as of April 23, 2026
The following threads have been raised but remain unresolved:
- What exactly are the 20K phantom tokens in v2.1.100+? Server-side, invisible, classified as cache_creation_input_tokens. No official explanation. GitHub #46917 open.[^4]
- Will the git-status cache-bust be fixed, or is it architectural? The fix requires either moving git status out of the second cache block or providing a stable git-status digest. No response from Anthropic on #47098.[^7]
- Why is thinking-signature resume accumulation not being addressed? GitHub #42260 filed April 1, 22 days open with no response.[^16]
- What are the 64 undocumented gated modules? The VILA Lab paper confirms 35 conditional built-in tools; the full 108-module gate catalogue documented in prior rounds has not been cross-validated against the paper's 54-tool count.
- What is the COORDINATOR_MODE flag? Referenced in the leak source, described in community analysis as a master coordinator for multi-agent parallel workers, but not documented in any official release notes.[^30]
- Is the long-term human capability degradation concern being tracked? The VILA Lab paper identifies this as a genuine risk, citing a 17% score reduction in comprehension tests for AI-assisted developers.[^1] No Anthropic response.
References
1. [PDF] Dive into Claude Code: The Design Space of Today's and Future AI ... - Deferred tool schemas: When ToolSearch is enabled, some tools include only their names in the initia...
2. VILA-Lab/Dive-into-Claude-Code - GitHub - ... Tool dispatch → Permission gate → Tool execution → Stop condition. Two execution paths: Streamin...
3. I intercepted Claude Code's API calls and broke down exactly what it ... - I intercepted Claude Code's API calls and broke down exactly what it sends — here's what I found · Y...
4. 20K tokens vs v2.1.98 — same payload, server-side · Issue #46917 - Claude Code versions 2.1.100 and 2.1.101 consume ~20,000 more cache_creation_input_tokens per reques...
5. Claude Code Silently Burns 40% More Tokens Since v2.1.100 - A developer used an HTTP proxy to capture full API requests across four Claude Code versions and fou...
6. Claude Code may be burning your limits with invisible tokens - Claude Code users report that version 2.1.100 silently injects approximately 20,000 invisible tokens...
7. Claude Code may be burning your limits with invisible tokens - Claude code caches a big chunk of context (all messages of current session). While a lot of data is ...
8. Tell HN: Claude-code prompt-cache workaround/fix - Hacker News
9. Why Claude Code Burns Through Tokens So Fast — 3 Causes and ... - Claude Code token usage drains several times faster since March 2026. Three converging factors and a...
10. Claude Code Changelog & Release Notes | Havoptic - Latest: v2.1.114 · Apr 17, 2026. 261 releases tracked. Every Claude Code update, feature, and versio...
11. Changelog - Claude Code Docs - Release notes for Claude Code, including new features, improvements, and bug fixes by version.
12. Claude Code v2.1.116 — Faster Resumes & Safety Fix - YouTube - resume is up to 67% faster on large sessions. Plus a critical sandbox safety fix that closes a gap i...
13. [BUG] Claude code ran rm -rf command without permission - Environment Platform (select one): Anthropic API AWS Bedrock Google Vertex AI Other: Claude CLI vers...
14. Resume of long sessions loads disproportionate tokens ... - GitHub - Summary. When resuming a long conversation, thinking block signatures from prior turns are replayed ...
15. anthropics/claude-code v2.1.117 on GitHub - NewReleases.io - New release anthropics/claude-code version v2.1.117 on GitHub.
16. Deep dive into the Claude Code source leak - The codebase has multiple compaction layers, including microcompact which strips stale tool calls (o...
17. [Bug] /context command reports 200K max for Claude Opus ... - Description The /context slash command reports a max context window of 200K tokens for claude-opus-4...
18. Data Structures & The Information Architecture | Southbridge.AI - Claude Code's data structures: the three-stage message pipeline and streaming JSON parsers that tran...
19. Intercepting API calls and breaking down Claude Code's prompt structure — here's what I learned. - Deferred tools, skills, CLAUDE.md — it's all just prompts.
20. I Built a MITM Proxy to See What Claude Code Actually ... - Ever wondered what Claude Code is actually sending to the Anthropic API behind the scenes? I did —...
21. Capture Claude Code with mitmproxy — step-by-step guide (with ... - A practical, repeatable workflow to intercept, save, and analyze Claude Code traffic using mitmproxy...
22. Claude Code's JSONL Logs Undercount Tokens by 100x - Every tool that reads Claude Code JSONL conversation logs for token accounting is working with bad d...
23. 20 days post-Claude Code leak: Did the accidental "open sourcing ... - Now that its been about 20 days since Claude code source code got leaked, what really came out of it...
24. claude-code-inspector 1.1.0 on PyPI - DEPRECATED: Use llm-interceptor instead. Intercept and analyze LLM traffic from AI coding tools
25. cc-trace - Claude Skills - Interactive assistant for intercepting, debugging, analyzing and reviewing Claude Code API requests ...
26. The great big Ai LLM thread. Github code, blogs & opinions ... - https://efficienist.com/claude-code-may-be-burning-your-limits-with-invisible-tokens-you-cant-see-or...
27. Claude Code Source Leak: A Timeline - by Darko - Kilo Blog - A factual roundup of the incident.
28. Inside Claude Code: Leaked Source Analysis | Articles - O-mega.ai - Anthropic's leaked 512k-line Claude Code source reveals how production AI agents work with thin loop...