Custom Shell Parser
- Entity ID: ent-20260410-2af58a6c258c
- Type: service
- Scope: shared
- Status: active
- Aliases: shell-parser
Description
Claude Code contains a multi-layered bash parsing system. The primary parser is a pure-TypeScript bash parser in utils/bash/bashParser.ts that produces tree-sitter-bash-compatible ASTs. This parser was validated against a 3,449-input golden corpus generated from the WASM parser and includes a 50ms wall-clock timeout and a 50,000-node budget cap to bail out on pathological or adversarial input. On top of it, the AST-based security analyzer in utils/bash/ast.ts walks the tree with an explicit allowlist of node types to extract trustworthy argv[] arrays for permission matching, using a fail-closed design where any unrecognized node type causes the command to be classified as "too-complex" and routed to the user for approval.
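The fail-closed walk described above can be sketched as follows. This is a minimal illustration, not the code in ast.ts: the SketchNode shape, the ExtractResult type, the extractArgv() name, and the three-entry allowlist are all assumptions made for the example. Only the "too-complex" classification and the allowlist idea come from the description.

```typescript
// Hypothetical, simplified node shape; real tree-sitter-bash nodes carry more fields.
interface SketchNode {
  type: string;
  text: string;
  children: SketchNode[];
}

type ExtractResult =
  | { kind: "simple"; argv: string[] }
  | { kind: "too-complex"; reason: string };

// Illustrative allowlist only; the real list in ast.ts is not reproduced here.
const ALLOWED_TYPES = new Set<string>(["command", "command_name", "word"]);

// Walks the tree depth-first, left to right. The first node whose type is not
// allowlisted aborts the whole extraction instead of being skipped or guessed at.
function extractArgv(root: SketchNode): ExtractResult {
  const argv: string[] = [];
  const stack: SketchNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (node === undefined) break;
    if (!ALLOWED_TYPES.has(node.type)) {
      return { kind: "too-complex", reason: `unrecognized node type: ${node.type}` };
    }
    if (node.type === "word") argv.push(node.text);
    // Push children in reverse so they are popped in source order.
    for (const child of [...node.children].reverse()) stack.push(child);
  }
  return { kind: "simple", argv };
}

const word = (text: string): SketchNode => ({ type: "word", text, children: [] });

// "ls -la": every node type is allowlisted, so argv is extracted.
const plain: SketchNode = {
  type: "command",
  text: "ls -la",
  children: [{ type: "command_name", text: "ls", children: [word("ls")] }, word("-la")],
};

// A node type missing from the allowlist ("subshell" here) fails the whole command.
const nested: SketchNode = {
  type: "command",
  text: "echo (id)",
  children: [word("echo"), { type: "subshell", text: "(id)", children: [word("id")] }],
};

const plainResult = extractArgv(plain);
const nestedResult = extractArgv(nested);
```

The point of the design is that a new or unexpected grammar construct degrades to a permission prompt rather than to a silently wrong argv.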
The pure-TS parser in bashParser.ts is a character-by-character lexer that tracks both JS string indices and UTF-8 byte offsets (for tree-sitter position compatibility). It handles the full range of bash syntax: single/double/ANSI-C quoting, heredocs (including tab-stripping <<-), command substitution ($()), process substitution (<() / >()), parameter expansion (${}), arithmetic expansion ($(())), backtick substitution, pipelines, list operators (&&, ||), and redirections. The lexer is context-sensitive: [ is treated as an operator in command position (test command) but as a word character in argument position (glob/subscript).
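The dual position tracking can be illustrated with a small sketch. The names utf8Length(), CharPosition, and offsets() are hypothetical and do not come from bashParser.ts; the sketch only shows why the two counters drift apart once the input contains non-ASCII characters.

```typescript
// Number of bytes a Unicode code point occupies when encoded as UTF-8.
function utf8Length(codePoint: number): number {
  if (codePoint < 0x80) return 1;
  if (codePoint < 0x800) return 2;
  if (codePoint < 0x10000) return 3;
  return 4;
}

// Hypothetical record pairing both coordinate systems for one character.
interface CharPosition {
  ch: string;
  index: number; // JS string index (UTF-16 code units)
  byte: number;  // UTF-8 byte offset, the unit tree-sitter positions use
}

// Scans the input one code point at a time, advancing both counters together.
function offsets(src: string): CharPosition[] {
  const out: CharPosition[] = [];
  let index = 0;
  let byte = 0;
  while (index < src.length) {
    const cp = src.codePointAt(index);
    if (cp === undefined) break;
    const ch = String.fromCodePoint(cp);
    out.push({ ch, index, byte });
    index += ch.length;     // 1 code unit, or 2 for characters outside the BMP
    byte += utf8Length(cp); // 1 to 4 bytes
  }
  return out;
}

// "a", an emoji outside the BMP, then "b": the counters diverge after the emoji.
const positions = offsets("a\u{1F600}b");
```

Here "b" sits at string index 3 but byte offset 5, because the emoji takes two UTF-16 code units and four UTF-8 bytes.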
Separately, utils/bash/shellQuote.ts wraps the shell-quote NPM library, exposing tryParseShellCommand() with critical security hardening. hasMalformedTokens() detects when shell-quote misinterprets commands containing ambiguous patterns (such as JSON-like strings with semicolons), which closes the command injection reported in HackerOne #3482049. hasShellQuoteSingleQuoteBug() detects a specific differential between shell-quote and bash in the handling of backslashes inside single quotes, where '\' <payload> '\' could hide a payload from security checks. The AST-based tree-sitter approach in ast.ts was built to replace these fragile differential-detection patches.
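The single-quote differential is easier to see with a toy scanner. The sketch below is not the logic of hasShellQuoteSingleQuoteBug(), whose body the source does not show; unquotedText() and hasSingleQuoteDifferential() are hypothetical names, and the scanner deliberately ignores double quotes and every other bash construct. It reads a command twice, once with bash's rule (a backslash inside single quotes is literal) and once with the mistaken rule (a backslash escapes the next character), and compares which text each reading considers to be outside the quotes.

```typescript
// Returns the text that lies outside single quotes.
// backslashEscapesInQuotes = false models bash; true models the mistaken reading.
function unquotedText(cmd: string, backslashEscapesInQuotes: boolean): string {
  let out = "";
  let inQuote = false;
  for (let i = 0; i < cmd.length; i++) {
    const ch = cmd.charAt(i);
    if (inQuote) {
      if (ch === "\\" && backslashEscapesInQuotes) {
        i++; // mistaken reading: skip the "escaped" character, even a quote
        continue;
      }
      if (ch === "'") inQuote = false;
      continue;
    }
    if (ch === "\\") {
      // Outside quotes a backslash escapes the next character in both readings.
      out += cmd.slice(i, i + 2);
      i++;
      continue;
    }
    if (ch === "'") {
      inQuote = true;
      continue;
    }
    out += ch;
  }
  return out;
}

// A command is suspicious when the two readings disagree about what is unquoted.
function hasSingleQuoteDifferential(cmd: string): boolean {
  return unquotedText(cmd, false) !== unquotedText(cmd, true);
}

// Bash sees three tokens here: '\', the unquoted payload, and '\' again.
const attack = "'\\' ; rm -rf / '\\'";
const bashView = unquotedText(attack, false);
const buggyView = unquotedText(attack, true);
const flagged = hasSingleQuoteDifferential(attack);
const benign = hasSingleQuoteDifferential("echo 'hello world'");
```

Under the bash reading the payload ` ; rm -rf / ` is unquoted and would execute, while the mistaken reading places it inside a quoted string, which is how a payload could slip past checks that trust the second reading.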
Key claims
- clm-20260410-e1: The pure-TypeScript bash parser produces tree-sitter-bash-compatible ASTs and was validated against a 3,449-input golden corpus. It has a 50ms timeout and 50,000-node budget to prevent DoS. Evidence: bashParser.ts lines 1-10 -- "Validated against a 3449-input golden corpus generated from the WASM parser"; lines 28-32 -- PARSE_TIMEOUT_MS = 50, MAX_NODES = 50_000.
- clm-20260410-e2: The AST-based security analyzer uses a fail-closed allowlist design: any node type not explicitly allowlisted causes the entire command to be classified as "too-complex", routing it through the permission prompt flow. Evidence: ast.ts lines 1-19 -- "The key design property is FAIL-CLOSED: we never interpret structure we don't understand. If tree-sitter produces a node we haven't explicitly allowlisted, we refuse to extract argv."
- clm-20260410-e3: The shell-quote wrapper includes patches for two specific security vulnerabilities: HackerOne #3482049 (command injection via ambiguous JSON-like strings causing shell-quote to treat ; as an operator) and a single-quote backslash differential (where shell-quote incorrectly treats \ as an escape inside single quotes while bash treats it as literal). Evidence: shellQuote.ts lines 116-117 -- "Security: This prevents command injection via HackerOne #3482049"; lines 179-189 -- hasShellQuoteSingleQuoteBug description.
- clm-20260410-e4: The parser uses placeholder strings (__CMDSUB_OUTPUT__ for command substitutions, __TRACKED_VAR__ for tracked variable expansions) in argv arrays, allowing the outer command to be permission-checked while acknowledging that inner substitution values are runtime-determined. Evidence: ast.ts lines 67-74 -- CMDSUB_PLACEHOLDER = '__CMDSUB_OUTPUT__', VAR_PLACEHOLDER = '__TRACKED_VAR__'.
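The placeholder mechanism in clm-20260410-e4 can be sketched as below. The two constants and their values are quoted from the evidence; everything else is an assumption for illustration: the Arg type, the toArgv() name, and in particular the notion that a "tracked" variable is one whose name appears in a known set, and that an untracked variable fails closed. The source does not spell out either rule.

```typescript
// Constant names and values as quoted from ast.ts in the evidence above.
const CMDSUB_PLACEHOLDER = "__CMDSUB_OUTPUT__";
const VAR_PLACEHOLDER = "__TRACKED_VAR__";

// Hypothetical, simplified argument nodes.
type Arg =
  | { type: "word"; text: string }
  | { type: "command_substitution"; inner: string }
  | { type: "simple_expansion"; name: string };

// Builds an argv for permission matching. Runtime-determined values are replaced
// by placeholders so the outer command can still be matched against rules.
// Assumption: an expansion of a variable outside trackedVars fails closed.
function toArgv(args: Arg[], trackedVars: Set<string>): string[] | "too-complex" {
  const argv: string[] = [];
  for (const arg of args) {
    if (arg.type === "word") {
      argv.push(arg.text);
    } else if (arg.type === "command_substitution") {
      argv.push(CMDSUB_PLACEHOLDER);
    } else if (trackedVars.has(arg.name)) {
      argv.push(VAR_PLACEHOLDER);
    } else {
      return "too-complex";
    }
  }
  return argv;
}

// Models: echo $(date) $TARGET, with TARGET known to the analyzer.
const echoArgs: Arg[] = [
  { type: "word", text: "echo" },
  { type: "command_substitution", inner: "date" },
  { type: "simple_expansion", name: "TARGET" },
];
const tracked = toArgv(echoArgs, new Set(["TARGET"]));
const untracked = toArgv(echoArgs, new Set<string>());
```

In a design like this, a rule that matches on argv[0] can still approve or deny the outer echo, while the inner command of the substitution would need its own check.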
Relations
- rel-20260410-e1: ent-20260410-2af58a6c258c --[primary_implementation]--> src/utils/bash/bashParser.ts (pure-TS lexer/parser)
- rel-20260410-e2: ent-20260410-2af58a6c258c --[security_layer]--> src/utils/bash/ast.ts (AST-based security analysis)
- rel-20260410-e3: ent-20260410-2af58a6c258c --[legacy_wrapper]--> src/utils/bash/shellQuote.ts (shell-quote library wrapper with security patches)
- rel-20260410-e4: ent-20260410-2af58a6c258c --[consumed_by]--> src/tools/BashTool/bashPermissions.ts (permission decision pipeline)
- rel-20260410-e5: ent-20260410-2af58a6c258c --[init_via]--> src/utils/bash/parser.ts (tree-sitter initialization and feature gating)
Sources
src-20260409-e9925330d110
src-20260410-shell-parser-a: src/utils/bash/bashParser.ts, src/utils/bash/ast.ts, src/utils/bash/shellQuote.ts, src/utils/bash/parser.ts