Compaction Laundering Path

Description

Architectural vulnerability path quantified by IntentGuard: a CLAUDE.md instruction embedded in a cloned repo gets compacted into the summary as a 'user directive' because the autocompact prompt instructs the model to 'pay special attention to specific user feedback' and preserve 'all user messages that are not tool results.' Post-compaction, the model is told to 'continue without asking the user any further questions', so the injection survives indefinitely as a trusted directive.

Key claims

Relations

Sources

src-20260419-16b155f4f619