Auto-Mode Threat Model (4 Categories)
- Entity ID: ent-20260423-e005f1000005
- Type: concept
- Scope: private
- Status: active
Description
From the auto-mode classifier design (Hughes 2026). The design explicitly targets four risk categories: (1) overeager behavior, (2) honest mistakes, (3) prompt injection, and (4) model misalignment. These categories drive the two-stage evaluation in yoloClassifier.ts: a fast filter plus a chain-of-thought evaluation.
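To make the four categories and the two-stage shape concrete, here is a minimal TypeScript sketch. It is not taken from yoloClassifier.ts: every name (`ThreatCategory`, `fastFilter`, `reasonedReview`, `classify`), the regex patterns, and the pattern-to-category mapping are hypothetical, and the second stage is a deterministic stub standing in for a chain-of-thought evaluation. It also assumes the fast filter runs first and only flagged actions reach the slower stage.

```typescript
// Hypothetical sketch: none of these names or rules come from yoloClassifier.ts.
type ThreatCategory =
  | "overeager_behavior"
  | "honest_mistake"
  | "prompt_injection"
  | "model_misalignment";

interface Verdict {
  allow: boolean;
  stage: "fast_filter" | "reasoned_review";
  category?: ThreatCategory;
}

// Stage 1 (assumed): cheap pattern checks that flag an action for review.
// The patterns and their categories are illustrative only. Nothing here
// detects "model_misalignment"; a real system would need a separate signal.
const SUSPICIOUS_PATTERNS: Array<[RegExp, ThreatCategory]> = [
  [/rm\s+-rf\s+\//, "honest_mistake"],
  [/ignore (all )?previous instructions/i, "prompt_injection"],
  [/git\s+push\s+--force/, "overeager_behavior"],
];

function fastFilter(action: string): ThreatCategory | undefined {
  for (const [pattern, category] of SUSPICIOUS_PATTERNS) {
    if (pattern.test(action)) return category;
  }
  return undefined;
}

// Stage 2 (assumed): stub for the slower chain-of-thought evaluation.
// It allows a flagged action only if the user explicitly asked for it
// and the category is not injection or misalignment.
function reasonedReview(
  category: ThreatCategory,
  userRequested: boolean,
): Verdict {
  const allow =
    userRequested &&
    category !== "prompt_injection" &&
    category !== "model_misalignment";
  return { allow, stage: "reasoned_review", category };
}

function classify(action: string, userRequested = false): Verdict {
  const flagged = fastFilter(action);
  if (flagged === undefined) return { allow: true, stage: "fast_filter" };
  return reasonedReview(flagged, userRequested);
}
```

In this sketch, an unflagged action such as `classify("ls -la")` is allowed at the fast-filter stage, while a flagged action is decided by the second stage, so the slower review runs only when the cheap check raises a concern.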
Key claims
- none yet
Relations
- Auto-Mode Threat Model (4 Categories) --[motivates]--> Auto Mode Classifier