Safety, Security, and Privacy (Value)

Description

Second motivating value. Protects humans, code, data, and infrastructure from harm even when the user is inattentive. Distinct from Human Decision Authority: authority is the human's power to choose; safety is the system's obligation to protect when that power lapses. The auto-mode threat model explicitly targets four risk categories: overeager behavior, honest mistakes, prompt injection, and model misalignment. Motivates defense in depth, deny-first, reversibility-weighted assessment, externalized policy, and isolated subagent boundaries.

Key claims

Relations

Sources

src-20260423-0cff68d3291b