Lesson 9 · 8 min

Responsible disclosure and incident response

When the model emits something it shouldn't, a clean incident response playbook is the difference between a contained event and a brand crisis.

The incident playbook

An AI safety incident occurs when a user reports (or you detect) that the model emitted something harmful, leaked PII, or behaved outside policy. The 2026 standard playbook has six steps:

  1. Contain. Disable the affected feature, route to a known-safe fallback. Preserve the trace.
  2. Triage. Classify severity: minor (single user, no real harm), major (multiple users, brand/legal exposure), critical (PII leak, ongoing harm).
  3. Notify. Internal stakeholders (security, legal, PR) per the severity matrix. For critical: notify the supervisory authority within 72h of becoming aware (GDPR Art. 33) and affected users without undue delay (Art. 34); your privacy policy may require sooner.
  4. Reproduce. Capture the exact prompt + retrieval state + model version that produced the output. Without reproduction, the fix is guesswork.
  5. Remediate. The fix could be a prompt change, a filter rule, a model swap, or a tool-scope tightening. Test against the red-team eval set before re-enabling.
  6. Post-mortem. Document what happened, why the existing layers didn't catch it, and what new check goes into the eval set.
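The triage and notification steps above can be sketched as a small severity classifier. The three tiers match the playbook's definitions; the `NOTIFY_MATRIX` contents and role names are illustrative assumptions, not a prescribed org chart.

```python
from enum import Enum

class Severity(Enum):
    MINOR = "minor"        # single user, no real harm
    MAJOR = "major"        # multiple users, brand/legal exposure
    CRITICAL = "critical"  # PII leak or ongoing harm

# Hypothetical severity matrix: which stakeholders get paged at each tier.
NOTIFY_MATRIX = {
    Severity.MINOR:    ["on-call-engineer"],
    Severity.MAJOR:    ["on-call-engineer", "security", "legal", "pr"],
    Severity.CRITICAL: ["on-call-engineer", "security", "legal", "pr",
                        "supervisory-authority", "affected-users"],
}

def triage(users_affected: int, pii_leaked: bool, ongoing_harm: bool) -> Severity:
    """Classify an incident using the playbook's three-tier scale."""
    if pii_leaked or ongoing_harm:
        return Severity.CRITICAL
    if users_affected > 1:
        return Severity.MAJOR
    return Severity.MINOR
```

Encoding the matrix as data rather than branching logic makes it auditable in the post-mortem: the paged roles for each tier are one diff away from review.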
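Steps 1 and 4 hinge on capturing the exact inputs before anything changes. A minimal sketch of that trace record, assuming a JSON file is an acceptable preservation format and with `generate` standing in for whatever model-call function your stack uses:

```python
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class IncidentTrace:
    prompt: str                # exact user prompt, verbatim
    retrieved_docs: list[str]  # retrieval state at the time of the request
    model_version: str         # pinned model identifier
    output: str                # the harmful / off-policy output
    timestamp: str             # when the incident occurred (ISO 8601)

def preserve_trace(trace: IncidentTrace, path: str) -> None:
    """Write the trace to disk during containment, before anyone touches
    the feature (step 1: 'Preserve the trace')."""
    with open(path, "w") as f:
        json.dump(asdict(trace), f, indent=2)

def reproduces(trace: IncidentTrace, generate) -> bool:
    """Replay the exact prompt + retrieval state + model version through
    `generate` (injected so this sketch stays self-contained) and check
    whether the offending output recurs (step 4)."""
    return generate(trace.prompt, trace.retrieved_docs, trace.model_version) == trace.output
```

The frozen dataclass is deliberate: the trace is evidence, and nothing downstream should mutate it between containment and the post-mortem.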