Lesson 8 · 11 min

Agent safety and guardrails

Agents are LLMs that can take actions, not just produce text. That makes the blast radius of a mistake bigger, so defense has to be layered.

What can go wrong

Beyond the risks of a plain LLM, agents add these:

Prompt injection escalation. A malicious document the agent reads tells it to email your secrets to an attacker. Give the agent an email tool, and now it can.
Runaway loops. The agent gets stuck repeating the same steps and burns through your API budget.
Privilege escalation. A read-only agent finds a write tool and uses it.
Tool side-effect amplification. A single bad decision triggers an irreversible action.
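To make "layered" concrete, here is a minimal sketch of a guard that sits between the agent and its tools. It assumes a simple setup where every tool call goes through one dispatcher. The names `Tool`, `GuardedToolRunner`, `GuardrailViolation`, `max_steps` and `approve` are hypothetical, not from any agent framework. Each check maps to one failure mode above: a step budget for runaway loops, an allowlist for privilege escalation, and an approval gate for irreversible side effects. It does not detect prompt injection. It only limits what an injected instruction can actually make the agent do.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical guardrail layer for illustration; not a real library API.


class GuardrailViolation(Exception):
    """Raised when an agent's tool call breaks a guardrail."""


@dataclass
class Tool:
    name: str
    fn: Callable[..., str]
    irreversible: bool = False  # e.g. sending email, deleting data


@dataclass
class GuardedToolRunner:
    tools: dict[str, Tool]
    allowed: set[str]  # tools this agent may call; existing is not permission
    max_steps: int = 20  # hard cap on tool calls per run
    # Human-in-the-loop hook (hypothetical); denies everything by default.
    approve: Callable[[str, dict], bool] = lambda name, args: False
    steps: int = 0

    def call(self, name: str, **args) -> str:
        # Layer 1: step budget. Every attempt counts, including rejected ones.
        self.steps += 1
        if self.steps > self.max_steps:
            raise GuardrailViolation(f"step budget of {self.max_steps} exhausted")
        # Layer 2: allowlist. A read-only agent never reaches write tools.
        if name not in self.allowed or name not in self.tools:
            raise GuardrailViolation(f"tool {name!r} not permitted")
        tool = self.tools[name]
        # Layer 3: irreversible actions need explicit approval first.
        if tool.irreversible and not self.approve(name, args):
            raise GuardrailViolation(f"irreversible tool {name!r} needs approval")
        return tool.fn(**args)


if __name__ == "__main__":
    tools = {
        "read_file": Tool("read_file", lambda path: f"contents of {path}"),
        "send_email": Tool("send_email", lambda to, body: f"sent to {to}", irreversible=True),
        "delete_db": Tool("delete_db", lambda: "dropped", irreversible=True),
    }
    runner = GuardedToolRunner(tools=tools, allowed={"read_file", "send_email"}, max_steps=3)
    print(runner.call("read_file", path="notes.txt"))  # contents of notes.txt
    for name, args in [("delete_db", {}), ("send_email", {"to": "x@example.com", "body": "secrets"})]:
        try:
            runner.call(name, **args)
        except GuardrailViolation as err:
            print(f"blocked: {err}")
```

The design choice worth noting: the checks live outside the model, in ordinary code, so a compromised or confused prompt cannot talk its way past them. Denying approval by default means a forgotten configuration fails closed rather than open.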