Lesson 8 · 11 min
Agent safety and guardrails
Agents are LLMs with the ability to act. That makes the blast radius of any failure bigger, so defense has to be layered.
What can go wrong
What an agent adds to the attack surface of a plain LLM:
- Prompt injection escalation. A malicious doc the agent reads tells it to email your secrets out. Now it can.
- Runaway loops. The agent gets stuck repeating the same steps and runs up your bill.
- Privilege escalation. A read-only agent finds a write tool and uses it.
- Tool side-effect amplification. A single bad decision triggers an irreversible action.
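To make the layering concrete, here is a minimal sketch of a wrapper that sits between the agent and its tools. Everything in it is hypothetical: `GuardedToolbox`, `GuardrailError`, and the tool names are made up for illustration and do not come from any agent framework. It assumes tools are plain Python callables and that a human sets `approved=True` out of band. Each check maps to one failure mode above: an attempt budget for runaway loops, an allowlist for privilege escalation, and an approval gate for irreversible actions. It does not detect prompt injection; it only limits what an injected instruction can make the agent do.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


class GuardrailError(Exception):
    """Raised when a tool call is blocked by a guardrail."""


@dataclass
class GuardedToolbox:
    # Hypothetical wrapper for illustration; not a real framework API.
    tools: Dict[str, Callable[..., str]]
    allowed: Set[str]  # tools this agent may call at all
    irreversible: Set[str] = field(default_factory=set)  # need human approval
    max_calls: int = 10  # hard cap on attempts, against runaway loops
    attempts: int = 0

    def call(self, name: str, /, *, approved: bool = False, **kwargs) -> str:
        # Layer 1: the budget stops runaway loops. Every attempt counts,
        # including blocked ones, so a stuck agent cannot retry forever.
        if self.attempts >= self.max_calls:
            raise GuardrailError("call budget exhausted")
        self.attempts += 1

        # Layer 2: the allowlist stops privilege escalation. A tool being
        # registered is not enough; it must be granted to this agent.
        if name not in self.allowed:
            raise GuardrailError(f"tool {name!r} is not allowed for this agent")

        # Layer 3: irreversible actions need explicit approval.
        if name in self.irreversible and not approved:
            raise GuardrailError(f"tool {name!r} needs human approval")

        return self.tools[name](**kwargs)


# Example setup: a read-mostly agent. delete_file is registered but never
# granted, and send_email is granted but gated behind approval.
toolbox = GuardedToolbox(
    tools={
        "read_file": lambda path: f"contents of {path}",
        "send_email": lambda to: f"sent to {to}",
        "delete_file": lambda path: f"deleted {path}",
    },
    allowed={"read_file", "send_email"},
    irreversible={"send_email"},
    max_calls=5,
)
```

With this setup, `toolbox.call("read_file", path="notes.txt")` goes through, `toolbox.call("delete_file", path="notes.txt")` raises `GuardrailError`, and `toolbox.call("send_email", to="a@example.com")` raises until it is called with `approved=True`. The checks live outside the model on purpose: the agent cannot talk its way past code it does not control.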