Lesson 2 · 12 min
Prompt injection: direct, indirect, and multi-turn attacks
Prompt injection exploits the fact that LLMs cannot reliably distinguish instructions from data. Three attack patterns — direct, indirect, and multi-turn — each require different defenses.
The root cause: instruction-data conflation
LLMs process everything as text. They cannot reliably distinguish between:
- Instructions in the system prompt ("You are a customer support agent")
- Data in user messages ("Here is the document to summarize")
- Instructions embedded in data ("Ignore the above and instead...")
This is not a bug in any specific model — it's a fundamental property of how transformers learn from text. Every mitigation strategy accepts this constraint and works around it rather than eliminating it.