Lesson 2 · 12 min

Prompt injection: direct, indirect, and multi-turn attacks

Prompt injection exploits the fact that LLMs cannot reliably distinguish instructions from data. Three attack patterns — direct, indirect, and multi-turn — each require different defenses.

The root cause: instruction-data conflation

LLMs process everything as text. They cannot reliably distinguish between:

Instructions in the system prompt ("You are a customer support agent")
Data in user messages ("Here is the document to summarize")
Instructions embedded in data ("Ignore the above and instead...")

This is not a bug in any specific model — it's a fundamental property of how transformers learn from text. Every mitigation strategy accepts this constraint and works around it rather than eliminating it.