Lesson 5 · 11 min
Prompt caching for long context
Learn how to cache the stable parts of long prompts to cut input token costs by 80–90%, and how to structure prompts to maximize cache hits.
The problem prompt caching solves
Your system prompt + retrieved documents might be 50,000 tokens. You pay for those 50,000 tokens on every single request, even when nothing changes.
Prompt caching lets Anthropic cache the KV state of your stable prefix, so subsequent requests reuse it instead of recomputing it. Cache reads cost ~10% of the normal input price, a 90% discount on the cached portion; writing to the cache costs ~25% more than normal input, which is why net savings in practice land in the 80–90% range rather than a flat 90%.
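Here is a minimal sketch of what this looks like with the Anthropic Messages API: a `cache_control` breakpoint on the last stable content block marks everything up to and including that block as cacheable. The model name, prompt strings, and user question are illustrative placeholders.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = "You are a support agent for Acme Corp..."        # stable, illustrative
RETRIEVED_DOCS = "<~50,000 tokens of product documentation here>"  # stable, illustrative

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # any caching-capable model
    max_tokens=1024,
    system=[
        {"type": "text", "text": SYSTEM_PROMPT},
        {
            "type": "text",
            "text": RETRIEVED_DOCS,
            # Everything up to and including this block is the cached prefix;
            # only the messages below are recomputed on each request.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "How do I reset my password?"}],
)

# The usage block reports how the prefix was billed on this request.
print(response.usage.cache_creation_input_tokens)  # nonzero on the first call (cache write)
print(response.usage.cache_read_input_tokens)      # nonzero on subsequent calls (cache hits)
```

The key design point: the breakpoint goes after the largest stable block, and anything that varies per request (the user's question, per-user context) stays below it, so the expensive prefix is byte-for-byte identical across requests.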