Lesson 6 · 11 min

Reliability engineering: retries, circuit breakers, and graceful degradation

LLM APIs have higher error rates and longer tail latencies than traditional APIs. Retries with exponential backoff, circuit breakers, and explicit degradation modes keep your application stable when the model is not.

Why LLMs need special reliability patterns

A typical REST API has a p99 latency under 200 ms and an error rate under 0.1%. LLM APIs in 2026 look different:

p99 latency: 5–30 seconds for complex requests
Error rate: 0.5–2% during normal operation; spikes to 5–10% during provider incidents
Error types: rate limit (429), overload (503, or the non-standard 529 that some providers use), timeout, content filter, context length exceeded

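Standard web reliability defaults (for example, 3 retries with 100 ms backoff) don't work here. When a single request can legitimately run for 10 seconds or more, timeouts and retry delays have to be measured in seconds, not milliseconds. A rate limit error needs exponential backoff, not an immediate retry, because retrying at once only adds load to a quota that is already exhausted.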
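To make "retries measured in seconds" concrete, here is a minimal sketch of exponential backoff with full jitter. The names RetryableError, backoff_delay, and call_with_retries are hypothetical, and the 1-second base and 30-second cap are illustrative assumptions rather than values recommended by any provider.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. 429, overload, timeout)."""


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based), with full jitter.

    The ceiling doubles each attempt (base, 2*base, 4*base, ...) up to `cap`;
    the actual delay is a random fraction of that ceiling so that many clients
    do not retry in lockstep.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


def call_with_retries(fn, max_attempts=4, base=1.0, cap=30.0, sleep=time.sleep):
    """Call fn(); on RetryableError, back off and try again up to max_attempts."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(backoff_delay(attempt, base, cap))
```

With these assumed defaults the delay ceilings are 1 s, 2 s, and 4 s across the three possible retries, which is the order of magnitude the paragraph above calls for. Passing sleep and rng as parameters is only a convenience for testing the logic without real waiting.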

Each error type has a different correct response.
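One way to picture this is a small dispatch function that turns an error into a decision. The sketch below is an assumption about what reasonable responses might look like, not a mapping the lesson or any provider prescribes. The Action enum, the classify function, and the string labels passed as kind are all hypothetical; real SDKs expose errors through their own exception types and codes.

```python
from enum import Enum


class Action(Enum):
    RETRY_WITH_BACKOFF = "retry_with_backoff"  # wait, then try the same request
    RETRY_ONCE = "retry_once"                  # one more try, then give up
    SHRINK_INPUT = "shrink_input"              # same request can never succeed as-is
    FALLBACK = "fallback"                      # degrade: canned or simpler response
    FAIL = "fail"                              # unknown error: surface it


# 429 and 529 come from the lesson's list; 503 is an added assumption,
# since some providers signal overload that way.
BACKOFF_STATUSES = {429, 503, 529}


def classify(status=None, kind=None):
    """Map an error to a response.

    `status` is an HTTP status code, if any. `kind` is a hypothetical label
    the caller derives from the provider's error payload or a local timeout.
    """
    if status in BACKOFF_STATUSES:
        return Action.RETRY_WITH_BACKOFF
    if kind == "timeout":
        return Action.RETRY_ONCE
    if kind == "context_length_exceeded":
        return Action.SHRINK_INPUT  # retrying unchanged input would fail again
    if kind == "content_filter":
        return Action.FALLBACK      # retrying unchanged input is unlikely to help
    return Action.FAIL
```

The point of the design is that the retry decision lives in one place: transient errors (rate limit, overload, timeout) are retried, while errors caused by the request itself (context length, content filter) are routed to a different path instead of being retried blindly.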