Lesson 2 · 11 min
Request patterns: sync, async, streaming, and batching
Choosing the wrong request pattern wastes money or kills UX. Sync, async, streaming, and batch each suit a different latency-cost-throughput trade-off.
The four patterns
Not every LLM call needs an answer in 200 ms. Matching the request pattern to how soon the result is actually needed is one of the easiest ways to cut costs and improve user experience at the same time.
| Pattern | Latency | Cost | Best for |
|---|---|---|---|
| Synchronous | Seconds; nothing is shown until the full response arrives | Standard | Chat, Q&A, interactive features |
| Streaming | Same wall-clock, lower perceived | Standard | Chat, long responses |
| Async (background) | Minutes | Standard | Reports, analysis, email drafts |
| Batch | Hours (up to about a day) | 50% discount (Batch API, e.g. OpenAI's or Anthropic's) | Nightly jobs, training data, bulk eval |
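
The table can be read as a small decision rule. The sketch below is one way to encode it, not a prescribed implementation: `Request`, `choose_pattern`, and both thresholds are hypothetical names and values made up for illustration, so tune them to your own workload and provider.

```python
from dataclasses import dataclass
from enum import Enum


class Pattern(Enum):
    SYNC = "synchronous"
    STREAMING = "streaming"
    ASYNC = "async (background)"
    BATCH = "batch"


# Illustrative thresholds (assumptions, not provider guarantees).
LONG_RESPONSE_TOKENS = 200            # beyond this, a blocking wait tends to feel slow
BATCH_MIN_DEADLINE_S = 24 * 60 * 60   # assume a batch job may take up to a day


@dataclass
class Request:
    user_is_waiting: bool         # is someone watching a spinner right now?
    expected_output_tokens: int   # rough size of the response
    deadline_s: float             # how long the result can take and still be useful


def choose_pattern(req: Request) -> Pattern:
    """Map a request's latency needs onto one of the four patterns."""
    if req.user_is_waiting:
        # Interactive: stream long outputs so the first tokens show up early.
        if req.expected_output_tokens > LONG_RESPONSE_TOKENS:
            return Pattern.STREAMING
        return Pattern.SYNC
    # Nobody is waiting: use the discounted batch path only if the
    # deadline leaves room for a slow turnaround.
    if req.deadline_s >= BATCH_MIN_DEADLINE_S:
        return Pattern.BATCH
    return Pattern.ASYNC


if __name__ == "__main__":
    examples = {
        "short chat reply": Request(True, 50, 5),
        "long chat answer": Request(True, 800, 30),
        "weekly report": Request(False, 2000, 10 * 60),
        "nightly bulk eval": Request(False, 500, 48 * 60 * 60),
    }
    for name, req in examples.items():
        print(f"{name}: {choose_pattern(req).value}")
```

The first question is whether a person is waiting, because that decides between the interactive and background rows of the table. Cost only enters the decision once latency is no longer the constraint.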