Lesson 2 · 11 min

Request patterns: sync, async, streaming, and batching

Choosing the wrong request pattern wastes money or kills UX. Sync, async, streaming, and batch each suit a different latency-cost-throughput trade-off.

The four patterns

Not every LLM call needs an answer in 200ms. Choosing the right request pattern is the single easiest way to cut costs and improve user experience simultaneously.

| Pattern | Latency | Cost | Best for |
|---|---|---|---|
| Synchronous | Seconds (blocks until complete) | Standard | Chat, Q&A, interactive features |
| Streaming | Same wall-clock, lowest perceived | Standard | Chat, long responses |
| Async (background) | Minutes | Standard | Reports, analysis, email drafts |
| Batch | Hours | 50% discount (Batch API) | Nightly jobs, training data, bulk eval |
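The sync-vs-streaming row is easiest to see with a toy example. This sketch uses a hypothetical `mock_llm` generator in place of a real API call: both paths take the same wall-clock time, but streaming delivers the first token almost immediately.

```python
import time

def mock_llm(prompt, n_tokens=5, per_token=0.01):
    """Hypothetical stand-in for an LLM API: emits tokens one at a time."""
    for i in range(n_tokens):
        time.sleep(per_token)  # simulate per-token generation delay
        yield f"tok{i} "

def sync_call(prompt):
    # Synchronous: block until the full response is assembled.
    return "".join(mock_llm(prompt))

def stream_call(prompt):
    # Streaming: pass tokens through as they arrive. Total time is
    # unchanged, but the user sees output after ~one token's delay.
    yield from mock_llm(prompt)

start = time.monotonic()
first_token_at = None
chunks = []
for chunk in stream_call("hello"):
    if first_token_at is None:
        first_token_at = time.monotonic() - start
    chunks.append(chunk)
total = time.monotonic() - start
print(f"first token: {first_token_at:.3f}s, full response: {total:.3f}s")
```

With five tokens, the first chunk lands after roughly a fifth of the total time; with a real model generating hundreds of tokens, the perceived-latency gap is far larger.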
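On the batch side, providers typically take a JSONL file with one request per line. A minimal sketch of building such a file, assuming the OpenAI Batch API's request shape (`custom_id`, `method`, `url`, `body`); the model name and prompts are placeholders:

```python
import json

# One request object per line; custom_id lets you match results back
# to inputs when the batch completes hours later.
requests = [
    {
        "custom_id": f"eval-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",  # placeholder model
            "messages": [{"role": "user", "content": q}],
        },
    }
    for i, q in enumerate(["Summarize doc A", "Summarize doc B"])
]

with open("batch_input.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```

You would then upload `batch_input.jsonl` and create a batch job against it; the 50% discount in the table above is the payoff for tolerating the multi-hour completion window.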