Lesson 6 · 10 min
Streaming structured outputs
Stream JSON token-by-token and parse progressively — why it matters for UX, how to do it in Python and Next.js.
Why stream structured output?
Waiting for a complete JSON blob before doing anything costs you latency. For a 500-token response at ~50 tokens/second, that's 10 seconds before the user sees anything. Streaming lets you:
- Show partial results as fields arrive (e.g. render a loading card that fills in)
- Start processing the first array items while the rest are still being generated
- Detect early failures — if the first 50 tokens are garbled, abort and retry rather than waiting for the full response
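To make the second and third points concrete, here is a minimal sketch of progressive parsing that uses only Python's standard library. It assumes the model returns a top-level JSON array and that the text arrives as an iterable of string chunks. The function name `iter_array_items` and the chunk boundaries in the usage note are hypothetical, chosen for illustration, and not part of any SDK.

```python
import json
from typing import Any, Iterable, Iterator


def iter_array_items(chunks: Iterable[str]) -> Iterator[Any]:
    """Yield each item of a top-level JSON array as soon as it is complete.

    Hypothetical helper: `chunks` stands in for whatever text deltas
    your streaming client produces.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    started = False  # True once the opening "[" has been consumed

    for chunk in chunks:
        buffer += chunk
        while True:
            buffer = buffer.lstrip()
            if not started:
                if not buffer:
                    break  # nothing but whitespace so far
                if buffer[0] != "[":
                    # Early failure: the output is not the array we asked for,
                    # so the caller can abort and retry immediately.
                    raise ValueError(f"expected '[', got {buffer[0]!r}")
                buffer = buffer[1:]
                started = True
                continue
            if buffer.startswith(","):
                buffer = buffer[1:]  # separator between items
                continue
            if not buffer or buffer[0] == "]":
                break  # waiting for more text, or the array has ended
            try:
                item, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # item is still incomplete; wait for the next chunk
            rest = buffer[end:].lstrip()
            if not rest or rest[0] not in ",]":
                # No delimiter yet, so a number such as 12 might still
                # grow into 123. Hold the item back until we are sure.
                break
            yield item
            buffer = rest
```

Feeding it chunks split at arbitrary points, for example `['[{"id": 1, "na', 'me": "a"}, {"id"', ': 2}', ']']`, yields the first object as soon as the second chunk arrives, well before the closing bracket. One limitation of this sketch: after the opening bracket it cannot tell a garbled item from an incomplete one, so a production version would likely add a buffer-size or time limit before giving up.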