Lesson 10 · 14 min

Capstone: design a production stack

Volume: ~50,000 requests/day with bursty traffic (3× daytime spikes)
Latency target: p95 < 1.5s end-to-end
Quality target: > 90% pass rate on a 100-case eval set
Cost target: < $1,500/month
Stack constraint: company already runs on AWS
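
Before choosing anything, it can help to turn these targets into per-second and per-request numbers. The sketch below is only back-of-envelope arithmetic on the figures above. It assumes a 30-day month, and it assumes "3× spikes" means three times the daily average request rate; neither assumption comes from the brief, and the function name is made up for illustration.

```python
# Back-of-envelope sizing from the brief's numbers (a sketch, not a capacity plan).

REQUESTS_PER_DAY = 50_000     # from the brief
SPIKE_MULTIPLIER = 3          # from the brief: 3x daytime spikes
MONTHLY_BUDGET_USD = 1_500    # from the brief (upper bound)
DAYS_PER_MONTH = 30           # assumption, not from the brief
SECONDS_PER_DAY = 24 * 60 * 60


def sizing(requests_per_day=REQUESTS_PER_DAY,
           spike_multiplier=SPIKE_MULTIPLIER,
           monthly_budget_usd=MONTHLY_BUDGET_USD,
           days_per_month=DAYS_PER_MONTH):
    """Derive request rates and budget ceilings from the brief's targets."""
    avg_rps = requests_per_day / SECONDS_PER_DAY
    # Assumption: a spike is a multiple of the daily average rate.
    peak_rps = avg_rps * spike_multiplier
    monthly_requests = requests_per_day * days_per_month
    hours_per_month = 24 * days_per_month
    return {
        "avg_rps": avg_rps,
        "peak_rps": peak_rps,
        "monthly_requests": monthly_requests,
        # Most you can spend per request and stay under budget.
        "budget_per_request_usd": monthly_budget_usd / monthly_requests,
        # Most an always-on deployment can cost per hour, all-in.
        "budget_per_hour_usd": monthly_budget_usd / hours_per_month,
    }


if __name__ == "__main__":
    for name, value in sizing().items():
        print(f"{name}: {value:,.4f}")
```

Under these assumptions the average load is well under one request per second and the peak is under two, while the budget works out to about a tenth of a cent per request, or roughly two dollars per hour for anything that runs around the clock. If real traffic is concentrated in business hours, the true peak will be higher than this estimate.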

Make every choice. Stack, GPU, rollout, monitoring, cost.

The brief

You're shipping a customer-facing AI feature for a B2B SaaS, under the volume, latency, quality, cost, and stack constraints listed at the top of this lesson.

You need to pick: serving stack, GPU, rollout strategy, monitoring, cost controls.