Lesson 10 · 14 min
Capstone: design a production stack
Make every choice. Stack, GPU, rollout, monitoring, cost.
The brief
You're shipping a customer-facing AI feature for a B2B SaaS:
- Volume: ~50,000 requests/day with bursty traffic (3× daytime spikes)
- Latency target: p95 < 1.5s end-to-end
- Quality target: > 90% pass rate on a 100-case eval set
- Cost target: < $1,500/month
- Stack constraint: company already runs on AWS
You need to pick: serving stack, GPU, rollout strategy, monitoring, cost controls.