Lesson 10 · 14 min

Capstone: design a production stack

Make every choice: stack, GPU, rollout, monitoring, cost.

The brief

You're shipping a customer-facing AI feature for a B2B SaaS:

  • Volume: ~50,000 requests/day with bursty traffic (3× daytime spikes)
  • Latency target: p95 < 1.5s end-to-end
  • Quality target: > 90% pass rate on a 100-case eval set
  • Cost target: < $1,500/month
  • Stack constraint: company already runs on AWS
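
Before choosing anything, it can help to turn the brief into the numbers that the later decisions hang on. The sketch below is a rough back-of-the-envelope calculation, not a sizing method prescribed by this lesson. It assumes a 30-day month, reads "3× daytime spikes" as three times the daily average request rate, and treats "> 90%" as strictly greater than 90 of 100 cases. The function name `sizing_summary` and the constant names are hypothetical.

```python
# Figures taken directly from the brief.
REQUESTS_PER_DAY = 50_000
SPIKE_MULTIPLIER = 3            # "3× daytime spikes"
MONTHLY_BUDGET_USD = 1_500
EVAL_CASES = 100
PASS_RATE_FLOOR_PERCENT = 90    # the target is strictly above this

# Assumptions that are NOT in the brief.
DAYS_PER_MONTH = 30
SECONDS_PER_DAY = 24 * 60 * 60


def sizing_summary() -> dict:
    """Derive rough capacity and cost figures from the brief."""
    avg_rps = REQUESTS_PER_DAY / SECONDS_PER_DAY
    # Assumption: the spike is measured against the daily average rate.
    peak_rps = avg_rps * SPIKE_MULTIPLIER
    monthly_requests = REQUESTS_PER_DAY * DAYS_PER_MONTH
    return {
        "avg_rps": avg_rps,
        "peak_rps": peak_rps,
        "monthly_requests": monthly_requests,
        "budget_per_request_usd": MONTHLY_BUDGET_USD / monthly_requests,
        "budget_per_hour_usd": MONTHLY_BUDGET_USD / (DAYS_PER_MONTH * 24),
        # Smallest whole number of passing cases that beats the floor.
        "min_passing_cases": EVAL_CASES * PASS_RATE_FLOOR_PERCENT // 100 + 1,
    }


if __name__ == "__main__":
    for name, value in sizing_summary().items():
        print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

Under these assumptions the average load is under 1 request/second and the peak is under 2, the budget works out to about $0.001 per request (roughly $2.08 per hour if the service runs around the clock), and the eval gate needs at least 91 of 100 cases to pass. Changing the month length or the spike interpretation shifts these figures, so treat them as a starting point.
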

You need to pick a serving stack, a GPU, a rollout strategy, monitoring, and cost controls.