Lesson 7 · 11 min
Capstone — taking a feature from idea to GA
Walk one realistic feature — 'Auto-suggest a tag for incoming support tickets' — from initial idea through eval, prototype, beta, GA, and quarterly review.
Week 1: Spec
PM writes the 7-section spec. Eval criteria: 30 real tickets with hand-labelled correct tags from the team's existing 12-tag taxonomy. Success metric: ≥85% top-1 accuracy on the eval set, at ≤€0.005/call. Failure recovery: when confidence is low (for example, when several tags plausibly fit), surface 3 suggestions for the human to pick from. Cost envelope: cap at 2k tags/tenant/month on Pro, unlimited on Enterprise. Lifecycle: re-run the eval on each model release.
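To make the failure-recovery and cost-envelope rules concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the team's implementation: the 0.7 confidence cutoff, the function and class names, and the in-memory counter are all hypothetical. Only the "3 suggestions" fallback and the 2k/month Pro cap come from the spec.

```python
from collections import defaultdict

CONFIDENCE_THRESHOLD = 0.7  # assumption: the spec does not state a cutoff
PRO_MONTHLY_CAP = 2000      # from the spec: 2k tags/tenant/month on Pro


def suggest(scores: dict[str, float],
            threshold: float = CONFIDENCE_THRESHOLD) -> list[str]:
    """Return one tag when the top score is confident, else the top three."""
    if not scores:
        return []
    # Rank tags from highest to lowest score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    if scores[ranked[0]] >= threshold:
        return ranked[:1]
    # Low confidence: let the human pick from three candidates.
    return ranked[:3]


class TagQuota:
    """Hypothetical in-memory counter of tag calls per tenant per month."""

    def __init__(self) -> None:
        self.used: dict[tuple[str, str], int] = defaultdict(int)

    def allow(self, tenant_id: str, plan: str, month: str) -> bool:
        """Record one call and report whether it fits the plan's envelope."""
        key = (tenant_id, month)
        # Only Pro is capped; Enterprise is unlimited per the spec.
        if plan == "pro" and self.used[key] >= PRO_MONTHLY_CAP:
            return False
        self.used[key] += 1
        return True
```

A real service would keep the counter in shared storage rather than process memory, and the threshold would be tuned against the eval set rather than picked by hand.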
Week 2: Joint labelling + prototype
The PM, engineer, and designer label the 30 cases together. Eng builds the prototype: prompt, classifier, cheap model, and cache. First eval: 79% accuracy. Iterate: trim the system prompt, add 3 few-shot examples covering edge cases. Second eval: 87%, above the 85% target.
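The eval loop described above can be sketched as a small harness. This is a minimal illustration, assuming the eval set is a list of (ticket text, expected tag) pairs and the prototype is exposed as a function that returns one tag. The names top1_accuracy and passes are hypothetical.

```python
from typing import Callable

TARGET_ACCURACY = 0.85  # from the spec: >=85% top-1 accuracy


def top1_accuracy(cases: list[tuple[str, str]],
                  classify: Callable[[str], str]) -> float:
    """Fraction of cases where the predicted tag equals the hand label."""
    if not cases:
        raise ValueError("eval set is empty")
    correct = sum(1 for text, expected in cases if classify(text) == expected)
    return correct / len(cases)


def passes(accuracy: float, target: float = TARGET_ACCURACY) -> bool:
    """True when the measured accuracy meets the spec's success metric."""
    return accuracy >= target
```

With only 30 cases, each ticket is worth about 3.3 percentage points: 26 of 30 correct (86.7%) clears the bar, while 25 of 30 (83.3%) does not. A single flipped label can change the verdict, which is one reason to label jointly and to re-run the same set after every prompt change.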
Week 3: Beta
Ship to 5 friendly customers. Diary study: 4 of 5 use it 5+ times in week one. Refusal acceptability: when the model surfaces 3 suggestions instead of 1, users adapt without complaint.