Skip to main content

Lesson 8 · 11 min

Capstone — wiring observability into a real feature

The capstone. Walk through adding the four LLM signals, a probe set, and an on-call playbook to one real feature. The before/after worth the work.

Starting state

Feature: a customer-support assistant. Currently:

  • Standard SRE metrics (requests/sec, p95 latency, error rate). Green.
  • No LLM-specific signals.
  • No probe set.
  • No tracing per request.
  • On-call playbook: 'wake up, look at Datadog, scratch head'.

Last week, an outage cost the company a day of customer trust. Postmortem revealed: the model started refusing 9% of requests after a provider update. The team detected it 18 hours later via a support ticket.