Skip to main content

Lesson 5 · 11 min

Model routing — the right model for the right call

Calling a frontier model for 'classify this support ticket into one of 8 categories' is a bug. The router pattern that trades 50-100ms of classifier latency for 5-30× cost reduction.

The pattern

A cheap classifier sees the request and decides difficulty. Easy traffic (~80%) goes to a small fine-tuned model or the cheap tier. Hard traffic (~20%) goes to the frontier model.

Implemented well, this trades 50-100ms of classifier latency for a 5-30× bill reduction. The cost gap is so large that even a noticeable quality drop on your distribution can still favor the routed setup.