Lesson 3 · 12 min

The gateway layer: routing, rate limiting, and fallbacks

The gateway is the control plane for all LLM traffic: it routes requests to the right model, enforces rate limits, tracks spend, and automatically fails over when a provider is down.

Why a gateway?

Calling an LLM provider directly from application code creates problems that compound as usage grows:

  • No centralized logging — you can't reconstruct what a model received when investigating an incident
  • No rate limit visibility — you hit the provider's limit and get 429s in production
  • Provider lock-in — switching from GPT-4o to Claude requires touching every API call
  • No spend control — a bad deployment can run up a $10k bill before anyone notices

A gateway sits between your application and the provider, handling all of these cross-cutting concerns in one place.
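The three behaviors named above — routing by model, rate limiting, and fallback — can be sketched in a few dozen lines. This is a minimal illustration, not a production gateway: the `Gateway` class, its token-bucket limiter, and the provider callables are all hypothetical names invented for this example, not part of any real library.

```python
import time

class RateLimitExceeded(Exception):
    pass

class Gateway:
    """Minimal sketch: route by model name, enforce a requests-per-second
    cap with a token bucket, and fall back to the next provider on failure."""

    def __init__(self, providers, max_rps=10):
        # providers: {model_name: [callable, ...]} in priority order.
        # Each callable takes a prompt string and returns a completion string.
        self.providers = providers
        self.max_rps = max_rps
        self.tokens = float(max_rps)        # bucket starts full
        self.last_refill = time.monotonic()

    def _take_token(self):
        # Refill the bucket in proportion to elapsed time, capped at max_rps.
        now = time.monotonic()
        self.tokens = min(self.max_rps,
                          self.tokens + (now - self.last_refill) * self.max_rps)
        self.last_refill = now
        if self.tokens < 1:
            raise RateLimitExceeded("gateway rate limit hit")
        self.tokens -= 1

    def complete(self, model, prompt):
        self._take_token()                  # rate limit before any provider call
        errors = []
        for call in self.providers.get(model, []):
            try:
                return call(prompt)         # first healthy provider wins
            except Exception as exc:
                errors.append(exc)          # record and fall through to next
        raise RuntimeError(f"all providers failed for {model}: {errors}")
```

In use, a request to a failing primary transparently falls back to the secondary, and callers exceeding the cap get a gateway-level error instead of a surprise 429 from the provider:

```python
def primary(prompt):
    raise TimeoutError("provider down")

gw = Gateway({"gpt-4o": [primary, lambda p: "ok:" + p]}, max_rps=2)
gw.complete("gpt-4o", "hi")  # falls back to the second provider
```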