Lesson 6 · 10 min

Hybrid retrieval and reranking — the production pattern

Dense alone is a demo; hybrid + rerank is what ships. This pattern consistently improves precision@5 by 10-25 points on real datasets.

The pipeline

Query
  ├──→ Dense (embedding model + ANN, top-50)
  └──→ Sparse (BM25, top-50)
         ↓
      Fuse with RRF
         ↓
      Cross-encoder rerank (top-50 → top-5)
         ↓
      LLM sees only top-5
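
The diagram above can be sketched as one small orchestration function. This is a minimal sketch rather than a reference implementation: `hybrid_search` and its parameter names are hypothetical, and the two retrievers, the fusion step, and the cross-encoder scorer are passed in as plain callables so that no particular library is assumed.

```python
from typing import Callable, Sequence

# Hypothetical stage signatures, chosen for illustration only.
Retriever = Callable[[str, int], list[str]]         # (query, k) -> ranked doc ids
Fuser = Callable[[Sequence[list[str]]], list[str]]  # ranked lists -> one ranked list
Scorer = Callable[[str, str], float]                # (query, doc_id) -> relevance


def hybrid_search(
    query: str,
    dense: Retriever,
    sparse: Retriever,
    fuse: Fuser,
    score: Scorer,
    candidates: int = 50,
    final: int = 5,
) -> list[str]:
    """Run dense and sparse retrieval, fuse, rerank, and keep the top few."""
    # Stage 1: two independent retrievers, each returning its own top-k.
    dense_hits = dense(query, candidates)
    sparse_hits = sparse(query, candidates)

    # Stage 2: merge the two ranked lists and cap the candidate pool.
    fused = fuse([dense_hits, sparse_hits])[:candidates]

    # Stage 3: rescore every candidate against the query (the expensive step),
    # then keep only what the LLM will actually see.
    reranked = sorted(fused, key=lambda doc_id: score(query, doc_id), reverse=True)
    return reranked[:final]
```

In a real system, `dense` would wrap an embedding model plus an ANN index, `sparse` a BM25 index, and `score` a cross-encoder. Keeping them as injected callables makes each stage easy to swap or test in isolation.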

Three facts explain why this works so consistently:

  1. Dense and BM25 catch different failures. Dense gets paraphrase; BM25 gets exact match (codes, IDs, technical terms).
  2. RRF (Reciprocal Rank Fusion) merges the two result lists using ranks only, so the incompatible dense and BM25 score scales never need normalizing or weight tuning.
  3. A cross-encoder rerank costs ~50ms but cuts hallucinations roughly in half compared to bi-encoder retrieval alone.
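
To make the fusion step concrete, here is a minimal sketch of RRF. Each document's fused score is the sum over the input lists of 1 / (k + rank), where rank starts at 1 in each list. The function name `rrf_fuse` is hypothetical, and the constant k = 60 is a commonly used default that the lesson itself does not specify; treat it as an assumption.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of doc ids with Reciprocal Rank Fusion.

    rankings: iterable of lists, each ordered best-first.
    k: smoothing constant (assumed default of 60, not taken from the lesson).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; only the position matters, never the raw score.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["d1", "d2", "d3"]   # toy output of the dense retriever
sparse_hits = ["d2", "d4", "d1"]  # toy output of BM25
print(rrf_fuse([dense_hits, sparse_hits]))  # → ['d2', 'd1', 'd4', 'd3']
```

Documents that appear in both lists (`d2`, `d1`) rise to the top, which is the behavior the pipeline relies on: agreement between the dense and sparse retrievers counts for more than a high rank in either one alone.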