Lesson 6 · 10 min
Hybrid retrieval and reranking — the production pattern
Dense alone is a demo; hybrid + rerank is what ships. This pattern consistently improves precision@5 by 10-25 points on real datasets.
The pipeline
Query
├──→ Dense (embedding model + ANN, top-50)
└──→ Sparse (BM25, top-50)
↓
Fuse with RRF
↓
Cross-encoder rerank (top-50 → top-5)
↓
LLM sees only top-5
Three facts that make this consistent:
- Dense and BM25 catch different failures. Dense retrieval catches paraphrases; BM25 catches exact matches (codes, IDs, technical terms).
- RRF (Reciprocal Rank Fusion) merges them well without parameter tuning.
- A cross-encoder rerank adds ~50ms of latency but cuts hallucinations roughly in half compared to bi-encoder retrieval alone.
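
The fusion and rerank steps can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation. It assumes the usual RRF formula, score(d) = sum over result lists of 1 / (k + rank), with the conventional constant k = 60. The names `rrf_fuse`, `rerank`, and `overlap_score` are hypothetical, and `overlap_score` is only a toy stand-in for a real cross-encoder's relevance score.

```python
from collections import defaultdict
from typing import Callable, Iterable, Mapping, Sequence


def rrf_fuse(rankings: Iterable[Sequence[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc_id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def rerank(
    query: str,
    candidates: Sequence[str],
    texts: Mapping[str, str],
    score_fn: Callable[[str, str], float],
    top_n: int = 5,
) -> list[str]:
    """Re-score each (query, document text) pair and keep the best top_n."""
    ranked = sorted(candidates, key=lambda d: score_fn(query, texts[d]), reverse=True)
    return ranked[:top_n]


def overlap_score(query: str, text: str) -> float:
    """Toy scorer (assumption): shared lowercase tokens. NOT a cross-encoder."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


# Hypothetical corpus and retriever outputs, for illustration only.
texts = {
    "d1": "reset the router to factory settings",
    "d3": "error code E42 means the filter is clogged",
    "d7": "how to clean the filter",
    "d9": "E42 troubleshooting steps",
}
dense_hits = ["d3", "d1", "d7"]   # e.g. top results from the embedding index
sparse_hits = ["d9", "d3", "d1"]  # e.g. top results from BM25

fused = rrf_fuse([dense_hits, sparse_hits])
print(fused)  # ['d3', 'd1', 'd9', 'd7']

top = rerank("what does error code E42 mean", fused, texts, overlap_score, top_n=2)
print(top)  # ['d3', 'd9']
```

RRF uses only rank positions, so the dense and BM25 scores never need to be normalized onto a common scale. In a real pipeline, `score_fn` would wrap a cross-encoder model that reads the query and document together, and the lists would hold the top-50 from each retriever.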