Lesson 3 · 13 min
Fine-tuning embedding models for domain-specific retrieval
When off-the-shelf embeddings fail on your domain, fine-tuning on your own (query, positive, hard negative) triplets reliably improves nDCG@10 by 15–30%. The data pipeline matters more than the training code.
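To make the triplet format concrete, here is a minimal training sketch using the sentence-transformers library; the base model name, example texts, output path, and hyperparameters are illustrative, not prescribed by this lesson:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed base model

# Each training example is one (query, positive, hard negative) triplet.
train_examples = [
    InputExample(texts=[
        "grace period for late rent payment",                        # query
        "Tenants have a five-day grace period before late fees...",  # positive
        "Late fees are capped at 5% of monthly rent...",             # hard negative
    ]),
    # ...in practice, 1,000+ triplets
]

train_loader = DataLoader(train_examples, shuffle=True, batch_size=32)
# MultipleNegativesRankingLoss treats the third text as an explicit hard
# negative and every other in-batch positive as an additional negative.
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_loader, train_loss)],
    epochs=1,
    warmup_steps=100,
    output_path="domain-embedder",  # hypothetical output directory
)
```

Larger batch sizes generally help this loss, since each extra in-batch positive doubles as another negative for every query.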
When to fine-tune vs. when to use off-the-shelf
Fine-tune when:
- Your domain has specialized vocabulary not well-represented in web data (clinical notes, legal contracts, source code, scientific papers)
- Retrieval quality on a domain-specific eval set is > 10% below a top general-purpose model on the MTEB leaderboard
- You have or can generate 1,000+ (query, positive document) pairs (a hard-negative mining sketch follows this list)
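Turning raw pairs into hard-negative triplets is the pipeline step that matters most. A minimal sketch, using the base model itself to mine negatives; the pairs, corpus, and model name are illustrative:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed base model

# (query, positive) pairs and a corpus; real pipelines have 1,000+ of each.
pairs = [("grace period for late rent payment",
          "Tenants have a five-day grace period before late fees accrue...")]
corpus = [
    "Tenants have a five-day grace period before late fees accrue...",
    "Late fees are capped at 5% of monthly rent in most jurisdictions...",
    "Security deposits must be returned within 21 days of move-out...",
]

corpus_emb = model.encode(corpus, convert_to_tensor=True)
triplets = []
for query, positive in pairs:
    q_emb = model.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, corpus_emb, top_k=3)[0]
    # Hard negative: ranked highly by the base model, but not the gold positive.
    for hit in hits:
        candidate = corpus[hit["corpus_id"]]
        if candidate != positive:
            triplets.append((query, positive, candidate))
            break
```

Negatives mined this way are "hard" precisely because the base model confuses them with the positive, which is what gives the fine-tuned model signal to learn from.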
Don't fine-tune when:
- Your retrieval corpus is general web-like text
- You have < 500 training pairs (overfitting risk)
- The off-the-shelf model already achieves > 0.80 nDCG@10 on your domain (a measurement sketch follows this list)
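Measuring that threshold is cheap. A minimal sketch using sentence-transformers' InformationRetrievalEvaluator, which reports nDCG@k among other metrics; the queries, corpus, and relevance labels are illustrative:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed off-the-shelf model

queries = {"q1": "grace period for late rent payment"}
corpus = {
    "d1": "Tenants have a five-day grace period before late fees accrue...",
    "d2": "Security deposits must be returned within 21 days of move-out...",
}
relevant_docs = {"q1": {"d1"}}  # gold labels: docs that answer each query

evaluator = InformationRetrievalEvaluator(
    queries, corpus, relevant_docs, ndcg_at_k=[10], name="domain-eval"
)
results = evaluator(model)
# Depending on the library version, `results` is the primary score (a float)
# or a dict keyed like "domain-eval_cosine_ndcg@10".
print(results)
```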
The most common mistake: fine-tuning on a small domain dataset and overfitting, producing a model that memorizes training pairs but generalizes worse than the base model.
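One guard against this, continuing the sketches above (train_loader, train_loss, and an evaluator built from a held-out split, as defined there): score a dev set during training and keep only the best checkpoint instead of the last one.

```python
# Evaluate on held-out data during training; save_best_model keeps the
# checkpoint with the best dev score rather than the (possibly overfit) final one.
model.fit(
    train_objectives=[(train_loader, train_loss)],
    epochs=3,
    warmup_steps=100,
    evaluator=evaluator,   # built from a held-out split, never from training pairs
    evaluation_steps=200,  # score the dev set every 200 steps
    output_path="domain-embedder",
    save_best_model=True,
)
```

If the dev score peaks early and then declines while training loss keeps falling, that divergence is the overfitting signature described above.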