Lesson 6 · 12 min
RAG evaluation — retrieval and answer quality
RAG systems fail in two different places: the retrieval step and the generation step. Evaluating both separately — with the right metrics for each — is what separates a stable RAG feature from one that mysteriously degrades.
RAG fails twice
A RAG pipeline has two failure modes that look identical to the user: both show up as a wrong answer.
Failure 1: Retrieval failure. The right document was not retrieved, so the generator has no relevant evidence to ground its answer and either guesses or hallucinates.
Failure 2: Generation failure. The right documents were retrieved, but the generator ignored or misread them.
If you only measure final answer quality, you can't tell which one broke — or fix it efficiently. Evaluate retrieval and generation separately.
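One way to make this split concrete is to tag each wrong answer with the stage that most likely caused it. The sketch below is a minimal illustration, not a standard API: the `EvalExample` fields, the `diagnose` function, and the rule "retrieval counts as a hit if any gold document appears in the top k" are all assumptions chosen for this example. It also assumes you already have labeled gold documents and a separate correctness judgment for each answer.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Set


@dataclass
class EvalExample:
    """One evaluated question (hypothetical schema for illustration)."""
    question: str
    gold_doc_ids: Set[str]        # documents known to contain the answer
    retrieved_doc_ids: List[str]  # retriever output, best match first
    answer_correct: bool          # judged separately (human or grader)


def diagnose(example: EvalExample, k: int = 5) -> str:
    """Attribute a wrong answer to the retrieval or the generation step."""
    if example.answer_correct:
        return "ok"
    top_k = set(example.retrieved_doc_ids[:k])
    # Assumption: retrieval succeeded if any gold document is in the top k.
    if not (top_k & example.gold_doc_ids):
        return "retrieval_failure"
    # The evidence was available, so the generator is the likely culprit.
    return "generation_failure"


def summarize(examples: List[EvalExample], k: int = 5) -> Counter:
    """Count outcomes across an evaluation set."""
    return Counter(diagnose(ex, k) for ex in examples)


examples = [
    EvalExample("q1", {"d1"}, ["d1", "d4"], True),
    EvalExample("q2", {"d2"}, ["d7", "d8"], False),
    EvalExample("q3", {"d3"}, ["d3", "d9"], False),
]
print(summarize(examples))
```

A tally dominated by `retrieval_failure` points at indexing, chunking, or the retriever; one dominated by `generation_failure` points at the prompt or the model. Note that this simple rule treats a correct answer as "ok" even when retrieval missed, so it can hide lucky guesses.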