Lesson 4 · 10 min

Chunking strategies for high-quality embeddings

How you chunk a document before embedding determines what the embedding captures. The wrong chunking strategy degrades retrieval quality regardless of how good the embedding model is.

Why chunking matters for embedding quality

An embedding model compresses a text chunk into a single vector. That vector must capture the chunk's semantic content. Two problems emerge from bad chunking:

Too large: A 2,000-word chunk on "machine learning" can easily span 40 different sub-topics. The embedding becomes a blurred average of all of them. When a user asks about gradient descent specifically, the chunk's embedding is diluted by all the other content, and retrieval misses it.

Too small: A one-sentence chunk often lacks enough context to embed meaningfully. "The threshold is set to 0.5" means nothing without surrounding context explaining which threshold it is and what it applies to.
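
The dilution effect is easy to see with a toy model. The sketch below uses random unit vectors as stand-ins for sub-topic embeddings (not real model outputs) and treats the oversized chunk's embedding as the normalized average of its parts, which is a simplification, but it shows how similarity to a specific query collapses:

```python
import numpy as np

rng = np.random.default_rng(42)

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

# Stand-ins for the embeddings of 40 distinct sub-topics packed into
# one oversized chunk (random vectors, not real embeddings).
topics = np.array([unit(rng.normal(size=384)) for _ in range(40)])

# Approximate the big chunk's embedding as the average of its parts.
big_chunk = unit(topics.mean(axis=0))

# A query about one specific sub-topic, e.g. gradient descent.
query = topics[0]

print("focused chunk:", round(float(query @ topics[0]), 3))   # 1.0
print("diluted chunk:", round(float(query @ big_chunk), 3))   # much lower
```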

The optimal chunk size is task-dependent. A good starting point: 256–512 tokens for Q&A retrieval, with sentence boundaries respected.
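
As a concrete starting point, here is a minimal sketch of a sentence-aware chunker under those assumptions. The function name, the regex-based sentence splitter, and the whitespace word count standing in for a real tokenizer are all illustrative choices, not any particular library's API:

```python
import re

def chunk_by_sentences(text: str, max_tokens: int = 512) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens.

    Token counts are approximated by whitespace-split words; swap in the
    embedding model's own tokenizer for accurate budgets.
    """
    # Naive sentence split on ., !, or ? followed by whitespace; use a
    # real sentence splitter (nltk, spaCy) for messy production text.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Flush the current chunk if adding this sentence would overflow
        # it. A single sentence longer than max_tokens stays whole and
        # becomes its own chunk.
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

# Example: aim for the 256-512 token range suggested above.
doc = "Gradient descent updates parameters iteratively. " * 200
for i, chunk in enumerate(chunk_by_sentences(doc, max_tokens=512)):
    print(i, len(chunk.split()))
```

Greedy packing keeps every sentence intact at the cost of somewhat uneven chunk sizes, which is usually the right trade-off for retrieval.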