Lesson 4 · 13 min
Long document processing patterns
Process documents that exceed the context window with chunking, map-reduce, and hierarchical summarization — without losing coherence.
When the document is bigger than the window
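A 500-page legal contract runs to roughly 400k tokens, and a large codebase can reach millions. Even with a 1M-token context window, naively dumping whole documents into a single prompt degrades output quality and gets expensive, because every request pays for every input token.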
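To make the sizing above concrete, here is a rough back-of-the-envelope sketch. The figures of 600 words per page and 4/3 tokens per word are illustrative assumptions, not measured values, and the helper names `estimate_tokens` and `chunks_needed` are hypothetical. Real counts depend on the tokenizer and the document's formatting.

```python
import math


def estimate_tokens(pages: int, words_per_page: int = 600,
                    tokens_per_word: float = 4 / 3) -> int:
    """Rough token estimate from a page count.

    words_per_page and tokens_per_word are assumed heuristics;
    use a real tokenizer when you need an exact count.
    """
    return round(pages * words_per_page * tokens_per_word)


def chunks_needed(total_tokens: int, window_tokens: int,
                  reserved_tokens: int = 0) -> int:
    """Minimum number of chunks so that each one fits in the window.

    reserved_tokens is the space set aside for instructions and the
    model's output (an assumed budget, tune it for your prompts).
    """
    usable = window_tokens - reserved_tokens
    if usable <= 0:
        raise ValueError("reserved_tokens must be smaller than window_tokens")
    return math.ceil(total_tokens / usable)


contract_tokens = estimate_tokens(500)
print(contract_tokens)  # 400000

# Hypothetical 200k-token window with 8k tokens reserved for prompt and output.
print(chunks_needed(contract_tokens, window_tokens=200_000, reserved_tokens=8_000))  # 3

# With a 1M-token window the contract fits in one request, but fitting
# says nothing about answer quality or cost.
print(chunks_needed(contract_tokens, window_tokens=1_000_000))  # 1
```

Fitting inside the window is only the minimum requirement. The estimate helps decide when one of the patterns in this lesson is needed, not whether a single large prompt will perform well.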
Three patterns handle large documents reliably: