Retrieval-Augmented Generation at Scale: Lessons Learned

We process millions of documents across our products. Here are the lessons about chunking strategies, embedding models, and hybrid retrieval that the benchmarks don't tell you.

RAG is harder than the demos suggest. At our scale (millions of documents across thirty products), small design choices compound into large differences in retrieval quality.

This post covers the practices that mattered most for us: chunking that respects semantic boundaries, hybrid retrieval that fuses BM25 with embedding search, re-rankers that actually earn their latency cost, and the evaluation harness that keeps us honest.