Building Production RAG Pipelines: Lessons from Real Deployments
What actually breaks once you leave the demo notebook.
A demo RAG pipeline takes a weekend. A production one takes months — and most of the cost is in the parts you can’t see in a notebook: drift detection, eval suites, caching strategy, and what happens when retrieval gets it wrong in front of a user.
Chunking is not preprocessing — it’s product design
Most RAG pipelines fail because the chunking strategy was treated as a one-line decision…
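The section doesn't spell out which chunking strategy it recommends. To illustrate chunking as a design decision rather than a fixed-size split, here is a minimal sketch that packs whole paragraphs into chunks instead of cutting at an arbitrary character count. `Chunk`, `chunk_by_paragraph`, and `max_words` are hypothetical names, and it assumes word count is an acceptable stand-in for the embedding model's token count.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str  # source document id, kept so answers can cite their origin
    index: int   # position of this chunk within its document
    text: str


def chunk_by_paragraph(doc_id: str, text: str, max_words: int = 200) -> list[Chunk]:
    """Greedily pack whole paragraphs into chunks of at most max_words words.

    Paragraphs are split on blank lines and never cut in half, so every
    chunk ends on a natural boundary. A single paragraph longer than
    max_words becomes its own oversized chunk rather than being truncated.
    Word count is a rough proxy (an assumption) for real token counts.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[Chunk] = []
    current: list[str] = []
    current_words = 0

    for para in paragraphs:
        words = len(para.split())
        # Flush the running chunk if adding this paragraph would overflow it.
        if current and current_words + words > max_words:
            chunks.append(Chunk(doc_id, len(chunks), "\n\n".join(current)))
            current, current_words = [], 0
        current.append(para)
        current_words += words

    # Emit whatever is left after the last paragraph.
    if current:
        chunks.append(Chunk(doc_id, len(chunks), "\n\n".join(current)))
    return chunks


doc = "Refunds take 5 days.\n\nShipping is free over $50.\n\nReturns need a receipt."
for c in chunk_by_paragraph("faq.md", doc, max_words=10):
    print(c.index, repr(c.text))
```

With `max_words=10` the first two paragraphs (9 words) share a chunk and the third starts a new one. Boundary rules, size limits, and whether to add overlap between chunks are the product-level knobs the heading is pointing at. Carrying `doc_id` and `index` on every chunk also matters for the next section.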
Retrieval without grounding is just search
If your model can’t cite the chunk it pulled from, you can’t debug a hallucination…
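One way to make citations debuggable, sketched here under assumptions rather than as the author's setup: label each retrieved chunk with a stable id in the prompt, ask the model to cite those ids in square brackets, then check every citation in the answer against what was actually retrieved. `Retrieved`, `build_prompt`, `check_citations`, and the `doc#index` id format are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Retrieved:
    chunk_id: str  # hypothetical stable id, e.g. "faq.md#0"
    text: str


def build_prompt(question: str, chunks: list[Retrieved]) -> str:
    """Label each retrieved chunk so the model can cite it as [chunk_id]."""
    context = "\n\n".join(f"[{c.chunk_id}]\n{c.text}" for c in chunks)
    return (
        "Answer using only the sources below. After each claim, cite the "
        "source id in square brackets, e.g. [faq.md#0].\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Matches the contents of any [ ... ] span that has no nested brackets.
CITATION = re.compile(r"\[([^\[\]]+)\]")


def check_citations(answer: str, chunks: list[Retrieved]) -> dict[str, list[str]]:
    """Split the answer's citations into ones that match a retrieved chunk
    and ones that don't (a likely hallucinated source)."""
    retrieved_ids = {c.chunk_id for c in chunks}
    cited = CITATION.findall(answer)
    return {
        "valid": [c for c in cited if c in retrieved_ids],
        "unknown": [c for c in cited if c not in retrieved_ids],
    }


retrieved = [
    Retrieved("faq.md#0", "Refunds take 5 days."),
    Retrieved("faq.md#1", "Returns need a receipt."),
]
answer = "Refunds take 5 days [faq.md#0] and ship free [policy.md#3]."
print(check_citations(answer, retrieved))
```

Here `policy.md#3` lands in `unknown` because it was never retrieved, which points the investigation at generation rather than retrieval. An answer with no citations at all returns two empty lists, which is worth flagging too.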
The eval harness is the real codebase
Models change. Prompts change. Embeddings change. Without an automated regression suite, you’ll ship silently broken retrieval and never know…
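The section doesn't name a metric, so as a minimal sketch: score retrieval with recall@k against a small hand-labeled golden set, and fail the build when the score drops below a recorded baseline. `recall_at_k`, `check_regression`, the golden set, and the tolerance value are hypothetical, and `keyword_retriever` is a toy stand-in for a real vector search.

```python
from typing import Callable

# A retriever maps a query to a ranked list of chunk ids.
Retriever = Callable[[str], list[str]]


def recall_at_k(retriever: Retriever, golden: list[tuple[str, set[str]]], k: int = 5) -> float:
    """Mean fraction of each query's expected chunk ids found in the top k."""
    scores = []
    for query, expected in golden:
        top_k = set(retriever(query)[:k])
        scores.append(len(top_k & expected) / len(expected))
    return sum(scores) / len(scores)


def check_regression(retriever: Retriever, golden: list[tuple[str, set[str]]],
                     baseline: float, k: int = 5, tolerance: float = 0.02) -> float:
    """Raise if recall@k falls more than `tolerance` below the baseline."""
    score = recall_at_k(retriever, golden, k)
    if score < baseline - tolerance:
        raise AssertionError(f"recall@{k} dropped to {score:.3f} (baseline {baseline:.3f})")
    return score


# Toy index and retriever, standing in for a real embedding search.
INDEX = {
    "faq.md#0": "refunds take 5 days",
    "faq.md#1": "returns need a receipt",
    "faq.md#2": "shipping is free over $50",
}


def keyword_retriever(query: str) -> list[str]:
    # Rank chunks by how many words they share with the query (ties keep index order).
    q = set(query.lower().split())
    return sorted(INDEX, key=lambda cid: -len(q & set(INDEX[cid].split())))


GOLDEN = [
    ("how long do refunds take", {"faq.md#0"}),
    ("do i need a receipt for returns", {"faq.md#1"}),
]
print(check_regression(keyword_retriever, GOLDEN, baseline=1.0, k=1))
```

Running this in CI whenever the model, prompt, or embedding changes turns "silently broken" into a failing check. Swapping in a different retriever is one argument, which keeps the golden set as the stable part of the harness.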