A demo RAG pipeline takes a weekend. A production one takes months — and most of the cost is in the parts you can’t see in a notebook: drift detection, eval suites, caching strategy, and what happens when retrieval gets it wrong in front of a user.

Chunking is not preprocessing — it’s product design

Most RAG pipelines fail because the chunking strategy was treated as a one-line decision…

If your model can’t cite the chunk its answer came from, you can’t debug a hallucination…
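One way to make citations possible is to give every chunk a stable ID and keep its source offsets, so a cited ID can always be traced back to exact text. The sketch below is a minimal illustration, not a recommended chunking strategy: the fixed-size character window, the `Chunk` fields, and the helper names (`chunk_document`, `format_context`, `resolve_citation`) are all assumptions made for the example.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str  # stable ID derived from the source document, offsets, and text
    doc_id: str
    start: int     # character offset where the chunk begins in the source
    end: int       # character offset where the chunk ends (exclusive)
    text: str


def chunk_document(doc_id: str, text: str, size: int = 200, overlap: int = 50) -> list[Chunk]:
    """Split text into overlapping fixed-size windows, keeping provenance.

    Hypothetical helper: a real pipeline would likely split on structure
    (headings, paragraphs) rather than raw character counts.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    chunks: list[Chunk] = []
    step = size - overlap
    for start in range(0, len(text), step):
        end = min(start + size, len(text))
        body = text[start:end]
        key = f"{doc_id}:{start}:{end}:{body}".encode("utf-8")
        chunk_id = hashlib.sha256(key).hexdigest()[:12]
        chunks.append(Chunk(chunk_id, doc_id, start, end, body))
        if end == len(text):
            break  # the last window already reached the end of the document
    return chunks


def format_context(chunks: list[Chunk]) -> str:
    """Render retrieved chunks with their IDs so the model can cite them."""
    return "\n\n".join(f"[{c.chunk_id}] {c.text}" for c in chunks)


def resolve_citation(chunk_id: str, index: dict[str, Chunk]) -> Chunk | None:
    """Map a cited ID back to its chunk; None means the citation is unverifiable."""
    return index.get(chunk_id)
```

With this in place, a cited ID that `resolve_citation` cannot find is itself a debugging signal: the model referenced something that was never in its context.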

The eval harness is the real codebase

Models change. Prompts change. Embeddings change. Without an automated regression suite you’ll ship silently broken retrieval and never know…
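A retrieval regression check can be as small as a set of golden queries, a metric, and a stored baseline. The sketch below assumes recall@k as the metric and a `retrieve` callable that returns ranked chunk IDs for a query. The `GoldenCase` type, the function names, and the tolerance value are hypothetical choices for illustration, not part of any particular framework.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Sequence


@dataclass(frozen=True)
class GoldenCase:
    query: str
    relevant_ids: FrozenSet[str]  # chunk IDs a correct retrieval should surface


def recall_at_k(retrieved: Sequence[str], relevant: FrozenSet[str], k: int) -> float:
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 1.0  # nothing to find, so nothing was missed
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(
    retrieve: Callable[[str], Sequence[str]],
    cases: Sequence[GoldenCase],
    k: int = 5,
) -> float:
    """Average recall@k over the golden set."""
    if not cases:
        raise ValueError("golden set is empty")
    total = sum(recall_at_k(retrieve(c.query), c.relevant_ids, k) for c in cases)
    return total / len(cases)


def passes_regression(score: float, baseline: float, tolerance: float = 0.02) -> bool:
    """True if the score has not dropped more than `tolerance` below the baseline."""
    return score >= baseline - tolerance
```

Run in CI against every model, prompt, or embedding change, a check like this turns a silent drop in retrieval quality into a failed build. The baseline would be the score recorded from the last known-good configuration.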