Mykhailo Bryndzak breaks down the engineering decisions that actually matter in a production RAG system: chunking, the OCR pipeline, hybrid search, and citations. The parts tutorials skip because they're hard.
I spent the last few months at SYNDICODE building a production RAG system for one of our clients. Not a notebook demo. A real one: multi-tenant, used daily by domain experts, with answers that move money.

I want to write up what I actually learned. The parts that were hard.

The business problem the system solves is one I've already seen in three different industries: a team has thousands of pages of dense, scanned, table-heavy documents, and the answers their people need every day are buried inside. Today they search by hand. It takes hours. Sometimes they get it wrong, and a wrong answer is expensive.

The economics are brutally simple:
→ A specialist who finds the right reference in 10 seconds instead of an hour bills the next project sooner.
→ A wrong number on a deliverable can mean rework, a fine, or a failed audit.
→ Tribal knowledge stops walking out the door every time someone retires.

The system I built lets users upload a PDF (or DOCX, or XLSX) and chat with it. Every answer cites the exact source page so a human can verify it. It runs on cloud infra (managed Postgres with vector search, a small container service, object storage with direct browser-to-storage uploads), costs a few dollars a day per active user, and handles tables, OCR errors, and multi-turn conversation.

Over the next few weeks I'll break down the parts that were actually hard, in roughly the order a request flows through the system:
→ upload that doesn't melt
→ table-aware chunking
→ OCR-aware pipeline
→ hybrid search with RRF
→ domain glossary, used twice
→ citations + page highlight
→ conversational rephrase
→ pgvector, not Pinecone
→ intent routing off the LLM
→ cross-doc dedup, same vectors

If you're building anything that turns documents into answers (for legal, support, finance, healthcare, or internal knowledge), these are the same problems you'll hit. I'd rather you hit them with a playbook.

What's the most broken part of the standard RAG stack you've had to fix for production?
#RAG #LLM #AIEngineering