angela shi, Author at Towards Data Science

Amplify the Expert: A Philosophy for Building Enterprise RAG

Large Language Model

Enterprise Document Intelligence [Vol.1 #M1] – The thesis behind every architectural choice in this series

angela shi

June 26, 2026

20 min read

An LLM as arbiter in RAG retrieval: picking the right candidate with reasons

Large Language Models

Enterprise Document Intelligence [Vol.1 #7C] – One LLM call ranks the candidates with reasons. The…

angela shi

June 25, 2026

31 min read

Finding the right anchors for RAG: keyword, embedding, and TOC signals in parallel

Large Language Models

Enterprise Document Intelligence [Vol.1 #7B] – Retrieval is filtering on structured tables: keywords first, TOC…

angela shi

June 24, 2026

33 min read

Retrieval Is Filtering, Not Search: A Mental Model for Enterprise RAG

Large Language Models

Enterprise Document Intelligence [Vol.1 #7A] – Stop searching strings. Filter line_df and toc_df. Pick anchors…

angela shi

June 23, 2026

21 min read

When RAG Users Ask Vague Questions: Clarify Once, Learn the Default

Large Language Models

Enterprise Document Intelligence [Vol.1 #6bis] – Ask one focused clarification, learn the default from the…

angela shi

June 22, 2026

11 min read

Dispatching the Parsed RAG Question: Chunk Strategy, Model Tier, Activations, Audit

Large Language Models

Enterprise Document Intelligence [Vol.1 #6c] – The decisions the parser makes on top of the…

angela shi

June 18, 2026

28 min read

Five fields RAG should extract from any question: keywords, scope, shape, decomposition, clarification

Large Language Models

Enterprise Document Intelligence [Vol.1 #6b] – The five field families the parser reads straight from…

angela shi

June 17, 2026

31 min read

Parse the question before you search: the missing step in most RAG pipelines

Large Language Models

Enterprise Document Intelligence [Vol.1 #6a] – Why a user question deserves the same parsing as…

angela shi

June 16, 2026

13 min read

From Regex to Vision Models: Which RAG Technique Fits Which Problem

Large Language Models

Enterprise Document Intelligence [Vol.1 #4] – A diagnostic across PDFs and questions, and a map…

angela shi

June 2, 2026

23 min read

RAG Is Not Machine Learning, and the ML Toolkit Solves the Wrong Problem

Large Language Models

Enterprise Document Intelligence [Vol.1 #3] – Why the ML toolkit (hyperparameter sweeps, train/test splits, explainability…

angela shi

June 1, 2026

30 min read