
Enterprise Document Intelligence [Vol.1 #7bis] – Tobi Lütke and Andrej Karpathy named the practice in…
19 min read

Reconstructing the Table of Contents a PDF Forgot to Ship, So RAG Can Scope by Section
Enterprise Document Intelligence [Vol.1 #5septies] – When a PDF prints a contents page but exposes…
14 min read

Enterprise Document Intelligence [Vol.1 #5sexies] – image_df tells you where every picture is. Turning the…
17 min read

Parse Scanned PDFs for RAG with EasyOCR: Free OCR Gives You Words, Not a Document
Enterprise Document Intelligence [Vol.1 #5quinquies] – Same 1974 scanned PDF, two engines. EasyOCR recovers text.…
15 min read

Enterprise Document Intelligence [Vol.1 #5quater] – The other parsers read the words on a page.…
15 min read

Enterprise Document Intelligence [Vol.1 #5ter] – Table cells, OCR, captions, headings: cloud-grade structure, running on…
19 min read

Enterprise Document Intelligence [Vol.1 #5bis] – The same relational tables. Native table cells. OCR for…
16 min read

Enterprise Document Intelligence [Vol.1 #5B] – One PDF in, a relational set of DataFrames out:…
29 min read

Enterprise Document Intelligence [Vol.1 #5A] – Document signals (metadata, native TOC, source software) and page-level…
23 min read

Enterprise Document Intelligence [Vol.1 #4bis] – A coauthor note on the brick-by-brick pitfalls that justified…
28 min read