Your Document Assistant — Local PDF Question-Answering App

A simple, fully local application that lets you upload PDF documents and ask questions about them using AI — all running on your own computer. No data ever leaves your machine.

What It Does

Drop PDFs — Place files in the pdf/ folder or upload through the browser
Select documents — Choose which files the assistant should read
Ask questions — Get answers drawn directly from your selected documents

Tech Stack

Component	Technology	Purpose
Frontend	Streamlit	Web-based user interface
Vector Database	Qdrant	Stores and searches document content
AI Models	Ollama (llama3 + nomic-embed-text)	Understands questions and generates answers
Orchestration	LangChain (LCEL)	Connects all components together

Everything runs locally via Docker — no API keys, no cloud services, no internet required after setup.

Hardware Requirements

Resource	Minimum	Recommended
RAM	8 GB	16 GB+
Disk	10 GB free	20 GB free
CPU	4 cores	8+ cores
GPU	Not required	NVIDIA GPU speeds up responses significantly

Note: The llama3 model is ~4.7 GB. First-time setup downloads ~6 GB total.

Setup & Installation

Prerequisites

Docker Desktop installed and running
Git (to clone this repository)

Step 1: Clone the Repository

git clone <repository-url>
cd pythonLLM1

Step 2: Start All Services

docker compose up --build -d

This starts three services:

Qdrant (vector database) on port 6333
Ollama (AI models) on port 11434
Streamlit app (user interface) on port 8501
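
To confirm that all three services came up, a quick reachability check can help. The following is a minimal sketch using only the Python standard library. The helper names `port_open` and `check_services` are illustrative and not part of this repository, and the sketch assumes the services are published on localhost at the ports listed above.

```python
import socket

# Ports taken from the list above; the labels are only used for display.
SERVICES = {"Qdrant": 6333, "Ollama": 11434, "Streamlit": 8501}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'reachable' if up else 'not reachable'}")
```

An open port only shows that the container is listening. It does not prove that the models have been pulled or that the app has finished loading.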

Step 3: Pull the Required AI Models

Run once after first startup:

# Language model (~4.7 GB)
docker compose exec ollama ollama pull llama3

# Embedding model (~274 MB)
docker compose exec ollama ollama pull nomic-embed-text

Models are stored in a Docker volume and persist across restarts.
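
One way to verify that both models are present is to query Ollama's `/api/tags` endpoint, which lists the models available locally. The sketch below is illustrative: `missing_models` and `fetch_tags` are hypothetical helper names, and the base URL assumes Ollama is reachable on localhost at port 11434 as described in Step 2.

```python
import json
import urllib.request

REQUIRED_MODELS = ("llama3", "nomic-embed-text")


def missing_models(tags_response: dict, required=REQUIRED_MODELS) -> list[str]:
    """Return the required models that do not appear in an /api/tags response.

    Ollama reports names with a tag suffix (for example "llama3:latest"),
    so only the part before the colon is compared.
    """
    installed = {m["name"].split(":")[0] for m in tags_response.get("models", [])}
    return [name for name in required if name not in installed]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of local models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

With the stack running, `missing_models(fetch_tags())` returns an empty list once both pulls have completed.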

Step 4: Open the App

http://localhost:8501

Usage

Adding Documents

Option A — Folder drop (recommended): Copy any PDF into the pdf/ folder in this directory. Refresh the app and it loads automatically.

Option B — Upload: Use the file uploader in Step 1 of the app.

Asking Questions

Select which documents to search (Step 2)
Type your question in the chat box (Step 3)
The assistant answers using only your selected documents and shows which file(s) the answer came from

Stopping the App

docker compose down

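To also delete the Docker volumes that hold the indexed document data and the downloaded models (PDF files in the host pdf/ folder are not removed):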

docker compose down -v

GPU Support (Optional)

For faster responses with an NVIDIA GPU:

Install the NVIDIA Container Toolkit
Uncomment the GPU section in docker-compose.yml under the ollama service
Restart: docker compose up -d

LangChain Architecture

This app uses modern LangChain (LCEL-based) throughout:

Component	What it does
`SemanticChunker`	Splits PDFs at meaning boundaries instead of fixed character counts
`MultiQueryRetriever`	Generates multiple phrasings of each question to improve search recall
`create_history_aware_retriever`	Reformulates questions using chat history before searching
`create_retrieval_chain` (LCEL)	Composes retrieval + answer generation (replaces legacy chain classes such as `RetrievalQA` and `ConversationalRetrievalChain`)
`RunnableWithMessageHistory`	Manages conversation memory automatically
`ChatOllama`	Uses Ollama's chat interface for proper system prompts and roles

See scope.md for full architecture details.

Developer Notes

Qdrant Dashboard: http://localhost:6333/dashboard
Metadata filtering: Each chunk stores source_filename. Qdrant MatchAny filters restrict search to user-selected documents.
PDF folder: ./pdf/ on the host maps to /app/pdf inside the container
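
The metadata filter described above can be sketched as a plain dictionary in the shape Qdrant's REST API expects for a match-any condition. This is an illustration only: the function name `selected_docs_filter` is hypothetical, and the payload key `metadata.source_filename` assumes the chunks were written through LangChain's Qdrant integration with document metadata nested under a `metadata` payload key. Check app.py for the key actually used.

```python
def selected_docs_filter(filenames, key="metadata.source_filename"):
    """Build a Qdrant filter that matches chunks from any of the given files.

    The default `key` is an assumption about how the payload is laid out.
    """
    if not filenames:
        raise ValueError("select at least one document")
    return {
        "must": [
            # "any" matches when the stored value equals one of the listed values.
            {"key": key, "match": {"any": sorted(set(filenames))}}
        ]
    }
```

With the Python client, the same condition is expressed with the `Filter`, `FieldCondition`, and `MatchAny` models from `qdrant_client`.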

Project Structure

├── app.py               # Streamlit app + LangChain LCEL orchestration
├── requirements.txt     # Python dependencies
├── Dockerfile           # Container for the Streamlit app
├── docker-compose.yml   # Orchestrates all services
├── pdf/                 # Drop PDFs here — auto-loaded on startup
├── scope.md             # Architecture and feature scope
├── CLAUDE.md            # Project context for Claude Code
└── README.md            # This file

Things You Might Want to Know

What AI models does this app use?

Two models run locally via Ollama:

Model	Role	Size
`llama3`	Answers your questions and generates query variants for better search	~4.7 GB
`nomic-embed-text`	Converts text into vectors for storage and search in Qdrant	~274 MB

Both are configured in app.py and pulled with:

docker compose exec ollama ollama pull llama3
docker compose exec ollama ollama pull nomic-embed-text

Where is the LangChain code and what does each part do?

All LangChain logic lives in app.py. Here's where each function is and what it does:

Function	What it does
`get_embeddings()`	Loads `OllamaEmbeddings` — converts text to vectors
`get_llm()`	Loads `ChatOllama` — the language model
`ingest_pdf_from_path()`	Uses `SemanticChunker` + `QdrantVectorStore` to split and store a PDF
`ingest_pdf()`	Handles uploaded files, delegates to `ingest_pdf_from_path`
`build_chain()`	Assembles the full LCEL chain from the inside out, each component wrapping the previous one: `MultiQueryRetriever` → `history_aware_retriever` → `create_retrieval_chain` → `RunnableWithMessageHistory`

build_chain() is the brain of the app — everything else is plumbing or UI.

How LangChain is used (four jobs):

Ingestion pipeline — SemanticChunker splits PDFs at meaning boundaries (not arbitrary character counts), OllamaEmbeddings converts chunks to vectors, QdrantVectorStore stores them.
Smart retrieval — MultiQueryRetriever rephrases your question 3 ways, searches Qdrant for each, then deduplicates results.
Conversation awareness — create_history_aware_retriever rewrites your question as standalone before searching, so follow-up questions like "Who wrote it?" work correctly.
Memory management — RunnableWithMessageHistory automatically saves every question and answer into InMemoryChatMessageHistory — no manual history tracking needed.
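
The deduplication step in smart retrieval can be illustrated without any LangChain dependency. The sketch below is not the library's implementation. It is a simplified stand-in that merges the result lists from several phrasings and keeps the first occurrence of each chunk. The `search` callable and the `(source_filename, text)` tuple shape are assumptions made for the example.

```python
from typing import Callable, Iterable

Chunk = tuple[str, str]  # (source_filename, text), an assumed shape


def multi_query_search(
    phrasings: Iterable[str],
    search: Callable[[str], list[Chunk]],
) -> list[Chunk]:
    """Run `search` once per phrasing and return the unique union of results."""
    seen: set[Chunk] = set()
    merged: list[Chunk] = []
    for query in phrasings:
        for chunk in search(query):
            if chunk not in seen:  # keep the first occurrence, preserve order
                seen.add(chunk)
                merged.append(chunk)
    return merged
```

In the real app the phrasings come from the language model and `search` is a vector search against Qdrant. Here both are left to the caller so the merging logic can be read on its own.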

The full request flow:

Your question
    → RunnableWithMessageHistory  (injects chat history)
    → history_aware_retriever     (rewrites as standalone question)
    → MultiQueryRetriever         (searches Qdrant with 3 phrasings)
    → create_stuff_documents_chain (feeds docs + history to ChatOllama)
    → Answer + history auto-updated

Can this app handle images?

Not currently — but it can be expanded. There are two scenarios:

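Images inside PDFs (charts, scanned pages): Swap PyPDFLoader for a loader built on the unstructured library, such as UnstructuredPDFLoader (from langchain-community) or UnstructuredLoader (from langchain-unstructured), which can OCR embedded images. Add pytesseract for OCR; it is a Python wrapper, so the Tesseract engine itself must also be installed in the container.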

Standalone image files (JPG, PNG): This requires a vision-language model and the following additions:

Addition	Why
`llava` model via Ollama (`ollama pull llava`)	Reads and describes image content
Updated ingest function	Image → vision model → text description → embed → store in Qdrant
File uploader updated to accept `jpg`, `png`	UI change
`Pillow` dependency	Image file handling

The Qdrant + retrieval pipeline stays the same because you're ultimately storing text descriptions of images. The new step is: image → vision model → description → store as text.
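
The image → description step could look roughly like the sketch below, which calls Ollama's `/api/generate` endpoint with a base64-encoded image. It is a hedged illustration, not code from this repository: the function names, the prompt text, and the localhost URL are assumptions, and it presumes `llava` has already been pulled.

```python
import base64
import json
import urllib.request


def build_vision_request(image_bytes: bytes, prompt: str, model: str = "llava") -> dict:
    """Build the JSON body for Ollama's /api/generate with one attached image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON response instead of a stream
    }


def describe_image(path: str, base_url: str = "http://localhost:11434") -> str:
    """Send an image file to the vision model and return its text description."""
    with open(path, "rb") as fh:
        body = build_vision_request(fh.read(), "Describe this image in detail.")
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["response"]
```

The returned description is ordinary text, so it can be embedded and stored the same way as a PDF chunk.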

How does this app work? (For developers new to AI)

Think of it like a smart librarian for your own private files.

Setup (one-time): You drop PDFs into a folder. The app reads every page and breaks it into chunks — not by counting characters, but by detecting where the topic changes. Each chunk gets converted into a list of numbers called a vector (think GPS coordinates for meaning — similar ideas have similar coordinates). All vectors are stored in Qdrant, a searchable database of meaning.
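
To make "detecting where the topic changes" concrete, here is a toy sketch of the idea. It is not how `SemanticChunker` is implemented: a fixed distance threshold stands in for the library's own breakpoint selection, and the sentence vectors are supplied by the caller rather than computed by an embedding model. All names are illustrative.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def split_on_topic_change(sentences, vectors, threshold=0.5):
    """Group consecutive sentences into chunks.

    A new chunk starts whenever the distance between neighbouring
    sentence vectors exceeds `threshold`.
    """
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev_vec, vec, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine_distance(prev_vec, vec) > threshold:
            chunks.append([])  # topic shift: open a new chunk
        chunks[-1].append(sentence)
    return [" ".join(chunk) for chunk in chunks]
```

Two sentences about the same topic have nearby vectors and stay together. A sentence whose vector points in a very different direction starts the next chunk.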

When you ask a question: Your question becomes its own vector. The app finds chunks in Qdrant whose coordinates are closest to your question — that's how it finds relevant paragraphs without keyword matching. It does this 3 times with different phrasings to catch things a single search might miss. The relevant paragraphs go to the language model (llama3), which reads them and writes a natural-language answer using only what's in your documents.
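
The "closest coordinates" lookup can be sketched in a few lines of plain Python. This is a brute-force illustration with made-up two-dimensional vectors. Qdrant does the same kind of comparison at scale using an index instead of scanning every chunk, and real embeddings have hundreds of dimensions. The function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(question_vec, chunks, k=2):
    """Return the texts of the k chunks most similar to the question.

    `chunks` is a list of (text, vector) pairs.
    """
    ranked = sorted(
        chunks,
        key=lambda item: cosine_similarity(question_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]
```

The texts returned by `top_k` play the role of the "relevant paragraphs" that are handed to the language model.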

Why Docker? The app needs three programs running simultaneously (Streamlit, Qdrant, Ollama). Docker starts all three with one command and wires them together.

Top 3 examples of how to use it:

Insurance or legal documents — Drop your health insurance policy or lease agreement. Ask "Does my plan cover physiotherapy?" or "What is the notice period if I want to leave?" Get the exact answer without reading 40 pages of fine print.
Study tool for course materials — Drop lecture PDFs or textbook chapters. Ask "Explain the difference between TCP and UDP" or "What were the main causes of World War I according to chapter 3?" Answers come only from your course material — not generic internet content.
Work contracts or HR documents — Drop an employment contract or company handbook. Ask "How many vacation days am I entitled to?" or "What is the remote work policy?" The assistant shows which file the answer came from, so you can check the relevant clause yourself.
