A simple, fully local application that lets you upload PDF documents and ask questions about them using AI — all running on your own computer. No data ever leaves your machine.
- Drop PDFs: Place files in the `pdf/` folder or upload through the browser
- Select documents: Choose which files the assistant should read
- Ask questions: Get answers drawn directly from your selected documents
| Component | Technology | Purpose |
|---|---|---|
| Frontend | Streamlit | Web-based user interface |
| Vector Database | Qdrant | Stores and searches document content |
| AI Models | Ollama (llama3 + nomic-embed-text) | Understands questions and generates answers |
| Orchestration | LangChain (LCEL) | Connects all components together |
Everything runs locally via Docker — no API keys, no cloud services, no internet required after setup.
| Resource | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB+ |
| Disk | 10 GB free | 20 GB free |
| CPU | 4 cores | 8+ cores |
| GPU | Not required | NVIDIA GPU speeds up responses significantly |
Note: The llama3 model is ~4.7 GB. First-time setup downloads ~6 GB total.
- Docker Desktop installed and running
- Git (to clone this repository)

    git clone <repository-url>
    cd pythonLLM1
    docker compose up --build -d

This starts three services:
- Qdrant (vector database) on port 6333
- Ollama (AI models) on port 11434
- Streamlit app (user interface) on port 8501
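
To confirm that the three services are reachable after startup, a small check along these lines may help. It is a minimal sketch that assumes the default ports listed above are published on localhost; `SERVICES`, `is_port_open`, and `check_services` are illustrative names and not part of app.py.

```python
import socket

# Default ports from the service list above; adjust if docker-compose.yml maps them differently.
SERVICES = {"Qdrant": 6333, "Ollama": 11434, "Streamlit": 8501}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'up' if up else 'not reachable'}")
```

An open port only shows that the container is listening; it does not prove that the models have been pulled yet.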
Run once after first startup:

    # Language model (~4.7 GB)
    docker compose exec ollama ollama pull llama3
    # Embedding model (~274 MB)
    docker compose exec ollama ollama pull nomic-embed-text

Models are stored in a Docker volume and persist across restarts.
http://localhost:8501
Option A — Folder drop (recommended):
Copy any PDF into the pdf/ folder in this directory. Refresh the app and it loads automatically.
Option B — Upload: Use the file uploader in Step 1 of the app.
- Select which documents to search (Step 2)
- Type your question in the chat box (Step 3)
- The assistant answers using only your selected documents and shows which file(s) the answer came from

    docker compose down

To also delete all stored documents and models:

    docker compose down -v

For faster responses with an NVIDIA GPU:
- Install the NVIDIA Container Toolkit
- Uncomment the GPU section in `docker-compose.yml` under the `ollama` service
- Restart:

      docker compose up -d
This app uses modern LangChain (LCEL-based) throughout:
| Component | What it does |
|---|---|
| `SemanticChunker` | Splits PDFs at meaning boundaries instead of fixed character counts |
| `MultiQueryRetriever` | Generates multiple phrasings of each question to improve search recall |
| `create_history_aware_retriever` | Reformulates questions using chat history before searching |
| `create_retrieval_chain` (LCEL) | Composes retrieval + answer generation (replaces legacy chain) |
| `RunnableWithMessageHistory` | Manages conversation memory automatically |
| `ChatOllama` | Uses Ollama's chat interface for proper system prompts and roles |
See scope.md for full architecture details.
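
The idea behind splitting at meaning boundaries, as in the `SemanticChunker` row above, can be illustrated without LangChain. The sketch below compares each sentence with the next one and starts a new chunk when their similarity drops below a threshold. The word-count embedding, the fixed threshold of 0.2, and the function names are assumptions chosen for illustration; the real component uses a proper embedding model and its own breakpoint logic.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy embedding: lowercase word counts. A real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_chunks(sentences: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Group consecutive sentences, breaking where adjacent similarity falls below threshold."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:
            chunks.append([cur])  # topic shift: start a new chunk
        else:
            chunks[-1].append(cur)  # same topic: extend the current chunk
    return chunks


sentences = [
    "The policy covers dental care.",
    "Dental care includes the annual checkup.",
    "Rent is due on the first day.",
    "Late rent is charged a fee.",
]
for chunk in semantic_chunks(sentences):
    print(chunk)
```

With these four sentences the break falls between the dental sentences and the rent sentences, giving two chunks of two sentences each.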
- Qdrant Dashboard: http://localhost:6333/dashboard
- Metadata filtering: Each chunk stores `source_filename`. Qdrant `MatchAny` filters restrict search to user-selected documents.
- PDF folder: `./pdf/` on the host maps to `/app/pdf` inside the container
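
For reference, the metadata filter described above has roughly the following shape when written as a Qdrant REST filter. This is a sketch, not code from app.py: the helper name is hypothetical, and the payload path `metadata.source_filename` assumes that LangChain's Qdrant integration nests document metadata under a `metadata` key, which is its default but may differ in this project.

```python
import json


def build_source_filter(filenames: list[str], key: str = "metadata.source_filename") -> dict:
    """Build a Qdrant filter that keeps only points whose payload field matches any given filename."""
    return {"must": [{"key": key, "match": {"any": list(filenames)}}]}


# Filenames here are made up for the example.
selected = ["policy.pdf", "lease.pdf"]
print(json.dumps(build_source_filter(selected), indent=2))
```

In the Python client the same condition is typically expressed with `Filter`, `FieldCondition`, and `MatchAny` from `qdrant_client.models`.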
├── app.py # Streamlit app + LangChain LCEL orchestration
├── requirements.txt # Python dependencies
├── Dockerfile # Container for the Streamlit app
├── docker-compose.yml # Orchestrates all services
├── pdf/ # Drop PDFs here — auto-loaded on startup
├── scope.md # Architecture and feature scope
├── CLAUDE.md # Project context for Claude Code
└── README.md # This file
Two models run locally via Ollama:
| Model | Role | Size |
|---|---|---|
| `llama3` | Answers your questions and generates query variants for better search | ~4.7 GB |
| `nomic-embed-text` | Converts text into vectors for storage and search in Qdrant | ~274 MB |
Both are configured in app.py and pulled with:

    docker compose exec ollama ollama pull llama3
    docker compose exec ollama ollama pull nomic-embed-text

All LangChain logic lives in app.py. Here's where each function is and what it does:
| Function | What it does |
|---|---|
| `get_embeddings()` | Loads OllamaEmbeddings (converts text to vectors) |
| `get_llm()` | Loads ChatOllama (the language model) |
| `ingest_pdf_from_path()` | Uses SemanticChunker + QdrantVectorStore to split and store a PDF |
| `ingest_pdf()` | Handles uploaded files, delegates to ingest_pdf_from_path |
| `build_chain()` | Assembles the full LCEL chain: MultiQueryRetriever → history_aware_retriever → create_retrieval_chain → RunnableWithMessageHistory |
build_chain() is the brain of the app — everything else is plumbing or UI.
How LangChain is used (four jobs):
- Ingestion pipeline: `SemanticChunker` splits PDFs at meaning boundaries (not arbitrary character counts), `OllamaEmbeddings` converts chunks to vectors, `QdrantVectorStore` stores them.
- Smart retrieval: `MultiQueryRetriever` rephrases your question 3 ways, searches Qdrant for each, then deduplicates results.
- Conversation awareness: `create_history_aware_retriever` rewrites your question as standalone before searching, so follow-up questions like "Who wrote it?" work correctly.
- Memory management: `RunnableWithMessageHistory` automatically saves every question and answer into `InMemoryChatMessageHistory`, so no manual history tracking is needed.
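
The smart retrieval job above boils down to running one search per phrasing and merging the results without duplicates. The sketch below shows that merge step in plain Python. In the app the phrasings come from llama3 and the search hits Qdrant; here both are replaced by a hard-coded dictionary, and all names (`multi_query_retrieve`, `FAKE_INDEX`) are hypothetical.

```python
from typing import Callable


def multi_query_retrieve(phrasings: list[str], search: Callable[[str], list[str]]) -> list[str]:
    """Run search once per phrasing and return unique results in first-seen order."""
    seen: set[str] = set()
    unique: list[str] = []
    for query in phrasings:
        for chunk in search(query):
            if chunk not in seen:
                seen.add(chunk)
                unique.append(chunk)
    return unique


# Hypothetical stand-in for a similarity search: phrasing -> matching chunks.
FAKE_INDEX = {
    "notice period": ["Clause 12: 60 days notice", "Clause 3: lease term"],
    "how to end the lease": ["Clause 12: 60 days notice", "Clause 14: early exit fee"],
    "leaving early": ["Clause 14: early exit fee"],
}

results = multi_query_retrieve(list(FAKE_INDEX), lambda q: FAKE_INDEX.get(q, []))
print(results)
```

Three searches return five hits in total, but only three distinct chunks reach the language model.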
The full request flow:
Your question
→ RunnableWithMessageHistory (injects chat history)
→ history_aware_retriever (rewrites as standalone question)
→ MultiQueryRetriever (searches Qdrant with 3 phrasings)
→ create_stuff_documents_chain (feeds docs + history to ChatOllama)
→ Answer + history auto-updated
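
The first and last steps of this flow, injecting history before the call and saving the new turn afterwards, can be sketched with a plain dictionary. This is a simplified stand-in and not the LangChain implementation: `get_session_history` mirrors the kind of callable that `RunnableWithMessageHistory` is given, while `ask` and `echo_with_count` are hypothetical helpers that replace the retrieval chain.

```python
from collections import defaultdict

# session_id -> list of (role, text) pairs; a stand-in for an in-memory chat history.
_histories: dict[str, list[tuple[str, str]]] = defaultdict(list)


def get_session_history(session_id: str) -> list[tuple[str, str]]:
    """Return the history list for a session, creating it on first use."""
    return _histories[session_id]


def ask(session_id: str, question: str, answer_fn) -> str:
    """Call answer_fn with the question and prior turns, then record the new turn."""
    history = get_session_history(session_id)
    answer = answer_fn(question, list(history))  # pass a copy so answer_fn cannot mutate the store
    history.append(("human", question))
    history.append(("ai", answer))
    return answer


def echo_with_count(question: str, history: list[tuple[str, str]]) -> str:
    """Placeholder for the chain: reports the turn number derived from prior history."""
    return f"(turn {len(history) // 2 + 1}) you asked: {question}"


print(ask("demo", "What is the book about?", echo_with_count))
print(ask("demo", "Who wrote it?", echo_with_count))
```

Because the second call sees the first exchange in its history, a real chain has what it needs to resolve "it" in the follow-up question.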
The app does not currently read images, but it can be extended to do so. There are two scenarios:
Images inside PDFs (charts, scanned pages)
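Swap PyPDFLoader for an Unstructured-based loader that can OCR embedded images, such as UnstructuredPDFLoader (from langchain-community) or UnstructuredLoader (from langchain-unstructured). Add pytesseract, along with the Tesseract binary it wraps, as the OCR engine.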
Standalone image files (JPG, PNG)
This requires a vision-language model:
| Addition | Why |
|---|---|
| `llava` model via Ollama (`ollama pull llava`) | Reads and describes image content |
| Updated ingest function | Image → vision model → text description → embed → store in Qdrant |
| File uploader updated to accept `jpg`, `png` | UI change |
| `Pillow` dependency | Image file handling |
The Qdrant + retrieval pipeline stays the same because you're ultimately storing text descriptions of images. The new step is: image → vision model → description → store as text.
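
The image → vision model → description step might look like the sketch below, which posts a base64-encoded image to Ollama's generate endpoint using only the standard library. It is an illustration under assumptions, not part of the app: the function names are hypothetical, the prompt is arbitrary, `llava` must already be pulled, and the URL assumes Ollama is reachable on localhost (from inside the app container the host would more likely be the `ollama` service name).

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed host; port is the Ollama default


def build_llava_request(image_bytes: bytes, prompt: str = "Describe this image in detail.") -> dict:
    """Build the JSON body for a non-streaming generate call with one base64-encoded image."""
    return {
        "model": "llava",
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(path: str) -> str:
    """Send an image file to the vision model and return its text description."""
    with open(path, "rb") as f:
        body = json.dumps(build_llava_request(f.read())).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]
```

The returned description is plain text, so it could then be embedded and stored through the same path the app already uses for PDF chunks.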
Think of it like a smart librarian for your own private files.
Setup (one-time): You drop PDFs into a folder. The app reads every page and breaks it into chunks — not by counting characters, but by detecting where the topic changes. Each chunk gets converted into a list of numbers called a vector (think GPS coordinates for meaning — similar ideas have similar coordinates). All vectors are stored in Qdrant, a searchable database of meaning.
When you ask a question: Your question becomes its own vector. The app finds chunks in Qdrant whose coordinates are closest to your question — that's how it finds relevant paragraphs without keyword matching. It does this 3 times with different phrasings to catch things a single search might miss. The relevant paragraphs go to the language model (llama3), which reads them and writes a natural-language answer using only what's in your documents.
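
The "closest coordinates" idea can be made concrete with a few made-up vectors. The sketch below ranks chunks by cosine similarity to a question vector. The three-dimensional vectors and the labels are invented for illustration, and cosine similarity is an assumption here: it is a common choice for text embeddings, but the distance metric this app configures in Qdrant is not stated above.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; values near 1.0 mean similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(question_vec: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k chunk labels whose vectors are most similar to the question vector."""
    ranked = sorted(
        chunks, key=lambda name: cosine_similarity(question_vec, chunks[name]), reverse=True
    )
    return ranked[:k]


# Made-up 3-dimensional vectors; real embeddings have hundreds of dimensions.
chunks = {
    "physiotherapy coverage": [0.9, 0.1, 0.0],
    "dental coverage": [0.7, 0.3, 0.1],
    "parking rules": [0.0, 0.2, 0.9],
}
question = [1.0, 0.0, 0.0]  # pretend embedding of "Does my plan cover physiotherapy?"
print(top_k(question, chunks))
```

The two coverage chunks rank above the parking chunk even though no keywords are compared, which is the behavior the paragraph above describes.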
Why Docker? The app needs three programs running simultaneously (Streamlit, Qdrant, Ollama). Docker starts all three with one command and wires them together.
Top 3 examples of how to use it:
- Insurance or legal documents: Drop your health insurance policy or lease agreement. Ask "Does my plan cover physiotherapy?" or "What is the notice period if I want to leave?" Get the exact answer without reading 40 pages of fine print.
- Study tool for course materials: Drop lecture PDFs or textbook chapters. Ask "Explain the difference between TCP and UDP" or "What were the main causes of World War I according to chapter 3?" Answers come only from your course material, not generic internet content.
- Work contracts or HR documents: Drop an employment contract or company handbook. Ask "How many vacation days am I entitled to?" or "What is the remote work policy?" The assistant shows which file the answer came from, so you know where to verify it.