
lorenzo9616/pythonLLM1


Your Document Assistant — Local PDF Question-Answering App

A simple, fully local application that lets you upload PDF documents and ask questions about them using AI — all running on your own computer. No data ever leaves your machine.

What It Does

  1. Drop PDFs — Place files in the pdf/ folder or upload through the browser
  2. Select documents — Choose which files the assistant should read
  3. Ask questions — Get answers drawn directly from your selected documents

Tech Stack

| Component | Technology | Purpose |
|---|---|---|
| Frontend | Streamlit | Web-based user interface |
| Vector Database | Qdrant | Stores and searches document content |
| AI Models | Ollama (llama3 + nomic-embed-text) | Understands questions and generates answers |
| Orchestration | LangChain (LCEL) | Connects all components together |

Everything runs locally via Docker — no API keys, no cloud services, no internet required after setup.

Hardware Requirements

| Resource | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB+ |
| Disk | 10 GB free | 20 GB free |
| CPU | 4 cores | 8+ cores |
| GPU | Not required | NVIDIA GPU speeds up responses significantly |

Note: The llama3 model is ~4.7 GB. First-time setup downloads ~6 GB total.

Setup & Installation

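Prerequisites

  • Docker with the Docker Compose plugin (the steps below use docker compose)
  • Git (to clone the repository)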

Step 1: Clone the Repository

git clone <repository-url>
cd pythonLLM1

Step 2: Start All Services

docker compose up --build -d

This starts three services:

  • Qdrant (vector database) on port 6333
  • Ollama (AI models) on port 11434
  • Streamlit app (user interface) on port 8501
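To confirm that the three services came up, a small standard-library probe can help. This is a minimal sketch, not part of the repository: the port numbers come from the list above, while the host name localhost and the helper names is_port_open and check_services are assumptions made for illustration.

```python
import socket

# Ports taken from the service list above; the host is an assumption
# (services published on the local machine).
SERVICES = {"qdrant": 6333, "ollama": 11434, "streamlit": 8501}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}
```

Calling check_services() returns a dictionary such as one mapping "qdrant" to True once the container is listening. An open port only shows that something accepts connections there; it does not prove the service is healthy.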

Step 3: Pull the Required AI Models

Run once after first startup:

# Language model (~4.7 GB)
docker compose exec ollama ollama pull llama3

# Embedding model (~274 MB)
docker compose exec ollama ollama pull nomic-embed-text

Models are stored in a Docker volume and persist across restarts.
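To check that both models were pulled, Ollama's HTTP API lists local models at /api/tags. The sketch below assumes Ollama is reachable at http://localhost:11434 (the port from Step 2); the helper names missing_models and fetch_tags are hypothetical and do not come from app.py.

```python
import json
import urllib.request

# Model names from the pull commands above.
REQUIRED = ["llama3", "nomic-embed-text"]


def missing_models(tags_payload: dict, required: list[str]) -> list[str]:
    """Return the required model names absent from an /api/tags response."""
    # Ollama reports names with a tag suffix such as "llama3:latest",
    # so compare on the part before the colon.
    installed = {m["name"].split(":")[0] for m in tags_payload.get("models", [])}
    return [name for name in required if name not in installed]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of locally available models from Ollama."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

With the stack running, missing_models(fetch_tags(), REQUIRED) should return an empty list once both pulls have finished.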

Step 4: Open the App

http://localhost:8501

Usage

Adding Documents

Option A — Folder drop (recommended): Copy any PDF into the pdf/ folder in this directory. Refresh the app and it loads automatically.

Option B — Upload: Use the file uploader in Step 1 of the app.
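The folder-drop behaviour in Option A can be pictured as a scan of the pdf/ folder for files that have not been processed yet. The following is only an illustrative sketch: find_new_pdfs and the already_ingested set are hypothetical names, and the actual logic in app.py may differ.

```python
from pathlib import Path


def find_new_pdfs(folder: str, already_ingested: set[str]) -> list[Path]:
    """Return PDFs in `folder` whose filenames have not been ingested yet."""
    root = Path(folder)
    if not root.is_dir():
        return []
    # Match the extension case-insensitively so "REPORT.PDF" is also found.
    pdfs = sorted(
        p for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )
    return [p for p in pdfs if p.name not in already_ingested]
```

For example, find_new_pdfs("pdf", set()) lists every PDF in the folder on a fresh start.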

Asking Questions

  1. Select which documents to search (Step 2)
  2. Type your question in the chat box (Step 3)
  3. The assistant answers using only your selected documents and shows which file(s) the answer came from

Stopping the App

docker compose down

To also delete all stored documents and models:

docker compose down -v

GPU Support (Optional)

For faster responses with an NVIDIA GPU:

  1. Install the NVIDIA Container Toolkit
  2. Uncomment the GPU section in docker-compose.yml under the ollama service
  3. Restart: docker compose up -d

LangChain Architecture

This app uses modern LangChain (LCEL-based) throughout:

| Component | What it does |
|---|---|
| SemanticChunker | Splits PDFs at meaning boundaries instead of fixed character counts |
| MultiQueryRetriever | Generates multiple phrasings of each question to improve search recall |
| create_history_aware_retriever | Reformulates questions using chat history before searching |
| create_retrieval_chain (LCEL) | Composes retrieval + answer generation (replaces legacy chain) |
| RunnableWithMessageHistory | Manages conversation memory automatically |
| ChatOllama | Uses Ollama's chat interface for proper system prompts and roles |

See scope.md for full architecture details.

Developer Notes

  • Qdrant Dashboard: http://localhost:6333/dashboard
  • Metadata filtering: Each chunk stores source_filename. Qdrant MatchAny filters restrict search to user-selected documents.
  • PDF folder: ./pdf/ on the host maps to /app/pdf inside the container
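The metadata filter mentioned above can be written in Qdrant's JSON filter form, where a match on any of several values is expressed with "any". This is a sketch under assumptions: the payload key metadata.source_filename assumes the LangChain integration nests chunk metadata under a metadata key, and build_source_filter is a hypothetical helper, not a function from app.py.

```python
def build_source_filter(
    filenames: list[str], key: str = "metadata.source_filename"
) -> dict:
    """Build a Qdrant filter (REST/JSON form) matching any of the filenames."""
    if not filenames:
        # An empty selection would otherwise match nothing; fail loudly instead.
        raise ValueError("select at least one document")
    return {"must": [{"key": key, "match": {"any": list(filenames)}}]}
```

The same condition can be built with the Python client's Filter, FieldCondition, and MatchAny model classes; the dictionary form is used here only to keep the example free of third-party imports.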

Project Structure

├── app.py               # Streamlit app + LangChain LCEL orchestration
├── requirements.txt     # Python dependencies
├── Dockerfile           # Container for the Streamlit app
├── docker-compose.yml   # Orchestrates all services
├── pdf/                 # Drop PDFs here — auto-loaded on startup
├── scope.md             # Architecture and feature scope
├── CLAUDE.md            # Project context for Claude Code
└── README.md            # This file

Things You Might Want to Know

What AI models does this app use?

Two models run locally via Ollama:

| Model | Role | Size |
|---|---|---|
| llama3 | Answers your questions and generates query variants for better search | ~4.7 GB |
| nomic-embed-text | Converts text into vectors for storage and search in Qdrant | ~274 MB |

Both are configured in app.py and pulled with:

docker compose exec ollama ollama pull llama3
docker compose exec ollama ollama pull nomic-embed-text

Where is the LangChain code and what does each part do?

All LangChain logic lives in app.py. Here's where each function is and what it does:

| Function | What it does |
|---|---|
| get_embeddings() | Loads OllamaEmbeddings, which converts text to vectors |
| get_llm() | Loads ChatOllama, the language model |
| ingest_pdf_from_path() | Uses SemanticChunker + QdrantVectorStore to split and store a PDF |
| ingest_pdf() | Handles uploaded files, delegates to ingest_pdf_from_path |
| build_chain() | Assembles the full LCEL chain: MultiQueryRetriever → history_aware_retriever → create_retrieval_chain → RunnableWithMessageHistory |

build_chain() is the brain of the app — everything else is plumbing or UI.

How LangChain is used (four jobs):

  1. Ingestion pipeline: SemanticChunker splits PDFs at meaning boundaries (not arbitrary character counts), OllamaEmbeddings converts chunks to vectors, and QdrantVectorStore stores them.
  2. Smart retrieval: MultiQueryRetriever rephrases your question 3 ways, searches Qdrant for each, then deduplicates results.
  3. Conversation awareness: create_history_aware_retriever rewrites your question as standalone before searching, so follow-up questions like "Who wrote it?" work correctly.
  4. Memory management: RunnableWithMessageHistory automatically saves every question and answer into InMemoryChatMessageHistory, so no manual history tracking is needed.
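The "search several phrasings, then deduplicate" idea from the retrieval step can be illustrated without any LangChain code. The sketch below is a conceptual stand-in, not the library's implementation: multi_query_search is a hypothetical name, and the search argument represents whatever function queries the vector store.

```python
from typing import Callable, Iterable


def multi_query_search(
    queries: Iterable[str], search: Callable[[str], list[str]]
) -> list[str]:
    """Run every phrasing through `search`; return unique hits in first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    for query in queries:
        for chunk in search(query):
            # Skip chunks already returned by an earlier phrasing.
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged
```

Because each phrasing can surface chunks the others miss, the merged list is usually longer than any single result list, which is where the recall improvement comes from.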

The full request flow:

Your question
    → RunnableWithMessageHistory  (injects chat history)
    → history_aware_retriever     (rewrites as standalone question)
    → MultiQueryRetriever         (searches Qdrant with 3 phrasings)
    → create_stuff_documents_chain (feeds docs + history to ChatOllama)
    → Answer + history auto-updated

Can this app handle images?

Not currently — but it can be expanded. There are two scenarios:

Images inside PDFs (charts, scanned pages): Swap PyPDFLoader for UnstructuredPDFLoader (in langchain-community; the langchain-unstructured package provides the newer, general-purpose UnstructuredLoader), which can OCR embedded images. Add pytesseract as the OCR engine.

Standalone image files (JPG, PNG): This requires a vision-language model:

| Addition | Why |
|---|---|
| llava model via Ollama (ollama pull llava) | Reads and describes image content |
| Updated ingest function | Image → vision model → text description → embed → store in Qdrant |
| File uploader updated to accept jpg, png | UI change |
| Pillow dependency | Image file handling |

The Qdrant + retrieval pipeline stays the same because you're ultimately storing text descriptions of images. The new step is: image → vision model → description → store as text.
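The image → vision model → description step could look roughly like the following, using Ollama's /api/generate endpoint, which accepts base64-encoded images for multimodal models. This is a sketch under assumptions: the app does not currently contain this code, the prompt wording is invented, and build_describe_request and describe_image are hypothetical helper names.

```python
import base64
import json
import urllib.request


def build_describe_request(image_bytes: bytes, model: str = "llava") -> dict:
    """Build the JSON body for Ollama's /api/generate with one attached image."""
    return {
        "model": model,
        "prompt": "Describe this image in detail, including any visible text.",
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON response instead of a stream
    }


def describe_image(image_bytes: bytes, base_url: str = "http://localhost:11434") -> str:
    """Send the image to the vision model and return its text description."""
    body = json.dumps(build_describe_request(image_bytes)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["response"]
```

The returned description is plain text, so it can go through the same embed-and-store path as a PDF chunk.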


How does this app work? (For developers new to AI)

Think of it like a smart librarian for your own private files.

Setup (one-time): You drop PDFs into a folder. The app reads every page and breaks it into chunks — not by counting characters, but by detecting where the topic changes. Each chunk gets converted into a list of numbers called a vector (think GPS coordinates for meaning — similar ideas have similar coordinates). All vectors are stored in Qdrant, a searchable database of meaning.
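To make "detecting where the topic changes" concrete, here is a toy version that compares neighbouring sentences and starts a new chunk when they stop resembling each other. It is only an analogy for what SemanticChunker does: word counts stand in for real embeddings, and the function names and the 0.25 threshold are assumptions chosen for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def split_on_topic_change(
    sentences: list[str], threshold: float = 0.25
) -> list[list[str]]:
    """Start a new chunk whenever adjacent sentences fall below the threshold."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(bag_of_words(prev), bag_of_words(cur)) < threshold:
            chunks.append([cur])  # similarity dropped: treat as a new topic
        else:
            chunks[-1].append(cur)
    return chunks
```

Given two sentences about dental coverage followed by two about rent, this groups them into two chunks, because the vocabulary shared between the second and third sentence is empty.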

When you ask a question: Your question becomes its own vector. The app finds chunks in Qdrant whose coordinates are closest to your question — that's how it finds relevant paragraphs without keyword matching. It does this 3 times with different phrasings to catch things a single search might miss. The relevant paragraphs go to the language model (llama3), which reads them and writes a natural-language answer using only what's in your documents.
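The "closest coordinates" search can be sketched with cosine similarity over small hand-made vectors. This is a simplified illustration of what a vector database does at scale: the two-dimensional vectors, the chunk labels, and the function names are invented for the example, and real embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the labels of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda label: cosine_similarity(query_vec, chunks[label]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical chunk vectors: first axis ~ "insurance", second axis ~ "housing".
chunks = {"coverage": [0.9, 0.1], "rent": [0.1, 0.9], "fees": [0.2, 0.8]}
print(top_k([0.0, 1.0], chunks))  # a housing-like question
```

A question vector pointing along the second axis ranks "rent" and "fees" ahead of "coverage", even though no keywords were compared.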

Why Docker? The app needs three programs running simultaneously (Streamlit, Qdrant, Ollama). Docker starts all three with one command and wires them together.

Top 3 examples of how to use it:

  1. Insurance or legal documents — Drop your health insurance policy or lease agreement. Ask "Does my plan cover physiotherapy?" or "What is the notice period if I want to leave?" Get the exact answer without reading 40 pages of fine print.

  2. Study tool for course materials — Drop lecture PDFs or textbook chapters. Ask "Explain the difference between TCP and UDP" or "What were the main causes of World War I according to chapter 3?" Answers come only from your course material — not generic internet content.

  3. Work contracts or HR documents — Drop an employment contract or company handbook. Ask "How many vacation days am I entitled to?" or "What is the remote work policy?" The assistant shows which file the answer came from, so you know where to verify it.

About

Python LLM implementation
