LLM · RAG · Full-Stack
Document Intelligence Hub
Overview
A fully local RAG (Retrieval-Augmented Generation) system built for business document Q&A. Users upload PDFs, Word documents, or spreadsheets and ask natural-language questions — the system returns precise answers with citations back to the source documents, entirely offline using Llama 3.1 8B via Ollama.
What was built
- Hybrid retrieval pipeline combining BM25 sparse search and dense vector search (nomic-embed-text embeddings in ChromaDB), fused via Reciprocal Rank Fusion so lexical precision and semantic recall reinforce each other.
- FastAPI backend with SQLAlchemy and PostgreSQL for document metadata, session management, and query history.
- React + Vite frontend with Zustand state management — multi-document upload, question interface, and cited answer display.
- Document parsing layer using PyMuPDF, python-docx, and openpyxl; NLP entity extraction via spaCy.
- Automated document summarization and cross-document comparison endpoints.
- Full Docker Compose deployment — single command spins up the LLM server, backend, frontend, and database.
- Security-conscious dependency management (axios pinned to avoid a compromised release).
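A single-command deployment like the one described might be wired up roughly as follows. This is a hypothetical sketch, not the repo's actual compose file — service names, build paths, and ports are illustrative (Ollama's default API port is 11434; Postgres defaults to 5432):

```yaml
# Hypothetical docker-compose.yml sketch for the four services.
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama_models:/root/.ollama   # cache pulled models across restarts
  db:
    image: postgres:16
    environment:
      POSTGRES_DB: dochub
      POSTGRES_PASSWORD: example
  backend:
    build: ./backend                  # FastAPI + SQLAlchemy app
    depends_on: [ollama, db]
    ports:
      - "8000:8000"
  frontend:
    build: ./frontend                 # React + Vite app
    depends_on: [backend]
    ports:
      - "5173:80"
volumes:
  ollama_models:
```

With a layout like this, `docker compose up` starts the LLM server, database, backend, and frontend together, and the backend reaches Ollama and Postgres by service name on the internal network.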
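The fusion step above can be sketched in a few lines of plain Python. This is an illustrative implementation of Reciprocal Rank Fusion, not code from the repo; the function name is made up, and `k=60` is the constant from the original RRF paper:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one.

    Each document's fused score is the sum of 1 / (k + rank) over
    every list it appears in. The constant k damps the influence of
    any single retriever's top-ranked outliers.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Fuse a BM25 ranking with a dense-vector ranking (toy IDs):
bm25_hits = ["doc_a", "doc_c", "doc_b"]
dense_hits = ["doc_b", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Because RRF only consumes ranks, it needs no score normalization between the sparse and dense retrievers — which is exactly why it is a popular fusion choice for hybrid search.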
Why it matters
Most RAG demos rely on OpenAI or other cloud APIs, sending private documents to third-party servers. This system runs entirely on-device, making it practical for legal, healthcare, or enterprise settings where data cannot leave the network. Hybrid retrieval with RRF meaningfully outperforms either BM25 or dense search alone on long-form documents.
Project Info
- Category: LLM / RAG System
- Stack: Llama 3.1 8B, Ollama, ChromaDB, FastAPI, React, PostgreSQL, spaCy, Docker
- GitHub: doc-intelligence-hub