LLM · RAG · Full-Stack

Document Intelligence Hub

Overview

A fully local RAG (Retrieval-Augmented Generation) system built for business document Q&A. Users upload PDFs, Word documents, or spreadsheets and ask natural-language questions — the system returns precise, cited answers with source attribution, entirely offline using Llama 3.1 8B via Ollama.
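Cited answers of this kind are typically produced by numbering the retrieved chunks in the prompt and instructing the model to reference them. A minimal sketch of that idea, using pure Python (the function name and chunk fields here are illustrative, not the project's actual code):

```python
def build_cited_prompt(question, chunks):
    """Assemble a prompt whose sources are numbered so the model
    can cite them as [1], [2], ... (illustrative sketch only)."""
    sources = "\n".join(
        f"[{i}] ({c['doc']}, p.{c['page']}) {c['text']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the sources below. "
        "Cite each claim with its source number.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_cited_prompt(
    "What is the notice period?",
    [{"doc": "contract.pdf", "page": 4,
      "text": "Either party may terminate with 30 days notice."}],
)
```

Because each source carries its document name and page, the model's `[1]`-style citations can be mapped back to exact locations for display in the UI.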

What was built

  • Hybrid retrieval pipeline combining BM25 sparse search and dense vector search (nomic-embed-text + ChromaDB), fused via Reciprocal Rank Fusion so keyword precision and semantic recall complement each other.
  • FastAPI backend with SQLAlchemy and PostgreSQL for document metadata, session management, and query history.
  • React + Vite frontend with Zustand state management — multi-document upload, question interface, and cited answer display.
  • Document parsing layer using PyMuPDF, python-docx, and openpyxl; NLP entity extraction via spaCy.
  • Automated document summarization and cross-document comparison endpoints.
  • Full Docker Compose deployment — single command spins up the LLM server, backend, frontend, and database.
  • Security-conscious dependency management (axios pinned to a known-good version to avoid a compromised release).
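The Reciprocal Rank Fusion step from the retrieval bullet above can be sketched in a few lines of pure Python. Each document's fused score is the sum of 1/(k + rank) over every ranked list it appears in, so items ranked well by both BM25 and dense search float to the top (a minimal sketch; the function name and the example ids are illustrative):

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids (best first)
    into one ranking. k is the smoothing constant from the
    original RRF formulation; 60 is the conventional default."""
    scores = defaultdict(float)
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# BM25 and dense search partially disagree; RRF rewards "d1" and
# "d3", which appear near the top of both lists.
bm25_hits = ["d1", "d2", "d3"]
dense_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)  # ['d1', 'd3', 'd2', 'd4']
```

RRF needs only ranks, not scores, which is why it works well for fusing BM25's unbounded scores with cosine similarities on a different scale.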

Why it matters

Most RAG demos rely on OpenAI or cloud APIs, exposing private documents. This system runs entirely on-device — practical for legal, healthcare, or enterprise use cases where data cannot leave the network. Hybrid retrieval with RRF meaningfully outperforms either BM25 or dense search alone on long-form documents.

Project Info

  • Category: LLM / RAG System
  • Stack: Llama 3.1 8B, Ollama, ChromaDB, FastAPI, React, PostgreSQL, spaCy, Docker
  • GitHub: doc-intelligence-hub