LLM · RAG · Full-Stack

Document Intelligence Hub

Overview

A fully local RAG (Retrieval-Augmented Generation) system built for business document Q&A. Users upload PDFs, Word documents, or spreadsheets and ask natural-language questions — the system returns precise, cited answers with source attribution, entirely offline using Llama 3.1 8B via Ollama.
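Cited answers of this kind are typically produced by numbering the retrieved chunks in the prompt and instructing the model to reference them. A minimal sketch of that idea, using pure Python (the function name and chunk fields here are illustrative, not the project's actual code):

```python
def build_cited_prompt(question, chunks):
    """Assemble a prompt whose sources are numbered so the model
    can cite them as [1], [2], ... (illustrative sketch only)."""
    sources = "\n".join(
        f"[{i}] ({c['doc']}, p.{c['page']}) {c['text']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the sources below. "
        "Cite each claim with its source number.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_cited_prompt(
    "What is the notice period?",
    [{"doc": "contract.pdf", "page": 4,
      "text": "Either party may terminate with 30 days notice."}],
)
```

Because each source carries its document name and page, the model's `[1]`-style citations can be mapped back to exact locations for display in the UI.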

What was built

  • Hybrid retrieval pipeline combining BM25 sparse search and dense vector search (nomic-embed-text + ChromaDB), fused via Reciprocal Rank Fusion so keyword precision and semantic recall complement each other.
  • FastAPI backend with SQLAlchemy and PostgreSQL for document metadata, session management, and query history.
  • React + Vite frontend with Zustand state management — multi-document upload, question interface, and cited answer display.
  • Document parsing layer using PyMuPDF, python-docx, and openpyxl; NLP entity extraction via spaCy.
  • Automated document summarization and cross-document comparison endpoints.
  • Full Docker Compose deployment — single command spins up the LLM server, backend, frontend, and database.
  • Security-conscious dependency management (axios pinned to a known-good version to avoid a compromised release).
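The Reciprocal Rank Fusion step from the retrieval bullet above can be sketched in a few lines of pure Python. Each document's fused score is the sum of 1/(k + rank) over every ranked list it appears in, so items ranked well by both BM25 and dense search float to the top (a minimal sketch; the function name and the example ids are illustrative):

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids (best first)
    into one ranking. k is the smoothing constant from the
    original RRF formulation; 60 is the conventional default."""
    scores = defaultdict(float)
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# BM25 and dense search partially disagree; RRF rewards "d1" and
# "d3", which appear near the top of both lists.
bm25_hits = ["d1", "d2", "d3"]
dense_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)  # ['d1', 'd3', 'd2', 'd4']
```

RRF needs only ranks, not scores, which is why it works well for fusing BM25's unbounded scores with cosine similarities on a different scale.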

Why it matters

Most RAG demos rely on OpenAI or cloud APIs, exposing private documents. This system runs entirely on-device — practical for legal, healthcare, or enterprise use cases where data cannot leave the network. Hybrid retrieval with RRF meaningfully outperforms either BM25 or dense search alone on long-form documents.

Project Info

  • Category: LLM / RAG System
  • Stack: Llama 3.1 8B, Ollama, ChromaDB, FastAPI, React, PostgreSQL, spaCy, Docker
  • GitHub: doc-intelligence-hub