CareSight — Hospital Readmission Prediction

Overview

CareSight predicts Excess Readmission Ratios (ERR) for 18,774 hospital-condition records from the CMS FY2024 dataset, covering the six HRRP conditions: AMI, CABG, COPD, Heart Failure, Hip/Knee Replacement, and Pneumonia. Under HRRP, CMS reduces Medicare payments to hospitals whose readmissions exceed expected levels (ERR above 1.0). This pipeline identifies at-risk facilities and explains the drivers behind each prediction.
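For context, CMS defines ERR as a hospital's predicted readmission rate divided by its expected readmission rate for a given condition. HRRP treats a ratio above 1.0 as excess readmissions. The sketch below computes that ratio and the penalty-risk flag for three made-up rows. The column names are hypothetical, not the headers of the CMS file. Because ERR is, by definition, built from these two rates, they are likely to dominate any model that receives both as features.

```python
import pandas as pd

# Made-up rows; column names are hypothetical, not the exact CMS headers.
df = pd.DataFrame({
    "facility": ["A", "B", "C"],
    "condition": ["HF", "AMI", "PN"],
    "predicted_rate": [21.0, 14.5, 17.2],
    "expected_rate": [20.0, 15.0, 17.2],
})

# CMS definition: ERR = predicted readmission rate / expected readmission rate.
df["err"] = df["predicted_rate"] / df["expected_rate"]

# HRRP treats ERR > 1.0 as excess readmissions (penalty risk); exactly 1.0 is not excess.
df["penalty_risk"] = df["err"] > 1.0
print(df[["facility", "condition", "err", "penalty_risk"]])
```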

What was built

Modular ML pipeline (src/preprocess.py, train.py, evaluate.py, pipeline.py) that moves the work out of the notebook into separate, production-ready scripts, one per stage.
XGBoost regressor: R² = 0.9480; 5-fold cross-validated R² = 0.9378 (±0.0031).
Binary ERR classification (ERR > 1.0 = penalty risk): F1 = 0.9718, Precision = 0.9773, Recall = 0.9663.
SHAP TreeExplainer with waterfall plots. Top drivers: expected readmission rate, predicted readmission rate, number of discharges, number of readmissions, facility name (target-encoded), and condition measure.
Three-tab Streamlit dashboard: a prediction input form with a live SHAP waterfall, model performance metrics, and a ranked feature-importance chart.
Saved model artifacts (xgboost_model.pkl, scaler.pkl) and pre-generated SHAP asset exports for reproducibility.
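The source says facility name is target-encoded but not how. Each facility can contribute up to six rows (one per condition), so encoding a row with a mean that includes its own target can leak the label into the features. One leakage-safe option is out-of-fold encoding with smoothing, sketched below. The function `oof_target_encode` and its `smoothing` parameter are hypothetical illustrations, not the project's code.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold


def oof_target_encode(cat: pd.Series, target: pd.Series, n_splits: int = 5,
                      smoothing: float = 10.0, seed: int = 0) -> pd.Series:
    """Hypothetical out-of-fold smoothed mean encoding.

    Each row is encoded using statistics from the other folds only,
    so a row's own target never contributes to its encoded value.
    """
    encoded = pd.Series(np.nan, index=cat.index, dtype=float)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, enc_idx in kf.split(cat):
        fit_cat, fit_y = cat.iloc[fit_idx], target.iloc[fit_idx]
        prior = fit_y.mean()
        stats = fit_y.groupby(fit_cat).agg(["mean", "count"])
        # Shrink each category mean toward the fold prior; rare categories shrink more.
        smoothed = (stats["mean"] * stats["count"] + prior * smoothing) / (
            stats["count"] + smoothing
        )
        # Categories unseen in the fitting folds fall back to the prior.
        mapped = cat.iloc[enc_idx].map(smoothed).fillna(prior)
        encoded.iloc[enc_idx] = mapped.to_numpy()
    return encoded
```

scikit-learn 1.3 and later also provides `sklearn.preprocessing.TargetEncoder`, whose `fit_transform` applies a similar cross-fitting scheme internally.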

Why it matters

Healthcare ML has real regulatory stakes. A facility flagged before the CMS penalty cycle can intervene on discharge protocols and care transitions. SHAP waterfall plots make each hospital's prediction auditable, not merely accurate. This matters in clinical decision support, where users need to see why a facility was flagged before acting on it.

Project Info

Category: Healthcare ML
R²: 0.9480 | F1: 0.9718
Dataset: CMS FY2024 — 18,774 hospital-condition records
Stack: Python, XGBoost, SHAP, Streamlit, scikit-learn, Pandas
GitHub: Hospital-Readmission-Prediction