Overview
CareSight predicts excess hospital readmission ratios across approximately 18,000 US facilities using CMS public data. Hospital readmission rates are a direct quality metric — CMS financially penalizes hospitals with avoidable readmissions. This project builds and compares multiple regression models to surface which facilities are at risk and why.
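For readers unfamiliar with the target variable, here is a minimal sketch of how an excess readmission ratio can be read. It assumes the usual CMS framing, where the ratio is a facility's predicted readmission rate divided by its expected rate and values above 1.0 indicate more readmissions than expected. The function names and the sample rates are hypothetical and are not taken from the project code.

```python
def excess_readmission_ratio(predicted_rate: float, expected_rate: float) -> float:
    """Return predicted / expected readmission rate (assumed CMS-style definition)."""
    if expected_rate <= 0:
        raise ValueError("expected_rate must be positive")
    return predicted_rate / expected_rate


def has_excess_readmissions(ratio: float, threshold: float = 1.0) -> bool:
    """Flag a facility whose ratio exceeds the threshold (1.0 = as expected)."""
    return ratio > threshold


# Hypothetical facilities: (predicted rate %, expected rate %)
facilities = {
    "Facility A": (18.0, 15.0),
    "Facility B": (14.0, 16.0),
}

for name, (predicted, expected) in facilities.items():
    ratio = excess_readmission_ratio(predicted, expected)
    print(f"{name}: ratio = {ratio:.3f}, excess = {has_excess_readmissions(ratio)}")
```

In this reading, a regression model that predicts the ratio directly lets a facility be flagged whenever its predicted value is above 1.0.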
What was built
- Multi-model benchmark: Ridge regression, Lasso regression, XGBoost, and a neural network (TensorFlow/Keras) trained on CMS hospital-level data.
- Best model: the neural network (R² = 0.975), followed by XGBoost (R² = 0.952); both scores are reported on a held-out test set.
- SHAP-based feature importance analysis to explain which discharge and patient-volume features drive predicted readmission risk.
- Saved model artifacts: .pkl (XGBoost) and .h5 (neural network) for reproducible inference.
- Clean data pipeline: raw CMS CSV ingestion, preprocessing, feature engineering, and train/test split in a single notebook.
- Well-structured README with problem statement, methodology, results table, and setup instructions.
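The benchmark pattern in the list above can be sketched roughly as follows. This is an illustration only, not the project's notebook: it uses synthetic data in place of the CMS features, covers only the two linear models (Ridge and Lasso) with arbitrarily chosen hyperparameters, and the `benchmark` helper is a hypothetical name. XGBoost or a Keras model could be scored the same way once it is fitted.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the facility-level feature matrix (not real CMS data).
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
true_coef = np.array([0.08, -0.05, 0.03, 0.0, 0.0])
y = 1.0 + X @ true_coef + rng.normal(scale=0.01, size=500)

# Hold out 20% of rows so every model is scored on data it never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumed hyperparameters, chosen only for the illustration.
models = {
    "Ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "Lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.001)),
}


def benchmark(models, X_train, y_train, X_test, y_test):
    """Fit each model on the training split and return its test-set R²."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


scores = benchmark(models, X_train, y_train, X_test, y_test)
for name, r2 in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: R^2 = {r2:.3f}")
```

Keeping every model behind the same fit-and-score loop makes the R² values directly comparable, since each one is computed on the identical held-out rows.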
Why it matters
Healthcare ML has real regulatory stakes. Identifying high-risk facilities before CMS penalties are issued gives administrators actionable lead time. SHAP explanations make the model's signals interpretable to non-technical stakeholders — a requirement in any healthcare decision-support context.
Project Info
- Category: Healthcare ML
- Best model: Neural Network — R² = 0.975
- Dataset: CMS hospital readmission data (~18,000 facilities)
- Stack: Python, XGBoost, TensorFlow, SHAP, Scikit-learn, Pandas
- GitHub: Hospital-Readmission-Prediction