hospital-readmission-lgbm - Hospital Readmission Risk Prediction
Model Description
This hospital-readmission-lgbm model estimates the risk that a diabetic patient is readmitted to hospital within 30 days of discharge. It was trained on the UCI Diabetes 130-US Hospitals dataset (1999-2008), tuned with 5-fold cross-validation, and evaluated once on a stratified holdout test set.
Task: Hospital 30-Day Readmission Risk Prediction
Model Type: Gradient Boosting Machine (LightGBM)
Training Date: 2025-12-01 18:33:39
Environment: Kaggle (GPU)
Performance Metrics
Cross-Validation Results (5-Fold CV)
| Metric | Value |
|---|---|
| Mean ROC-AUC | 0.8399 ± 0.0055 |
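The cross-validation entry reports the mean ROC-AUC over the five folds plus a spread. The card does not say whether the spread is a population or a sample standard deviation. The minimal sketch below assumes the population form, and the fold scores in the example are hypothetical placeholders, not the model's actual per-fold results.

```python
import statistics


def summarize_folds(fold_scores: list[float]) -> tuple[float, float]:
    """Return (mean, population standard deviation) of per-fold scores.

    Assumption: the "±" in the model card is a population standard deviation
    across folds; swap in statistics.stdev for the sample version.
    """
    if len(fold_scores) < 2:
        raise ValueError("need at least two folds to estimate a spread")
    return statistics.mean(fold_scores), statistics.pstdev(fold_scores)


if __name__ == "__main__":
    # Hypothetical per-fold ROC-AUC values, for illustration only.
    scores = [0.83, 0.84, 0.85, 0.84, 0.84]
    mean, spread = summarize_folds(scores)
    print(f"Mean ROC-AUC: {mean:.4f} ± {spread:.4f}")
```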
Final Test Set Results
Primary Metrics
| Metric | Value |
|---|---|
| ROC-AUC | 0.8424 |
| PR-AUC | 0.4000 |
| F1 Score | 0.1053 |
Classification Metrics
| Metric | Value |
|---|---|
| Precision | 0.6978 |
| Recall | 0.0569 |
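F1 is the harmonic mean of precision and recall, so the reported F1 can be cross-checked against the two classification metrics. The helper below is a minimal sketch of that check; a difference in the fourth decimal is expected because the inputs in the tables are rounded.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported test-set precision and recall from the tables above.
f1 = f1_from_precision_recall(0.6978, 0.0569)
print(f"F1 ≈ {f1:.4f}")  # close to the reported 0.1053
```

The very low recall drags F1 down even though precision is high, which is the usual behaviour of a harmonic mean.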
Clinical Metrics
| Metric | Value |
|---|---|
| Sensitivity (TPR) | 0.0569 |
| Specificity (TNR) | 0.9969 |
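Precision depends on how common the positive class is, while sensitivity and specificity do not, so the three reported values together imply the share of readmissions in the test set. Solving $\text{precision} = \frac{\text{TPR}\,\pi}{\text{TPR}\,\pi + (1-\text{TNR})(1-\pi)}$ for the prevalence $\pi$ gives the sketch below. With the rounded values from the tables it implies that roughly 11% of test encounters were 30-day readmissions, which is also the PR-AUC a no-skill classifier would score, a useful baseline for the reported PR-AUC of 0.4000.

```python
def implied_prevalence(precision: float, sensitivity: float, specificity: float) -> float:
    """Solve precision = TPR*pi / (TPR*pi + FPR*(1 - pi)) for the prevalence pi."""
    if not (0 < precision < 1) or sensitivity <= 0:
        raise ValueError("precision must lie in (0, 1) and sensitivity must be positive")
    fpr = 1.0 - specificity
    # Odds of a positive case: pi / (1 - pi) = precision * FPR / (TPR * (1 - precision))
    odds = precision * fpr / (sensitivity * (1.0 - precision))
    return odds / (1.0 + odds)


pi = implied_prevalence(precision=0.6978, sensitivity=0.0569, specificity=0.9969)
print(f"Implied share of positives in the test set: {pi:.1%}")
```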
Model Visualizations
Plots generated for this model: ROC curve, precision-recall curve, confusion matrix, calibration curve, feature importance, learning curves, validation curves, and a cross-fold metrics comparison.
Dataset Information
| Property | Value |
|---|---|
| Total Samples | 101,766 |
| Features | 113 |
| Development Set | 86,501 |
| Final Test Set | 15,265 |
Training Configuration
Evaluation Pipeline
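- Final Holdout Split: Stratified split into a development set (86,501 samples, 85%) and a final test set (15,265 samples, 15%)
- Hyperparameter Search: Grid search with 5-fold cross-validation on the development set
- Nested Early Stopping: Early stopping uses an inner validation split carved out of each training fold
- Final Evaluation: Holdout test set, untouched during tuning and training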
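The split logic in this pipeline can be sketched with plain index bookkeeping. This is an illustrative stand-in for library helpers such as scikit-learn's `train_test_split(..., stratify=y)` and `StratifiedKFold`, not the author's training code. The 15% holdout fraction is inferred from the dataset table (15,265 of 101,766 samples); the function names, the fixed seed, the 10% inner fraction, and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_holdout(indices, labels, test_fraction=0.15, seed=0):
    """Split `indices` into (kept, held_out), preserving each class's share.

    `labels[i]` is the class of sample i; rounding happens per class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    kept, held_out = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_out = round(len(idxs) * test_fraction)
        held_out.extend(idxs[:n_out])
        kept.extend(idxs[n_out:])
    return sorted(kept), sorted(held_out)


def stratified_kfold(indices, labels, k=5, seed=0):
    """Yield (train, val) index lists; each class is dealt round-robin across k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    for i in range(k):
        val = set(folds[i])
        train = [idx for idx in indices if idx not in val]
        yield train, sorted(val)


# Toy labels with about 11% positives (hypothetical, not the real dataset).
labels = [1 if i % 9 == 0 else 0 for i in range(1000)]
dev, test = stratified_holdout(range(len(labels)), labels, test_fraction=0.15)
for train, val in stratified_kfold(dev, labels, k=5):
    # Nested early stopping: the inner split comes from the training fold only,
    # so the validation fold `val` is never used to choose boosting rounds.
    inner_train, inner_val = stratified_holdout(train, labels, test_fraction=0.1)
```

Keeping the test indices outside every call after the first one is what makes the final evaluation "untouched".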
Best Hyperparameters
```json
{
  "n_estimators": 150,
  "learning_rate": 0.05,
  "num_leaves": 31,
  "max_depth": -1,
  "subsample": 0.9,
  "colsample_bytree": 0.7,
  "reg_alpha": 0.0,
  "reg_lambda": 0.1
}
```
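These hyperparameters map onto keyword arguments of LightGBM's scikit-learn estimator `lightgbm.LGBMClassifier`. The sketch below assumes that estimator was used, since the card names LightGBM but does not show the training code. One caveat worth checking: in LightGBM, `subsample` (bagging) only takes effect when `subsample_freq` is a positive value, and its default is 0. The card does not record `subsample_freq`, so unless the training code set it, `subsample: 0.9` would have had no effect. The helper name `build_classifier` is hypothetical.

```python
# Best hyperparameters as reported in this model card.
BEST_PARAMS = {
    "n_estimators": 150,
    "learning_rate": 0.05,
    "num_leaves": 31,
    "max_depth": -1,          # -1 means no depth limit in LightGBM
    "subsample": 0.9,         # only active if subsample_freq > 0 (default is 0)
    "colsample_bytree": 0.7,
    "reg_alpha": 0.0,
    "reg_lambda": 0.1,
}


def build_classifier(**overrides):
    """Build an LGBMClassifier from the reported parameters (assumes lightgbm is installed)."""
    # Imported lazily so BEST_PARAMS stays usable without lightgbm installed.
    from lightgbm import LGBMClassifier

    return LGBMClassifier(**{**BEST_PARAMS, **overrides})


# Example (hypothetical): enable bagging explicitly so subsample=0.9 is applied.
# clf = build_classifier(subsample_freq=1)
```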
Training Details
- Total Training Time: 215.04 minutes
- Hyperparameter Search Time: 127.61 minutes
- Cross-Validation Folds: 5
- Early Stopping: Yes
- Device: GPU
Usage
Loading the Model
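```python
import joblib
import pandas as pd

# Load the trained model
model = joblib.load('gradient_boosting_model.joblib')

# Load your preprocessed features (same columns as used in training)
X_new = pd.read_csv('your_features.csv')

# Class labels (0 = no 30-day readmission, 1 = 30-day readmission)
predictions = model.predict(X_new)
# Predicted probability of the positive class (30-day readmission)
probabilities = model.predict_proba(X_new)[:, 1]
```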
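Assuming the saved object is a LightGBM `LGBMClassifier`, `predict` returns the class with the higher probability, which for a binary model amounts to a 0.5 cutoff on `predict_proba`. Given the low recall in the test-set tables, a deployment that must catch more readmissions will usually choose its own threshold on validation data. The function below is a minimal sketch of one such rule (the highest threshold that reaches a target recall); the function name, the target value, and the toy inputs are hypothetical, and this rule is not part of the original training pipeline.

```python
def threshold_for_recall(y_true, scores, target_recall):
    """Return the highest score threshold whose recall (TPR) is at least target_recall.

    Cases are flagged positive when score >= threshold.
    """
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must lie in (0, 1]")
    positives = sum(1 for y in y_true if y == 1)
    if positives == 0:
        raise ValueError("y_true contains no positive cases")
    # Walk candidate thresholds from the highest score down; recall can only grow.
    tp = 0
    for score, y in sorted(zip(scores, y_true), reverse=True):
        tp += int(y == 1)
        if tp / positives >= target_recall:
            return score
    return min(scores)  # fallback: flag every case


# Toy validation data (hypothetical): 4 positives among 10 cases.
y_val = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
p_val = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
t = threshold_for_recall(y_val, p_val, target_recall=0.75)
# With the real model: flagged = probabilities >= t
```

Lowering the threshold raises recall at the cost of precision and specificity, so the target should come from the clinical cost of missed readmissions versus unnecessary follow-ups.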
Feature Requirements
The model expects preprocessed features from the UCI Diabetes 130-US Hospitals dataset. Features include:
- Patient demographics (age, gender, race)
- Admission details (admission type, source, length of stay)
- Medical history (number of diagnoses, procedures)
- Medication information
- Lab results (A1c test results, glucose serum test)
- Previous utilization (outpatient, inpatient, emergency visits)
See feature_importance.csv for the complete feature list and importance scores.
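Because the model was fit on a fixed set of 113 columns, new inputs should carry the same columns in the same order. A minimal pandas sketch, assuming the expected names are taken from the fitted estimator (LightGBM's scikit-learn estimators expose them as `feature_name_`) or from feature_importance.csv, and that those names were stored unchanged; the helper `align_features` is hypothetical.

```python
import pandas as pd


def align_features(X_new: pd.DataFrame, expected_columns: list[str]) -> pd.DataFrame:
    """Return X_new with exactly the expected columns, in training order.

    Raises if any expected column is missing; extra columns are dropped.
    """
    missing = [col for col in expected_columns if col not in X_new.columns]
    if missing:
        raise ValueError(f"{len(missing)} expected feature(s) missing, e.g. {missing[:5]}")
    return X_new.loc[:, expected_columns]


# Usage (assumes `model` is the loaded LGBMClassifier):
# X_ready = align_features(X_new, list(model.feature_name_))
```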
Limitations and Biases
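- Domain-Specific: The model is trained only on readmissions of diabetic patients
- Dataset Bias: Training data from 130 US hospitals (1999-2008) may not generalize to all healthcare settings
- Class Imbalance: 30-day readmissions are a minority class (roughly 11% of encounters), so accuracy is uninformative; at the reported operating point the model has high specificity (0.9969) but identifies only about 5.7% of true readmissions (recall 0.0569)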
- Temporal Drift: Healthcare practices have evolved since data collection
- Geographic Limitation: US-based dataset may not apply to other healthcare systems
Ethical Considerations
This model is intended to assist healthcare providers in identifying patients at risk of readmission. It should:
- NOT be used as the sole basis for treatment decisions
- Be validated on your specific patient population before deployment
- Be monitored for fairness across different demographic groups
- Be regularly retrained with recent data to account for changing patterns
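Fairness monitoring can start with comparing sensitivity (recall) across demographic groups such as the race, gender, and age fields listed under Feature Requirements. The sketch below computes per-group recall from true labels, binary predictions, and a group column; the function name and toy inputs are hypothetical, and recall is only one of several relevant checks (per-group false-positive rates and calibration matter too).

```python
from collections import defaultdict


def recall_by_group(y_true, y_pred, groups):
    """Per-group sensitivity: TP / (TP + FN) among the true positives of each group.

    Groups with no positive cases are reported as None.
    """
    tp = defaultdict(int)
    pos = defaultdict(int)
    for y, p, g in zip(y_true, y_pred, groups):
        if y == 1:
            pos[g] += 1
            tp[g] += int(p == 1)
    ordered_groups = dict.fromkeys(groups)  # unique groups in first-seen order
    return {g: (tp[g] / pos[g] if pos[g] else None) for g in ordered_groups}


# Toy example (hypothetical labels, predictions, and group names).
result = recall_by_group(
    y_true=[1, 1, 0, 1, 1, 0],
    y_pred=[1, 0, 0, 1, 1, 1],
    groups=["A", "A", "A", "B", "B", "C"],
)
```

Large gaps between groups are a signal to investigate further, not proof of unfairness on their own, since group sizes and base rates can differ.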
Citation
```bibtex
@misc{hospital-readmission-lgbm,
  author = {Your Name},
  title = {LightGBM Model for Hospital Readmission Prediction},
  year = {2025},
  url = {https://huggingface.co/your-repo}
}
```
Dataset Citation
```bibtex
@article{strack2014impact,
  title = {Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records},
  author = {Strack, Beata and DeShazo, Jonathan P and Gennings, Chris and Olmo, Juan L and Ventura, Sebastian and Cios, Krzysztof J and Clore, John N},
  journal = {BioMed Research International},
  volume = {2014},
  year = {2014},
  publisher = {Hindawi}
}
```
License
This model is released under the MIT License. The underlying dataset has its own license terms.
Contact
For questions or issues, please open an issue in the repository.
Disclaimer: This model is for research and educational purposes. Always consult healthcare professionals for medical decisions.