hospital-readmission-lgbm - Hospital Readmission Risk Prediction

Model Description

The hospital-readmission-lgbm model predicts the risk of 30-day hospital readmission for diabetic patients. It was trained on the UCI Diabetes 130-US Hospitals dataset, tuned with 5-fold cross-validation on a development set, and evaluated once on a held-out test set.

Task: Hospital 30-Day Readmission Risk Prediction
Model Type: Gradient Boosting Machine (LightGBM)
Training Date: 2025-12-01 18:33:39
Environment: Kaggle (GPU)

Performance Metrics

Cross-Validation Results (5-Fold CV)

Metric         Value
Mean ROC-AUC   0.8399 ± 0.0055
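
The ± figure is most naturally read as the spread of the per-fold ROC-AUC scores, although the card does not say whether it is a sample or a population standard deviation. A minimal sketch of how such a summary could be computed follows. The fold scores are placeholders chosen for illustration and are not this model's actual per-fold results.

```python
from statistics import mean, pstdev, stdev

# Hypothetical per-fold ROC-AUC values, for illustration only.
fold_auc = [0.83, 0.84, 0.85, 0.84, 0.84]

cv_mean = mean(fold_auc)
cv_std_sample = stdev(fold_auc)       # divides by n - 1
cv_std_population = pstdev(fold_auc)  # divides by n

print(f"ROC-AUC {cv_mean:.4f} ± {cv_std_sample:.4f} (sample std)")
print(f"ROC-AUC {cv_mean:.4f} ± {cv_std_population:.4f} (population std)")
```

With only five folds the two conventions differ noticeably, so it helps to state which one a reported interval uses.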

Final Test Set Results

Primary Metrics

Metric     Value
ROC-AUC    0.8424
PR-AUC     0.4000
F1 Score   0.1053

Classification Metrics

Metric      Value
Precision   0.6978
Recall      0.0569

Clinical Metrics

Metric              Value
Sensitivity (TPR)   0.0569
Specificity (TNR)   0.9969
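
The test-set metrics are tied together by simple identities, which makes them easy to sanity-check. The sketch below recomputes F1 from the reported precision and recall, then backs out the share of positive cases implied by precision, sensitivity, and specificity. The prevalence figure is a rough estimate only: specificity is reported to four decimals, so the false-positive rate carries just two significant digits.

```python
# Values copied from the test-set tables in this card.
precision = 0.6978
recall = 0.0569        # same quantity as sensitivity (TPR)
specificity = 0.9969   # TNR

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Precision = TP / (TP + FP), with TP = recall * P and FP = (1 - specificity) * N.
# Solving that identity for the ratio of positives to negatives, P / N:
fpr = 1 - specificity
pos_to_neg = precision * fpr / (recall * (1 - precision))
implied_prevalence = pos_to_neg / (1 + pos_to_neg)

print(f"F1 from precision and recall: {f1:.4f}")
print(f"Implied positive rate in the test set: {implied_prevalence:.3f}")
```

The recomputed F1 agrees with the reported 0.1053 up to rounding of the inputs, and the implied positive rate is on the order of one case in ten, which is why PR-AUC and recall deserve as much attention here as ROC-AUC.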

Model Visualizations

  • [Figure: ROC Curve]
  • [Figure: Precision-Recall Curve]
  • [Figure: Confusion Matrix]
  • [Figure: Calibration Curve]
  • [Figure: Feature Importance]
  • [Figure: Learning Curves]
  • [Figure: Validation Curves]
  • [Figure: Cross-Fold Metrics Comparison]

Dataset Information

Property          Value
Total Samples     101,766
Features          113
Development Set   86,501
Final Test Set    15,265

Training Configuration

Evaluation Pipeline

  • Final Holdout Split: Stratified split into development and test sets
  • Hyperparameter Search: Grid search with 5-fold cross-validation
  • Nested Early Stopping: Inner validation split within each fold
  • Final Evaluation: Untouched holdout test set
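
The index bookkeeping behind this pipeline can be sketched with the standard library alone. This is an illustration under stated assumptions, not the author's training code: the helper names are hypothetical, the labels are toy data, and the 10% inner early-stopping fraction is assumed. The 15% holdout fraction matches the card's 15,265 test rows out of 101,766.

```python
import random
from collections import defaultdict

def stratified_split(indices, labels, holdout_frac, rng):
    """Split indices into (rest, holdout), preserving the class ratio."""
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    rest, holdout = [], []
    for members in by_class.values():
        rng.shuffle(members)
        k = round(len(members) * holdout_frac)
        holdout.extend(members[:k])
        rest.extend(members[k:])
    return rest, holdout

def stratified_folds(indices, labels, n_folds, rng):
    """Deal each class round-robin into n_folds disjoint validation folds."""
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(n_folds)]
    for members in by_class.values():
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % n_folds].append(i)
    return folds

rng = random.Random(42)
# Toy labels with a minority positive class (illustrative, not the real data).
labels = [1 if rng.random() < 0.1 else 0 for _ in range(2000)]
all_idx = list(range(len(labels)))

# Final holdout: never touched during search or early stopping.
dev_idx, test_idx = stratified_split(all_idx, labels, 0.15, rng)

# 5-fold CV on the development set, with an inner split inside each fold.
folds = stratified_folds(dev_idx, labels, 5, rng)
plan = []
for k, val_idx in enumerate(folds):
    train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
    # Early stopping monitors stop_idx, so val_idx stays unseen until scoring.
    fit_idx, stop_idx = stratified_split(train_idx, labels, 0.1, rng)
    plan.append((fit_idx, stop_idx, val_idx))
```

Keeping the early-stopping rows separate from each fold's validation rows means the number of boosting rounds is never chosen on the same data used to score the fold, which would otherwise bias the cross-validation estimate upward.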

Best Hyperparameters

{
  "n_estimators": 150,
  "learning_rate": 0.05,
  "num_leaves": 31,
  "max_depth": -1,
  "subsample": 0.9,
  "colsample_bytree": 0.7,
  "reg_alpha": 0.0,
  "reg_lambda": 0.1
}
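
A minimal sketch of how these values could be passed to LightGBM's scikit-learn wrapper, assuming the model was built with `lightgbm.LGBMClassifier` (the card does not name the exact class). The helper `build_classifier` is hypothetical and returns `None` when LightGBM is unavailable, so the snippet runs either way. Two points to keep in mind when reading the values: `max_depth` of -1 means no depth limit, so tree size is bounded by `num_leaves`; and in LightGBM `subsample` only takes effect when `subsample_freq` is greater than zero, a setting the card does not list.

```python
import json

# Best hyperparameters, copied verbatim from this card.
best_params = json.loads("""
{
  "n_estimators": 150,
  "learning_rate": 0.05,
  "num_leaves": 31,
  "max_depth": -1,
  "subsample": 0.9,
  "colsample_bytree": 0.7,
  "reg_alpha": 0.0,
  "reg_lambda": 0.1
}
""")

def build_classifier(params):
    """Return an unfitted LGBMClassifier, or None if lightgbm cannot be loaded."""
    try:
        from lightgbm import LGBMClassifier
    except (ImportError, OSError):
        return None
    return LGBMClassifier(**params)

clf = build_classifier(best_params)
print("LightGBM available:", clf is not None)
```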

Training Details

  • Total Training Time: 215.04 minutes
  • Hyperparameter Search Time: 127.61 minutes
  • Cross-Validation Folds: 5
  • Early Stopping: Yes
  • Device: GPU

Usage

Loading the Model

import joblib
import pandas as pd

# Load the trained model
model = joblib.load('gradient_boosting_model.joblib')

# Load your preprocessed features
X_new = pd.read_csv('your_features.csv')

# Make predictions
predictions = model.predict(X_new)
probabilities = model.predict_proba(X_new)[:, 1]
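
The reported test recall of 0.0569 indicates that, at the operating point behind the published metrics, the model flags very few patients. Users who need higher sensitivity may prefer to threshold `probabilities` themselves rather than rely on `model.predict`. The sketch below shows the idea with the standard library only. The helper names and the toy values are hypothetical, and any real threshold should be chosen on validation data, not on the test set.

```python
def flag_high_risk(probabilities, threshold):
    """Return 1 for each probability at or above the threshold, else 0."""
    return [1 if p >= threshold else 0 for p in probabilities]

def sensitivity_specificity(y_true, y_pred):
    """Compute (TPR, TNR) from binary labels and binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    tnr = tn / (tn + fp) if (tn + fp) else float("nan")
    return tpr, tnr

# Toy values for illustration only; these are not model outputs.
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
probs = [0.05, 0.20, 0.35, 0.10, 0.60, 0.30, 0.08, 0.15]

for threshold in (0.5, 0.25):
    preds = flag_high_risk(probs, threshold)
    tpr, tnr = sensitivity_specificity(y_true, preds)
    print(f"threshold={threshold:.2f} sensitivity={tpr:.2f} specificity={tnr:.2f}")
```

Lowering the threshold trades specificity for sensitivity; where to set it depends on the relative cost of a missed readmission versus an unnecessary follow-up.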

Feature Requirements

The model expects preprocessed features from the UCI Diabetes 130-US Hospitals dataset. Features include:

  • Patient demographics (age, gender, race)
  • Admission details (admission type, source, length of stay)
  • Medical history (number of diagnoses, procedures)
  • Medication information
  • Lab results (A1c test results, glucose serum test)
  • Previous utilization (outpatient, inpatient, emergency visits)

See feature_importance.csv for the complete feature list and importance scores.
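
Because the model expects a fixed set of 113 preprocessed features, it can help to compare the columns of new data against the feature list before predicting. The following sketch assumes that feature_importance.csv has a header column named `feature`; that column name, the helper functions, and the sample rows are all hypothetical stand-ins.

```python
import csv
import io

def expected_features(csv_text, column="feature"):
    """Read feature names from CSV text; `column` is an assumed header name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader]

def check_columns(expected, actual):
    """Return (missing, unexpected) feature names, preserving input order."""
    expected_set, actual_set = set(expected), set(actual)
    missing = [name for name in expected if name not in actual_set]
    unexpected = [name for name in actual if name not in expected_set]
    return missing, unexpected

# Stand-in for the contents of feature_importance.csv (hypothetical rows).
sample_csv = (
    "feature,importance\n"
    "number_inpatient,10\n"
    "time_in_hospital,7\n"
    "num_medications,5\n"
)
expected = expected_features(sample_csv)

# Hypothetical column names from a new input file.
new_columns = ["time_in_hospital", "number_inpatient", "patient_id"]
missing, unexpected = check_columns(expected, new_columns)
print("missing:", missing)
print("unexpected:", unexpected)
```

In practice the CSV text would come from reading the real file, and the new columns from `X_new.columns`. Column order may also matter to the model, so reordering to match the expected list is a sensible final step.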

Limitations and Biases

  • Domain-Specific: Model is trained specifically for diabetic patient readmissions
  • Dataset Bias: Training data from 130 US hospitals (1999-2008) may not generalize to all healthcare settings
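  • Class Imbalance: Readmitted patients are the minority class. At the operating point behind the reported test metrics the model is highly specific (0.9969) but detects few readmissions (recall 0.0569), so predicted probabilities may be more useful than the default class labels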
  • Temporal Drift: Healthcare practices have evolved since data collection
  • Geographic Limitation: US-based dataset may not apply to other healthcare systems

Ethical Considerations

This model is intended to assist healthcare providers in identifying patients at risk of readmission. It should:

  • NOT be used as the sole basis for treatment decisions
  • Be validated on your specific patient population before deployment
  • Be monitored for fairness across different demographic groups
  • Be regularly retrained with recent data to account for changing patterns
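
One simple way to start monitoring fairness is to compare recall across demographic groups on a labeled validation set. The sketch below is a minimal illustration with a hypothetical helper and toy data. It is not a complete fairness audit, and recall is only one of several metrics worth comparing.

```python
from collections import defaultdict

def recall_by_group(y_true, y_pred, groups):
    """Return {group: recall}, computed over patients whose true label is 1."""
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            positives[g] += 1
            true_pos[g] += int(p == 1)
    return {g: true_pos[g] / positives[g] for g in positives}

# Toy labels, predictions, and group tags for illustration only.
y_true = [1, 1, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_group = recall_by_group(y_true, y_pred, groups)
gap = max(per_group.values()) - min(per_group.values())
for name, value in sorted(per_group.items()):
    print(f"group {name}: recall={value:.2f}")
print(f"largest recall gap: {gap:.2f}")
```

Groups with no positive cases are omitted from the result, so small subgroups should be checked for sample size before any gap is interpreted.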

Citation

@misc{hospital-readmission-lgbm,
  author = {Your Name},
  title = {LightGBM Model for Hospital Readmission Prediction},
  year = {2025},
  url = {https://huggingface.co/your-repo}
}

Dataset Citation

@article{strack2014impact,
  title={Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records},
  author={Strack, Beata and DeShazo, Jonathan P and Gennings, Chris and Olmo, Juan L and Ventura, Sebastian and Cios, Krzysztof J and Clore, John N},
  journal={BioMed Research International},
  volume={2014},
  year={2014},
  publisher={Hindawi}
}

License

This model is released under the MIT License. The underlying dataset has its own license terms.

Contact

For questions or issues, please open an issue in the repository.


Disclaimer: This model is for research and educational purposes. Always consult healthcare professionals for medical decisions.
