Hotel Booking Cancellation Predictor
Predicts the probability that a hotel booking will be cancelled (Sri Lankan hospitality context). The champion model is XGBoost; threshold-based decisions currently use a cutoff of 0.35 (see champion_meta.json).
Last updated: 2025-10-05 16:43 UTC
Key Metrics (Holdout)
| Metric | Value |
|---|---|
| F1 | 0.8046506137865911 |
| ROC-AUC | 0.9384035807110922 |
| Precision | 0.841708852944808 |
| Recall | 0.7707179197286602 |
| Accuracy | 0.8613786749308987 |
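As a quick consistency check, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two values in the table. A minimal sketch (the variable names are illustrative):

```python
# Holdout precision and recall, copied from the metrics table.
precision = 0.841708852944808
recall = 0.7707179197286602

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"Recomputed F1: {f1:.6f}")
```

The result should agree with the reported F1 up to floating-point rounding.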
Top Features (SHAP importance)
- deposit_type
- country__te
- market_segment
- total_of_special_requests
- lead_time
- required_car_parking_spaces
- assigned_room_type
- customer_type_target_encoded
- reserved_room_type
- previous_cancellations
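This card does not state how the ranking was derived. A common convention, assumed here, is to rank features by their mean absolute SHAP value across samples. The sketch below uses made-up SHAP values for three of the listed features purely to show the calculation; they are not the model's actual values, and `rank_by_mean_abs_shap` is a hypothetical helper.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean |SHAP value| over all rows, largest first."""
    n_rows = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n_rows
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative values only (rows = bookings, columns = features).
feature_names = ["deposit_type", "lead_time", "previous_cancellations"]
shap_rows = [
    [0.90, -0.30, 0.05],
    [-1.10, 0.40, 0.00],
    [0.70, -0.20, 0.10],
]

ranking = rank_by_mean_abs_shap(shap_rows, feature_names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling out, so a feature that pushes predictions strongly in both directions still ranks high.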
Quickstart
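import json

import joblib
import pandas as pd
from huggingface_hub import snapshot_download

# Download the model repository and load the serialized artifacts.
local_dir = snapshot_download(repo_id="j2damax/hotel-cancel-model")
model = joblib.load(f"{local_dir}/champion_model.pkl")
preprocessor = joblib.load(f"{local_dir}/preprocessor.pkl")
with open(f"{local_dir}/champion_meta.json") as f:
    meta = json.load(f)  # metrics and decision threshold

# One booking to score; add any further columns the preprocessor expects.
sample = pd.DataFrame([{
    'lead_time': 45, 'arrival_month': 7, 'adults': 2, 'children': 0, 'adr': 110.0
}])

# Apply the fitted preprocessing, then take the probability of the positive (cancelled) class.
X = preprocessor.transform(sample)
proba = float(model.predict_proba(X)[:, 1][0])
print('Cancellation probability:', round(proba, 4))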
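To turn the probability into a cancel / no-cancel flag, compare it with the decision threshold. A minimal sketch, assuming the 0.35 cutoff quoted in this card and that a booking is flagged when its probability is at or above the cutoff. The comparison direction and the helper name `flag_cancellation` are assumptions, and because the key under which champion_meta.json stores the threshold is not documented here, the value is passed in directly.

```python
DEFAULT_THRESHOLD = 0.35  # value quoted in this card; prefer the one in champion_meta.json

def flag_cancellation(proba: float, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Return True when the predicted cancellation probability reaches the threshold."""
    if not 0.0 <= proba <= 1.0:
        raise ValueError("proba must be in [0, 1]")
    return proba >= threshold

print(flag_cancellation(0.42))  # True
print(flag_cancellation(0.10))  # False
```

A cutoff below 0.5 flags more bookings as likely cancellations, trading some precision for recall.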
Files
- champion_model.pkl: serialized champion estimator
- preprocessor.pkl: unified preprocessing / feature pipeline
- champion_meta.json: metrics & threshold
- Optional SHAP / feature importance JSON artifacts
Notes
Model trained with stratified 5-fold CV; primary selection metric: F1; tie-breaker: ROC-AUC. Class imbalance handled via class weights.
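The selection rule and the class weighting can be sketched in a few lines. This is an illustration under stated assumptions, not the training code: the candidate names and scores are made up, and since the exact weighting scheme is not given here, the example uses the common "balanced" heuristic, n_samples / (n_classes * class_count).

```python
from collections import Counter

def pick_champion(candidates):
    """Pick the candidate with the highest F1; break ties on ROC-AUC."""
    return max(candidates, key=lambda c: (c["f1"], c["roc_auc"]))

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Hypothetical candidates with made-up cross-validation scores.
candidates = [
    {"name": "model_a", "f1": 0.80, "roc_auc": 0.93},
    {"name": "model_b", "f1": 0.80, "roc_auc": 0.94},
    {"name": "model_c", "f1": 0.78, "roc_auc": 0.95},
]
champion = pick_champion(candidates)
print(champion["name"])  # model_b: ties model_a on F1, wins on ROC-AUC

# Toy labels: 6 bookings kept (0), 2 cancelled (1).
weights = balanced_class_weights([0, 0, 0, 0, 0, 0, 1, 1])
print(weights)
```

Comparing `(f1, roc_auc)` tuples means ROC-AUC is only consulted when F1 values are exactly equal, which matches a strict tie-breaker.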
Citation
Academic coursework (NIB 7072): Sri Lankan tourism cancellation risk analysis.