Model Card for ccm/2024-24679-tabular-autogluon-predictor
This binary classifier was trained with AutoGluon Tabular on the augmented split of ccm/2025-24679-tabular-dataset to predict the answer to “Do you usually listen to music alone or with others?”. Metrics are reported on a held-out test portion of the augmented split, and the original split serves as external validation to expose any gap between performance on synthetic and original data. The repository contains two artifacts: (1) a cloudpickled predictor and (2) a zipped native AutoGluon predictor directory for robust reuse.
Model Details
Model Description
- Developed by: Fall 2025 24-679 (CMU) — instructor: Christopher McComb
- Shared by: Christopher McComb
- Model type: AutoML (AutoGluon Tabular ensemble; model family chosen via search)
- Task: Tabular binary classification
- Target column: Do you usually listen to music alone or with others?
- License: MIT
- Framework: autogluon.tabular
- Repo artifacts:
  - autogluon_predictor.pkl (cloudpickled predictor)
  - autogluon_predictor_dir.zip (zipped native predictor directory)
Uses
Direct Use
- Classroom demos of AutoML on tabular data
- Baseline experiments for feature engineering and evaluation
- Comparing in-split (synthetic) vs external (original) performance
Out-of-Scope Use
- Production deployment or decision-making with real human subjects
- Generalization beyond course context and small cohort data
Bias, Risks, and Limitations
- Synthetic data bias: Training primarily on augmented data may inflate in-split scores.
- Small/Skewed cohort: Original split is small and classroom-specific; not representative.
- Label/feature noise: Self-reported survey data can introduce noise.
Recommendations
- Treat results as didactic; report both augmented-test and original-external metrics.
- Include calibration checks and report prediction confidence when demonstrating the model.
- When comparing models, keep the same split and random seed.
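The calibration check recommended above can be sketched as a small reliability table. This is a minimal illustration under stated assumptions: `reliability_table` and `expected_calibration_error` are hypothetical helpers, the probabilities are assumed to be positive-class probabilities such as those returned by `predictor.predict_proba`, and the toy numbers are invented for demonstration rather than taken from the model.

```python
from typing import List, Tuple

Row = Tuple[float, float, int]  # (mean predicted prob, observed positive rate, count)


def reliability_table(y_true: List[int], y_prob: List[float], n_bins: int = 5) -> List[Row]:
    """Group predictions into equal-width probability bins and summarize each non-empty bin."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        # Clamp the index so that prob == 1.0 falls into the last bin.
        idx = min(int(prob * n_bins), n_bins - 1)
        bins[idx].append((label, prob))

    rows: List[Row] = []
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for _, p in members) / len(members)
        pos_rate = sum(l for l, _ in members) / len(members)
        rows.append((mean_prob, pos_rate, len(members)))
    return rows


def expected_calibration_error(rows: List[Row]) -> float:
    """Count-weighted average of |mean predicted prob - observed positive rate|."""
    total = sum(n for _, _, n in rows)
    return sum(n * abs(mean_prob - pos_rate) for mean_prob, pos_rate, n in rows) / total


# Toy data (assumed, not model output): 1 = positive class, 0 = negative class.
y_true = [0, 0, 1, 1, 1, 0, 1, 1]
y_prob = [0.10, 0.30, 0.35, 0.65, 0.70, 0.62, 0.90, 0.95]

rows = reliability_table(y_true, y_prob, n_bins=5)
ece = expected_calibration_error(rows)
for mean_prob, pos_rate, count in rows:
    print(f"mean_prob={mean_prob:.2f}  observed={pos_rate:.2f}  n={count}")
print(f"ECE={ece:.3f}")
```

A well-calibrated model shows mean predicted probability close to the observed positive rate in each bin. With a cohort this small, individual bins hold very few examples, so the table is better read as a teaching aid than as a precise estimate.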
How to Get Started with the Model
Use the code below to get started with the model.
import pathlib
import shutil
import zipfile

import huggingface_hub as hf
from autogluon.tabular import TabularPredictor

REPO = "ccm/2024-24679-tabular-autogluon-predictor"
ZIPNAME = "autogluon_predictor_dir.zip"

dest = pathlib.Path("hf_download")
dest.mkdir(exist_ok=True)

# Download zipped predictor directory
zip_path = hf.hf_hub_download(
    repo_id=REPO,
    filename=ZIPNAME,
    repo_type="model",
    local_dir=str(dest),
    local_dir_use_symlinks=False,
)

# Extract to a clean folder, removing any previous extraction
extract_dir = dest / "predictor_dir"
if extract_dir.exists():
    shutil.rmtree(extract_dir)
extract_dir.mkdir(parents=True, exist_ok=True)
with zipfile.ZipFile(zip_path, "r") as zf:
    zf.extractall(str(extract_dir))

# Load predictor from native directory
predictor = TabularPredictor.load(str(extract_dir))

# Example: predictions
# X is a pandas DataFrame with the same feature columns used in training
preds = predictor.predict(X)
Training Details
Training Data
- Dataset: ccm/2025-24679-tabular-dataset
- Splits:
  - Train/Test: 80/20 stratified split on the augmented split (random_state=42)
  - External Validation: Entire original split (not used in training)
- Target column: Do you usually listen to music alone or with others?
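To illustrate what an 80/20 stratified split does, here is a minimal standard-library sketch. It is not the code used to produce this model's split, and it will not reproduce the same rows as any particular library's `random_state=42`. The helper `stratified_split` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[str], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Split row indices into train/test so each class keeps roughly the same proportion."""
    rng = random.Random(seed)

    # Collect row indices per class.
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx: List[int] = []
    test_idx: List[int] = []
    for label in sorted(by_class):  # sorted for a deterministic class order
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (assumed class names and counts, for illustration only).
labels = ["Alone"] * 70 + ["With others"] * 30
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Because the split is done per class, the 70/30 class ratio of the toy labels is preserved in both the training and test portions.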
Training Procedure
- Library: AutoGluon Tabular
- Presets: "best_quality" (ensembles across boosted trees, kNN, neural nets, etc.)
- Training time limit: 300 seconds
- Evaluation metric (internal): AutoGluon default for binary classification (accuracy)
Hyperparameters
- time_limit: 300s
- presets: best_quality
- random_state: 42
- problem_type: inferred automatically by AutoGluon
- eval_metric: None (AutoGluon default)
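The settings listed above map onto a `TabularPredictor` call roughly as follows. This is a sketch under assumptions rather than the original training script: `train_predictor` and the `save_path` default are hypothetical, and `random_state` is assumed to apply to the train/test split rather than to `fit`, so it does not appear here.

```python
from typing import Any, Dict

LABEL = "Do you usually listen to music alone or with others?"

# Constructor arguments, taken from the hyperparameter list.
PREDICTOR_KWARGS: Dict[str, Any] = {
    "label": LABEL,
    "problem_type": None,  # let AutoGluon infer the problem type
    "eval_metric": None,   # fall back to AutoGluon's default metric
}

# Arguments passed to fit().
FIT_KWARGS: Dict[str, Any] = {
    "time_limit": 300,  # seconds
    "presets": "best_quality",
}


def train_predictor(train_df, save_path: str = "autogluon_predictor_dir"):
    """Fit a TabularPredictor on a pandas DataFrame that includes the label column.

    `save_path` is a hypothetical output directory chosen for this sketch.
    """
    # Imported lazily so the configuration can be inspected without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(path=save_path, **PREDICTOR_KWARGS)
    return predictor.fit(train_data=train_df, **FIT_KWARGS)
```

Keeping the configuration in plain dictionaries makes it easy to log alongside results, which helps when comparing runs on the same split.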
Evaluation
Testing Data
- In-split test: Held-out 20% of augmented split
- External validation: Full original split
Metrics
- Accuracy: Fraction of correctly predicted labels
- F1 (weighted): Harmonic mean of precision and recall, weighted by class support
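The two metrics can be written out in a few lines of plain Python to make the definitions concrete. This is an illustrative sketch with made-up labels, not the evaluation code used for the reported numbers; the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1, averaged with weights equal to each class's support in y_true."""
    support = Counter(y_true)
    total = 0.0
    for cls, n_cls in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = n_cls - tp
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        f1 = 2 * tp / denom if denom else 0.0
        total += n_cls * f1
    return total / len(y_true)


# Toy labels (assumed, for illustration only).
y_true = ["alone", "alone", "alone", "alone", "others", "others"]
y_pred = ["alone", "alone", "alone", "others", "others", "alone"]

acc = accuracy(y_true, y_pred)
wf1 = weighted_f1(y_true, y_pred)
print(f"accuracy={acc:.4f}  weighted_f1={wf1:.4f}")
```

In this toy case the per-class F1 scores are 0.75 for "alone" and 0.5 for "others"; weighting by support (4 and 2) lets the larger class dominate the average.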
Results
- Augmented test: Accuracy = 0.8148, Weighted F1 = 0.8126
- Original external: Accuracy = 0.9167, Weighted F1 = 0.9198
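The gap between the two evaluation sets follows directly from the reported numbers. The short sketch below only does that arithmetic; the variable names are illustrative.

```python
# Reported metrics, copied from the results above.
augmented = {"accuracy": 0.8148, "weighted_f1": 0.8126}
original = {"accuracy": 0.9167, "weighted_f1": 0.9198}

# Positive values mean the original (external) split scored higher.
gap = {name: round(original[name] - augmented[name], 4) for name in augmented}

for name, value in gap.items():
    print(f"{name}: original - augmented = {value:+.4f}")
```

Here the external scores are higher than the in-split scores by roughly 0.10 on both metrics. Because the original split is small, a handful of examples can shift these figures noticeably, so the gap should be interpreted with caution.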
Environmental Impact
Training was lightweight and classroom-focused:
- Hardware: CPU laptop or standard VM (no GPU required)
- Training wall-time: ≤ 5 minutes
- Estimated emissions: negligible
- Cloud provider: N/A (varies by user)
For precise estimates, see ML CO₂ calculator.
Model Card Contact
Christopher McComb