Model Card for the 24-679 Tabular AutoGluon Predictor

Binary classifier trained with AutoGluon Tabular on the augmented split of ccm/2025-24679-tabular-dataset to predict “Do you usually listen to music alone or with others?”. We report metrics on a held-out test portion of the augmented split and perform external validation on the original split to highlight any synthetic→original performance gap. Artifacts include (1) a cloudpickled predictor and (2) a zipped native AutoGluon predictor directory for robust reuse.

Model Details

Model Description

  • Developed by: Fall 2025 24-679 (CMU) — instructor: Christopher McComb
  • Shared by: Christopher McComb
  • Model type: AutoML (AutoGluon Tabular ensemble; model family chosen via search)
  • Task: Tabular binary classification
  • Target column: Do you usually listen to music alone or with others?
  • License: MIT
  • Framework: autogluon.tabular
  • Repo artifacts:
    • autogluon_predictor.pkl (cloudpickled predictor)
    • autogluon_predictor_dir.zip (zipped native predictor directory)
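
The cloudpickled artifact can also be loaded directly. Below is a minimal sketch, assuming the file was written with cloudpickle, that the installed autogluon.tabular version is compatible with the one used to save it, and reusing the repo id from the download snippet under "How to Get Started"; the zipped directory route shown there is generally the more robust option.

import cloudpickle
import huggingface_hub as hf

REPO = "ccm/2024-24679-tabular-autogluon-predictor"  # same repo id as the snippet below

# Download the pickled predictor listed above
pkl_path = hf.hf_hub_download(
    repo_id=REPO,
    filename="autogluon_predictor.pkl",
    repo_type="model",
)

# Deserialize; requires a compatible autogluon.tabular installation
with open(pkl_path, "rb") as f:
    predictor = cloudpickle.load(f)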

Uses

Direct Use

  • Classroom demos of AutoML on tabular data
  • Baseline experiments for feature engineering and evaluation
  • Comparing in-split (synthetic) vs external (original) performance

Out-of-Scope Use

  • Production deployment or decision-making with real human subjects
  • Generalization beyond course context and small cohort data

Bias, Risks, and Limitations

  • Synthetic data bias: Training primarily on augmented data may inflate in-split scores.
  • Small/Skewed cohort: Original split is small and classroom-specific; not representative.
  • Label/feature noise: Self-reported survey data can introduce noise.

Recommendations

  • Treat results as didactic; report both augmented-test and original-external metrics.
  • Prefer calibration checks and confidence reporting when demonstrating; see the sketch after this list.
  • When comparing models, keep the same split and random seed.
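
For the calibration-check recommendation, here is a minimal sketch, assuming scikit-learn is installed, a predictor loaded as in "How to Get Started", and a held-out feature frame X_test with true labels y_test:

from sklearn.calibration import calibration_curve

# Class probabilities from the AutoGluon predictor (one column per class label)
proba = predictor.predict_proba(X_test)

# Treat one class label as "positive" for a simple reliability check
# (replace with the actual label string from the target column)
positive_label = proba.columns[-1]
y_true = (y_test == positive_label).astype(int)

# Observed positive rate vs. mean predicted probability per bin
frac_pos, mean_pred = calibration_curve(y_true, proba[positive_label], n_bins=5)
for mp, fp in zip(mean_pred, frac_pos):
    print(f"mean predicted p = {mp:.2f} -> observed rate = {fp:.2f}")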

How to Get Started with the Model

Use the code below to get started with the model.

import pathlib, shutil, zipfile
import huggingface_hub as hf
from autogluon.tabular import TabularPredictor

REPO = "ccm/2024-24679-tabular-autogluon-predictor"
ZIPNAME = "autogluon_predictor_dir.zip"

dest = pathlib.Path("hf_download")
dest.mkdir(exist_ok=True)

# Download zipped predictor directory
zip_path = hf.hf_hub_download(
    repo_id=REPO,
    filename=ZIPNAME,
    repo_type="model",
    local_dir=str(dest),  # local_dir_use_symlinks is deprecated in recent huggingface_hub and no longer needed
)

# Extract to folder
extract_dir = dest / "predictor_dir"
if extract_dir.exists():
    shutil.rmtree(extract_dir)
extract_dir.mkdir(parents=True, exist_ok=True)

with zipfile.ZipFile(zip_path, "r") as zf:
    zf.extractall(str(extract_dir))

# Load predictor from native directory
predictor = TabularPredictor.load(str(extract_dir))

# Example: predictions on a pandas DataFrame X whose columns match the training features
# (see the sketch below for building X from the course dataset)
preds = predictor.predict(X)
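
A minimal sketch of building X and running a quick external check, assuming the datasets library is installed, that the original (non-augmented) portion of the dataset is exposed as a split named "original", and that the target column matches the one listed above:

import datasets

TARGET = "Do you usually listen to music alone or with others?"

# Original (non-augmented) split, used here purely for illustration
original_df = datasets.load_dataset(
    "ccm/2025-24679-tabular-dataset", split="original"
).to_pandas()

# Features are every column except the target
X = original_df.drop(columns=[TARGET])
y = original_df[TARGET]

preds = predictor.predict(X)
print("external accuracy:", (preds == y).mean())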

Training Details

Training Data

  • Dataset: ccm/2025-24679-tabular-dataset
  • Splits:
    • Train/Test: 80/20 stratified split on the augmented split (random_state=42); see the split sketch after this list
    • External Validation: Entire original split (not used in training)
  • Target column: Do you usually listen to music alone or with others?
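
A minimal sketch of the split described above, assuming the augmented portion is exposed as a split named "augmented" and that scikit-learn's train_test_split is used for the 80/20 stratified split:

import datasets
from sklearn.model_selection import train_test_split

TARGET = "Do you usually listen to music alone or with others?"

# Augmented split used for training and in-split testing
aug_df = datasets.load_dataset(
    "ccm/2025-24679-tabular-dataset", split="augmented"
).to_pandas()

# 80/20 stratified split, matching the reported random_state
train_df, test_df = train_test_split(
    aug_df,
    test_size=0.20,
    stratify=aug_df[TARGET],
    random_state=42,
)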

Training Procedure

  • Library: AutoGluon Tabular
  • Presets: "best_quality" (ensembles across boosted trees, kNN, neural nets, etc.)
  • Training time limit: 300 seconds
  • Evaluation metric (internal): AutoGluon default for the inferred problem type (accuracy for binary classification)

Hyperparameters

  • time_limit: 300s
  • presets: best_quality
  • random_state: 42
  • problem_type: inferred automatically by AutoGluon
  • eval_metric: None (AutoGluon default)
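
A minimal sketch of a fit call consistent with the settings above (not the exact training script; TARGET and train_df follow the split sketch under Training Data):

from autogluon.tabular import TabularPredictor

predictor = TabularPredictor(
    label=TARGET,        # target column listed under Training Data
    eval_metric=None,    # fall back to AutoGluon's default for the inferred problem type
).fit(
    train_data=train_df,  # 80% portion of the augmented split
    presets="best_quality",
    time_limit=300,       # seconds
)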

Evaluation

Testing Data

  • In-split test: Held-out 20% of augmented split
  • External validation: Full original split

Metrics

  • Accuracy: Fraction of correctly predicted labels
  • F1 (weighted): Per-class F1 scores (harmonic mean of precision and recall) averaged with class-support weights; see the sketch below
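
A minimal sketch of computing both metrics with scikit-learn, assuming a fitted predictor plus test_df from the split sketch and original_df from the prediction sketch:

from sklearn.metrics import accuracy_score, f1_score

def report(name, frame):
    # Predict on the feature columns and compare against the true labels
    y_true = frame[TARGET]
    y_pred = predictor.predict(frame.drop(columns=[TARGET]))
    acc = accuracy_score(y_true, y_pred)
    f1w = f1_score(y_true, y_pred, average="weighted")
    print(f"{name}: accuracy = {acc:.4f}, weighted F1 = {f1w:.4f}")

report("Augmented test (in-split)", test_df)
report("Original (external validation)", original_df)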

Results

  • Augmented test: Accuracy = 0.8148, Weighted F1 = 0.8126
  • Original external: Accuracy = 0.9167, Weighted F1 = 0.9198

Environmental Impact

Training was lightweight and classroom-focused:

  • Hardware: CPU laptop or standard VM (no GPU required)
  • Training wall-time: ≤ 5 minutes
  • Estimated emissions: negligible
  • Cloud provider: N/A (varies by user)

For precise estimates, see ML CO₂ calculator.


Model Card Contact

Christopher McComb — [email protected]
