Model Card for the 24-679 Tabular AutoGluon Predictor

Binary classifier trained with AutoGluon Tabular on the augmented split of ccm/2025-24679-tabular-dataset to predict “Do you usually listen to music alone or with others?”. We report metrics on a held-out test portion of the augmented split and perform external validation on the original split to highlight any synthetic→original performance gap. Artifacts include (1) a cloudpickled predictor and (2) a zipped native AutoGluon predictor directory for robust reuse.

Model Details

Model Description

  • Developed by: Fall 2025 24-679 (CMU) — instructor: Christopher McComb
  • Shared by: Christopher McComb
  • Model type: AutoML (AutoGluon Tabular ensemble; model family chosen via search)
  • Task: Tabular binary classification
  • Target column: Do you usually listen to music alone or with others?
  • License: MIT
  • Framework: autogluon.tabular
  • Repo artifacts:
    • autogluon_predictor.pkl (cloudpickled predictor)
    • autogluon_predictor_dir.zip (zipped native predictor directory)
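
The cloudpickled artifact can also be loaded directly. Below is a minimal sketch, assuming the file was written with cloudpickle, that the installed autogluon.tabular version is compatible with the one used to save it, and reusing the repo id from the download snippet under "How to Get Started"; the zipped directory route shown there is generally the more robust option.

import cloudpickle
import huggingface_hub as hf

REPO = "ccm/2024-24679-tabular-autogluon-predictor"  # same repo id as the snippet below

# Download the pickled predictor listed above
pkl_path = hf.hf_hub_download(
    repo_id=REPO,
    filename="autogluon_predictor.pkl",
    repo_type="model",
)

# Deserialize; requires a compatible autogluon.tabular installation
with open(pkl_path, "rb") as f:
    predictor = cloudpickle.load(f)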

Uses

Direct Use

  • Classroom demos of AutoML on tabular data
  • Baseline experiments for feature engineering and evaluation
  • Comparing in-split (synthetic) vs external (original) performance

Out-of-Scope Use

  • Production deployment or decision-making with real human subjects
  • Generalization beyond course context and small cohort data

Bias, Risks, and Limitations

  • Synthetic data bias: Training primarily on augmented data may inflate in-split scores.
  • Small/Skewed cohort: Original split is small and classroom-specific; not representative.
  • Label/feature noise: Self-reported survey data can introduce noise.

Recommendations

  • Treat results as didactic; report both augmented-test and original-external metrics.
  • Prefer calibration checks and confidence reporting when demonstrating; see the sketch after this list.
  • When comparing models, keep the same split and random seed.
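
For the calibration-check recommendation, here is a minimal sketch, assuming scikit-learn is installed, a predictor loaded as in "How to Get Started", and a held-out feature frame X_test with true labels y_test:

from sklearn.calibration import calibration_curve

# Class probabilities from the AutoGluon predictor (one column per class label)
proba = predictor.predict_proba(X_test)

# Treat one class label as "positive" for a simple reliability check
# (replace with the actual label string from the target column)
positive_label = proba.columns[-1]
y_true = (y_test == positive_label).astype(int)

# Observed positive rate vs. mean predicted probability per bin
frac_pos, mean_pred = calibration_curve(y_true, proba[positive_label], n_bins=5)
for mp, fp in zip(mean_pred, frac_pos):
    print(f"mean predicted p = {mp:.2f} -> observed rate = {fp:.2f}")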

How to Get Started with the Model

Use the code below to get started with the model.

import pathlib, shutil, zipfile
import huggingface_hub as hf
from autogluon.tabular import TabularPredictor

REPO = "ccm/2024-24679-tabular-autogluon-predictor"
ZIPNAME = "autogluon_predictor_dir.zip"

dest = pathlib.Path("hf_download")
dest.mkdir(exist_ok=True)

# Download zipped predictor directory
zip_path = hf.hf_hub_download(
    repo_id=REPO,
    filename=ZIPNAME,
    repo_type="model",
    local_dir=str(dest),  # local_dir_use_symlinks is deprecated in recent huggingface_hub and no longer needed
)

# Extract to folder
extract_dir = dest / "predictor_dir"
if extract_dir.exists():
    shutil.rmtree(extract_dir)
extract_dir.mkdir(parents=True, exist_ok=True)

with zipfile.ZipFile(zip_path, "r") as zf:
    zf.extractall(str(extract_dir))

# Load predictor from native directory
predictor = TabularPredictor.load(str(extract_dir))

# Example: predictions on a pandas DataFrame X whose columns match the training features
# (see the sketch below for building X from the course dataset)
preds = predictor.predict(X)
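
A minimal sketch of building X and running a quick external check, assuming the datasets library is installed, that the original (non-augmented) portion of the dataset is exposed as a split named "original", and that the target column matches the one listed above:

import datasets

TARGET = "Do you usually listen to music alone or with others?"

# Original (non-augmented) split, used here purely for illustration
original_df = datasets.load_dataset(
    "ccm/2025-24679-tabular-dataset", split="original"
).to_pandas()

# Features are every column except the target
X = original_df.drop(columns=[TARGET])
y = original_df[TARGET]

preds = predictor.predict(X)
print("external accuracy:", (preds == y).mean())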

Training Details

Training Data

  • Dataset: ccm/2025-24679-tabular-dataset
  • Splits:
    • Train/Test: 80/20 stratified split on the augmented split (random_state=42); see the split sketch after this list
    • External Validation: Entire original split (not used in training)
  • Target column: Do you usually listen to music alone or with others?
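
A minimal sketch of the split described above, assuming the augmented portion is exposed as a split named "augmented" and that scikit-learn's train_test_split is used for the 80/20 stratified split:

import datasets
from sklearn.model_selection import train_test_split

TARGET = "Do you usually listen to music alone or with others?"

# Augmented split used for training and in-split testing
aug_df = datasets.load_dataset(
    "ccm/2025-24679-tabular-dataset", split="augmented"
).to_pandas()

# 80/20 stratified split, matching the reported random_state
train_df, test_df = train_test_split(
    aug_df,
    test_size=0.20,
    stratify=aug_df[TARGET],
    random_state=42,
)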

Training Procedure

  • Library: AutoGluon Tabular
  • Presets: "best_quality" (ensembles across boosted trees, kNN, neural nets, etc.)
  • Training time limit: 300 seconds
  • Evaluation metric (internal): AutoGluon default for the inferred problem type (accuracy for binary classification)

Hyperparameters

  • time_limit: 300s
  • presets: best_quality
  • random_state: 42
  • problem_type: inferred automatically by AutoGluon
  • eval_metric: None (AutoGluon default)
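
A minimal sketch of a fit call consistent with the settings above (not the exact training script; TARGET and train_df follow the split sketch under Training Data):

from autogluon.tabular import TabularPredictor

predictor = TabularPredictor(
    label=TARGET,        # target column listed under Training Data
    eval_metric=None,    # fall back to AutoGluon's default for the inferred problem type
).fit(
    train_data=train_df,  # 80% portion of the augmented split
    presets="best_quality",
    time_limit=300,       # seconds
)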

Evaluation

Testing Data

  • In-split test: Held-out 20% of augmented split
  • External validation: Full original split

Metrics

  • Accuracy: Fraction of correctly predicted labels
  • F1 (weighted): Per-class F1 scores (harmonic mean of precision and recall) averaged with class-support weights; see the sketch below
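
A minimal sketch of computing both metrics with scikit-learn, assuming a fitted predictor plus test_df from the split sketch and original_df from the prediction sketch:

from sklearn.metrics import accuracy_score, f1_score

def report(name, frame):
    # Predict on the feature columns and compare against the true labels
    y_true = frame[TARGET]
    y_pred = predictor.predict(frame.drop(columns=[TARGET]))
    acc = accuracy_score(y_true, y_pred)
    f1w = f1_score(y_true, y_pred, average="weighted")
    print(f"{name}: accuracy = {acc:.4f}, weighted F1 = {f1w:.4f}")

report("Augmented test (in-split)", test_df)
report("Original (external validation)", original_df)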

Results

  • Augmented test: Accuracy = 0.8148, Weighted F1 = 0.8126
  • Original external: Accuracy = 0.9167, Weighted F1 = 0.9198

Environmental Impact

Training was lightweight and classroom-focused:

  • Hardware: CPU laptop or standard VM (no GPU required)
  • Training wall-time: ≤ 5 minutes
  • Estimated emissions: negligible
  • Cloud provider: N/A (varies by user)

For precise estimates, see ML CO₂ calculator.


Model Card Contact

Christopher McComb — [email protected]
