---
license: mit
language:
- en
pipeline_tag: image-classification
tags:
- images
datasets:
- ccm/2025-24679-image-dataset
metrics:
- accuracy
- f1
library_name: autogluon
---

# Model Card for Image AutoML Predictor

Binary/multiclass image classifier trained with **AutoGluon MultiModal** on the *augmented* split of `ccm/2025-24679-image-dataset` to predict survey-derived image labels. Metrics are reported on a held-out test portion of the augmented split and evaluated via **external validation** on the **original** split. Artifacts include **(1) a zipped native AutoGluon predictor directory** (recommended) and **(2) a cloudpickled predictor** (for convenience).

## Model Details

### Model Description

- **Developed by:** Fall 2025 24-679 (CMU), instructor: Christopher McComb
- **Shared by:** Christopher McComb
- **Model type:** AutoML (AutoGluon MultiModalPredictor with ResNet18 backbone)
- **Task:** Image classification
- **Target column:** `label`
- **License:** MIT
- **Framework:** `autogluon.multimodal`
- **Repo artifacts:**
  - `autogluon_image_predictor_dir.zip` (zipped native predictor directory)
  - `autogluon_image_predictor.pkl` (cloudpickled predictor)

## Uses

### Direct Use

- Classroom demos of AutoML for image classification
- Baseline experiments for augmentation vs. generalization
- Comparing **augmented** vs **original** split performance

### Out-of-Scope Use

- Production deployment with sensitive/real-world decision stakes
- Generalization beyond course context or survey-specific images

## Bias, Risks, and Limitations

- **Synthetic data inflation:** Augmented data may artificially boost in-split accuracy.
- **Limited representativeness:** Original dataset is small, student-generated, not diverse.
- **Label noise:** Survey/image associations may be noisy or inconsistent.

### Recommendations

- Always report both **augmented-test** and **original-validation** metrics.
- Emphasize didactic use cases (education, experimentation).
- Use consistent random seeds and splits for reproducibility.

## How to Get Started with the Model

```python
import pathlib
import shutil
import zipfile

import huggingface_hub as hf
from autogluon.multimodal import MultiModalPredictor

REPO = "ccm/2025-24679-image-autogluon-predictor"
ZIPNAME = "autogluon_image_predictor_dir.zip"

dest = pathlib.Path("hf_download")
dest.mkdir(exist_ok=True)

# Download predictor zip
zip_path = hf.hf_hub_download(
    repo_id=REPO,
    filename=ZIPNAME,
    repo_type="model",
    local_dir=str(dest),
    local_dir_use_symlinks=False,
)

# Extract into a clean directory (any previous extraction is removed first)
extract_dir = dest / "predictor_dir"
if extract_dir.exists():
    shutil.rmtree(extract_dir)
extract_dir.mkdir(parents=True, exist_ok=True)
with zipfile.ZipFile(zip_path, "r") as zf:
    zf.extractall(str(extract_dir))

# Load predictor
predictor = MultiModalPredictor.load(str(extract_dir))

# Example inference. `test_df` is not defined in this snippet: it is assumed to be
# a pandas DataFrame with an "image" column in the same format used for training.
preds = predictor.predict(test_df[["image"]])
```

## Training Details

### Training Data

- **Dataset:** ccm/2025-24679-image-dataset
- **Splits:**
  - Augmented: 80/20 train/test with stratification (random_state=42)
  - Validation: 20% of train used as val split
  - External validation: Entire original split (unused in training)

### Training Procedure

- **Library:** AutoGluon MultiModal
- **Presets:** "medium_quality"
- **Backbone:** timm_image → resnet18
- **Training time limit:** default (a few minutes)
- **Eval metric:** Accuracy

### Hyperparameters

- **model.names:** timm_image
- **checkpoint:** resnet18
- **presets:** medium_quality
- **random_state:** 42

## Evaluation

### Testing Data

- **Augmented test:** Held-out 20% of augmented split
- **External validation:** Entire original split

### Metrics

- **Accuracy:** Fraction of predictions that are correct
- **Weighted F1:** Per-class F1 (harmonic mean of precision and recall), averaged with weights proportional to class support

### Results (example; replace with actuals)

- **Augmented test:** Accuracy = 0.7429, Weighted F1 = 0.7392
- **Original validation:** Accuracy = 0.8621, Weighted F1 = 0.8620

## Environmental Impact

- **Hardware:** Single GPU
(short run)
- **Training wall-time:** < 10 minutes
- **Estimated emissions:** negligible
- **Cloud provider:** N/A (depends on student setup)

See [ML CO₂ calculator](https://mlco2.github.io/impact#compute) for custom estimates.

## Model Card Contact

Christopher McComb, ccm@cmu.edu
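
## Appendix: Metric Computation Sketch

The card reports accuracy and weighted F1 but does not show how they were computed. The following is a minimal, dependency-free sketch of both metrics as they are commonly defined; the function names `accuracy` and `weighted_f1` and the toy labels are hypothetical and not taken from the original training code. The weighted variant is intended to match the usual "weighted" averaging, where each class's F1 is weighted by its number of true samples.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one sample")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one sample")

    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predicted samples per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    total = 0.0
    for label, n in support.items():
        tp = true_pos[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / n
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)


# Toy labels for illustration only (not results from this model).
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
print(f"accuracy={accuracy(y_true, y_pred):.4f}")
print(f"weighted_f1={weighted_f1(y_true, y_pred):.4f}")
```

To apply this to the predictor, one would pass the `label` column of an evaluation DataFrame as `y_true` and the output of `predictor.predict(...)` as `y_pred`, both converted to plain lists; running it once on the augmented test split and once on the original split would give the two pairs of numbers the card asks for.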