---
license: mit
language:
- en
pipeline_tag: image-classification
tags:
- images
datasets:
- ccm/2025-24679-image-dataset
metrics:
- accuracy
- f1
library_name: autogluon
---

# Model Card for Image AutoML Predictor

Binary/multiclass image classifier trained with **AutoGluon MultiModal** on the *augmented* split of `ccm/2025-24679-image-dataset` to predict survey-derived image labels. Metrics are reported on a held-out test portion of the augmented split, and the **original** split, which is never used in training, serves as **external validation**. Artifacts include **(1) a zipped native AutoGluon predictor directory** (recommended) and **(2) a cloudpickled predictor** (for convenience).

## Model Details

### Model Description

- **Developed by:** Fall 2025 24-679 (CMU) — instructor: Christopher McComb  
- **Shared by:** Christopher McComb  
- **Model type:** AutoML (AutoGluon MultiModalPredictor with ResNet18 backbone)  
- **Task:** Image classification  
- **Target column:** `label`  
- **License:** MIT  
- **Framework:** `autogluon.multimodal`  
- **Repo artifacts:**  
  - `autogluon_image_predictor_dir.zip` (zipped native predictor directory)  
  - `autogluon_image_predictor.pkl` (cloudpickled predictor)  

## Uses

### Direct Use
- Classroom demos of AutoML for image classification  
- Baseline experiments for augmentation vs. generalization  
- Comparing **augmented** vs **original** split performance  

### Out-of-Scope Use
- Production deployment where predictions inform sensitive or real-world decisions  
- Use on images outside the course context or the survey-specific image set  

## Bias, Risks, and Limitations
- **Synthetic data inflation:** Augmented data may artificially boost in-split accuracy.  
- **Limited representativeness:** The original dataset is small, student-generated, and not diverse.  
- **Label noise:** Survey/image associations may be noisy or inconsistent.  

### Recommendations
- Always report both **augmented-test** and **original-validation** metrics.  
- Emphasize didactic use cases (education, experimentation).  
- Use consistent random seeds and splits for reproducibility.  

## How to Get Started with the Model

```python
import pathlib, shutil, zipfile
import huggingface_hub as hf
from autogluon.multimodal import MultiModalPredictor

REPO = "ccm/2025-24679-image-autogluon-predictor"
ZIPNAME = "autogluon_image_predictor_dir.zip"

dest = pathlib.Path("hf_download")
dest.mkdir(exist_ok=True)

# Download predictor zip
zip_path = hf.hf_hub_download(
    repo_id=REPO,
    filename=ZIPNAME,
    repo_type="model",
    local_dir=str(dest),
    local_dir_use_symlinks=False,
)

# Extract
extract_dir = dest / "predictor_dir"
if extract_dir.exists():
    shutil.rmtree(extract_dir)
extract_dir.mkdir(parents=True, exist_ok=True)

with zipfile.ZipFile(zip_path, "r") as zf:
    zf.extractall(str(extract_dir))

# Load predictor
predictor = MultiModalPredictor.load(str(extract_dir))

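# Example inference. `test_df` must be a pandas DataFrame with an `image` column.
# The path below is a placeholder (assumed to be a file path); replace it with real images.
import pandas as pd

test_df = pd.DataFrame({"image": ["path/to/your_image.jpg"]})
preds = predictor.predict(test_df[["image"]])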
```

## Training Details

### Training Data
- **Dataset:** ccm/2025-24679-image-dataset  
- **Splits:**  
  - Augmented: stratified 80/20 train/test split (`random_state=42`)  
  - Validation: 20% of the training portion held out as the validation split  
  - External validation: the entire original split (never used in training)  
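
The split proportions above can be sketched as follows. This is an illustration only: it assumes scikit-learn's `train_test_split` was used (the card does not name the splitting function), and the file names and labels are toy placeholders rather than the real dataset.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy stand-in for the augmented split: 100 hypothetical file names, two balanced classes.
images = [f"img_{i:03d}.png" for i in range(100)]
labels = [i % 2 for i in range(100)]

# Stratified 80/20 train/test split with the seed reported in this card.
train_x, test_x, train_y, test_y = train_test_split(
    images, labels, test_size=0.2, stratify=labels, random_state=42
)

# Hold out 20% of the training portion for validation (assumed to be stratified as well).
fit_x, val_x, fit_y, val_y = train_test_split(
    train_x, train_y, test_size=0.2, stratify=train_y, random_state=42
)

print(len(fit_x), len(val_x), len(test_x))  # 64 16 20
print(Counter(test_y))  # class balance is preserved in the test portion
```

With these proportions, 64% of the augmented split is used for fitting, 16% for validation, and 20% for testing.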

### Training Procedure
- **Library:** AutoGluon MultiModal  
- **Presets:** "medium_quality"  
- **Backbone:** timm_image → resnet18  
- **Training time limit:** default (no explicit limit set; runs finish in a few minutes)  
- **Eval metric:** Accuracy  

### Hyperparameters
- **model.names:** timm_image  
- **checkpoint:** resnet18  
- **presets:** medium_quality  
- **random_state:** 42  
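
A training call matching these settings might look like the sketch below. The mapping is an assumption: `checkpoint` is taken to mean AutoGluon's `model.timm_image.checkpoint_name` override, and `random_state` is taken to mean the `seed` argument of `fit`. The function name `train_predictor` and the `save_path` default are hypothetical.

```python
# Assumed mapping of the card's hyperparameters onto AutoGluon override keys.
HYPERPARAMETERS = {
    "model.names": ["timm_image"],
    "model.timm_image.checkpoint_name": "resnet18",
}


def train_predictor(train_df, save_path="ag_image_predictor"):
    """Fit a predictor on a DataFrame with `image` and `label` columns."""
    # Imported lazily so the configuration above can be inspected without AutoGluon installed.
    from autogluon.multimodal import MultiModalPredictor

    predictor = MultiModalPredictor(
        label="label",
        eval_metric="accuracy",
        path=save_path,  # hypothetical output directory
    )
    predictor.fit(
        train_data=train_df,
        presets="medium_quality",
        hyperparameters=HYPERPARAMETERS,
        seed=42,  # assumed to correspond to random_state in the list above
    )
    return predictor
```

Calling `train_predictor(train_df)` with the training portion of the augmented split would produce a predictor directory comparable to the zipped artifact, though exact results may vary with library versions and hardware.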


## Evaluation

### Testing Data
- **Augmented test:** Held-out 20% of augmented split  
- **External validation:** Entire original split  

### Metrics
- **Accuracy:** Fraction of predictions that match the true label  
- **Weighted F1:** Per-class F1 (harmonic mean of precision and recall), averaged with weights proportional to each class's support  
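
To make the two definitions concrete, here is a dependency-free sketch. The helper names `accuracy` and `weighted_f1` and the toy labels are illustrative, and this is not necessarily the code used to produce the reported numbers.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)


# Toy labels: class "a" has support 3, class "b" has support 1.
y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "b", "b"]
print(accuracy(y_true, y_pred))               # 0.75
print(round(weighted_f1(y_true, y_pred), 4))  # 0.7667
```

In the toy example, class "a" has F1 = 0.8 and class "b" has F1 ≈ 0.667, so the support-weighted average is (3 × 0.8 + 1 × 0.667) / 4 ≈ 0.767.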

### Results (example — replace with actuals)
- **Augmented test:** Accuracy = 0.7429, Weighted F1 = 0.7392  
- **Original validation:** Accuracy = 0.8621, Weighted F1 = 0.8620  

## Environmental Impact
- **Hardware:** Single GPU (short run)  
- **Training wall-time:** < 10 minutes  
- **Estimated emissions:** negligible  
- **Cloud provider:** N/A (depends on student setup)  

See [ML CO₂ calculator](https://mlco2.github.io/impact#compute) for custom estimates.  

## Model Card Contact
Christopher McComb — ccm@cmu.edu