Collection: MetaCLIP2 Image Classification Experiments (Domain-Specific Downstream Tasks)
MetaCLIP-2-Cifar10 is an image classification vision–language encoder model fine-tuned from facebook/metaclip-2-worldwide-s16 for a single-label classification task. It is designed to identify and categorize images into the ten CIFAR-10 object classes using the MetaClip2ForImageClassification architecture.
MetaCLIP 2: A Worldwide Scaling Recipe: https://huggingface.co/papers/2507.22062
Classification report:

             precision    recall  f1-score   support

    airplane    0.9813    0.9685    0.9748      2000
  automobile    0.9777    0.9850    0.9813      2000
        bird    0.9560    0.9560    0.9560      2000
         cat    0.9104    0.9395    0.9247      2000
        deer    0.9566    0.9580    0.9573      2000
         dog    0.9476    0.9215    0.9343      2000
        frog    0.9774    0.9735    0.9755      2000
       horse    0.9704    0.9670    0.9687      2000
        ship    0.9782    0.9890    0.9836      2000
       truck    0.9774    0.9735    0.9755      2000

    accuracy                        0.9631     20000
   macro avg    0.9633    0.9632    0.9632     20000
weighted avg    0.9633    0.9631    0.9632     20000
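In the report above, "macro avg" is the unweighted mean of the per-class scores and "weighted avg" weights each class by its support. Because every class has 2,000 test images, the two rows coincide up to rounding, and the weighted recall always equals the overall accuracy. The sketch below spells out those definitions in plain Python, assuming integer class indices for the true and predicted labels. The helper name `classification_metrics` is hypothetical; it follows the same definitions as scikit-learn's `classification_report` but is not that implementation.

```python
from collections import Counter


def classification_metrics(y_true, y_pred, num_classes):
    """Hypothetical helper: per-class precision/recall/F1 plus macro and
    weighted averages for labels in range(num_classes).

    Classes with no predictions (or no support) score 0.0 instead of raising.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    tp = Counter()             # correct predictions per class
    pred_count = Counter()     # how often each class was predicted
    support = Counter(y_true)  # how often each class actually occurs
    for t, p in zip(y_true, y_pred):
        pred_count[p] += 1
        if t == p:
            tp[t] += 1

    per_class = []  # (precision, recall, f1, support) for each class
    for c in range(num_classes):
        precision = tp[c] / pred_count[c] if pred_count[c] else 0.0
        recall = tp[c] / support[c] if support[c] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class.append((precision, recall, f1, support[c]))

    total = sum(support.values())
    # Macro: plain mean over classes. Weighted: mean weighted by support.
    macro = tuple(sum(row[i] for row in per_class) / num_classes for i in range(3))
    weighted = tuple(sum(row[i] * row[3] for row in per_class) / total for i in range(3))
    accuracy = sum(tp.values()) / total
    return per_class, macro, weighted, accuracy


# Toy example with three classes (made-up labels, not CIFAR-10 results).
per_class, macro, weighted, accuracy = classification_metrics(
    [0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3
)
# Weighted recall reduces to sum(TP) / N, i.e. the accuracy (both 0.6667 here).
print(f"accuracy={accuracy:.4f} weighted recall={weighted[1]:.4f}")

# With equal supports, the reported macro precision is just the mean of the
# ten per-class precision values from the table above.
reported_precision = [0.9813, 0.9777, 0.9560, 0.9104, 0.9566,
                      0.9476, 0.9774, 0.9704, 0.9782, 0.9774]
print(round(sum(reported_precision) / len(reported_precision), 4))  # 0.9633
```

The last check reproduces the 0.9633 macro precision in the table, which is a quick way to confirm that the averaged rows are consistent with the per-class rows.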
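The model classifies images into the following ten categories:

Class 0: airplane
Class 1: automobile
Class 2: bird
Class 3: cat
Class 4: deer
Class 5: dog
Class 6: frog
Class 7: horse
Class 8: ship
Class 9: truck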
!pip install -q transformers torch pillow gradio
import gradio as gr
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Load model and processor
model_name = "prithivMLmods/MetaCLIP-2-Cifar10"
model = AutoModelForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)


def cifar10_classification(image):
    """Predicts the CIFAR-10 class represented in an image."""
    # Gradio passes the upload as a NumPy array; convert it to an RGB PIL image
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    # Inference only: no gradients needed
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits

    # Softmax over the class dimension, then drop the batch dimension
    probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Index-to-label mapping for the ten CIFAR-10 classes
    labels = {
        "0": "airplane",
        "1": "automobile",
        "2": "bird",
        "3": "cat",
        "4": "deer",
        "5": "dog",
        "6": "frog",
        "7": "horse",
        "8": "ship",
        "9": "truck"
    }
    predictions = {labels[str(i)]: round(probs[i], 3) for i in range(len(probs))}
    return predictions


# Create Gradio interface
iface = gr.Interface(
    fn=cifar10_classification,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Prediction Scores"),
    title="CIFAR-10 Classification",
    description="Upload an image to classify it into one of the CIFAR-10 categories."
)

# Launch the app
if __name__ == "__main__":
    iface.launch()
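The post-processing inside `cifar10_classification` turns the ten raw logits into probabilities with a softmax and attaches a CIFAR-10 label to each one. The dependency-free sketch below isolates that step so it can be checked without downloading the model. It assumes a plain Python list of ten logits in the same index order as the `labels` mapping; the function `top_k_cifar10` and the example logits are hypothetical and are not real model output.

```python
import math

# Same index-to-name order as the `labels` mapping in the Gradio example.
CIFAR10_LABELS = ["airplane", "automobile", "bird", "cat", "deer",
                  "dog", "frog", "horse", "ship", "truck"]


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_cifar10(logits, k=3):
    """Hypothetical helper: the k most probable (label, probability) pairs, highest first."""
    if len(logits) != len(CIFAR10_LABELS):
        raise ValueError(f"expected {len(CIFAR10_LABELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    ranked = sorted(zip(CIFAR10_LABELS, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits, for illustration only.
example_logits = [0.1, 0.2, -1.0, 4.0, 0.5, 2.5, -0.3, 0.0, -2.0, 0.4]
for label, p in top_k_cifar10(example_logits, k=3):
    print(f"{label}: {p:.3f}")
```

Because softmax is monotonic, the top-1 entry (`top_k_cifar10(logits, k=1)[0]`) is always the class with the largest logit, so a single-label prediction can also be read off with an argmax over the logits.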
The MetaCLIP-2-Cifar10 model is designed for object classification across the ten CIFAR-10 categories. Potential use cases include:
Base model: facebook/metaclip-2-worldwide-s16