+---
+license: apache-2.0
+datasets:
+- c4
+- wikipedia
+inference: false
+language:
+- en
+pipeline_tag: fill-mask
+---
+# Perceiver IO image classifier
+This model is a Perceiver IO model pretrained on ImageNet (14 million images, 1,000 classes). It is weight-equivalent
+to the [deepmind/vision-perceiver-fourier](https://huggingface.co/deepmind/vision-perceiver-fourier) model but based on
+implementation classes of the [perceiver-io](https://github.com/krasserm/perceiver-io) library. It can be created from
+the `deepmind/vision-perceiver-fourier` model with a library-specific [conversion utility](#model-conversion). Both
+models generate equal output for the same input.
+The content of the `deepmind/vision-perceiver-fourier` [model card](https://huggingface.co/deepmind/vision-perceiver-fourier)
+also applies to this model, except for the [usage examples](#usage-examples). Refer to the linked card for further model
+and training details.
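The statement that both models generate equal output for the same input can be checked numerically. The following is a minimal sketch, not part of either library: `max_abs_diff` and `outputs_match` are hypothetical helper names, and the tolerance of `1e-4` is an assumed value, not one stated by the model authors. The helpers work on flat lists of floats, for example the result of `logits.flatten().tolist()` taken from each model.

```python
from typing import Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the largest element-wise absolute difference between a and b."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} != {len(b)}")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def outputs_match(a: Sequence[float], b: Sequence[float], atol: float = 1e-4) -> bool:
    """Return True if no pair of elements differs by more than atol (assumed tolerance)."""
    return max_abs_diff(a, b) <= atol


# Illustrative values only, not real model outputs.
logits_a = [0.12, -1.50, 3.40]
logits_b = [0.12, -1.50, 3.40001]
print(outputs_match(logits_a, logits_b))
```

Small floating-point differences are expected between two implementations of the same network, which is why the comparison uses a tolerance instead of exact equality.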
+<img src="http://images.cocodataset.org/val2017/000000507223.jpg" alt="sample image" width=200>
+## Model description
+The model is specified in Appendix A of the [Perceiver IO paper](https://arxiv.org/abs/2107.14795) (2D Fourier features).
+## Intended use and limitations
+The model can be used for image classification.
+## Usage examples
+To use this model you first need to [install](https://github.com/krasserm/perceiver-io/blob/main/README.md#installation)
+the `perceiver-io` library with extension `vision`.
+```shell
+pip install perceiver-io[vision]
+```
+Then the model can be used with PyTorch. Either use the model and image processor directly:
+```python
+import requests
+from PIL import Image
+from transformers import AutoModelForImageClassification, AutoImageProcessor
+from perceiver.model.vision import image_classifier  # auto-class registration
+repo_id = "krasserm/perceiver-io-img-clf"
+# An image of a baseball player from MS-COCO validation set
+url = "http://images.cocodataset.org/val2017/000000507223.jpg"
+image = Image.open(requests.get(url, stream=True).raw)
+# Load the model and its matching image processor from the Hugging Face Hub
+model = AutoModelForImageClassification.from_pretrained(repo_id)
+processor = AutoImageProcessor.from_pretrained(repo_id)
+# Preprocess the image and pick the class with the highest logit
+processed = processor(image, return_tensors="pt")
+prediction = model(**processed).logits.argmax(dim=-1)
+print(f"Predicted class = {model.config.id2label[prediction.item()]}")
+```
+```
+Predicted class = ballplayer, baseball player
+```
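The example above reports only the highest-scoring class. To inspect several candidates, the logits can be converted to probabilities with a softmax and then ranked. The sketch below is a plain-Python illustration: `top_k_labels` is a hypothetical helper, not part of `perceiver-io` or `transformers`, and it assumes a flat list of logits plus an `id2label` mapping keyed by integer class index, as `model.config.id2label` is used above.

```python
import math
from typing import Dict, List, Sequence, Tuple


def top_k_labels(
    logits: Sequence[float], id2label: Dict[int, str], k: int = 5
) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, best first."""
    # Subtract the maximum logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Toy logits and labels for illustration; real inputs would come from the model.
toy_labels = {0: "cat", 1: "dog", 2: "ballplayer, baseball player"}
for label, prob in top_k_labels([0.5, 1.0, 4.0], toy_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

With the model from the example above, the call would look like `top_k_labels(model(**processed).logits[0].detach().tolist(), model.config.id2label)`.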
+or use an `image-classification` pipeline:
+```python
+import requests
+from PIL import Image
+from transformers import pipeline
+from perceiver.model.vision import image_classifier  # auto-class registration
+repo_id = "krasserm/perceiver-io-img-clf"
+# An image of a baseball player from MS-COCO validation set
+url = "http://images.cocodataset.org/val2017/000000507223.jpg"
+image = Image.open(requests.get(url, stream=True).raw)
+# The pipeline bundles preprocessing, inference and label lookup
+classifier = pipeline("image-classification", model=repo_id)
+prediction = classifier(image)
+print(f"Predicted class = {prediction[0]['label']}")
+```
+```
+Predicted class = ballplayer, baseball player
+```
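An `image-classification` pipeline returns a ranked list of dictionaries with `label` and `score` keys, of which the example above prints only the first. The following sketch shows one way to keep every entry above a score threshold. `confident_labels` is a hypothetical helper, the threshold of `0.1` is an arbitrary assumption, and the sample scores are made up for illustration.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def confident_labels(predictions: List[Prediction], min_score: float = 0.1) -> List[str]:
    """Return 'label (score)' strings for entries scoring at least min_score."""
    return [
        f"{p['label']} ({p['score']:.2f})"
        for p in predictions
        if p["score"] >= min_score
    ]


# Hand-written stand-in for pipeline output; the scores are not real results.
sample = [
    {"label": "ballplayer, baseball player", "score": 0.90},
    {"label": "baseball", "score": 0.05},
]
print(confident_labels(sample))
```

With the pipeline from the example above, `confident_labels(classifier(image))` would apply the same filter to real predictions.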
+## Model conversion
+The `krasserm/perceiver-io-img-clf` model has been created from the source `deepmind/vision-perceiver-fourier` model
+with:
+```python
+from perceiver.model.vision.image_classifier import convert_model
+convert_model(
+    save_dir="krasserm/perceiver-io-img-clf",
+    source_repo_id="deepmind/vision-perceiver-fourier",
+    push_to_hub=True,
+)
+```
+## Citation
+```bibtex
+@article{jaegle2021perceiver,
+  title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
+  author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
+  journal={arXiv preprint arXiv:2107.14795},
+  year={2021}
+}
+```