---
license: apache-2.0
inference: false
datasets:
- c4
- wikipedia
language:
- en
pipeline_tag: fill-mask
---

# Perceiver IO masked language model

This model is a Perceiver IO model pretrained on the masked language modeling (MLM) task using a text corpus created from [C4](https://huggingface.co/datasets/c4) and [English Wikipedia](https://huggingface.co/datasets/wikipedia). It is weight-equivalent to the [deepmind/language-perceiver](https://huggingface.co/deepmind/language-perceiver) model but based on implementation classes of the [perceiver-io](https://github.com/krasserm/perceiver-io) library. It can be created from the `deepmind/language-perceiver` model with a library-specific [conversion utility](#model-conversion). Both models generate equal output for the same input.

The content of the `deepmind/language-perceiver` [model card](https://huggingface.co/deepmind/language-perceiver) also applies to this model, except for the [usage examples](#usage-examples). Refer to the linked card for further model and training details.

## Model description

The model is specified in Section 4 (Table 1) and Appendix F (Table 11) of the [Perceiver IO paper](https://arxiv.org/abs/2107.14795) (UTF-8 bytes tokenization, vocabulary size of 262, 201M parameters).

## Intended use

Although the raw model can be [used directly](#usage-examples) for masked language modeling, the main use case is fine-tuning: either masked language modeling with whole word masking on an unlabeled dataset ([example](https://huggingface.co/krasserm/perceiver-io-mlm-imdb)), or fine-tuning on a labeled dataset using the pretrained encoder of this model for weight initialization ([example](https://huggingface.co/krasserm/perceiver-io-txt-clf-imdb)).

## Usage examples

To use this model you first need to [install](https://github.com/krasserm/perceiver-io/blob/main/README.md#installation) the `perceiver-io` library with the `text` extension.

```shell
pip install perceiver-io[text]
```

The model can then be used with PyTorch. Either use the model and tokenizer directly

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"

model = AutoModelForMaskedLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

masked_text = "This is an incomplete sentence where some words are" \
              "[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]"

encoding = tokenizer(masked_text, return_tensors="pt")
outputs = model(**encoding)

# get predictions for the 9 [MASK] tokens (exclude the [SEP] token at the end)
masked_token_predictions = outputs.logits[0, -10:-1].argmax(dim=-1)

print(tokenizer.decode(masked_token_predictions))
```
```
missing.
```

or use a `fill-mask` pipeline:

```python
from transformers import pipeline
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"

masked_text = "This is an incomplete sentence where some words are" \
              "[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]"

filler_pipeline = pipeline("fill-mask", model=repo_id)
masked_token_predictions = filler_pipeline(masked_text)

print("".join([pred[0]["token_str"] for pred in masked_token_predictions]))
```
```
missing.
```
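
To inspect individual mask predictions in more detail, the `fill-mask` pipeline also accepts a `top_k` argument. This is a generic `transformers` pipeline feature rather than something specific to this model; the following is a minimal sketch using the same repository id as above:

```python
from transformers import pipeline
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"

masked_text = "This is an incomplete sentence where some words are" \
              "[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]"

filler_pipeline = pipeline("fill-mask", model=repo_id)

# one result list per [MASK] token, each containing the top 3 candidate tokens and scores
for i, mask_predictions in enumerate(filler_pipeline(masked_text, top_k=3)):
    candidates = ", ".join(f"{p['token_str']!r} ({p['score']:.2f})" for p in mask_predictions)
    print(f"[MASK] {i}: {candidates}")
```

Since the model uses UTF-8 bytes tokenization, each `[MASK]` position corresponds to a single byte rather than a whole word.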
## Model conversion

The `krasserm/perceiver-io-mlm` model has been created from the source `deepmind/language-perceiver` model with:

```python
from perceiver.model.text.mlm import convert_model

convert_model(
    save_dir="krasserm/perceiver-io-mlm",
    source_repo_id="deepmind/language-perceiver",
    push_to_hub=True,
)
```

## Citation

```bibtex
@article{jaegle2021perceiver,
  title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
  author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
  journal={arXiv preprint arXiv:2107.14795},
  year={2021}
}
```