---
license: apache-2.0
inference: false
datasets:
- c4
- wikipedia
language:
- en
pipeline_tag: fill-mask
---

# Perceiver IO masked language model

This model is a Perceiver IO model pretrained on the masked language modeling (MLM) task using a text corpus created from [C4](https://huggingface.co/datasets/c4) and [English Wikipedia](https://huggingface.co/datasets/wikipedia). It is weight-equivalent to the [deepmind/language-perceiver](https://huggingface.co/deepmind/language-perceiver) model but based on implementation classes of the [perceiver-io](https://github.com/krasserm/perceiver-io) library. It can be created from the `deepmind/language-perceiver` model with a library-specific [conversion utility](#model-conversion). Both models generate equal output for the same input.

The content of the `deepmind/language-perceiver` [model card](https://huggingface.co/deepmind/language-perceiver) also applies to this model, except for the [usage examples](#usage-examples). Refer to the linked card for further model and training details.

## Model description

The model is specified in Section 4 (Table 1) and Appendix F (Table 11) of the [Perceiver IO paper](https://arxiv.org/abs/2107.14795) (UTF-8 bytes tokenization, vocabulary size of 262, 201M parameters).

## Intended use

Although the raw model can be [used directly](#usage-examples) for masked language modeling, the main use case is fine-tuning: either masked language modeling with whole word masking on an unlabeled dataset ([example](https://huggingface.co/krasserm/perceiver-io-mlm-imdb)), or fine-tuning on a labeled dataset using the pretrained encoder of this model for weight initialization ([example](https://huggingface.co/krasserm/perceiver-io-txt-clf-imdb)).

## Usage examples

To use this model you first need to [install](https://github.com/krasserm/perceiver-io/blob/main/README.md#installation) the `perceiver-io` library with the `text` extension.

```shell
pip install perceiver-io[text]
```

The model can then be used with PyTorch. Either use the model and tokenizer directly

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"

model = AutoModelForMaskedLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

masked_text = "This is an incomplete sentence where some words are" \
              "[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]"

encoding = tokenizer(masked_text, return_tensors="pt")
outputs = model(**encoding)

# get predictions for the 9 [MASK] tokens (exclude the [SEP] token at the end)
masked_token_predictions = outputs.logits[0, -10:-1].argmax(dim=-1)

print(tokenizer.decode(masked_token_predictions))
```
```
missing.
```

or use a `fill-mask` pipeline:

```python
from transformers import pipeline
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"

masked_text = "This is an incomplete sentence where some words are" \
              "[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]"

filler_pipeline = pipeline("fill-mask", model=repo_id)
masked_token_predictions = filler_pipeline(masked_text)

print("".join([pred[0]["token_str"] for pred in masked_token_predictions]))
```
```
missing.
```
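
To inspect individual mask predictions in more detail, the `fill-mask` pipeline also accepts a `top_k` argument. This is a generic `transformers` pipeline feature rather than something specific to this model; the following is a minimal sketch using the same repository id as above:

```python
from transformers import pipeline
from perceiver.model.text import mlm  # auto-class registration

repo_id = "krasserm/perceiver-io-mlm"

masked_text = "This is an incomplete sentence where some words are" \
              "[MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK][MASK]"

filler_pipeline = pipeline("fill-mask", model=repo_id)

# one result list per [MASK] token, each containing the top 3 candidate tokens and scores
for i, mask_predictions in enumerate(filler_pipeline(masked_text, top_k=3)):
    candidates = ", ".join(f"{p['token_str']!r} ({p['score']:.2f})" for p in mask_predictions)
    print(f"[MASK] {i}: {candidates}")
```

Since the model uses UTF-8 bytes tokenization, each `[MASK]` position corresponds to a single byte rather than a whole word.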
## Model conversion

The `krasserm/perceiver-io-mlm` model has been created from the source `deepmind/language-perceiver` model with:

```python
from perceiver.model.text.mlm import convert_model

convert_model(
    save_dir="krasserm/perceiver-io-mlm",
    source_repo_id="deepmind/language-perceiver",
    push_to_hub=True,
)
```

## Citation

```bibtex
@article{jaegle2021perceiver,
  title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
  author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
  journal={arXiv preprint arXiv:2107.14795},
  year={2021}
}
```