---
license: apache-2.0
datasets:
- bookcorpus
- wikipedia
language:
- en
---

# BERT L12-H512 (uncased)

Mini BERT models from the paper [Well-Read Students Learn Better](https://arxiv.org/abs/1908.08962) that the Hugging Face team didn't convert. The weights were converted with the original [conversion script](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/convert_bert_original_tf_checkpoint_to_pytorch.py).
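For reference, a rough sketch of how such a conversion can be run from Python, using the entry point of the linked script. The paths are placeholders for the TF checkpoint files published by Google, not files shipped with this repo:

```python
# Sketch: converting one of Google's TF checkpoints with the script
# linked above. Paths are placeholders for the downloaded TF files.
from transformers.models.bert.convert_bert_original_tf_checkpoint_to_pytorch import (
    convert_tf_checkpoint_to_pytorch,
)

convert_tf_checkpoint_to_pytorch(
    tf_checkpoint_path="uncased_L-12_H-512_A-8/bert_model.ckpt",
    bert_config_file="uncased_L-12_H-512_A-8/bert_config.json",
    pytorch_dump_path="pytorch_model.bin",
)
```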

See the original Google repo: [google-research/bert](https://github.com/google-research/bert)

Note: it's not clear if these checkpoints have undergone knowledge distillation.

## Model variants

| |H=128|H=256|H=512|H=768|
|---|:---:|:---:|:---:|:---:|
| **L=2** |[2/128 (BERT-Tiny)][2_128]|[2/256][2_256]|[2/512][2_512]|[2/768][2_768]|
| **L=4** |[4/128][4_128]|[4/256 (BERT-Mini)][4_256]|[4/512 (BERT-Small)][4_512]|[4/768][4_768]|
| **L=6** |[6/128][6_128]|[6/256][6_256]|[6/512][6_512]|[6/768][6_768]|
| **L=8** |[8/128][8_128]|[8/256][8_256]|[8/512 (BERT-Medium)][8_512]|[8/768][8_768]|
| **L=10** |[10/128][10_128]|[10/256][10_256]|[10/512][10_512]|[10/768][10_768]|
| **L=12** |[12/128][12_128]|[12/256][12_256]|[**12/512**][12_512]|[12/768 (BERT-Base, original)][12_768]|

[2_128]: https://huggingface.co/gaunernst/bert-tiny-uncased
[2_256]: https://huggingface.co/gaunernst/bert-L2-H256-uncased
[2_512]: https://huggingface.co/gaunernst/bert-L2-H512-uncased
[2_768]: https://huggingface.co/gaunernst/bert-L2-H768-uncased
[4_128]: https://huggingface.co/gaunernst/bert-L4-H128-uncased
[4_256]: https://huggingface.co/gaunernst/bert-mini-uncased
[4_512]: https://huggingface.co/gaunernst/bert-small-uncased
[4_768]: https://huggingface.co/gaunernst/bert-L4-H768-uncased
[6_128]: https://huggingface.co/gaunernst/bert-L6-H128-uncased
[6_256]: https://huggingface.co/gaunernst/bert-L6-H256-uncased
[6_512]: https://huggingface.co/gaunernst/bert-L6-H512-uncased
[6_768]: https://huggingface.co/gaunernst/bert-L6-H768-uncased
[8_128]: https://huggingface.co/gaunernst/bert-L8-H128-uncased
[8_256]: https://huggingface.co/gaunernst/bert-L8-H256-uncased
[8_512]: https://huggingface.co/gaunernst/bert-medium-uncased
[8_768]: https://huggingface.co/gaunernst/bert-L8-H768-uncased
[10_128]: https://huggingface.co/gaunernst/bert-L10-H128-uncased
[10_256]: https://huggingface.co/gaunernst/bert-L10-H256-uncased
[10_512]: https://huggingface.co/gaunernst/bert-L10-H512-uncased
[10_768]: https://huggingface.co/gaunernst/bert-L10-H768-uncased
[12_128]: https://huggingface.co/gaunernst/bert-L12-H128-uncased
[12_256]: https://huggingface.co/gaunernst/bert-L12-H256-uncased
[12_512]: https://huggingface.co/gaunernst/bert-L12-H512-uncased
[12_768]: https://huggingface.co/bert-base-uncased
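Most variants follow the repo-name pattern `gaunernst/bert-L{layers}-H{hidden}-uncased`; the named sizes (Tiny/Mini/Small/Medium) and BERT-Base use their own names, as the links above show. A minimal sketch for checking a variant's config programmatically, with the pattern inferred from the link list:

```python
from transformers import AutoConfig

# Sketch: fetch configs for a few variants by the repo-name pattern
# inferred from the links above. The named sizes (tiny/mini/small/medium)
# and bert-base-uncased use different repo names.
for layers, hidden in [(2, 256), (6, 512), (12, 512)]:
    cfg = AutoConfig.from_pretrained(f"gaunernst/bert-L{layers}-H{hidden}-uncased")
    print(layers, hidden, cfg.num_hidden_layers, cfg.hidden_size)
```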

## Usage

See other BERT model cards, e.g. [bert-base-uncased](https://huggingface.co/bert-base-uncased).
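As a quick start, a minimal sketch of running this checkpoint for fill-mask inference; the task is an assumption, so pick the `AutoModelFor*` class or pipeline task that matches your use case:

```python
from transformers import pipeline

# Sketch: run this checkpoint in a fill-mask pipeline. The tokenizer and
# model weights are both resolved from the Hub repo named below.
unmasker = pipeline("fill-mask", model="gaunernst/bert-L12-H512-uncased")
print(unmasker("The capital of France is [MASK]."))
```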

## Citation

```bibtex
@article{turc2019,
  title={Well-Read Students Learn Better: On the Importance of Pre-training Compact Models},
  author={Turc, Iulia and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1908.08962v2},
  year={2019}
}
```