readme: add initial version
README.md

---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb
- HuggingFaceFW/fineweb-edu
language:
- en
tags:
- fineweb-lms
- bert
- teams
- electra
---

# FineWeb-LMs: Training ELECTRA Augmented with Multi-word Selection (TEAMS)

<p align="left">
  <picture>
    <img alt="BERT with TensorFlow Model Garden" src="https://github.com/stefan-it/model-garden-lms/raw/main/bert_tf_model_garden.png" style="max-width: 25%;">
  </picture>
  <br/>
</p>

This repository presents a TEAMS model that was pretrained on the 10BT subsets of [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb) and [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu).

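A minimal usage sketch with the Hugging Face `transformers` library (assuming the checkpoint is published in a `transformers`-compatible format; the model ID is taken from the evaluation table below):

```python
from transformers import AutoModel, AutoTokenizer

# One of the released checkpoints listed in the evaluation table below.
model_id = "model-garden-lms/teams-base-finewebs-1m"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Encode a sentence and inspect the encoder's hidden states.
inputs = tokenizer("FineWeb is a large-scale web corpus.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```
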
# Pretraining Details

The released TEAMS model is part of my [TensorFlow Model Garden LMs](https://github.com/stefan-it/model-garden-lms/tree/main) project.

The pretraining was done on a v3-32 TPU VM Pod, provided by the amazing [TRC program](https://sites.research.google/trc/about/). Detailed cheatsheets are available:

* [TPU VM Setup](https://github.com/stefan-it/model-garden-lms/tree/main/cheatsheet)
* [Pretraining a TEAMS Model with TensorFlow Model Garden Library](https://github.com/stefan-it/model-garden-lms/tree/main/teams)

tl;dr: The model was pretrained for 1M steps with a global batch size of 256, a sequence length of 512 and a vocabulary size of 64k.

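For reference, the same hyperparameters as a small Python summary (illustrative only; the actual TensorFlow Model Garden experiment configuration is documented in the cheatsheets linked above):

```python
# Illustrative summary of the pretraining setup described above; not the actual
# TensorFlow Model Garden experiment config (see the linked cheatsheet for that).
pretraining_setup = {
    "train_steps": 1_000_000,   # 1M pretraining steps
    "global_batch_size": 256,
    "sequence_length": 512,
    "vocab_size": 64_000,       # 64k vocabulary
}
```
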
# Checkpoint Evaluation with ScandEval

We evaluate the last five checkpoints (1M, 951k, 901k, 851k and 801k) with a recent version of ScandEval to check their performance and compare them with popular encoder-only models such as BERT, RoBERTa or ELECTRA:

| Model ID | Avg. Score | CoNLL-En | SST5 | ScaLA-En | SQuAD |
|----------|------------|----------|------|----------|-------|
| [model-garden-lms/teams-base-finewebs-1m](https://huggingface.co/model-garden-lms/teams-base-finewebs-1m) | **72.64** | 89.27 ± 0.41 / 88.82 ± 0.41 | 59.58 ± 0.64 / 62.63 ± 3.0 | 66.72 ± 0.94 / 83.01 ± 0.45 | 59.95 ± 0.71 / 71.13 ± 0.58 |
| [model-garden-lms/teams-base-finewebs-951k](https://huggingface.co/model-garden-lms/teams-base-finewebs-951k) | 72.06 | 89.64 ± 0.52 / 89.18 ± 0.42 | 60.31 ± 1.03 / 58.82 ± 2.79 | 65.85 ± 2.01 / 82.47 ± 1.23 | 59.36 ± 0.77 / 70.82 ± 0.62 |
| [model-garden-lms/teams-base-finewebs-901k](https://huggingface.co/model-garden-lms/teams-base-finewebs-901k) | 72.19 | 89.31 ± 0.52 / 88.71 ± 0.53 | 59.86 ± 1.05 / 62.17 ± 2.61 | 64.89 ± 2.86 / 81.84 ± 1.65 | 59.74 ± 0.55 / 71.0 ± 0.5 |
| [model-garden-lms/teams-base-finewebs-851k](https://huggingface.co/model-garden-lms/teams-base-finewebs-851k) | 71.41 | 89.48 ± 0.47 / 88.99 ± 0.52 | 59.17 ± 1.2 / 60.25 ± 3.25 | 63.01 ± 2.31 / 80.77 ± 1.38 | 59.13 ± 0.53 / 70.5 ± 0.49 |
| [model-garden-lms/teams-base-finewebs-801k](https://huggingface.co/model-garden-lms/teams-base-finewebs-801k) | 70.73 | 89.2 ± 0.43 / 88.8 ± 0.46 | 59.21 ± 1.5 / 61.41 ± 2.36 | 58.47 ± 4.1 / 78.24 ± 2.4 | 59.59 ± 0.66 / 70.9 ± 0.59 |
| [google-bert/bert-base-cased](https://huggingface.co/google-bert/bert-base-cased) | 62.26 | 87.39 ± 0.79 / 87.11 ± 0.66 | 54.49 ± 1.36 / 53.22 ± 1.15 | 52.08 ± 2.13 / 74.52 ± 1.31 | 38.63 ± 2.1 / 50.68 ± 1.87 |
| [google/electra-base-discriminator](https://huggingface.co/google/electra-base-discriminator) | 69.26 | 87.82 ± 0.69 / 86.83 ± 0.62 | 62.3 ± 1.12 / 55.93 ± 0.67 | 62.61 ± 1.21 / 80.85 ± 0.59 | 52.51 ± 0.86 / 65.2 ± 0.85 |
| [FacebookAI/roberta-base](https://huggingface.co/FacebookAI/roberta-base) | 68.96 | 90.35 ± 0.23 / 90.14 ± 0.2 | 60.95 ± 1.4 / 57.52 ± 1.97 | 50.64 ± 1.69 / 74.55 ± 0.9 | 57.82 ± 1.35 / 69.68 ± 1.02 |

Our pretrained TEAMS model shows strong performance across all tasks. All detailed results can be found in [this](https://huggingface.co/datasets/model-garden-lms/finewebs-scandeval-results) dataset repository.

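The Avg. Score column appears to be the plain mean of the eight metric values in each row (an assumption based on the numbers, not on the ScandEval documentation); a quick check for the 1M checkpoint:

```python
# Sanity check (assumption): Avg. Score = mean of the eight per-metric values in a row.
scores_1m = [89.27, 88.82, 59.58, 62.63, 66.72, 83.01, 59.95, 71.13]
print(round(sum(scores_1m) / len(scores_1m), 2))  # 72.64, which matches the table entry
```
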
# ❤️ Acknowledgements

This repository is the outcome of the last two years of working with TPUs from the awesome [TRC program](https://sites.research.google/trc/about/) and the [TensorFlow Model Garden](https://github.com/tensorflow/models) library.

Made from Bavarian Oberland with ❤️ and 🥨.

