Add Swiss German adapter
- README.md +22 -2
- config.json +3 -2
- pytorch_model.bin +2 -2
README.md CHANGED

````diff
@@ -5,6 +5,7 @@ language:
 - fr
 - it
 - rm
+- gsw
 - multilingual
 inference: false
 ---
@@ -19,6 +20,9 @@ In addition, we used a Switzerland-specific subword vocabulary.
 
 The pre-training code and usage examples are available [here](https://github.com/ZurichNLP/swissbert). We also release a version that was fine-tuned on named entity recognition (NER): https://huggingface.co/ZurichNLP/swissbert-ner
 
+## Update 2024-01: Support for Swiss German
+
+We added a Swiss German adapter to the model.
+
 ## Languages
 
 SwissBERT contains the following language adapters:
@@ -29,6 +33,7 @@ SwissBERT contains the following language adapters:
 | 1 | `fr_CH` | French |
 | 2 | `it_CH` | Italian |
 | 3 | `rm_CH` | Romansh Grischun |
+| 4 | `gsw` | Swiss German |
 
 ## License
 Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).
````
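SwissBERT follows the X-MOD design, in which exactly one language adapter is active per forward pass, selected by the codes in the table above. A minimal sketch of activating the new `gsw` adapter via the standard `transformers` API (the Swiss German sample sentence is our own; the repository's official usage examples are linked above):

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ZurichNLP/swissbert")
model = AutoModel.from_pretrained("ZurichNLP/swissbert")

# Route all subsequent forward passes through the Swiss German adapter
model.set_default_language("gsw")

# Illustrative sentence in written Swiss German (our own example)
inputs = tokenizer("Das isch es churzes Bischpil.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```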
````diff
@@ -87,6 +92,10 @@ SwissBERT is not designed for generating text.
 - Training data: German, French, Italian and Romansh documents in the [Swissdox@LiRI](https://t.uzh.ch/1hI) database, until 2022.
 - Training procedure: Masked language modeling
 
+The Swiss German adapter was trained on the following two datasets of written Swiss German:
+1. [SwissCrawl](https://icosys.ch/swisscrawl) ([Linder et al., LREC 2020](https://aclanthology.org/2020.lrec-1.329)), a collection of Swiss German web text (forum discussions, social media).
+2. A custom dataset of Swiss German tweets
+
 ## Environmental Impact
 - Hardware type: RTX 2080 Ti.
 - Hours used: 10 epochs × 18 hours × 8 devices = 1440 hours
@@ -95,7 +104,7 @@ SwissBERT is not designed for generating text.
 - Carbon efficiency: 0.0016 kg CO2e/kWh ([source](https://t.uzh.ch/1rU))
 - Carbon emitted: 0.6 kg CO2e ([source](https://mlco2.github.io/impact#compute))
 
-##
+## Citations
 ```bibtex
 @article{vamvas-etal-2023-swissbert,
     title={Swiss{BERT}: The Multilingual Language Model for Switzerland},
````
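As a sanity check, the emissions figures in the hunks above are mutually consistent if one assumes an average draw of about 250 W per RTX 2080 Ti (the card model is stated in the diff; the wattage is our assumption, roughly its TDP):

```python
# Reproducing the carbon estimate above; the 0.25 kW per-device draw is assumed
device_hours = 10 * 18 * 8          # epochs x hours/epoch x devices = 1440
energy_kwh = device_hours * 0.25    # 360 kWh at the assumed draw
carbon_kg = energy_kwh * 0.0016     # kg CO2e per kWh (carbon efficiency above)
print(f"{carbon_kg:.2f} kg CO2e")   # 0.58, matching the stated ~0.6 kg
```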
````diff
@@ -106,4 +115,15 @@ SwissBERT is not designed for generating text.
     primaryClass={cs.CL},
     url={https://arxiv.org/abs/2303.13310}
 }
-```
+```
+
+Swiss German adapter:
+```bibtex
+@inproceedings{vamvas-etal-2024-modular,
+    title={Modular Adaptation of Multilingual Encoders to Written Swiss German Dialect},
+    author={Jannis Vamvas and No{\"e}mi Aepli and Rico Sennrich},
+    booktitle={First Workshop on Modular and Open Multilingual NLP},
+    year={2024},
+}
+```
+
````
config.json CHANGED

````diff
@@ -18,7 +18,8 @@
     "de_CH",
     "fr_CH",
     "it_CH",
-    "rm_CH"
+    "rm_CH",
+    "gsw"
   ],
   "layer_norm_eps": 1e-05,
   "ln_before_adapter": true,
@@ -30,7 +31,7 @@
   "position_embedding_type": "absolute",
   "pre_norm": false,
   "torch_dtype": "float32",
-  "transformers_version": "4.
+  "transformers_version": "4.33.2",
   "type_vocab_size": 1,
   "use_cache": true,
   "vocab_size": 50262
````
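The `languages` list in `config.json` is how an X-MOD checkpoint declares its adapter set, so appending `"gsw"` here is what makes the new adapter selectable by name. A quick check with the standard `transformers` API:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("ZurichNLP/swissbert")

# After this commit the adapter list includes the Swiss German entry
print(config.languages)          # [..., "rm_CH", "gsw"]
assert "gsw" in config.languages
```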
pytorch_model.bin CHANGED

````diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:3621abd43ac00e35367a180626eccb4091493178ed6f922fc78717e2a4c06fed
+size 640768013
````
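Only the Git LFS pointer changes in the repository itself; the new weights are stored out of band and identified by the SHA-256 digest and byte size above. A small sketch for verifying a downloaded checkpoint against this pointer (the local file path is hypothetical):

```python
import hashlib
import os

EXPECTED_SHA256 = "3621abd43ac00e35367a180626eccb4091493178ed6f922fc78717e2a4c06fed"
EXPECTED_SIZE = 640768013  # bytes, from the pointer file above

path = "pytorch_model.bin"  # hypothetical path to the downloaded weights
digest = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

print(os.path.getsize(path) == EXPECTED_SIZE)
print(digest.hexdigest() == EXPECTED_SHA256)
```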