martinerrazquin committed on
Commit
220f390
·
verified ·
1 Parent(s): 92fc6ee

Add new CrossEncoder model

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,276 @@
1
+ ---
2
+ language:
3
+ - en
4
+ - es
5
+ license: apache-2.0
6
+ tags:
7
+ - sentence-transformers
8
+ - cross-encoder
9
+ - generated_from_trainer
10
+ - dataset_size:578402
11
+ - loss:BinaryCrossEntropyLoss
12
+ base_model: EuroBERT/EuroBERT-210m
13
+ pipeline_tag: text-ranking
14
+ library_name: sentence-transformers
15
+ metrics:
16
+ - map
17
+ - mrr@10
18
+ - ndcg@10
19
+ model-index:
20
+ - name: EuroBERT-210m trained on GooAQ
21
+ results:
22
+ - task:
23
+ type: cross-encoder-reranking
24
+ name: Cross Encoder Reranking
25
+ dataset:
26
+ name: gooaq dev
27
+ type: gooaq-dev
28
+ metrics:
29
+ - type: map
30
+ value: 0.7097
31
+ name: Map
32
+ - type: mrr@10
33
+ value: 0.7089
34
+ name: Mrr@10
35
+ - type: ndcg@10
36
+ value: 0.7579
37
+ name: Ndcg@10
38
+ - task:
39
+ type: cross-encoder-reranking
40
+ name: Cross Encoder Reranking
41
+ dataset:
42
+ name: NanoMSMARCO R100
43
+ type: NanoMSMARCO_R100
44
+ metrics:
45
+ - type: map
46
+ value: 0.463
47
+ name: Map
48
+ - type: mrr@10
49
+ value: 0.4452
50
+ name: Mrr@10
51
+ - type: ndcg@10
52
+ value: 0.5106
53
+ name: Ndcg@10
54
+ - task:
55
+ type: cross-encoder-reranking
56
+ name: Cross Encoder Reranking
57
+ dataset:
58
+ name: NanoNFCorpus R100
59
+ type: NanoNFCorpus_R100
60
+ metrics:
61
+ - type: map
62
+ value: 0.3363
63
+ name: Map
64
+ - type: mrr@10
65
+ value: 0.5204
66
+ name: Mrr@10
67
+ - type: ndcg@10
68
+ value: 0.3632
69
+ name: Ndcg@10
70
+ - task:
71
+ type: cross-encoder-reranking
72
+ name: Cross Encoder Reranking
73
+ dataset:
74
+ name: NanoNQ R100
75
+ type: NanoNQ_R100
76
+ metrics:
77
+ - type: map
78
+ value: 0.4738
79
+ name: Map
80
+ - type: mrr@10
81
+ value: 0.4783
82
+ name: Mrr@10
83
+ - type: ndcg@10
84
+ value: 0.5182
85
+ name: Ndcg@10
86
+ - task:
87
+ type: cross-encoder-nano-beir
88
+ name: Cross Encoder Nano BEIR
89
+ dataset:
90
+ name: NanoBEIR R100 mean
91
+ type: NanoBEIR_R100_mean
92
+ metrics:
93
+ - type: map
94
+ value: 0.4244
95
+ name: Map
96
+ - type: mrr@10
97
+ value: 0.4813
98
+ name: Mrr@10
99
+ - type: ndcg@10
100
+ value: 0.464
101
+ name: Ndcg@10
102
+ datasets:
103
+ - sentence-transformers/gooaq
104
+ ---
105
+ [<img src="https://cdn-avatars.huggingface.co/v1/production/uploads/67b2f4e49edebc815a3a4739/R1g957j1aBbx8lhZbWmxw.jpeg" width="200"/>](https://huggingface.co/fjmgAI)
106
+
107
+ ## Fine-Tuned Model
108
+
109
+ **`fjmgAI/rerank1-210M-EuroBERT`**
110
+
111
+ ## Base Model
112
+ **`EuroBERT/EuroBERT-210m`**
113
+
114
+ ## Fine-Tuning Method
115
+ This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [EuroBERT/EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
116
+
117
+ ## Dataset
118
+ **[`sentence-transformers/gooaq`](https://huggingface.co/datasets/sentence-transformers/gooaq)**
119
+
120
+ ### Description
121
+ This dataset is a collection of question-answer pairs, collected from Google.
122
+
123
+ ## Fine-Tuning Details
124
+ - The model was trained on 578,402 training samples from the `sentence-transformers/gooaq` dataset.
125
+
126
+ #### Cross Encoder Reranking
127
+
128
+ * Dataset: `gooaq-dev`
129
+ * Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
130
+ ```json
131
+ {
132
+ "at_k": 10,
133
+ "always_rerank_positives": false
134
+ }
135
+ ```
136
+
137
+ | Metric | Value |
138
+ |:------------|:---------------------|
139
+ | map | 0.7097 (+0.1786) |
140
+ | mrr@10 | 0.7089 (+0.1850) |
141
+ | **ndcg@10** | **0.7579 (+0.1667)** |
142
+
143
+ #### Cross Encoder Reranking
144
+
145
+ * Datasets: `NanoMSMARCO_R100`, `NanoNFCorpus_R100` and `NanoNQ_R100`
146
+ * Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
147
+ ```json
148
+ {
149
+ "at_k": 10,
150
+ "always_rerank_positives": true
151
+ }
152
+ ```
153
+
154
+ | Metric | NanoMSMARCO_R100 | NanoNFCorpus_R100 | NanoNQ_R100 |
155
+ |:------------|:---------------------|:---------------------|:---------------------|
156
+ | map | 0.4630 (-0.0266) | 0.3363 (+0.0753) | 0.4738 (+0.0542) |
157
+ | mrr@10 | 0.4452 (-0.0323) | 0.5204 (+0.0206) | 0.4783 (+0.0516) |
158
+ | **ndcg@10** | **0.5106 (-0.0298)** | **0.3632 (+0.0381)** | **0.5182 (+0.0176)** |
159
+
160
+ #### Cross Encoder Nano BEIR
161
+
162
+ * Dataset: `NanoBEIR_R100_mean`
163
+ * Evaluated with [<code>CrossEncoderNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderNanoBEIREvaluator) with these parameters:
164
+ ```json
165
+ {
166
+ "dataset_names": [
167
+ "msmarco",
168
+ "nfcorpus",
169
+ "nq"
170
+ ],
171
+ "rerank_k": 100,
172
+ "at_k": 10,
173
+ "always_rerank_positives": true
174
+ }
175
+ ```
176
+
177
+ | Metric | Value |
178
+ |:------------|:---------------------|
179
+ | map | 0.4244 (+0.0343) |
180
+ | mrr@10 | 0.4813 (+0.0133) |
181
+ | **ndcg@10** | **0.4640 (+0.0086)** |
182
+
183
+ <!--
184
+ ## Bias, Risks and Limitations
185
+
186
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
187
+ -->
188
+
189
+ <!--
190
+ ### Recommendations
191
+
192
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
193
+ -->
194
+
195
+ ## Usage
196
+
197
+ ### Direct Usage (Sentence Transformers)
198
+
199
+ First install the Sentence Transformers library:
200
+
201
+ ```bash
202
+ pip install -U sentence-transformers
203
+ ```
204
+
205
+ Then you can load this model and run inference.
206
+ ```python
207
+ from sentence_transformers import CrossEncoder
208
+
209
+ # Download from the 🤗 Hub
210
+ model = CrossEncoder("fjmgAI/rerank1-210M-EuroBERT", trust_remote_code=True)
211
+ # Get scores for pairs of texts
212
+ pairs = [
213
+ ['what are the risks with taking statins?', "['Muscle pain and damage. One of the most common complaints of people taking statins is muscle pain. ... ', 'Liver damage. Occasionally, statin use could cause an increase in the level of enzymes that signal liver inflammation. ... ', 'Increased blood sugar or type 2 diabetes. ... ', 'Neurological side effects.']"],
214
+ ['what are the risks with taking statins?', 'Doctors discovered that statins can help lower blood pressure, as well as lower cholesterol. Statins are often prescribed to people with high cholesterol. Too much cholesterol in your blood increases your risk of heart attacks and strokes.'],
215
+ ['what are the risks with taking statins?', 'Lipitor and Crestor are both effective statins that lower levels of “bad” cholesterol and increase levels of “good” cholesterol. While Crestor is the more potent statin, both medications are effective and have slightly different side effects and drug interactions.'],
216
+ ['what are the risks with taking statins?', "About simvastatin Simvastatin belongs to a group of medicines called statins. It's used to lower cholesterol if you've been diagnosed with high blood cholesterol. It's also taken to prevent heart disease, including heart attacks and strokes."],
217
+ ['what are the risks with taking statins?', 'Zetia works to lower cholesterol in a new way different from the statins: it inhibits the absorption of cholesterol in the small intestine, whereas the statins work by blocking cholesterol production in the liver.'],
218
+ ]
219
+ scores = model.predict(pairs)
220
+ print(scores.shape)
221
+ # (5,)
222
+
223
+ # Or rank different texts based on similarity to a single text
224
+ ranks = model.rank(
225
+ 'what are the risks with taking statins?',
226
+ [
227
+ "['Muscle pain and damage. One of the most common complaints of people taking statins is muscle pain. ... ', 'Liver damage. Occasionally, statin use could cause an increase in the level of enzymes that signal liver inflammation. ... ', 'Increased blood sugar or type 2 diabetes. ... ', 'Neurological side effects.']",
228
+ 'Doctors discovered that statins can help lower blood pressure, as well as lower cholesterol. Statins are often prescribed to people with high cholesterol. Too much cholesterol in your blood increases your risk of heart attacks and strokes.',
229
+ 'Lipitor and Crestor are both effective statins that lower levels of “bad” cholesterol and increase levels of “good” cholesterol. While Crestor is the more potent statin, both medications are effective and have slightly different side effects and drug interactions.',
230
+ "About simvastatin Simvastatin belongs to a group of medicines called statins. It's used to lower cholesterol if you've been diagnosed with high blood cholesterol. It's also taken to prevent heart disease, including heart attacks and strokes.",
231
+ 'Zetia works to lower cholesterol in a new way different from the statins: it inhibits the absorption of cholesterol in the small intestine, whereas the statins work by blocking cholesterol production in the liver.',
232
+ ]
233
+ )
234
+ # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
235
+ ```
236
+
237
+ <!--
238
+ ### Direct Usage (Transformers)
239
+
240
+ <details><summary>Click to see the direct usage in Transformers</summary>
241
+
242
+ </details>
243
+ -->
244
+
245
+ <!--
246
+ ### Downstream Usage (Sentence Transformers)
247
+
248
+ You can finetune this model on your own dataset.
249
+
250
+ <details><summary>Click to expand</summary>
251
+
252
+ </details>
253
+ -->
254
+
255
+ <!--
256
+ ### Out-of-Scope Use
257
+
258
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
259
+ -->
260
+
261
+ ### Framework Versions
262
+ - Python: 3.11.12
263
+ - Sentence Transformers: 4.0.2
264
+ - Transformers: 4.51.2
265
+ - PyTorch: 2.6.0+cu126
266
+ - Accelerate: 1.6.0
267
+ - Datasets: 3.5.0
268
+ - Tokenizers: 0.21.1
269
+
270
+ ## Purpose
271
+ This fine-tuned reranker targets **Spanish and English applications**. It prioritizes **accurate reordering of results** by scoring each query-passage pair jointly, which makes it suited to **question-answering** and **document retrieval** tasks.
272
+
273
+ - **Developed by:** fjmgAI
274
+ - **License:** apache-2.0
275
+
276
+ [<img src="https://sbert.net/_static/logo.png" width="200"/>](https://github.com/UKPLab/sentence-transformers)
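For readers unfamiliar with the ranking metrics reported in the README above, the following is a minimal sketch of how `mrr@10` and `ndcg@10` can be computed for a single query with binary relevance labels. The function names and the toy relevance list are illustrative assumptions, not part of the commit, and `CrossEncoderRerankingEvaluator` may differ in details such as how it averages over queries.

```python
import math


def mrr_at_k(relevance, k=10):
    """Reciprocal rank of the first relevant item in the top k (0.0 if none)."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def dcg_at_k(relevance, k=10):
    """Discounted cumulative gain over the top k positions."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevance[:k], start=1)
    )


def ndcg_at_k(relevance, k=10):
    """DCG normalized by the DCG of the ideal (best possible) ordering."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


# Hypothetical reranked list: 1 marks a relevant passage, 0 an irrelevant one.
ranked_relevance = [0, 1, 0, 1]
print("mrr@10 :", mrr_at_k(ranked_relevance))
print("ndcg@10:", round(ndcg_at_k(ranked_relevance), 4))
```

The reported values in the tables are averages of such per-query scores over the evaluation set.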
config.json ADDED
@@ -0,0 +1,55 @@
1
+ {
2
+ "architectures": [
3
+ "EuroBertForSequenceClassification"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "auto_map": {
8
+ "AutoConfig": "configuration_eurobert.EuroBertConfig",
9
+ "AutoModel": "EuroBERT/EuroBERT-210m--modeling_eurobert.EuroBertModel",
10
+ "AutoModelForMaskedLM": "EuroBERT/EuroBERT-210m--modeling_eurobert.EuroBertForMaskedLM",
11
+ "AutoModelForPreTraining": "EuroBERT/EuroBERT-210m--modeling_eurobert.EuroBertPreTrainedModel",
12
+ "AutoModelForSequenceClassification": "modeling_eurobert.EuroBertForSequenceClassification",
13
+ "AutoModelForTokenClassification": "EuroBERT/EuroBERT-210m--modeling_eurobert.EuroBertForTokenClassification"
14
+ },
15
+ "bos_token": "<|begin_of_text|>",
16
+ "bos_token_id": 128000,
17
+ "clf_pooling": "late",
18
+ "dtype": "float32",
19
+ "eos_token": "<|end_of_text|>",
20
+ "eos_token_id": 128001,
21
+ "head_dim": 64,
22
+ "hidden_act": "silu",
23
+ "hidden_dropout": 0.0,
24
+ "hidden_size": 768,
25
+ "id2label": {
26
+ "0": "LABEL_0"
27
+ },
28
+ "initializer_range": 0.02,
29
+ "intermediate_size": 3072,
30
+ "label2id": {
31
+ "LABEL_0": 0
32
+ },
33
+ "mask_token": "<|mask|>",
34
+ "mask_token_id": 128002,
35
+ "max_position_embeddings": 8192,
36
+ "mlp_bias": false,
37
+ "model_type": "eurobert",
38
+ "num_attention_heads": 12,
39
+ "num_hidden_layers": 12,
40
+ "num_key_value_heads": 12,
41
+ "pad_token": "<|end_of_text|>",
42
+ "pad_token_id": 128004,
43
+ "pretraining_tp": 1,
44
+ "rms_norm_eps": 1e-05,
45
+ "rope_scaling": null,
46
+ "rope_theta": 250000,
47
+ "sentence_transformers": {
48
+ "activation_fn": "torch.nn.modules.activation.Sigmoid",
49
+ "version": "5.1.1"
50
+ },
51
+ "tie_word_embeddings": false,
52
+ "transformers_version": "4.57.1",
53
+ "use_cache": false,
54
+ "vocab_size": 128256
55
+ }
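The `sentence_transformers.activation_fn` entry in the config above names `torch.nn.modules.activation.Sigmoid`, and `id2label` declares a single label. As a rough illustration of what that implies for the scores a user sees, the sketch below applies a sigmoid to a few made-up logits in plain Python. The logit values are hypothetical, and the snippet does not load the model.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw logit to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical raw logits for three query-passage pairs.
raw_logits = [2.3, -0.7, 0.0]
scores = [sigmoid(x) for x in raw_logits]

# Sigmoid is monotonic, so ranking by score matches ranking by logit.
order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

print([round(s, 3) for s in scores])
print(order)
```

Because the mapping is monotonic, the activation changes the scale of the scores but not the order of the reranked passages.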
configuration_eurobert.py ADDED
@@ -0,0 +1,216 @@
1
+ # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
2
+ # This file was automatically generated from src/transformers/models/eurobert/modular_eurobert.py.
3
+ # Do NOT edit this file manually as any edits will be overwritten by the generation of
4
+ # the file from the modular. If any change should be done, please apply the change to the
5
+ # modular_eurobert.py file directly. One of our CI enforces this.
6
+ # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
7
+ # coding=utf-8
8
+ # Copyright 2025 Nicolas Boizard, Duarte M. Alves, Hippolyte Gisserot-Boukhlef and the EuroBert team. All rights reserved.
9
+ #
10
+ #
11
+ # Licensed under the Apache License, Version 2.0 (the "License");
12
+ # you may not use this file except in compliance with the License.
13
+ # You may obtain a copy of the License at
14
+ #
15
+ # http://www.apache.org/licenses/LICENSE-2.0
16
+ #
17
+ # Unless required by applicable law or agreed to in writing, software
18
+ # distributed under the License is distributed on an "AS IS" BASIS,
19
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
20
+ # See the License for the specific language governing permissions and
21
+ # limitations under the License.
22
+
23
+ from transformers.utils import logging
24
+ from transformers.models.llama import LlamaConfig
25
+
26
+
27
+ logger = logging.get_logger(__name__)
28
+
29
+
30
+ class EuroBertConfig(LlamaConfig):
31
+ r"""
32
+ This is the configuration class to store the configuration of a [`EuroBertModel`]. It is used to instantiate an EuroBert
33
+ model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
34
+ defaults will yield a similar configuration to that of the EuroBERT-210m.
35
+
36
+ Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
37
+ documentation from [`PretrainedConfig`] for more information.
38
+
39
+
40
+ Args:
41
+ vocab_size (`int`, *optional*, defaults to 128256):
42
+ Vocabulary size of the EuroBert model. Defines the number of different tokens that can be represented by the
43
+ `input_ids` passed when calling [`EuroBertModel`].
44
+ hidden_size (`int`, *optional*, defaults to 768):
45
+ Dimensionality of the encoder layers and the pooler layer.
46
+ intermediate_size (`int`, *optional*, defaults to 3072):
47
+ Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
48
+ num_hidden_layers (`int`, *optional*, defaults to 12):
49
+ Number of hidden layers in the Transformer encoder.
50
+ num_attention_heads (`int`, *optional*, defaults to 12):
51
+ Number of attention heads for each attention layer in the Transformer encoder.
52
+ num_key_value_heads (`int`, *optional*):
53
+ This is the number of key_value heads that should be used to implement Grouped Query Attention. If
54
+ `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA), if
55
+ `num_key_value_heads=1` the model will use Multi Query Attention (MQA) otherwise GQA is used. When
56
+ converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
57
+ by mean-pooling all the original heads within that group. For more details, check out [this
58
+ paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to
59
+ `num_attention_heads`.
60
+ hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
61
+ The non-linear activation function (function or string) in the encoder and pooler.
62
+ max_position_embeddings (`int`, *optional*, defaults to 8192):
63
+ The maximum sequence length that this model might ever be used with. EuroBert supports up to 8192 tokens,
64
+ EuroBert-pretrained up to 2048.
65
+ initializer_range (`float`, *optional*, defaults to 0.02):
66
+ The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
67
+ rms_norm_eps (`float`, *optional*, defaults to 1e-05):
68
+ The epsilon used by the rms normalization layers.
69
+ bos_token_id (`int`, *optional*, defaults to 128000):
70
+ Beginning of stream token id.
71
+ eos_token_id (`int`, *optional*, defaults to 128001):
72
+ End of stream token id.
73
+ pad_token_id (`int`, *optional*, defaults to 128001):
74
+ Padding token id.
75
+ mask_token_id (`int`, *optional*, defaults to 128002):
76
+ Mask token id.
77
+ pretraining_tp (`int`, *optional*, defaults to 1):
78
+ Experimental feature. Tensor parallelism rank used during pretraining. Please refer to [this
79
+ document](https://huggingface.co/docs/transformers/main/perf_train_gpu_many#tensor-parallelism) to
80
+ understand more about it. This value is necessary to ensure exact reproducibility of the pretraining
81
+ results. Please refer to [this issue](https://github.com/pytorch/pytorch/issues/76232).
82
+ tie_word_embeddings (`bool`, *optional*, defaults to `False`):
83
+ Whether to tie weight embeddings
84
+ rope_theta (`float`, *optional*, defaults to 250000.0):
85
+ The base period of the RoPE embeddings. EuroBert used base period of 250000.0,
86
+ EuroBert-pretrained 10000.0.
87
+ rope_scaling (`Dict`, *optional*):
88
+ Dictionary containing the scaling configuration for the RoPE embeddings. NOTE: if you apply new rope type
89
+ and you expect the model to work on longer `max_position_embeddings`, we recommend you to update this value
90
+ accordingly.
91
+ Expected contents:
92
+ `rope_type` (`str`):
93
+ The sub-variant of RoPE to use. Can be one of ['default', 'linear', 'dynamic', 'yarn', 'longrope',
94
+ 'eurobert3'], with 'default' being the original RoPE implementation.
95
+ `factor` (`float`, *optional*):
96
+ Used with all rope types except 'default'. The scaling factor to apply to the RoPE embeddings. In
97
+ most scaling types, a `factor` of x will enable the model to handle sequences of length x *
98
+ original maximum pre-trained length.
99
+ `original_max_position_embeddings` (`int`, *optional*):
100
+ Used with 'dynamic', 'longrope' and 'eurobert3'. The original max position embeddings used during
101
+ pretraining.
102
+ `attention_factor` (`float`, *optional*):
103
+ Used with 'yarn' and 'longrope'. The scaling factor to be applied on the attention
104
+ computation. If unspecified, it defaults to value recommended by the implementation, using the
105
+ `factor` field to infer the suggested value.
106
+ `beta_fast` (`float`, *optional*):
107
+ Only used with 'yarn'. Parameter to set the boundary for extrapolation (only) in the linear
108
+ ramp function. If unspecified, it defaults to 32.
109
+ `beta_slow` (`float`, *optional*):
110
+ Only used with 'yarn'. Parameter to set the boundary for interpolation (only) in the linear
111
+ ramp function. If unspecified, it defaults to 1.
112
+ `short_factor` (`List[float]`, *optional*):
113
+ Only used with 'longrope'. The scaling factor to be applied to short contexts (<
114
+ `original_max_position_embeddings`). Must be a list of numbers with the same length as the hidden
115
+ size divided by the number of attention heads divided by 2
116
+ `long_factor` (`List[float]`, *optional*):
117
+ Only used with 'longrope'. The scaling factor to be applied to long contexts (>
118
+ `original_max_position_embeddings`). Must be a list of numbers with the same length as the hidden
119
+ size divided by the number of attention heads divided by 2
120
+ `low_freq_factor` (`float`, *optional*):
121
+ Only used with 'eurobert3'. Scaling factor applied to low frequency components of the RoPE
122
+ `high_freq_factor` (`float`, *optional*):
123
+ Only used with 'eurobert3'. Scaling factor applied to high frequency components of the RoPE
124
+ attention_bias (`bool`, *optional*, defaults to `False`):
125
+ Whether to use a bias in the query, key, value and output projection layers during self-attention.
126
+ attention_dropout (`float`, *optional*, defaults to 0.0):
127
+ The dropout ratio for the attention probabilities.
128
+ mlp_bias (`bool`, *optional*, defaults to `False`):
129
+ Whether to use a bias in up_proj, down_proj and gate_proj layers in the MLP layers.
130
+ head_dim (`int`, *optional*):
131
+ The attention head dimension. If None, it will default to hidden_size // num_attention_heads
132
+ classifier_pooling (`str`, *optional*, defaults to `"late"`):
133
+ The pooling strategy to use for the classifier. Can be one of ['bos', 'mean', 'late'].
134
+
135
+ ```python
136
+ >>> from transformers import EuroBertModel, EuroBertConfig
137
+
138
+ >>> # Initializing a EuroBert eurobert-base style configuration
139
+ >>> configuration = EuroBertConfig()
140
+
141
+ >>> # Initializing a model from the eurobert-base style configuration
142
+ >>> model = EuroBertModel(configuration)
143
+
144
+ >>> # Accessing the model configuration
145
+ >>> configuration = model.config
146
+ ```"""
147
+
148
+ model_type = "eurobert"
149
+
150
+ def __init__(
151
+ self,
152
+ vocab_size=128256,
153
+ hidden_size=768,
154
+ intermediate_size=3072,
155
+ num_hidden_layers=12,
156
+ num_attention_heads=12,
157
+ num_key_value_heads=None,
158
+ hidden_act="silu",
159
+ max_position_embeddings=8192,
160
+ initializer_range=0.02,
161
+ rms_norm_eps=1e-05,
162
+ bos_token_id=128000,
163
+ eos_token_id=128001,
164
+ pad_token_id=128001,
165
+ mask_token_id=128002,
166
+ pretraining_tp=1,
167
+ tie_word_embeddings=False,
168
+ rope_theta=250000.0,
169
+ rope_scaling=None,
170
+ attention_bias=False,
171
+ attention_dropout=0.0,
172
+ mlp_bias=False,
173
+ head_dim=None,
174
+ classifier_pooling="late",
175
+ **kwargs,
176
+ ):
177
+ # use_cache is specific to decoder models and should be set to False for encoder models
178
+ use_cache = kwargs.pop("use_cache", None)
179
+ if use_cache:
180
+ logger.warning_once(
181
+ "The `use_cache` argument to EuroBertConfig is set to `False`, as caching is never used for encoder models."
182
+ )
183
+
184
+ if num_key_value_heads is None:
185
+ num_key_value_heads = num_attention_heads
186
+
187
+ super().__init__(
188
+ vocab_size=vocab_size,
189
+ hidden_size=hidden_size,
190
+ intermediate_size=intermediate_size,
191
+ num_hidden_layers=num_hidden_layers,
192
+ num_attention_heads=num_attention_heads,
193
+ num_key_value_heads=num_key_value_heads,
194
+ hidden_act=hidden_act,
195
+ max_position_embeddings=max_position_embeddings,
196
+ initializer_range=initializer_range,
197
+ rms_norm_eps=rms_norm_eps,
198
+ use_cache=False,
199
+ bos_token_id=bos_token_id,
200
+ eos_token_id=eos_token_id,
201
+ pad_token_id=pad_token_id,
202
+ pretraining_tp=pretraining_tp,
203
+ tie_word_embeddings=tie_word_embeddings,
204
+ rope_theta=rope_theta,
205
+ rope_scaling=rope_scaling,
206
+ attention_bias=attention_bias,
207
+ attention_dropout=attention_dropout,
208
+ mlp_bias=mlp_bias,
209
+ head_dim=head_dim,
210
+ **kwargs,
211
+ )
212
+ self.mask_token_id = mask_token_id
213
+ self.clf_pooling = classifier_pooling
214
+
215
+
216
+ __all__ = ["EuroBertConfig"]
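The `num_key_value_heads` docstring in the configuration file above describes three attention variants and a default. A minimal sketch of that decision logic follows. The helper names `resolve_kv_heads` and `attention_variant` are hypothetical, and the divisibility check is an added assumption rather than something the configuration class shown here enforces.

```python
from typing import Optional


def resolve_kv_heads(num_attention_heads: int,
                     num_key_value_heads: Optional[int] = None) -> int:
    """Mirror the config default: fall back to num_attention_heads when unset."""
    if num_key_value_heads is None:
        return num_attention_heads
    return num_key_value_heads


def attention_variant(num_attention_heads: int,
                      num_key_value_heads: Optional[int] = None) -> str:
    """Classify the attention layout described in the docstring."""
    kv_heads = resolve_kv_heads(num_attention_heads, num_key_value_heads)
    # Assumption: query heads must split evenly into key/value groups.
    if num_attention_heads % kv_heads != 0:
        raise ValueError("num_attention_heads must be divisible by num_key_value_heads")
    if kv_heads == num_attention_heads:
        return "MHA"  # one key/value head per query head
    if kv_heads == 1:
        return "MQA"  # a single key/value head shared by all query heads
    return "GQA"      # several query heads share each key/value head


# Values from config.json in this commit: 12 attention heads, 12 key/value heads.
print(attention_variant(12, 12))
print(attention_variant(12))
print(attention_variant(12, 4))
print(attention_variant(12, 1))
```

With the values in this commit's `config.json`, the model uses plain multi-head attention, so the key/value repetition step is a no-op.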
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:16376b181fbb3f304c496395a937fa0ad374e72ac16cf9f51ae3c191486f2476
3
+ size 849442036
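As a sanity check on the pointer above, the file size can be compared with the shapes in `config.json`. The sketch below is back-of-the-envelope arithmetic under stated assumptions: weights are stored as `float32` (per the `dtype` field), each layer has four bias-free attention projections, a three-matrix gated MLP and two RMSNorm weight vectors, and the classification head and safetensors header are not modeled.

```python
# Size taken from the Git LFS pointer in this commit.
FILE_SIZE_BYTES = 849_442_036
BYTES_PER_FLOAT32 = 4

# Upper bound on the parameter count (the file also contains a small header).
upper_bound_params = FILE_SIZE_BYTES // BYTES_PER_FLOAT32

# Shapes taken from config.json in this commit.
vocab_size, hidden, intermediate, layers = 128_256, 768, 3_072, 12

embedding = vocab_size * hidden          # token embedding matrix
attention = 4 * hidden * hidden          # q, k, v and output projections (assumed, no bias)
mlp = 3 * hidden * intermediate          # gate, up and down projections (assumed, no bias)
norms = 2 * hidden                       # two RMSNorm weight vectors per layer (assumed)

# Backbone estimate: embeddings, all layers, plus one final norm (assumed).
backbone = embedding + layers * (attention + mlp + norms) + hidden

print(f"upper bound from file size: {upper_bound_params:,}")
print(f"backbone estimate         : {backbone:,}")
print(f"unaccounted (head, header): {upper_bound_params - backbone:,}")
```

Under these assumptions the backbone accounts for roughly 211.8M of the at most 212.4M stored values, which is consistent with the "210m" in the base model name.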
modeling_eurobert.py ADDED
@@ -0,0 +1,1057 @@
1
+ # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
2
+ # This file was automatically generated from src/transformers/models/eurobert/modular_eurobert.py.
3
+ # Do NOT edit this file manually as any edits will be overwritten by the generation of
4
+ # the file from the modular. If any change should be done, please apply the change to the
5
+ # modular_eurobert.py file directly. One of our CI enforces this.
6
+ # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
7
+ # coding=utf-8
8
+ # Copyright 2025 Nicolas Boizard, Duarte M. Alves, Hippolyte Gisserot-Boukhlef and the EuroBert team. All rights reserved.
9
+ #
10
+ #
11
+ # Licensed under the Apache License, Version 2.0 (the "License");
12
+ # you may not use this file except in compliance with the License.
13
+ # You may obtain a copy of the License at
14
+ #
15
+ # http://www.apache.org/licenses/LICENSE-2.0
16
+ #
17
+ # Unless required by applicable law or agreed to in writing, software
18
+ # distributed under the License is distributed on an "AS IS" BASIS,
19
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
20
+ # See the License for the specific language governing permissions and
21
+ # limitations under the License.
22
+
23
+ from typing import Callable, Optional, Tuple, Union
24
+
25
+ import torch
26
+ from torch import nn
27
+ from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
28
+
29
+ from transformers.activations import ACT2FN
30
+ from transformers.cache_utils import Cache, StaticCache
31
+ from transformers.modeling_attn_mask_utils import AttentionMaskConverter
32
+ from transformers.modeling_flash_attention_utils import FlashAttentionKwargs
33
+ from transformers.modeling_outputs import BaseModelOutput, BaseModelOutputWithPast, MaskedLMOutput, QuestionAnsweringModelOutput, SequenceClassifierOutput, TokenClassifierOutput
34
+ from transformers.modeling_rope_utils import ROPE_INIT_FUNCTIONS
35
+ from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS, PreTrainedModel
36
+ from transformers.processing_utils import Unpack
37
+ from transformers.utils import add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, logging
38
+ from .configuration_eurobert import EuroBertConfig
39
+
40
+
41
+ logger = logging.get_logger(__name__)
42
+
43
+ _CHECKPOINT_FOR_DOC = "EuroBERT/EuroBERT-210m"
44
+ _CONFIG_FOR_DOC = "EuroBertConfig"
45
+
46
+
47
+ class EuroBertRMSNorm(nn.Module):
48
+ def __init__(self, hidden_size, eps=1e-5):
49
+ """
50
+ EuroBertRMSNorm is equivalent to T5LayerNorm
51
+ """
52
+ super().__init__()
53
+ self.weight = nn.Parameter(torch.ones(hidden_size))
54
+ self.variance_epsilon = eps
55
+
56
+ def forward(self, hidden_states):
57
+ input_dtype = hidden_states.dtype
58
+ hidden_states = hidden_states.to(torch.float32)
59
+ variance = hidden_states.pow(2).mean(-1, keepdim=True)
60
+ hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
61
+ return self.weight * hidden_states.to(input_dtype)
62
+
63
+ def extra_repr(self):
64
+ return f"{tuple(self.weight.shape)}, eps={self.variance_epsilon}"
65
+
66
+
67
+ def rotate_half(x):
68
+ """Rotates half the hidden dims of the input."""
69
+ x1 = x[..., : x.shape[-1] // 2]
70
+ x2 = x[..., x.shape[-1] // 2 :]
71
+ return torch.cat((-x2, x1), dim=-1)
72
+
73
+
74
+ def apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
75
+ """Applies Rotary Position Embedding to the query and key tensors.
76
+
77
+ Args:
78
+ q (`torch.Tensor`): The query tensor.
79
+ k (`torch.Tensor`): The key tensor.
80
+ cos (`torch.Tensor`): The cosine part of the rotary embedding.
81
+ sin (`torch.Tensor`): The sine part of the rotary embedding.
82
+ position_ids (`torch.Tensor`, *optional*):
83
+ Deprecated and unused.
84
+ unsqueeze_dim (`int`, *optional*, defaults to 1):
85
+ The 'unsqueeze_dim' argument specifies the dimension along which to unsqueeze cos[position_ids] and
86
+ sin[position_ids] so that they can be properly broadcasted to the dimensions of q and k. For example, note
87
+ that cos[position_ids] and sin[position_ids] have the shape [batch_size, seq_len, head_dim]. Then, if q and
88
+ k have the shape [batch_size, heads, seq_len, head_dim], then setting unsqueeze_dim=1 makes
89
+ cos[position_ids] and sin[position_ids] broadcastable to the shapes of q and k. Similarly, if q and k have
90
+ the shape [batch_size, seq_len, heads, head_dim], then set unsqueeze_dim=2.
91
+ Returns:
92
+ `tuple(torch.Tensor)` comprising the query and key tensors rotated using the Rotary Position Embedding.
93
+ """
94
+ cos = cos.unsqueeze(unsqueeze_dim)
95
+ sin = sin.unsqueeze(unsqueeze_dim)
96
+ q_embed = (q * cos) + (rotate_half(q) * sin)
97
+ k_embed = (k * cos) + (rotate_half(k) * sin)
98
+ return q_embed, k_embed
99
+
100
+
101
+ def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
102
+ """
103
+ This is the equivalent of torch.repeat_interleave(x, dim=1, repeats=n_rep). The hidden states go from (batch,
104
+ num_key_value_heads, seqlen, head_dim) to (batch, num_attention_heads, seqlen, head_dim)
105
+ """
106
+ batch, num_key_value_heads, slen, head_dim = hidden_states.shape
107
+ if n_rep == 1:
108
+ return hidden_states
109
+ hidden_states = hidden_states[:, :, None, :, :].expand(batch, num_key_value_heads, n_rep, slen, head_dim)
110
+ return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)
111
+
112
+
113
+ def eager_attention_forward(
114
+ module: nn.Module,
115
+ query: torch.Tensor,
116
+ key: torch.Tensor,
117
+ value: torch.Tensor,
118
+ attention_mask: Optional[torch.Tensor],
119
+ scaling: float,
120
+ dropout: float = 0.0,
121
+ **kwargs,
122
+ ):
123
+ key_states = repeat_kv(key, module.num_key_value_groups)
124
+ value_states = repeat_kv(value, module.num_key_value_groups)
125
+
126
+ attn_weights = torch.matmul(query, key_states.transpose(2, 3)) * scaling
127
+ if attention_mask is not None:
128
+ causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
129
+ attn_weights = attn_weights + causal_mask
130
+
131
+ attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query.dtype)
132
+ attn_weights = nn.functional.dropout(attn_weights, p=dropout, training=module.training)
133
+ attn_output = torch.matmul(attn_weights, value_states)
134
+ attn_output = attn_output.transpose(1, 2).contiguous()
135
+
136
+ return attn_output, attn_weights
137
+
138
+
139
+ class EuroBertAttention(nn.Module):
140
+ """Multi-headed attention from 'Attention Is All You Need' paper"""
141
+
142
+ def __init__(self, config: EuroBertConfig, layer_idx: int):
143
+ super().__init__()
144
+ self.config = config
145
+ self.layer_idx = layer_idx
146
+ self.head_dim = getattr(config, "head_dim", config.hidden_size // config.num_attention_heads)
147
+ self.num_key_value_groups = config.num_attention_heads // config.num_key_value_heads
148
+ self.scaling = self.head_dim**-0.5
149
+ self.attention_dropout = config.attention_dropout
150
+ self.is_causal = False
151
+
152
+ self.q_proj = nn.Linear(
153
+ config.hidden_size, config.num_attention_heads * self.head_dim, bias=config.attention_bias
154
+ )
155
+ self.k_proj = nn.Linear(
156
+ config.hidden_size, config.num_key_value_heads * self.head_dim, bias=config.attention_bias
157
+ )
158
+ self.v_proj = nn.Linear(
159
+ config.hidden_size, config.num_key_value_heads * self.head_dim, bias=config.attention_bias
160
+ )
161
+ self.o_proj = nn.Linear(
162
+ config.num_attention_heads * self.head_dim, config.hidden_size, bias=config.attention_bias
163
+ )
164
+
165
+ def forward(
166
+ self,
167
+ hidden_states: torch.Tensor,
168
+ position_embeddings: Tuple[torch.Tensor, torch.Tensor],
169
+ attention_mask: Optional[torch.Tensor],
170
+ **kwargs: Unpack[FlashAttentionKwargs],
171
+ ) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:
172
+ input_shape = hidden_states.shape[:-1]
173
+ hidden_shape = (*input_shape, -1, self.head_dim)
174
+
175
+ query_states = self.q_proj(hidden_states).view(hidden_shape).transpose(1, 2)
176
+ key_states = self.k_proj(hidden_states).view(hidden_shape).transpose(1, 2)
177
+ value_states = self.v_proj(hidden_states).view(hidden_shape).transpose(1, 2)
178
+
179
+ cos, sin = position_embeddings
180
+ query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
181
+
182
+ attention_interface: Callable = eager_attention_forward
183
+ if self.config._attn_implementation != "eager":
184
+ if self.config._attn_implementation == "sdpa" and kwargs.get("output_attentions", False):
185
+ logger.warning_once(
186
+ "`torch.nn.functional.scaled_dot_product_attention` does not support `output_attentions=True`. Falling back to "
187
+ 'eager attention. This warning can be removed using the argument `attn_implementation="eager"` when loading the model.'
188
+ )
189
+ else:
190
+ attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
191
+
192
+ attn_output, attn_weights = attention_interface(
193
+ self,
194
+ query_states,
195
+ key_states,
196
+ value_states,
197
+ attention_mask,
198
+ dropout=0.0 if not self.training else self.attention_dropout,
199
+ scaling=self.scaling,
200
+ is_causal=False,
201
+ **kwargs,
202
+ )
203
+
204
+ attn_output = attn_output.reshape(*input_shape, -1).contiguous()
205
+ attn_output = self.o_proj(attn_output)
206
+ return attn_output, attn_weights
207
+
208
+
209
+ EUROBERT_START_DOCSTRING = r"""
210
+ This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the
211
+ library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads
212
+ etc.)
213
+
214
+ This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
215
+ Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage
216
+ and behavior.
217
+
218
+ Parameters:
219
+ config ([`EuroBertConfig`]):
220
+ Model configuration class with all the parameters of the model. Initializing with a config file does not
221
+ load the weights associated with the model, only the configuration. Check out the
222
+ [`~PreTrainedModel.from_pretrained`] method to load the model weights.
223
+ """
224
+
225
+
226
+ @add_start_docstrings(
227
+ "The bare EuroBERT Model outputting raw hidden-states without any specific head on top.",
228
+ EUROBERT_START_DOCSTRING,
229
+ )
230
+ class EuroBertPreTrainedModel(PreTrainedModel):
231
+ config_class = EuroBertConfig
232
+ base_model_prefix = "model"
233
+ supports_gradient_checkpointing = True
234
+ _no_split_modules = ["EuroBertDecoderLayer"]
235
+ _skip_keys_device_placement = ["past_key_values"]
236
+ _supports_flash_attn_2 = True
237
+ _supports_sdpa = True
238
+ _supports_flex_attn = True
239
+ _supports_cache_class = True
240
+ _supports_quantized_cache = True
241
+ _supports_static_cache = True
242
+ _supports_attention_backend = True
243
+
244
+ def _init_weights(self, module):
245
+ std = self.config.initializer_range
246
+ if isinstance(module, nn.Linear):
247
+ module.weight.data.normal_(mean=0.0, std=std)
248
+ if module.bias is not None:
249
+ module.bias.data.zero_()
250
+ elif isinstance(module, nn.Embedding):
251
+ module.weight.data.normal_(mean=0.0, std=std)
252
+ if module.padding_idx is not None:
253
+ module.weight.data[module.padding_idx].zero_()
254
+
255
+
256
+ class EuroBertRotaryEmbedding(nn.Module):
257
+ def __init__(self, config: EuroBertConfig, device=None):
258
+ super().__init__()
259
+ # BC: "rope_type" was originally "type"
260
+ if hasattr(config, "rope_scaling") and config.rope_scaling is not None:
261
+ self.rope_type = config.rope_scaling.get("rope_type", config.rope_scaling.get("type"))
262
+ else:
263
+ self.rope_type = "default"
264
+ self.max_seq_len_cached = config.max_position_embeddings
265
+ self.original_max_seq_len = config.max_position_embeddings
266
+
267
+ self.config = config
268
+ self.rope_init_fn = ROPE_INIT_FUNCTIONS[self.rope_type]
269
+
270
+ inv_freq, self.attention_scaling = self.rope_init_fn(self.config, device)
271
+ self.register_buffer("inv_freq", inv_freq, persistent=False)
272
+ self.original_inv_freq = self.inv_freq
273
+
274
+ def _dynamic_frequency_update(self, position_ids, device):
275
+ """
276
+ dynamic RoPE layers should recompute `inv_freq` in the following situations:
277
+ 1 - growing beyond the cached sequence length (allow scaling)
278
+ 2 - the current sequence length is in the original scale (avoid losing precision with small sequences)
279
+ """
280
+ seq_len = torch.max(position_ids) + 1
281
+ if seq_len > self.max_seq_len_cached: # growth
282
+ inv_freq, self.attention_scaling = self.rope_init_fn(self.config, device, seq_len=seq_len)
283
+ self.register_buffer("inv_freq", inv_freq, persistent=False) # TODO joao: may break with compilation
284
+ self.max_seq_len_cached = seq_len
285
+
286
+ if seq_len < self.original_max_seq_len and self.max_seq_len_cached > self.original_max_seq_len: # reset
287
+ # This .to() is needed if the model has been moved to a device after being initialized (because
288
+ # the buffer is automatically moved, but not the original copy)
289
+ self.original_inv_freq = self.original_inv_freq.to(device)
290
+ self.register_buffer("inv_freq", self.original_inv_freq, persistent=False)
291
+ self.max_seq_len_cached = self.original_max_seq_len
292
+
293
+ @torch.no_grad()
294
+ def forward(self, x, position_ids):
295
+ if "dynamic" in self.rope_type:
296
+ self._dynamic_frequency_update(position_ids, device=x.device)
297
+
298
+ # Core RoPE block
299
+ inv_freq_expanded = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
300
+ position_ids_expanded = position_ids[:, None, :].float()
301
+ # Force float32 (see https://github.com/huggingface/transformers/pull/29285)
302
+ device_type = x.device.type
303
+ device_type = device_type if isinstance(device_type, str) and device_type != "mps" else "cpu"
304
+ with torch.autocast(device_type=device_type, enabled=False):
305
+ freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)
306
+ emb = torch.cat((freqs, freqs), dim=-1)
307
+ cos = emb.cos()
308
+ sin = emb.sin()
309
+
310
+ # Advanced RoPE types (e.g. yarn) apply a post-processing scaling factor, equivalent to scaling attention
311
+ cos = cos * self.attention_scaling
312
+ sin = sin * self.attention_scaling
313
+
314
+ return cos.to(dtype=x.dtype), sin.to(dtype=x.dtype)
315
+
316
+
317
+ class EuroBertMLP(nn.Module):
318
+ def __init__(self, config):
319
+ super().__init__()
320
+ self.config = config
321
+ self.hidden_size = config.hidden_size
322
+ self.intermediate_size = config.intermediate_size
323
+ self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=config.mlp_bias)
324
+ self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=config.mlp_bias)
325
+ self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=config.mlp_bias)
326
+ self.act_fn = ACT2FN[config.hidden_act]
327
+
328
+ def forward(self, x):
329
+ down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
330
+ return down_proj
331
+
332
+
333
+ class EuroBertDecoderLayer(nn.Module):
334
+ def __init__(self, config: EuroBertConfig, layer_idx: int):
335
+ super().__init__()
336
+ self.hidden_size = config.hidden_size
337
+
338
+ self.self_attn = EuroBertAttention(config=config, layer_idx=layer_idx)
339
+
340
+ self.mlp = EuroBertMLP(config)
341
+ self.input_layernorm = EuroBertRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
342
+ self.post_attention_layernorm = EuroBertRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
343
+
344
+ def forward(
345
+ self,
346
+ hidden_states: torch.Tensor,
347
+ attention_mask: Optional[torch.Tensor] = None,
348
+ position_ids: Optional[torch.LongTensor] = None,
349
+ past_key_value: Optional[Cache] = None,
350
+ output_attentions: Optional[bool] = False,
351
+ use_cache: Optional[bool] = False,
352
+ cache_position: Optional[torch.LongTensor] = None,
353
+ position_embeddings: Optional[Tuple[torch.Tensor, torch.Tensor]] = None, # necessary, but kept here for BC
354
+ **kwargs: Unpack[FlashAttentionKwargs],
355
+ ) -> Tuple[torch.FloatTensor, Optional[Tuple[torch.FloatTensor, torch.FloatTensor]]]:
356
+ residual = hidden_states
357
+
358
+ hidden_states = self.input_layernorm(hidden_states)
359
+
360
+ # Self Attention
361
+ hidden_states, self_attn_weights = self.self_attn(
362
+ hidden_states=hidden_states,
363
+ attention_mask=attention_mask,
364
+ position_ids=position_ids,
365
+ past_key_value=past_key_value,
366
+ output_attentions=output_attentions,
367
+ use_cache=use_cache,
368
+ cache_position=cache_position,
369
+ position_embeddings=position_embeddings,
370
+ **kwargs,
371
+ )
372
+ hidden_states = residual + hidden_states
373
+
374
+ # Fully Connected
375
+ residual = hidden_states
376
+ hidden_states = self.post_attention_layernorm(hidden_states)
377
+ hidden_states = self.mlp(hidden_states)
378
+ hidden_states = residual + hidden_states
379
+
380
+ outputs = (hidden_states,)
381
+ if output_attentions:
382
+ outputs += (self_attn_weights,)
383
+
384
+ return outputs
385
+
386
+
387
+ EUROBERT_INPUTS_DOCSTRING = r"""
388
+ Args:
389
+ input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
390
+ Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide
391
+ it.
392
+
393
+ Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
394
+ [`PreTrainedTokenizer.__call__`] for details.
395
+
396
+ [What are input IDs?](../glossary#input-ids)
397
+ attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
398
+ Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
399
+
400
+ - 1 for tokens that are **not masked**,
401
+ - 0 for tokens that are **masked**.
402
+
403
+ [What are attention masks?](../glossary#attention-mask)
404
+
405
+ Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
406
+ [`PreTrainedTokenizer.__call__`] for details.
407
+
408
+ If `past_key_values` is used, optionally only the last `input_ids` have to be input (see
409
+ `past_key_values`).
410
+
411
+ If you want to change padding behavior, you should read [`modeling_opt._prepare_decoder_attention_mask`]
412
+ and modify to your needs. See diagram 1 in [the paper](https://arxiv.org/abs/1910.13461) for more
413
+ information on the default strategy.
414
+
415
+ - 1 indicates the head is **not masked**,
416
+ - 0 indicates the head is **masked**.
417
+ position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
418
+ Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0,
419
+ config.n_positions - 1]`.
420
+
421
+ [What are position IDs?](../glossary#position-ids)
422
+ past_key_values (`Cache` or `tuple(tuple(torch.FloatTensor))`, *optional*):
423
+ Pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention
424
+ blocks) that can be used to speed up sequential decoding. This typically consists in the `past_key_values`
425
+ returned by the model at a previous stage of decoding, when `use_cache=True` or `config.use_cache=True`.
426
+
427
+ Two formats are allowed:
428
+ - a [`~cache_utils.Cache`] instance, see our
429
+ [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache);
430
+ - Tuple of `tuple(torch.FloatTensor)` of length `config.n_layers`, with each tuple having 2 tensors of
431
+ shape `(batch_size, num_heads, sequence_length, embed_size_per_head)`). This is also known as the legacy
432
+ cache format.
433
+
434
+ The model will output the same cache format that is fed as input. If no `past_key_values` are passed, the
435
+ legacy cache format will be returned.
436
+
437
+ If `past_key_values` are used, the user can optionally input only the last `input_ids` (those that don't
438
+ have their past key value states given to this model) of shape `(batch_size, 1)` instead of all `input_ids`
439
+ of shape `(batch_size, sequence_length)`.
440
+ inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
441
+ Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
442
+ is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
443
+ model's internal embedding lookup matrix.
444
+ use_cache (`bool`, *optional*):
445
+ If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
446
+ `past_key_values`).
447
+ output_attentions (`bool`, *optional*):
448
+ Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
449
+ tensors for more detail.
450
+ output_hidden_states (`bool`, *optional*):
451
+ Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
452
+ more detail.
453
+ return_dict (`bool`, *optional*):
454
+ Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
455
+ cache_position (`torch.LongTensor` of shape `(sequence_length)`, *optional*):
456
+ Indices depicting the position of the input sequence tokens in the sequence. Contrary to `position_ids`,
457
+ this tensor is not affected by padding. It is used to update the cache in the correct position and to infer
458
+ the complete sequence length.
459
+ """
460
+
461
+
462
+ @add_start_docstrings(
463
+ "The bare EuroBert Model outputting raw hidden-states without any specific head on top.",
464
+ EUROBERT_START_DOCSTRING,
465
+ )
466
+ class EuroBertModel(EuroBertPreTrainedModel):
467
+ """
468
+ Transformer encoder consisting of *config.num_hidden_layers* layers. Each layer is a [`EuroBertDecoderLayer`]
469
+
470
+ Args:
471
+ config: EuroBertConfig
472
+ """
473
+
474
+ def __init__(self, config: EuroBertConfig):
475
+ super().__init__(config)
476
+ self.padding_idx = config.pad_token_id
477
+ self.vocab_size = config.vocab_size
478
+
479
+ self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
480
+ self.layers = nn.ModuleList(
481
+ [EuroBertDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
482
+ )
483
+ self.norm = EuroBertRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
484
+ self.rotary_emb = EuroBertRotaryEmbedding(config=config)
485
+ self.gradient_checkpointing = False
486
+ self.mask_converter = AttentionMaskConverter(is_causal=False)
487
+
488
+ # Initialize weights and apply final processing
489
+ self.post_init()
490
+
491
+ def get_input_embeddings(self):
492
+ return self.embed_tokens
493
+
494
+ def set_input_embeddings(self, value):
495
+ self.embed_tokens = value
496
+
497
+ @add_start_docstrings_to_model_forward(EUROBERT_INPUTS_DOCSTRING)
498
+ @add_code_sample_docstrings(
499
+ checkpoint=_CHECKPOINT_FOR_DOC,
500
+ output_type=BaseModelOutput,
501
+ config_class=_CONFIG_FOR_DOC,
502
+ )
503
+ def forward(
504
+ self,
505
+ input_ids: Optional[torch.LongTensor] = None,
506
+ attention_mask: Optional[torch.Tensor] = None,
507
+ position_ids: Optional[torch.LongTensor] = None,
508
+ inputs_embeds: Optional[torch.FloatTensor] = None,
509
+ output_attentions: Optional[bool] = None,
510
+ output_hidden_states: Optional[bool] = None,
511
+ return_dict: Optional[bool] = None,
512
+ **flash_attn_kwargs: Unpack[FlashAttentionKwargs],
513
+ ) -> Union[Tuple, BaseModelOutput]:
514
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
515
+ output_hidden_states = (
516
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
517
+ )
518
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
519
+
520
+ if (input_ids is None) ^ (inputs_embeds is not None):
521
+ raise ValueError("You must specify exactly one of input_ids or inputs_embeds")
522
+
523
+ if inputs_embeds is None:
524
+ inputs_embeds = self.embed_tokens(input_ids)
525
+
526
+ if attention_mask is not None and self.config._attn_implementation != "flash_attention_2":
527
+ mask = self.mask_converter.to_4d(attention_mask, attention_mask.shape[1], inputs_embeds.dtype)
528
+ else:
529
+ mask = attention_mask
530
+
531
+ hidden_states = inputs_embeds
532
+
533
+ # create position embeddings to be shared across the encoder layers
534
+ if position_ids is None:
535
+ position_ids = torch.arange(inputs_embeds.shape[1], device=inputs_embeds.device).unsqueeze(0)
536
+ position_embeddings = self.rotary_emb(hidden_states, position_ids)
537
+
538
+ # encoder layers
539
+ all_hidden_states = () if output_hidden_states else None
540
+ all_self_attns = () if output_attentions else None
541
+
542
+ for encoder_layer in self.layers[: self.config.num_hidden_layers]:
543
+ if output_hidden_states:
544
+ all_hidden_states += (hidden_states,)
545
+
546
+ if self.gradient_checkpointing and self.training:
547
+ layer_outputs = self._gradient_checkpointing_func(
548
+ encoder_layer.__call__,
549
+ hidden_states,
550
+ mask,
551
+ position_ids,
552
+ None,
553
+ output_attentions,
554
+ False,
555
+ None,
556
+ position_embeddings,
557
+ )
558
+ else:
559
+ layer_outputs = encoder_layer(
560
+ hidden_states,
561
+ attention_mask=mask,
562
+ position_ids=position_ids,
563
+ output_attentions=output_attentions,
564
+ position_embeddings=position_embeddings,
565
+ **flash_attn_kwargs,
566
+ )
567
+
568
+ hidden_states = layer_outputs[0]
569
+
570
+ if output_attentions:
571
+ all_self_attns += (layer_outputs[1],)
572
+
573
+ hidden_states = self.norm(hidden_states)
574
+
575
+ # add hidden states from the last encoder layer
576
+ if output_hidden_states:
577
+ all_hidden_states += (hidden_states,)
578
+
579
+ output = BaseModelOutput(
580
+ last_hidden_state=hidden_states,
581
+ hidden_states=all_hidden_states,
582
+ attentions=all_self_attns,
583
+ )
584
+ return output if return_dict else output.to_tuple()
585
+
586
+ def _update_causal_mask(
587
+ self,
588
+ attention_mask: torch.Tensor,
589
+ input_tensor: torch.Tensor,
590
+ cache_position: torch.Tensor,
591
+ past_key_values: Cache,
592
+ output_attentions: bool,
593
+ ):
594
+ if self.config._attn_implementation == "flash_attention_2":
595
+ if attention_mask is not None and (attention_mask == 0.0).any():
596
+ return attention_mask
597
+ return None
598
+
599
+ # For SDPA, when possible, we will rely on its `is_causal` argument instead of its `attn_mask` argument, in
600
+ # order to dispatch on Flash Attention 2. This feature is not compatible with static cache, as SDPA will fail
601
+ # to infer the attention mask.
602
+ past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
603
+ using_static_cache = isinstance(past_key_values, StaticCache)
604
+
605
+ # When output attentions is True, sdpa implementation's forward method calls the eager implementation's forward
606
+ if self.config._attn_implementation == "sdpa" and not using_static_cache and not output_attentions:
607
+ if AttentionMaskConverter._ignore_causal_mask_sdpa(
608
+ attention_mask,
609
+ inputs_embeds=input_tensor,
610
+ past_key_values_length=past_seen_tokens,
611
+ is_training=self.training,
612
+ ):
613
+ return None
614
+
615
+ dtype, device = input_tensor.dtype, input_tensor.device
616
+ sequence_length = input_tensor.shape[1]
617
+ if using_static_cache:
618
+ target_length = past_key_values.get_max_cache_shape()
619
+ else:
620
+ target_length = (
621
+ attention_mask.shape[-1]
622
+ if isinstance(attention_mask, torch.Tensor)
623
+ else past_seen_tokens + sequence_length + 1
624
+ )
625
+
626
+ # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
627
+ causal_mask = self._prepare_4d_causal_attention_mask_with_cache_position(
628
+ attention_mask,
629
+ sequence_length=sequence_length,
630
+ target_length=target_length,
631
+ dtype=dtype,
632
+ device=device,
633
+ cache_position=cache_position,
634
+ batch_size=input_tensor.shape[0],
635
+ )
636
+
637
+ if (
638
+ self.config._attn_implementation == "sdpa"
639
+ and attention_mask is not None
640
+ and attention_mask.device.type in ["cuda", "xpu"]
641
+ and not output_attentions
642
+ ):
643
+ # Attend to all tokens in fully masked rows in the causal_mask, for example the relevant first rows when
644
+ # using left padding. This is required by F.scaled_dot_product_attention memory-efficient attention path.
645
+ # Details: https://github.com/pytorch/pytorch/issues/110213
646
+ min_dtype = torch.finfo(dtype).min
647
+ causal_mask = AttentionMaskConverter._unmask_unattended(causal_mask, min_dtype)
648
+
649
+ return causal_mask
650
+
651
+ @staticmethod
652
+ def _prepare_4d_causal_attention_mask_with_cache_position(
653
+ attention_mask: torch.Tensor,
654
+ sequence_length: int,
655
+ target_length: int,
656
+ dtype: torch.dtype,
657
+ device: torch.device,
658
+ cache_position: torch.Tensor,
659
+ batch_size: int,
660
+ **kwargs,
661
+ ):
662
+ """
663
+ Creates a causal 4D mask of shape `(batch_size, 1, query_length, key_value_length)` from a 2D mask of shape
664
+ `(batch_size, key_value_length)`, or if the input `attention_mask` is already 4D, do nothing.
665
+
666
+ Args:
667
+ attention_mask (`torch.Tensor`):
668
+ A 2D attention mask of shape `(batch_size, key_value_length)` or a 4D attention mask of shape
669
+ `(batch_size, 1, query_length, key_value_length)`.
670
+ sequence_length (`int`):
671
+ The sequence length being processed.
672
+ target_length (`int`):
673
+ The target length: when generating with static cache, the mask should be as long as the static cache,
674
+ to account for the 0 padding, the part of the cache that is not filled yet.
675
+ dtype (`torch.dtype`):
676
+ The dtype to use for the 4D attention mask.
677
+ device (`torch.device`):
678
+ The device to place the 4D attention mask on.
679
+ cache_position (`torch.Tensor`):
680
+ Indices depicting the position of the input sequence tokens in the sequence.
681
+ batch_size (`int`):
682
+ Batch size.
683
+ """
684
+ if attention_mask is not None and attention_mask.dim() == 4:
685
+ # In this case we assume that the mask comes already in inverted form and requires no inversion or slicing.
686
+ causal_mask = attention_mask
687
+ else:
688
+ min_dtype = torch.finfo(dtype).min
689
+ causal_mask = torch.full(
690
+ (sequence_length, target_length), fill_value=min_dtype, dtype=dtype, device=device
691
+ )
692
+ if sequence_length != 1:
693
+ causal_mask = torch.triu(causal_mask, diagonal=1)
694
+ causal_mask *= torch.arange(target_length, device=device) > cache_position.reshape(-1, 1)
695
+ causal_mask = causal_mask[None, None, :, :].expand(batch_size, 1, -1, -1)
696
+ if attention_mask is not None:
697
+ causal_mask = causal_mask.clone() # copy to contiguous memory for in-place edit
698
+ mask_length = attention_mask.shape[-1]
699
+ padding_mask = causal_mask[:, :, :, :mask_length] + attention_mask[:, None, None, :].to(
700
+ causal_mask.device
701
+ )
702
+ padding_mask = padding_mask == 0
703
+ causal_mask[:, :, :, :mask_length] = causal_mask[:, :, :, :mask_length].masked_fill(
704
+ padding_mask, min_dtype
705
+ )
706
+
707
+ return causal_mask
708
+
709
+
710
+ @add_start_docstrings(
711
+ "The EuroBert Model with a decoder head on top that is used for masked language modeling.",
712
+ EUROBERT_START_DOCSTRING,
713
+ )
714
+ class EuroBertForMaskedLM(EuroBertPreTrainedModel):
715
+ def __init__(self, config: EuroBertConfig):
716
+ super().__init__(config)
717
+ self.model = EuroBertModel(config)
718
+ self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, config.mlp_bias)
719
+ self.post_init()
720
+
721
+ @add_start_docstrings_to_model_forward(EUROBERT_INPUTS_DOCSTRING)
722
+ @add_code_sample_docstrings(
723
+ checkpoint=_CHECKPOINT_FOR_DOC,
724
+ output_type=MaskedLMOutput,
725
+ config_class=_CONFIG_FOR_DOC,
726
+ )
727
+ def forward(
728
+ self,
729
+ input_ids: Optional[torch.LongTensor] = None,
730
+ attention_mask: Optional[torch.Tensor] = None,
731
+ position_ids: Optional[torch.LongTensor] = None,
732
+ inputs_embeds: Optional[torch.FloatTensor] = None,
733
+ labels: Optional[torch.LongTensor] = None,
734
+ output_attentions: Optional[bool] = None,
735
+ output_hidden_states: Optional[bool] = None,
736
+ return_dict: Optional[bool] = None,
737
+ ) -> Union[Tuple[torch.Tensor], MaskedLMOutput]:
738
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
739
+
740
+ encoder_output = self.model(
741
+ input_ids,
742
+ attention_mask=attention_mask,
743
+ position_ids=position_ids,
744
+ inputs_embeds=inputs_embeds,
745
+ output_attentions=output_attentions,
746
+ output_hidden_states=output_hidden_states,
747
+ return_dict=return_dict,
748
+ )
749
+
750
+ prediction_scores = self.lm_head(encoder_output[0])
751
+ masked_lm_loss = None
752
+ if labels is not None:
753
+ labels = labels.to(prediction_scores.device)
754
+ masked_lm_loss = self.loss_function(prediction_scores, labels, vocab_size=self.config.vocab_size)
755
+
756
+ if not return_dict:
757
+ output = (prediction_scores,) + encoder_output[1:]
758
+ return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output
759
+
760
+ return MaskedLMOutput(
761
+ loss=masked_lm_loss,
762
+ logits=prediction_scores,
763
+ hidden_states=encoder_output.hidden_states,
764
+ attentions=encoder_output.attentions,
765
+ )
766
+
767
+
768
+ @add_start_docstrings(
769
+ "The EuroBert Model with a sequence classification head on top that performs pooling.",
770
+ EUROBERT_START_DOCSTRING,
771
+ )
772
+ class EuroBertForSequenceClassification(EuroBertPreTrainedModel):
773
+ def __init__(self, config: EuroBertConfig):
774
+ super().__init__(config)
775
+ self.num_labels = config.num_labels
776
+ self.clf_pooling = config.clf_pooling
777
+
778
+ self.model = EuroBertModel(config)
779
+ self.dense = nn.Linear(config.hidden_size, config.hidden_size)
780
+ self.activation = nn.GELU()
781
+ self.classifier = nn.Linear(config.hidden_size, self.num_labels)
782
+ self.post_init()
783
+
784
+ @add_start_docstrings_to_model_forward(EUROBERT_INPUTS_DOCSTRING)
785
+ @add_code_sample_docstrings(
786
+ checkpoint=_CHECKPOINT_FOR_DOC,
787
+ output_type=SequenceClassifierOutput,
788
+ config_class=_CONFIG_FOR_DOC,
789
+ )
790
+ def forward(
791
+ self,
792
+ input_ids: Optional[torch.LongTensor] = None,
793
+ attention_mask: Optional[torch.Tensor] = None,
794
+ position_ids: Optional[torch.LongTensor] = None,
795
+ inputs_embeds: Optional[torch.FloatTensor] = None,
796
+ labels: Optional[torch.LongTensor] = None,
797
+ output_attentions: Optional[bool] = None,
798
+ output_hidden_states: Optional[bool] = None,
799
+ return_dict: Optional[bool] = None,
800
+ ) -> Union[Tuple[torch.Tensor], SequenceClassifierOutput]:
801
+ r"""
802
+ labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
803
+ Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
804
+ config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss); if
805
+ `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
806
+ """
807
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
808
+
809
+ encoder_output = self.model(
810
+ input_ids,
811
+ attention_mask=attention_mask,
812
+ position_ids=position_ids,
813
+ inputs_embeds=inputs_embeds,
814
+ output_attentions=output_attentions,
815
+ output_hidden_states=output_hidden_states,
816
+ return_dict=return_dict,
817
+ )
818
+ last_hidden_state = encoder_output[0]
819
+
820
+ if self.clf_pooling in ["bos", "mean"]:
821
+ if self.clf_pooling == "bos":
822
+ pooled_output = last_hidden_state[:, 0]
823
+
824
+ elif self.clf_pooling == "mean":
825
+ if attention_mask is None:
826
+ pooled_output = last_hidden_state.mean(dim=1)
827
+ else:
828
+ pooled_output = (last_hidden_state * attention_mask.unsqueeze(-1)).sum(dim=1)
829
+ pooled_output /= attention_mask.sum(dim=1, keepdim=True)
830
+
831
+ pooled_output = self.dense(pooled_output)
832
+ pooled_output = self.activation(pooled_output)
833
+ logits = self.classifier(pooled_output)
834
+
835
+ elif self.clf_pooling == "late":
836
+ x = self.dense(last_hidden_state)
837
+ x = self.activation(x)
838
+ logits = self.classifier(x)
839
+ if attention_mask is None:
840
+ logits = logits.mean(dim=1)
841
+ else:
842
+ logits = (logits * attention_mask.unsqueeze(-1)).sum(dim=1)
843
+ logits /= attention_mask.sum(dim=1, keepdim=True)
844
+
845
+ loss = None
846
+ if labels is not None:
847
+ labels = labels.to(logits.device)
848
+ if self.config.problem_type is None:
849
+ if self.num_labels == 1:
850
+ self.config.problem_type = "regression"
851
+ elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
852
+ self.config.problem_type = "single_label_classification"
853
+ else:
854
+ self.config.problem_type = "multi_label_classification"
855
+
856
+ if self.config.problem_type == "regression":
857
+ loss_fct = MSELoss()
858
+ if self.num_labels == 1:
859
+ loss = loss_fct(logits.squeeze(), labels.squeeze())
860
+ else:
861
+ loss = loss_fct(logits, labels)
862
+ elif self.config.problem_type == "single_label_classification":
863
+ loss_fct = CrossEntropyLoss()
864
+ loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
865
+ elif self.config.problem_type == "multi_label_classification":
866
+ loss_fct = BCEWithLogitsLoss()
867
+ loss = loss_fct(logits, labels)
868
+
869
+ if not return_dict:
870
+ output = (logits,) + encoder_output[1:]
871
+ return ((loss,) + output) if loss is not None else output
872
+
873
+ return SequenceClassifierOutput(
874
+ loss=loss,
875
+ logits=logits,
876
+ hidden_states=encoder_output.hidden_states,
877
+ attentions=encoder_output.attentions,
878
+ )
879
+
880
+
881
+ @add_start_docstrings(
882
+ """
883
+ The EuroBert Model with a token classification head on top (a linear layer on top of the hidden-states
884
+ output), e.g. for Named-Entity-Recognition (NER) tasks.
885
+ """,
886
+ EUROBERT_START_DOCSTRING,
887
+ )
888
+ class EuroBertForTokenClassification(EuroBertPreTrainedModel):
889
+ def __init__(self, config: EuroBertConfig):
890
+ super().__init__(config)
891
+ self.num_labels = config.num_labels
892
+ self.model = EuroBertModel(config)
893
+
894
+ self.classifier = nn.Linear(config.hidden_size, config.num_labels)
895
+ self.post_init()
896
+
897
+ def get_input_embeddings(self):
898
+ return self.model.embed_tokens
899
+
900
+ def set_input_embeddings(self, value):
901
+ self.model.embed_tokens = value
902
+
903
+ @add_start_docstrings_to_model_forward(EUROBERT_INPUTS_DOCSTRING)
904
+ def forward(
905
+ self,
906
+ input_ids: Optional[torch.LongTensor] = None,
907
+ attention_mask: Optional[torch.Tensor] = None,
908
+ position_ids: Optional[torch.LongTensor] = None,
909
+ inputs_embeds: Optional[torch.FloatTensor] = None,
910
+ labels: Optional[torch.LongTensor] = None,
911
+ use_cache: Optional[bool] = None,
912
+ output_attentions: Optional[bool] = None,
913
+ output_hidden_states: Optional[bool] = None,
914
+ return_dict: Optional[bool] = None,
915
+ ) -> Union[Tuple, TokenClassifierOutput]:
916
+ r"""
917
+ labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
918
+ Labels for computing the token classification loss. Indices should be in `[0, ...,
919
+ config.num_labels - 1]`; a classification loss (Cross-Entropy) is computed over every token
920
+ position.
921
+ """
922
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
923
+
924
+ outputs = self.model(
925
+ input_ids,
926
+ attention_mask=attention_mask,
927
+ position_ids=position_ids,
928
+ inputs_embeds=inputs_embeds,
929
+ use_cache=use_cache,
930
+ output_attentions=output_attentions,
931
+ output_hidden_states=output_hidden_states,
932
+ return_dict=return_dict,
933
+ )
934
+ sequence_output = outputs[0]
935
+ logits = self.classifier(sequence_output)
936
+
937
+ loss = None
938
+ if labels is not None:
939
+ loss_fct = CrossEntropyLoss()
940
+ loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
941
+
942
+ if not return_dict:
943
+ output = (logits,) + outputs[2:]
944
+ return ((loss,) + output) if loss is not None else output
945
+
946
+ return TokenClassifierOutput(
947
+ loss=loss,
948
+ logits=logits,
949
+ hidden_states=outputs.hidden_states,
950
+ attentions=outputs.attentions,
951
+ )
952
+
953
+
954
+ @add_start_docstrings(
955
+ """
956
+ The EuroBert Model with a span classification head on top for extractive question-answering tasks
957
+ like SQuAD (a linear layer on top of the hidden-states output to compute span start logits
958
+ and span end logits).
959
+ """,
960
+ EUROBERT_START_DOCSTRING,
961
+ )
962
+ class EuroBertForQuestionAnswering(EuroBertPreTrainedModel):
963
+ def __init__(self, config: EuroBertConfig):
964
+ super().__init__(config)
965
+ self.num_labels = config.num_labels
966
+ self.model = EuroBertModel(config)
967
+
968
+ self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)
969
+ self.post_init()
970
+
971
+ def get_input_embeddings(self):
972
+ return self.model.embed_tokens
973
+
974
+ def set_input_embeddings(self, value):
975
+ self.model.embed_tokens = value
976
+
977
+ @add_start_docstrings_to_model_forward(EUROBERT_INPUTS_DOCSTRING)
978
+ def forward(
979
+ self,
980
+ input_ids: Optional[torch.Tensor] = None,
981
+ attention_mask: Optional[torch.Tensor] = None,
982
+ position_ids: Optional[torch.Tensor] = None,
983
+ inputs_embeds: Optional[torch.Tensor] = None,
984
+ use_cache: Optional[bool] = None,
985
+ start_positions: Optional[torch.Tensor] = None,
986
+ end_positions: Optional[torch.Tensor] = None,
987
+ output_attentions: Optional[bool] = None,
988
+ output_hidden_states: Optional[bool] = None,
989
+ return_dict: Optional[bool] = None,
990
+ ) -> Union[Tuple[torch.Tensor], QuestionAnsweringModelOutput]:
991
+ r"""
992
+ start_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
993
+ Labels for position (index) of the start of the labelled span for computing the token classification loss.
994
+ Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
995
+ are not taken into account for computing the loss.
996
+ end_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
997
+ Labels for position (index) of the end of the labelled span for computing the token classification loss.
998
+ Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
999
+ are not taken into account for computing the loss.
1000
+ """
1001
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1002
+
1003
+ outputs = self.model(
1004
+ input_ids,
1005
+ attention_mask=attention_mask,
1006
+ position_ids=position_ids,
1007
+ inputs_embeds=inputs_embeds,
1008
+ use_cache=use_cache,
1009
+ output_attentions=output_attentions,
1010
+ output_hidden_states=output_hidden_states,
1011
+ return_dict=return_dict,
1012
+ )
1013
+ sequence_output = outputs[0]
1014
+
1015
+ logits = self.qa_outputs(sequence_output)
1016
+ start_logits, end_logits = logits.split(1, dim=-1)
1017
+ start_logits = start_logits.squeeze(-1).contiguous()
1018
+ end_logits = end_logits.squeeze(-1).contiguous()
1019
+
1020
+ total_loss = None
1021
+ if start_positions is not None and end_positions is not None:
1022
+ # If we are on multi-GPU, the split adds a dimension
1023
+ if len(start_positions.size()) > 1:
1024
+ start_positions = start_positions.squeeze(-1)
1025
+ if len(end_positions.size()) > 1:
1026
+ end_positions = end_positions.squeeze(-1)
1027
+ # sometimes the start/end positions are outside our model inputs, we ignore these terms
1028
+ ignored_index = start_logits.size(1)
1029
+ start_positions = start_positions.clamp(0, ignored_index)
1030
+ end_positions = end_positions.clamp(0, ignored_index)
1031
+
1032
+ loss_fct = CrossEntropyLoss(ignore_index=ignored_index)
1033
+ start_loss = loss_fct(start_logits, start_positions)
1034
+ end_loss = loss_fct(end_logits, end_positions)
1035
+ total_loss = (start_loss + end_loss) / 2
1036
+
1037
+ if not return_dict:
1038
+ output = (start_logits, end_logits) + outputs[2:]
1039
+ return ((total_loss,) + output) if total_loss is not None else output
1040
+
1041
+ return QuestionAnsweringModelOutput(
1042
+ loss=total_loss,
1043
+ start_logits=start_logits,
1044
+ end_logits=end_logits,
1045
+ hidden_states=outputs.hidden_states,
1046
+ attentions=outputs.attentions,
1047
+ )
1048
+
1049
+
1050
+ __all__ = [
1051
+ "EuroBertPreTrainedModel",
1052
+ "EuroBertModel",
1053
+ "EuroBertForMaskedLM",
1054
+ "EuroBertForSequenceClassification",
1055
+ "EuroBertForTokenClassification",
1056
+ "EuroBertForQuestionAnswering",
1057
+ ]
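The `mean` branch of `EuroBertForSequenceClassification.forward` in the diff above averages hidden states over non-padding positions only, by multiplying with the attention mask and dividing by the mask sum. The following is a dependency-free sketch of that arithmetic for a single sequence, using plain Python lists instead of tensors. The function name `masked_mean` is hypothetical and is not part of the modeling file.

```python
from typing import List


def masked_mean(hidden: List[List[float]], mask: List[int]) -> List[float]:
    """Average token vectors where mask == 1, ignoring padded positions.

    Single-sequence analogue of
    (last_hidden_state * attention_mask.unsqueeze(-1)).sum(dim=1)
    divided by attention_mask.sum(dim=1, keepdim=True).
    """
    if len(hidden) != len(mask):
        raise ValueError("hidden and mask must have the same length")
    kept = sum(mask)
    if kept == 0:
        # The tensor version would divide by zero here; fail loudly instead.
        raise ValueError("mask selects no tokens")
    width = len(hidden[0])
    totals = [0.0] * width
    for vector, keep in zip(hidden, mask):
        for i in range(width):
            totals[i] += vector[i] * keep  # padded rows contribute 0
    return [total / kept for total in totals]


# Three token vectors; the last one is padding and must not affect the mean.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
pooled = masked_mean(tokens, [1, 1, 0])
```

With the mask `[1, 1, 0]` the padded vector is excluded, so the result is the mean of the first two rows only. The `late` branch applies the same masked average, but to the per-token logits after the dense and classifier layers rather than to the hidden states.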
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<|begin_of_text|>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "<|end_of_text|>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "mask_token": {
17
+ "content": "<|mask|>",
18
+ "lstrip": true,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "pad_token": {
24
+ "content": "<|pad|>",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ }
30
+ }
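Each entry in `special_tokens_map.json` maps a token role to its literal string plus a few stripping and normalization flags. As a small illustration, the sketch below parses a compact copy of the four entries shown above with the standard `json` module. The helper names `special_token_strings` and `left_stripping_roles` are hypothetical and only serve this example.

```python
import json
from typing import Dict, List

# Compact copy of the special_tokens_map.json entries listed in the diff above.
SPECIAL_TOKENS_MAP = """
{
  "bos_token": {"content": "<|begin_of_text|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false},
  "eos_token": {"content": "<|end_of_text|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false},
  "mask_token": {"content": "<|mask|>", "lstrip": true, "normalized": false, "rstrip": false, "single_word": false},
  "pad_token": {"content": "<|pad|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false}
}
"""


def special_token_strings(raw_json: str) -> Dict[str, str]:
    """Return a mapping from token role (e.g. 'bos_token') to its literal string."""
    entries = json.loads(raw_json)
    return {role: spec["content"] for role, spec in entries.items()}


def left_stripping_roles(raw_json: str) -> List[str]:
    """Return the roles whose 'lstrip' flag is set, sorted by name."""
    entries = json.loads(raw_json)
    return sorted(role for role, spec in entries.items() if spec["lstrip"])


tokens = special_token_strings(SPECIAL_TOKENS_MAP)
lstrip_roles = left_stripping_roles(SPECIAL_TOKENS_MAP)
```

In this file only the mask token sets `lstrip`, which is why `lstrip_roles` contains a single role.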
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c0ba502691c6f7f1f01ad71c04bdcb7dee39f997e85ae01e831eab91b09c7e1b
3
+ size 17210334
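The three lines added for `tokenizer.json` are a Git LFS pointer, not the tokenizer itself: the repository stores only a spec version, a content hash, and the byte size, while the real file lives in LFS storage. A minimal sketch of reading such a pointer follows. The function name `parse_lfs_pointer` and the returned dictionary keys are assumptions made for illustration.

```python
from typing import Dict, Union

# Pointer text copied from the tokenizer.json diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:c0ba502691c6f7f1f01ad71c04bdcb7dee39f997e85ae01e831eab91b09c7e1b
size 17210334
"""


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Split a Git LFS pointer into version, hash algorithm, digest, and size."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each pointer line is "<key> <value>", separated by a single space.
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
```

The parsed `size` is a byte count, so the actual `tokenizer.json` is roughly 17 MB, which is consistent with the new `.gitattributes` rule routing it through LFS.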
tokenizer_config.json ADDED
@@ -0,0 +1,2071 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "128000": {
4
+ "content": "<|begin_of_text|>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "128001": {
12
+ "content": "<|end_of_text|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "128002": {
20
+ "content": "<|mask|>",
21
+ "lstrip": true,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "128003": {
28
+ "content": "<|parallel_sep|>",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "128004": {
36
+ "content": "<|pad|>",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ },
43
+ "128005": {
44
+ "content": "<|reserved_special_token_2|>",
45
+ "lstrip": false,
46
+ "normalized": false,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": true
50
+ },
51
+ "128006": {
52
+ "content": "<|start_header_id|>",
53
+ "lstrip": false,
54
+ "normalized": false,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": true
58
+ },
59
+ "128007": {
60
+ "content": "<|end_header_id|>",
61
+ "lstrip": false,
62
+ "normalized": false,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": true
66
+ },
67
+ "128008": {
68
+ "content": "<|eom_id|>",
69
+ "lstrip": false,
70
+ "normalized": false,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": true
74
+ },
75
+ "128009": {
76
+ "content": "<|eot_id|>",
77
+ "lstrip": false,
78
+ "normalized": false,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": true
82
+ },
83
+ "128010": {
84
+ "content": "<|python_tag|>",
85
+ "lstrip": false,
86
+ "normalized": false,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": true
90
+ },
91
+ "128011": {
92
+ "content": "<|reserved_special_token_3|>",
93
+ "lstrip": false,
94
+ "normalized": false,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": true
98
+ },
99
+ "128012": {
100
+ "content": "<|reserved_special_token_4|>",
101
+ "lstrip": false,
102
+ "normalized": false,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": true
106
+ },
107
+ "128013": {
108
+ "content": "<|reserved_special_token_5|>",
109
+ "lstrip": false,
110
+ "normalized": false,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": true
114
+ },
115
+ "128014": {
116
+ "content": "<|reserved_special_token_6|>",
117
+ "lstrip": false,
118
+ "normalized": false,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": true
122
+ },
123
+ "128015": {
124
+ "content": "<|reserved_special_token_7|>",
125
+ "lstrip": false,
126
+ "normalized": false,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": true
130
+ },
131
+ "128016": {
132
+ "content": "<|reserved_special_token_8|>",
133
+ "lstrip": false,
134
+ "normalized": false,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": true
138
+ },
139
+ "128017": {
140
+ "content": "<|reserved_special_token_9|>",
141
+ "lstrip": false,
142
+ "normalized": false,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": true
146
+ },
147
+ "128018": {
148
+ "content": "<|reserved_special_token_10|>",
149
+ "lstrip": false,
150
+ "normalized": false,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": true
154
+ },
155
+ "128019": {
156
+ "content": "<|reserved_special_token_11|>",
157
+ "lstrip": false,
158
+ "normalized": false,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": true
162
+ },
163
+ "128020": {
164
+ "content": "<|reserved_special_token_12|>",
165
+ "lstrip": false,
166
+ "normalized": false,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": true
170
+ },
171
+ "128021": {
172
+ "content": "<|reserved_special_token_13|>",
173
+ "lstrip": false,
174
+ "normalized": false,
175
+ "rstrip": false,
176
+ "single_word": false,
177
+ "special": true
178
+ },
179
+ "128022": {
180
+ "content": "<|reserved_special_token_14|>",
181
+ "lstrip": false,
182
+ "normalized": false,
183
+ "rstrip": false,
184
+ "single_word": false,
185
+ "special": true
186
+ },
187
+ "128023": {
188
+ "content": "<|reserved_special_token_15|>",
189
+ "lstrip": false,
190
+ "normalized": false,
191
+ "rstrip": false,
192
+ "single_word": false,
193
+ "special": true
194
+ },
195
+ "128024": {
196
+ "content": "<|reserved_special_token_16|>",
197
+ "lstrip": false,
198
+ "normalized": false,
199
+ "rstrip": false,
200
+ "single_word": false,
201
+ "special": true
202
+ },
203
+ "128025": {
204
+ "content": "<|reserved_special_token_17|>",
205
+ "lstrip": false,
206
+ "normalized": false,
207
+ "rstrip": false,
208
+ "single_word": false,
209
+ "special": true
210
+ },
211
+ "128026": {
212
+ "content": "<|reserved_special_token_18|>",
213
+ "lstrip": false,
214
+ "normalized": false,
215
+ "rstrip": false,
216
+ "single_word": false,
217
+ "special": true
218
+ },
219
+ "128027": {
220
+ "content": "<|reserved_special_token_19|>",
221
+ "lstrip": false,
222
+ "normalized": false,
223
+ "rstrip": false,
224
+ "single_word": false,
225
+ "special": true
226
+ },
227
+ "128028": {
228
+ "content": "<|reserved_special_token_20|>",
229
+ "lstrip": false,
230
+ "normalized": false,
231
+ "rstrip": false,
232
+ "single_word": false,
233
+ "special": true
234
+ },
235
+ "128029": {
236
+ "content": "<|reserved_special_token_21|>",
237
+ "lstrip": false,
238
+ "normalized": false,
239
+ "rstrip": false,
240
+ "single_word": false,
241
+ "special": true
242
+ },
243
+ "128030": {
244
+ "content": "<|reserved_special_token_22|>",
245
+ "lstrip": false,
246
+ "normalized": false,
247
+ "rstrip": false,
248
+ "single_word": false,
249
+ "special": true
250
+ },
251
+ "128031": {
252
+ "content": "<|reserved_special_token_23|>",
253
+ "lstrip": false,
254
+ "normalized": false,
255
+ "rstrip": false,
256
+ "single_word": false,
257
+ "special": true
258
+ },
259
+ "128032": {
260
+ "content": "<|reserved_special_token_24|>",
261
+ "lstrip": false,
262
+ "normalized": false,
263
+ "rstrip": false,
264
+ "single_word": false,
265
+ "special": true
266
+ },
267
+ "128033": {
268
+ "content": "<|reserved_special_token_25|>",
269
+ "lstrip": false,
270
+ "normalized": false,
271
+ "rstrip": false,
272
+ "single_word": false,
273
+ "special": true
274
+ },
275
+ "128034": {
276
+ "content": "<|reserved_special_token_26|>",
277
+ "lstrip": false,
278
+ "normalized": false,
279
+ "rstrip": false,
280
+ "single_word": false,
281
+ "special": true
282
+ },
283
+ "128035": {
284
+ "content": "<|reserved_special_token_27|>",
285
+ "lstrip": false,
286
+ "normalized": false,
287
+ "rstrip": false,
288
+ "single_word": false,
289
+ "special": true
290
+ },
291
+ "128036": {
292
+ "content": "<|reserved_special_token_28|>",
293
+ "lstrip": false,
294
+ "normalized": false,
295
+ "rstrip": false,
296
+ "single_word": false,
297
+ "special": true
298
+ },
299
+ "128037": {
300
+ "content": "<|reserved_special_token_29|>",
301
+ "lstrip": false,
302
+ "normalized": false,
303
+ "rstrip": false,
304
+ "single_word": false,
305
+ "special": true
306
+ },
307
+ "128038": {
308
+ "content": "<|reserved_special_token_30|>",
309
+ "lstrip": false,
310
+ "normalized": false,
311
+ "rstrip": false,
312
+ "single_word": false,
313
+ "special": true
314
+ },
315
+ "128039": {
316
+ "content": "<|reserved_special_token_31|>",
317
+ "lstrip": false,
318
+ "normalized": false,
319
+ "rstrip": false,
320
+ "single_word": false,
321
+ "special": true
322
+ },
323
+ "128040": {
324
+ "content": "<|reserved_special_token_32|>",
325
+ "lstrip": false,
326
+ "normalized": false,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": true
330
+ },
331
+ "128041": {
332
+ "content": "<|reserved_special_token_33|>",
333
+ "lstrip": false,
334
+ "normalized": false,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": true
338
+ },
339
+ "128042": {
340
+ "content": "<|reserved_special_token_34|>",
341
+ "lstrip": false,
342
+ "normalized": false,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": true
346
+ },
347
+ "128043": {
348
+ "content": "<|reserved_special_token_35|>",
349
+ "lstrip": false,
350
+ "normalized": false,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": true
354
+ },
355
+ "128044": {
356
+ "content": "<|reserved_special_token_36|>",
357
+ "lstrip": false,
358
+ "normalized": false,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": true
362
+ },
363
+ "128045": {
364
+ "content": "<|reserved_special_token_37|>",
365
+ "lstrip": false,
366
+ "normalized": false,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": true
370
+ },
371
+ "128046": {
372
+ "content": "<|reserved_special_token_38|>",
373
+ "lstrip": false,
374
+ "normalized": false,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": true
378
+ },
379
+ "128047": {
380
+ "content": "<|reserved_special_token_39|>",
381
+ "lstrip": false,
382
+ "normalized": false,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": true
386
+ },
387
+ "128048": {
388
+ "content": "<|reserved_special_token_40|>",
389
+ "lstrip": false,
390
+ "normalized": false,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": true
394
+ },
395
+ "128049": {
396
+ "content": "<|reserved_special_token_41|>",
397
+ "lstrip": false,
398
+ "normalized": false,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": true
402
+ },
403
+ "128050": {
404
+ "content": "<|reserved_special_token_42|>",
405
+ "lstrip": false,
406
+ "normalized": false,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": true
410
+ },
411
+ "128051": {
412
+ "content": "<|reserved_special_token_43|>",
413
+ "lstrip": false,
414
+ "normalized": false,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": true
418
+ },
419
+ "128052": {
420
+ "content": "<|reserved_special_token_44|>",
421
+ "lstrip": false,
422
+ "normalized": false,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": true
426
+ },
427
+ "128053": {
428
+ "content": "<|reserved_special_token_45|>",
429
+ "lstrip": false,
430
+ "normalized": false,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": true
434
+ },
435
+ "128054": {
436
+ "content": "<|reserved_special_token_46|>",
437
+ "lstrip": false,
438
+ "normalized": false,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": true
442
+ },
443
+ "128055": {
444
+ "content": "<|reserved_special_token_47|>",
445
+ "lstrip": false,
446
+ "normalized": false,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": true
450
+ },
451
+ "128056": {
452
+ "content": "<|reserved_special_token_48|>",
453
+ "lstrip": false,
454
+ "normalized": false,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": true
458
+ },
459
+ "128057": {
460
+ "content": "<|reserved_special_token_49|>",
461
+ "lstrip": false,
462
+ "normalized": false,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": true
466
+ },
467
+ "128058": {
468
+ "content": "<|reserved_special_token_50|>",
469
+ "lstrip": false,
470
+ "normalized": false,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": true
474
+ },
475
+ "128059": {
476
+ "content": "<|reserved_special_token_51|>",
477
+ "lstrip": false,
478
+ "normalized": false,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": true
482
+ },
483
+ "128060": {
484
+ "content": "<|reserved_special_token_52|>",
485
+ "lstrip": false,
486
+ "normalized": false,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": true
490
+ },
491
+ "128061": {
492
+ "content": "<|reserved_special_token_53|>",
493
+ "lstrip": false,
494
+ "normalized": false,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": true
498
+ },
499
+ "128062": {
500
+ "content": "<|reserved_special_token_54|>",
501
+ "lstrip": false,
502
+ "normalized": false,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": true
506
+ },
507
+ "128063": {
508
+ "content": "<|reserved_special_token_55|>",
509
+ "lstrip": false,
510
+ "normalized": false,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": true
514
+ },
515
+ "128064": {
516
+ "content": "<|reserved_special_token_56|>",
517
+ "lstrip": false,
518
+ "normalized": false,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": true
522
+ },
523
+ "128065": {
524
+ "content": "<|reserved_special_token_57|>",
525
+ "lstrip": false,
526
+ "normalized": false,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": true
530
+ },
531
+ "128066": {
532
+ "content": "<|reserved_special_token_58|>",
533
+ "lstrip": false,
534
+ "normalized": false,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": true
538
+ },
539
+ "128067": {
540
+ "content": "<|reserved_special_token_59|>",
541
+ "lstrip": false,
542
+ "normalized": false,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": true
546
+ },
547
+ "128068": {
548
+ "content": "<|reserved_special_token_60|>",
549
+ "lstrip": false,
550
+ "normalized": false,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": true
554
+ },
555
+ "128069": {
556
+ "content": "<|reserved_special_token_61|>",
557
+ "lstrip": false,
558
+ "normalized": false,
559
+ "rstrip": false,
560
+ "single_word": false,
561
+ "special": true
562
+ },
563
+ "128070": {
564
+ "content": "<|reserved_special_token_62|>",
565
+ "lstrip": false,
566
+ "normalized": false,
567
+ "rstrip": false,
568
+ "single_word": false,
569
+ "special": true
570
+ },
571
+ "128071": {
572
+ "content": "<|reserved_special_token_63|>",
573
+ "lstrip": false,
574
+ "normalized": false,
575
+ "rstrip": false,
576
+ "single_word": false,
577
+ "special": true
578
+ },
579
+ "128072": {
580
+ "content": "<|reserved_special_token_64|>",
581
+ "lstrip": false,
582
+ "normalized": false,
583
+ "rstrip": false,
584
+ "single_word": false,
585
+ "special": true
586
+ },
587
+ "128073": {
588
+ "content": "<|reserved_special_token_65|>",
589
+ "lstrip": false,
590
+ "normalized": false,
591
+ "rstrip": false,
592
+ "single_word": false,
593
+ "special": true
594
+ },
595
+ "128074": {
596
+ "content": "<|reserved_special_token_66|>",
597
+ "lstrip": false,
598
+ "normalized": false,
599
+ "rstrip": false,
600
+ "single_word": false,
601
+ "special": true
602
+ },
603
+ "128075": {
604
+ "content": "<|reserved_special_token_67|>",
605
+ "lstrip": false,
606
+ "normalized": false,
607
+ "rstrip": false,
608
+ "single_word": false,
609
+ "special": true
610
+ },
611
+ "128076": {
612
+ "content": "<|reserved_special_token_68|>",
613
+ "lstrip": false,
614
+ "normalized": false,
615
+ "rstrip": false,
616
+ "single_word": false,
617
+ "special": true
618
+ },
619
+ "128077": {
620
+ "content": "<|reserved_special_token_69|>",
621
+ "lstrip": false,
622
+ "normalized": false,
623
+ "rstrip": false,
624
+ "single_word": false,
625
+ "special": true
626
+ },
627
+ "128078": {
628
+ "content": "<|reserved_special_token_70|>",
629
+ "lstrip": false,
630
+ "normalized": false,
631
+ "rstrip": false,
632
+ "single_word": false,
633
+ "special": true
634
+ },
635
+ "128079": {
636
+ "content": "<|reserved_special_token_71|>",
637
+ "lstrip": false,
638
+ "normalized": false,
639
+ "rstrip": false,
640
+ "single_word": false,
641
+ "special": true
642
+ },
643
+ "128080": {
644
+ "content": "<|reserved_special_token_72|>",
645
+ "lstrip": false,
646
+ "normalized": false,
647
+ "rstrip": false,
648
+ "single_word": false,
649
+ "special": true
650
+ },
651
+ "128081": {
652
+ "content": "<|reserved_special_token_73|>",
653
+ "lstrip": false,
654
+ "normalized": false,
655
+ "rstrip": false,
656
+ "single_word": false,
657
+ "special": true
658
+ },
659
+ "128082": {
660
+ "content": "<|reserved_special_token_74|>",
661
+ "lstrip": false,
662
+ "normalized": false,
663
+ "rstrip": false,
664
+ "single_word": false,
665
+ "special": true
666
+ },
667
+ "128083": {
668
+ "content": "<|reserved_special_token_75|>",
669
+ "lstrip": false,
670
+ "normalized": false,
671
+ "rstrip": false,
672
+ "single_word": false,
673
+ "special": true
674
+ },
675
+ "128084": {
676
+ "content": "<|reserved_special_token_76|>",
677
+ "lstrip": false,
678
+ "normalized": false,
679
+ "rstrip": false,
680
+ "single_word": false,
681
+ "special": true
682
+ },
683
+ "128085": {
684
+ "content": "<|reserved_special_token_77|>",
685
+ "lstrip": false,
686
+ "normalized": false,
687
+ "rstrip": false,
688
+ "single_word": false,
689
+ "special": true
690
+ },
691
+ "128086": {
692
+ "content": "<|reserved_special_token_78|>",
693
+ "lstrip": false,
694
+ "normalized": false,
695
+ "rstrip": false,
696
+ "single_word": false,
697
+ "special": true
698
+ },
699
+ "128087": {
700
+ "content": "<|reserved_special_token_79|>",
701
+ "lstrip": false,
702
+ "normalized": false,
703
+ "rstrip": false,
704
+ "single_word": false,
705
+ "special": true
706
+ },
707
+ "128088": {
708
+ "content": "<|reserved_special_token_80|>",
709
+ "lstrip": false,
710
+ "normalized": false,
711
+ "rstrip": false,
712
+ "single_word": false,
713
+ "special": true
714
+ },
715
+ "128089": {
716
+ "content": "<|reserved_special_token_81|>",
717
+ "lstrip": false,
718
+ "normalized": false,
719
+ "rstrip": false,
720
+ "single_word": false,
721
+ "special": true
722
+ },
723
+ "128090": {
724
+ "content": "<|reserved_special_token_82|>",
725
+ "lstrip": false,
726
+ "normalized": false,
727
+ "rstrip": false,
728
+ "single_word": false,
729
+ "special": true
730
+ },
731
+ "128091": {
732
+ "content": "<|reserved_special_token_83|>",
733
+ "lstrip": false,
734
+ "normalized": false,
735
+ "rstrip": false,
736
+ "single_word": false,
737
+ "special": true
738
+ },
739
+ "128092": {
740
+ "content": "<|reserved_special_token_84|>",
741
+ "lstrip": false,
742
+ "normalized": false,
743
+ "rstrip": false,
744
+ "single_word": false,
745
+ "special": true
746
+ },
747
+ "128093": {
748
+ "content": "<|reserved_special_token_85|>",
749
+ "lstrip": false,
750
+ "normalized": false,
751
+ "rstrip": false,
752
+ "single_word": false,
753
+ "special": true
754
+ },
755
+ "128094": {
756
+ "content": "<|reserved_special_token_86|>",
757
+ "lstrip": false,
758
+ "normalized": false,
759
+ "rstrip": false,
760
+ "single_word": false,
761
+ "special": true
762
+ },
763
+ "128095": {
764
+ "content": "<|reserved_special_token_87|>",
765
+ "lstrip": false,
766
+ "normalized": false,
767
+ "rstrip": false,
768
+ "single_word": false,
769
+ "special": true
770
+ },
771
+ "128096": {
772
+ "content": "<|reserved_special_token_88|>",
773
+ "lstrip": false,
774
+ "normalized": false,
775
+ "rstrip": false,
776
+ "single_word": false,
777
+ "special": true
778
+ },
779
+ "128097": {
780
+ "content": "<|reserved_special_token_89|>",
781
+ "lstrip": false,
782
+ "normalized": false,
783
+ "rstrip": false,
784
+ "single_word": false,
785
+ "special": true
786
+ },
787
+ "128098": {
788
+ "content": "<|reserved_special_token_90|>",
789
+ "lstrip": false,
790
+ "normalized": false,
791
+ "rstrip": false,
792
+ "single_word": false,
793
+ "special": true
794
+ },
795
+ "128099": {
796
+ "content": "<|reserved_special_token_91|>",
797
+ "lstrip": false,
798
+ "normalized": false,
799
+ "rstrip": false,
800
+ "single_word": false,
801
+ "special": true
802
+ },
803
+ "128100": {
804
+ "content": "<|reserved_special_token_92|>",
805
+ "lstrip": false,
806
+ "normalized": false,
807
+ "rstrip": false,
808
+ "single_word": false,
809
+ "special": true
810
+ },
811
+ "128101": {
812
+ "content": "<|reserved_special_token_93|>",
813
+ "lstrip": false,
814
+ "normalized": false,
815
+ "rstrip": false,
816
+ "single_word": false,
817
+ "special": true
818
+ },
819
+ "128102": {
820
+ "content": "<|reserved_special_token_94|>",
821
+ "lstrip": false,
822
+ "normalized": false,
823
+ "rstrip": false,
824
+ "single_word": false,
825
+ "special": true
826
+ },
827
+ "128103": {
828
+ "content": "<|reserved_special_token_95|>",
829
+ "lstrip": false,
830
+ "normalized": false,
831
+ "rstrip": false,
832
+ "single_word": false,
833
+ "special": true
834
+ },
835
+ "128104": {
836
+ "content": "<|reserved_special_token_96|>",
837
+ "lstrip": false,
838
+ "normalized": false,
839
+ "rstrip": false,
840
+ "single_word": false,
841
+ "special": true
842
+ },
843
+ "128105": {
844
+ "content": "<|reserved_special_token_97|>",
845
+ "lstrip": false,
846
+ "normalized": false,
847
+ "rstrip": false,
848
+ "single_word": false,
849
+ "special": true
850
+ },
851
+ "128106": {
852
+ "content": "<|reserved_special_token_98|>",
853
+ "lstrip": false,
854
+ "normalized": false,
855
+ "rstrip": false,
856
+ "single_word": false,
857
+ "special": true
858
+ },
859
+ "128107": {
860
+ "content": "<|reserved_special_token_99|>",
861
+ "lstrip": false,
862
+ "normalized": false,
863
+ "rstrip": false,
864
+ "single_word": false,
865
+ "special": true
866
+ },
867
+ "128108": {
868
+ "content": "<|reserved_special_token_100|>",
869
+ "lstrip": false,
870
+ "normalized": false,
871
+ "rstrip": false,
872
+ "single_word": false,
873
+ "special": true
874
+ },
875
+ "128109": {
876
+ "content": "<|reserved_special_token_101|>",
877
+ "lstrip": false,
878
+ "normalized": false,
879
+ "rstrip": false,
880
+ "single_word": false,
881
+ "special": true
882
+ },
883
+ "128110": {
884
+ "content": "<|reserved_special_token_102|>",
885
+ "lstrip": false,
886
+ "normalized": false,
887
+ "rstrip": false,
888
+ "single_word": false,
889
+ "special": true
890
+ },
891
+ "128111": {
892
+ "content": "<|reserved_special_token_103|>",
893
+ "lstrip": false,
894
+ "normalized": false,
895
+ "rstrip": false,
896
+ "single_word": false,
897
+ "special": true
898
+ },
899
+ "128112": {
900
+ "content": "<|reserved_special_token_104|>",
901
+ "lstrip": false,
902
+ "normalized": false,
903
+ "rstrip": false,
904
+ "single_word": false,
905
+ "special": true
906
+ },
907
+ "128113": {
908
+ "content": "<|reserved_special_token_105|>",
909
+ "lstrip": false,
910
+ "normalized": false,
911
+ "rstrip": false,
912
+ "single_word": false,
913
+ "special": true
914
+ },
915
+ "128114": {
916
+ "content": "<|reserved_special_token_106|>",
917
+ "lstrip": false,
918
+ "normalized": false,
919
+ "rstrip": false,
920
+ "single_word": false,
921
+ "special": true
922
+ },
923
+ "128115": {
924
+ "content": "<|reserved_special_token_107|>",
925
+ "lstrip": false,
926
+ "normalized": false,
927
+ "rstrip": false,
928
+ "single_word": false,
929
+ "special": true
930
+ },
931
+ "128116": {
932
+ "content": "<|reserved_special_token_108|>",
933
+ "lstrip": false,
934
+ "normalized": false,
935
+ "rstrip": false,
936
+ "single_word": false,
937
+ "special": true
938
+ },
939
+ "128117": {
940
+ "content": "<|reserved_special_token_109|>",
941
+ "lstrip": false,
942
+ "normalized": false,
943
+ "rstrip": false,
944
+ "single_word": false,
945
+ "special": true
946
+ },
947
+ "128118": {
948
+ "content": "<|reserved_special_token_110|>",
949
+ "lstrip": false,
950
+ "normalized": false,
951
+ "rstrip": false,
952
+ "single_word": false,
953
+ "special": true
954
+ },
955
+ "128119": {
956
+ "content": "<|reserved_special_token_111|>",
957
+ "lstrip": false,
958
+ "normalized": false,
959
+ "rstrip": false,
960
+ "single_word": false,
961
+ "special": true
962
+ },
963
+ "128120": {
964
+ "content": "<|reserved_special_token_112|>",
965
+ "lstrip": false,
966
+ "normalized": false,
967
+ "rstrip": false,
968
+ "single_word": false,
969
+ "special": true
970
+ },
971
+ "128121": {
972
+ "content": "<|reserved_special_token_113|>",
973
+ "lstrip": false,
974
+ "normalized": false,
975
+ "rstrip": false,
976
+ "single_word": false,
977
+ "special": true
978
+ },
979
+ "128122": {
980
+ "content": "<|reserved_special_token_114|>",
981
+ "lstrip": false,
982
+ "normalized": false,
983
+ "rstrip": false,
984
+ "single_word": false,
985
+ "special": true
986
+ },
987
+ "128123": {
988
+ "content": "<|reserved_special_token_115|>",
989
+ "lstrip": false,
990
+ "normalized": false,
991
+ "rstrip": false,
992
+ "single_word": false,
993
+ "special": true
994
+ },
995
+ "128124": {
996
+ "content": "<|reserved_special_token_116|>",
997
+ "lstrip": false,
998
+ "normalized": false,
999
+ "rstrip": false,
1000
+ "single_word": false,
1001
+ "special": true
1002
+ },
1003
+ "128125": {
1004
+ "content": "<|reserved_special_token_117|>",
1005
+ "lstrip": false,
1006
+ "normalized": false,
1007
+ "rstrip": false,
1008
+ "single_word": false,
1009
+ "special": true
1010
+ },
1011
+ "128126": {
1012
+ "content": "<|reserved_special_token_118|>",
1013
+ "lstrip": false,
1014
+ "normalized": false,
1015
+ "rstrip": false,
1016
+ "single_word": false,
1017
+ "special": true
1018
+ },
1019
+ "128127": {
1020
+ "content": "<|reserved_special_token_119|>",
1021
+ "lstrip": false,
1022
+ "normalized": false,
1023
+ "rstrip": false,
1024
+ "single_word": false,
1025
+ "special": true
1026
+ },
1027
+ "128128": {
1028
+ "content": "<|reserved_special_token_120|>",
1029
+ "lstrip": false,
1030
+ "normalized": false,
1031
+ "rstrip": false,
1032
+ "single_word": false,
1033
+ "special": true
1034
+ },
1035
+ "128129": {
1036
+ "content": "<|reserved_special_token_121|>",
1037
+ "lstrip": false,
1038
+ "normalized": false,
1039
+ "rstrip": false,
1040
+ "single_word": false,
1041
+ "special": true
1042
+ },
1043
+ "128130": {
1044
+ "content": "<|reserved_special_token_122|>",
1045
+ "lstrip": false,
1046
+ "normalized": false,
1047
+ "rstrip": false,
1048
+ "single_word": false,
1049
+ "special": true
1050
+ },
1051
+ "128131": {
1052
+ "content": "<|reserved_special_token_123|>",
1053
+ "lstrip": false,
1054
+ "normalized": false,
1055
+ "rstrip": false,
1056
+ "single_word": false,
1057
+ "special": true
1058
+ },
1059
+ "128132": {
1060
+ "content": "<|reserved_special_token_124|>",
1061
+ "lstrip": false,
1062
+ "normalized": false,
1063
+ "rstrip": false,
1064
+ "single_word": false,
1065
+ "special": true
1066
+ },
1067
+ "128133": {
1068
+ "content": "<|reserved_special_token_125|>",
1069
+ "lstrip": false,
1070
+ "normalized": false,
1071
+ "rstrip": false,
1072
+ "single_word": false,
1073
+ "special": true
1074
+ },
1075
+ "128134": {
1076
+ "content": "<|reserved_special_token_126|>",
1077
+ "lstrip": false,
1078
+ "normalized": false,
1079
+ "rstrip": false,
1080
+ "single_word": false,
1081
+ "special": true
1082
+ },
1083
+ "128135": {
1084
+ "content": "<|reserved_special_token_127|>",
1085
+ "lstrip": false,
1086
+ "normalized": false,
1087
+ "rstrip": false,
1088
+ "single_word": false,
1089
+ "special": true
1090
+ },
1091
+ "128136": {
1092
+ "content": "<|reserved_special_token_128|>",
1093
+ "lstrip": false,
1094
+ "normalized": false,
1095
+ "rstrip": false,
1096
+ "single_word": false,
1097
+ "special": true
1098
+ },
1099
+ "128137": {
1100
+ "content": "<|reserved_special_token_129|>",
1101
+ "lstrip": false,
1102
+ "normalized": false,
1103
+ "rstrip": false,
1104
+ "single_word": false,
1105
+ "special": true
1106
+ },
1107
+ "128138": {
1108
+ "content": "<|reserved_special_token_130|>",
1109
+ "lstrip": false,
1110
+ "normalized": false,
1111
+ "rstrip": false,
1112
+ "single_word": false,
1113
+ "special": true
1114
+ },
1115
+ "128139": {
1116
+ "content": "<|reserved_special_token_131|>",
1117
+ "lstrip": false,
1118
+ "normalized": false,
1119
+ "rstrip": false,
1120
+ "single_word": false,
1121
+ "special": true
1122
+ },
1123
+ "128140": {
1124
+ "content": "<|reserved_special_token_132|>",
1125
+ "lstrip": false,
1126
+ "normalized": false,
1127
+ "rstrip": false,
1128
+ "single_word": false,
1129
+ "special": true
1130
+ },
1131
+ "128141": {
1132
+ "content": "<|reserved_special_token_133|>",
1133
+ "lstrip": false,
1134
+ "normalized": false,
1135
+ "rstrip": false,
1136
+ "single_word": false,
1137
+ "special": true
1138
+ },
1139
+ "128142": {
1140
+ "content": "<|reserved_special_token_134|>",
1141
+ "lstrip": false,
1142
+ "normalized": false,
1143
+ "rstrip": false,
1144
+ "single_word": false,
1145
+ "special": true
1146
+ },
1147
+ "128143": {
1148
+ "content": "<|reserved_special_token_135|>",
1149
+ "lstrip": false,
1150
+ "normalized": false,
1151
+ "rstrip": false,
1152
+ "single_word": false,
1153
+ "special": true
1154
+ },
1155
+ "128144": {
1156
+ "content": "<|reserved_special_token_136|>",
1157
+ "lstrip": false,
1158
+ "normalized": false,
1159
+ "rstrip": false,
1160
+ "single_word": false,
1161
+ "special": true
1162
+ },
1163
+ "128145": {
1164
+ "content": "<|reserved_special_token_137|>",
1165
+ "lstrip": false,
1166
+ "normalized": false,
1167
+ "rstrip": false,
1168
+ "single_word": false,
1169
+ "special": true
1170
+ },
1171
+ "128146": {
1172
+ "content": "<|reserved_special_token_138|>",
1173
+ "lstrip": false,
1174
+ "normalized": false,
1175
+ "rstrip": false,
1176
+ "single_word": false,
1177
+ "special": true
1178
+ },
1179
+ "128147": {
1180
+ "content": "<|reserved_special_token_139|>",
1181
+ "lstrip": false,
1182
+ "normalized": false,
1183
+ "rstrip": false,
1184
+ "single_word": false,
1185
+ "special": true
1186
+ },
1187
+ "128148": {
1188
+ "content": "<|reserved_special_token_140|>",
1189
+ "lstrip": false,
1190
+ "normalized": false,
1191
+ "rstrip": false,
1192
+ "single_word": false,
1193
+ "special": true
1194
+ },
1195
+ "128149": {
1196
+ "content": "<|reserved_special_token_141|>",
1197
+ "lstrip": false,
1198
+ "normalized": false,
1199
+ "rstrip": false,
1200
+ "single_word": false,
1201
+ "special": true
1202
+ },
1203
+ "128150": {
1204
+ "content": "<|reserved_special_token_142|>",
1205
+ "lstrip": false,
1206
+ "normalized": false,
1207
+ "rstrip": false,
1208
+ "single_word": false,
1209
+ "special": true
1210
+ },
1211
+ "128151": {
1212
+ "content": "<|reserved_special_token_143|>",
1213
+ "lstrip": false,
1214
+ "normalized": false,
1215
+ "rstrip": false,
1216
+ "single_word": false,
1217
+ "special": true
1218
+ },
1219
+ "128152": {
1220
+ "content": "<|reserved_special_token_144|>",
1221
+ "lstrip": false,
1222
+ "normalized": false,
1223
+ "rstrip": false,
1224
+ "single_word": false,
1225
+ "special": true
1226
+ },
1227
+ "128153": {
1228
+ "content": "<|reserved_special_token_145|>",
1229
+ "lstrip": false,
1230
+ "normalized": false,
1231
+ "rstrip": false,
1232
+ "single_word": false,
1233
+ "special": true
1234
+ },
1235
+ "128154": {
1236
+ "content": "<|reserved_special_token_146|>",
1237
+ "lstrip": false,
1238
+ "normalized": false,
1239
+ "rstrip": false,
1240
+ "single_word": false,
1241
+ "special": true
1242
+ },
1243
+ "128155": {
1244
+ "content": "<|reserved_special_token_147|>",
1245
+ "lstrip": false,
1246
+ "normalized": false,
1247
+ "rstrip": false,
1248
+ "single_word": false,
1249
+ "special": true
1250
+ },
1251
+ "128156": {
1252
+ "content": "<|reserved_special_token_148|>",
1253
+ "lstrip": false,
1254
+ "normalized": false,
1255
+ "rstrip": false,
1256
+ "single_word": false,
1257
+ "special": true
1258
+ },
1259
+ "128157": {
1260
+ "content": "<|reserved_special_token_149|>",
1261
+ "lstrip": false,
1262
+ "normalized": false,
1263
+ "rstrip": false,
1264
+ "single_word": false,
1265
+ "special": true
1266
+ },
1267
+ "128158": {
1268
+ "content": "<|reserved_special_token_150|>",
1269
+ "lstrip": false,
1270
+ "normalized": false,
1271
+ "rstrip": false,
1272
+ "single_word": false,
1273
+ "special": true
1274
+ },
1275
+ "128159": {
1276
+ "content": "<|reserved_special_token_151|>",
1277
+ "lstrip": false,
1278
+ "normalized": false,
1279
+ "rstrip": false,
1280
+ "single_word": false,
1281
+ "special": true
1282
+ },
1283
+ "128160": {
1284
+ "content": "<|reserved_special_token_152|>",
1285
+ "lstrip": false,
1286
+ "normalized": false,
1287
+ "rstrip": false,
1288
+ "single_word": false,
1289
+ "special": true
1290
+ },
1291
+ "128161": {
1292
+ "content": "<|reserved_special_token_153|>",
1293
+ "lstrip": false,
1294
+ "normalized": false,
1295
+ "rstrip": false,
1296
+ "single_word": false,
1297
+ "special": true
1298
+ },
1299
+ "128162": {
1300
+ "content": "<|reserved_special_token_154|>",
1301
+ "lstrip": false,
1302
+ "normalized": false,
1303
+ "rstrip": false,
1304
+ "single_word": false,
1305
+ "special": true
1306
+ },
1307
+ "128163": {
1308
+ "content": "<|reserved_special_token_155|>",
1309
+ "lstrip": false,
1310
+ "normalized": false,
1311
+ "rstrip": false,
1312
+ "single_word": false,
1313
+ "special": true
1314
+ },
1315
+ "128164": {
1316
+ "content": "<|reserved_special_token_156|>",
1317
+ "lstrip": false,
1318
+ "normalized": false,
1319
+ "rstrip": false,
1320
+ "single_word": false,
1321
+ "special": true
1322
+ },
1323
+ "128165": {
1324
+ "content": "<|reserved_special_token_157|>",
1325
+ "lstrip": false,
1326
+ "normalized": false,
1327
+ "rstrip": false,
1328
+ "single_word": false,
1329
+ "special": true
1330
+ },
1331
+ "128166": {
1332
+ "content": "<|reserved_special_token_158|>",
1333
+ "lstrip": false,
1334
+ "normalized": false,
1335
+ "rstrip": false,
1336
+ "single_word": false,
1337
+ "special": true
1338
+ },
1339
+ "128167": {
1340
+ "content": "<|reserved_special_token_159|>",
1341
+ "lstrip": false,
1342
+ "normalized": false,
1343
+ "rstrip": false,
1344
+ "single_word": false,
1345
+ "special": true
1346
+ },
1347
+ "128168": {
1348
+ "content": "<|reserved_special_token_160|>",
1349
+ "lstrip": false,
1350
+ "normalized": false,
1351
+ "rstrip": false,
1352
+ "single_word": false,
1353
+ "special": true
1354
+ },
1355
+ "128169": {
1356
+ "content": "<|reserved_special_token_161|>",
1357
+ "lstrip": false,
1358
+ "normalized": false,
1359
+ "rstrip": false,
1360
+ "single_word": false,
1361
+ "special": true
1362
+ },
1363
+ "128170": {
1364
+ "content": "<|reserved_special_token_162|>",
1365
+ "lstrip": false,
1366
+ "normalized": false,
1367
+ "rstrip": false,
1368
+ "single_word": false,
1369
+ "special": true
1370
+ },
1371
+ "128171": {
1372
+ "content": "<|reserved_special_token_163|>",
1373
+ "lstrip": false,
1374
+ "normalized": false,
1375
+ "rstrip": false,
1376
+ "single_word": false,
1377
+ "special": true
1378
+ },
1379
+ "128172": {
1380
+ "content": "<|reserved_special_token_164|>",
1381
+ "lstrip": false,
1382
+ "normalized": false,
1383
+ "rstrip": false,
1384
+ "single_word": false,
1385
+ "special": true
1386
+ },
1387
+ "128173": {
1388
+ "content": "<|reserved_special_token_165|>",
1389
+ "lstrip": false,
1390
+ "normalized": false,
1391
+ "rstrip": false,
1392
+ "single_word": false,
1393
+ "special": true
1394
+ },
1395
+ "128174": {
1396
+ "content": "<|reserved_special_token_166|>",
1397
+ "lstrip": false,
1398
+ "normalized": false,
1399
+ "rstrip": false,
1400
+ "single_word": false,
1401
+ "special": true
1402
+ },
1403
+ "128175": {
1404
+ "content": "<|reserved_special_token_167|>",
1405
+ "lstrip": false,
1406
+ "normalized": false,
1407
+ "rstrip": false,
1408
+ "single_word": false,
1409
+ "special": true
1410
+ },
1411
+ "128176": {
1412
+ "content": "<|reserved_special_token_168|>",
1413
+ "lstrip": false,
1414
+ "normalized": false,
1415
+ "rstrip": false,
1416
+ "single_word": false,
1417
+ "special": true
1418
+ },
1419
+ "128177": {
1420
+ "content": "<|reserved_special_token_169|>",
1421
+ "lstrip": false,
1422
+ "normalized": false,
1423
+ "rstrip": false,
1424
+ "single_word": false,
1425
+ "special": true
1426
+ },
1427
+ "128178": {
1428
+ "content": "<|reserved_special_token_170|>",
1429
+ "lstrip": false,
1430
+ "normalized": false,
1431
+ "rstrip": false,
1432
+ "single_word": false,
1433
+ "special": true
1434
+ },
1435
+ "128179": {
1436
+ "content": "<|reserved_special_token_171|>",
1437
+ "lstrip": false,
1438
+ "normalized": false,
1439
+ "rstrip": false,
1440
+ "single_word": false,
1441
+ "special": true
1442
+ },
1443
+ "128180": {
1444
+ "content": "<|reserved_special_token_172|>",
1445
+ "lstrip": false,
1446
+ "normalized": false,
1447
+ "rstrip": false,
1448
+ "single_word": false,
1449
+ "special": true
1450
+ },
1451
+ "128181": {
1452
+ "content": "<|reserved_special_token_173|>",
1453
+ "lstrip": false,
1454
+ "normalized": false,
1455
+ "rstrip": false,
1456
+ "single_word": false,
1457
+ "special": true
1458
+ },
1459
+ "128182": {
1460
+ "content": "<|reserved_special_token_174|>",
1461
+ "lstrip": false,
1462
+ "normalized": false,
1463
+ "rstrip": false,
1464
+ "single_word": false,
1465
+ "special": true
1466
+ },
1467
+ "128183": {
1468
+ "content": "<|reserved_special_token_175|>",
1469
+ "lstrip": false,
1470
+ "normalized": false,
1471
+ "rstrip": false,
1472
+ "single_word": false,
1473
+ "special": true
1474
+ },
1475
+ "128184": {
1476
+ "content": "<|reserved_special_token_176|>",
1477
+ "lstrip": false,
1478
+ "normalized": false,
1479
+ "rstrip": false,
1480
+ "single_word": false,
1481
+ "special": true
1482
+ },
1483
+ "128185": {
1484
+ "content": "<|reserved_special_token_177|>",
1485
+ "lstrip": false,
1486
+ "normalized": false,
1487
+ "rstrip": false,
1488
+ "single_word": false,
1489
+ "special": true
1490
+ },
1491
+ "128186": {
1492
+ "content": "<|reserved_special_token_178|>",
1493
+ "lstrip": false,
1494
+ "normalized": false,
1495
+ "rstrip": false,
1496
+ "single_word": false,
1497
+ "special": true
1498
+ },
1499
+ "128187": {
1500
+ "content": "<|reserved_special_token_179|>",
1501
+ "lstrip": false,
1502
+ "normalized": false,
1503
+ "rstrip": false,
1504
+ "single_word": false,
1505
+ "special": true
1506
+ },
1507
+ "128188": {
1508
+ "content": "<|reserved_special_token_180|>",
1509
+ "lstrip": false,
1510
+ "normalized": false,
1511
+ "rstrip": false,
1512
+ "single_word": false,
1513
+ "special": true
1514
+ },
1515
+ "128189": {
1516
+ "content": "<|reserved_special_token_181|>",
1517
+ "lstrip": false,
1518
+ "normalized": false,
1519
+ "rstrip": false,
1520
+ "single_word": false,
1521
+ "special": true
1522
+ },
1523
+ "128190": {
1524
+ "content": "<|reserved_special_token_182|>",
1525
+ "lstrip": false,
1526
+ "normalized": false,
1527
+ "rstrip": false,
1528
+ "single_word": false,
1529
+ "special": true
1530
+ },
1531
+ "128191": {
1532
+ "content": "<|reserved_special_token_183|>",
1533
+ "lstrip": false,
1534
+ "normalized": false,
1535
+ "rstrip": false,
1536
+ "single_word": false,
1537
+ "special": true
1538
+ },
1539
+ "128192": {
1540
+ "content": "<|reserved_special_token_184|>",
1541
+ "lstrip": false,
1542
+ "normalized": false,
1543
+ "rstrip": false,
1544
+ "single_word": false,
1545
+ "special": true
1546
+ },
1547
+ "128193": {
1548
+ "content": "<|reserved_special_token_185|>",
1549
+ "lstrip": false,
1550
+ "normalized": false,
1551
+ "rstrip": false,
1552
+ "single_word": false,
1553
+ "special": true
1554
+ },
1555
+ "128194": {
1556
+ "content": "<|reserved_special_token_186|>",
1557
+ "lstrip": false,
1558
+ "normalized": false,
1559
+ "rstrip": false,
1560
+ "single_word": false,
1561
+ "special": true
1562
+ },
1563
+ "128195": {
1564
+ "content": "<|reserved_special_token_187|>",
1565
+ "lstrip": false,
1566
+ "normalized": false,
1567
+ "rstrip": false,
1568
+ "single_word": false,
1569
+ "special": true
1570
+ },
1571
+ "128196": {
1572
+ "content": "<|reserved_special_token_188|>",
1573
+ "lstrip": false,
1574
+ "normalized": false,
1575
+ "rstrip": false,
1576
+ "single_word": false,
1577
+ "special": true
1578
+ },
1579
+ "128197": {
1580
+ "content": "<|reserved_special_token_189|>",
1581
+ "lstrip": false,
1582
+ "normalized": false,
1583
+ "rstrip": false,
1584
+ "single_word": false,
1585
+ "special": true
1586
+ },
1587
+ "128198": {
1588
+ "content": "<|reserved_special_token_190|>",
1589
+ "lstrip": false,
1590
+ "normalized": false,
1591
+ "rstrip": false,
1592
+ "single_word": false,
1593
+ "special": true
1594
+ },
1595
+ "128199": {
1596
+ "content": "<|reserved_special_token_191|>",
1597
+ "lstrip": false,
1598
+ "normalized": false,
1599
+ "rstrip": false,
1600
+ "single_word": false,
1601
+ "special": true
1602
+ },
1603
+ "128200": {
1604
+ "content": "<|reserved_special_token_192|>",
1605
+ "lstrip": false,
1606
+ "normalized": false,
1607
+ "rstrip": false,
1608
+ "single_word": false,
1609
+ "special": true
1610
+ },
1611
+ "128201": {
1612
+ "content": "<|reserved_special_token_193|>",
1613
+ "lstrip": false,
1614
+ "normalized": false,
1615
+ "rstrip": false,
1616
+ "single_word": false,
1617
+ "special": true
1618
+ },
1619
+ "128202": {
1620
+ "content": "<|reserved_special_token_194|>",
1621
+ "lstrip": false,
1622
+ "normalized": false,
1623
+ "rstrip": false,
1624
+ "single_word": false,
1625
+ "special": true
1626
+ },
1627
+ "128203": {
1628
+ "content": "<|reserved_special_token_195|>",
1629
+ "lstrip": false,
1630
+ "normalized": false,
1631
+ "rstrip": false,
1632
+ "single_word": false,
1633
+ "special": true
1634
+ },
1635
+ "128204": {
1636
+ "content": "<|reserved_special_token_196|>",
1637
+ "lstrip": false,
1638
+ "normalized": false,
1639
+ "rstrip": false,
1640
+ "single_word": false,
1641
+ "special": true
1642
+ },
1643
+ "128205": {
1644
+ "content": "<|reserved_special_token_197|>",
1645
+ "lstrip": false,
1646
+ "normalized": false,
1647
+ "rstrip": false,
1648
+ "single_word": false,
1649
+ "special": true
1650
+ },
1651
+ "128206": {
1652
+ "content": "<|reserved_special_token_198|>",
1653
+ "lstrip": false,
1654
+ "normalized": false,
1655
+ "rstrip": false,
1656
+ "single_word": false,
1657
+ "special": true
1658
+ },
1659
+ "128207": {
1660
+ "content": "<|reserved_special_token_199|>",
1661
+ "lstrip": false,
1662
+ "normalized": false,
1663
+ "rstrip": false,
1664
+ "single_word": false,
1665
+ "special": true
1666
+ },
1667
+ "128208": {
1668
+ "content": "<|reserved_special_token_200|>",
1669
+ "lstrip": false,
1670
+ "normalized": false,
1671
+ "rstrip": false,
1672
+ "single_word": false,
1673
+ "special": true
1674
+ },
1675
+ "128209": {
1676
+ "content": "<|reserved_special_token_201|>",
1677
+ "lstrip": false,
1678
+ "normalized": false,
1679
+ "rstrip": false,
1680
+ "single_word": false,
1681
+ "special": true
1682
+ },
1683
+ "128210": {
1684
+ "content": "<|reserved_special_token_202|>",
1685
+ "lstrip": false,
1686
+ "normalized": false,
1687
+ "rstrip": false,
1688
+ "single_word": false,
1689
+ "special": true
1690
+ },
1691
+ "128211": {
1692
+ "content": "<|reserved_special_token_203|>",
1693
+ "lstrip": false,
1694
+ "normalized": false,
1695
+ "rstrip": false,
1696
+ "single_word": false,
1697
+ "special": true
1698
+ },
1699
+ "128212": {
1700
+ "content": "<|reserved_special_token_204|>",
1701
+ "lstrip": false,
1702
+ "normalized": false,
1703
+ "rstrip": false,
1704
+ "single_word": false,
1705
+ "special": true
1706
+ },
1707
+ "128213": {
1708
+ "content": "<|reserved_special_token_205|>",
1709
+ "lstrip": false,
1710
+ "normalized": false,
1711
+ "rstrip": false,
1712
+ "single_word": false,
1713
+ "special": true
1714
+ },
1715
+ "128214": {
1716
+ "content": "<|reserved_special_token_206|>",
1717
+ "lstrip": false,
1718
+ "normalized": false,
1719
+ "rstrip": false,
1720
+ "single_word": false,
1721
+ "special": true
1722
+ },
1723
+ "128215": {
1724
+ "content": "<|reserved_special_token_207|>",
1725
+ "lstrip": false,
1726
+ "normalized": false,
1727
+ "rstrip": false,
1728
+ "single_word": false,
1729
+ "special": true
1730
+ },
1731
+ "128216": {
1732
+ "content": "<|reserved_special_token_208|>",
1733
+ "lstrip": false,
1734
+ "normalized": false,
1735
+ "rstrip": false,
1736
+ "single_word": false,
1737
+ "special": true
1738
+ },
1739
+ "128217": {
1740
+ "content": "<|reserved_special_token_209|>",
1741
+ "lstrip": false,
1742
+ "normalized": false,
1743
+ "rstrip": false,
1744
+ "single_word": false,
1745
+ "special": true
1746
+ },
1747
+ "128218": {
1748
+ "content": "<|reserved_special_token_210|>",
1749
+ "lstrip": false,
1750
+ "normalized": false,
1751
+ "rstrip": false,
1752
+ "single_word": false,
1753
+ "special": true
1754
+ },
1755
+ "128219": {
1756
+ "content": "<|reserved_special_token_211|>",
1757
+ "lstrip": false,
1758
+ "normalized": false,
1759
+ "rstrip": false,
1760
+ "single_word": false,
1761
+ "special": true
1762
+ },
1763
+ "128220": {
1764
+ "content": "<|reserved_special_token_212|>",
1765
+ "lstrip": false,
1766
+ "normalized": false,
1767
+ "rstrip": false,
1768
+ "single_word": false,
1769
+ "special": true
1770
+ },
1771
+ "128221": {
1772
+ "content": "<|reserved_special_token_213|>",
1773
+ "lstrip": false,
1774
+ "normalized": false,
1775
+ "rstrip": false,
1776
+ "single_word": false,
1777
+ "special": true
1778
+ },
1779
+ "128222": {
1780
+ "content": "<|reserved_special_token_214|>",
1781
+ "lstrip": false,
1782
+ "normalized": false,
1783
+ "rstrip": false,
1784
+ "single_word": false,
1785
+ "special": true
1786
+ },
1787
+ "128223": {
1788
+ "content": "<|reserved_special_token_215|>",
1789
+ "lstrip": false,
1790
+ "normalized": false,
1791
+ "rstrip": false,
1792
+ "single_word": false,
1793
+ "special": true
1794
+ },
1795
+ "128224": {
1796
+ "content": "<|reserved_special_token_216|>",
1797
+ "lstrip": false,
1798
+ "normalized": false,
1799
+ "rstrip": false,
1800
+ "single_word": false,
1801
+ "special": true
1802
+ },
1803
+ "128225": {
1804
+ "content": "<|reserved_special_token_217|>",
1805
+ "lstrip": false,
1806
+ "normalized": false,
1807
+ "rstrip": false,
1808
+ "single_word": false,
1809
+ "special": true
1810
+ },
1811
+ "128226": {
1812
+ "content": "<|reserved_special_token_218|>",
1813
+ "lstrip": false,
1814
+ "normalized": false,
1815
+ "rstrip": false,
1816
+ "single_word": false,
1817
+ "special": true
1818
+ },
1819
+ "128227": {
1820
+ "content": "<|reserved_special_token_219|>",
1821
+ "lstrip": false,
1822
+ "normalized": false,
1823
+ "rstrip": false,
1824
+ "single_word": false,
1825
+ "special": true
1826
+ },
1827
+ "128228": {
1828
+ "content": "<|reserved_special_token_220|>",
1829
+ "lstrip": false,
1830
+ "normalized": false,
1831
+ "rstrip": false,
1832
+ "single_word": false,
1833
+ "special": true
1834
+ },
1835
+ "128229": {
1836
+ "content": "<|reserved_special_token_221|>",
1837
+ "lstrip": false,
1838
+ "normalized": false,
1839
+ "rstrip": false,
1840
+ "single_word": false,
1841
+ "special": true
1842
+ },
1843
+ "128230": {
1844
+ "content": "<|reserved_special_token_222|>",
1845
+ "lstrip": false,
1846
+ "normalized": false,
1847
+ "rstrip": false,
1848
+ "single_word": false,
1849
+ "special": true
1850
+ },
1851
+ "128231": {
1852
+ "content": "<|reserved_special_token_223|>",
1853
+ "lstrip": false,
1854
+ "normalized": false,
1855
+ "rstrip": false,
1856
+ "single_word": false,
1857
+ "special": true
1858
+ },
1859
+ "128232": {
1860
+ "content": "<|reserved_special_token_224|>",
1861
+ "lstrip": false,
1862
+ "normalized": false,
1863
+ "rstrip": false,
1864
+ "single_word": false,
1865
+ "special": true
1866
+ },
1867
+ "128233": {
1868
+ "content": "<|reserved_special_token_225|>",
1869
+ "lstrip": false,
1870
+ "normalized": false,
1871
+ "rstrip": false,
1872
+ "single_word": false,
1873
+ "special": true
1874
+ },
1875
+ "128234": {
1876
+ "content": "<|reserved_special_token_226|>",
1877
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128235": {
+ "content": "<|reserved_special_token_227|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128236": {
+ "content": "<|reserved_special_token_228|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128237": {
+ "content": "<|reserved_special_token_229|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128238": {
+ "content": "<|reserved_special_token_230|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128239": {
+ "content": "<|reserved_special_token_231|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128240": {
+ "content": "<|reserved_special_token_232|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128241": {
+ "content": "<|reserved_special_token_233|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128242": {
+ "content": "<|reserved_special_token_234|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128243": {
+ "content": "<|reserved_special_token_235|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128244": {
+ "content": "<|reserved_special_token_236|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128245": {
+ "content": "<|reserved_special_token_237|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128246": {
+ "content": "<|reserved_special_token_238|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128247": {
+ "content": "<|reserved_special_token_239|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128248": {
+ "content": "<|reserved_special_token_240|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128249": {
+ "content": "<|reserved_special_token_241|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128250": {
+ "content": "<|reserved_special_token_242|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128251": {
+ "content": "<|reserved_special_token_243|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128252": {
+ "content": "<|reserved_special_token_244|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128253": {
+ "content": "<|reserved_special_token_245|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128254": {
+ "content": "<|reserved_special_token_246|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "128255": {
+ "content": "<|reserved_special_token_247|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<|begin_of_text|>",
+ "clean_up_tokenization_spaces": true,
+ "eos_token": "<|end_of_text|>",
+ "extra_special_tokens": {},
+ "mask_token": "<|mask|>",
+ "max_length": null,
+ "model_input_names": [
+ "input_ids",
+ "attention_mask"
+ ],
+ "model_max_length": 8192,
+ "pad_to_multiple_of": null,
+ "pad_token": "<|pad|>",
+ "pad_token_type_id": 0,
+ "padding_side": "right",
+ "stride": 0,
+ "tokenizer_class": "PreTrainedTokenizerFast",
+ "truncation_side": "right",
+ "truncation_strategy": "longest_first"
+ }