en_wiki_mlm_30

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.2011

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 30
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
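The hyperparameters above can be sketched in plain Python: a minimal, dependency-free reimplementation of the linear warmup-then-decay schedule (`lr_scheduler_type: linear` with 40,000 warmup steps over 100,000 total steps), plus the arithmetic behind `total_train_batch_size`. The function name `linear_lr` is illustrative, not part of the original training code.

```python
def linear_lr(step, base_lr=1e-4, warmup_steps=40_000, total_steps=100_000):
    """Linear warmup from 0 to base_lr, then linear decay back to 0.

    Mirrors the behavior of a 'linear' scheduler with warmup, as listed
    in the hyperparameters above.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# Effective batch size: per-device batch * gradient accumulation steps.
effective_batch = 16 * 2  # matches total_train_batch_size: 32
```

At step 70,000 (halfway through the decay phase) the schedule yields half the peak rate, 5e-5.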

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.1319  | 2000   | 7.8720          |
| 7.9236        | 2.2637  | 4000   | 7.1074          |
| 7.9236        | 3.3956  | 6000   | 7.0304          |
| 7.0257        | 4.5274  | 8000   | 6.9532          |
| 7.0257        | 5.6593  | 10000  | 6.8760          |
| 6.8811        | 6.7912  | 12000  | 6.8110          |
| 6.8811        | 7.9230  | 14000  | 6.7316          |
| 6.7587        | 9.0549  | 16000  | 6.6892          |
| 6.7587        | 10.1868 | 18000  | 6.6501          |
| 6.6566        | 11.3186 | 20000  | 6.5951          |
| 6.6566        | 12.4505 | 22000  | 6.5255          |
| 6.546         | 13.5823 | 24000  | 6.4406          |
| 6.546         | 14.7142 | 26000  | 6.3165          |
| 6.3494        | 15.8461 | 28000  | 6.1499          |
| 6.3494        | 16.9779 | 30000  | 5.9410          |
| 6.0156        | 18.1098 | 32000  | 5.6377          |
| 6.0156        | 19.2417 | 34000  | 5.1174          |
| 5.2999        | 20.3735 | 36000  | 4.8551          |
| 5.2999        | 21.5054 | 38000  | 4.6650          |
| 4.7633        | 22.6372 | 40000  | 4.4964          |
| 4.7633        | 23.7691 | 42000  | 4.3249          |
| 4.4471        | 24.9010 | 44000  | 4.2117          |
| 4.4471        | 26.0328 | 46000  | 4.0767          |
| 4.1884        | 27.1647 | 48000  | 3.9930          |
| 4.1884        | 28.2965 | 50000  | 3.9030          |
| 3.9939        | 29.4284 | 52000  | 3.8126          |
| 3.9939        | 30.5603 | 54000  | 3.7701          |
| 3.8479        | 31.6921 | 56000  | 3.6775          |
| 3.8479        | 32.8240 | 58000  | 3.6432          |
| 3.7265        | 33.9559 | 60000  | 3.5951          |
| 3.7265        | 35.0877 | 62000  | 3.5470          |
| 3.6305        | 36.2196 | 64000  | 3.5206          |
| 3.6305        | 37.3514 | 66000  | 3.4949          |
| 3.5483        | 38.4833 | 68000  | 3.4768          |
| 3.5483        | 39.6152 | 70000  | 3.4227          |
| 3.4798        | 40.7470 | 72000  | 3.3735          |
| 3.4798        | 41.8789 | 74000  | 3.3894          |
| 3.4256        | 43.0108 | 76000  | 3.3543          |
| 3.4256        | 44.1426 | 78000  | 3.3211          |
| 3.3707        | 45.2745 | 80000  | 3.3156          |
| 3.3707        | 46.4063 | 82000  | 3.2899          |
| 3.3325        | 47.5382 | 84000  | 3.2545          |
| 3.3325        | 48.6701 | 86000  | 3.2459          |
| 3.2983        | 49.8019 | 88000  | 3.2607          |
| 3.2983        | 50.9338 | 90000  | 3.2458          |
| 3.2655        | 52.0656 | 92000  | 3.2131          |
| 3.2655        | 53.1975 | 94000  | 3.2023          |
| 3.2442        | 54.3294 | 96000  | 3.1815          |
| 3.2442        | 55.4612 | 98000  | 3.1892          |
| 3.2225        | 56.5931 | 100000 | 3.2011          |
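A validation cross-entropy loss like the final 3.2011 above is often easier to interpret as perplexity, exp(loss). A minimal sketch (the variable names are illustrative):

```python
import math

# Convert the final validation cross-entropy loss (natural log base)
# from the table above into perplexity.
eval_loss = 3.2011
perplexity = math.exp(eval_loss)  # roughly 24.6
```

Lower perplexity means the masked-token distribution assigns more probability to the true tokens.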

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
Model size: 14.9M parameters (F32, safetensors)