de_wiki_mlm_13

This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.0579

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 13
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
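The scheduler settings above imply a linear warmup to the peak learning rate over the first 40,000 steps, then a linear decay to zero over the remaining 60,000. The sketch below illustrates that schedule and the effective batch size arithmetic; it assumes the standard Transformers linear schedule (as in `get_linear_schedule_with_warmup`) and is not taken from the training code itself.

```python
# Sketch of the linear warmup + linear decay schedule implied by the
# hyperparameters above (learning_rate=1e-4, warmup_steps=40000,
# training_steps=100000). Assumes Transformers' default linear schedule.
BASE_LR = 1e-4
WARMUP_STEPS = 40_000
TRAINING_STEPS = 100_000

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR over the warmup phase.
        return BASE_LR * step / WARMUP_STEPS
    # Linear decay from BASE_LR back down to 0 over the remaining steps.
    return BASE_LR * (TRAINING_STEPS - step) / (TRAINING_STEPS - WARMUP_STEPS)

# Effective batch size: per-device train batch * gradient accumulation steps.
effective_batch = 16 * 2  # = 32, matching total_train_batch_size above
```

With these settings the optimizer sees 32 examples per update even though only 16 fit in a device batch, which is why `total_train_batch_size` is listed separately from `train_batch_size`.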

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.0796  | 2000   | 8.1153          |
| 8.1594        | 2.1592  | 4000   | 7.4675          |
| 8.1594        | 3.2389  | 6000   | 7.3558          |
| 7.3679        | 4.3185  | 8000   | 7.2731          |
| 7.3679        | 5.3981  | 10000  | 7.1905          |
| 7.2108        | 6.4777  | 12000  | 7.1281          |
| 7.2108        | 7.5574  | 14000  | 7.0444          |
| 7.0667        | 8.6370  | 16000  | 6.9835          |
| 7.0667        | 9.7166  | 18000  | 6.9460          |
| 6.9599        | 10.7962 | 20000  | 6.8962          |
| 6.9599        | 11.8758 | 22000  | 6.8452          |
| 6.8651        | 12.9555 | 24000  | 6.7725          |
| 6.8651        | 14.0351 | 26000  | 6.6713          |
| 6.7083        | 15.1147 | 28000  | 6.5472          |
| 6.7083        | 16.1943 | 30000  | 6.3977          |
| 6.4688        | 17.2740 | 32000  | 6.2481          |
| 6.4688        | 18.3536 | 34000  | 5.9439          |
| 6.0356        | 19.4332 | 36000  | 5.3813          |
| 6.0356        | 20.5128 | 38000  | 5.0142          |
| 5.1534        | 21.5924 | 40000  | 4.7447          |
| 5.1534        | 22.6721 | 42000  | 4.5206          |
| 4.6619        | 23.7517 | 44000  | 4.3437          |
| 4.6619        | 24.8313 | 46000  | 4.1933          |
| 4.3114        | 25.9109 | 48000  | 4.0463          |
| 4.3114        | 26.9906 | 50000  | 3.9254          |
| 4.0627        | 28.0702 | 52000  | 3.8380          |
| 4.0627        | 29.1498 | 54000  | 3.7413          |
| 3.869         | 30.2294 | 56000  | 3.6810          |
| 3.869         | 31.3090 | 58000  | 3.6163          |
| 3.7225        | 32.3887 | 60000  | 3.5482          |
| 3.7225        | 33.4683 | 62000  | 3.4884          |
| 3.5982        | 34.5479 | 64000  | 3.4383          |
| 3.5982        | 35.6275 | 66000  | 3.3907          |
| 3.5029        | 36.7072 | 68000  | 3.3516          |
| 3.5029        | 37.7868 | 70000  | 3.3336          |
| 3.4207        | 38.8664 | 72000  | 3.2854          |
| 3.4207        | 39.9460 | 74000  | 3.2706          |
| 3.3526        | 41.0256 | 76000  | 3.2301          |
| 3.3526        | 42.1053 | 78000  | 3.1975          |
| 3.2969        | 43.1849 | 80000  | 3.1776          |
| 3.2969        | 44.2645 | 82000  | 3.1729          |
| 3.2474        | 45.3441 | 84000  | 3.1339          |
| 3.2474        | 46.4238 | 86000  | 3.1216          |
| 3.2034        | 47.5034 | 88000  | 3.0938          |
| 3.2034        | 48.5830 | 90000  | 3.1126          |
| 3.1741        | 49.6626 | 92000  | 3.0775          |
| 3.1741        | 50.7422 | 94000  | 3.0699          |
| 3.1477        | 51.8219 | 96000  | 3.0651          |
| 3.1477        | 52.9015 | 98000  | 3.0555          |
| 3.1277        | 53.9811 | 100000 | 3.0579          |
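For a masked language model, the cross-entropy loss above converts to perplexity via exp(loss). As a quick sanity check on the final numbers (a sketch; the card itself does not report perplexity):

```python
import math

# Final evaluation loss from the last row of the table above.
final_eval_loss = 3.0579

# Masked-LM perplexity is the exponential of the cross-entropy loss.
perplexity = math.exp(final_eval_loss)  # roughly 21.3
```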

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1