en_clm_child_13_new

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.6153
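
For reference, if the reported loss is the mean per-token cross-entropy in nats (as the Trainer reports for causal language models), it corresponds to a perplexity of roughly 37. A quick check:

```python
import math

# Perplexity is exp(mean cross-entropy loss), assuming the reported
# evaluation loss is per-token cross-entropy in nats.
eval_loss = 3.6153
print(math.exp(eval_loss))  # ≈ 37.2
```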

Model description

More information needed

Intended uses & limitations

More information needed
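
The card leaves intended uses undocumented. Purely as a generic, non-authoritative sketch, a model of this kind (a causal language model, per the training setup below) could be loaded and sampled as follows; the repository id is a placeholder, since the hosting organization is not given in the card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "<org>/en_clm_child_13_new" is a placeholder repository id.
tokenizer = AutoTokenizer.from_pretrained("<org>/en_clm_child_13_new")
model = AutoModelForCausalLM.from_pretrained("<org>/en_clm_child_13_new")

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```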

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 13
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
  • mixed_precision_training: Native AMP
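
These values correspond to a transformers `TrainingArguments` configuration along the following lines. This is a hedged sketch: the output directory name is taken from the model id, and the evaluation and logging intervals are inferred from the results table below; neither is stated explicitly in the card.

```python
from transformers import TrainingArguments

# A sketch of the configuration implied by the hyperparameter list above.
# Output directory and eval/logging intervals are assumptions.
training_args = TrainingArguments(
    output_dir="en_clm_child_13_new",
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=13,
    gradient_accumulation_steps=2,  # total train batch size: 16 * 2 = 32
    lr_scheduler_type="linear",
    warmup_steps=40_000,
    max_steps=100_000,
    fp16=True,                      # native AMP mixed precision
    eval_strategy="steps",
    eval_steps=2_000,               # evaluation every 2000 steps (see table below)
    logging_steps=4_000,            # training loss logged every 4000 steps
    # Adam betas (0.9, 0.999) and epsilon 1e-8 are the TrainingArguments defaults.
)
```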

Training results

| Training Loss | Epoch   | Step   | Validation Loss |
|:-------------:|:-------:|:------:|:---------------:|
| No log        | 1.3952  | 2000   | 6.7893          |
| 6.708         | 2.7904  | 4000   | 5.2077          |
| 6.708         | 4.1856  | 6000   | 4.7717          |
| 4.5971        | 5.5807  | 8000   | 4.5102          |
| 4.5971        | 6.9759  | 10000  | 4.3320          |
| 4.132         | 8.3711  | 12000  | 4.1887          |
| 4.132         | 9.7663  | 14000  | 4.0636          |
| 3.8546        | 11.1615 | 16000  | 3.9553          |
| 3.8546        | 12.5567 | 18000  | 3.8717          |
| 3.6539        | 13.9519 | 20000  | 3.7909          |
| 3.6539        | 15.3471 | 22000  | 3.7243          |
| 3.4908        | 16.7422 | 24000  | 3.6650          |
| 3.4908        | 18.1374 | 26000  | 3.6168          |
| 3.357         | 19.5326 | 28000  | 3.5736          |
| 3.357         | 20.9278 | 30000  | 3.5349          |
| 3.2463        | 22.3230 | 32000  | 3.5088          |
| 3.2463        | 23.7182 | 34000  | 3.4850          |
| 3.1542        | 25.1134 | 36000  | 3.4704          |
| 3.1542        | 26.5085 | 38000  | 3.4511          |
| 3.0742        | 27.9037 | 40000  | 3.4367          |
| 3.0742        | 29.2989 | 42000  | 3.4277          |
| 2.9909        | 30.6941 | 44000  | 3.4234          |
| 2.9909        | 32.0893 | 46000  | 3.4258          |
| 2.9147        | 33.4845 | 48000  | 3.4244          |
| 2.9147        | 34.8797 | 50000  | 3.4251          |
| 2.8469        | 36.2749 | 52000  | 3.4327          |
| 2.8469        | 37.6700 | 54000  | 3.4350          |
| 2.7876        | 39.0652 | 56000  | 3.4484          |
| 2.7876        | 40.4604 | 58000  | 3.4517          |
| 2.7311        | 41.8556 | 60000  | 3.4539          |
| 2.7311        | 43.2508 | 62000  | 3.4776          |
| 2.6798        | 44.6460 | 64000  | 3.4786          |
| 2.6798        | 46.0412 | 66000  | 3.4921          |
| 2.6367        | 47.4363 | 68000  | 3.5030          |
| 2.6367        | 48.8315 | 70000  | 3.5099          |
| 2.5972        | 50.2267 | 72000  | 3.5286          |
| 2.5972        | 51.6219 | 74000  | 3.5309          |
| 2.5607        | 53.0171 | 76000  | 3.5419          |
| 2.5607        | 54.4123 | 78000  | 3.5555          |
| 2.524         | 55.8075 | 80000  | 3.5600          |
| 2.524         | 57.2027 | 82000  | 3.5688          |
| 2.4941        | 58.5978 | 84000  | 3.5768          |
| 2.4941        | 59.9930 | 86000  | 3.5802          |
| 2.466         | 61.3882 | 88000  | 3.5930          |
| 2.466         | 62.7834 | 90000  | 3.5967          |
| 2.4428        | 64.1786 | 92000  | 3.6056          |
| 2.4428        | 65.5738 | 94000  | 3.6085          |
| 2.4196        | 66.9690 | 96000  | 3.6099          |
| 2.4196        | 68.3641 | 98000  | 3.6140          |
| 2.4017        | 69.7593 | 100000 | 3.6153          |
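
Validation loss reaches its minimum of 3.4234 around step 44,000 (epoch ~30.7) and drifts upward thereafter while training loss continues to fall, so the final reported loss of 3.6153 reflects the end of the 100,000-step schedule rather than the best checkpoint.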

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1

Model size

  • 12.7M parameters (F32, Safetensors)