fda904753def3d123cd25fdbaf479c88

This model is a fine-tuned version of google-t5/t5-3b on the Helsinki-NLP/opus_books [en-pt] dataset. It achieves the following results on the evaluation set:

  • Loss: 0.8807
  • Data Size: 1.0 (fraction of the training data used)
  • Epoch Runtime: 36.5586
  • Bleu: 19.0050
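
For quick orientation, a minimal inference sketch. The repo id below is the one shown on the hosting page, and the T5 task prefix is an assumption, since the card does not state the prompt format used:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Repo id as shown on the hosting page; adjust if the checkpoint lives elsewhere.
checkpoint = "contemmcm/fda904753def3d123cd25fdbaf479c88"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# T5 translation fine-tunes usually expect a task prefix; the exact prefix
# used for this run is an assumption, since the card does not state it.
text = "translate English to Portuguese: The book is on the table."
inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```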

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
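
The only dataset detail on this card is the header's Helsinki-NLP/opus_books [en-pt]. A minimal loading sketch under the standard datasets API follows; the train/eval split is an assumption, since opus_books ships only a single train split:

```python
from datasets import load_dataset

# The en-pt pair named in the card header. opus_books ships a single
# "train" split, so the eval split below is an assumption, not the
# split this run actually used.
raw = load_dataset("Helsinki-NLP/opus_books", "en-pt")
splits = raw["train"].train_test_split(test_size=0.1, seed=42)

example = splits["train"][0]
print(example["translation"]["en"], "->", example["translation"]["pt"])
```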

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
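
For reference, a Seq2SeqTrainingArguments sketch mirroring the values above; argument names follow the transformers API, and anything not in the list (output directory, generation during eval) is a placeholder or assumption:

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the list above: per-device batch size 8 on 4 GPUs yields the
# reported total batch size of 32. output_dir and predict_with_generate
# are placeholders/assumptions not stated in the card.
training_args = Seq2SeqTrainingArguments(
    output_dir="t5-3b-opus-books-en-pt",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,  # assumption: required to report BLEU at eval time
)
```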

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu    |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:-------:|
| No log        | 0     | 0    | 1.6888          | 0         | 1.9636        | 8.1270  |
| No log        | 1     | 35   | 1.4034          | 0.0078    | 2.8702        | 8.9262  |
| No log        | 2     | 70   | 1.2250          | 0.0156    | 6.1203        | 12.4007 |
| No log        | 3     | 105  | 1.1722          | 0.0312    | 11.3888       | 10.3772 |
| No log        | 4     | 140  | 1.1197          | 0.0625    | 19.1300       | 10.5168 |
| No log        | 5     | 175  | 1.0644          | 0.125     | 23.5111       | 11.2868 |
| No log        | 6     | 210  | 0.9824          | 0.25      | 28.5935       | 13.1734 |
| No log        | 7     | 245  | 0.8901          | 0.5       | 24.8551       | 16.2580 |
| 0.2182        | 8.0   | 280  | 0.8069          | 1.0       | 33.4287       | 16.1355 |
| 0.8651        | 9.0   | 315  | 0.7824          | 1.0       | 34.4039       | 16.7790 |
| 0.6654        | 10.0  | 350  | 0.7796          | 1.0       | 37.7468       | 18.0529 |
| 0.6654        | 11.0  | 385  | 0.7771          | 1.0       | 40.3196       | 17.7458 |
| 0.5116        | 12.0  | 420  | 0.7826          | 1.0       | 30.4486       | 18.1556 |
| 0.4119        | 13.0  | 455  | 0.8035          | 1.0       | 31.8901       | 18.4966 |
| 0.4119        | 14.0  | 490  | 0.8527          | 1.0       | 35.8862       | 17.9012 |
| 0.331         | 15.0  | 525  | 0.8807          | 1.0       | 36.5586       | 19.0050 |
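
The card does not say how the Bleu column was computed. A common recipe, sketched under the assumption that it was sacreBLEU via the evaluate library:

```python
import evaluate

# Assumption: the Bleu column comes from sacreBLEU over decoded
# predictions, the usual recipe in transformers translation examples.
bleu = evaluate.load("sacrebleu")

predictions = ["O livro está sobre a mesa."]
references = [["O livro está na mesa."]]
score = bleu.compute(predictions=predictions, references=references)
print(round(score["score"], 4))  # 0-100 scale, matching the table
```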

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1