---
library_name: transformers
license: mit
base_model: facebook/mbart-large-50
tags:
- generated_from_trainer
metrics:
- bleu
model-index:
- name: be37dcb56e0edd7abfdd701c1e6cf0df
  results: []
---

# be37dcb56e0edd7abfdd701c1e6cf0df

This model is a fine-tuned version of facebook/mbart-large-50 on the Helsinki-NLP/opus_books [de-en] dataset. It achieves the following results on the evaluation set:

- Loss: 2.3852
- Data Size: 1.0
- Epoch Runtime: 322.3205
- Bleu: 9.3174
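The card does not include a usage snippet. As a minimal sketch, a fine-tuned mBART-50 de→en checkpoint can be loaded through the standard `transformers` mBART-50 API; the checkpoint path below is a placeholder to be replaced with the actual Hub repo id or local directory, and the language codes `de_DE`/`en_XX` are the standard mBART-50 codes for German and English:

```python
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration


def translate_de_en(checkpoint: str, text: str) -> str:
    """Translate German text to English with a fine-tuned mBART-50 checkpoint."""
    # src_lang tells the tokenizer to prepend the German language code.
    tokenizer = MBart50TokenizerFast.from_pretrained(checkpoint, src_lang="de_DE")
    model = MBartForConditionalGeneration.from_pretrained(checkpoint)
    inputs = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **inputs,
        # Force the decoder to start with the English language token.
        forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"],
        max_new_tokens=64,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]


if __name__ == "__main__":
    # Placeholder path: substitute the actual checkpoint location.
    print(translate_de_en("path/to/checkpoint", "Der Hund schläft im Garten."))
```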

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | Bleu    |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:-------:|
| No log        | 0     | 0     | 6.2567          | 0         | 26.7048       | 1.6973  |
| No log        | 1     | 1286  | 4.4748          | 0.0078    | 29.5383       | 5.0592  |
| 0.0895        | 2     | 2572  | 3.3679          | 0.0156    | 33.2817       | 10.4733 |
| 0.0899        | 3     | 3858  | 2.2590          | 0.0312    | 37.5992       | 6.7743  |
| 0.0993        | 4     | 5144  | 2.1548          | 0.0625    | 47.5234       | 8.0927  |
| 2.0738        | 5     | 6430  | 2.0462          | 0.125     | 66.0811       | 8.5674  |
| 1.9511        | 6     | 7716  | 1.9638          | 0.25      | 101.6687      | 11.8428 |
| 1.8928        | 7     | 9002  | 1.9670          | 0.5       | 176.5597      | 12.5154 |
| 1.8482        | 8.0   | 10288 | 1.8666          | 1.0       | 321.5913      | 9.3952  |
| 1.4755        | 9.0   | 11574 | 1.8589          | 1.0       | 322.5186      | 9.5484  |
| 1.2218        | 10.0  | 12860 | 1.9179          | 1.0       | 322.5333      | 9.4680  |
| 0.9689        | 11.0  | 14146 | 2.0313          | 1.0       | 322.2198      | 9.4077  |
| 0.8023        | 12.0  | 15432 | 2.2226          | 1.0       | 324.6607      | 9.6218  |
| 0.6054        | 13.0  | 16718 | 2.3852          | 1.0       | 322.3205      | 9.3174  |

### Framework versions

- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.2.0
- Tokenizers 0.22.1