llama-7b-SFT-qlora-eli5_DPO_ds_RM_contrast_1024_r_64_alpha_16

This model is a fine-tuned version of dhmeltzer/llama-7b-SFT_ds_eli5_1024_r_64_alpha_16_merged on an unknown dataset. It achieves the following results on the evaluation set (a hedged loading sketch follows the results list):

  • Loss: 0.6204
  • Rewards/chosen: -0.0937
  • Rewards/rejected: -0.3490
  • Rewards/accuracies: 0.6641
  • Rewards/margins: 0.2553
  • Logps/rejected: -205.9560
  • Logps/chosen: -211.6011
  • Logits/rejected: 1.1663
  • Logits/chosen: 1.1890
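
The snippet below is a minimal loading sketch. It assumes this repository hosts a PEFT (QLoRA) adapter trained on top of the merged SFT base model listed above; the precision, device placement, and generation settings are illustrative choices, not values taken from this card.

```python
# Minimal loading sketch (assumption: this repo hosts a PEFT/QLoRA adapter
# on top of the merged SFT base model named above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "dhmeltzer/llama-7b-SFT_ds_eli5_1024_r_64_alpha_16_merged"
adapter_id = "dhmeltzer/llama-7b-SFT-qlora-eli5_DPO_ds_RM_contrast_1024_r_64_alpha_16"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,  # illustrative; 4-bit loading would mirror the QLoRA memory savings
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)

prompt = "Explain like I'm five: why is the sky blue?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```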

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reproduction sketch follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 1
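
The sketch below shows how these hyperparameters could be wired into TRL's DPOTrainer (using the older-style API contemporary with the Transformers release listed under Framework versions). It is not the author's training script: the DPO beta, LoRA dropout and target modules, sequence lengths, and the preference dataset are not documented here, so those values and the toy dataset are labeled assumptions; the LoRA r=64 / alpha=16 are inferred from the repository name.

```python
# Hedged reproduction sketch using TRL's DPOTrainer. Values marked "assumption"
# are not documented in this model card.
import torch
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig
from trl import DPOTrainer

base_id = "dhmeltzer/llama-7b-SFT_ds_eli5_1024_r_64_alpha_16_merged"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")

# Toy preference data purely to keep the sketch self-contained; the actual
# training data for this model is not documented in the card.
toy = Dataset.from_dict({
    "prompt": ["Explain like I'm five: why is the sky blue?"],
    "chosen": ["Sunlight scatters off air molecules, and blue light scatters the most."],
    "rejected": ["Because it reflects the ocean."],
})

# r=64 / alpha=16 are inferred from the repository name; dropout is an assumption.
peft_config = LoraConfig(r=64, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

training_args = TrainingArguments(
    output_dir="llama-7b-eli5-dpo",
    learning_rate=2e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,   # effective batch size 128, matching the list above
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=19,                   # matches the evaluation cadence in the results table below
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,        # with a peft_config, TRL derives the frozen reference from the base weights
    args=training_args,
    beta=0.1,              # assumption: the DPO beta is not stated in the card
    train_dataset=toy,
    eval_dataset=toy,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_length=1024,       # assumption, suggested by "1024" in the repository name
    max_prompt_length=512, # assumption
)
trainer.train()
```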

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6908        | 0.1   | 19   | 0.6536          | -0.2975        | -0.4466          | 0.6060             | 0.1491          | -206.9322      | -213.6386    | 1.1767          | 1.1980        |
| 0.6613        | 0.21  | 38   | 0.6391          | -0.1759        | -0.3858          | 0.6172             | 0.2099          | -206.3239      | -212.4229    | 1.1695          | 1.1930        |
| 0.6667        | 0.31  | 57   | 0.6297          | -0.0287        | -0.2656          | 0.6440             | 0.2369          | -205.1224      | -210.9511    | 1.1612          | 1.1863        |
| 0.6532        | 0.42  | 76   | 0.6271          | -0.0915        | -0.3376          | 0.6172             | 0.2461          | -205.8420      | -211.5791    | 1.1395          | 1.1612        |
| 0.6546        | 0.52  | 95   | 0.6235          | -0.0575        | -0.2906          | 0.6362             | 0.2331          | -205.3723      | -211.2390    | 1.1551          | 1.1781        |
| 0.6528        | 0.62  | 114  | 0.6231          | -0.0939        | -0.3382          | 0.6562             | 0.2443          | -205.8482      | -211.6033    | 1.1702          | 1.1932        |
| 0.646         | 0.73  | 133  | 0.6204          | -0.1525        | -0.4204          | 0.6518             | 0.2678          | -206.6696      | -212.1891    | 1.1664          | 1.1886        |
| 0.6524        | 0.83  | 152  | 0.6208          | -0.1083        | -0.3660          | 0.6607             | 0.2577          | -206.1257      | -211.7465    | 1.1548          | 1.1765        |
| 0.6335        | 0.94  | 171  | 0.6204          | -0.0937        | -0.3490          | 0.6641             | 0.2553          | -205.9560      | -211.6011    | 1.1663          | 1.1890        |

Framework versions

  • Transformers 4.32.1
  • Pytorch 2.0.1+cu118
  • Datasets 2.14.4
  • Tokenizers 0.13.3