---
library_name: transformers
base_model: princeton-nlp/Llama-3-Base-8B-SFT
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: Llama-3-dpo-5e-7-SFTed-paged_adamw_32bit-0.95
  results: []
---


# Llama-3-dpo-5e-7-SFTed-paged_adamw_32bit-0.95

This model was released with the preprint [DPO-Shift: Shifting the Distribution of Direct Preference Optimization](https://arxiv.org/abs/2502.07599). Please refer to our [repository](https://github.com/Meaquadddd/DPO-Shift) for more details.


This model is a fine-tuned version of [princeton-nlp/Llama-3-Base-8B-SFT](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set at the final evaluation (step 450):
- Loss: 0.5600
- Rewards/chosen: -0.2594
- Rewards/rejected: -0.7680
- Rewards/accuracies: 0.7280
- Rewards/margins: 0.5087
- Logps/rejected: -344.3741
- Logps/chosen: -316.7488
- Logits/rejected: -0.8779
- Logits/chosen: -0.8397
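
The reward metrics above appear to follow the usual DPO logging convention, in which `Rewards/margins` is the gap between the chosen and rejected implicit rewards. That convention is an assumption here, since the card does not define the metrics. A minimal sketch that recomputes the margin from the two reported rewards:

```python
# Final evaluation metrics copied from the list above.
rewards_chosen = -0.2594
rewards_rejected = -0.7680
reported_margin = 0.5087

# Assumed convention: margin = chosen reward minus rejected reward.
margin = rewards_chosen - rewards_rejected

# Each metric is rounded to four decimals independently, so the recomputed
# value can differ from the reported one in the last digit.
assert abs(margin - reported_margin) < 1e-3
print(f"recomputed margin: {margin:.4f}")
```

A positive margin means the model assigns, on average, a higher implicit reward to the chosen response than to the rejected one.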

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
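
The batch-size entries above are related by simple arithmetic, and the scheduler entries describe a warmup followed by cosine decay. The sketch below reproduces both. The helper `cosine_lr` is hypothetical and written for illustration only, not taken from the training code. It assumes linear warmup to the peak rate followed by a half-cosine decay to zero, and it assumes a total of 477 optimizer steps, a figure estimated from the step and epoch columns of the training results table.

```python
import math

# Values copied from the hyperparameter list above.
per_device_train_batch_size = 1
per_device_eval_batch_size = 1
num_devices = 4
gradient_accumulation_steps = 32

# Effective batch sizes: accumulation applies to training only.
total_train_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
total_eval_batch_size = per_device_eval_batch_size * num_devices
assert total_train_batch_size == 128
assert total_eval_batch_size == 4


def cosine_lr(step, total_steps, peak_lr=5e-7, warmup_ratio=0.1):
    """Hypothetical helper: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


ASSUMED_TOTAL_STEPS = 477  # assumption, not a value reported in this card
for step in (0, 48, 250, 477):
    print(step, cosine_lr(step, ASSUMED_TOTAL_STEPS))
```

Under these assumptions the learning rate peaks at 5e-07 once warmup ends and falls to zero at the last step.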

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6819        | 0.1047 | 50   | 0.6800          | 0.1050         | 0.0798           | 0.6400             | 0.0252          | -259.5905      | -280.3077    | -0.7374         | -0.6591       |
| 0.6361        | 0.2094 | 100  | 0.6362          | 0.0108         | -0.1367          | 0.7080             | 0.1476          | -281.2423      | -289.7269    | -0.8269         | -0.7622       |
| 0.5998        | 0.3141 | 150  | 0.5975          | -0.1439        | -0.4466          | 0.7120             | 0.3027          | -312.2311      | -305.2002    | -0.7868         | -0.7374       |
| 0.5873        | 0.4187 | 200  | 0.5900          | -0.1226        | -0.4679          | 0.7160             | 0.3454          | -314.3644      | -303.0681    | -0.8278         | -0.7815       |
| 0.5692        | 0.5234 | 250  | 0.5732          | -0.2556        | -0.6926          | 0.7300             | 0.4370          | -336.8325      | -316.3727    | -0.8732         | -0.8325       |
| 0.5668        | 0.6281 | 300  | 0.5730          | -0.3147        | -0.7937          | 0.7160             | 0.4790          | -346.9373      | -322.2795    | -0.8503         | -0.8084       |
| 0.5415        | 0.7328 | 350  | 0.5626          | -0.2087        | -0.6908          | 0.7320             | 0.4822          | -336.6547      | -311.6794    | -0.8694         | -0.8289       |
| 0.5595        | 0.8375 | 400  | 0.5604          | -0.2196        | -0.7069          | 0.7300             | 0.4873          | -338.2576      | -312.7687    | -0.8715         | -0.8329       |
| 0.5552        | 0.9422 | 450  | 0.5600          | -0.2594        | -0.7680          | 0.7280             | 0.5087          | -344.3741      | -316.7488    | -0.8779         | -0.8397       |
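
The Epoch and Step columns above also give a rough estimate of the run length. As a sketch, dividing each step count by its epoch fraction approximates the number of optimizer steps in one epoch. Multiplying that by the total train batch size of 128 approximates the number of training examples. Both figures are estimates derived from rounded table values, not numbers reported by the authors.

```python
# (epoch, step) pairs copied from the training results table.
checkpoints = [
    (0.1047, 50),
    (0.2094, 100),
    (0.3141, 150),
    (0.4187, 200),
    (0.5234, 250),
    (0.6281, 300),
    (0.7328, 350),
    (0.8375, 400),
    (0.9422, 450),
]

# Each row gives an independent estimate of optimizer steps per epoch.
estimates = [step / epoch for epoch, step in checkpoints]
steps_per_epoch = sum(estimates) / len(estimates)

# Total train batch size from the hyperparameter list.
total_train_batch_size = 128
approx_train_examples = steps_per_epoch * total_train_batch_size

print(f"estimated steps per epoch: {steps_per_epoch:.1f}")
print(f"estimated training examples: {approx_train_examples:.0f}")
```

The estimate falls between 477 and 478 steps, which suggests the final logged evaluation at step 450 happened shortly before the single epoch finished.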


### Framework versions

- Transformers 4.44.2
- PyTorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1