Update README.md
README.md
CHANGED
@@ -21,8 +21,10 @@ Tulu V2.5 is a series of models trained using DPO and PPO starting from the [Tul
 This is a **value** model produced during the PPO training of [this](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm) model.
 We release the value model as it may provide a good starting point for additional research or improved decoding with our released PPO models.
 
+At time of writing, you may have to [install transformers from source](https://huggingface.co/docs/transformers/en/installation#install-from-source) to get the `LlamaForTokenClassification` class.
+
 For more details, read the paper:
-[Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback](https://
+[Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback](https://arxiv.org/abs/2406.09279).
 
 
 ## Model description
@@ -76,6 +78,7 @@ If you find Tulu 2.5 is useful in your work, please cite it with:
 title={{Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback}},
 author={{Hamish Ivison and Yizhong Wang and Jiacheng Liu and Ellen Wu and Valentina Pyatkin and Nathan Lambert and Yejin Choi and Noah A. Smith and Hannaneh Hajishirzi}},
 year={2024},
+eprint={2406.09279},
 archivePrefix={arXiv},
 primaryClass={cs.CL}
 }
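To make the new `LlamaForTokenClassification` note concrete, here is a minimal sketch of loading the value model and reading per-token value estimates. It is not the authors' reference code: the repo id is a placeholder for this model's actual id, and the chat format and single-label value head are assumptions not confirmed by the README. A recent `transformers` (or a source install, per the note above) is required.

```python
# Minimal sketch, assuming a LLaMA-based value model with a scalar per-token head.
import torch
from transformers import AutoTokenizer, LlamaForTokenClassification

MODEL_ID = "allenai/tulu-v2.5-13b-value-model"  # placeholder: substitute this repo's id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = LlamaForTokenClassification.from_pretrained(
    MODEL_ID,
    num_labels=1,               # assumption: scalar value head, one estimate per token
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()

# Tulu-style prompt format (assumption, based on the Tulu 2 chat template).
text = "<|user|>\nWhat is 2 + 2?\n<|assistant|>\n2 + 2 = 4."
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    # logits: (batch, seq_len, num_labels) -> squeeze to (batch, seq_len)
    values = model(**inputs).logits.squeeze(-1)

# One value estimate per token position; later positions score the completion so far.
print(values[0].float().tolist())
```

Per-token estimates like these are the starting point for the kind of value-guided decoding experiments with the released PPO models that the README alludes to.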