Part of the "Evaluation of DPO Configurations" collection (An Empirical Study of DPO Configuration Choices for LLM Alignment).
This repo contains a LoRA adapter created by aligning Tülu 3 8B on the UltraFeedback Binarized dataset using Direct Preference Optimization (DPO). It was trained as part of a series of models for studying DPO alignment.
See the base model card for usage and chat template details.
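As a minimal sketch, the adapter can be loaded on top of the base model with `transformers` and `peft`. The repository ids below are assumptions: the base id points at Tülu 3 8B and the adapter id is a placeholder for this repo; substitute the actual ids, and see the base model card for the correct chat template.

```python
# Minimal usage sketch (ids are illustrative placeholders; replace with the real ones).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "allenai/Llama-3.1-Tulu-3-8B"                   # assumed base model id
adapter_id = "your-org/tulu3-8b-ultrafeedback-dpo-lora"   # placeholder for this adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)      # attach the DPO LoRA adapter

# Build a prompt with the base model's chat template and generate a response.
messages = [{"role": "user", "content": "Give one tip for writing clear documentation."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```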
This adapter is released under Meta's Llama 3.1 Community License Agreement. Llama 3.1 is © Meta Platforms, Inc.
If this work was helpful, please cite:
TBA