WPO

Models and datasets in paper "WPO: Enhancing RLHF with Weighted Preference Optimization".

wzhouad/Llama3-Instruct-8B-WPO-FP • Text Generation • 8B • Updated Jul 24, 2024
wzhouad/Llama3-Instruct-8B-WPO-HB • Text Generation • 8B • Updated Aug 22, 2024 • 4 • 1
wzhouad/zephyr-7B-WPO-FP • Text Generation • 7B • Updated Jul 24, 2024
wzhouad/zephyr-7B-WPO-HB • Text Generation • 7B • Updated Aug 21, 2024