RL
Nagi-ovo/Qwen2.5-7B-Reasoning-Adapter • Text Generation • Updated Feb 8 • 8
Nagi-ovo/Llama-3-8B-RM • Text Classification • 8B • Updated Jan 6 • 11 • 2
Nagi-ovo/Llama-3-8B-PPO • Text Generation • 8B • Updated Jan 21 • 7

Llama-3-8B-RLHF-Pipeline
Nagi-ovo/Llama-3-8B-SFT-RuoZhiBa • Text Generation • 8B • Updated Jan 7 • 20
Nagi-ovo/Llama-3-8B-DPO • Text Generation • 8B • Updated Jan 6 • 14
Nagi-ovo/Llama-3-8B-RM • Text Classification • 8B • Updated Jan 6 • 11 • 2
Nagi-ovo/Llama-3-8B-PPO • Text Generation • 8B • Updated Jan 21 • 7