Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends Paper • 2509.24203 • Published Sep 29 • 7
Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models Paper • 2505.17826 • Published May 23 • 9