On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting Paper • 2508.11408 • Published Aug 15 • 8
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends Paper • 2509.24203 • Published Sep 29 • 7
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends Paper • 2509.24203 • Published Sep 29 • 7
Group-Relative REINFORCE Is Secretly an Off-Policy Algorithm: Demystifying Some Myths About GRPO and Its Friends Paper • 2509.24203 • Published Sep 29 • 7 • 2
Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models Paper • 2505.17826 • Published May 23 • 9
Enhancing Latent Computation in Transformers with Latent Tokens Paper • 2505.12629 • Published May 19
Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models Paper • 2505.17826 • Published May 23 • 9
Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models Paper • 2505.17826 • Published May 23 • 9 • 2
A Simple and Provable Scaling Law for the Test-Time Compute of Large Language Models Paper • 2411.19477 • Published Nov 29, 2024 • 6
A Simple and Provable Scaling Law for the Test-Time Compute of Large Language Models Paper • 2411.19477 • Published Nov 29, 2024 • 6 • 2
EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models Paper • 2402.00518 • Published Feb 1, 2024 • 4
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism Paper • 2312.04916 • Published Dec 8, 2023 • 7 • 7
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism Paper • 2312.04916 • Published Dec 8, 2023 • 7 • 7
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism Paper • 2312.04916 • Published Dec 8, 2023 • 7 • 7
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism Paper • 2312.04916 • Published Dec 8, 2023 • 7