Rethinking Expert Trajectory Utilization in LLM Post-training Paper • 2512.11470 • Published 19 days ago • 7 • 4
State over Tokens: Characterizing the Role of Reasoning Tokens Paper • 2512.12777 • Published 17 days ago • 3 • 6
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B Paper • 2511.06221 • Published Nov 9 • 131 • 12
DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation Paper • 2511.06307 • Published Nov 9 • 51 • 5
Learning from the Best, Differently: A Diversity-Driven Rethinking on Data Selection Paper • 2510.18909 • Published Oct 21 • 4 • 3
First Try Matters: Revisiting the Role of Reflection in Reasoning Models Paper • 2510.08308 • Published Oct 9 • 24 • 4
First Try Matters: Revisiting the Role of Reflection in Reasoning Models Paper • 2510.08308 • Published Oct 9 • 24 • 4
Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning Paper • 2505.21067 • Published May 27 • 3 • 1
Model Merging in Pre-training of Large Language Models Paper • 2505.12082 • Published May 17 • 40 • 6
IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization Paper • 2411.06208 • Published Nov 9, 2024 • 21 • 8
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28, 2024 • 104 • 6