ASPO: Asymmetric Importance Sampling Policy Optimization Paper • 2510.06062 • Published 27 days ago
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models Paper • 2509.26628 • Published Sep 30
Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR Paper • 2507.15778 • Published Jul 21
Scaling Image and Video Generation via Test-Time Evolutionary Search Paper • 2505.17618 • Published May 23
GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning Paper • 2504.00891 • Published Apr 1