DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search Paper • 2509.25454 • Published Sep 29 • 137
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources Paper • 2509.21268 • Published Sep 25 • 101
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model Paper • 2509.00676 • Published Aug 31 • 83
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25 • 206
Self-Rewarding Vision-Language Model via Reasoning Decomposition Paper • 2508.19652 • Published Aug 27 • 84
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning Paper • 2508.20751 • Published Aug 28 • 89
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference Paper • 2508.02193 • Published Aug 4 • 130
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning Paper • 2508.10433 • Published Aug 14 • 143
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1 • 238
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19 • 133