Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR Paper • 2509.23808 • Published 28 days ago • 47
Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR Paper • 2509.23808 • Published 28 days ago • 47
Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR Paper • 2509.23808 • Published 28 days ago • 47 • 2
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data Paper • 2509.15221 • Published Sep 18 • 109
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2 • 218
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published Aug 19 • 118 • 6
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published Aug 19 • 118
Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration Paper • 2508.13755 • Published Aug 19 • 14
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published Aug 19 • 118
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published Aug 19 • 118 • 6
Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding Paper • 2507.19427 • Published Jul 25 • 18