P1: Mastering Physics Olympiads with Reinforcement Learning Paper • 2511.13612 • Published 3 days ago • 121
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling Paper • 2511.11793 • Published 6 days ago • 139
Panda LLM: Training Data and Evaluation for Open-Sourced Chinese Instruction-Following Large Language Models Paper • 2305.03025 • Published May 4, 2023
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models Paper • 2505.00551 • Published May 1 • 36
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19 • 132
First Try Matters: Revisiting the Role of Reflection in Reasoning Models Paper • 2510.08308 • Published Oct 9 • 24
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward Paper • 2510.03222 • Published Oct 3 • 74
First Try Matters: Revisiting the Role of Reflection in Reasoning Models Paper • 2510.08308 • Published Oct 9 • 24
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19 • 132
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models Paper • 2505.00551 • Published May 1 • 36