Redesign Mixture-of-Experts Routers with Manifold Power Iteration Paper • 2606.12397 • Published 15 days ago • 87
SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research Paper • 2606.09730 • Published 17 days ago • 52
Rethinking Continual Experience Internalization for Self-Evolving LLM Agents Paper • 2606.04703 • Published 22 days ago • 25
LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know? Paper • 2605.28721 • Published 29 days ago • 17
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild Paper • 2603.17187 • Published Mar 17 • 141
AgentProcessBench: Diagnosing Step-Level Process Quality in Tool-Using Agents Paper • 2603.14465 • Published Mar 15 • 23
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation Paper • 2602.12125 • Published Feb 12 • 68
Outcome Accuracy is Not Enough: Aligning the Reasoning Process of Reward Models Paper • 2602.04649 • Published Feb 4 • 13
RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents Paper • 2602.02486 • Published Feb 2 • 20
InT: Self-Proposed Interventions Enable Credit Assignment in LLM Reasoning Paper • 2601.14209 • Published Jan 20 • 6
DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution Paper • 2601.13761 • Published Jan 20 • 16
Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving Paper • 2512.10739 • Published Dec 11, 2025 • 47
Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning Paper • 2512.07461 • Published Dec 8, 2025 • 80
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding Paper • 2510.14943 • Published Oct 16, 2025 • 40
WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models Paper • 2411.05451 • Published Nov 8, 2024 • 2