- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 30
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 141
- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
  Paper • 2504.13837 • Published • 135
- Learning to Reason under Off-Policy Guidance
  Paper • 2504.14945 • Published • 88

Collections including paper arxiv:2510.08558

- Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
  Paper • 2407.20798 • Published • 24
- Offline Reinforcement Learning for LLM Multi-Step Reasoning
  Paper • 2412.16145 • Published • 38
- REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
  Paper • 2501.03262 • Published • 102
- SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
  Paper • 2502.18449 • Published • 75

- MacroBench: A Novel Testbed for Web Automation Scripts via Large Language Models
  Paper • 2510.04363 • Published
- Control Plane as a Tool: A Scalable Design Pattern for Agentic AI Systems
  Paper • 2505.06817 • Published
- Agentic Web: Weaving the Next Web with AI Agents
  Paper • 2507.21206 • Published
- Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning
  Paper • 2410.02052 • Published • 9

- Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
  Paper • 2508.09789 • Published • 5
- MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
  Paper • 2508.13186 • Published • 18
- ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
  Paper • 2508.04038 • Published • 1
- Prompt Orchestration Markup Language
  Paper • 2508.13948 • Published • 48

- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 246
- Regret Bounds for Markov Decision Processes with Recursive Optimized Certainty Equivalents
  Paper • 2301.12601 • Published
- Bayesian Risk Markov Decision Processes
  Paper • 2106.02558 • Published
- Metrics for Markov Decision Processes with Infinite State Spaces
  Paper • 1207.1386 • Published

- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 451
- Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
  Paper • 2509.25541 • Published • 136
- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 246
- DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
  Paper • 2509.25454 • Published • 134