Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2510.04678

Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 30
DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18 • 142
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18 • 136
Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published Apr 21 • 88

Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward

Paper • 2510.03222 • Published Oct 3 • 74
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Paper • 2510.05592 • Published Oct 7 • 102
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6 • 489
Multi-Agent Tool-Integrated Policy Optimization

Paper • 2510.04678 • Published Oct 6 • 30

A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

Paper • 2508.07407 • Published Aug 10 • 97
Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published Aug 5 • 120
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

Paper • 2508.05748 • Published Aug 7 • 139
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Paper • 2508.13167 • Published Aug 6 • 127

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2 • 9
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Paper • 2504.08600 • Published Apr 11 • 32
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Paper • 2503.23157 • Published Mar 29 • 10
AI Agents: Evolution, Architecture, and Real-World Applications

Paper • 2503.12687 • Published Mar 16 • 2

AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning

Paper • 2402.15506 • Published Feb 23, 2024 • 18
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent

Paper • 2404.03648 • Published Apr 4, 2024 • 30
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts

Paper • 2405.19893 • Published May 30, 2024 • 33
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable

Paper • 2405.19888 • Published May 30, 2024 • 7

SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models

Paper • 2506.04180 • Published Jun 4 • 33
AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation

Paper • 2506.10540 • Published Jun 12 • 37
AutoMind: Adaptive Knowledgeable Agent for Automated Data Science

Paper • 2506.10974 • Published Jun 12 • 19
SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search

Paper • 2507.15245 • Published Jul 21 • 11

Open Data Synthesis For Deep Research

Paper • 2509.00375 • Published Aug 30 • 69
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

Paper • 2509.03403 • Published Sep 3 • 22
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations

Paper • 2509.03405 • Published Sep 3 • 23
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs

Paper • 2509.00930 • Published Aug 31 • 4

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8 • 254 • 98
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 88

Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

Paper • 2411.03562 • Published Nov 5, 2024 • 68
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning

Paper • 2502.06060 • Published Feb 9 • 38
MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published Feb 20 • 192
SurveyX: Academic Survey Automation via Large Language Models

Paper • 2502.14776 • Published Feb 20 • 100

Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 30
DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18 • 142
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18 • 136
Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published Apr 21 • 88

SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models

Paper • 2506.04180 • Published Jun 4 • 33
AniMaker: Automated Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation

Paper • 2506.10540 • Published Jun 12 • 37
AutoMind: Adaptive Knowledgeable Agent for Automated Data Science

Paper • 2506.10974 • Published Jun 12 • 19
SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search

Paper • 2507.15245 • Published Jul 21 • 11

Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward

Paper • 2510.03222 • Published Oct 3 • 74
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Paper • 2510.05592 • Published Oct 7 • 102
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6 • 489
Multi-Agent Tool-Integrated Policy Optimization

Paper • 2510.04678 • Published Oct 6 • 30

Open Data Synthesis For Deep Research

Paper • 2509.00375 • Published Aug 30 • 69
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

Paper • 2509.03403 • Published Sep 3 • 22
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations

Paper • 2509.03405 • Published Sep 3 • 23
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs

Paper • 2509.00930 • Published Aug 31 • 4

A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

Paper • 2508.07407 • Published Aug 10 • 97
Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published Aug 5 • 120
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

Paper • 2508.05748 • Published Aug 7 • 139
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Paper • 2508.13167 • Published Aug 6 • 127

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8 • 254 • 98
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 88

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2 • 9
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Paper • 2504.08600 • Published Apr 11 • 32
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Paper • 2503.23157 • Published Mar 29 • 10
AI Agents: Evolution, Architecture, and Real-World Applications

Paper • 2503.12687 • Published Mar 16 • 2

Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

Paper • 2411.03562 • Published Nov 5, 2024 • 68
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning

Paper • 2502.06060 • Published Feb 9 • 38
MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published Feb 20 • 192
SurveyX: Academic Survey Automation via Large Language Models

Paper • 2502.14776 • Published Feb 20 • 100

AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning

Paper • 2402.15506 • Published Feb 23, 2024 • 18
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent

Paper • 2404.03648 • Published Apr 4, 2024 • 30
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts

Paper • 2405.19893 • Published May 30, 2024 • 33
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable

Paper • 2405.19888 • Published May 30, 2024 • 7

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs