Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents Paper • 2510.24702 • Published 6 days ago • 23
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search Paper • 2509.25454 • Published Sep 29 • 136
Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum Paper • 2510.00526 • Published Oct 1 • 8