CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions Paper • 2505.18878 • Published May 24 • 2
UserBench: An Interactive Gym Environment for User-Centric Agents Paper • 2507.22034 • Published Jul 29 • 29
UserRL: Training Interactive User-Centric Agent via Reinforcement Learning Paper • 2509.19736 • Published Sep 24 • 11
Enterprise Deep Research: Steerable Multi-Agent Deep Research for Enterprise Analytics Paper • 2510.17797 • Published 11 days ago • 8
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay Paper • 2504.03601 • Published Apr 4 • 17
InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback Paper • 2306.14898 • Published Jun 26, 2023
xLAM: A Family of Large Action Models to Empower AI Agent Systems Paper • 2409.03215 • Published Sep 5, 2024 • 6
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments Paper • 2411.02305 • Published Nov 4, 2024 • 1
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding Paper • 2411.04282 • Published Nov 6, 2024 • 37
SpecTool: A Benchmark for Characterizing Errors in Tool-Use LLMs Paper • 2411.13547 • Published Nov 20, 2024
ActionStudio: A Lightweight Framework for Data and Training of Large Action Models Paper • 2503.22673 • Published Mar 28 • 12