LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries Paper • 2508.15760 • Published Aug 21 • 46
API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs Paper • 2304.08244 • Published Apr 14, 2023 • 1
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs Paper • 2508.16153 • Published Aug 22 • 153
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models Paper • 2507.12806 • Published Jul 17 • 20
AgentDistill: Training-Free Agent Distillation with Generalizable MCP Boxes Paper • 2506.14728 • Published Jun 17
Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First Paper • 2509.00997 • Published Aug 31 • 2
MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools Paper • 2509.09734 • Published Sep 10 • 15
ReAct: Synergizing Reasoning and Acting in Language Models Paper • 2210.03629 • Published Oct 6, 2022 • 29
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use Paper • 2509.24002 • Published 28 days ago • 166