Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models • Paper • 2510.04618 • Published Oct 6, 2025
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use • Paper • 2509.24002 • Published Sep 28, 2025
ARE: Scaling Up Agent Environments and Evaluations • Paper • 2509.17158 • Published Sep 21, 2025
MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools • Paper • 2509.09734 • Published Sep 10, 2025
AgentDistill: Training-Free Agent Distillation with Generalizable MCP Boxes • Paper • 2506.14728 • Published Jun 17, 2025
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models • Paper • 2507.12806 • Published Jul 17, 2025