MCP - a jbejar86 Collection

jbejar86 's Collections

MCP

updated Oct 2, 2025

LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries

Paper • 2508.15760 • Published Aug 21, 2025 • 47
LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?

Paper • 2508.01780 • Published Aug 3, 2025 • 21
API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs

Paper • 2304.08244 • Published Apr 14, 2023 • 1
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22, 2025 • 162
Memp: Exploring Agent Procedural Memory

Paper • 2508.06433 • Published Aug 8, 2025 • 36
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models

Paper • 2507.12806 • Published Jul 17, 2025 • 21
Survey on Evaluation of LLM-based Agents

Paper • 2503.16416 • Published Mar 20, 2025 • 97
AgentBench: Evaluating LLMs as Agents

Paper • 2308.03688 • Published Aug 7, 2023 • 26
PlanGenLLMs: A Modern Survey of LLM Planning Capabilities

Paper • 2502.11221 • Published Feb 16, 2025 • 1
AgentDistill: Training-Free Agent Distillation with Generalizable MCP Boxes

Paper • 2506.14728 • Published Jun 17, 2025
Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First

Paper • 2509.00997 • Published Aug 31, 2025 • 2
Small Language Models are the Future of Agentic AI

Paper • 2506.02153 • Published Jun 2, 2025 • 24
MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools

Paper • 2509.09734 • Published Sep 10, 2025 • 16
Virtual Agent Economies

Paper • 2509.10147 • Published Sep 12, 2025 • 27
ReAct: Synergizing Reasoning and Acting in Language Models

Paper • 2210.03629 • Published Oct 6, 2022 • 35
ARE: Scaling Up Agent Environments and Evaluations

Paper • 2509.17158 • Published Sep 21, 2025 • 36
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use

Paper • 2509.24002 • Published Sep 28, 2025 • 180