| # TraceMind-AI - MCP Integration Guide | |
| This document explains how TraceMind-AI integrates with MCP servers to provide AI-powered agent evaluation. | |
| ## Table of Contents | |
| - [Overview](#overview) | |
| - [Dual MCP Integration](#dual-mcp-integration) | |
| - [Architecture](#architecture) | |
| - [MCP Client Implementation](#mcp-client-implementation) | |
| - [Agent Framework Integration](#agent-framework-integration) | |
| - [MCP Tools Usage](#mcp-tools-usage) | |
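| - [Development Guide](#development-guide) | |
| - [Performance Considerations](#performance-considerations) | |
| - [Security Considerations](#security-considerations) | |
| - [Related Documentation](#related-documentation) | |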
| --- | |
| ## Overview | |
| TraceMind-AI demonstrates **enterprise MCP client usage** as part of the **Track 2: MCP in Action** submission. It showcases two distinct patterns of MCP integration: | |
| 1. **Direct MCP Client**: Python-based client connecting to remote MCP server via SSE transport | |
| 2. **Autonomous Agent**: `smolagents`-based agent with access to MCP tools for multi-step reasoning | |
| Both patterns consume the same MCP server ([TraceMind-mcp-server](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server)) to provide AI-powered analysis of agent evaluation data. | |
| --- | |
| ## Dual MCP Integration | |
| ### Pattern 1: Direct MCP Client Integration | |
| **Where**: Leaderboard insights, cost estimation dialogs, trace debugging | |
| **How it works**: | |
| ```python | |
| # TraceMind-AI calls MCP server directly | |
| mcp_client = get_sync_mcp_client() | |
| insights = mcp_client.analyze_leaderboard( | |
| metric_focus="overall", | |
| time_range="last_week", | |
| top_n=5 | |
| ) | |
| # Display insights in UI | |
| ``` | |
| **Use cases**: | |
| - Generate leaderboard insights when user clicks "Load Leaderboard" | |
| - Estimate costs when user clicks "Estimate Cost" in New Evaluation form | |
| - Debug traces when user asks questions in trace visualization | |
| **Advantages**: | |
| - Direct, fast execution | |
| - Synchronous API (easy to integrate with Gradio) | |
| - Predictable, structured responses | |
| --- | |
| ### Pattern 2: Autonomous Agent with MCP Tools | |
| **Where**: Agent Chat tab | |
| **How it works**: | |
| ```python | |
| # smolagents agent discovers and uses MCP tools autonomously | |
| from smolagents import ToolCallingAgent, MCPClient | |
| # Connect to the MCP server over SSE and load its tools | |
| mcp_client = MCPClient({"url": mcp_server_url, "transport": "sse"}) | |
| # Agent initialized with the tools discovered on the MCP server | |
| agent = ToolCallingAgent( | |
|     tools=mcp_client.get_tools(), | |
|     model=model_client, | |
| ) | |
| # User asks question | |
| result = agent.run("What are the top 3 models and their costs?") | |
| # Agent plans: | |
| # 1. Call get_top_performers MCP tool | |
| # 2. Extract costs from results | |
| # 3. Format and present to user | |
| ``` | |
| **Use cases**: | |
| - Answer complex questions requiring multi-step analysis | |
| - Compare models across multiple dimensions | |
| - Plan evaluation strategies with cost estimates | |
| - Provide recommendations based on leaderboard data | |
| **Advantages**: | |
| - Natural language interface | |
| - Multi-step reasoning | |
| - Autonomous tool selection | |
| - Context-aware responses | |
| --- | |
| ## Architecture | |
| ### System Overview | |
| ``` | |
| ┌─────────────────────────────────────────────────────────────┐ | |
| │             TraceMind-AI (Gradio App) - Track 2             │ | |
| │                                                             │ | |
| │ ┌─────────────────────────────────────────────────────────┐ │ | |
| │ │  UI Layer (Gradio)                                      │ │ | |
| │ │  - Leaderboard tab                                      │ │ | |
| │ │  - Agent Chat tab                                       │ │ | |
| │ │  - New Evaluation tab                                   │ │ | |
| │ │  - Trace Visualization tab                              │ │ | |
| │ └────────────┬─────────────────────────────┬──────────────┘ │ | |
| │              ↓                             ↓                │ | |
| │ ┌───────────────────────┐     ┌──────────────────────────┐  │ | |
| │ │ Direct MCP Client     │     │ Autonomous Agent         │  │ | |
| │ │ (sync_wrapper.py)     │     │ (smolagents)             │  │ | |
| │ │                       │     │                          │  │ | |
| │ │ - Synchronous API     │     │ - Multi-step reasoning   │  │ | |
| │ │ - Tool calling        │     │ - Tool discovery         │  │ | |
| │ │ - Error handling      │     │ - Context management     │  │ | |
| │ └───────────┬───────────┘     └─────────────┬────────────┘  │ | |
| │             └─────────────────┬─────────────┘               │ | |
| │                               ↓                             │ | |
| │                         MCP Protocol                        │ | |
| │                        (SSE Transport)                      │ | |
| └────────────────────────────────┬────────────────────────────┘ | |
|                                  ↓ | |
| ┌─────────────────────────────────────────────────────────────┐ | |
| │               TraceMind MCP Server - Track 1                │ | |
| │       https://huggingface.co/spaces/MCP-1st-Birthday/       │ | |
| │       TraceMind-mcp-server                                  │ | |
| │                                                             │ | |
| │  11 AI-Powered Tools:                                       │ | |
| │  - analyze_leaderboard                                      │ | |
| │  - debug_trace                                              │ | |
| │  - estimate_cost                                            │ | |
| │  - compare_runs                                             │ | |
| │  - analyze_results                                          │ | |
| │  - get_top_performers                                       │ | |
| │  - get_leaderboard_summary                                  │ | |
| │  - get_dataset                                              │ | |
| │  - generate_synthetic_dataset                               │ | |
| │  - push_dataset_to_hub                                      │ | |
| │  - generate_prompt_template                                 │ | |
| └─────────────────────────────────────────────────────────────┘ | |
| ``` | |
| --- | |
| ## MCP Client Implementation | |
| ### File Structure | |
| ``` | |
| TraceMind-AI/ | |
| ├── mcp_client/ | |
| │ ├── __init__.py | |
| │ ├── client.py # Async MCP client | |
| │ └── sync_wrapper.py # Synchronous wrapper for Gradio | |
| ├── agent/ | |
| │ ├── __init__.py | |
| │ └── smolagents_setup.py # Agent with MCP integration | |
| └── app.py # Main Gradio app | |
| ``` | |
| ### Async MCP Client (`client.py`) | |
| ```python | |
| from contextlib import AsyncExitStack | |
| from typing import Optional | |
| from mcp import ClientSession | |
| from mcp.client.sse import sse_client | |
| class TraceMindMCPClient: | |
|     """Async MCP client for TraceMind MCP Server""" | |
|     def __init__(self, mcp_server_url: str): | |
|         self.mcp_server_url = mcp_server_url | |
|         self.session: Optional[ClientSession] = None | |
|         self.available_tools = {} | |
|         # Keeps the SSE streams and the session open until disconnect() | |
|         self._exit_stack = AsyncExitStack() | |
|     async def connect(self): | |
|         """Establish connection to MCP server via SSE""" | |
|         # For HTTP-based MCP servers (HuggingFace Spaces): open the SSE streams first | |
|         read_stream, write_stream = await self._exit_stack.enter_async_context( | |
|             sse_client(self.mcp_server_url) | |
|         ) | |
|         self.session = await self._exit_stack.enter_async_context( | |
|             ClientSession(read_stream, write_stream) | |
|         ) | |
|         # Run the MCP handshake before any other request | |
|         await self.session.initialize() | |
|         # List available tools | |
|         tools_result = await self.session.list_tools() | |
|         self.available_tools = {tool.name: tool for tool in tools_result.tools} | |
|         print(f"Connected to MCP server. Available tools: {list(self.available_tools.keys())}") | |
|     async def call_tool(self, tool_name: str, arguments: dict) -> str: | |
|         """Call an MCP tool with given arguments""" | |
|         if not self.session: | |
|             raise RuntimeError("MCP client not connected. Call connect() first.") | |
|         if tool_name not in self.available_tools: | |
|             raise ValueError(f"Tool '{tool_name}' not available. Available: {list(self.available_tools.keys())}") | |
|         # Call the tool | |
|         result = await self.session.call_tool(tool_name, arguments=arguments) | |
|         # Extract text response | |
|         if result.content and len(result.content) > 0: | |
|             return result.content[0].text | |
|         return "" | |
|     async def analyze_leaderboard(self, **kwargs) -> str: | |
|         """Wrapper for analyze_leaderboard tool""" | |
|         return await self.call_tool("analyze_leaderboard", kwargs) | |
|     async def estimate_cost(self, **kwargs) -> str: | |
|         """Wrapper for estimate_cost tool""" | |
|         return await self.call_tool("estimate_cost", kwargs) | |
|     async def debug_trace(self, **kwargs) -> str: | |
|         """Wrapper for debug_trace tool""" | |
|         return await self.call_tool("debug_trace", kwargs) | |
|     async def compare_runs(self, **kwargs) -> str: | |
|         """Wrapper for compare_runs tool""" | |
|         return await self.call_tool("compare_runs", kwargs) | |
|     async def get_top_performers(self, **kwargs) -> str: | |
|         """Wrapper for get_top_performers tool""" | |
|         return await self.call_tool("get_top_performers", kwargs) | |
|     async def disconnect(self): | |
|         """Close MCP connection""" | |
|         if self.session: | |
|             await self._exit_stack.aclose() | |
|             self.session = None | |
| ``` | |
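`call_tool` returns only `result.content[0].text`, so any further content items in a response are dropped. As a hedged sketch of a more defensive extraction, the helper below joins every item that carries a non-empty `text` attribute. `extract_text` is a hypothetical name, and the `SimpleNamespace` objects only stand in for MCP content items; they are not real server output.

```python
from types import SimpleNamespace


def extract_text(content_items) -> str:
    """Join the text of every content item that has a non-empty `text` attribute."""
    parts = []
    for item in content_items or []:
        text = getattr(item, "text", None)
        if text:
            parts.append(text)
    return "\n".join(parts)


# Stand-ins for MCP content items (hypothetical data)
items = [
    SimpleNamespace(type="text", text="Top model: A"),
    SimpleNamespace(type="image", data="..."),
    SimpleNamespace(type="text", text="Cost: low"),
]
combined = extract_text(items)
```

Non-text items are skipped rather than raising, which keeps the return type a plain string for the Gradio components that display it.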
| ### Synchronous Wrapper (`sync_wrapper.py`) | |
| ```python | |
| import asyncio | |
| import os | |
| from typing import Optional | |
| from .client import TraceMindMCPClient | |
| class SyncMCPClient: | |
|     """Synchronous wrapper for async MCP client (Gradio-compatible)""" | |
|     def __init__(self, mcp_server_url: str): | |
|         self.mcp_server_url = mcp_server_url | |
|         self.async_client = TraceMindMCPClient(mcp_server_url) | |
|         self._connected = False | |
|     def _run_async(self, coro): | |
|         """Run async coroutine in sync context""" | |
|         try: | |
|             loop = asyncio.get_event_loop() | |
|         except RuntimeError: | |
|             loop = asyncio.new_event_loop() | |
|             asyncio.set_event_loop(loop) | |
|         return loop.run_until_complete(coro) | |
|     def initialize(self): | |
|         """Connect to MCP server""" | |
|         if not self._connected: | |
|             self._run_async(self.async_client.connect()) | |
|             self._connected = True | |
|     def analyze_leaderboard(self, **kwargs) -> str: | |
|         """Synchronous wrapper for analyze_leaderboard""" | |
|         if not self._connected: | |
|             self.initialize() | |
|         return self._run_async(self.async_client.analyze_leaderboard(**kwargs)) | |
|     def estimate_cost(self, **kwargs) -> str: | |
|         """Synchronous wrapper for estimate_cost""" | |
|         if not self._connected: | |
|             self.initialize() | |
|         return self._run_async(self.async_client.estimate_cost(**kwargs)) | |
|     def debug_trace(self, **kwargs) -> str: | |
|         """Synchronous wrapper for debug_trace""" | |
|         if not self._connected: | |
|             self.initialize() | |
|         return self._run_async(self.async_client.debug_trace(**kwargs)) | |
|     # ... (similar wrappers for other tools) | |
| # Global instance for use in Gradio app | |
| _mcp_client: Optional[SyncMCPClient] = None | |
| def get_sync_mcp_client() -> SyncMCPClient: | |
|     """Get or create global sync MCP client instance""" | |
|     global _mcp_client | |
|     if _mcp_client is None: | |
|         mcp_server_url = os.getenv( | |
|             "MCP_SERVER_URL", | |
|             "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse" | |
|         ) | |
|         _mcp_client = SyncMCPClient(mcp_server_url) | |
|     return _mcp_client | |
| ``` | |
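`_run_async` reuses whichever event loop belongs to the calling thread. If handlers run in worker threads, each thread may end up with its own loop while the MCP session stays bound to the loop that opened it. One possible alternative, sketched here under that assumption, is a single background loop that every thread submits coroutines to. `BackgroundLoop` and `_double` are hypothetical names used only for illustration; this is not the wrapper the project ships.

```python
import asyncio
import threading


class BackgroundLoop:
    """Owns one event loop that runs forever in a daemon thread."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro, timeout=None):
        """Submit a coroutine to the shared loop and block until it finishes."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout)

    def stop(self):
        """Stop the loop, wait for its thread to exit, then release the loop."""
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


async def _double(x):
    # Stand-in for an MCP call such as async_client.analyze_leaderboard(...)
    await asyncio.sleep(0)
    return x * 2


runner = BackgroundLoop()
results = []
# Call from two different worker threads, as UI handlers might
workers = [
    threading.Thread(target=lambda n=n: results.append(runner.run(_double(n))))
    for n in (1, 2)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
runner.stop()
```

Because every coroutine runs on the same loop, a session opened in `connect()` would be used from the loop that created it regardless of which thread made the call.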
| ### Usage in Gradio App | |
| ```python | |
| # app.py | |
| import gradio as gr | |
| from datasets import load_dataset | |
| from mcp_client.sync_wrapper import get_sync_mcp_client | |
| # Initialize MCP client | |
| mcp_client = get_sync_mcp_client() | |
| mcp_client.initialize() | |
| # Use in Gradio event handlers | |
| def load_leaderboard(): | |
|     """Load leaderboard and generate AI insights""" | |
|     # Load dataset (assumes the leaderboard rows live in the "train" split) | |
|     ds = load_dataset("kshitijthakkar/smoltrace-leaderboard", split="train") | |
|     df = ds.to_pandas() | |
|     # Get AI insights from MCP server | |
|     try: | |
|         insights = mcp_client.analyze_leaderboard( | |
|             metric_focus="overall", | |
|             time_range="last_week", | |
|             top_n=5 | |
|         ) | |
|     except Exception as e: | |
|         insights = f"❌ Error generating insights: {str(e)}" | |
|     return df, insights | |
| # Gradio UI | |
| with gr.Blocks() as app: | |
|     with gr.Tab("📊 Leaderboard"): | |
|         load_btn = gr.Button("Load Leaderboard") | |
|         insights_md = gr.Markdown(label="AI Insights") | |
|         leaderboard_table = gr.Dataframe() | |
|         load_btn.click( | |
|             fn=load_leaderboard, | |
|             outputs=[leaderboard_table, insights_md] | |
|         ) | |
| ``` | |
| --- | |
| ## Agent Framework Integration | |
| ### smolagents Setup | |
| ```python | |
| # agent/smolagents_setup.py | |
| from smolagents import ToolCallingAgent, MCPClient, HfApiModel | |
| import os | |
| def create_agent(): | |
|     """Create smolagents agent with MCP tool access""" | |
|     # 1. Connect to the MCP server over SSE | |
|     mcp_server_url = os.getenv( | |
|         "MCP_SERVER_URL", | |
|         "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse" | |
|     ) | |
|     mcp_client = MCPClient({"url": mcp_server_url, "transport": "sse"}) | |
|     # 2. Configure LLM (newer smolagents releases name this class InferenceClientModel) | |
|     model = HfApiModel( | |
|         model_id="Qwen/Qwen2.5-Coder-32B-Instruct", | |
|         token=os.getenv("HF_TOKEN") | |
|     ) | |
|     # 3. Create agent with the tools discovered on the MCP server | |
|     agent = ToolCallingAgent( | |
|         tools=mcp_client.get_tools(), | |
|         model=model, | |
|         max_steps=10, | |
|         verbosity_level=1 | |
|     ) | |
|     return agent | |
| def run_agent_query(agent: ToolCallingAgent, query: str, show_reasoning: bool = False): | |
|     """Run agent query and return response""" | |
|     try: | |
|         # Set verbosity based on show_reasoning flag | |
|         if show_reasoning: | |
|             agent.verbosity_level = 2 # Show tool execution logs | |
|         else: | |
|             agent.verbosity_level = 0 # Only show final answer | |
|         # Run agent | |
|         result = agent.run(query) | |
|         return result | |
|     except Exception as e: | |
|         return f"❌ Agent error: {str(e)}" | |
| ``` | |
| ### Agent Chat UI | |
| ```python | |
| # app.py | |
| import gradio as gr | |
| from agent.smolagents_setup import create_agent, run_agent_query | |
| # Initialize agent (once at startup) | |
| agent = create_agent() | |
| def agent_chat(message: str, history: list, show_reasoning: bool): | |
|     """Handle agent chat interaction""" | |
|     # Run agent query | |
|     response = run_agent_query(agent, message, show_reasoning) | |
|     # Update chat history | |
|     history.append((message, response)) | |
|     return history, "" | |
| # Gradio UI | |
| with gr.Blocks() as app: | |
|     with gr.Tab("🤖 Agent Chat"): | |
|         gr.Markdown("## Autonomous Agent with MCP Tools") | |
|         gr.Markdown("Ask questions about agent evaluations. The agent has access to all MCP tools.") | |
|         chatbot = gr.Chatbot(label="Agent Chat") | |
|         msg = gr.Textbox(label="Your Question", placeholder="What are the top 3 models and their costs?") | |
|         show_reasoning = gr.Checkbox(label="Show Agent Reasoning", value=False) | |
|         # Quick action buttons | |
|         with gr.Row(): | |
|             quick_top = gr.Button("Quick: Top Models") | |
|             quick_cost = gr.Button("Quick: Cost Estimate") | |
|             quick_load = gr.Button("Quick: Load Leaderboard") | |
|         # Event handlers | |
|         msg.submit(agent_chat, [msg, chatbot, show_reasoning], [chatbot, msg]) | |
|         quick_top.click( | |
|             lambda h, sr: agent_chat( | |
|                 "What are the top 5 models by success rate with their costs?", | |
|                 h, | |
|                 sr | |
|             ), | |
|             [chatbot, show_reasoning], | |
|             [chatbot, msg] | |
|         ) | |
| ``` | |
| --- | |
| ## MCP Tools Usage | |
| ### Tools Used in TraceMind-AI | |
| | Tool | Where Used | Purpose | | |
| |------|-----------|---------| | |
| | `analyze_leaderboard` | Leaderboard tab | Generate AI insights when user loads leaderboard | | |
| | `estimate_cost` | New Evaluation tab | Predict costs before submitting evaluation | | |
| | `debug_trace` | Trace Visualization | Answer questions about execution traces | | |
| | `compare_runs` | Compare Runs/Agent Chat | Compare two evaluation runs side-by-side | | |
| | `analyze_results` | Agent Chat | Analyze detailed test results with optimization recommendations | | |
| | `get_top_performers` | Agent Chat | Efficiently fetch top N models (90% token reduction) | | |
| | `get_leaderboard_summary` | Agent Chat | Get high-level statistics (99% token reduction) | | |
| | `get_dataset` | Agent Chat | Load SMOLTRACE datasets for detailed analysis | | |
| ### Example Tool Calls | |
| **Example 1: Leaderboard Insights** | |
| ```python | |
| # User clicks "Load Leaderboard" button | |
| insights = mcp_client.analyze_leaderboard( | |
| leaderboard_repo="kshitijthakkar/smoltrace-leaderboard", | |
| metric_focus="overall", | |
| time_range="last_week", | |
| top_n=5 | |
| ) | |
| # Displayed by returning `insights` from the click handler whose outputs include `insights_md` | |
| ``` | |
| **Example 2: Cost Estimation** | |
| ```python | |
| # User fills New Evaluation form and clicks "Estimate Cost" | |
| estimate = mcp_client.estimate_cost( | |
| model="meta-llama/Llama-3.1-8B", | |
| agent_type="both", | |
| num_tests=100, | |
| hardware="auto" | |
| ) | |
| # Display in dialog | |
| gr.Info(estimate) | |
| ``` | |
| **Example 3: Agent Multi-Step Query** | |
| ```python | |
| # User asks: "What are the top 3 models and how much do they cost?" | |
| # Agent reasoning (internal): | |
| # Step 1: Need to get top models by success rate | |
| # → Call get_top_performers(metric="success_rate", top_n=3) | |
| # | |
| # Step 2: Extract cost information from results | |
| # → Parse JSON response, get "total_cost_usd" field | |
| # | |
| # Step 3: Format response for user | |
| # → Create markdown table with model names, success rates, costs | |
| # Agent response: | |
| """ | |
| Here are the top 3 models by success rate: | |
| 1. **GPT-4**: 95.8% success rate, $0.05 per run | |
| 2. **Claude-3**: 94.1% success rate, $0.04 per run | |
| 3. **Llama-3.1-8B**: 93.4% success rate, $0.002 per run | |
| GPT-4 leads in accuracy but is 25x more expensive than Llama-3.1. | |
| For cost-sensitive workloads, Llama-3.1 offers the best value. | |
| """ | |
| ``` | |
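Steps 2 and 3 of the agent's plan (parse the JSON, then build a markdown table) can be sketched as below. Only the `total_cost_usd` field is named in this guide; the `model` and `success_rate` keys, the `format_top_models` helper, and the sample payload are assumptions made for illustration, and the real tool response may be shaped differently.

```python
import json


def format_top_models(raw_json: str) -> str:
    """Turn a JSON list of runs into a markdown table."""
    runs = json.loads(raw_json)
    lines = ["| Model | Success rate | Cost (USD) |", "|---|---|---|"]
    for run in runs:
        lines.append(
            f"| {run['model']} | {run['success_rate']:.1f}% | ${run['total_cost_usd']:.3f} |"
        )
    return "\n".join(lines)


# Hypothetical payload, not real leaderboard data
sample = json.dumps([
    {"model": "model-a", "success_rate": 95.8, "total_cost_usd": 0.05},
    {"model": "model-b", "success_rate": 93.4, "total_cost_usd": 0.002},
])
table = format_top_models(sample)
```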
| --- | |
| ## Development Guide | |
| ### Adding New MCP Tool Integration | |
| 1. **Add method to async client** (`client.py`): | |
| ```python | |
| async def new_tool_name(self, **kwargs) -> str: | |
|     """Wrapper for new_tool_name MCP tool""" | |
|     return await self.call_tool("new_tool_name", kwargs) | |
| ``` | |
| 2. **Add synchronous wrapper** (`sync_wrapper.py`): | |
| ```python | |
| def new_tool_name(self, **kwargs) -> str: | |
|     """Synchronous wrapper for new_tool_name""" | |
|     if not self._connected: | |
|         self.initialize() | |
|     return self._run_async(self.async_client.new_tool_name(**kwargs)) | |
| ``` | |
| 3. **Use in Gradio app** (`app.py`): | |
| ```python | |
| def handle_new_tool(): | |
|     result = mcp_client.new_tool_name(param1="value1", param2="value2") | |
|     return result | |
| ``` | |
| **Note**: The agent discovers new tools from the MCP server automatically, so no agent code changes are needed. | |
| ### Testing MCP Integration | |
| **Test 1: Connection** | |
| ```bash | |
| python -c "from mcp_client.sync_wrapper import get_sync_mcp_client; client = get_sync_mcp_client(); client.initialize(); print('✅ MCP client connected')" | |
| ``` | |
| **Test 2: Tool Call** | |
| ```python | |
| from mcp_client.sync_wrapper import get_sync_mcp_client | |
| client = get_sync_mcp_client() | |
| client.initialize() | |
| result = client.analyze_leaderboard( | |
| metric_focus="cost", | |
| time_range="last_week", | |
| top_n=3 | |
| ) | |
| print(result) | |
| ``` | |
| **Test 3: Agent** | |
| ```python | |
| from agent.smolagents_setup import create_agent, run_agent_query | |
| agent = create_agent() | |
| response = run_agent_query(agent, "What are the top 3 models?", show_reasoning=True) | |
| print(response) | |
| ``` | |
| ### Debugging MCP Issues | |
| **Issue**: Connection timeout | |
| - **Check**: MCP server is running at specified URL | |
| - **Check**: Network connectivity to HuggingFace Spaces | |
| - **Check**: SSE transport is enabled on server | |
| **Issue**: Tool not found | |
| - **Check**: MCP server has the tool implemented | |
| - **Check**: Tool name matches exactly (case-sensitive) | |
| - **Check**: Client initialized successfully (call `initialize()` first) | |
| **Issue**: Agent not using MCP tools | |
| - **Check**: MCPClient is properly configured in agent setup | |
| - **Check**: Agent has `max_steps > 0` to allow tool usage | |
| - **Check**: Query requires tool usage (not answerable from agent's knowledge alone) | |
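For the "Tool not found" case, the name check can be automated. The sketch below compares a requested name with the names the server reported and distinguishes a case mismatch from a likely typo. `diagnose_tool_name` is a hypothetical helper, and the tool list is hard-coded here rather than fetched from a live server.

```python
import difflib


def diagnose_tool_name(requested: str, available: list[str]) -> str:
    """Explain why a tool lookup would fail, if it would."""
    if requested in available:
        return "ok"
    # Tool names are case-sensitive, so check for a case-only difference first
    by_lower = {name.lower(): name for name in available}
    if requested.lower() in by_lower:
        return f"case mismatch: did you mean '{by_lower[requested.lower()]}'?"
    close = difflib.get_close_matches(requested, available, n=1)
    if close:
        return f"not found: closest match is '{close[0]}'"
    return "not found: no similar tool on the server"


available_tools = ["analyze_leaderboard", "debug_trace", "estimate_cost"]
report = diagnose_tool_name("Debug_Trace", available_tools)
```

In the client shown earlier, `list(self.available_tools.keys())` would supply the `available` argument after `connect()` has run.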
| --- | |
| ## Performance Considerations | |
| ### Token Optimization | |
| **Problem**: Loading full leaderboard dataset consumes excessive tokens | |
| **Solution**: Use token-optimized MCP tools | |
| ```python | |
| # ❌ BAD: Loads all 51 runs (50K+ tokens) | |
| leaderboard = mcp_client.get_dataset("kshitijthakkar/smoltrace-leaderboard") | |
| # ✅ GOOD: Returns only top 5 (5K tokens, 90% reduction) | |
| top_performers = mcp_client.get_top_performers(top_n=5) | |
| # ✅ BETTER: Returns summary stats (500 tokens, 99% reduction) | |
| summary = mcp_client.get_leaderboard_summary() | |
| ``` | |
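The reduction percentages quoted in the comments follow directly from the token counts. A small check, using the approximate figures from this section (50K, 5K, and 500 tokens) and a hypothetical `token_reduction` helper:

```python
def token_reduction(baseline_tokens: int, optimized_tokens: int) -> float:
    """Percentage of tokens saved relative to the baseline."""
    return 100.0 * (baseline_tokens - optimized_tokens) / baseline_tokens


full_dataset = 50_000  # approximate size of the full leaderboard payload
top5_saving = token_reduction(full_dataset, 5_000)    # get_top_performers(top_n=5)
summary_saving = token_reduction(full_dataset, 500)   # get_leaderboard_summary()
```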
| ### Caching | |
| **Problem**: Repeated identical MCP calls waste time and credits | |
| **Solution**: Implement client-side caching | |
| ```python | |
| from functools import lru_cache | |
| import time | |
| @lru_cache(maxsize=32) | |
| def cached_analyze_leaderboard(metric_focus: str, time_range: str, top_n: int, cache_key: int): | |
|     """Cached MCP call with TTL via cache_key""" | |
|     return mcp_client.analyze_leaderboard( | |
|         metric_focus=metric_focus, | |
|         time_range=time_range, | |
|         top_n=top_n | |
|     ) | |
| # Use with 5-minute cache TTL | |
| cache_key = int(time.time() // 300) # Changes every 5 minutes | |
| insights = cached_analyze_leaderboard("overall", "last_week", 5, cache_key) | |
| ``` | |
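To see how the `cache_key` bucket behaves, the sketch below swaps the MCP call for a counting stand-in. `fake_analyze`, `cached_analyze`, and `bucket` are hypothetical names, and the timestamps are fixed numbers rather than real clock readings.

```python
from functools import lru_cache

calls = []


def fake_analyze(metric_focus: str) -> str:
    """Stand-in for the MCP call; records each real invocation."""
    calls.append(metric_focus)
    return f"insights for {metric_focus}"


@lru_cache(maxsize=32)
def cached_analyze(metric_focus: str, cache_key: int) -> str:
    # cache_key is unused in the body; it only partitions the cache by time bucket
    return fake_analyze(metric_focus)


def bucket(now: float, ttl_seconds: int = 300) -> int:
    """Map a timestamp to its TTL bucket."""
    return int(now // ttl_seconds)


cached_analyze("overall", bucket(1000.0))  # bucket 3: real call
cached_analyze("overall", bucket(1199.0))  # bucket 3: served from cache
cached_analyze("overall", bucket(1200.0))  # bucket 4: real call again
```

Entries expire at bucket boundaries rather than a fixed time after they were stored, so a result cached late in a bucket is refreshed sooner than five minutes later.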
| ### Async Optimization | |
| **Problem**: Sequential MCP calls block UI | |
| **Solution**: Use async for parallel calls | |
| ```python | |
| import asyncio | |
| from datasets import load_dataset | |
| async def load_leaderboard_with_insights(async_client): | |
|     """Load leaderboard and insights in parallel""" | |
|     # `async_client` is a connected TraceMindMCPClient; the sync wrapper cannot be awaited | |
|     # load_dataset is blocking, so run it in a worker thread | |
|     leaderboard_task = asyncio.create_task( | |
|         asyncio.to_thread(load_dataset, "kshitijthakkar/smoltrace-leaderboard") | |
|     ) | |
|     insights_task = asyncio.create_task(async_client.analyze_leaderboard(metric_focus="overall")) | |
|     # Wait for both to complete | |
|     leaderboard, insights = await asyncio.gather(leaderboard_task, insights_task) | |
|     return leaderboard, insights | |
| ``` | |
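The same pattern can be tried without a network connection. In this sketch, `slow_load` stands in for a blocking dataset load and `slow_insights` for an awaitable MCP call; both are hypothetical and simply sleep for 0.2 seconds. Run concurrently, the pair should take roughly as long as one call instead of the sum of both.

```python
import asyncio
import time


def slow_load(name: str) -> str:
    """Blocking stand-in for a dataset load."""
    time.sleep(0.2)
    return f"dataset:{name}"


async def slow_insights(metric_focus: str) -> str:
    """Awaitable stand-in for an MCP tool call."""
    await asyncio.sleep(0.2)
    return f"insights:{metric_focus}"


async def load_both():
    # The blocking call goes to a worker thread; the coroutine runs on the loop
    return await asyncio.gather(
        asyncio.to_thread(slow_load, "leaderboard"),
        slow_insights("overall"),
    )


start = time.perf_counter()
leaderboard, insights = asyncio.run(load_both())
elapsed = time.perf_counter() - start
```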
| --- | |
| ## Security Considerations | |
| ### API Key Management | |
| **DO**: | |
| - Store API keys in environment variables or HF Spaces secrets | |
| - Use session-only storage in Gradio (not server-side persistence) | |
| - Rotate keys regularly | |
| **DON'T**: | |
| - Hardcode API keys in source code | |
| - Expose keys in client-side JavaScript | |
| - Log API keys in console or files | |
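A minimal sketch of the first and last points: read the key from the environment and mask it before it reaches a log line. `load_api_key`, `mask_secret`, and the `EXAMPLE_API_KEY` variable are hypothetical names, and the value set below is a placeholder for the demo only.

```python
import os


def load_api_key(env_var: str) -> str:
    """Read a key from the environment and fail loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key


def mask_secret(secret: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a secret."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


os.environ["EXAMPLE_API_KEY"] = "hf_example_1234"  # placeholder value for the demo
masked = mask_secret(load_api_key("EXAMPLE_API_KEY"))
```

Logging `masked` instead of the raw key still lets an operator tell which key is in use without exposing it.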
| ### MCP Server Trust | |
| **Verify MCP server authenticity**: | |
| - Use HTTPS URLs only | |
| - Verify domain ownership (huggingface.co spaces) | |
| - Review MCP server code before connecting (open source) | |
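The first two checks can be enforced before a client connects. The sketch below accepts only HTTPS URLs whose host is on an allowlist. `is_trusted_mcp_url` is a hypothetical helper, and treating `hf.space` and `huggingface.co` as the trusted domains is an assumption based on the URLs used in this guide.

```python
from urllib.parse import urlparse

# Assumed allowlist; adjust to the servers you actually trust
ALLOWED_DOMAINS = ("hf.space", "huggingface.co")


def is_trusted_mcp_url(url: str) -> bool:
    """Accept only HTTPS URLs whose host is an allowed domain or a subdomain of one."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    host = parsed.hostname.lower()
    return any(host == domain or host.endswith("." + domain) for domain in ALLOWED_DOMAINS)


default_url = "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"
trusted = is_trusted_mcp_url(default_url)
```

Matching on `"." + domain` rather than a bare suffix rejects look-alike hosts such as `nothf.space`.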
| **Limit tool access**: | |
| - Only connect to trusted MCP servers | |
| - Review tool permissions before use | |
| - Implement rate limiting for tool calls | |
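Rate limiting for tool calls could take many forms. One simple option, sketched here, is a sliding-window limiter that the caller consults before each MCP call. `RateLimiter` is a hypothetical class, and the injectable `clock` exists only so the example can run without waiting.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `max_calls` within any window of `period_seconds`."""

    def __init__(self, max_calls: int, period_seconds: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period_seconds
        self.clock = clock
        self._calls = deque()

    def allow(self) -> bool:
        """Return True and record the call if it fits in the current window."""
        now = self.clock()
        # Drop timestamps that have aged out of the window
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False


fake_now = [0.0]  # controllable clock for the demo
limiter = RateLimiter(max_calls=2, period_seconds=60.0, clock=lambda: fake_now[0])
decisions = [limiter.allow(), limiter.allow(), limiter.allow()]
fake_now[0] = 61.0
decisions.append(limiter.allow())
```

A handler would call `limiter.allow()` first and return a "try again later" message instead of invoking the tool when it returns `False`.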
| --- | |
| ## Related Documentation | |
| - [USER_GUIDE.md](USER_GUIDE.md) - Complete UI walkthrough | |
| - [ARCHITECTURE.md](ARCHITECTURE.md) - Technical architecture | |
| - [TraceMind MCP Server Documentation](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server) | |
| --- | |
| **Last Updated**: November 21, 2025 | |