# TraceMind-AI - MCP Integration Guide

This document explains how TraceMind-AI integrates with MCP servers to provide AI-powered agent evaluation.

## Table of Contents

- [Overview](#overview)
- [Dual MCP Integration](#dual-mcp-integration)
- [Architecture](#architecture)
- [MCP Client Implementation](#mcp-client-implementation)
- [Agent Framework Integration](#agent-framework-integration)
- [MCP Tools Usage](#mcp-tools-usage)
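- [Development Guide](#development-guide)
- [Performance Considerations](#performance-considerations)
- [Security Considerations](#security-considerations)
- [Related Documentation](#related-documentation)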

---

## Overview

TraceMind-AI demonstrates **enterprise MCP client usage** as part of the **Track 2: MCP in Action** submission. It showcases two distinct patterns of MCP integration:

1. **Direct MCP Client**: Python-based client connecting to a remote MCP server via SSE transport
2. **Autonomous Agent**: `smolagents`-based agent with access to MCP tools for multi-step reasoning

Both patterns consume the same MCP server ([TraceMind-mcp-server](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server)) to provide AI-powered analysis of agent evaluation data.

---

## Dual MCP Integration

### Pattern 1: Direct MCP Client Integration

**Where**: Leaderboard insights, cost estimation dialogs, trace debugging

**How it works**:
```python
# TraceMind-AI calls MCP server directly
mcp_client = get_sync_mcp_client()
insights = mcp_client.analyze_leaderboard(
    metric_focus="overall",
    time_range="last_week",
    top_n=5
)
# Display insights in UI
```

**Use cases**:
- Generate leaderboard insights when user clicks "Load Leaderboard"
- Estimate costs when user clicks "Estimate Cost" in New Evaluation form
- Debug traces when user asks questions in trace visualization

**Advantages**:
- Direct, fast execution
- Synchronous API (easy to integrate with Gradio)
- Predictable, structured responses

---

### Pattern 2: Autonomous Agent with MCP Tools

**Where**: Agent Chat tab

**How it works**:
```python
# smolagents agent discovers and uses MCP tools autonomously
from smolagents import ToolCallingAgent, MCPClient

# Connect to the MCP server over SSE and fetch its tool definitions
mcp_client = MCPClient({"url": mcp_server_url, "transport": "sse"})

# Agent initialized with the tools discovered from the MCP server
agent = ToolCallingAgent(
    tools=mcp_client.get_tools(),
    model=model_client,
)

# User asks question
result = agent.run("What are the top 3 models and their costs?")

# Agent plans:
#   1. Call get_top_performers MCP tool
#   2. Extract costs from results
#   3. Format and present to user
```

**Use cases**:
- Answer complex questions requiring multi-step analysis
- Compare models across multiple dimensions
- Plan evaluation strategies with cost estimates
- Provide recommendations based on leaderboard data

**Advantages**:
- Natural language interface
- Multi-step reasoning
- Autonomous tool selection
- Context-aware responses

---

## Architecture

### System Overview

```
┌─────────────────────────────────────────────────────────────┐
│ TraceMind-AI (Gradio App) - Track 2                         │
│                                                               │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ UI Layer (Gradio)                                       │ │
│ │  - Leaderboard tab                                      │ │
│ │  - Agent Chat tab                                       │ │
│ │  - New Evaluation tab                                   │ │
│ │  - Trace Visualization tab                              │ │
│ └────────────┬─────────────────────────────┬──────────────┘ │
│              ↓                             ↓                 │
│  ┌───────────────────────┐   ┌──────────────────────────┐  │
│  │ Direct MCP Client     │   │ Autonomous Agent         │  │
│  │ (sync_wrapper.py)     │   │ (smolagents)             │  │
│  │                       │   │                          │  │
│  │ - Synchronous API     │   │ - Multi-step reasoning   │  │
│  │ - Tool calling        │   │ - Tool discovery         │  │
│  │ - Error handling      │   │ - Context management     │  │
│  └───────────┬───────────┘   └─────────────┬────────────┘  │
│              └─────────────────┬─────────────┘               │
│                                ↓                             │
│                         MCP Protocol                         │
│                         (SSE Transport)                      │
└────────────────────────────────┬────────────────────────────┘
                                 ↓
┌─────────────────────────────────────────────────────────────┐
│ TraceMind MCP Server - Track 1                              │
│ https://huggingface.co/spaces/MCP-1st-Birthday/             │
│ TraceMind-mcp-server                                        │
│                                                               │
│ 11 AI-Powered Tools:                                        │
│  - analyze_leaderboard                                      │
│  - debug_trace                                              │
│  - estimate_cost                                            │
│  - compare_runs                                             │
│  - analyze_results                                          │
│  - get_top_performers                                       │
│  - get_leaderboard_summary                                  │
│  - get_dataset                                              │
│  - generate_synthetic_dataset                               │
│  - push_dataset_to_hub                                      │
│  - generate_prompt_template                                 │
└─────────────────────────────────────────────────────────────┘
```

---

## MCP Client Implementation

### File Structure

```
TraceMind-AI/
├── mcp_client/
│   ├── __init__.py
│   ├── client.py              # Async MCP client
│   └── sync_wrapper.py        # Synchronous wrapper for Gradio
├── agent/
│   ├── __init__.py
│   └── smolagents_setup.py    # Agent with MCP integration
└── app.py                     # Main Gradio app
```

### Async MCP Client (`client.py`)

```python
from contextlib import AsyncExitStack

from mcp import ClientSession
from mcp.client.sse import sse_client


class TraceMindMCPClient:
    """Async MCP client for TraceMind MCP Server"""

    def __init__(self, mcp_server_url: str):
        self.mcp_server_url = mcp_server_url
        self.session = None
        self.available_tools = {}
        # Owns the SSE transport and the session so both are closed together
        self._exit_stack = AsyncExitStack()

    async def connect(self):
        """Establish connection to MCP server via SSE"""
        # For HTTP-based MCP servers (HuggingFace Spaces)
        read_stream, write_stream = await self._exit_stack.enter_async_context(
            sse_client(self.mcp_server_url)
        )
        self.session = await self._exit_stack.enter_async_context(
            ClientSession(read_stream, write_stream)
        )
        # Perform the MCP handshake before issuing any requests
        await self.session.initialize()

        # List available tools
        tools_result = await self.session.list_tools()
        self.available_tools = {tool.name: tool for tool in tools_result.tools}

        print(f"Connected to MCP server. Available tools: {list(self.available_tools.keys())}")

    async def call_tool(self, tool_name: str, arguments: dict) -> str:
        """Call an MCP tool with given arguments"""
        if not self.session:
            raise RuntimeError("MCP client not connected. Call connect() first.")

        if tool_name not in self.available_tools:
            raise ValueError(f"Tool '{tool_name}' not available. Available: {list(self.available_tools.keys())}")

        # Call the tool
        result = await self.session.call_tool(tool_name, arguments=arguments)

        # Extract text response (first content item only)
        if result.content and len(result.content) > 0:
            return result.content[0].text
        return ""

    async def analyze_leaderboard(self, **kwargs) -> str:
        """Wrapper for analyze_leaderboard tool"""
        return await self.call_tool("analyze_leaderboard", kwargs)

    async def estimate_cost(self, **kwargs) -> str:
        """Wrapper for estimate_cost tool"""
        return await self.call_tool("estimate_cost", kwargs)

    async def debug_trace(self, **kwargs) -> str:
        """Wrapper for debug_trace tool"""
        return await self.call_tool("debug_trace", kwargs)

    async def compare_runs(self, **kwargs) -> str:
        """Wrapper for compare_runs tool"""
        return await self.call_tool("compare_runs", kwargs)

    async def get_top_performers(self, **kwargs) -> str:
        """Wrapper for get_top_performers tool"""
        return await self.call_tool("get_top_performers", kwargs)

    async def disconnect(self):
        """Close MCP connection"""
        # Closes the session first, then the SSE transport
        await self._exit_stack.aclose()
        self.session = None
```
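
`call_tool` above returns only the first content item. An MCP tool result can carry several content items, and not all of them are text, so a slightly more defensive extraction might look like the sketch below. `TextPart`, `ImagePart`, and `extract_text` are hypothetical names used only for illustration; in the real client the items would come from `result.content`.

```python
from dataclasses import dataclass


@dataclass
class TextPart:
    """Hypothetical stand-in for a text content item."""
    text: str
    type: str = "text"


@dataclass
class ImagePart:
    """Hypothetical stand-in for a non-text content item (no `text` attribute)."""
    data: str
    type: str = "image"


def extract_text(content) -> str:
    """Join the text of every text item, skipping items that have no `text`."""
    parts = [item.text for item in (content or []) if hasattr(item, "text")]
    return "\n".join(parts)


# Mixed content: two text items around one image item
print(extract_text([TextPart("## Insights"), ImagePart("..."), TextPart("Top model: A")]))
```

Whether joining with newlines is the right choice depends on how the server splits its output; this is one reasonable default, not the server's documented contract.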

### Synchronous Wrapper (`sync_wrapper.py`)

```python
import asyncio
import os
from typing import Optional
from .client import TraceMindMCPClient

class SyncMCPClient:
    """Synchronous wrapper for async MCP client (Gradio-compatible)"""

    def __init__(self, mcp_server_url: str):
        self.mcp_server_url = mcp_server_url
        self.async_client = TraceMindMCPClient(mcp_server_url)
        self._connected = False

    def _run_async(self, coro):
        """Run async coroutine in sync context"""
        try:
            loop = asyncio.get_event_loop()
        except RuntimeError:
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)

        return loop.run_until_complete(coro)

    def initialize(self):
        """Connect to MCP server"""
        if not self._connected:
            self._run_async(self.async_client.connect())
            self._connected = True

    def analyze_leaderboard(self, **kwargs) -> str:
        """Synchronous wrapper for analyze_leaderboard"""
        if not self._connected:
            self.initialize()
        return self._run_async(self.async_client.analyze_leaderboard(**kwargs))

    def estimate_cost(self, **kwargs) -> str:
        """Synchronous wrapper for estimate_cost"""
        if not self._connected:
            self.initialize()
        return self._run_async(self.async_client.estimate_cost(**kwargs))

    def debug_trace(self, **kwargs) -> str:
        """Synchronous wrapper for debug_trace"""
        if not self._connected:
            self.initialize()
        return self._run_async(self.async_client.debug_trace(**kwargs))

    # ... (similar wrappers for other tools)

# Global instance for use in Gradio app
_mcp_client: Optional[SyncMCPClient] = None

def get_sync_mcp_client() -> SyncMCPClient:
    """Get or create global sync MCP client instance"""
    global _mcp_client
    if _mcp_client is None:
        mcp_server_url = os.getenv(
            "MCP_SERVER_URL",
            "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"
        )
        _mcp_client = SyncMCPClient(mcp_server_url)
    return _mcp_client
```

### Usage in Gradio App

```python
# app.py
import gradio as gr
from datasets import load_dataset

from mcp_client.sync_wrapper import get_sync_mcp_client

# Initialize MCP client
mcp_client = get_sync_mcp_client()
mcp_client.initialize()

# Use in Gradio event handlers
def load_leaderboard():
    """Load leaderboard and generate AI insights"""
    # Load dataset (assumes the rows live in the default "train" split)
    ds = load_dataset("kshitijthakkar/smoltrace-leaderboard", split="train")
    df = ds.to_pandas()

    # Get AI insights from MCP server
    try:
        insights = mcp_client.analyze_leaderboard(
            metric_focus="overall",
            time_range="last_week",
            top_n=5
        )
    except Exception as e:
        insights = f"❌ Error generating insights: {str(e)}"

    return df, insights

# Gradio UI
with gr.Blocks() as app:
    with gr.Tab("📊 Leaderboard"):
        load_btn = gr.Button("Load Leaderboard")
        insights_md = gr.Markdown(label="AI Insights")
        leaderboard_table = gr.Dataframe()

        load_btn.click(
            fn=load_leaderboard,
            outputs=[leaderboard_table, insights_md]
        )
```

---

## Agent Framework Integration

### smolagents Setup

```python
# agent/smolagents_setup.py
# Note: InferenceClientModel was named HfApiModel in older smolagents releases
from smolagents import ToolCallingAgent, MCPClient, InferenceClientModel
import os

def create_agent():
    """Create smolagents agent with MCP tool access"""

    # 1. Configure MCP client (connects when constructed)
    mcp_server_url = os.getenv(
        "MCP_SERVER_URL",
        "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"
    )

    mcp_client = MCPClient({"url": mcp_server_url, "transport": "sse"})

    # 2. Configure LLM
    model = InferenceClientModel(
        model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
        token=os.getenv("HF_TOKEN")
    )

    # 3. Create agent with the tools discovered from the MCP server
    agent = ToolCallingAgent(
        tools=mcp_client.get_tools(),
        model=model,
        max_steps=10,
        verbosity_level=1
    )

    return agent

def run_agent_query(agent: ToolCallingAgent, query: str, show_reasoning: bool = False):
    """Run agent query and return response"""
    try:
        # Set verbosity based on show_reasoning flag
        if show_reasoning:
            agent.verbosity_level = 2  # Show tool execution logs
        else:
            agent.verbosity_level = 0  # Only show final answer

        # Run agent
        result = agent.run(query)

        return result
    except Exception as e:
        return f"❌ Agent error: {str(e)}"
```

### Agent Chat UI

```python
# app.py
import gradio as gr

from agent.smolagents_setup import create_agent, run_agent_query

# Initialize agent (once at startup)
agent = create_agent()

def agent_chat(message: str, history: list, show_reasoning: bool):
    """Handle agent chat interaction"""
    # Run agent query
    response = run_agent_query(agent, message, show_reasoning)

    # Update chat history
    history.append((message, response))

    return history, ""

# Gradio UI
with gr.Blocks() as app:
    with gr.Tab("🤖 Agent Chat"):
        gr.Markdown("## Autonomous Agent with MCP Tools")
        gr.Markdown("Ask questions about agent evaluations. The agent has access to all MCP tools.")

        chatbot = gr.Chatbot(label="Agent Chat")
        msg = gr.Textbox(label="Your Question", placeholder="What are the top 3 models and their costs?")
        show_reasoning = gr.Checkbox(label="Show Agent Reasoning", value=False)

        # Quick action buttons
        with gr.Row():
            quick_top = gr.Button("Quick: Top Models")
            quick_cost = gr.Button("Quick: Cost Estimate")
            quick_load = gr.Button("Quick: Load Leaderboard")

        # Event handlers
        msg.submit(agent_chat, [msg, chatbot, show_reasoning], [chatbot, msg])

        quick_top.click(
            lambda h, sr: agent_chat(
                "What are the top 5 models by success rate with their costs?",
                h,
                sr
            ),
            [chatbot, show_reasoning],
            [chatbot, msg]
        )
```

---

## MCP Tools Usage

### Tools Used in TraceMind-AI

| Tool | Where Used | Purpose |
|------|-----------|---------|
| `analyze_leaderboard` | Leaderboard tab | Generate AI insights when user loads leaderboard |
| `estimate_cost` | New Evaluation tab | Predict costs before submitting evaluation |
| `debug_trace` | Trace Visualization | Answer questions about execution traces |
| `compare_runs` | Compare Runs/Agent Chat | Compare two evaluation runs side-by-side |
| `analyze_results` | Agent Chat | Analyze detailed test results with optimization recommendations |
| `get_top_performers` | Agent Chat | Efficiently fetch top N models (90% token reduction) |
| `get_leaderboard_summary` | Agent Chat | Get high-level statistics (99% token reduction) |
| `get_dataset` | Agent Chat | Load SMOLTRACE datasets for detailed analysis |

### Example Tool Calls

**Example 1: Leaderboard Insights**
```python
# User clicks "Load Leaderboard" button
insights = mcp_client.analyze_leaderboard(
    leaderboard_repo="kshitijthakkar/smoltrace-leaderboard",
    metric_focus="overall",
    time_range="last_week",
    top_n=5
)

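# Display in Gradio Markdown component: return `insights` from the click
# handler whose outputs include insights_md (as load_leaderboard does).
# Assigning to insights_md.value after the UI is built does not refresh it.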
```

**Example 2: Cost Estimation**
```python
# User fills New Evaluation form and clicks "Estimate Cost"
estimate = mcp_client.estimate_cost(
    model="meta-llama/Llama-3.1-8B",
    agent_type="both",
    num_tests=100,
    hardware="auto"
)

# Display in dialog
gr.Info(estimate)
```

**Example 3: Agent Multi-Step Query**
```python
# User asks: "What are the top 3 models and how much do they cost?"

# Agent reasoning (internal):
#   Step 1: Need to get top models by success rate
#   → Call get_top_performers(metric="success_rate", top_n=3)
#
#   Step 2: Extract cost information from results
#   → Parse JSON response, get "total_cost_usd" field
#
#   Step 3: Format response for user
#   → Create markdown table with model names, success rates, costs

# Agent response:
"""
Here are the top 3 models by success rate:

1. **GPT-4**: 95.8% success rate, $0.05 per run
2. **Claude-3**: 94.1% success rate, $0.04 per run
3. **Llama-3.1-8B**: 93.4% success rate, $0.002 per run

GPT-4 leads in accuracy but is 25x more expensive than Llama-3.1.
For cost-sensitive workloads, Llama-3.1 offers the best value.
"""
```
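
Steps 2 and 3 of the reasoning above (parse the JSON, then build a Markdown table) can be sketched as follows. The response shape is an assumption: a JSON list of objects with `model`, `success_rate`, and `total_cost_usd` keys. Only `total_cost_usd` is named in this guide; the other two keys, the `format_top_performers` helper, and the sample payload are hypothetical.

```python
import json


def format_top_performers(raw_json: str) -> str:
    """Turn an assumed get_top_performers JSON payload into a Markdown table."""
    rows = json.loads(raw_json)
    lines = [
        "| Model | Success rate | Cost (USD) |",
        "|-------|--------------|------------|",
    ]
    for row in rows:
        lines.append(
            f"| {row['model']} | {row['success_rate']:.1f}% | ${row['total_cost_usd']:.3f} |"
        )
    return "\n".join(lines)


# Illustrative payload only; these are not real leaderboard results
sample = json.dumps([
    {"model": "model-a", "success_rate": 95.8, "total_cost_usd": 0.05},
    {"model": "model-b", "success_rate": 93.4, "total_cost_usd": 0.002},
])
print(format_top_performers(sample))
```

In the Agent Chat tab the LLM performs this formatting itself; the sketch is mainly useful when the same tool is called through the direct client and the UI needs a deterministic table.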

---

## Development Guide

### Adding New MCP Tool Integration

1. **Add method to async client** (`client.py`):
```python
async def new_tool_name(self, **kwargs) -> str:
    """Wrapper for new_tool_name MCP tool"""
    return await self.call_tool("new_tool_name", kwargs)
```

2. **Add synchronous wrapper** (`sync_wrapper.py`):
```python
def new_tool_name(self, **kwargs) -> str:
    """Synchronous wrapper for new_tool_name"""
    if not self._connected:
        self.initialize()
    return self._run_async(self.async_client.new_tool_name(**kwargs))
```

3. **Use in Gradio app** (`app.py`):
```python
def handle_new_tool():
    result = mcp_client.new_tool_name(param1="value1", param2="value2")
    return result
```

**Note**: The agent loads its tool list from the MCP server, so new server tools need no changes to the agent code.
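
Steps 1 and 2 above repeat the same boilerplate for every tool. One possible way to reduce it, shown here as a sketch rather than as TraceMind-AI's implementation, is to forward unknown method names to `call_tool` through `__getattr__`. `StubAsyncClient` and `GenericSyncClient` are hypothetical; the stub echoes the call instead of contacting an MCP server.

```python
import asyncio


class StubAsyncClient:
    """Hypothetical stand-in for the async client; echoes the call."""

    async def call_tool(self, tool_name: str, arguments: dict) -> str:
        return f"{tool_name}({sorted(arguments.items())})"


class GenericSyncClient:
    """Forwards any unknown public method name to call_tool as a tool name."""

    def __init__(self, async_client):
        self._async_client = async_client

    def __getattr__(self, tool_name):
        # Only reached when normal attribute lookup fails
        if tool_name.startswith("_"):
            raise AttributeError(tool_name)

        def wrapper(**kwargs) -> str:
            # asyncio.run is enough for the stub; a real MCP session needs
            # one long-lived event loop instead of a fresh loop per call
            return asyncio.run(self._async_client.call_tool(tool_name, kwargs))

        return wrapper


client = GenericSyncClient(StubAsyncClient())
print(client.new_tool_name(param1="value1"))
```

The trade-off is discoverability: explicit wrappers show up in autocompletion and can carry typed signatures, while the generic forwarder accepts any name and only fails at call time.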

### Testing MCP Integration

**Test 1: Connection**
```bash
python -c "from mcp_client.sync_wrapper import get_sync_mcp_client; client = get_sync_mcp_client(); client.initialize(); print('✅ MCP client connected')"
```

**Test 2: Tool Call**
```python
from mcp_client.sync_wrapper import get_sync_mcp_client

client = get_sync_mcp_client()
client.initialize()

result = client.analyze_leaderboard(
    metric_focus="cost",
    time_range="last_week",
    top_n=3
)

print(result)
```

**Test 3: Agent**
```python
from agent.smolagents_setup import create_agent, run_agent_query

agent = create_agent()
response = run_agent_query(agent, "What are the top 3 models?", show_reasoning=True)
print(response)
```

### Debugging MCP Issues

**Issue**: Connection timeout
- **Check**: MCP server is running at specified URL
- **Check**: Network connectivity to HuggingFace Spaces
- **Check**: SSE transport is enabled on server

**Issue**: Tool not found
- **Check**: MCP server has the tool implemented
- **Check**: Tool name matches exactly (case-sensitive)
- **Check**: Client initialized successfully (call `initialize()` first)

**Issue**: Agent not using MCP tools
- **Check**: MCPClient is properly configured in agent setup
- **Check**: Agent has `max_steps > 0` to allow tool usage
- **Check**: Query requires tool usage (not answerable from agent's knowledge alone)
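
For the "Tool not found" case, a small helper can suggest the closest valid name when the lookup fails, which makes case and spelling mistakes easier to spot. This is a sketch using the standard library's `difflib`; `suggest_tool` is a hypothetical helper, and `KNOWN_TOOLS` simply copies the tool names listed in the architecture diagram (a live client would use `available_tools` instead).

```python
import difflib

# Tool names as listed in the architecture diagram
KNOWN_TOOLS = [
    "analyze_leaderboard", "debug_trace", "estimate_cost", "compare_runs",
    "analyze_results", "get_top_performers", "get_leaderboard_summary",
    "get_dataset", "generate_synthetic_dataset", "push_dataset_to_hub",
    "generate_prompt_template",
]


def suggest_tool(name: str, available=KNOWN_TOOLS) -> list:
    """Return the exact match, or up to three close matches for a bad name."""
    if name in available:
        return [name]
    # Lowercase first so that case-only mistakes produce a perfect match
    return difflib.get_close_matches(name.lower(), available, n=3, cutoff=0.6)


print(suggest_tool("Analyze_Leaderboard"))
print(suggest_tool("estimate_costs"))
```

The suggestions could be appended to the `ValueError` message raised in `call_tool`.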

---

## Performance Considerations

### Token Optimization

**Problem**: Loading full leaderboard dataset consumes excessive tokens

**Solution**: Use token-optimized MCP tools

```python
# ❌ BAD: Loads all 51 runs (50K+ tokens)
leaderboard = mcp_client.get_dataset("kshitijthakkar/smoltrace-leaderboard")

# ✅ GOOD: Returns only top 5 (5K tokens, 90% reduction)
top_performers = mcp_client.get_top_performers(top_n=5)

# ✅ BETTER: Returns summary stats (500 tokens, 99% reduction)
summary = mcp_client.get_leaderboard_summary()
```

### Caching

**Problem**: Repeated identical MCP calls waste time and credits

**Solution**: Implement client-side caching

```python
from functools import lru_cache
import time

@lru_cache(maxsize=32)
def cached_analyze_leaderboard(metric_focus: str, time_range: str, top_n: int, cache_key: int):
    """Cached MCP call with TTL via cache_key"""
    return mcp_client.analyze_leaderboard(
        metric_focus=metric_focus,
        time_range=time_range,
        top_n=top_n
    )

# Use with 5-minute cache TTL
cache_key = int(time.time() // 300)  # Changes every 5 minutes
insights = cached_analyze_leaderboard("overall", "last_week", 5, cache_key)
```
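
The `cache_key` trick buckets time into fixed windows, so an entry expires at the next bucket boundary rather than exactly five minutes after it was stored. The sketch below makes that behaviour visible with a simulated call and explicit timestamps; `ttl_bucket`, `cached_call`, and the `calls` list are hypothetical names used for illustration.

```python
from functools import lru_cache

calls = []  # records each simulated MCP call that was not served from cache


@lru_cache(maxsize=32)
def cached_call(metric_focus: str, cache_key: int) -> str:
    """Stand-in for an MCP call; cache_key only exists to vary the cache entry."""
    calls.append((metric_focus, cache_key))
    return f"insights for {metric_focus}"


def ttl_bucket(now: float, ttl_seconds: int = 300) -> int:
    """Map a timestamp to the index of its fixed-size time window."""
    return int(now // ttl_seconds)


cached_call("overall", ttl_bucket(1000.0))  # window 3: cache miss
cached_call("overall", ttl_bucket(1199.0))  # window 3: cache hit
cached_call("overall", ttl_bucket(1200.0))  # window 4: cache miss
print(len(calls))
```

A call made one second before a boundary is therefore cached for only one second. If a strict per-entry lifetime matters, a cache that stores the insertion time alongside the value would be needed instead.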

### Async Optimization

**Problem**: Sequential MCP calls block UI

**Solution**: Use async for parallel calls

```python
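import asyncio

from datasets import load_dataset
from mcp_client.client import TraceMindMCPClient

async def load_leaderboard_with_insights(async_client: TraceMindMCPClient):
    """Load leaderboard and insights in parallel (async_client must be connected)"""
    # load_dataset is blocking, so run it in a worker thread
    leaderboard_task = asyncio.create_task(
        asyncio.to_thread(load_dataset, "kshitijthakkar/smoltrace-leaderboard")
    )
    # The async client returns a coroutine, which can be scheduled directly
    insights_task = asyncio.create_task(async_client.analyze_leaderboard(metric_focus="overall"))

    # Wait for both to complete
    leaderboard, insights = await asyncio.gather(leaderboard_task, insights_task)

    return leaderboard, insights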
```

---

## Security Considerations

### API Key Management

**DO**:
- Store API keys in environment variables or HF Spaces secrets
- Use session-only storage in Gradio (not server-side persistence)
- Rotate keys regularly

**DON'T**:
- Hardcode API keys in source code
- Expose keys in client-side JavaScript
- Log API keys in console or files
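
When a key has to appear in a log line or an error message, masking it first keeps the "don't log API keys" rule intact while still letting you tell keys apart. A minimal sketch follows; `mask_secret` is a hypothetical helper and the token shown is a made-up placeholder.

```python
def mask_secret(value: str, visible: int = 4) -> str:
    """Keep the last `visible` characters and replace the rest with asterisks."""
    if not value:
        return ""
    if len(value) <= visible:
        # Too short to reveal anything safely
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Placeholder value, not a real token
print(mask_secret("hf_exampleTOKEN1234"))
```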

### MCP Server Trust

**Verify MCP server authenticity**:
- Use HTTPS URLs only
- Verify domain ownership (huggingface.co spaces)
- Review MCP server code before connecting (open source)
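
The first two checks can be enforced in code before the client connects. The sketch below rejects non-HTTPS URLs and hosts outside an allowlist; the allowlist values and the `is_trusted_mcp_url` helper are assumptions for illustration, and a URL check of this kind is a filter, not a substitute for reviewing the server.

```python
from urllib.parse import urlparse

# Assumed allowlist for illustration; adjust to the hosts you actually trust
TRUSTED_HOSTS = ("huggingface.co",)
TRUSTED_HOST_SUFFIXES = (".hf.space", ".huggingface.co")


def is_trusted_mcp_url(url: str) -> bool:
    """Accept only HTTPS URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme != "https" or not host:
        return False
    # Suffixes start with a dot so "evilhf.space" does not match ".hf.space"
    return host in TRUSTED_HOSTS or host.endswith(TRUSTED_HOST_SUFFIXES)


print(is_trusted_mcp_url(
    "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"
))
```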

**Limit tool access**:
- Only connect to trusted MCP servers
- Review tool permissions before use
- Implement rate limiting for tool calls
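
Rate limiting for tool calls can be done on the client with a sliding window: remember the timestamps of recent calls and refuse a new call once the window is full. The sketch below is one simple way to do this; `SlidingWindowLimiter` is a hypothetical class, and the limits shown are arbitrary examples.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `period_seconds` window."""

    def __init__(self, max_calls: int, period_seconds: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period_seconds
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._timestamps = deque()

    def allow(self) -> bool:
        """Return True and record the call if it fits in the current window."""
        now = self._clock()
        # Drop timestamps that have fallen out of the window
        while self._timestamps and now - self._timestamps[0] >= self.period:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_calls:
            self._timestamps.append(now)
            return True
        return False


limiter = SlidingWindowLimiter(max_calls=2, period_seconds=60)
print([limiter.allow() for _ in range(3)])
```

A handler would call `limiter.allow()` before each MCP request and return a "please wait" message when it gets `False`. This class is not thread-safe as written; wrapping `allow()` in a `threading.Lock` would be needed if Gradio handlers run concurrently.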

---

## Related Documentation

- [USER_GUIDE.md](USER_GUIDE.md) - Complete UI walkthrough
- [ARCHITECTURE.md](ARCHITECTURE.md) - Technical architecture
- [TraceMind MCP Server Documentation](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server)

---

**Last Updated**: November 21, 2025