# TraceMind-AI - MCP Integration Guide
This document explains how TraceMind-AI integrates with MCP servers to provide AI-powered agent evaluation.
## Table of Contents
- [Overview](#overview)
- [Dual MCP Integration](#dual-mcp-integration)
- [Architecture](#architecture)
- [MCP Client Implementation](#mcp-client-implementation)
- [Agent Framework Integration](#agent-framework-integration)
- [MCP Tools Usage](#mcp-tools-usage)
- [Development Guide](#development-guide)
- [Performance Considerations](#performance-considerations)
- [Security Considerations](#security-considerations)
- [Related Documentation](#related-documentation)
---
## Overview
TraceMind-AI demonstrates **enterprise MCP client usage** as part of the **Track 2: MCP in Action** submission. It showcases two distinct patterns of MCP integration:
1. **Direct MCP Client**: Python-based client connecting to remote MCP server via SSE transport
2. **Autonomous Agent**: `smolagents`-based agent with access to MCP tools for multi-step reasoning
Both patterns consume the same MCP server ([TraceMind-mcp-server](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server)) to provide AI-powered analysis of agent evaluation data.
---
## Dual MCP Integration
### Pattern 1: Direct MCP Client Integration
**Where**: Leaderboard insights, cost estimation dialogs, trace debugging
**How it works**:
```python
# TraceMind-AI calls the MCP server directly
mcp_client = get_sync_mcp_client()
insights = mcp_client.analyze_leaderboard(
    metric_focus="overall",
    time_range="last_week",
    top_n=5,
)

# Display the insights in the UI
```
**Use cases**:
- Generate leaderboard insights when user clicks "Load Leaderboard"
- Estimate costs when user clicks "Estimate Cost" in New Evaluation form
- Debug traces when user asks questions in trace visualization
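The trace-debugging case, for instance, is just another direct call. A sketch (the `trace_id` and `question` parameter names here are illustrative assumptions, not the server's documented signature):

```python
# Hypothetical direct call for trace debugging; parameter names are assumed
mcp_client = get_sync_mcp_client()
answer = mcp_client.debug_trace(
    trace_id="run-123",
    question="Why did step 4 fail?",
)
```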
**Advantages**:
- Direct, fast execution
- Synchronous API (easy to integrate with Gradio)
- Predictable, structured responses
---
### Pattern 2: Autonomous Agent with MCP Tools
**Where**: Agent Chat tab
**How it works**:
```python
# A smolagents agent discovers and uses MCP tools autonomously
from smolagents import ToolCallingAgent, MCPClient

# Tools are discovered from the MCP server at startup
mcp_client = MCPClient({"url": mcp_server_url})
agent = ToolCallingAgent(
    tools=mcp_client.get_tools(),  # Tools loaded from the MCP server
    model=model_client,
)

# User asks a question
result = agent.run("What are the top 3 models and their costs?")

# Agent plan:
# 1. Call the get_top_performers MCP tool
# 2. Extract costs from the results
# 3. Format and present the answer to the user
```
**Use cases**:
- Answer complex questions requiring multi-step analysis
- Compare models across multiple dimensions
- Plan evaluation strategies with cost estimates
- Provide recommendations based on leaderboard data
**Advantages**:
- Natural language interface
- Multi-step reasoning
- Autonomous tool selection
- Context-aware responses
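For example, a multi-dimensional comparison that would take several manual tool calls is a single query (illustrative; the model names are examples from the leaderboard):

```python
# One natural-language query; the agent chains MCP tool calls as needed
result = agent.run(
    "Compare Llama-3.1-8B and GPT-4 on success rate, cost, and latency, "
    "then recommend one for a budget-constrained batch job."
)
print(result)
```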
---
## Architecture
### System Overview
```
┌─────────────────────────────────────────────────────────────┐
│ TraceMind-AI (Gradio App) - Track 2 │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ UI Layer (Gradio) │ │
│ │ - Leaderboard tab │ │
│ │ - Agent Chat tab │ │
│ │ - New Evaluation tab │ │
│ │ - Trace Visualization tab │ │
│ └────────────┬─────────────────────────────┬──────────────┘ │
│ ↓ ↓ │
│ ┌───────────────────────┐ ┌──────────────────────────┐ │
│ │ Direct MCP Client │ │ Autonomous Agent │ │
│ │ (sync_wrapper.py) │ │ (smolagents) │ │
│ │ │ │ │ │
│ │ - Synchronous API │ │ - Multi-step reasoning │ │
│ │ - Tool calling │ │ - Tool discovery │ │
│ │ - Error handling │ │ - Context management │ │
│ └───────────┬───────────┘ └─────────────┬────────────┘ │
│ └─────────────────┬─────────────┘ │
│ ↓ │
│ MCP Protocol │
│ (SSE Transport) │
└────────────────────────────────┬────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ TraceMind MCP Server - Track 1 │
│ https://huggingface.co/spaces/MCP-1st-Birthday/ │
│ TraceMind-mcp-server │
│ │
│ 11 AI-Powered Tools: │
│ - analyze_leaderboard │
│ - debug_trace │
│ - estimate_cost │
│ - compare_runs │
│ - analyze_results │
│ - get_top_performers │
│ - get_leaderboard_summary │
│ - get_dataset │
│ - generate_synthetic_dataset │
│ - push_dataset_to_hub │
│ - generate_prompt_template │
└─────────────────────────────────────────────────────────────┘
```
---
## MCP Client Implementation
### File Structure
```
TraceMind-AI/
├── mcp_client/
│ ├── __init__.py
│ ├── client.py # Async MCP client
│ └── sync_wrapper.py # Synchronous wrapper for Gradio
├── agent/
│ ├── __init__.py
│ └── smolagents_setup.py # Agent with MCP integration
└── app.py # Main Gradio app
```
### Async MCP Client (`client.py`)
```python
from contextlib import AsyncExitStack
from typing import Optional

from mcp import ClientSession
from mcp.client.sse import sse_client


class TraceMindMCPClient:
    """Async MCP client for the TraceMind MCP Server"""

    def __init__(self, mcp_server_url: str):
        self.mcp_server_url = mcp_server_url
        self.session: Optional[ClientSession] = None
        self.available_tools: dict = {}
        self._exit_stack = AsyncExitStack()

    async def connect(self):
        """Establish a connection to the MCP server via SSE"""
        # For HTTP-based MCP servers (e.g. HuggingFace Spaces), the official
        # mcp SDK exposes SSE via mcp.client.sse.sse_client, which yields the
        # read/write streams that back a ClientSession
        read_stream, write_stream = await self._exit_stack.enter_async_context(
            sse_client(self.mcp_server_url)
        )
        self.session = await self._exit_stack.enter_async_context(
            ClientSession(read_stream, write_stream)
        )
        await self.session.initialize()

        # List available tools
        tools_result = await self.session.list_tools()
        self.available_tools = {tool.name: tool for tool in tools_result.tools}
        print(f"Connected to MCP server. Available tools: {list(self.available_tools.keys())}")

    async def call_tool(self, tool_name: str, arguments: dict) -> str:
        """Call an MCP tool with the given arguments"""
        if not self.session:
            raise RuntimeError("MCP client not connected. Call connect() first.")
        if tool_name not in self.available_tools:
            raise ValueError(
                f"Tool '{tool_name}' not available. "
                f"Available: {list(self.available_tools.keys())}"
            )

        # Call the tool
        result = await self.session.call_tool(tool_name, arguments=arguments)

        # Extract the text response
        if result.content and len(result.content) > 0:
            return result.content[0].text
        return ""

    async def analyze_leaderboard(self, **kwargs) -> str:
        """Wrapper for the analyze_leaderboard tool"""
        return await self.call_tool("analyze_leaderboard", kwargs)

    async def estimate_cost(self, **kwargs) -> str:
        """Wrapper for the estimate_cost tool"""
        return await self.call_tool("estimate_cost", kwargs)

    async def debug_trace(self, **kwargs) -> str:
        """Wrapper for the debug_trace tool"""
        return await self.call_tool("debug_trace", kwargs)

    async def compare_runs(self, **kwargs) -> str:
        """Wrapper for the compare_runs tool"""
        return await self.call_tool("compare_runs", kwargs)

    async def get_top_performers(self, **kwargs) -> str:
        """Wrapper for the get_top_performers tool"""
        return await self.call_tool("get_top_performers", kwargs)

    async def disconnect(self):
        """Close the MCP connection"""
        await self._exit_stack.aclose()
        self.session = None
```
### Synchronous Wrapper (`sync_wrapper.py`)
```python
import asyncio
import os
from typing import Optional

from .client import TraceMindMCPClient


class SyncMCPClient:
    """Synchronous wrapper for the async MCP client (Gradio-compatible)"""

    def __init__(self, mcp_server_url: str):
        self.mcp_server_url = mcp_server_url
        self.async_client = TraceMindMCPClient(mcp_server_url)
        self._connected = False

    def _run_async(self, coro):
        """Run an async coroutine in a sync context"""
        try:
            loop = asyncio.get_event_loop()
        except RuntimeError:
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)
        return loop.run_until_complete(coro)

    def initialize(self):
        """Connect to the MCP server"""
        if not self._connected:
            self._run_async(self.async_client.connect())
            self._connected = True

    def analyze_leaderboard(self, **kwargs) -> str:
        """Synchronous wrapper for analyze_leaderboard"""
        if not self._connected:
            self.initialize()
        return self._run_async(self.async_client.analyze_leaderboard(**kwargs))

    def estimate_cost(self, **kwargs) -> str:
        """Synchronous wrapper for estimate_cost"""
        if not self._connected:
            self.initialize()
        return self._run_async(self.async_client.estimate_cost(**kwargs))

    def debug_trace(self, **kwargs) -> str:
        """Synchronous wrapper for debug_trace"""
        if not self._connected:
            self.initialize()
        return self._run_async(self.async_client.debug_trace(**kwargs))

    # ... (similar wrappers for the other tools)


# Global instance for use in the Gradio app
_mcp_client: Optional[SyncMCPClient] = None


def get_sync_mcp_client() -> SyncMCPClient:
    """Get or create the global sync MCP client instance"""
    global _mcp_client
    if _mcp_client is None:
        mcp_server_url = os.getenv(
            "MCP_SERVER_URL",
            "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse",
        )
        _mcp_client = SyncMCPClient(mcp_server_url)
    return _mcp_client
```
### Usage in Gradio App
```python
# app.py
import gradio as gr
from datasets import load_dataset

from mcp_client.sync_wrapper import get_sync_mcp_client

# Initialize the MCP client
mcp_client = get_sync_mcp_client()
mcp_client.initialize()


# Use in Gradio event handlers
def load_leaderboard():
    """Load the leaderboard and generate AI insights"""
    # Load the dataset (assuming the default "train" split)
    ds = load_dataset("kshitijthakkar/smoltrace-leaderboard", split="train")
    df = ds.to_pandas()

    # Get AI insights from the MCP server
    try:
        insights = mcp_client.analyze_leaderboard(
            metric_focus="overall",
            time_range="last_week",
            top_n=5,
        )
    except Exception as e:
        insights = f"❌ Error generating insights: {e}"

    return df, insights


# Gradio UI
with gr.Blocks() as app:
    with gr.Tab("📊 Leaderboard"):
        load_btn = gr.Button("Load Leaderboard")
        insights_md = gr.Markdown(label="AI Insights")
        leaderboard_table = gr.Dataframe()

        load_btn.click(
            fn=load_leaderboard,
            outputs=[leaderboard_table, insights_md],
        )
```
---
## Agent Framework Integration
### smolagents Setup
```python
# agent/smolagents_setup.py
import os

from smolagents import ToolCallingAgent, MCPClient, HfApiModel


def create_agent():
    """Create a smolagents agent with MCP tool access"""
    # 1. Configure the MCP client
    mcp_server_url = os.getenv(
        "MCP_SERVER_URL",
        "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse",
    )
    mcp_client = MCPClient({"url": mcp_server_url})

    # 2. Configure the LLM
    model = HfApiModel(
        model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
        token=os.getenv("HF_TOKEN"),
    )

    # 3. Create the agent with tools discovered from the MCP server
    agent = ToolCallingAgent(
        tools=mcp_client.get_tools(),  # MCP tools loaded from the server
        model=model,
        max_steps=10,
        verbosity_level=1,
    )
    return agent


def run_agent_query(agent: ToolCallingAgent, query: str, show_reasoning: bool = False):
    """Run an agent query and return the response"""
    try:
        # Set verbosity based on the show_reasoning flag
        if show_reasoning:
            agent.verbosity_level = 2  # Show tool execution logs
        else:
            agent.verbosity_level = 0  # Only show the final answer

        # Run the agent
        result = agent.run(query)
        return result
    except Exception as e:
        return f"❌ Agent error: {e}"
```
### Agent Chat UI
```python
# app.py
import gradio as gr

from agent.smolagents_setup import create_agent, run_agent_query

# Initialize the agent (once at startup)
agent = create_agent()


def agent_chat(message: str, history: list, show_reasoning: bool):
    """Handle an agent chat interaction"""
    # Run the agent query
    response = run_agent_query(agent, message, show_reasoning)

    # Update the chat history
    history.append((message, response))
    return history, ""


# Gradio UI
with gr.Blocks() as app:
    with gr.Tab("🤖 Agent Chat"):
        gr.Markdown("## Autonomous Agent with MCP Tools")
        gr.Markdown("Ask questions about agent evaluations. The agent has access to all MCP tools.")

        chatbot = gr.Chatbot(label="Agent Chat")
        msg = gr.Textbox(label="Your Question", placeholder="What are the top 3 models and their costs?")
        show_reasoning = gr.Checkbox(label="Show Agent Reasoning", value=False)

        # Quick action buttons
        with gr.Row():
            quick_top = gr.Button("Quick: Top Models")
            quick_cost = gr.Button("Quick: Cost Estimate")
            quick_load = gr.Button("Quick: Load Leaderboard")

        # Event handlers
        msg.submit(agent_chat, [msg, chatbot, show_reasoning], [chatbot, msg])
        quick_top.click(
            lambda h, sr: agent_chat(
                "What are the top 5 models by success rate with their costs?",
                h,
                sr,
            ),
            [chatbot, show_reasoning],
            [chatbot, msg],
        )
```
---
## MCP Tools Usage
### Tools Used in TraceMind-AI
| Tool | Where Used | Purpose |
|------|-----------|---------|
| `analyze_leaderboard` | Leaderboard tab | Generate AI insights when user loads leaderboard |
| `estimate_cost` | New Evaluation tab | Predict costs before submitting evaluation |
| `debug_trace` | Trace Visualization | Answer questions about execution traces |
| `compare_runs` | Compare Runs/Agent Chat | Compare two evaluation runs side-by-side |
| `analyze_results` | Agent Chat | Analyze detailed test results with optimization recommendations |
| `get_top_performers` | Agent Chat | Efficiently fetch top N models (90% token reduction) |
| `get_leaderboard_summary` | Agent Chat | Get high-level statistics (99% token reduction) |
| `get_dataset` | Agent Chat | Load SMOLTRACE datasets for detailed analysis |
### Example Tool Calls
**Example 1: Leaderboard Insights**
```python
# User clicks the "Load Leaderboard" button
insights = mcp_client.analyze_leaderboard(
    leaderboard_repo="kshitijthakkar/smoltrace-leaderboard",
    metric_focus="overall",
    time_range="last_week",
    top_n=5,
)

# Display in a Gradio Markdown component
insights_md.value = insights
```
**Example 2: Cost Estimation**
```python
# User fills in the New Evaluation form and clicks "Estimate Cost"
estimate = mcp_client.estimate_cost(
    model="meta-llama/Llama-3.1-8B",
    agent_type="both",
    num_tests=100,
    hardware="auto",
)

# Display in a dialog
gr.Info(estimate)
```
**Example 3: Agent Multi-Step Query**
```python
# User asks: "What are the top 3 models and how much do they cost?"
# Agent reasoning (internal):
# Step 1: Need to get top models by success rate
# → Call get_top_performers(metric="success_rate", top_n=3)
#
# Step 2: Extract cost information from results
# → Parse JSON response, get "total_cost_usd" field
#
# Step 3: Format response for user
# → Create markdown table with model names, success rates, costs
# Agent response:
"""
Here are the top 3 models by success rate:
1. **GPT-4**: 95.8% success rate, $0.05 per run
2. **Claude-3**: 94.1% success rate, $0.04 per run
3. **Llama-3.1-8B**: 93.4% success rate, $0.002 per run
GPT-4 leads in accuracy but is 25x more expensive than Llama-3.1.
For cost-sensitive workloads, Llama-3.1 offers the best value.
"""
```
---
## Development Guide
### Adding New MCP Tool Integration
1. **Add method to async client** (`client.py`):
```python
async def new_tool_name(self, **kwargs) -> str:
    """Wrapper for the new_tool_name MCP tool"""
    return await self.call_tool("new_tool_name", kwargs)
```
2. **Add synchronous wrapper** (`sync_wrapper.py`):
```python
def new_tool_name(self, **kwargs) -> str:
    """Synchronous wrapper for new_tool_name"""
    if not self._connected:
        self.initialize()
    return self._run_async(self.async_client.new_tool_name(**kwargs))
```
3. **Use in Gradio app** (`app.py`):
```python
def handle_new_tool():
    result = mcp_client.new_tool_name(param1="value1", param2="value2")
    return result
```
**Note**: The agent discovers tools from the MCP server automatically (via `get_tools()` at startup), so no agent-side code changes are needed; recreating the agent picks up newly added tools.
### Testing MCP Integration
**Test 1: Connection**
```bash
python -c "from mcp_client.sync_wrapper import get_sync_mcp_client; client = get_sync_mcp_client(); client.initialize(); print('✅ MCP client connected')"
```
**Test 2: Tool Call**
```python
from mcp_client.sync_wrapper import get_sync_mcp_client
client = get_sync_mcp_client()
client.initialize()
result = client.analyze_leaderboard(
    metric_focus="cost",
    time_range="last_week",
    top_n=3,
)
print(result)
```
**Test 3: Agent**
```python
from agent.smolagents_setup import create_agent, run_agent_query
agent = create_agent()
response = run_agent_query(agent, "What are the top 3 models?", show_reasoning=True)
print(response)
```
### Debugging MCP Issues
**Issue**: Connection timeout
- **Check**: MCP server is running at specified URL
- **Check**: Network connectivity to HuggingFace Spaces
- **Check**: SSE transport is enabled on server
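A quick reachability sketch for the first check (any HTTP client works; `requests` is assumed here):

```python
import requests

# The SSE endpoint should answer with HTTP 200 before the MCP client connects
url = "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse"
resp = requests.get(url, stream=True, timeout=10)
print(resp.status_code)  # expect 200 if the Space is up
resp.close()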
**Issue**: Tool not found
- **Check**: MCP server has the tool implemented
- **Check**: Tool name matches exactly (case-sensitive)
- **Check**: Client initialized successfully (call `initialize()` first)
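If the connection succeeded, the client already cached the server's tool list during `connect()` (see `client.py` above), so you can inspect the exact names:

```python
from mcp_client.sync_wrapper import get_sync_mcp_client

client = get_sync_mcp_client()
client.initialize()
# available_tools is populated by connect() in client.py
print(sorted(client.async_client.available_tools.keys()))
```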
**Issue**: Agent not using MCP tools
- **Check**: MCPClient is properly configured in agent setup
- **Check**: Agent has `max_steps > 0` to allow tool usage
- **Check**: Query requires tool usage (not answerable from agent's knowledge alone)
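To confirm the agent actually received tools from the server, print what `get_tools()` returns (a sketch, assuming the dict-style server parameters used in the agent setup above):

```python
import os

from smolagents import MCPClient

mcp_server_url = os.getenv(
    "MCP_SERVER_URL",
    "https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse",
)
# Print the tool names the agent would receive from the MCP server
print([tool.name for tool in MCPClient({"url": mcp_server_url}).get_tools()])
```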
---
## Performance Considerations
### Token Optimization
**Problem**: Loading full leaderboard dataset consumes excessive tokens
**Solution**: Use token-optimized MCP tools
```python
# ❌ BAD: Loads all 51 runs (50K+ tokens)
leaderboard = mcp_client.get_dataset("kshitijthakkar/smoltrace-leaderboard")
# ✅ GOOD: Returns only top 5 (5K tokens, 90% reduction)
top_performers = mcp_client.get_top_performers(top_n=5)
# ✅ BETTER: Returns summary stats (500 tokens, 99% reduction)
summary = mcp_client.get_leaderboard_summary()
```
### Caching
**Problem**: Repeated identical MCP calls waste time and credits
**Solution**: Implement client-side caching
```python
import time
from functools import lru_cache


@lru_cache(maxsize=32)
def cached_analyze_leaderboard(metric_focus: str, time_range: str, top_n: int, cache_key: int):
    """Cached MCP call; cache_key provides a coarse TTL"""
    return mcp_client.analyze_leaderboard(
        metric_focus=metric_focus,
        time_range=time_range,
        top_n=top_n,
    )


# Use with a 5-minute cache TTL
cache_key = int(time.time() // 300)  # Changes every 5 minutes
insights = cached_analyze_leaderboard("overall", "last_week", 5, cache_key)
```
### Async Optimization
**Problem**: Sequential MCP calls block UI
**Solution**: Use async for parallel calls
```python
import asyncio

from mcp_client.client import TraceMindMCPClient


async def load_leaderboard_with_insights(async_client: TraceMindMCPClient):
    """Load the leaderboard and insights in parallel"""
    # Start both operations concurrently; the MCP call must use the async
    # client (the sync wrapper would block). load_dataset_async is a
    # placeholder for an async dataset loader.
    leaderboard_task = asyncio.create_task(
        load_dataset_async("kshitijthakkar/smoltrace-leaderboard")
    )
    insights_task = asyncio.create_task(
        async_client.analyze_leaderboard(metric_focus="overall")
    )

    # Wait for both to complete
    leaderboard, insights = await asyncio.gather(leaderboard_task, insights_task)
    return leaderboard, insights
```
---
## Security Considerations
### API Key Management
**DO**:
- Store API keys in environment variables or HF Spaces secrets
- Use session-only storage in Gradio (not server-side persistence)
- Rotate keys regularly
**DON'T**:
- Hardcode API keys in source code
- Expose keys in client-side JavaScript
- Log API keys in console or files
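A minimal sketch of the env-var pattern (`HF_TOKEN` and `MCP_SERVER_URL` are the variables this app already reads; set them as Space secrets in production):

```python
import os

# Read secrets from the environment (HF Spaces secrets in production);
# fail fast instead of falling back to a hardcoded key
hf_token = os.getenv("HF_TOKEN")
if not hf_token:
    raise RuntimeError("HF_TOKEN is not set; add it as a Space secret or export it locally.")
```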
### MCP Server Trust
**Verify MCP server authenticity**:
- Use HTTPS URLs only
- Verify domain ownership (huggingface.co spaces)
- Review MCP server code before connecting (open source)
**Limit tool access**:
- Only connect to trusted MCP servers
- Review tool permissions before use
- Implement rate limiting for tool calls
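Client-side rate limiting can be as simple as a sliding-window counter in front of `call_tool` (a minimal sketch; the limits are illustrative):

```python
import time


class RateLimiter:
    """Allow at most max_calls per window_seconds (sliding window)"""

    def __init__(self, max_calls: int = 10, window_seconds: float = 60.0):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._timestamps: list[float] = []

    def acquire(self) -> None:
        now = time.monotonic()
        # Keep only calls inside the current window
        self._timestamps = [t for t in self._timestamps if now - t < self.window_seconds]
        if len(self._timestamps) >= self.max_calls:
            raise RuntimeError("MCP rate limit exceeded; retry later.")
        self._timestamps.append(now)


limiter = RateLimiter(max_calls=10, window_seconds=60.0)
limiter.acquire()  # Call before each mcp_client.call_tool(...)
```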
---
## Related Documentation
- [USER_GUIDE.md](USER_GUIDE.md) - Complete UI walkthrough
- [ARCHITECTURE.md](ARCHITECTURE.md) - Technical architecture
- [TraceMind MCP Server Documentation](https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind-mcp-server)
---
**Last Updated**: November 21, 2025