# TraceMind-AI - Technical Architecture

This document provides a deep technical dive into the TraceMind-AI architecture, implementation details, and system design.

## Table of Contents

- [System Overview](#system-overview)
- [Project Structure](#project-structure)
- [Core Components](#core-components)
- [MCP Client Architecture](#mcp-client-architecture)
- [Agent Framework Integration](#agent-framework-integration)
- [Data Flow](#data-flow)
- [Authentication & Authorization](#authentication--authorization)
- [Screen Navigation](#screen-navigation)
- [Job Submission Architecture](#job-submission-architecture)
- [Deployment](#deployment)
- [Performance Optimization](#performance-optimization)

---

## System Overview

TraceMind-AI is a Gradio-based web application for evaluating AI agent performance. It is the user-facing platform in the TraceMind ecosystem and demonstrates enterprise MCP client usage (Track 2: MCP in Action).

### Technology Stack

| Component | Technology | Version | Purpose |
|-----------|-----------|---------|---------|
| **UI Framework** | Gradio | 5.49.1 | Web interface with components |
| **MCP Client** | MCP Python SDK | Latest | Connect to MCP servers |
| **Agent Framework** | smolagents | 1.22.0+ | Autonomous agent with MCP tools |
| **Data Source** | HuggingFace Datasets | Latest | Load evaluation results |
| **Authentication** | HuggingFace OAuth | - | User authentication |
| **Job Platforms** | HF Jobs + Modal | - | Evaluation job submission |
| **Language** | Python | 3.10+ | Core implementation |

### High-Level Architecture

```
┌─────────────────────────────────────────────────────────────┐
│ User Browser                                                 │
│  - Gradio Interface (Svelte-based)                          │
│  - OAuth Flow (HuggingFace)                                 │
└──────────────┬──────────────────────────────────────────────┘
               │
               │ HTTP/WebSocket
               ↓
┌─────────────────────────────────────────────────────────────┐
│ TraceMind-AI (Gradio App) - Track 2                         │
│                                                               │
│  ┌─────────────────────────────────────────────────────┐   │
│  │ Screen Layer (screens/)                             │   │
│  │  - Leaderboard                                       │   │
│  │  - Agent Chat                                        │   │
│  │  - New Evaluation                                    │   │
│  │  - Job Monitoring                                    │   │
│  │  - Trace Detail                                      │   │
│  │  - Settings                                          │   │
│  └────────────┬────────────────────────────────────────┘   │
│               │                                              │
│  ┌────────────┴────────────────────────────────────────┐   │
│  │ Component Layer (components/)                       │   │
│  │  - Leaderboard Table (Custom HTML)                  │   │
│  │  - Analytics Charts                                  │   │
│  │  - Metric Displays                                   │   │
│  │  - Report Cards                                      │   │
│  └────────────┬────────────────────────────────────────┘   │
│               │                                              │
│  ┌────────────┴────────────────────────────────────────┐   │
│  │ Service Layer                                        │   │
│  │  ┌──────────────────┐  ┌──────────────────┐        │   │
│  │  │ MCP Client       │  │ Data Loader      │        │   │
│  │  │ (mcp_client/)    │  │ (data_loader.py) │        │   │
│  │  └──────────────────┘  └──────────────────┘        │   │
│  │  ┌──────────────────┐  ┌──────────────────┐        │   │
│  │  │ Agent (smolagents│  │ Job Submission   │        │   │
│  │  │ screens/chat.py) │  │ (utils/)         │        │   │
│  │  └──────────────────┘  └──────────────────┘        │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                               │
└───────────┬───────────────────────────────────┬─────────────┘
            │                                   │
            ↓                                   ↓
┌───────────────────────┐         ┌───────────────────────┐
│ TraceMind MCP Server  │         │ External Services     │
│ (Track 1)             │         │  - HF Datasets        │
│  - 11 AI Tools        │         │  - HF Jobs            │
│  - 3 Resources        │         │  - Modal              │
│  - 3 Prompts          │         │  - LLM APIs           │
└───────────────────────┘         └───────────────────────┘
```

---

## Project Structure

```
TraceMind-AI/
├── app.py                          # Main entry point, Gradio app
│
├── screens/                        # UI screens (6 tabs)
│   ├── __init__.py
│   ├── leaderboard.py             # Screen 1: Leaderboard with AI insights
│   ├── chat.py                    # Screen 2: Agent Chat (smolagents)
│   ├── dashboard.py               # Screen 3: New Evaluation
│   ├── job_monitoring.py          # Screen 4: Job Status Tracking
│   ├── trace_detail.py            # Screen 5: Trace Visualization
│   ├── settings.py                # Screen 6: API Key Configuration
│   ├── compare.py                 # Screen 7: Run Comparison (optional)
│   ├── documentation.py           # Screen 8: API Documentation
│   └── mcp_helpers.py             # Shared MCP client helpers
│
├── components/                     # Reusable UI components
│   ├── __init__.py
│   ├── leaderboard_table.py       # Custom HTML table component
│   ├── analytics_charts.py        # Performance charts (Plotly)
│   ├── metric_displays.py         # Metric cards and badges
│   ├── report_cards.py            # Summary report cards
│   └── thought_graph.py           # Agent reasoning visualization
│
├── mcp_client/                     # MCP client implementation
│   ├── __init__.py
│   ├── client.py                  # Async MCP client
│   └── sync_wrapper.py            # Synchronous wrapper for Gradio
│
├── utils/                          # Utility modules
│   ├── __init__.py
│   ├── auth.py                    # HuggingFace OAuth
│   ├── navigation.py              # Screen navigation state
│   ├── hf_jobs_submission.py      # HuggingFace Jobs integration
│   └── modal_job_submission.py    # Modal integration
│
├── styles/                         # Custom styling
│   ├── __init__.py
│   └── tracemind_theme.py         # Gradio theme customization
│
├── data_loader.py                  # Dataset loading and caching
├── requirements.txt                # Python dependencies
├── .env.example                    # Environment variable template
├── .gitignore
├── README.md                       # Project documentation
└── USER_GUIDE.md                   # Complete user guide

Total: ~35 files, ~8,000 lines of code
```

### File Breakdown

| Directory | Files | Lines | Purpose |
|-----------|-------|-------|---------|
| `screens/` | 9 | ~3,500 | UI screen implementations |
| `components/` | 5 | ~1,200 | Reusable UI components |
| `mcp_client/` | 3 | ~800 | MCP client integration |
| `utils/` | 4 | ~1,500 | Authentication, jobs, navigation |
| `styles/` | 2 | ~300 | Custom theme and CSS |
| Root | 3 | ~700 | Main app, data loader, config |

---

## Core Components

### 1. app.py - Main Application

**Purpose**: Entry point, orchestrates all screens and manages global state.

**Architecture**:

```python
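# app.py structure
import os

import gradio as gr
from screens import *  # screen builder functions (leaderboard_screen, chat_screen, ...)
from mcp_client.sync_wrapper import get_sync_mcp_client
from utils.auth import auth_ui
from data_loader import DataLoader
from styles.tracemind_theme import tracemind_theme  # assumed export name

# OAuth can be switched off for local development (see Environment Variables)
DISABLE_OAUTH = os.getenv("DISABLE_OAUTH", "false").lower() == "true"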

# 1. Initialize services
mcp_client = get_sync_mcp_client()
mcp_client.initialize()
data_loader = DataLoader()

# 2. Create Gradio app
with gr.Blocks(theme=tracemind_theme) as app:
    # Global state
    gr.State(...)  # User session, navigation, etc.

    # Authentication (if not disabled)
    if not DISABLE_OAUTH:
        auth_ui()

    # Main tabs
    with gr.Tabs():
        with gr.Tab("📊 Leaderboard"):
            leaderboard_screen()

        with gr.Tab("🤖 Agent Chat"):
            chat_screen()

        with gr.Tab("🚀 New Evaluation"):
            dashboard_screen()

        with gr.Tab("📈 Job Monitoring"):
            job_monitoring_screen()

        with gr.Tab("⚙️ Settings"):
            settings_screen()

# 3. Launch
if __name__ == "__main__":
    app.launch(
        server_name="0.0.0.0",
        server_port=7860,
        share=False
    )
```

**Key Responsibilities**:
- Initialize MCP client and data loader (global instances)
- Create tabbed interface with all screens
- Manage authentication flow
- Handle global state (user session, API keys)

---

### 2. Screen Layer (screens/)

Each screen is a self-contained module that returns a Gradio component tree.

#### screens/leaderboard.py

**Purpose**: Display evaluation results with AI-powered insights.

**Components**:
- Load button
- AI insights panel (Markdown) - powered by MCP server
- Leaderboard table (custom HTML component)
- Filter controls (agent type, provider)

**MCP Integration**:
```python
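import pandas as pd
from datasets import load_dataset

def load_leaderboard(mcp_client):
    # 1. Load dataset (explicit split returns a Dataset rather than a DatasetDict)
    ds = load_dataset("kshitijthakkar/smoltrace-leaderboard", split="train")
    df = ds.to_pandas()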

    # 2. Get AI insights from MCP server
    insights = mcp_client.analyze_leaderboard(
        metric_focus="overall",
        time_range="last_week",
        top_n=5
    )

    # 3. Render table with custom component
    table_html = render_leaderboard_table(df)

    return insights, table_html
```

#### screens/chat.py

**Purpose**: Autonomous agent interface with MCP tool access.

**Agent Setup**:
```python
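import os

from smolagents import InferenceClientModel, MCPClient, ToolCallingAgent

# Initialize agent with tools discovered from the MCP server
def create_agent():
    # MCPClient takes a server-parameters dict; the server URL used here ends in /sse
    mcp_client = MCPClient({"url": MCP_SERVER_URL, "transport": "sse"})

    # InferenceClientModel is the current name for what older smolagents
    # releases exposed as HfApiModel
    model = InferenceClientModel(
        model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
        token=os.getenv("HF_TOKEN")
    )

    agent = ToolCallingAgent(
        tools=mcp_client.get_tools(),  # Tools exposed by the MCP server
        model=model,
        max_steps=10
    )

    return agent

# Module-level agent used by the chat handler below
agent = create_agent()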

# Chat interaction
def agent_chat(message, history, show_reasoning):
    if show_reasoning:
        agent.verbosity_level = 2  # Show tool execution
    else:
        agent.verbosity_level = 0  # Only final answer

    response = agent.run(message)
    history.append((message, response))

    return history, ""
```

**MCP Tool Access**:
Agent automatically discovers and uses all 11 MCP tools from TraceMind MCP Server.

#### screens/dashboard.py

**Purpose**: Submit evaluation jobs to HuggingFace Jobs or Modal.

**Key Functions**:
- Model selection (text input)
- Infrastructure choice (HF Jobs / Modal)
- Hardware selection (auto / manual)
- Cost estimation (MCP-powered)
- Job submission

**Cost Estimation Flow**:
```python
def estimate_cost_click(model, agent_type, num_tests, hardware, mcp_client):
    # Call MCP server for cost estimate
    estimate = mcp_client.estimate_cost(
        model=model,
        agent_type=agent_type,
        num_tests=num_tests,
        hardware=hardware
    )

    return estimate  # Display in dialog
```

**Job Submission Flow**:
```python
def submit_job(model, agent_type, hardware, infrastructure, api_keys):
    if infrastructure == "HuggingFace Jobs":
        job_id = submit_hf_job(model, agent_type, hardware, api_keys)
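    elif infrastructure == "Modal":
        job_id = submit_modal_job(model, agent_type, hardware, api_keys)
    else:
        # Without this branch job_id would be undefined below
        raise ValueError(f"Unknown infrastructure: {infrastructure}")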

    return f"✅ Job submitted: {job_id}"
```

#### screens/job_monitoring.py

**Purpose**: Track status of submitted jobs.

**Data Source**: HuggingFace Jobs API or Modal API

**Refresh Strategy**:
- Manual refresh button
- Auto-refresh every 30 seconds (optional)

#### screens/trace_detail.py

**Purpose**: Visualize OpenTelemetry traces with GPU metrics.

**Components**:
- Waterfall diagram (spans timeline)
- Span details panel
- GPU metrics overlay (for GPU jobs)
- MCP-powered Q&A

**Trace Loading**:
```python
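def load_trace(trace_id, traces_repo):
    # Load trace dataset (explicit split so filtering runs on a Dataset)
    ds = load_dataset(traces_repo, split="train")
    matches = ds.filter(lambda x: x["trace_id"] == trace_id)
    if len(matches) == 0:
        raise ValueError(f"Trace not found: {trace_id}")
    trace_data = matches[0]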

    # Render waterfall
    waterfall_html = render_waterfall(trace_data["spans"])

    return waterfall_html
```
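
To make the waterfall step more concrete, here is a minimal sketch of how span timings could be turned into bar positions. The span fields (`name`, `start_time`, `end_time`, in seconds) and the helper `layout_waterfall` are assumptions for illustration; the actual `render_waterfall` and span schema are not shown in this document.

```python
def layout_waterfall(spans):
    """Return one bar per span, positioned relative to the earliest span start."""
    if not spans:
        return []
    trace_start = min(s["start_time"] for s in spans)
    trace_end = max(s["end_time"] for s in spans)
    total = (trace_end - trace_start) or 1.0  # avoid dividing by zero

    bars = []
    for span in sorted(spans, key=lambda s: s["start_time"]):
        bars.append({
            "name": span["name"],
            # Left offset and width as percentages of the whole trace
            "offset_pct": 100.0 * (span["start_time"] - trace_start) / total,
            "width_pct": 100.0 * (span["end_time"] - span["start_time"]) / total,
        })
    return bars


# Hypothetical spans, not taken from a real trace
example_spans = [
    {"name": "agent.run", "start_time": 10.0, "end_time": 14.0},
    {"name": "tool.call", "start_time": 11.0, "end_time": 12.0},
]
bars = layout_waterfall(example_spans)
```

Each bar can then be rendered as an absolutely positioned element whose `left` and `width` use these percentages.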

**MCP Q&A**:
```python
def ask_trace_question(trace_id, traces_repo, question, mcp_client):
    # Call MCP server to debug trace
    answer = mcp_client.debug_trace(
        trace_id=trace_id,
        traces_repo=traces_repo,
        question=question
    )

    return answer
```

#### screens/settings.py

**Purpose**: Configure API keys and preferences.

**Security**:
- Keys stored in Gradio State (per session, held in server memory; never written to disk)
- All forms use `api_name=False` (not exposed via API)
- HTTPS encryption for all API calls

**Configuration Options**:
- Gemini API Key
- HuggingFace Token
- Modal Token ID + Secret
- LLM Provider Keys (OpenAI, Anthropic, etc.)

---

### 3. Component Layer (components/)

Reusable UI components that can be used across multiple screens.

#### components/leaderboard_table.py

**Purpose**: Custom HTML table with sorting, filtering, and styling.

**Why Custom Component?**:
- Gradio's default Dataframe component lacks advanced styling
- Need clickable rows for navigation
- Custom sorting and filtering logic
- Badge rendering for metrics

**Implementation**:
```python
def render_leaderboard_table(df: pd.DataFrame) -> str:
    """Render leaderboard as interactive HTML table"""

    html = """
    <style>
        .leaderboard-table { ... }
        .metric-badge { ... }
    </style>
    <table class="leaderboard-table">
        <thead>
            <tr>
                <th onclick="sortTable(0)">Model</th>
                <th onclick="sortTable(1)">Success Rate</th>
                <th onclick="sortTable(2)">Cost</th>
                ...
            </tr>
        </thead>
        <tbody>
    """

    for idx, row in df.iterrows():
        html += f"""
            <tr onclick="selectRun('{row['run_id']}')">
                <td>{row['model']}</td>
                <td><span class="badge success">{row['success_rate']}%</span></td>
                <td>${row['total_cost_usd']:.4f}</td>
                ...
            </tr>
        """

    html += """
        </tbody>
    </table>
    <script>
        function sortTable(col) { ... }
        function selectRun(runId) {
            // Trigger Gradio event to navigate to run detail
            document.dispatchEvent(new CustomEvent('runSelected', {detail: runId}));
        }
    </script>
    """

    return html
```
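
Because the table is assembled from dataset values, fields such as model names are best escaped before they are interpolated into HTML. The sketch below shows one way to do that with the standard library; the helper `render_row` and the `data-run-id` attribute are hypothetical and not part of the component shown above.

```python
from html import escape


def render_row(run_id, model, success_rate, total_cost_usd):
    """Render one <tr>, escaping every value that comes from the dataset."""
    return (
        f'<tr data-run-id="{escape(str(run_id), quote=True)}">'
        f"<td>{escape(str(model))}</td>"
        f'<td><span class="badge success">{float(success_rate):.1f}%</span></td>'
        f"<td>${float(total_cost_usd):.4f}</td>"
        "</tr>"
    )


# A hostile model name is rendered as text rather than as markup
row_html = render_row("run-1", "<script>alert(1)</script>", 92.5, 0.0123)
```

Numeric fields are converted with `float()` first, so a malformed value fails loudly instead of being injected into the page.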

**Integration with Gradio**:
```python
# In leaderboard screen
table_html = gr.HTML()

load_btn.click(
    fn=lambda: render_leaderboard_table(df),
    outputs=table_html
)
```

#### components/analytics_charts.py

**Purpose**: Performance charts using Plotly.

**Charts Provided**:
- Success rate over time (line chart)
- Cost comparison (bar chart)
- Duration distribution (histogram)
- CO2 emissions by model (pie chart)

**Example**:
```python
import plotly.graph_objects as go

def create_cost_comparison_chart(df):
    fig = go.Figure(data=[
        go.Bar(
            x=df['model'],
            y=df['total_cost_usd'],
            marker_color='indianred'
        )
    ])

    fig.update_layout(
        title="Cost Comparison by Model",
        xaxis_title="Model",
        yaxis_title="Total Cost (USD)"
    )

    return fig
```

#### components/thought_graph.py

**Purpose**: Visualize agent reasoning steps (for Agent Chat).

**Visualization**:
- Graph nodes: Reasoning steps, tool calls
- Edges: Flow between steps
- Annotations: Tool results, errors
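
As an illustration of the structure described above, the following sketch converts an ordered list of agent steps into nodes and edges. The step schema (`kind`, `label`, `result`, `error`) and the function `build_thought_graph` are assumptions; the real component may model steps differently.

```python
def build_thought_graph(steps):
    """Turn an ordered list of agent steps into graph nodes and edges."""
    nodes, edges = [], []
    for index, step in enumerate(steps):
        nodes.append({
            "id": index,
            "kind": step["kind"],    # "reasoning" or "tool_call"
            "label": step["label"],
            # Annotation carries the tool result or the error, if any
            "annotation": step.get("result") or step.get("error"),
        })
        if index > 0:
            edges.append((index - 1, index))  # flow from the previous step
    return nodes, edges


# Hypothetical steps for illustration
steps = [
    {"kind": "reasoning", "label": "Plan approach"},
    {"kind": "tool_call", "label": "get_top_performers", "result": "3 models"},
    {"kind": "reasoning", "label": "Format response"},
]
nodes, edges = build_thought_graph(steps)
```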

---

### 4. MCP Client Layer (mcp_client/)

#### mcp_client/client.py - Async MCP Client

**Purpose**: Connect to TraceMind MCP Server via MCP protocol.

**Implementation**: (See [MCP_INTEGRATION.md](MCP_INTEGRATION.md) for full code)

**Key Methods**:
- `connect()`: Establish SSE connection to MCP server
- `call_tool(tool_name, arguments)`: Call an MCP tool
- `analyze_leaderboard(**kwargs)`: Wrapper for analyze_leaderboard tool
- `estimate_cost(**kwargs)`: Wrapper for estimate_cost tool
- `debug_trace(**kwargs)`: Wrapper for debug_trace tool

#### mcp_client/sync_wrapper.py - Synchronous Wrapper

**Purpose**: Provide synchronous API for Gradio event handlers.

**Why Needed?**: The screen event handlers are written as plain synchronous functions, but the MCP client is async.

**Pattern**:
```python
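import asyncio

class SyncMCPClient:
    def __init__(self, mcp_server_url):
        self.async_client = AsyncMCPClient(mcp_server_url)

    def _run_async(self, coro):
        """Run async coroutine in sync context"""
        try:
            loop = asyncio.get_event_loop()
        except RuntimeError:
            # Worker threads (where sync handlers run) have no event loop yet
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)
        return loop.run_until_complete(coro)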

    def analyze_leaderboard(self, **kwargs):
        """Synchronous wrapper"""
        return self._run_async(self.async_client.analyze_leaderboard(**kwargs))
```
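
Gradio dispatches synchronous handlers on worker threads, so a wrapper of this kind has to behave when called off the main thread. One alternative to calling `run_until_complete` on the caller's loop is to keep a single long-lived event loop on a background thread and submit coroutines to it. The sketch below is self-contained; `StubAsyncClient` and `BackgroundLoopWrapper` are hypothetical names standing in for the real classes.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


class StubAsyncClient:
    """Stand-in for the async MCP client (hypothetical, no network)."""

    async def analyze_leaderboard(self, **kwargs):
        await asyncio.sleep(0.01)  # simulate I/O
        return f"insights for {kwargs.get('metric_focus', 'overall')}"


class BackgroundLoopWrapper:
    """Run coroutines on one long-lived loop owned by a daemon thread."""

    def __init__(self, async_client):
        self.async_client = async_client
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def _run_async(self, coro, timeout=30):
        # Safe to call from any thread; blocks until the coroutine finishes
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout=timeout)

    def analyze_leaderboard(self, **kwargs):
        return self._run_async(self.async_client.analyze_leaderboard(**kwargs))

    def close(self):
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join(timeout=5)
        self._loop.close()


client = BackgroundLoopWrapper(StubAsyncClient())
with ThreadPoolExecutor(max_workers=4) as pool:
    # Worker threads mimic how synchronous handlers are dispatched
    results = list(pool.map(
        lambda metric: client.analyze_leaderboard(metric_focus=metric),
        ["cost", "speed"],
    ))
client.close()
```

Keeping one loop for the lifetime of the process also means a connection opened during initialization stays bound to the same loop that later calls run on.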

---

### 5. Data Loader (data_loader.py)

**Purpose**: Load and cache HuggingFace datasets.

**Features**:
- In-memory caching (5-minute TTL)
- Error handling for missing datasets
- Automatic retry logic
- Dataset validation

**Implementation**:
```python
import time

import pandas as pd
from datasets import load_dataset

class DataLoader:
    def __init__(self):
        self.cache = {}
        self.cache_ttl = 300  # 5 minutes

    def load_leaderboard(self, repo="kshitijthakkar/smoltrace-leaderboard"):
        """Load leaderboard with caching"""
        cache_key = f"leaderboard:{repo}"

        # Check cache
        if cache_key in self.cache:
            cached_time, cached_data = self.cache[cache_key]
            if time.time() - cached_time < self.cache_ttl:
                return cached_data

        # Load fresh data
        ds = load_dataset(repo, split="train")
        df = pd.DataFrame(ds)

        # Cache
        self.cache[cache_key] = (time.time(), df)

        return df

    def load_results(self, repo):
        """Load results dataset for specific run"""
        ds = load_dataset(repo, split="train")
        return pd.DataFrame(ds)

    def load_traces(self, repo):
        """Load traces dataset for specific run"""
        ds = load_dataset(repo, split="train")
        return ds  # Keep as Dataset for filtering
```
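
The feature list mentions automatic retry logic, which the listing above does not show. A minimal sketch of how it might look, using exponential backoff, follows. The helper `with_retries` and its parameters are assumptions, not the project's actual implementation.

```python
import time


def with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**n seconds and try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:  # narrow this to network errors in real code
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error


# Demo with a loader that fails twice before succeeding
calls = {"count": 0}
delays = []


def flaky_loader():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "dataset"


# The sleep function is injected so the demo records delays instead of waiting
result = with_retries(flaky_loader, sleep=delays.append)
```

In `DataLoader`, a call such as `with_retries(lambda: load_dataset(repo, split="train"))` could wrap the fresh-load path while leaving the cache check untouched.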

---

## MCP Client Architecture

**Full details in**: [MCP_INTEGRATION.md](MCP_INTEGRATION.md)

**Summary**:
- **Async Client**: `mcp_client/client.py` - async MCP protocol implementation
- **Sync Wrapper**: `mcp_client/sync_wrapper.py` - synchronous API for Gradio
- **Global Instance**: Initialized once in `app.py`, shared across all screens

**Usage Pattern**:
```python
# In app.py (initialization)
from mcp_client.sync_wrapper import get_sync_mcp_client
mcp_client = get_sync_mcp_client()
mcp_client.initialize()

# In screen (usage)
def some_event_handler(mcp_client):
    result = mcp_client.analyze_leaderboard(metric_focus="cost")
    return result
```

---

## Agent Framework Integration

**Full details in**: [MCP_INTEGRATION.md](MCP_INTEGRATION.md)

**Framework**: smolagents (HuggingFace's agent framework)

**Key Features**:
- Autonomous tool discovery from MCP server
- Multi-step reasoning with tool chaining
- Context-aware responses
- Reasoning visualization (optional)

**Agent Setup**:
```python
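from smolagents import InferenceClientModel, MCPClient, ToolCallingAgent

# MCPClient takes a server-parameters dict and exposes the server's tools
mcp_client = MCPClient({"url": MCP_SERVER_URL, "transport": "sse"})

agent = ToolCallingAgent(
    tools=mcp_client.get_tools(),  # Tools loaded from MCP server
    model=InferenceClientModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct"),
    max_steps=10
)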

---

## Data Flow

### Leaderboard Loading Flow

```
1. User clicks "Load Leaderboard"
   ↓
2. Gradio Event Handler (leaderboard.py)
   load_leaderboard()
   ↓
3. Data Loader (data_loader.py)
   ├─→ Check cache (5-min TTL)
   │   └─→ If cached: return cached data
   └─→ If not cached: load from HF Datasets
       └─→ load_dataset("kshitijthakkar/smoltrace-leaderboard")
   ↓
4. MCP Client (sync_wrapper.py)
   mcp_client.analyze_leaderboard(metric_focus="overall")
   ↓
5. MCP Server (TraceMind-mcp-server)
   ├─→ Load data
   ├─→ Call Gemini API
   └─→ Return AI analysis
   ↓
6. Render Components
   ├─→ AI Insights (Markdown)
   └─→ Leaderboard Table (Custom HTML)
   ↓
7. Display to User
```

### Agent Chat Flow

```
1. User types message: "What are the top 3 models?"
   ↓
2. Gradio Event Handler (chat.py)
   agent_chat(message, history, show_reasoning)
   ↓
3. smolagents Agent
   agent.run(message)
   ├─→ Step 1: Plan approach
   │   └─→ "Need to get top models from leaderboard"
   ├─→ Step 2: Discover MCP tools
   │   └─→ Found: get_top_performers, analyze_leaderboard
   ├─→ Step 3: Call MCP tool
   │   └─→ get_top_performers(metric="success_rate", top_n=3)
   ├─→ Step 4: Parse result
   │   └─→ Extract model names, success rates, costs
   └─→ Step 5: Format response
       └─→ Generate markdown table with insights
   ↓
4. Return to user with full reasoning trace (if enabled)
```

### Job Submission Flow

```
1. User fills form → Clicks "Submit Evaluation"
   ↓
2. Gradio Event Handler (dashboard.py)
   submit_job(model, agent_type, hardware, infrastructure)
   ↓
3. Job Submission Module (utils/)
   if infrastructure == "HuggingFace Jobs":
       ├─→ hf_jobs_submission.py
       ├─→ Build job config (YAML)
       ├─→ Submit via HF Jobs API
       └─→ Return job_id
   elif infrastructure == "Modal":
       ├─→ modal_job_submission.py
       ├─→ Build Modal app config
       ├─→ Submit via Modal SDK
       └─→ Return job_id
   ↓
4. Store job_id in session state
   ↓
5. Redirect to Job Monitoring screen
   ↓
6. Auto-refresh status every 30s
```

---

## Authentication & Authorization

### HuggingFace OAuth

**Implementation**: `utils/auth.py`

**Flow**:
```
1. User visits TraceMind-AI
   ↓
2. Check OAuth token in session
   ├─→ If valid: proceed to app
   └─→ If invalid: show login screen
   ↓
3. User clicks "Sign in with HuggingFace"
   ↓
4. Redirect to HuggingFace OAuth page
   ├─→ User authorizes TraceMind-AI
   └─→ HF redirects back with token
   ↓
5. Store token in Gradio State (session)
   ↓
6. Use token for:
   ├─→ HF Datasets access
   ├─→ HF Jobs submission
   └─→ User identification
```

**Code**:
```python
# utils/auth.py
import gradio as gr

def auth_ui():
    """Create OAuth login UI"""
    # On Spaces, OAuth also requires `hf_oauth: true` in the README metadata
    gr.LoginButton(value="Sign in with HuggingFace")

# In app.py
with gr.Blocks() as app:
    if not DISABLE_OAUTH:
        auth_ui()
```

### API Key Storage

**Strategy**: Session-only storage (not server-side persistence)

**Implementation**:
```python
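# In settings screen
import os

def save_api_keys(gemini_key, hf_token, session_state):
    """Store keys in session state"""
    # gr.State values are passed in as inputs and returned as outputs;
    # they are not created inside the handler
    session_state = {
        **session_state,
        "gemini_key": gemini_key,
        "hf_token": hf_token
    }

    # Override default clients with user keys
    # Note: os.environ is process-wide, so values set here are visible to
    # every session served by the same process
    if gemini_key:
        os.environ["GEMINI_API_KEY"] = gemini_key
    if hf_token:
        os.environ["HF_TOKEN"] = hf_token

    return "✅ API keys saved for this session", session_state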

**Security**:
- ✅ Keys kept in memory only (Gradio State lives in the server process for the duration of the session)
- ✅ Not saved to disk or database
- ✅ Forms use `api_name=False` (not exposed via API)
- ✅ HTTPS encryption
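
To keep keys strictly per session, they can live in the session dictionary and be passed explicitly to whichever client needs them, rather than being written to process-wide environment variables. The sketch below assumes the `api_keys` entry of the session state shown under Screen Navigation; `update_session_keys` and `mask_key` are hypothetical helpers.

```python
def update_session_keys(session_state, **new_keys):
    """Return a copy of the session dict with non-empty keys merged in."""
    updated = dict(session_state or {})
    api_keys = dict(updated.get("api_keys", {}))
    for name, value in new_keys.items():
        if value:  # ignore blank form fields
            api_keys[name] = value.strip()
    updated["api_keys"] = api_keys
    return updated


def mask_key(value, visible=4):
    """Show only the last few characters when echoing a key back to the UI."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Two sessions keep separate keys; neither touches os.environ
state_a = update_session_keys({"user": "alice"}, gemini_key="abc12345", hf_token="")
state_b = update_session_keys({"user": "bob"}, hf_token="hf_secret")
```

A handler built this way takes the session state as an input and returns the updated copy as an output, so one user's keys never leak into another user's session.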

---

## Screen Navigation

### State Management

**Pattern**: Gradio State components for session data

```python
# In app.py
with gr.Blocks() as app:
    # Global state
    session_state = gr.State({
        "user": None,
        "current_run_id": None,
        "current_trace_id": None,
        "api_keys": {}
    })

    # Pass to all screens
    leaderboard_screen(session_state)
    chat_screen(session_state)
```

### Navigation Between Screens

**Pattern**: Click event triggers tab switch + state update

```python
# In leaderboard screen
def row_click(run_id, session_state):
    """Navigate to run detail when row clicked"""
    session_state["current_run_id"] = run_id

    # Switch to trace detail tab; `selected` must match the `id=` given to that gr.Tab
    return gr.Tabs(selected=4), session_state

table_component.select(
    fn=row_click,
    inputs=[gr.State(), session_state],
    outputs=[main_tabs, session_state]
)
```

---

## Job Submission Architecture

### HuggingFace Jobs Integration

**File**: `utils/hf_jobs_submission.py`

**Key Functions**:
```python
import requests

def submit_hf_job(model, agent_type, hardware, api_keys):
    """Submit evaluation job to HuggingFace Jobs"""

    # 1. Build job config (YAML)
    job_config = {
        "name": f"SMOLTRACE Eval - {model}",
        "hardware": hardware,  # cpu-basic, t4-small, a10g-small, a100-large, h200
        "environment": {
            "MODEL": model,
            "AGENT_TYPE": agent_type,
            "HF_TOKEN": api_keys["hf_token"],
            # ... other env vars
        },
        "command": [
            "pip install smoltrace[otel,gpu]",
            f"smoltrace-eval --model {model} --agent-type {agent_type} ..."
        ]
    }

    # 2. Submit via HF Jobs API
    response = requests.post(
        "https://huggingface.co/api/jobs",
        headers={"Authorization": f"Bearer {api_keys['hf_token']}"},
        json=job_config,
        timeout=30
    )
    response.raise_for_status()  # Surface auth or validation errors early

    # 3. Return job ID
    job_id = response.json()["id"]
    return job_id
```
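
The job command interpolates the user-supplied model name into a shell string, so quoting each argument is a sensible precaution. The sketch below builds the command with `shlex.join`; it covers only the two flags shown above (the remaining flags are elided in the original), and `build_eval_command` is a hypothetical helper.

```python
import shlex


def build_eval_command(model, agent_type):
    """Build the evaluation command with each argument shell-quoted."""
    args = ["smoltrace-eval", "--model", model, "--agent-type", agent_type]
    return shlex.join(args)


# Example values for illustration only
safe = build_eval_command("meta-llama/Llama-3.1-8B", "tool")
hostile = build_eval_command("x; rm -rf /", "tool")
```

With quoting in place, the hostile value stays a single argument to `--model` instead of terminating the command and starting a new one.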

### Modal Integration

**File**: `utils/modal_job_submission.py`

**Key Functions**:
```python
import modal

def submit_modal_job(model, agent_type, hardware, api_keys):
    """Submit evaluation job to Modal"""

    # 1. Create Modal app
    app = modal.App("smoltrace-eval")

    # 2. Define function with GPU
    @app.function(
        image=modal.Image.debian_slim().pip_install("smoltrace[otel,gpu]"),
        gpu=hardware,  # A10, A100-80GB, H200
        secrets=[
            modal.Secret.from_dict({
                "HF_TOKEN": api_keys["hf_token"],
                # ... other secrets
            })
        ]
    )
    def run_evaluation():
        import smoltrace
        # Run evaluation
        results = smoltrace.evaluate(model=model, agent_type=agent_type)
        return results

    # 3. Deploy and run
    with app.run():
        result = run_evaluation.remote()

    return result.job_id
```

---

## Deployment

### HuggingFace Spaces

**Platform**: HuggingFace Spaces
**SDK**: Gradio 5.49.1
**Hardware**: CPU Basic (upgradeable)
**URL**: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind

### Configuration

**Space Metadata** (README.md header):
```yaml
---
title: TraceMind AI
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
short_description: AI agent evaluation with MCP-powered intelligence
license: agpl-3.0
pinned: true
tags:
  - mcp-in-action-track-enterprise
  - agent-evaluation
  - mcp-client
  - leaderboard
  - gradio
---
```

### Environment Variables

**Set in HF Spaces Secrets**:
```bash
# Required
GEMINI_API_KEY=your_gemini_key
HF_TOKEN=your_hf_token

# Optional
MCP_SERVER_URL=https://mcp-1st-birthday-tracemind-mcp-server.hf.space/gradio_api/mcp/sse
LEADERBOARD_REPO=kshitijthakkar/smoltrace-leaderboard
DISABLE_OAUTH=false  # Set to true for local development
```

---

## Performance Optimization

### 1. Data Caching

**Implementation**: `data_loader.py`
- In-memory cache with 5-minute TTL
- Reduces HF Datasets API calls
- Faster page loads

### 2. Async MCP Calls

**Pattern**: Use async for non-blocking I/O
```python
# Sketch: fetch the dataset and the AI insights concurrently (helper names are illustrative)
async def load_data_with_insights():
    leaderboard_task = load_dataset_async(...)
    insights_task = mcp_client.analyze_leaderboard_async(...)

    leaderboard, insights = await asyncio.gather(leaderboard_task, insights_task)
    return leaderboard, insights
```
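
The benefit of `asyncio.gather` is easiest to see with a runnable comparison. In the sketch below, two stub coroutines stand in for the dataset load and the MCP call (both are hypothetical and use `asyncio.sleep` in place of real I/O).

```python
import asyncio
import time


async def load_dataset_stub():
    await asyncio.sleep(0.2)  # stands in for the dataset download
    return ["row-1", "row-2"]


async def analyze_stub():
    await asyncio.sleep(0.2)  # stands in for the MCP tool call
    return "insights"


async def load_sequential():
    # Second call starts only after the first has finished
    return await load_dataset_stub(), await analyze_stub()


async def load_parallel():
    # Both coroutines wait on I/O at the same time
    return tuple(await asyncio.gather(load_dataset_stub(), analyze_stub()))


start = time.perf_counter()
sequential_result = asyncio.run(load_sequential())
sequential_time = time.perf_counter() - start

start = time.perf_counter()
parallel_result = asyncio.run(load_parallel())
parallel_time = time.perf_counter() - start
```

Both variants return the same data; the concurrent one takes roughly as long as the slower of the two calls rather than their sum.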

### 3. Component Lazy Loading

**Strategy**: Load components only when tabs are activated
```python
with gr.Tab("Trace Detail", visible=False) as trace_tab:
    # Components created only when tab first shown
    @trace_tab.select
    def load_trace_components():
        return build_trace_visualization()
```

---

## Related Documentation

- [README.md](README.md) - Overview and quick start
- [USER_GUIDE.md](USER_GUIDE.md) - Complete screen-by-screen guide
- [MCP_INTEGRATION.md](MCP_INTEGRATION.md) - MCP client implementation
- [TraceMind MCP Server](https://github.com/Mandark-droid/TraceMind-mcp-server/blob/main/ARCHITECTURE.md) - Server-side architecture

---

**Last Updated**: November 21, 2025
**Version**: 1.0.0
**Track**: MCP in Action (Enterprise)