yhavinga committed
Commit 1734421 · 1 Parent(s): 85efe28

Improve UX: dynamic tokens, better colors, live feedback


- Use dynamic sequence length (no fixed padding)
- Show live token counter below textarea with color coding
- Switch to green gradient palette with proper text contrast
- Add token count to JSON output
- Remove 512 token limit, use full 1408 from model config

Files changed (2):
  1. PLAN.md +145 -0
  2. app.py +73 -15
PLAN.md ADDED
@@ -0,0 +1,145 @@
# WimBERT Synth v0 — Hugging Face Space Plan

This plan describes a lightweight, reliable Space to demo the dual-head multi-label classifier (onderwerp + beleving) defined by `wimbert-synth-v0/model.py`, with labels from `wimbert-synth-v0/label_names.json` and licensing in `wimbert-synth-v0/LICENSE` (Apache-2.0).

## Goals
- Input: a single Dutch "signaalbericht" (free text).
- Output: per head (onderwerp, beleving), show probabilities for all labels:
  - Visual: color-coded list/table where color intensity reflects probability.
  - Numeric: exact probability values (0-1) and a top-K summary.
  - "Predicted" set using an adjustable threshold (default 0.5).
- UX: one-click Predict button; optional "live" inference (after brief inactivity).
- Portable, reproducible, and fast enough on CPU; optionally GPU-ready.

## Toolkit Choice
- Gradio is the best fit for this demo on Spaces:
  - First-class support on Hugging Face Spaces, minimal boilerplate (`app.py`).
  - Simple event model (button click, input change) and components for text, tabs, HTML, and charts.
  - Easy to serve both a compact top-K view and a full "all labels" view with custom styling.
  - None of Streamlit's server/page lifecycle complexity for this small, single-page inference app.

## Model + License
- Model artifacts live in `wimbert-synth-v0/` under the Apache-2.0 license (redistribution permitted with attribution). Use the exact `LICENSE` in the Space repo.
- The model is large (~1.2 GB for `model.safetensors`). To keep the Space repo small and boot times predictable, prefer hosting the model as a separate Model repo on the Hub, then download/cache it in the Space at runtime.
- Recommended: publish a model repo, e.g. `UWV/wimbert-synth-v0`, containing:
  - `model.safetensors`, `config.json`, tokenizer files, `dual_head_state.pt`, `label_names.json`, `model.py`, `README.md`, `LICENSE`.
- The Space loads via `DualHeadModel.from_pretrained(<model_repo_or_local_dir>)`.

## UX & Visualization
- Input: `gr.Textbox(label="Signaalbericht", lines=6, placeholder=...)`.
- Controls:
  - `Predict` button (primary path).
  - `Auto-run` toggle for live inference: trigger after the user stops typing for ~600-800 ms (via Gradio's input event with debounce or a simple timer wrapper). If CPU performance is borderline, keep it off by default.
  - `Threshold` slider (0.0-1.0, default 0.5) to highlight predicted labels.
  - `Top-K` slider (1-15, default 5) to size the summary.
- Output: tabs per head and view:
  - Tab 1: "Samenvatting" → two columns for Onderwerp and Beleving, each listing the top-K labels with probabilities.
  - Tab 2: "Alle labels" → scrollable, color-coded tables (or HTML lists) for every label with exact probabilities.
  - Tab 3: "JSON/CSV" → exportable raw probabilities (dict of label → prob) plus the list of predicted labels at the current threshold.
- Color mapping (see the sketch below):
  - Use a light-to-dark monochrome (e.g., blue/green) where intensity ∝ probability; add a subtle border above the threshold.
  - Ensure text contrast (AA) and include the numbers so results do not rely on color alone (accessibility).
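
A minimal sketch of the intensity mapping (plain Python; the green hue and the saturation/lightness ranges mirror what `app.py` in this commit ends up using, but they are tunable):

```python
def prob_to_style(prob: float, threshold: float, hue: int = 145) -> str:
    """Map a probability in [0, 1] to an inline CSS style string.

    Higher probability -> darker, more saturated background; labels at or
    above the threshold get a stronger border so the "predicted" set stays
    visible without relying on color alone.
    """
    saturation = 30 + int(prob * 50)  # 30%..80%
    lightness = 92 - int(prob * 55)   # 92%..37%
    text_color = "#ffffff" if prob > 0.6 else "#1f2937"  # keep AA contrast
    border = "2px solid #059669" if prob >= threshold else "1px solid #d1d5db"
    return (
        f"background: hsl({hue}, {saturation}%, {lightness}%); "
        f"color: {text_color}; border: {border}; "
        "padding: 6px 12px; border-radius: 4px;"
    )
```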

## Space Layout
- Repo root (Space):
  - `app.py` — Gradio app with UI + inference.
  - `requirements.txt` — runtime deps.
  - `README.md` — usage, model card link, privacy note.
  - `LICENSE` — Apache-2.0 (from `wimbert-synth-v0/LICENSE`).
  - Optional: `assets/` (logo), `examples/` (preset texts), `.gitattributes`.
- The model is not vendored into the Space (avoids 1.2 GB of LFS); it is pulled at startup via `huggingface_hub.snapshot_download` or `from_pretrained` on the Hub repo.

## Dependencies
- `gradio>=4.0`
- `transformers>=4.40`
- `torch` (CPU is fine; GPU preferred if available)
- `safetensors`, `huggingface_hub`
- Optional perf: `accelerate` (device placement), `onnxruntime`/`optimum` (future optimization)
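
A plausible `requirements.txt` matching the list above (the pins are illustrative, not taken from a tested build):

```text
gradio>=4.0
transformers>=4.40
torch
safetensors
huggingface_hub
```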

## Inference Design
- Load once at Space start (global singleton). Warm up with a short dummy input.
- Device: choose `cuda` if available, else CPU. Cast to `float16` on GPU; keep `float32` on CPU.
- Tokenization: use `max_length` from the `dual_head_state.pt` config; allow truncation; optionally expose a compact/fast mode (e.g., cap at 512) if CPU latency needs improvement.
- Output structures (see the sketch below):
  - Dicts for each head: `[{label, prob, predicted}, ...]` with `predicted = prob >= threshold`.
  - Top-K lists derived from the sorted full list.
- Visualization adapters render the above into HTML tables (for color-coding) and JSON/CSV text.
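
A minimal sketch of these structures (pure Python; `labels` and `probs` stand in for one head's label names and sigmoid outputs):

```python
def build_rows(labels: list[str], probs: list[float], threshold: float) -> list[dict]:
    """One record per label; `predicted` marks the thresholded set."""
    return [
        {"label": lbl, "prob": float(p), "predicted": float(p) >= threshold}
        for lbl, p in zip(labels, probs)
    ]

def top_k(rows: list[dict], k: int) -> list[dict]:
    """Top-K summary derived from the sorted full list."""
    return sorted(rows, key=lambda r: r["prob"], reverse=True)[:k]
```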

## Event Flow
1. User edits text.
2. If Auto-run is enabled, debounce and run; otherwise wait for the Predict button (see the auto-run sketch below).
3. Tokenize → model.predict → probs (two tensors).
4. Sort, slice the top-K summary, and prepare the full tables.
5. Render to the tabs and to compact "Predicted labels" chips (one line per head).
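
A sketch of the auto-run wiring. It assumes Gradio 4's `trigger_mode="always_last"` on the `change` listener as a cheap stand-in for debouncing (it drops intermediate events while a run is in flight; an exact 600-800 ms delay would need a JS-side timer), and `render_prediction` is a hypothetical placeholder for the real handler:

```python
import gradio as gr

def render_prediction(text: str) -> str:
    ...  # hypothetical: tokenize, predict, render HTML

with gr.Blocks() as demo:
    text = gr.Textbox(label="Signaalbericht", lines=6)
    auto_run = gr.Checkbox(label="Auto-run", value=False)  # off by default on CPU
    out = gr.HTML()

    def maybe_predict(txt: str, enabled: bool):
        # Only run live inference when the user opted in.
        return render_prediction(txt) if enabled else gr.update()

    text.change(maybe_predict, inputs=[text, auto_run], outputs=out,
                trigger_mode="always_last")
```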

## Pseudocode Sketch (app.py)
```python
import importlib.util

import gradio as gr
import torch
from huggingface_hub import snapshot_download

MODEL_REPO = "UWV/wimbert-synth-v0"
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Download the model snapshot and import DualHeadModel from its model.py
model_dir = snapshot_download(MODEL_REPO)
spec = importlib.util.spec_from_file_location("model", f"{model_dir}/model.py")
model_mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(model_mod)
DualHeadModel = model_mod.DualHeadModel
model, tokenizer, cfg = DualHeadModel.from_pretrained(model_dir, device=DEVICE)

# Warm-up: pass tensors explicitly rather than *tokenizer(...).values(),
# which could leak extra keys (e.g. token_type_ids) into predict()
enc = tokenizer("Hoi", return_tensors="pt", truncation=True, max_length=cfg["max_length"])
_ = model.predict(enc["input_ids"].to(DEVICE), enc["attention_mask"].to(DEVICE))

def predict(text, threshold, topk):
    enc = tokenizer(text or "", return_tensors="pt", truncation=True,
                    max_length=cfg["max_length"])
    on_p, be_p = model.predict(enc["input_ids"].to(DEVICE), enc["attention_mask"].to(DEVICE))
    # Convert to python lists and build views ...
    return topk_view, all_labels_html, json_text

with gr.Blocks(title="WimBERT Synth v0") as demo:
    # Inputs, controls, tabs, outputs ...
    ...

if __name__ == "__main__":
    demo.launch()
```

## Performance Notes
- CPU on free Spaces will work but can be slow for long texts (base mmBERT at `max_length≈1408`). Mitigations (see the fast-mode sketch below):
  - Warm up once; cap max length at 512 via a "fast mode" toggle; show a spinner while running.
  - Prefer a small GPU (T4 small) if available; cast to fp16 on GPU.
- Caching: `snapshot_download` uses the shared cache; subsequent restarts are faster.
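
A minimal sketch of the fast-mode cap, reusing `tokenizer` and `cfg` from the sketch above (the 512 cap and the `fast_mode` flag are this section's suggestion, not an existing switch):

```python
FAST_MAX_LENGTH = 512  # suggested CPU-friendly cap

def encode(text: str, fast_mode: bool):
    """Tokenize with the full model context, or a shorter cap in fast mode."""
    max_len = FAST_MAX_LENGTH if fast_mode else cfg["max_length"]  # 1408 full context
    return tokenizer(text, return_tensors="pt", truncation=True, max_length=max_len)
```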

## Privacy & Safety
- The Space processes user text in memory only; no logging beyond Gradio defaults. Mention this in the Space README.
- Include a "Use responsibly" note (analytics/routing aid; no automated decisions) mirroring the model card.

## Deliverables
- `app.py` with:
  - Robust model loading (Hub), device selection, warm-up.
  - Predict function returning: top-K per head, full colored table, JSON dump.
  - UI: textbox, Predict button, Auto-run toggle (debounced), threshold & Top-K sliders, tabs per view.
  - Example(s) from the model card (`widget` example) via `gr.Examples`.
- `requirements.txt` (gradio, transformers, torch, huggingface_hub, safetensors).
- `README.md` with screenshots, hardware recommendation, and links to the model card.
- `LICENSE` copied from `wimbert-synth-v0/LICENSE`.

## Step-By-Step
1) Publish/verify the model on the Hub (`UWV/wimbert-synth-v0`), including `model.py` and the license.
2) Create the Space repo with SDK=Gradio and pick hardware (CPU → OK; GPU → faster).
3) Add the Space files (`app.py`, `requirements.txt`, `README.md`, `LICENSE`).
4) Implement and test inference locally (CPU) with a few sample texts; tune debounce/threshold defaults.
5) Push the Space; verify cold-start time and inference latency; adjust max_length and hardware if needed.
6) Polish the visuals (colors, fonts, accessibility), add screenshots, and publish.

## Nice-To-Haves (Later)
- Per-class thresholds (if you decide to introduce learned or tuned thresholds).
- An ONNX/Optimum path for CPU acceleration.
- Session-level analytics (aggregate latency, without storing user text).
- Download CSV/JSON of the current result.
- Translations for UI labels (NL/EN toggle).

Summary: Use Gradio for a single-page Space that downloads the Apache-licensed model from the Hub, offers both button-based and debounced live inference, and presents per-head probabilities as color-coded tables with numeric values, plus top-K and JSON outputs.
app.py CHANGED
```diff
@@ -14,7 +14,6 @@ from huggingface_hub import snapshot_download
 MODEL_REPO = "UWV/wimbert-synth-v0"
 DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 DTYPE = torch.float16 if DEVICE.type == "cuda" else torch.float32
-MAX_LENGTH = 512  # Default to 512 for better CPU performance
 
 print(f"🔧 Loading model from {MODEL_REPO}...")
 print(f"🖥️ Device: {DEVICE} ({DTYPE})")
@@ -37,14 +36,14 @@ if DTYPE == torch.float16:
 
 # Warm-up inference
 with torch.no_grad():
-    dummy_input = tokenizer("Warm-up", return_tensors="pt", padding="max_length",
-                            max_length=MAX_LENGTH, truncation=True)
+    dummy_input = tokenizer("Warm-up", return_tensors="pt", truncation=True,
+                            max_length=config["max_length"])
     _ = model.predict(
         dummy_input["input_ids"].to(DEVICE),
         dummy_input["attention_mask"].to(DEVICE)
     )
 
-print(f"✅ Model loaded and warmed up")
+print(f"✅ Model loaded and warmed up (max_length: {config['max_length']})")
 
 # Extract label names
 LABELS_ONDERWERP = config["labels"]["onderwerp"]
@@ -52,10 +51,33 @@ LABELS_BELEVING = config["labels"]["beleving"]
 
 
 def prob_to_color(prob: float, threshold: float) -> str:
-    """Generate CSS style for probability visualization"""
-    lightness = 95 - int(prob * 65)
-    border = "2px solid #1e3a8a" if prob >= threshold else "1px solid #e5e7eb"
-    return f"background: hsl(210, 80%, {lightness}%); border: {border}; padding: 6px 12px; border-radius: 4px; margin: 2px 0;"
+    """Generate CSS style for probability visualization (10X UX approved)"""
+    # Green gradient: low prob = very light green, high prob = saturated green
+    # Use HSL: Hue=145 (green), Saturation increases with prob, Lightness decreases
+    saturation = 30 + int(prob * 50)  # 30% to 80%
+    lightness = 92 - int(prob * 55)  # 92% to 37%
+
+    # Text color: white for dark backgrounds (prob > 0.6), dark for light
+    text_color = "#ffffff" if prob > 0.6 else "#1f2937"
+
+    # Border: thick + accent for predicted, subtle for others
+    if prob >= threshold:
+        border = "2px solid #059669"
+        box_shadow = "0 1px 3px rgba(5, 150, 105, 0.3)"
+    else:
+        border = "1px solid #d1d5db"
+        box_shadow = "none"
+
+    return (
+        f"background: hsl(145, {saturation}%, {lightness}%); "
+        f"color: {text_color}; "
+        f"border: {border}; "
+        f"box-shadow: {box_shadow}; "
+        f"padding: 6px 12px; "
+        f"border-radius: 4px; "
+        f"margin: 2px 0; "
+        f"font-weight: 500;"
+    )
 
 
 def format_topk(labels: list, probs: list, threshold: float, topk: int) -> str:
@@ -97,15 +119,17 @@ def predict(text: str, threshold: float, topk: int):
         empty_msg = "<p style='color: #666; font-style: italic;'>Voer een bericht in om te classificeren...</p>"
         return empty_msg, empty_msg, {}
 
-    # Tokenize
+    # Tokenize with dynamic length (only truncate if needed)
     inputs = tokenizer(
         text,
         return_tensors="pt",
-        padding="max_length",
-        max_length=MAX_LENGTH,
-        truncation=True
+        truncation=True,
+        max_length=config["max_length"]  # 1408 from model config
    )
 
+    # Get actual sequence length (non-padding tokens)
+    actual_length = inputs["attention_mask"].sum().item()
+
     # Move to device
     input_ids = inputs["input_ids"].to(DEVICE)
     attention_mask = inputs["attention_mask"].to(DEVICE)
@@ -132,6 +156,8 @@ def predict(text: str, threshold: float, topk: int):
     # Generate JSON output
     json_output = {
         "text": text,
+        "token_count": actual_length,
+        "max_length": config["max_length"],
         "threshold": threshold,
         "onderwerp": {
             "probabilities": {label: float(prob) for label, prob in zip(LABELS_ONDERWERP, onderwerp_probs)},
@@ -146,6 +172,29 @@ def predict(text: str, threshold: float, topk: int):
     return summary_html, all_labels_html, json_output
 
 
+def count_tokens(text: str) -> str:
+    """Count tokens for live feedback"""
+    if not text or not text.strip():
+        return "📏 Tokens: 0 / 1408"
+
+    # Count without truncation so overflow past max_length stays detectable
+    tokens = tokenizer(text)
+    actual_length = sum(tokens["attention_mask"])
+
+    # Color code based on usage
+    if actual_length > config["max_length"]:
+        color = "#dc2626"  # Red: truncated
+        warning = " ⚠️ (truncated)"
+    elif actual_length > config["max_length"] * 0.8:
+        color = "#f59e0b"  # Orange: getting long
+        warning = ""
+    else:
+        color = "#059669"  # Green: all good
+        warning = ""
+
+    return f"<span style='color: {color}; font-size: 0.875rem; font-weight: 500;'>📏 Tokens: {actual_length} / {config['max_length']}{warning}</span>"
+
+
 def load_examples():
     """Load example texts"""
     try:
@@ -168,9 +217,9 @@ with gr.Blocks(title="WimBERT Synth v0", theme=gr.themes.Soft()) as demo:
             input_text = gr.Textbox(
                 label="Signaalbericht (Nederlands)",
                 lines=8,
-                placeholder="Bijv: Ik kan niet parkeren bij mijn huis en de website voor vergunningen werkt niet...",
-                info="Voer een bericht in en klik op 'Voorspel'"
+                placeholder="Bijv: Ik kan niet parkeren bij mijn huis en de website voor vergunningen werkt niet..."
             )
+            token_counter = gr.HTML(value="<span style='color: #6b7280; font-size: 0.875rem;'>📏 Tokens: 0 / 1408</span>")
             with gr.Row():
                 predict_btn = gr.Button("🔮 Voorspel", variant="primary", scale=2)
                 clear_btn = gr.ClearButton([input_text], value="🗑️ Wissen", scale=1)
@@ -195,7 +244,7 @@ with gr.Blocks(title="WimBERT Synth v0", theme=gr.themes.Soft()) as demo:
             gr.Markdown(f"""
             **Hardware:** {DEVICE.type.upper()}
             **Dtype:** {DTYPE}
-            **Max length:** {MAX_LENGTH}
+            **Max length:** {config['max_length']}
             """)
 
     with gr.Tabs():
@@ -225,6 +274,15 @@ with gr.Blocks(title="WimBERT Synth v0", theme=gr.themes.Soft()) as demo:
             """)
 
     # Event handlers
+
+    # Live token counting as user types
+    input_text.change(
+        fn=count_tokens,
+        inputs=input_text,
+        outputs=token_counter
+    )
+
+    # Prediction on button click
     predict_btn.click(
         fn=predict,
         inputs=[input_text, threshold_slider, topk_slider],
```