WimBERT Synth v0 — Hugging Face Space Plan
This plan describes a lightweight, reliable Space to demo the dual‑head multi‑label classifier (onderwerp + beleving) defined by `wimbert-synth-v0/model.py`, with labels from `wimbert-synth-v0/label_names.json` and licensing in `wimbert-synth-v0/LICENSE` (Apache‑2.0).
Goals
- Input: a single Dutch “signaalbericht” (free‑text).
- Output: per head (onderwerp, beleving), show probabilities for all labels:
  - Visual: color‑coded list/table where color intensity reflects probability.
  - Numeric: exact probability values (0–1) and a top‑K summary.
  - “Predicted” set using an adjustable threshold (default 0.5).
- UX: one‑click Predict button; optional “live” inference (after brief inactivity).
- Portable, reproducible, and fast enough on CPU; optionally GPU‑ready.
Toolkit Choice
- Gradio is the best fit for this demo on Spaces:
  - First‑class support on Hugging Face Spaces; minimal boilerplate (`app.py`).
  - Simple event model (button click, input change) and components for text, tabs, HTML, and charts.
  - Easy to serve both a compact top‑K view and a full “all labels” view with custom styling.
  - No Streamlit server/page lifecycle complexities for this small, single‑page inference app.
Model + License
- Model artifacts live in `wimbert-synth-v0/` under the Apache‑2.0 license (redistribution permitted with attribution). Use the exact `LICENSE` in the Space repo.
- The model is large (~1.2 GB for `model.safetensors`). To keep the Space repo small and boot times predictable, prefer hosting the model as a separate Model repo on the Hub, then download/cache it in the Space at runtime.
  - Recommended: publish a model repo, e.g. `UWV/wimbert-synth-v0`, containing `model.safetensors`, `config.json`, tokenizer files, `dual_head_state.pt`, `label_names.json`, `model.py`, `README.md`, and `LICENSE` (a publishing sketch follows this list).
  - The Space loads it via `DualHeadModel.from_pretrained(<model_repo_or_local_dir>)`.
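A one‑off publishing sketch, assuming you are already authenticated (e.g., via `huggingface-cli login`) and that the artifacts sit in a local `wimbert-synth-v0/` folder:

```python
# Hypothetical one-off publish script for the model repo.
from huggingface_hub import HfApi

api = HfApi()
api.create_repo("UWV/wimbert-synth-v0", repo_type="model", exist_ok=True)
api.upload_folder(
    folder_path="wimbert-synth-v0",   # local folder with all artifacts above
    repo_id="UWV/wimbert-synth-v0",
    repo_type="model",
)
```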
UX & Visualization
- Input: `gr.Textbox(label="Signaalbericht", lines=6, placeholder=...)`.
- Controls:
  - `Predict` button (primary path).
  - `Auto-run` toggle to enable live inference: trigger after the user stops typing for ~600–800 ms (using Gradio’s input event with debounce or a simple timer wrapper; a wiring sketch appears under “Event Flow” below). If CPU performance is borderline, keep it off by default.
  - `Threshold` slider (0.0–1.0, default 0.5) to highlight predicted labels.
  - `Top‑K` slider (1–15, default 5) to size the summary.
- Output: tabbed views covering both heads:
  - Tab 1: “Samenvatting” → two columns, Onderwerp and Beleving, each listing the top‑K labels with probabilities.
  - Tab 2: “Alle labels” → scrollable, color‑coded tables (or HTML lists) for every label with exact probabilities.
  - Tab 3: “JSON/CSV” → exportable raw probabilities (dict of label → prob) plus the list of predicted labels at the current threshold.
- Color mapping:
  - Use a light‑to‑dark monochrome scale (e.g., blue/green) where intensity ∝ probability; add a subtle border for labels above the threshold.
  - Ensure text contrast (WCAG AA) and include the numbers so the view does not rely on color alone (accessibility); a rendering sketch follows this list.
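As a rendering sketch (not the final styling), a hypothetical helper `probs_to_html` could map `(label, prob)` pairs to a color‑coded HTML table:

```python
# Hypothetical renderer for one head's probabilities.
# Background intensity scales with probability; the numeric value is always
# printed so the view never relies on color alone.
def probs_to_html(pairs, threshold=0.5):
    """pairs: list of (label, prob) tuples, sorted by prob descending."""
    rows = []
    for label, prob in pairs:
        alpha = 0.10 + 0.60 * prob  # keep backgrounds light enough for AA contrast
        border = "2px solid #1d4ed8" if prob >= threshold else "1px solid #e5e7eb"
        rows.append(
            f'<tr style="background: rgba(29, 78, 216, {alpha:.2f}); border: {border};">'
            f"<td>{label}</td><td>{prob:.3f}</td></tr>"
        )
    return '<table style="width: 100%;">' + "".join(rows) + "</table>"
```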
Space Layout
- Repo root (Space):
  - `app.py` — Gradio app with UI + inference.
  - `requirements.txt` — runtime deps.
  - `README.md` — usage, model card link, privacy note.
  - `LICENSE` — Apache‑2.0 (from `wimbert-synth-v0/LICENSE`).
  - Optional: `assets/` (logo), `examples/` (preset texts), `.gitattributes`.
- The model is not vendored into the Space (to avoid 1.2 GB of LFS); it is pulled at startup via `huggingface_hub.snapshot_download` or `from_pretrained` against the Hub repo.
Dependencies
- `gradio>=4.0`
- `transformers>=4.40`
- `torch` (CPU is fine; GPU preferred if available)
- `safetensors`, `huggingface_hub`
- Optional perf: `accelerate` (device placement), `onnxruntime`/`optimum` (future optimization)
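A plausible `requirements.txt` matching this list (exact pins are a judgment call):

```text
gradio>=4.0
transformers>=4.40
torch
safetensors
huggingface_hub
```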
Inference Design
- Load once at Space start (global singleton). Warm up with a short dummy input.
- Device: choose `cuda` if available, else CPU. Cast to `float16` on GPU; keep `float32` on CPU.
- Tokenization: use `max_length` from the `dual_head_state.pt` config; allow truncation; optionally expose a compact/fast mode (e.g., cap at 512) if CPU latency needs improvement.
- Output structures:
  - Per‑head list of dicts: `[{label, prob, predicted}, ...]` with `predicted = prob >= threshold`.
  - Top‑K lists derived from the sorted full list.
- Visualization adapters render the above into HTML tables (for color‑coding) and JSON/CSV text; a sketch of the per‑head view builder follows this list.
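A minimal sketch of that builder, assuming `probs` is a 1‑D tensor aligned with a `labels` list:

```python
import torch

def build_views(labels, probs, threshold=0.5, topk=5):
    """labels: list[str]; probs: 1-D torch.Tensor of per-label probabilities."""
    order = torch.argsort(probs, descending=True)
    full = [
        {
            "label": labels[i],
            "prob": float(probs[i]),
            "predicted": float(probs[i]) >= threshold,
        }
        for i in order.tolist()
    ]
    return full, full[:topk]  # full sorted list + top-K summary
```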
Event Flow
- User edits text.
- If Auto‑run enabled, debounce and run; else wait for Predict button.
- Tokenize → model.predict → probs (two tensors).
- Sort, slice to Top‑K summary and prepare full tables.
- Render to tabs and compact “Predicted labels” chips (one line per head).
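Gradio has no built‑in time‑based debounce, so one approximation is a short sleep combined with `trigger_mode="always_last"` on the listener, which collapses rapid keystrokes into the final run. A sketch, assuming the component names (`text_in`, `auto_run`, etc.) from the UI described above:

```python
import time

import gradio as gr

def maybe_predict(text, threshold, topk, auto_run):
    if not auto_run:
        # No-op updates: the Predict button stays the only trigger.
        return gr.update(), gr.update(), gr.update()
    time.sleep(0.7)  # crude ~700 ms debounce window
    return predict(text, threshold, topk)

# Inside the gr.Blocks context:
text_in.change(
    maybe_predict,
    inputs=[text_in, threshold, topk, auto_run],
    outputs=[topk_view, all_labels_html, json_text],
    trigger_mode="always_last",  # only the last pending event runs
)
```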
Pseudocode Sketch (app.py)
```python
import importlib.util
import json

import gradio as gr
import torch
from huggingface_hub import snapshot_download

MODEL_REPO = "UWV/wimbert-synth-v0"
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Download the model snapshot and import DualHeadModel from its model.py
model_dir = snapshot_download(MODEL_REPO)
spec = importlib.util.spec_from_file_location("model", f"{model_dir}/model.py")
model_mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(model_mod)
DualHeadModel = model_mod.DualHeadModel

model, tokenizer, cfg = DualHeadModel.from_pretrained(model_dir, device=DEVICE)

# Warm-up with a short dummy input. Use explicit keys: the tokenizer's
# output order is not guaranteed, so don't unpack .values() positionally.
enc = tokenizer("Hoi", return_tensors="pt", truncation=True, max_length=cfg["max_length"])
_ = model.predict(enc["input_ids"].to(DEVICE), enc["attention_mask"].to(DEVICE))

def predict(text, threshold, topk):
    # Truncate only; padding a single sequence to max_length wastes CPU time.
    enc = tokenizer(text or "", truncation=True, max_length=cfg["max_length"], return_tensors="pt")
    on_p, be_p = model.predict(enc["input_ids"].to(DEVICE), enc["attention_mask"].to(DEVICE))
    # Convert to Python lists and build the three views ...
    return topk_view, all_labels_html, json_text

with gr.Blocks(title="WimBERT Synth v0") as demo:
    # Inputs, controls, tabs, outputs ...
    ...

if __name__ == "__main__":
    demo.launch()
```
Performance Notes
- CPU on free Spaces will work but can be slow for long texts (base mmBERT at `max_length` ≈ 1408). Mitigations:
  - Warm up once; cap max length to 512 in a “fast mode” toggle; show a spinner while running.
  - Prefer a small GPU (T4 small) if available; cast to fp16 on GPU (a sketch of both mitigations follows this list).
- Caching: `snapshot_download` uses the shared cache, so subsequent restarts are faster.
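A sketch of the precision and fast‑mode mitigations, assuming the custom model is a standard `torch.nn.Module` and `cfg`/`tokenizer` come from the loading code above:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
if device.type == "cuda":
    model.half()  # fp16 on GPU; keep fp32 on CPU for stability

# "Fast mode": cap sequence length to cut CPU latency on long texts.
max_len = 512 if fast_mode else cfg["max_length"]
enc = tokenizer(text, truncation=True, max_length=max_len, return_tensors="pt")
```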
Privacy & Safety
- The Space processes user text in memory only; no logging beyond Gradio defaults. Mention this in the Space README.
- Include a “Use responsibly” note (analytics/routing aid; no automated decisions) mirroring the model card.
Deliverables
- `app.py` with:
  - Robust model loading (Hub), device selection, warm‑up.
  - Predict function returning: top‑K per head, full colored table, JSON dump.
  - UI: textbox, Predict button, Auto‑run toggle (debounced), threshold & Top‑K sliders, tabs per view.
  - Example(s) from the model card (`widget` example) via `gr.Examples`.
- `requirements.txt` (gradio, transformers, torch, huggingface_hub, safetensors).
- `README.md` with screenshots, hardware recommendation, and links to the model card.
- `LICENSE` copied from `wimbert-synth-v0/LICENSE`.
Step‑By‑Step
- Publish/verify the model on the Hub (`UWV/wimbert-synth-v0`), including `model.py` and the license.
- Create the Space repo with SDK=Gradio and pick hardware (CPU → OK; GPU → faster).
- Add the Space files (`app.py`, `requirements.txt`, `README.md`, `LICENSE`).
- Implement and test inference locally (CPU) with a few sample texts; tune debounce/threshold defaults.
- Push the Space; verify cold‑start time and inference latency; adjust `max_length` and hardware if needed.
- Polish visuals (colors, fonts, accessibility), add screenshots, and publish.
Nice‑To‑Haves (Later)
- Per‑class thresholds (if you decide to introduce learned or tuned thresholds).
- ONNX/Optimum path for CPU acceleration.
- Session‑level analytics (aggregate latency, not storing user text).
- Download CSV/JSON of the current result.
- Translations for UI labels (NL/EN toggle).
Summary: Use Gradio for a single‑page Space that downloads the Apache‑licensed model from the Hub, offers both button‑based and debounced live inference, and presents per‑head probabilities as color‑coded tables with numeric values, plus top‑K and JSON outputs.