---
license: apache-2.0
base_model: phobert-v2
tags:
- vietnamese
- hate-speech
- span-detection
- token-classification
- nlp
datasets:
- visolex/ViHOS
model-index:
- name: phobert-v2-hsd-span
  results:
  - task:
      type: token-classification
      name: Hate Speech Span Detection
    dataset:
      name: visolex/ViHOS
      type: visolex/ViHOS
    metrics:
    - type: f1
      value: 0.6326
    - type: precision
      value: 0.6494
    - type: recall
      value: 0.6305
    - type: exact_match
      value: 0.0000
---

# phobert-v2-hsd-span: Hate Speech Span Detection (Vietnamese)

This model is a fine-tuned version of [phobert-v2](https://huggingface.co/phobert-v2) for Vietnamese **Hate Speech Span Detection**, trained on the `visolex/ViHOS` dataset.

## Model Details

- Base Model: `phobert-v2`
- Description: Vietnamese Hate Speech Span Detection
- Framework: HuggingFace Transformers
- Task: Hate Speech Span Detection (token/char-level spans)

### Hyperparameters

- Max sequence length: `64`
- Learning rate: `5e-6`
- Batch size: `32`
- Max epochs: `100`
- Early stopping patience: `5`

## Results

Evaluation on `visolex/ViHOS`:

- F1: `0.6326`
- Precision: `0.6494`
- Recall: `0.6305`
- Exact Match: `0.0000`

## Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "phobert-v2-hsd-span"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
model.eval()

# Example Vietnamese sentence with hateful content (placeholder)
text = "Ví dụ câu tiếng Việt có nội dung thù ghét ..."
# max_length=64 matches the maximum sequence length used during fine-tuning
enc = tok(text, return_tensors="pt", truncation=True, max_length=64)

with torch.no_grad():
    logits = model(**enc).logits

pred_ids = logits.argmax(-1)[0].tolist()
# Map predicted class ids to label strings and ids back to subword tokens
pred_labels = [model.config.id2label[i] for i in pred_ids]
tokens = tok.convert_ids_to_tokens(enc["input_ids"][0])
# TODO: convert the predicted labels to spans according to your label scheme (BIO/BILOU/char-offset)
```

## License

Apache-2.0

## Acknowledgments

- Base model: [phobert-v2](https://huggingface.co/phobert-v2)
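## Converting Predictions to Spans (sketch)

The Usage snippet leaves the step from per-token predictions to spans open. Below is a minimal sketch of that step, **assuming** the model uses a BIO tagging scheme whose label strings start with `B-`, `I-`, or are `O` (for example the hypothetical tags `B-T` / `I-T`; check `model.config.id2label` for the real names). The helpers `bio_to_spans` and `span_text` are hypothetical illustrations, not part of this model or of Transformers. `span_text` relies on PhoBERT's BPE convention of marking non-final subword pieces with a trailing `@@`.

```python
from typing import List, Tuple


def bio_to_spans(labels: List[str]) -> List[Tuple[int, int]]:
    """Group BIO labels into half-open token-index spans [start, end).

    Assumes a single entity type; an I- tag with no open span starts a new one.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for i, lab in enumerate(labels):
        if lab.startswith("B-") or (lab.startswith("I-") and start is None):
            # A B- tag (or a stray I- tag) closes any open span and opens a new one
            if start is not None:
                spans.append((start, i))
            start = i
        elif lab.startswith("I-"):
            # Continuation of the currently open span
            continue
        else:
            # "O" (or any other tag) closes an open span
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans


def span_text(tokens: List[str], start: int, end: int) -> str:
    """Join PhoBERT BPE tokens in [start, end), merging '@@'-marked subword pieces."""
    text = " ".join(tokens[start:end]).replace("@@ ", "")
    # A span may end on a non-final subword piece; drop its dangling marker
    if text.endswith("@@"):
        text = text[:-2]
    return text.strip()


# Toy example with hand-written tokens and labels (not real model output)
tokens = ["xin", "chà@@", "o", "bạn"]
labels = ["O", "B-T", "I-T", "O"]
for s, e in bio_to_spans(labels):
    print((s, e), span_text(tokens, s, e))  # → (1, 3) chào
```

To apply this to the Usage output, one could pass `pred_labels` and `tokens` from that snippet, after setting the labels of special tokens such as `<s>` and `</s>` to `"O"`. Mapping token spans back to character offsets in the original string is not covered by this sketch.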