---
license: apache-2.0
base_model: bilstm
tags:
- vietnamese
- hate-speech
- span-detection
- token-classification
- nlp
datasets:
- visolex/ViHOS
model-index:
- name: bilstm-hsd-span
  results:
  - task:
      type: token-classification
      name: Hate Speech Span Detection
    dataset:
      name: visolex/ViHOS
      type: visolex/ViHOS
    metrics:
    - type: f1
      value: 0.6668
    - type: precision
      value: 0.7418
    - type: recall
      value: 0.6362
    - type: exact_match
      value: 0.0127
---
# bilstm-hsd-span: Hate Speech Span Detection (Vietnamese)
This model is a fine-tuned version of [bilstm](https://huggingface.co/bilstm) for Vietnamese **Hate Speech Span Detection**, using the `visolex/ViHOS` dataset.
## Model Details
- Base Model: `bilstm`
- Description: Vietnamese Hate Speech Span Detection
- Framework: HuggingFace Transformers
- Task: Hate Speech Span Detection (token/char-level spans)
### Hyperparameters
- Max sequence length: `64`
- Learning rate: `5e-6`
- Batch size: `32`
- Epochs: `100`
- Early stopping patience: `5`
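
The training loop is not part of this card. As a minimal sketch of how a patience of `5` usually interacts with a cap of `100` epochs, assuming the monitored quantity is a validation score where higher is better (an assumption, as are the hypothetical names `EarlyStopping` and `epochs_run`):

```python
class EarlyStopping:
    """Stop training when the monitored score has not improved for `patience` epochs."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score):
        """Record one epoch's validation score; return True if training should stop."""
        if score > self.best:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(scores, max_epochs=100, patience=5):
    """Return how many epochs would run for a given sequence of validation scores."""
    stopper = EarlyStopping(patience)
    for epoch, score in enumerate(scores[:max_epochs], start=1):
        if stopper.step(score):
            return epoch
    return min(len(scores), max_epochs)
```

With these settings, training ends either after 100 epochs or after five consecutive epochs without a new best score, whichever comes first.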
## Results
- F1: `0.6668`
- Precision: `0.7418`
- Recall: `0.6362`
- Exact Match: `0.0127`
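
The card does not say how these scores are computed. The reported F1 is not the harmonic mean of the reported precision and recall (which would be about 0.685), so the figures are presumably averaged per example or per class rather than derived from one another. A minimal sketch of one common protocol, assuming character-level overlap averaged per example; `char_indices`, `span_scores`, and `corpus_scores` are hypothetical helper names, not part of the model's code:

```python
def char_indices(spans):
    """Expand (start, end) spans, end exclusive, into a set of character indices."""
    return {i for start, end in spans for i in range(start, end)}


def span_scores(pred_spans, gold_spans):
    """Character-level precision, recall, F1, and exact match for one example."""
    pred, gold = char_indices(pred_spans), char_indices(gold_spans)
    if not pred and not gold:
        # Convention assumed here: no predicted and no gold spans is a perfect match.
        return 1.0, 1.0, 1.0, 1.0
    overlap = len(pred & gold)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1, float(pred == gold)


def corpus_scores(examples):
    """Average the per-example scores over (pred_spans, gold_spans) pairs."""
    rows = [span_scores(pred, gold) for pred, gold in examples]
    return tuple(sum(col) / len(rows) for col in zip(*rows))
```

In this sketch, exact match is much stricter than F1: an example counts only when every predicted character index agrees with the gold annotation.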
## Usage
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "bilstm-hsd-span"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Example Vietnamese sentence with hateful content
text = "Ví dụ câu tiếng Việt có nội dung thù ghét ..."
# max_length matches the max sequence length listed under Hyperparameters
enc = tok(text, return_tensors="pt", truncation=True, max_length=64, is_split_into_words=False)
with torch.no_grad():
    logits = model(**enc).logits
pred_ids = logits.argmax(-1)[0].tolist()  # one predicted label id per token
# TODO: convert pred_ids -> spans according to your label scheme (BIO/BILOU/char-offset)
```
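
The snippet above leaves the conversion from label ids to spans as a TODO. A minimal sketch of that step, assuming a BIO scheme whose label strings start with `B`, `I`, or `O` (an assumption; check `model.config.id2label` for the actual labels) and per-token character offsets as `(start, end)` pairs; `labels_to_char_spans` is a hypothetical helper name:

```python
def labels_to_char_spans(labels, offsets):
    """Merge per-token BIO labels into (start, end) character spans, end exclusive."""
    spans = []
    current = None  # [start, end] of the span being built, or None
    for label, (start, end) in zip(labels, offsets):
        if start == end:
            # Special tokens (e.g. padding) have empty offsets; skip them.
            continue
        if label.startswith("B") or (label.startswith("I") and current is None):
            # Start a new span; a stray "I" without a preceding "B" also opens one.
            if current is not None:
                spans.append(tuple(current))
            current = [start, end]
        elif label.startswith("I"):
            current[1] = end  # extend the open span
        else:
            # "O" closes any open span.
            if current is not None:
                spans.append(tuple(current))
            current = None
    if current is not None:
        spans.append(tuple(current))
    return spans
```

If the tokenizer is a fast tokenizer, the offsets can be requested with `return_offsets_mapping=True`; the `offset_mapping` entry then has to be removed from `enc` before calling the model. Whether this checkpoint ships a fast tokenizer is not stated in the card.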
## License
Apache-2.0
## Acknowledgments
- Base model: [bilstm](https://huggingface.co/bilstm)