---
license: apache-2.0
base_model: bilstm
tags:
- vietnamese
- hate-speech
- span-detection
- token-classification
- nlp
datasets:
- visolex/ViHOS
model-index:
- name: bilstm-hsd-span
  results:
  - task:
      type: token-classification
      name: Hate Speech Span Detection
    dataset:
      name: visolex/ViHOS
      type: visolex/ViHOS
    metrics:
      - type: f1
        value: 0.6668
      - type: precision
        value: 0.7418
      - type: recall
        value: 0.6362
      - type: exact_match
        value: 0.0127
---

# bilstm-hsd-span: Hate Speech Span Detection (Vietnamese)

This model is a fine-tuned version of [bilstm](https://huggingface.co/bilstm) for Vietnamese **Hate Speech Span Detection**.

## Model Details

- Base Model: `bilstm`
- Description: Vietnamese Hate Speech Span Detection
- Framework: HuggingFace Transformers
- Task: Hate Speech Span Detection (token/char-level spans)

### Hyperparameters

- Max sequence length: `64`
- Learning rate: `5e-6`
- Batch size: `32`
- Epochs: `100`
- Early stopping patience: `5`
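
The interaction between the epoch cap and the early-stopping patience can be illustrated with a small sketch. It is not the training script used for this model: the function name `early_stopping_epoch` is hypothetical, and it assumes that the monitored validation score is higher-is-better (for example F1) and that any non-improving epoch counts against the patience.

```python
def early_stopping_epoch(val_scores, max_epochs=100, patience=5):
    """Return (stop_epoch, best_epoch, best_score) for a list of per-epoch scores.

    Hypothetical helper: assumes a higher score is better and that training
    stops after `patience` consecutive epochs without a new best score.
    """
    best_score = float("-inf")
    best_epoch = 0
    epochs_without_improvement = 0

    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        if score > best_score:
            # New best: remember it and reset the patience counter.
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch, best_score

    # Patience was never exhausted: training ran to the end of the schedule.
    return min(len(val_scores), max_epochs), best_epoch, best_score


# Example: the best score is reached at epoch 3, then five epochs pass without improvement.
scores = [0.10, 0.20, 0.30, 0.29, 0.28, 0.27, 0.26, 0.25, 0.40]
print(early_stopping_epoch(scores))
```

With a patience of 5, training in this example would stop before the later, higher score is ever seen, which is the usual trade-off of early stopping.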

## Results

- F1: `0.6668`
- Precision: `0.7418`
- Recall: `0.6362`
- Exact Match: `0.0127`

## Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_name = "bilstm-hsd-span"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
text = "Ví dụ câu tiếng Việt có nội dung thù ghét ..."
# Truncate to the maximum sequence length used during training (64)
enc = tok(text, return_tensors="pt", truncation=True, max_length=64, is_split_into_words=False)
with torch.no_grad():
    logits = model(**enc).logits
    pred_ids = logits.argmax(-1)[0].tolist()
# TODO: convert pred_ids -> spans according to your label scheme (BIO/BILOU/char-offset)
```
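
The snippet above leaves the conversion from predicted label ids to spans open. One possible way to do it is sketched below, under assumptions this card does not confirm: that the labels follow a BIO scheme with the literal names `"B"`, `"I"`, and `"O"`, and that per-token character offsets are available (for example from a fast tokenizer called with `return_offsets_mapping=True`). The helper name `bio_to_char_spans` is hypothetical.

```python
def bio_to_char_spans(labels, offsets):
    """Merge token-level BIO labels into (start, end) character spans.

    Assumptions (not confirmed by the model card): labels are the strings
    "B", "I", "O"; offsets are (start, end) character positions per token,
    with (0, 0) or any empty range used for special tokens.
    """
    spans = []
    current = None  # [start, end] of the span currently being built

    for label, (start, end) in zip(labels, offsets):
        if start == end:
            # Special or empty tokens never belong to a span.
            label = "O"

        if label == "B" or (label == "I" and current is None):
            # Start a new span; a stray "I" is treated as a span start.
            if current is not None:
                spans.append(tuple(current))
            current = [start, end]
        elif label == "I":
            # Extend the open span to cover this token.
            current[1] = end
        else:
            # "O" closes any open span.
            if current is not None:
                spans.append(tuple(current))
                current = None

    if current is not None:
        spans.append(tuple(current))
    return spans


# Toy example with hand-written labels and offsets.
labels = ["O", "O", "B", "I", "O", "I", "B", "O"]
offsets = [(0, 0), (0, 2), (3, 6), (7, 10), (11, 13), (14, 17), (18, 20), (0, 0)]
print(bio_to_char_spans(labels, offsets))
```

To use this with real predictions, the label strings would have to be looked up from the predicted ids (for instance through `model.config.id2label`, if the checkpoint defines it), and the offset mapping would need to be removed from the encoding before it is passed to the model.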

## License

Apache-2.0

## Acknowledgments

- Base model: [bilstm](https://huggingface.co/bilstm)