	---
	license: apache-2.0
	base_model: vit5
	tags:
	- vietnamese
	- hate-speech
	- span-detection
	- token-classification
	- nlp
	datasets:
	- visolex/ViHOS
	model-index:
	- name: vit5-hsd-span
	  results:
	  - task:
	      type: token-classification
	      name: Hate Speech Span Detection
	    dataset:
	      name: visolex/ViHOS
	      type: visolex/ViHOS
	    metrics:
	    - type: f1
	      value: N/A
	    - type: precision
	      value: N/A
	    - type: recall
	      value: N/A
	    - type: exact_match
	      value: 0.0063
	---

	# vit5-hsd-span: Hate Speech Span Detection (Vietnamese)

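	This model is a fine-tuned version of [vit5](https://huggingface.co/vit5) for Vietnamese hate speech span detection: given a sentence, it predicts a label for each token so that the hateful spans can be located. It is evaluated on the `visolex/ViHOS` dataset.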

	## Model Details

	- Base Model: `vit5`
	- Description: Vietnamese Hate Speech Span Detection
	- Framework: HuggingFace Transformers
	- Task: Hate Speech Span Detection (token/char-level spans)

	### Hyperparameters

	- Max sequence length: `64`
	- Learning rate: `5e-6`
	- Batch size: `32`
	- Epochs: `100`
	- Early stopping patience: `5`
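The hyperparameters above can be collected in a small config object. The sketch below is only an illustration, not the training script used for this model: `TrainConfig` and `epochs_run` are hypothetical names, and the early-stopping rule (stop after 5 consecutive epochs without improvement in a validation score) is an assumption about how the patience value is applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values copied from the hyperparameter list in this card.
    max_seq_length: int = 64
    learning_rate: float = 5e-6
    batch_size: int = 32
    max_epochs: int = 100
    early_stopping_patience: int = 5


def epochs_run(val_scores, cfg=TrainConfig()):
    """Return how many epochs run before patience-based early stopping.

    `val_scores` is a hypothetical list of per-epoch validation scores
    (higher is better). Assumed rule: stop once the score has failed to
    improve for `early_stopping_patience` consecutive epochs.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores[: cfg.max_epochs], start=1):
        if score > best:
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= cfg.early_stopping_patience:
                return epoch
    return min(len(val_scores), cfg.max_epochs)
```

With a patience of 5, `Epochs: 100` acts as an upper bound, so training may end well before the 100th epoch.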

	## Results

	- F1: `N/A`
	- Precision: `N/A`
	- Recall: `N/A`
	- Exact Match: `0.0063`
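This card does not define how Exact Match is computed. One common reading, assumed here, is the fraction of examples whose predicted set of spans is identical to the gold set. The function name and the `(start, end)` span representation below are illustrative choices, not taken from the evaluation code.

```python
def exact_match(pred_spans, gold_spans):
    """Fraction of examples whose predicted span set equals the gold span set.

    Each argument is a list with one entry per example; each entry is a list
    of (start, end) character spans. This definition is an assumption.
    """
    if len(pred_spans) != len(gold_spans):
        raise ValueError("pred_spans and gold_spans must have the same length")
    if not pred_spans:
        return 0.0
    hits = sum(set(p) == set(g) for p, g in zip(pred_spans, gold_spans))
    return hits / len(pred_spans)
```

Under that reading, a score of `0.0063` would mean that roughly 0.6% of evaluation examples were matched exactly.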

	## Usage

	```python
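	from transformers import AutoTokenizer, AutoModelForTokenClassification
	import torch

	# Hub repo id; the tokenizer and weights are downloaded on first use.
	model_name = "visolex/vit5-hsd-span"
	tok = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForTokenClassification.from_pretrained(model_name)
	# Input sentence: "An example Vietnamese sentence with hateful content ..."
	text = "Ví dụ câu tiếng Việt có nội dung thù ghét ..."
	# max_length matches the training max sequence length (64).
	enc = tok(text, return_tensors="pt", truncation=True, max_length=64, is_split_into_words=False)
	with torch.no_grad():
	    logits = model(**enc).logits
	# One predicted label id per token of the first (and only) sequence.
	pred_ids = logits.argmax(-1)[0].tolist()
	# TODO: convert pred_ids to spans according to your label scheme (BIO/BILOU/char-offset)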
	```
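The usage snippet leaves the conversion from predicted label ids to spans open, because it depends on the label scheme. A minimal sketch follows, assuming a scheme in which one label id means "outside" and every other id marks a token inside a hateful span. The function name `ids_to_char_spans` and the default `outside_id=0` are assumptions; check `model.config.id2label` for the mapping this checkpoint actually uses.

```python
def ids_to_char_spans(pred_ids, offsets, outside_id=0):
    """Merge consecutive non-outside tokens into (start, end) character spans.

    pred_ids: one predicted label id per token.
    offsets:  one (start, end) character offset pair per token.
    outside_id: label id treated as "not part of a span" (an assumption).
    """
    spans = []
    current = None  # [start, end] of the span currently being built
    for label_id, (start, end) in zip(pred_ids, offsets):
        is_special = start == end  # tokens with empty offsets carry no text
        if label_id == outside_id or is_special:
            if current is not None:
                spans.append(tuple(current))
                current = None
            continue
        if current is None:
            current = [start, end]
        else:
            current[1] = end  # extend the open span to cover this token
    if current is not None:
        spans.append(tuple(current))
    return spans
```

The offsets can be requested from a fast tokenizer by passing `return_offsets_mapping=True` in the tokenizer call; the resulting spans index into the original `text`, so `text[start:end]` recovers each flagged substring.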

	## License

	Apache-2.0

	## Acknowledgments

	- Base model: [vit5](https://huggingface.co/vit5)