# roberta-fnf-taglish-v1
RoBERTa-Tagalog fine-tuned for binary fake-news detection (labels: `real`, `fake`).
## Training setup
- Cluster-disjoint train/test splits on the cleaned FNF corpus (see the sketch after this list)
- Train-only Taglish paraphrase augmentation
- Base tokenizer: `jcblaise/roberta-tagalog-base`
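A minimal sketch of what the first two bullets mean in practice. Everything here is illustrative: the corpus file, the `cluster_id` column, and the `paraphrase_taglish` helper are hypothetical stand-ins, not the actual training code.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical cleaned FNF corpus with text, label, and a precomputed
# cluster_id column grouping near-duplicate articles.
df = pd.read_csv('fnf_cleaned.csv')

# Cluster-disjoint split: near-duplicates never straddle train and test.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(df, groups=df['cluster_id']))
train, test = df.iloc[train_idx], df.iloc[test_idx]

def paraphrase_taglish(text: str) -> str:
    """Placeholder for the Taglish paraphraser; the real method is unspecified."""
    return text  # identity stand-in

# Augmentation touches the train split only; the test set stays untouched.
augmented = train.assign(text=train['text'].map(paraphrase_taglish))
train = pd.concat([train, augmented], ignore_index=True)
```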
## Quickstart
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

repo = 'renshhhh/roberta-fnf-taglish-v1'
tok = AutoTokenizer.from_pretrained(repo, use_fast=True)
mdl = AutoModelForSequenceClassification.from_pretrained(repo)
mdl.eval()

# Tagalog: "This is just an example news item, not a real article."
text = 'Ito ay balitang halimbawa lang, hindi totoong artikulo.'
batch = tok(text, return_tensors='pt', truncation=True, max_length=256)

with torch.no_grad():
    probs = mdl(**batch).logits.softmax(-1).tolist()[0]

# Map class indices to label names from the model config.
id2label = mdl.config.id2label
print({id2label[i]: float(probs[i]) for i in range(len(probs))})
```
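The same inference can also run through the `text-classification` pipeline, which wraps tokenization, the forward pass, and the softmax; this assumes the checkpoint's config carries the `id2label` mapping used above:

```python
from transformers import pipeline

repo = 'renshhhh/roberta-fnf-taglish-v1'
clf = pipeline('text-classification', model=repo, top_k=None)

# top_k=None returns scores for every label, named via id2label.
print(clf('Ito ay balitang halimbawa lang, hindi totoong artikulo.'))
```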
## Eval (cluster-disjoint test)
- Accuracy = 0.947
- Weighted F1 = 0.947
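For reference, weighted F1 is the support-weighted average of per-class F1 scores, so on a roughly balanced test set it lands close to accuracy, which is consistent with the matching numbers above. A scikit-learn sketch; the toy `y_true`/`y_pred` vectors are placeholders, not the actual test labels:

```python
from sklearn.metrics import accuracy_score, f1_score

# Placeholder labels standing in for the cluster-disjoint test set.
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0]

print('accuracy:', accuracy_score(y_true, y_pred))
print('weighted F1:', f1_score(y_true, y_pred, average='weighted'))
```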