A multi-label text classification model based on TURKCELL/roberta-base-turkish-uncased, fine-tuned to detect 14 unsafe content categories in Turkish texts.
The model is designed to serve as a guardrail safety filter for chatbots and other LLM-powered systems.
## 🧠 Model Overview
| Property | Value |
|---|---|
| Base model | TURKCELL/roberta-base-turkish-uncased |
| Task | Multi-label classification (safety moderation) |
| Language | Turkish |
| Labels (unsafe topics) | siyaset (politics), toplumsal cinsiyet (gender), şiddet (violence), din (religion), suç (crime), cinsellik (sexuality), göç (migration), kimlik (identity), uluslararası ilişkiler (international relations), toplumsal eleştiri (social criticism), bahis (betting), ruhsal (psychological/spiritual), zararlı madde (harmful substances), kişisel haklar (personal rights) |
| Output | One or more triggered unsafe topics, or `SAFE` |
| Thresholds | Class-specific, tuned on the validation set via F2 optimization (see the sketch below) |
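The thresholds shipped with the repo (`thresholds.json`) are chosen per class to maximize F2, which weights recall higher than precision. A minimal sketch of how such per-class thresholds can be tuned, assuming validation probabilities and binary labels are available as NumPy arrays (the function and variable names here are illustrative, not part of the repo):

```python
import numpy as np
from sklearn.metrics import fbeta_score

def tune_thresholds(val_probs: np.ndarray, val_labels: np.ndarray, beta: float = 2.0) -> np.ndarray:
    """Pick one threshold per class by maximizing F-beta on validation data.

    val_probs:  (n_samples, n_classes) sigmoid probabilities
    val_labels: (n_samples, n_classes) binary ground truth
    """
    n_classes = val_probs.shape[1]
    thresholds = np.full(n_classes, 0.5)
    for c in range(n_classes):
        best_score, best_t = -1.0, 0.5
        for t in np.linspace(0.05, 0.95, 19):
            preds = (val_probs[:, c] >= t).astype(int)
            score = fbeta_score(val_labels[:, c], preds, beta=beta, zero_division=0)
            if score > best_score:
                best_score, best_t = score, t
        thresholds[c] = best_t
    return thresholds
```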
## ⚙️ Usage
### Option 1 — Repo's custom pipeline (`trust_remote_code`)

You can load the model with a single `pipeline` call using the custom pipeline code shipped in the repository:

```python
from transformers import pipeline

REPO_ID = "yeniguno/roberta-turkish-bantopic-uncased"

# load the repo's custom text-classification pipeline (requires trust_remote_code)
clf = pipeline(
    task="text-classification",
    model=REPO_ID,
    tokenizer=REPO_ID,
    trust_remote_code=True
)

print(clf("Bu akşam arkadaşlarımla film izleyeceğim."))
print(clf("Ekrem İmamoğlu'nun mevcut iktidarı yenecek olması bazılarını korkutuyor."))
```
### Option 2 — Pure Transformers (no remote code), apply the thresholds yourself

```python
import json

import numpy as np
from huggingface_hub import hf_hub_download
from transformers import pipeline

REPO_ID = "yeniguno/roberta-turkish-bantopic-uncased"

# return all label probabilities (sigmoid) for multi-label use
clf = pipeline(
    task="text-classification",
    model=REPO_ID,
    tokenizer=REPO_ID,
    top_k=None,
    function_to_apply="sigmoid"
)

# load per-label thresholds + label mapping from the repo
th_path = hf_hub_download(repo_id=REPO_ID, filename="thresholds.json")
lb_path = hf_hub_download(repo_id=REPO_ID, filename="labels.json")

with open(th_path) as f:
    thresholds = np.array(json.load(f), dtype=float)
with open(lb_path) as f:
    id2label = {int(k): v for k, v in json.load(f).items()}
label2id = {v: k for k, v in id2label.items()}

def guard_predict(text: str, return_scores: bool = False):
    # the pipeline returns one {'label': name or 'LABEL_i', 'score': float} dict per label
    out = clf(text)
    scores = np.zeros(len(id2label), dtype=float)
    for d in out:
        lab = d["label"]
        idx = int(lab.split("_")[1]) if lab.startswith("LABEL_") else label2id[lab]
        scores[idx] = float(d["score"])

    # a label "fires" when its probability reaches its class-specific threshold
    fired = [(id2label[i], float(scores[i])) for i in range(len(scores)) if scores[i] >= thresholds[i]]

    if not fired:
        result = {"status": "SAFE"}
    else:
        fired.sort(key=lambda x: x[1], reverse=True)
        result = {"status": "UNSAFE", "triggered": fired}

    if return_scores:
        result["scores"] = {id2label[i]: float(scores[i]) for i in range(len(scores))}
    return result

print(guard_predict("Bu akşam arkadaşlarımla film izleyeceğim."))
print(guard_predict("Ekrem İmamoğlu'nun mevcut iktidarı yenecek olması bazılarını korkutuyor."))
```
## 🧩 Intended Use
This model acts as a pre-filter or guardrail before sending user inputs to an LLM. It helps detect and block or flag text that contains or relates to sensitive categories such as violence, crime, drugs, sexual content, or discrimination.
It is not a hate-speech classifier or a legal moderation system. It simply detects topic-level presence of unsafe domains.
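For example, a chatbot backend can call the classifier before the LLM and refuse or reroute flagged requests. Below is a minimal sketch of such a gate that reuses `guard_predict` from Option 2 above; `answer_with_llm` is a hypothetical placeholder for your own generation call:

```python
def answer_with_llm(prompt: str) -> str:
    # hypothetical placeholder: replace with your actual LLM call
    raise NotImplementedError

def guarded_chat(user_message: str) -> str:
    verdict = guard_predict(user_message)  # from the Option 2 example above
    if verdict["status"] == "UNSAFE":
        topics = ", ".join(label for label, _ in verdict["triggered"])
        return f"Sorry, I can't help with that topic. (flagged: {topics})"
    return answer_with_llm(user_message)
```

Whether to hard-block, soften, or merely log a flagged request is a product decision; the model only supplies the topic-level signal.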
## 📊 Training Details
- Training data size: ~300k Turkish text samples
- Positive (unsafe) examples: ~95k
- Negative (safe) examples: ~205k
- Loss function: `BCEWithLogitsLoss` with positive class weighting (see the sketch after this list)
- Optimizer: AdamW (lr=2e-5)
- Epochs: 3
- Batch size: 16 (train), 32 (eval)
- Hardware: NVIDIA RTX 5090 (32GB)
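The positive class weighting mentioned above corresponds to the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`. A minimal sketch of how such weights could be derived from per-class positive counts (the helper and the example numbers are illustrative, not taken from the actual training code):

```python
import torch
import torch.nn as nn

def build_weighted_loss(pos_counts: torch.Tensor, n_samples: int) -> nn.BCEWithLogitsLoss:
    """Up-weight rare classes: pos_weight = (#negatives / #positives) per class."""
    pos_weight = (n_samples - pos_counts) / pos_counts.clamp(min=1)
    return nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# illustrative counts for 14 classes over ~300k samples
pos_counts = torch.randint(1_000, 20_000, (14,)).float()
loss_fn = build_weighted_loss(pos_counts, n_samples=300_000)

logits = torch.randn(16, 14)                      # model outputs for a batch of 16
targets = torch.randint(0, 2, (16, 14)).float()   # multi-hot labels
loss = loss_fn(logits, targets)
```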
## 🧪 Evaluation Results
| Metric | Validation | Test |
|---|---|---|
| Micro Precision | 0.35 | 0.34 |
| Micro Recall | 0.83 | 0.83 |
| Micro F1 | 0.49 | 0.48 |
| Macro Precision | 0.29 | 0.24 |
| Macro Recall | 0.63 | 0.60 |
| Macro F1 | 0.38 | 0.34 |
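Micro scores pool every label decision across all classes, while macro scores average the per-class values, so rare classes pull the macro numbers down. A minimal sketch of computing both with scikit-learn from thresholded predictions (the toy arrays are illustrative):

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# illustrative binary matrices: 3 samples x 4 classes (1 = label present / predicted)
y_true = np.array([[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 0, 0]])
y_pred = np.array([[1, 0, 1, 1], [0, 1, 0, 0], [0, 1, 0, 0]])

micro = precision_recall_fscore_support(y_true, y_pred, average="micro", zero_division=0)
macro = precision_recall_fscore_support(y_true, y_pred, average="macro", zero_division=0)
print("micro (P, R, F1):", micro[:3])
print("macro (P, R, F1):", macro[:3])
```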