# RoBERTa for Cyberbullying Detection
This is a `roberta-base` model fine-tuned for the specific task of detecting cyberbullying and toxic language in text. The model has been trained on a diverse and balanced dataset aggregated from multiple public sources, making it robust for real-world chat and social media conversations.
This model is intended to be used as part of a privacy-first system where analysis is performed locally on a user's device.
## Model Description
- Base Model: `roberta-base`
- Fine-tuning Task: Binary Text Classification (Cyberbullying vs. Not Cyberbullying)
- Language: English
## How to Use
The easiest way to use this model is with a `pipeline` from the `transformers` library.
```bash
pip install transformers
```

```python
from transformers import pipeline

model_name = "nayan90k/roberta-finetuned-cyberbullying-detection"
classifier = pipeline("text-classification", model=model_name)

results = classifier([
    "I love this project, it's so helpful!",
    "You are a total loser and everyone knows it.",
])
print(results)
# Expected output (scores truncated):
# [
#     {'label': 'LABEL_0', 'score': 0.99...},  # Not bullying
#     {'label': 'LABEL_1', 'score': 0.98...},  # Bullying
# ]
```
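The checkpoint exposes the generic `LABEL_0`/`LABEL_1` names rather than human-readable ones. A minimal sketch for mapping them yourself, assuming `LABEL_0` is the non-bullying class and `LABEL_1` the bullying class, matching the class ids documented under Training Data below:

```python
from transformers import pipeline

# Assumed mapping: LABEL_0 = not cyberbullying, LABEL_1 = cyberbullying,
# per the class ids listed in the Training Data section of this card.
LABEL_NAMES = {"LABEL_0": "not_cyberbullying", "LABEL_1": "cyberbullying"}

classifier = pipeline(
    "text-classification",
    model="nayan90k/roberta-finetuned-cyberbullying-detection",
)

for result in classifier(["You are a total loser and everyone knows it."]):
    print(LABEL_NAMES[result["label"]], round(result["score"], 3))
```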
## Training Data
This model was trained on a custom, aggregated dataset compiled from several public sources to ensure diversity. The final, cleaned, and balanced dataset is available on the Hub.
The dataset contains 136,440 samples, perfectly balanced between two classes:
- `0`: Not Cyberbullying
- `1`: Cyberbullying
Data was sourced from Twitter, Wikipedia talk pages, and YouTube comments, among others.
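If you want to verify the class balance yourself, a minimal sketch with the `datasets` library; the dataset id below is a placeholder, since this card does not pin the exact Hub path:

```python
from collections import Counter

from datasets import load_dataset

# "<hub-dataset-id>" is a placeholder, not a real path; substitute the
# published dataset id before running.
ds = load_dataset("<hub-dataset-id>", split="train")
print(Counter(ds["label"]))  # expected: roughly equal counts for classes 0 and 1
```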
## Training Procedure
The model was fine-tuned for 1 epoch with the transformers `Trainer`, using the following hyperparameters (a sketch of the setup follows this list):
- Learning Rate: 2e-5
- Batch Size: 16
- Optimizer: AdamW
- Warmup Steps: 500
- Weight Decay: 0.01
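As a rough sketch of how these hyperparameters map onto `TrainingArguments` (the toy two-example dataset stands in for the real 136k-sample data; see the GitHub repository below for the actual script):

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

# Toy stand-in for the real aggregated dataset, just to keep the sketch runnable.
toy = Dataset.from_dict({
    "text": ["I love this project, it's so helpful!", "You are a total loser."],
    "label": [0, 1],
})
tokenized = toy.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
    batched=True,
)

args = TrainingArguments(
    output_dir="roberta-cyberbullying",
    num_train_epochs=1,
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    warmup_steps=500,      # linear warmup over the first 500 steps
    weight_decay=0.01,     # applied by AdamW, the Trainer's default optimizer
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized)
trainer.train()
```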
The full training script and environment setup can be found at the project's GitHub repository: github.com/Kamal-Nayan-Kumar/GuardianAI.
## Evaluation Results
The model was evaluated on a held-out test set of 13,644 samples, achieving the following results:
| Metric | Score |
|---|---|
| Accuracy | 0.9000 |
| F1-Score | 0.9025 |
| Precision | 0.8803 |
| Recall | 0.9258 |
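For reference, a minimal sketch of how such metrics can be computed from model predictions with scikit-learn; the arrays below are illustrative placeholders, not the real test-set outputs:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Placeholder arrays for illustration; in practice y_true / y_pred come from
# running the model over the 13,644-sample held-out test set.
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))  # positive class = 1 (cyberbullying)
print("Recall   :", recall_score(y_true, y_pred))
print("F1-Score :", f1_score(y_true, y_pred))
```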
## Intended Use and Limitations
This model is designed to be a component in a larger system for monitoring online conversations for potential harm, particularly for the safety of younger users.
### Intended Use
- As a backend service for a chat application to flag potentially harmful content in real-time.
- To be run locally on a user's device to preserve privacy (a sketch of fully local inference follows this list).
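A minimal sketch of fully local inference without the `pipeline` helper, assuming the checkpoint has already been downloaded and cached on the device; the `flag_for_review` helper and its 0.5 threshold are illustrative choices, not part of the model:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "nayan90k/roberta-finetuned-cyberbullying-detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference only; all computation stays on the local device

def flag_for_review(text: str, threshold: float = 0.5) -> bool:
    """Return True if the message should be surfaced for human review."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    prob_bullying = torch.softmax(logits, dim=-1)[0, 1].item()  # class 1 = cyberbullying
    return prob_bullying >= threshold

print(flag_for_review("You are a total loser and everyone knows it."))
```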
### Limitations and Bias
- The model is trained primarily on English text and will not perform well on other languages or code-mixed text.
- While the dataset is diverse, it may not capture all forms of slang, sarcasm, or context-specific bullying, which can lead to both false positives and false negatives.
- The definition of "cyberbullying" is subjective and can vary culturally. The model's predictions reflect the biases of the original dataset annotators.
- It should be used as a tool to flag potential issues for human review, not as a final arbiter of what constitutes bullying.