RoBERTa for Cyberbullying Detection

This is a roberta-base model fine-tuned to detect cyberbullying and toxic language in text. It was trained on a diverse, balanced dataset aggregated from multiple public sources, making it robust to real-world chat and social media conversations.

This model is intended to be used as part of a privacy-first system where analysis is performed locally on a user's device.

Model Description

  • Base Model: roberta-base
  • Fine-tuning Task: Binary Text Classification (Cyberbullying vs. Not Cyberbullying)
  • Language: English

How to Use

The easiest way to use this model is with a pipeline from the transformers library.

# First install the library: pip install transformers
from transformers import pipeline

model_name = "nayan90k/roberta-finetuned-cyberbullying-detection"

classifier = pipeline("text-classification", model=model_name)

results = classifier([
    "I love this project, it's so helpful!",
    "You are a total loser and everyone knows it."
])
print(results)

# Expected Output:
# [
#  {'label': 'LABEL_0', 'score': 0.99...}, # Not bullying
#  {'label': 'LABEL_1', 'score': 0.98...} # Bullying
# ]
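
If you need direct access to the probabilities (for example, to apply your own decision threshold), the model can also be loaded explicitly. A minimal sketch, assuming the LABEL_0/LABEL_1 mapping shown in the output above:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "nayan90k/roberta-finetuned-cyberbullying-detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

texts = ["You are a total loser and everyone knows it."]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
probs = logits.softmax(dim=-1)

# Index 0 = Not Cyberbullying, index 1 = Cyberbullying (per the labels above)
for text, p in zip(texts, probs):
    print(f"{text!r} -> P(cyberbullying) = {p[1].item():.4f}")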

Training Data

This model was trained on a custom dataset aggregated from several public sources to ensure diversity. The final, cleaned, and balanced dataset is available on the Hugging Face Hub.

The dataset contains 136,440 samples, perfectly balanced between two classes:

  • 0: Not Cyberbullying
  • 1: Cyberbullying

Data was sourced from Twitter, Wikipedia talk pages, and YouTube comments, among others.
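
To inspect the data yourself, it can be loaded with the datasets library. The repository id below is a placeholder, not the real id; substitute the dataset linked on the model page:

from datasets import load_dataset

# Hypothetical repository id -- replace with the actual dataset id from the Hub
ds = load_dataset("nayan90k/cyberbullying-detection-dataset")

print(ds)              # split names and sizes
print(ds["train"][0])  # expected shape: {'text': ..., 'label': 0 or 1}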

Training Procedure

The model was fine-tuned for 1 epoch with the transformers Trainer, using the following hyperparameters (a minimal setup is sketched after the list):

  • Learning Rate: 2e-5
  • Batch Size: 16
  • Optimizer: AdamW
  • Warmup Steps: 500
  • Weight Decay: 0.01
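
As a reference point, a minimal Trainer configuration matching these hyperparameters might look like the following. The two-example dataset is a toy stand-in for the real aggregated data; note that Trainer optimizes with AdamW by default, matching the optimizer listed above:

from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

# Toy stand-in for the real dataset: columns 'text' and 'label'
raw = Dataset.from_dict({
    "text": ["have a great day", "you are a total loser"],
    "label": [0, 1],
})
tokenized = raw.map(lambda batch: tokenizer(batch["text"], truncation=True), batched=True)

args = TrainingArguments(
    output_dir="roberta-cyberbullying",
    num_train_epochs=1,
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,  # Trainer uses AdamW by default
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()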

The full training script and environment setup can be found at the project's GitHub repository: github.com/Kamal-Nayan-Kumar/GuardianAI.

Evaluation Results

The model was evaluated on a held-out test set of 13,644 samples, achieving the following results:

Metric      Score
Accuracy    0.9000
F1-Score    0.9025
Precision   0.8803
Recall      0.9258
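
These are standard binary-classification metrics and can be recomputed from model predictions with scikit-learn. A sketch with toy stand-in labels:

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# y_true: gold labels from the test set; y_pred: model predictions (toy values here)
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1
)
print(f"Accuracy:  {accuracy_score(y_true, y_pred):.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall:    {recall:.4f}")
print(f"F1-Score:  {f1:.4f}")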

Intended Use and Limitations

This model is designed to be a component in a larger system for monitoring online conversations for potential harm, particularly for the safety of younger users.

Intended Use

  • As a backend service for a chat application to flag potentially harmful content in real-time.
  • To be run locally on a user's device to preserve privacy.

Limitations and Bias

  • The model is trained primarily on English text and will not perform well on other languages or code-mixed text.
  • While the dataset is diverse, it may not capture all forms of slang, sarcasm, or context-specific bullying, which can lead to both false positives and false negatives.
  • The definition of "cyberbullying" is subjective and can vary culturally. The model's predictions reflect the biases of the original dataset annotators.
  • It should be used as a tool to flag potential issues for human review, not as a final arbiter of what constitutes bullying (a minimal flagging sketch follows this list).
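
In line with the last point, a deployment might escalate only high-confidence positives to a human reviewer rather than acting automatically. A hypothetical sketch; the threshold value is illustrative, not tuned:

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="nayan90k/roberta-finetuned-cyberbullying-detection",
)

REVIEW_THRESHOLD = 0.9  # illustrative value; tune on validation data

def flag_for_review(message: str) -> bool:
    """Return True if the message should be escalated to a human reviewer."""
    result = classifier(message)[0]
    return result["label"] == "LABEL_1" and result["score"] >= REVIEW_THRESHOLD

print(flag_for_review("You are a total loser and everyone knows it."))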