RoBERTa for Cyberbullying Detection

This is a roberta-base model fine-tuned to detect cyberbullying and toxic language in text. It was trained on a diverse, balanced dataset aggregated from multiple public sources, making it robust to real-world chat and social media conversations.

This model is intended to be used as part of a privacy-first system where analysis is performed locally on a user's device.

Model Description

  • Base Model: roberta-base
  • Fine-tuning Task: Binary Text Classification (Cyberbullying vs. Not Cyberbullying)
  • Language: English

How to Use

The easiest way to use this model is with a pipeline from the transformers library.

# First install the library: pip install transformers
from transformers import pipeline

model_name = "nayan90k/roberta-finetuned-cyberbullying-detection"

classifier = pipeline("text-classification", model=model_name)

results = classifier([
    "I love this project, it's so helpful!",
    "You are a total loser and everyone knows it."
])
print(results)

# Expected Output:
# [
#  {'label': 'LABEL_0', 'score': 0.99...}, # Not bullying
#  {'label': 'LABEL_1', 'score': 0.98...} # Bullying
# ]
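
If you need direct access to the probabilities (for example, to apply your own decision threshold), the model can also be loaded explicitly. A minimal sketch, assuming the LABEL_0/LABEL_1 mapping shown in the output above:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "nayan90k/roberta-finetuned-cyberbullying-detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

texts = ["You are a total loser and everyone knows it."]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
probs = logits.softmax(dim=-1)

# Index 0 = Not Cyberbullying, index 1 = Cyberbullying (per the labels above)
for text, p in zip(texts, probs):
    print(f"{text!r} -> P(cyberbullying) = {p[1].item():.4f}")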

Training Data

This model was trained on a custom dataset aggregated from several public sources to ensure diversity. The final, cleaned, and balanced dataset is available on the Hugging Face Hub.

The dataset contains 136,440 samples, perfectly balanced between two classes:

  • 0: Not Cyberbullying
  • 1: Cyberbullying

Data was sourced from Twitter, Wikipedia talk pages, and YouTube comments, among others.
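
To inspect the data yourself, it can be loaded with the datasets library. The repository id below is a placeholder, not the real id; substitute the dataset linked on the model page:

from datasets import load_dataset

# Hypothetical repository id -- replace with the actual dataset id from the Hub
ds = load_dataset("nayan90k/cyberbullying-detection-dataset")

print(ds)              # split names and sizes
print(ds["train"][0])  # expected shape: {'text': ..., 'label': 0 or 1}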

Training Procedure

The model was fine-tuned for 1 epoch with the transformers Trainer, using the following hyperparameters (a minimal setup is sketched after the list):

  • Learning Rate: 2e-5
  • Batch Size: 16
  • Optimizer: AdamW
  • Warmup Steps: 500
  • Weight Decay: 0.01
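
As a reference point, a minimal Trainer configuration matching these hyperparameters might look like the following. The two-example dataset is a toy stand-in for the real aggregated data; note that Trainer optimizes with AdamW by default, matching the optimizer listed above:

from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

# Toy stand-in for the real dataset: columns 'text' and 'label'
raw = Dataset.from_dict({
    "text": ["have a great day", "you are a total loser"],
    "label": [0, 1],
})
tokenized = raw.map(lambda batch: tokenizer(batch["text"], truncation=True), batched=True)

args = TrainingArguments(
    output_dir="roberta-cyberbullying",
    num_train_epochs=1,
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,  # Trainer uses AdamW by default
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()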

The full training script and environment setup can be found at the project's GitHub repository: github.com/Kamal-Nayan-Kumar/GuardianAI.

Evaluation Results

The model was evaluated on a held-out test set of 13,644 samples, achieving the following results:

Metric      Score
Accuracy    0.9000
F1-Score    0.9025
Precision   0.8803
Recall      0.9258
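
These are standard binary-classification metrics and can be recomputed from model predictions with scikit-learn. A sketch with toy stand-in labels:

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# y_true: gold labels from the test set; y_pred: model predictions (toy values here)
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1
)
print(f"Accuracy:  {accuracy_score(y_true, y_pred):.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall:    {recall:.4f}")
print(f"F1-Score:  {f1:.4f}")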

Intended Use and Limitations

This model is designed to be a component in a larger system for monitoring online conversations for potential harm, particularly for the safety of younger users.

Intended Use

  • As a backend service for a chat application to flag potentially harmful content in real-time.
  • To be run locally on a user's device to preserve privacy.

Limitations and Bias

  • The model is trained primarily on English text and will not perform well on other languages or code-mixed text.
  • While the dataset is diverse, it may not capture all forms of slang, sarcasm, or context-specific bullying, which can lead to both false positives and false negatives.
  • The definition of "cyberbullying" is subjective and can vary culturally. The model's predictions reflect the biases of the original dataset annotators.
  • It should be used as a tool to flag potential issues for human review, not as a final arbiter of what constitutes bullying (a minimal flagging sketch follows this list).
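
In line with the last point, a deployment might escalate only high-confidence positives to a human reviewer rather than acting automatically. A hypothetical sketch; the threshold value is illustrative, not tuned:

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="nayan90k/roberta-finetuned-cyberbullying-detection",
)

REVIEW_THRESHOLD = 0.9  # illustrative value; tune on validation data

def flag_for_review(message: str) -> bool:
    """Return True if the message should be escalated to a human reviewer."""
    result = classifier(message)[0]
    return result["label"] == "LABEL_1" and result["score"] >= REVIEW_THRESHOLD

print(flag_for_review("You are a total loser and everyone knows it."))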