---
license: apache-2.0
language:
- ar
- arz
library_name: transformers
pipeline_tag: text-classification
widget:
- text: عامل ايه يا باشا ؟
  output:
  - label: Neutral
    score: 0.999
  - label: Hate
    score: 0.001
- text: مبحبش الخلايجه
  output:
  - label: Hate
    score: 0.998
  - label: Neutral
    score: 0.002
datasets:
- IbrahimAmin/egyptian-arabic-hate-speech
base_model:
- UBC-NLP/MARBERTv2
---

# 🇪🇬 Egyptian-Arabic Hate Speech Detection 🗣️🚫

**Authors**: [IbrahimAmin](https://huggingface.co/IbrahimAmin), Mostafa Abbas, Rany Hatem, Andrew Ihab, Mohamed Waleed Fahkr \
**License**: Apache-2.0 \
**Paper**: [*Fine-tuning Arabic Pre-Trained Transformer Models for Egyptian-Arabic Dialect Offensive Language and Hate Speech Detection and Classification*](https://ieeexplore.ieee.org/document/10009167) \
**Languages**: Arabic (Egyptian Dialect)

---

## 🧠 Model Card

This model is a fine-tuned version of [MARBERTv2](https://huggingface.co/UBC-NLP/MARBERTv2) for binary text classification (`Neutral` vs. `Hate`).
It was fine-tuned on a **sampled version** of [a custom Egyptian-Arabic hate speech dataset](https://huggingface.co/datasets/IbrahimAmin/egyptian-arabic-hate-speech).

---

## 🔧 How to Use

```python
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "IbrahimAmin/marbertv2-finetuned-egyptian-hate-speech-detection"

# Use the first GPU when available, otherwise fall back to CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the fine-tuned classification head and its matching tokenizer.
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Wrap both in a text-classification pipeline and classify one sentence.
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer, device=device)
result = classifier("مبحبش الخلايجه")
print(result)  # a list with one {"label": ..., "score": ...} dict
```
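By default the pipeline returns only the top label for each input. The sketch below shows one way to get the scores for both labels, as in the widget examples. It assumes a recent `transformers` release, where passing `top_k=None` to a text-classification pipeline returns every label's score, and where a list of inputs yields one list of `{"label", "score"}` dicts per input. The helper name `scores_by_label` is hypothetical and not part of this model or the library.

```python
from typing import Callable, Dict, List


def scores_by_label(classifier: Callable, texts: List[str]) -> List[Dict[str, float]]:
    """Return one {label: score} mapping per input text.

    `classifier` is assumed to behave like a transformers
    text-classification pipeline called on a list of strings.
    """
    # top_k=None requests the score of every label, not just the best one.
    outputs = classifier(texts, top_k=None)
    # Flatten each per-text list of {"label", "score"} dicts into one mapping.
    return [
        {item["label"]: item["score"] for item in per_text}
        for per_text in outputs
    ]
```

With the `classifier` built above, `scores_by_label(classifier, ["عامل ايه يا باشا ؟", "مبحبش الخلايجه"])` would return two mappings, each keyed by `Hate` and `Neutral`.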

---

## ⚠️ Limitations & Biases

- Trained specifically on Egyptian Arabic; performance may degrade on MSA or other dialects.
- Social and political content may introduce bias in predictions.
- Borderline and sarcastic content may be misclassified.
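
Because of these limitations, it may help to send low-confidence predictions to a human reviewer rather than acting on them automatically. The following is a minimal sketch of that idea, not a recommended policy: the function name `triage`, the `needs_review` marker, and the `0.9` threshold are all hypothetical and should be tuned on your own validation data. It takes a single prediction dict in the `{"label": ..., "score": ...}` shape that the pipeline returns.

```python
from typing import Dict, Union

Prediction = Dict[str, Union[str, float]]


def triage(prediction: Prediction, threshold: float = 0.9) -> str:
    """Return the predicted label, or "needs_review" when confidence is low.

    The 0.9 default is an illustrative assumption, not a value
    taken from the model card or the paper.
    """
    if float(prediction["score"]) >= threshold:
        return str(prediction["label"])
    # Borderline or sarcastic text often lands here; defer to a human.
    return "needs_review"
```

For example, `triage(result[0])` applies the rule to the first prediction from the snippet in the usage section.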

---

## ⚠️ Disclaimer
This model is intended for research and content moderation purposes and is not meant to offend, harm, or promote discrimination against any individual or group. 
It is important to use this model responsibly and consider the context in which it is applied. Any offensive content detected by the model should be treated with 
caution and handled appropriately.

---

## 👏 Acknowledgement

Model fine-tuning, data collection, annotation, and pre-processing for this work were performed as part of a graduation project in the Computer Engineering Program, Faculty of Engineering, AASTMT.

---

## 📖 Citation 

If you use this model in your work, please cite:

~~~
@INPROCEEDINGS{10009167,
  author={Ahmed, Ibrahim and Abbas, Mostafa and Hatem, Rany and Ihab, Andrew and Fahkr, Mohamed Waleed},
  booktitle={2022 20th International Conference on Language Engineering (ESOLEC)}, 
  title={Fine-tuning Arabic Pre-Trained Transformer Models for Egyptian-Arabic Dialect Offensive Language and Hate Speech Detection and Classification}, 
  year={2022},
  volume={20},
  number={},
  pages={170-174},
  keywords={Social networking (online);Text categorization;Hate speech;Blogs;Transformers;Natural language processing;Task analysis;Arabic Hate Speech;Natural Language Processing;Transformers;Text Classification},
  doi={10.1109/ESOLEC54569.2022.10009167}}
~~~