---
license: apache-2.0
language:
- ar
- arz
library_name: transformers
pipeline_tag: text-classification
widget:
- text: عامل ايه يا باشا ؟
  output:
  - label: Neutral
    score: 0.999
  - label: Hate
    score: 0.001
- text: مبحبش الخلايجه
  output:
  - label: Hate
    score: 0.998
  - label: Neutral
    score: 0.002
datasets:
- IbrahimAmin/egyptian-arabic-hate-speech
base_model:
- UBC-NLP/MARBERTv2
---

# 🇪🇬 Egyptian-Arabic Hate Speech Detection 🗣️🚫

**Authors**: [IbrahimAmin](https://huggingface.co/IbrahimAmin), Mostafa Abbas, Rany Hatem, Andrew Ihab, Mohamed Waleed Fahkr \
**License**: Apache-2.0 \
**Paper**: [*Fine-tuning Arabic Pre-Trained Transformer Models for Egyptian-Arabic Dialect Offensive Language and Hate Speech Detection and Classification*](https://ieeexplore.ieee.org/document/10009167) \
**Languages**: Arabic (Egyptian Dialect)

---

## 🧠 Model Card

This model is a fine-tuned version of [MARBERTv2](https://huggingface.co/UBC-NLP/MARBERTv2). We fine-tuned it for binary text classification (`Neutral` vs. `Hate`) on a **sampled version** of [a custom Egyptian-Arabic hate speech dataset](https://huggingface.co/datasets/IbrahimAmin/egyptian-arabic-hate-speech).

---

## 🔧 How to Use

```python
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "IbrahimAmin/marbertv2-finetuned-egyptian-hate-speech-detection"

# Use the first GPU when available, otherwise fall back to CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the fine-tuned classifier and its tokenizer from the Hub
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer, device=device)

# Classify a single Egyptian-Arabic sentence
result = classifier("مبحبش الخلايجه")
print(result)
```

---

## ⚠️ Limitations & Biases

- Trained specifically on Egyptian Arabic; performance may degrade on MSA or other dialects.
- Social and political content may introduce bias in predictions.
- Borderline and sarcastic content may be misclassified.
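
Because borderline and sarcastic content may be misclassified, one way to act on these limitations is to send low-confidence predictions to a human reviewer instead of trusting the top label. The sketch below is an illustration, not part of the released model: the `triage` helper, the `"Review"` outcome, and the `0.9` threshold are hypothetical and should be tuned on your own data. It takes the list of `label`/`score` dictionaries produced by the classifier above; the confident example reuses the scores shown in this card's widget, while the borderline scores are made up for demonstration.

```python
from typing import Dict, List


def triage(scores: List[Dict], threshold: float = 0.9) -> str:
    """Return the top label, or "Review" when its score is below the threshold.

    `threshold` and the "Review" outcome are assumptions for illustration only.
    """
    top = max(scores, key=lambda item: item["score"])
    return top["label"] if top["score"] >= threshold else "Review"


# Scores taken from the widget example in this card
confident = [{"label": "Hate", "score": 0.998}, {"label": "Neutral", "score": 0.002}]
# Hypothetical scores for an ambiguous input
borderline = [{"label": "Hate", "score": 0.62}, {"label": "Neutral", "score": 0.38}]

print(triage(confident))
print(triage(borderline))
```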
---

## ⚠️ Disclaimer

This model is intended for research and content moderation purposes and is not meant to offend, harm, or promote discrimination against any individual or group. It is important to use this model responsibly and consider the context in which it is applied. Any offensive content detected by the model should be treated with caution and handled appropriately.

---

## 👏 Acknowledgement

Model fine-tuning, data collection, annotation and pre-processing for this work were performed as part of a Graduation Project from the Faculty of Engineering, AASTMT, Computer Engineering Program.

---

## 📖 Citation

If you use this model in your work, please cite:

~~~
@INPROCEEDINGS{10009167,
  author={Ahmed, Ibrahim and Abbas, Mostafa and Hatem, Rany and Ihab, Andrew and Fahkr, Mohamed Waleed},
  booktitle={2022 20th International Conference on Language Engineering (ESOLEC)},
  title={Fine-tuning Arabic Pre-Trained Transformer Models for Egyptian-Arabic Dialect Offensive Language and Hate Speech Detection and Classification},
  year={2022},
  volume={20},
  number={},
  pages={170-174},
  keywords={Social networking (online);Text categorization;Hate speech;Blogs;Transformers;Natural language processing;Task analysis;Arabic Hate Speech;Natural Language Processing;Transformers;Text Classification},
  doi={10.1109/ESOLEC54569.2022.10009167}}
~~~