
xlm-twitter-stormfront_incels_counter_ner_CfA

This model is a fine-tuned classifier for predicting calls for action in radical content. It was trained on the anonymized version (Riabi et al., 2024) of the Counter dataset (Riabi et al., 2025) and is based on a domain-adapted version of XLM-T.

The model was introduced as part of the research presented in IYKYK: Using language models to decode extremist cryptolects (de Kock et al., 2025).


πŸ“ Description

This model builds on a version of XLM-T that was domain-adapted using masked language modeling (MLM) on approximately 18 million posts from two online extremist communities: Stormfront and Incels. This domain adaptation improves the model's ability to represent radical and coded in-group language.
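Below is a minimal sketch of this kind of continued MLM pretraining using the Hugging Face Trainer. The base checkpoint name is the publicly available XLM-T model; the corpus file, sequence length, and hyperparameters are illustrative placeholders, not the settings used in the paper.

```python
# Sketch: continued masked-language-model pretraining of XLM-T on an
# in-domain corpus. File path and hyperparameters are illustrative only.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base = "cardiffnlp/twitter-xlm-roberta-base"  # public XLM-T checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base)

# One post per line; replace with the actual in-domain corpus.
dataset = load_dataset("text", data_files={"train": "in_domain_posts.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Standard MLM objective: randomly mask 15% of tokens.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="xlmt-domain-adapted",
    per_device_train_batch_size=32,
    num_train_epochs=1,
    learning_rate=5e-5,
)

Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
```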

The domain-adapted encoder was then fine-tuned in a multi-task setting on the Counter dataset, where the main task is Call for Action (CfA) classification, paired with Named Entity Recognition (NER) as an auxiliary task. The two tasks share the same encoder, and task-specific classification heads are trained jointly.
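The sketch below illustrates the shared-encoder, two-head setup described above: a sequence-level CfA head and a token-level NER head trained jointly. The label counts, pooling choice, and loss weighting are illustrative assumptions, not the exact configuration from the paper.

```python
# Sketch: shared encoder with a sentence-level CfA head and a token-level
# NER head trained with a joint loss. Label counts and the auxiliary loss
# weight are illustrative assumptions.
import torch.nn as nn
from transformers import AutoModel

class CfaNerMultiTaskModel(nn.Module):
    def __init__(self, encoder_name, num_cfa_labels=2, num_ner_labels=9, ner_weight=0.5):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.cfa_head = nn.Linear(hidden, num_cfa_labels)   # sequence-level head
        self.ner_head = nn.Linear(hidden, num_ner_labels)   # token-level head
        self.ner_weight = ner_weight

    def forward(self, input_ids, attention_mask, cfa_labels=None, ner_labels=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        token_states = out.last_hidden_state        # (batch, seq_len, hidden)
        cls_state = token_states[:, 0]              # first token as pooled representation

        cfa_logits = self.cfa_head(cls_state)
        ner_logits = self.ner_head(token_states)

        loss = None
        if cfa_labels is not None and ner_labels is not None:
            ce = nn.CrossEntropyLoss(ignore_index=-100)
            cfa_loss = ce(cfa_logits, cfa_labels)
            ner_loss = ce(ner_logits.view(-1, ner_logits.size(-1)), ner_labels.view(-1))
            loss = cfa_loss + self.ner_weight * ner_loss  # joint objective
        return {"loss": loss, "cfa_logits": cfa_logits, "ner_logits": ner_logits}
```

Because both heads backpropagate through the same encoder, the NER objective acts as an auxiliary signal that shapes the shared representations used by the main CfA task.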

📌 This release corresponds to the best-performing model variant as described in the paper.


✅ Intended Use

  • Research on extremist or radical language detection
  • Analysis of online hate speech and coded in-group language
  • Supporting moderation and intervention efforts in academic or policy contexts
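For reference, here is a minimal loading sketch for the main CfA classification task via the standard Transformers API. The "<org>" prefix is a placeholder for the hosting organization, and the sketch assumes the released checkpoint exposes a standard sequence-classification head; if the multi-task heads require custom code, this would need adapting.

```python
# Sketch: loading the released checkpoint for Call-for-Action classification.
# "<org>" is a placeholder; label names are read from the model's own config.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo = "<org>/xlm-twitter-stormfront_incels_counter_ner_CfA"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)
model.eval()

inputs = tokenizer("Example post to screen for calls for action.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted = logits.argmax(dim=-1).item()
print(model.config.id2label.get(predicted, predicted))
```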

βš–οΈ Ethical Considerations

Handling extremist text data carries significant ethical risks. This model was developed under strict research protocols and is released only for responsible, academic, and policy research purposes. Repeated exposure to extremist content can be harmful; proper support and mental health considerations are advised for practitioners using this model.


📖 Citation

If you use this model, please cite the following works:

@misc{dekock2025iykyk,
  title = {IYKYK: Using language models to decode extremist cryptolects},
  author = {de Kock, Christine and Riabi, Arij and Talat, Zeerak and Schlichtkrull, Michael Sejr and Madhyastha, Pranava and Hovy, Eduard},
  year = {2025},
  eprint = {2506.05635},
  archivePrefix = {arXiv},
  primaryClass = {cs.CL},
  url = {https://arxiv.org/abs/2506.05635}
}

@inproceedings{riabi2025beyond,
  title = {Beyond Dataset Creation: Critical View of Annotation Variation and Bias Probing of a Dataset for Online Radical Content Detection},
  author = {Riabi, Arij and Mouilleron, Virginie and Mahamdi, Menel and Antoun, Wissam and Seddah, DjamΓ©},
  booktitle = {Proceedings of the 31st International Conference on Computational Linguistics},
  year = {2025},
  url = {https://aclanthology.org/2025.coling-main.578/}
}

@inproceedings{riabi2024counter,
  title = {Cloaked Classifiers: Pseudonymization Strategies on Sensitive Classification Tasks},
  author = {Riabi, Arij and Mouilleron, Virginie and Mahamdi, Menel and Seddah, DjamΓ©},
  booktitle = {Proceedings of the Workshop on Privacy in NLP},
  year = {2024},
  url = {https://aclanthology.org/2024.privatenlp-1.13/}
}