roberta-large-group-mention-detector-uk-manifestos
roberta-large model finetuned for social group mention detection in political texts
Model Details
Model Description
Token classification model for (social) group mention detection based on Licht & Sczepanski (2025)
This token classification model has been finetuned on human sequence annotations of sentences from British parties' election manifestos for the following entity types:
- social group
- implicit social group reference
- political group
- political institution
- organization, public institution, or collective actor
Please refer to Licht & Sczepanski (2025) for details.
- Developed by: Hauke Licht
- Model type: roberta
- Language(s) (NLP): ['en']
- License: apache-2.0
- Finetuned from model: roberta-large
- Funded by: Center for Comparative and International Studies of the ETH Zurich and the University of Zurich and the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy – EXC 2126/1 – 390838866
Model Sources
- Repository: https://github.com/haukelicht/group_mention_detection/release/
- Paper: https://doi.org/10.31219/osf.io/ufb96
- Demo: [More Information Needed]
Uses
Bias, Risks, and Limitations
- Evaluation of the classifier in held-out data shows that it makes mistakes (see section Results).
- The model has been finetuned only on human-annotated sentences sampled from British parties' election manifestos. Applying the classifier in other domains can lead to higher error rates than those reported in section Results below.
- The data used to finetune the model come from human annotators. Human annotators can be biased, and factors like gender and social background can affect their annotation judgments. This may lead to bias in the detection of specific social groups.
Recommendations
- Users who want to apply the model outside its training data domain (British parties' election programs) should evaluate its performance in the target data.
- Users who want to apply the model outside its training data domain (British parties' election programs) should continue finetuning this model on labeled data from the target domain.
How to Get Started with the Model
Use the code below to get started with the model.
from transformers import pipeline
# load the finetuned token classifier from the Hugging Face Hub
model_id = "haukelicht/roberta-large-group-mention-detector-uk-manifestos"
classifier = pipeline(task="ner", model=model_id, aggregation_strategy="simple")
text = "Our party fights for the deprived and the vulnerable in our country."
annotations = classifier(text)
print(annotations)
# get annotations' character start and end indexes
locations = [(anno['start'], anno['end']) for anno in annotations]
print(locations)
# index the source text using the first annotation as an example
loc = locations[0]
print(text[slice(*loc)])
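To work with the predictions per entity type, the pipeline output can be regrouped with a small helper. The following is a minimal sketch, not part of the original model card: the function name `group_mentions_by_type`, the `min_score` threshold, and the hand-written example annotations (including the label string and scores) are illustrative assumptions. It relies only on the `entity_group`, `score`, `start`, and `end` keys that the `transformers` token classification pipeline returns when an aggregation strategy is set.

```python
from collections import defaultdict


def group_mentions_by_type(text, annotations, min_score=0.0):
    """Collect predicted mention strings per entity type.

    `annotations` is expected to be a list of dicts with 'entity_group',
    'score', 'start', and 'end' keys, as returned by an aggregated
    token classification pipeline.
    """
    mentions = defaultdict(list)
    for anno in annotations:
        if anno["score"] < min_score:
            continue  # drop low-confidence predictions
        # slice the source text so original casing and spacing are preserved
        mentions[anno["entity_group"]].append(text[anno["start"]:anno["end"]])
    return dict(mentions)


# Hand-written stand-in for pipeline output; the label name and scores are
# made up for illustration and are not actual model predictions.
text = "Our party fights for the deprived and the vulnerable in our country."
example_annotations = [
    {"entity_group": "social group", "score": 0.98, "start": 21, "end": 33},
    {"entity_group": "social group", "score": 0.55, "start": 38, "end": 52},
]

print(group_mentions_by_type(text, example_annotations))
print(group_mentions_by_type(text, example_annotations, min_score=0.9))
```

With real predictions, pass the list returned by the classifier in place of `example_annotations`. The label strings the model actually emits may differ from the one used here.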
Training Details
Training Data
The train, dev, and test splits used for model finetuning and evaluation are available on GitHub: https://github.com/haukelicht/group_mention_detection/release/splits
Training Procedure
Training Hyperparameters
- epochs: 6
- learning rate: 1e-05
- batch size: 32
- weight decay: 0.01
- warmup ratio: 0.1
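For readers who want to continue finetuning, the hyperparameters above can be written down using the argument names of `transformers.TrainingArguments`. This is a sketch under assumptions rather than the authors' training script: it assumes the batch size of 32 is a per-device batch size on a single device without gradient accumulation, and the training set size of 5000 sentences is a hypothetical number used only to show how the warmup ratio translates into optimizer steps.

```python
import math

# Hyperparameters from the list above, keyed by the matching
# transformers.TrainingArguments argument names.
training_kwargs = {
    "num_train_epochs": 6,
    "learning_rate": 1e-5,
    # assumption: single device, no gradient accumulation
    "per_device_train_batch_size": 32,
    "weight_decay": 0.01,
    "warmup_ratio": 0.1,
}


def schedule_summary(n_train_examples, kwargs=training_kwargs):
    """Derive step counts implied by the hyperparameters for a given data size."""
    steps_per_epoch = math.ceil(n_train_examples / kwargs["per_device_train_batch_size"])
    total_steps = steps_per_epoch * kwargs["num_train_epochs"]
    # the warmup ratio is the share of all optimizer steps spent warming up
    warmup_steps = math.ceil(total_steps * kwargs["warmup_ratio"])
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# hypothetical training set size, for illustration only
summary = schedule_summary(n_train_examples=5000)
print(summary)
```

The dictionary can be unpacked into `TrainingArguments(output_dir=..., **training_kwargs)` once an output directory is chosen.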
Evaluation
Testing Data, Factors & Metrics
Testing Data
The train, dev, and test splits used for model finetuning and evaluation are available on GitHub: https://github.com/haukelicht/group_mention_detection/release/splits
Metrics
- seq-eval F1: strict sequence labeling evaluation metric per the CoNLL-2000 shared task, based on https://github.com/chakki-works/seqeval
- "soft" seq-eval F1: a more lenient sequence labeling evaluation metric that reports span-level average performance summarized across examples, per https://github.com/haukelicht/soft-seqeval
- sentence-level F1: binary measure of detection performance that considers a sentence a positive example/prediction if it contains at least one entity of the given type
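The difference between the strict and the sentence-level metric can be illustrated with a small pure-Python sketch. It is a simplified re-implementation for illustration, not the `seqeval` or `soft-seqeval` code, and it does not cover the "soft" metric. It assumes BIO-tagged sequences, and the tag name `SG` is a hypothetical placeholder for one entity type.

```python
def extract_spans(tags):
    """Return the set of (type, start, end) spans in a BIO sequence (end exclusive)."""
    spans, start, current = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and current == tag[2:]
        if current is not None and not continues:
            spans.add((current, start, i))
            start, current = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and current is None):
            start, current = i, tag[2:]
    return spans


def f1(tp, n_pred, n_true):
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def strict_span_f1(true_seqs, pred_seqs, entity_type):
    """A predicted span counts only if type and both boundaries match exactly."""
    tp = n_pred = n_true = 0
    for true_tags, pred_tags in zip(true_seqs, pred_seqs):
        true_spans = {s for s in extract_spans(true_tags) if s[0] == entity_type}
        pred_spans = {s for s in extract_spans(pred_tags) if s[0] == entity_type}
        tp += len(true_spans & pred_spans)
        n_pred += len(pred_spans)
        n_true += len(true_spans)
    return f1(tp, n_pred, n_true)


def sentence_level_f1(true_seqs, pred_seqs, entity_type):
    """A sentence is positive if it contains at least one span of the type."""
    tp = n_pred = n_true = 0
    for true_tags, pred_tags in zip(true_seqs, pred_seqs):
        has_true = any(s[0] == entity_type for s in extract_spans(true_tags))
        has_pred = any(s[0] == entity_type for s in extract_spans(pred_tags))
        tp += int(has_true and has_pred)
        n_pred += int(has_pred)
        n_true += int(has_true)
    return f1(tp, n_pred, n_true)


# toy data: the first prediction misses one token of a two-token mention
y_true = [["B-SG", "I-SG", "O", "B-SG"], ["O", "O"], ["B-SG", "O"]]
y_pred = [["B-SG", "O", "O", "B-SG"], ["O", "O"], ["B-SG", "O"]]

print(round(strict_span_f1(y_true, y_pred, "SG"), 3))
print(sentence_level_f1(y_true, y_pred, "SG"))
```

In the toy data, a single boundary error lowers the strict score while the sentence-level score is unaffected, which is consistent with the sentence-level F1 values in the Results table being higher than the seq-eval ones.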
Results
| type | seq-eval F1 | soft seq-eval F1 | sentence level F1 | 
|---|---|---|---|
| social group | 0.739 | 0.789 | 0.941 | 
| political group | 0.914 | 0.917 | 0.987 | 
| political institution | 0.700 | 0.740 | 0.958 | 
| organization, public institution, or collective actor | 0.613 | 0.625 | 0.935 | 
| implicit social group reference | 0.731 | 0.634 | 0.956 | 
Citation
BibTeX:
[More Information Needed]
APA:
Licht, H., & Sczepanski, R. (2025). Detecting Group Mentions in Political Rhetoric: A Supervised Learning Approach. Forthcoming in British Journal of Political Science. Preprint available at OSF: https://doi.org/10.31219/osf.io/ufb96
More Information
https://github.com/haukelicht/group_mention_detection/release
Model Card Contact