---
license: apache-2.0
datasets:
- eriktks/conll2003
language:
- en
base_model:
- stefan-it/ModernBERT-large-tokenizer-fix
tags:
- ner
---

# ✨ ModernBERT Large for NER

This repository hosts a ModernBERT Large model that was fine-tuned on the CoNLL-2003 NER dataset with the awesome Flair library.

Please note the following caveats:

* ⚠️ To work around a tokenizer problem in ModernBERT, this model was fine-tuned on a [forked and modified](https://huggingface.co/stefan-it/ModernBERT-large-tokenizer-fix) ModernBERT Large model.
* ⚠️ For now, don't expect "uber" BERT-like performance; more experiments are needed. (Is RoPE causing this?)

## 📝 Implementation

The model was trained using my [ModernBERT experiments](https://github.com/stefan-it/modern-bert-ner) repo.

## 📊 Performance

A very basic hyper-parameter search was performed with five different seeds, reporting the averaged micro F1-score on the development set of CoNLL-2003:

| Configuration          | Subword Pooling | Run 1 | Run 2     | Run 3 | Run 4 | Run 5 | Avg.         |
|:-----------------------|:----------------|:------|:----------|:------|:------|:------|-------------:|
| `bs16-e10-cs0-lr2e-05` | `first`         | 96.13 | 96.44     | 96.20 | 95.93 | 96.65 | 96.27 ± 0.25 |
| `bs16-e10-cs0-lr2e-05` | `first_last`    | 96.36 | **96.58** | 96.14 | 96.19 | 96.35 | 96.32 ± 0.15 |

The performance of the currently uploaded model is marked in bold.

## 📣 Usage

The following code can be used to test the model and recognize named entities in a given sentence:

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Load the model
tagger = SequenceTagger.load("stefan-it/flair-modernbert-large-ner-conll03")

# Define an example sentence
sentence = Sentence("George Washington went to Washington very fast.")

# Now let's predict named entities...
tagger.predict(sentence)

# Print out the recognized named entities
print("The following named entities are found:")
for entity in sentence.get_spans('ner'):
    print(entity)
```

This outputs:

```text
Span[0:2]: "George Washington" → PER (1.0000)
Span[4:5]: "Washington" → LOC (1.0000)
```
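As a side note on the performance table above: the reported `Avg.` values match the population mean and population standard deviation over the five run scores, which can be reproduced with a small snippet (the scores are copied from the `first_last` row):

```python
from statistics import mean, pstdev

# Dev-set micro F1 scores of the five runs (first_last pooling, from the table above)
runs = [96.36, 96.58, 96.14, 96.19, 96.35]

avg = mean(runs)
std = pstdev(runs)  # population standard deviation

print(f"{avg:.2f} \u00b1 {std:.2f}")  # 96.32 ± 0.15
```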
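The configuration string in the performance table encodes the hyper-parameters of a run. My reading of the naming scheme (an assumption, not documented in this card) is `bs` = batch size, `e` = epochs, `cs` = context size, and `lr` = learning rate. A tiny parser for that assumed scheme:

```python
import re

def parse_config(name: str) -> dict:
    # Assumed naming scheme: bs<batch size>-e<epochs>-cs<context size>-lr<learning rate>
    m = re.fullmatch(r"bs(\d+)-e(\d+)-cs(\d+)-lr([\d.e-]+)", name)
    if m is None:
        raise ValueError(f"unrecognized configuration name: {name}")
    return {
        "batch_size": int(m.group(1)),
        "epochs": int(m.group(2)),
        "context_size": int(m.group(3)),
        "learning_rate": float(m.group(4)),
    }

print(parse_config("bs16-e10-cs0-lr2e-05"))
# {'batch_size': 16, 'epochs': 10, 'context_size': 0, 'learning_rate': 2e-05}
```

Under this reading, the uploaded model was trained with batch size 16 for 10 epochs at a learning rate of 2e-05.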