---
language:
- en
license: apache-2.0
library_name: llm2ner
base_model: meta-llama/Llama-3.2-1B
tags:
- ner
- span-detection
- llm
- pytorch
pipeline_tag: token-classification
model_name: ToMMeR-Llama-3.2-1B_L7_R64
source: https://github.com/VictorMorand/llm2ner
paper: https://arxiv.org/abs/2510.19410
---
# ToMMeR-Llama-3.2-1B_L7_R64
ToMMeR is a lightweight probing model that extracts emergent mention detection capabilities from the early-layer representations of an LLM backbone, achieving high zero-shot recall across a wide set of 13 NER benchmarks.
## Checkpoint Details
| Property  | Value |
|-----------|-------|
| Base LLM  | `meta-llama/Llama-3.2-1B` |
| Layer     | 7|
| #Params   | 264.2K |
# Usage
## Installation
Our code can be installed with pip directly from GitHub. Please visit the [repository](https://github.com/VictorMorand/llm2ner) for more details.
```bash
pip install git+https://github.com/VictorMorand/llm2ner.git
```
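As a quick sanity check after installation, you can load this checkpoint and inspect its attributes. This is only a minimal sketch: `llm_name` and `layer` are the attributes used in the examples below, and the parameter count relies on standard PyTorch (printed values may differ slightly).
```python
from llm2ner import ToMMeR

# load the probe from the Hugging Face Hub and print its checkpoint details
tommer = ToMMeR.from_pretrained("llm2ner/ToMMeR-Llama-3.2-1B_L7_R64")
print(tommer.llm_name)  # expected: meta-llama/Llama-3.2-1B
print(tommer.layer)     # expected: 7
print(sum(p.numel() for p in tommer.parameters()))  # roughly 264K parameters
```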
## Fancy Outputs
```python
import llm2ner
from llm2ner import ToMMeR
tommer = ToMMeR.from_pretrained("llm2ner/ToMMeR-Llama-3.2-1B_L7_R64")
# load backbone LLM, optionally cutting unused layers to save GPU memory
llm = llm2ner.utils.load_llm(tommer.llm_name, cut_to_layer=tommer.layer)
tommer.to(llm.device)
text = "Large language models are awesome. While trained on language modeling, they exhibit emergent Zero Shot abilities that make them suitable for a wide range of tasks, including Named Entity Recognition (NER). "
# fancy interactive output
outputs = llm2ner.plotting.demo_inference(
    text, tommer, llm,
    decoding_strategy="threshold",  # or "greedy" for flat segmentation
    threshold=0.5,  # default 50%
    show_attn=True,
)
```
(Interactive demo output: the input sentence is rendered with predicted mention spans such as *Large language models*, *language modeling*, *Named Entity Recognition*, and *NER* highlighted.)
## Raw Inference
By default, ToMMeR outputs span probabilities, but built-in decoding strategies are also provided to extract entities.
- Inputs:
  - tokens (batch, seq): tokens to process.
  - model: backbone LLM to extract representations from.
- Outputs: a (batch, seq, seq) score matrix (masked outside valid spans).
```python
import llm2ner
from llm2ner import ToMMeR

tommer = ToMMeR.from_pretrained("llm2ner/ToMMeR-Llama-3.2-1B_L7_R64")
# load backbone LLM, optionally cutting unused layers to save GPU memory
llm = llm2ner.utils.load_llm(tommer.llm_name, cut_to_layer=tommer.layer)
tommer.to(llm.device)

text = ["Large language models are awesome"]
print(f"Input text: {text[0]}")

# tokenize to shape (1, seq_len)
tokens = llm.tokenizer(text, return_tensors="pt")["input_ids"].to(llm.device)

# raw span scores
output = tommer.forward(tokens, llm)  # (batch_size, seq_len, seq_len)
print(f"Raw Output shape: {output.shape}")

# use the chosen decoding strategy to infer entities
entities = tommer.infer_entities(tokens=tokens, model=llm, threshold=0.5, decoding_strategy="greedy")
str_entities = [llm.tokenizer.decode(tokens[0, b:e + 1]) for b, e in entities[0]]
print(f"Predicted entities: {str_entities}")

>>> Input text: Large language models are awesome
>>> Raw Output shape: torch.Size([1, 6, 6])
>>> Predicted entities: ['Large language models']
```
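If you need custom post-processing, you can also threshold the raw score matrix yourself instead of calling `infer_entities`. The snippet below is only a sketch: it assumes `output[0, i, j]` is the probability that tokens `i..j` (inclusive) form a mention and that masked (invalid) spans score below the threshold; see the repository for the exact convention.
```python
import torch

# illustrative manual decoding of the (seq_len, seq_len) span-score matrix
threshold = 0.5
scores = output[0]                              # scores for the single sentence in the batch
starts, ends = torch.where(scores > threshold)  # indices of spans above the threshold
spans = [(int(b), int(e)) for b, e in zip(starts, ends) if b <= e]
print([llm.tokenizer.decode(tokens[0, b:e + 1]) for b, e in spans])
```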
Please visit the [repository](https://github.com/VictorMorand/llm2ner) for more details and a demo notebook.
## Evaluation Results
| dataset             |   precision |   recall |     f1 |   n_samples |
|---------------------|-------------|----------|--------|-------------|
| MultiNERD           |      0.1744 |   0.9918 | 0.2966 |      154144 |
| CoNLL 2003          |      0.2588 |   0.9489 | 0.4067 |       16493 |
| CrossNER_politics   |      0.2712 |   0.9786 | 0.4246 |        1389 |
| CrossNER_AI         |      0.2838 |   0.9791 | 0.4401 |         879 |
| CrossNER_literature |      0.3196 |   0.9582 | 0.4793 |         916 |
| CrossNER_science    |      0.3124 |   0.9687 | 0.4724 |        1193 |
| CrossNER_music      |      0.3591 |   0.9768 | 0.5252 |         945 |
| ncbi                |      0.1054 |   0.9394 | 0.1896 |        3952 |
| FabNER              |      0.2696 |   0.8015 | 0.4034 |       13681 |
| WikiNeural          |      0.1672 |   0.9882 | 0.286  |       92672 |
| GENIA_NER           |      0.201  |   0.9722 | 0.3332 |       16563 |
| ACE 2005            |      0.2545 |   0.4826 | 0.3332 |        8230 |
| Ontonotes           |      0.2089 |   0.7736 | 0.3289 |       42193 |
| Aggregated          |      0.1886 |   0.9418 | 0.3142 |      353250 |
| Mean                |      0.2451 |   0.9046 | 0.3784 |      353250 |
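The metrics above are computed over predicted mention spans. A minimal sketch of how such span-level precision/recall/F1 can be computed is shown below; it is illustrative rather than the official evaluation script, and `pred_spans`/`gold_spans` are hypothetical per-sentence sets of (start, end) spans.
```python
# illustrative span-level precision/recall/F1 (not the official evaluation code)
def span_prf1(pred_spans, gold_spans):
    """pred_spans, gold_spans: lists of per-sentence sets of (start, end) spans."""
    tp = sum(len(p & g) for p, g in zip(pred_spans, gold_spans))
    n_pred = sum(len(p) for p in pred_spans)
    n_gold = sum(len(g) for g in gold_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# hypothetical example: one sentence, one predicted span, two gold spans
p, r, f = span_prf1(pred_spans=[{(0, 2)}], gold_spans=[{(0, 2), (4, 4)}])
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")  # precision=1.00 recall=0.50 f1=0.67
```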
## Citation
If you use this model or the approach, please cite the associated paper:
```
@misc{morand2025tommerefficiententity,
      title={ToMMeR -- Efficient Entity Mention Detection from Large Language Models}, 
      author={Victor Morand and Nadi Tomeh and Josiane Mothe and Benjamin Piwowarski},
      year={2025},
      eprint={2510.19410},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.19410}, 
}
```
## License
Apache-2.0 (see repository for full text).