Glazkov/sum-entity-infilling
This model is a fine-tuned version of answerdotai/ModernBERT-base trained on the cnn_dailymail dataset for entity infilling tasks.
Model Description
The model is designed to reconstruct masked entities in text using summary context. During training, entities in the source text are replaced with <mask> tokens and the model learns to predict the original entities, conditioned on the accompanying summary.
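As an illustration of this setup, the sketch below assembles one masked example. The entity and sentence are hypothetical, and pairing the summary with the masked article text is an assumption about the input format; the repository's own preprocessing code is authoritative.

```python
# Minimal sketch of building an entity-infilling example.
# The entity and sentence are hypothetical, and single-token masking plus
# summary/article pairing are assumptions, not the repository's actual code.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Glazkov/sum-entity-infilling")

summary = "Acme Corp reported record quarterly profits."               # hypothetical
article = "Acme Corp announced record profits in the third quarter."   # hypothetical
entity = "Acme Corp"

# Replace the entity span with the tokenizer's mask token.
masked_article = article.replace(entity, tokenizer.mask_token, 1)

# Encode the summary (context) and the masked article text as a pair.
encoded = tokenizer(summary, masked_article, truncation=True, max_length=512, return_tensors="pt")
print(tokenizer.decode(encoded["input_ids"][0]))
```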
Intended Uses & Limitations
Intended Uses:
- Entity reconstruction in summarization
- Text completion and infilling
- Research in masked language modeling
- Educational purposes
Limitations:
- Trained primarily on news article data
- May not perform well on highly technical or domain-specific content
- Performance varies with entity length and context
Training Details
Training Procedure
Evaluation Results
The model was evaluated using entity recall metrics on a validation set from the CNN/DailyMail dataset.
Metrics (a minimal sketch of how they might be computed follows the list):
- Entity Recall: Percentage of correctly reconstructed entities
- Token Accuracy: Token-level prediction accuracy
- Exact Match: Full sequence reconstruction accuracy
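The card does not publish the metric implementations, so the following is only a sketch of how the three metrics could be computed from predicted and reference values; normalization, tokenization, and aggregation details may differ from the actual evaluation code.

```python
# Sketch of the three metrics named above (illustrative, not the official code).
def entity_recall(predicted_entities, reference_entities):
    """Fraction of reference entities that appear among the predictions."""
    if not reference_entities:
        return 0.0
    predicted = set(predicted_entities)
    return sum(e in predicted for e in reference_entities) / len(reference_entities)


def token_accuracy(predicted_tokens, reference_tokens):
    """Token-level accuracy over aligned predicted/reference tokens."""
    pairs = list(zip(predicted_tokens, reference_tokens))
    return sum(p == r for p, r in pairs) / len(pairs) if pairs else 0.0


def exact_match(predicted_text, reference_text):
    """1.0 if the full reconstructed sequence matches the reference, else 0.0."""
    return float(predicted_text.strip() == reference_text.strip())


# Hypothetical values, for illustration only
print(entity_recall(["Acme Corp", "London"], ["Acme Corp", "Paris"]))  # 0.5
```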
Usage
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# EntityInfillingInference ships with the training repository, not with transformers
from src.train.inference import EntityInfillingInference

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Glazkov/sum-entity-infilling")
model = AutoModelForMaskedLM.from_pretrained("Glazkov/sum-entity-infilling")

# Initialize inference
inference = EntityInfillingInference(
    model_path="Glazkov/sum-entity-infilling",
    device="cuda",  # or "cpu"
)

# Example inference
summary = "Membership gives the ICC jurisdiction over alleged crimes..."
masked_text = "(<mask> officially became the 123rd member of the International Criminal Court..."

predictions = inference.predict_masked_entities(
    summary=summary,
    masked_text=masked_text,
)
print(predictions)
```
Training Configuration
This model was trained using the following configuration:
- Base Model: answerdotai/ModernBERT-base
- Dataset: cnn_dailymail
- Task: Entity Infilling
- Framework: PyTorch with Accelerate
- Training Date: 2025-10-17
For more details about the training process, see the training configuration file.
Model Architecture
The model uses the ModernBERT architecture with:
- 12 transformer layers
- Hidden size: 768
- Vocabulary: custom, with <mask> token support
- Maximum sequence length: 512 tokens
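The figures above can be checked programmatically against the published configuration; a quick sketch, assuming the config uses the standard transformers field names:

```python
# Read the architecture details directly from the model's published config.
from transformers import AutoConfig, AutoTokenizer

model_id = "Glazkov/sum-entity-infilling"
config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

print("layers:", config.num_hidden_layers)
print("hidden size:", config.hidden_size)
print("vocab size:", config.vocab_size)
print("mask token:", tokenizer.mask_token)
print("model max length:", tokenizer.model_max_length)
```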
Acknowledgments
- Hugging Face Transformers for the model architecture
- CNN/DailyMail dataset for training data
- Answer.AI for the ModernBERT base model
License
This model is licensed under the MIT License.