Glazkov/sum-entity-infilling

This model is a fine-tuned version of answerdotai/ModernBERT-base trained on the cnn_dailymail dataset for entity infilling tasks.

Model Description

The model is designed to reconstruct masked entities in text using summary context. It was trained with a masked-infilling objective: entities in the source text are replaced with <mask> tokens, and the model learns to predict the original entity tokens while conditioning on the paired summary.
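
As a rough illustration, the model can be queried like any masked LM. The input template below (summary and masked text encoded as a sentence pair) is an assumption for illustration; the exact template used in training lives in the repository code.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("Glazkov/sum-entity-infilling")
model = AutoModelForMaskedLM.from_pretrained("Glazkov/sum-entity-infilling")

summary = "Membership gives the ICC jurisdiction over alleged crimes..."
# NOTE: encoding summary + masked text as a sentence pair is an assumption
masked_text = f"{tokenizer.mask_token} officially became the 123rd member of the International Criminal Court..."

inputs = tokenizer(summary, masked_text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Take the highest-scoring vocabulary token at each mask position
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)
predicted_ids = logits[mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))

Greedy per-token decoding like this only recovers single-token entities; multi-token spans are better handled by the inference helper shown under Usage.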

Intended Uses & Limitations

Intended Uses:

  • Entity reconstruction in summarization
  • Text completion and infilling
  • Research in masked language modeling
  • Educational purposes

Limitations:

  • Trained primarily on news article data
  • May not perform well on highly technical or domain-specific content
  • Performance varies with entity length and context

Training Details

Training hyperparameters and procedure are summarized in the Training Configuration section below.

Evaluation Results

The model was evaluated using entity recall metrics on a validation set from the CNN/DailyMail dataset.

Metrics:

  • Entity Recall: Percentage of correctly reconstructed entities
  • Token Accuracy: Token-level prediction accuracy
  • Exact Match: Full sequence reconstruction accuracy
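
As a rough sketch of how such metrics can be computed (the exact definitions used in evaluation may differ; predicted and reference are assumed here to be aligned lists of entity strings):

def entity_recall(predicted, reference):
    """Fraction of reference entities reconstructed exactly."""
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference) if reference else 1.0

def exact_match(predicted, reference):
    """1.0 only if every entity in the sequence matches."""
    return float(predicted == reference)

preds = ["Palestine", "the Hague"]
refs = ["Palestine", "The Hague"]
print(entity_recall(preds, refs))  # 0.5 -- one of two entities matched
print(exact_match(preds, refs))    # 0.0 -- not a full reconstruction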

Usage

from transformers import AutoTokenizer, AutoModelForMaskedLM

# EntityInfillingInference ships with this repository's source tree
from src.train.inference import EntityInfillingInference

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Glazkov/sum-entity-infilling")
model = AutoModelForMaskedLM.from_pretrained("Glazkov/sum-entity-infilling")

# Initialize inference
inference = EntityInfillingInference(
    model_path="Glazkov/sum-entity-infilling",
    device="cuda"  # or "cpu"
)

# Example inference
summary = "Membership gives the ICC jurisdiction over alleged crimes..."
masked_text = "<mask> officially became the 123rd member of the International Criminal Court..."

predictions = inference.predict_masked_entities(
    summary=summary,
    masked_text=masked_text
)
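
Assuming predict_masked_entities returns one predicted string per <mask> token (this return format is an assumption; check the repository's inference module), the predictions can be spliced back into the text:

# Fill masks left to right with the predicted entities
filled = masked_text
for entity in predictions:
    filled = filled.replace("<mask>", entity, 1)
print(filled)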

Training Configuration

This model was trained using the following configuration:

  • Base Model: answerdotai/ModernBERT-base
  • Dataset: cnn_dailymail
  • Task: Entity Infilling
  • Framework: PyTorch with Accelerate
  • Training Date: 2025-10-17

For more details about the training process, see the training configuration file.

Model Architecture

The model uses the ModernBERT architecture with:

  • 12 transformer layers
  • Hidden size: 768
  • Vocabulary: Custom with <mask> token support
  • Maximum sequence length: 512 tokens
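
These figures can be sanity-checked against the published model configuration (the attribute names below follow the standard transformers config interface):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("Glazkov/sum-entity-infilling")
print(config.num_hidden_layers)        # expected: 12
print(config.hidden_size)              # expected: 768
print(config.max_position_embeddings)  # maximum sequence length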

License

This model is licensed under the MIT License.
