Glazkov/sum-entity-infilling
This model is a fine-tuned version of answerdotai/ModernBERT-base trained on the cnn_dailymail dataset for entity infilling tasks.
Model Description
The model is designed to reconstruct masked entities in text using summary context. During training, entities in the source text are replaced with <mask> tokens and the model learns to predict the original entities, conditioned on the accompanying summary.
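As an illustration of this setup, the sketch below assembles one masked example. The entity and sentence are hypothetical, and pairing the summary with the masked article text is an assumption about the input format; the repository's own preprocessing code is authoritative.

```python
# Minimal sketch of building an entity-infilling example.
# The entity and sentence are hypothetical, and single-token masking plus
# summary/article pairing are assumptions, not the repository's actual code.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Glazkov/sum-entity-infilling")

summary = "Acme Corp reported record quarterly profits."               # hypothetical
article = "Acme Corp announced record profits in the third quarter."   # hypothetical
entity = "Acme Corp"

# Replace the entity span with the tokenizer's mask token.
masked_article = article.replace(entity, tokenizer.mask_token, 1)

# Encode the summary (context) and the masked article text as a pair.
encoded = tokenizer(summary, masked_article, truncation=True, max_length=512, return_tensors="pt")
print(tokenizer.decode(encoded["input_ids"][0]))
```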
Intended Uses & Limitations
Intended Uses:
- Entity reconstruction in summarization
- Text completion and infilling
- Research in masked language modeling
- Educational purposes
Limitations:
- Trained primarily on news article data
- May not perform well on highly technical or domain-specific content
- Performance varies with entity length and context
Training Details
Training Procedure
Evaluation Results
The model was evaluated using entity recall metrics on a validation set from the CNN/DailyMail dataset.
Metrics (a minimal sketch of how they might be computed follows the list):
- Entity Recall: Percentage of correctly reconstructed entities
- Token Accuracy: Token-level prediction accuracy
- Exact Match: Full sequence reconstruction accuracy
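The card does not publish the metric implementations, so the following is only a sketch of how the three metrics could be computed from predicted and reference values; normalization, tokenization, and aggregation details may differ from the actual evaluation code.

```python
# Sketch of the three metrics named above (illustrative, not the official code).
def entity_recall(predicted_entities, reference_entities):
    """Fraction of reference entities that appear among the predictions."""
    if not reference_entities:
        return 0.0
    predicted = set(predicted_entities)
    return sum(e in predicted for e in reference_entities) / len(reference_entities)


def token_accuracy(predicted_tokens, reference_tokens):
    """Token-level accuracy over aligned predicted/reference tokens."""
    pairs = list(zip(predicted_tokens, reference_tokens))
    return sum(p == r for p, r in pairs) / len(pairs) if pairs else 0.0


def exact_match(predicted_text, reference_text):
    """1.0 if the full reconstructed sequence matches the reference, else 0.0."""
    return float(predicted_text.strip() == reference_text.strip())


# Hypothetical values, for illustration only
print(entity_recall(["Acme Corp", "London"], ["Acme Corp", "Paris"]))  # 0.5
```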
Usage
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# EntityInfillingInference ships with the training repository, not with transformers
from src.train.inference import EntityInfillingInference

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Glazkov/sum-entity-infilling")
model = AutoModelForMaskedLM.from_pretrained("Glazkov/sum-entity-infilling")

# Initialize inference
inference = EntityInfillingInference(
    model_path="Glazkov/sum-entity-infilling",
    device="cuda",  # or "cpu"
)

# Example inference
summary = "Membership gives the ICC jurisdiction over alleged crimes..."
masked_text = "(<mask> officially became the 123rd member of the International Criminal Court..."

predictions = inference.predict_masked_entities(
    summary=summary,
    masked_text=masked_text,
)
print(predictions)
```
Training Configuration
This model was trained using the following configuration:
- Base Model: answerdotai/ModernBERT-base
- Dataset: cnn_dailymail
- Task: Entity Infilling
- Framework: PyTorch with Accelerate
- Training Date: 2025-10-17
For more details about the training process, see the training configuration file.
Model Architecture
The model uses the ModernBERT architecture with:
- 12 transformer layers
- Hidden size: 768
- Vocabulary: custom, with <mask> token support
- Maximum sequence length: 512 tokens
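The figures above can be checked programmatically against the published configuration; a quick sketch, assuming the config uses the standard transformers field names:

```python
# Read the architecture details directly from the model's published config.
from transformers import AutoConfig, AutoTokenizer

model_id = "Glazkov/sum-entity-infilling"
config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

print("layers:", config.num_hidden_layers)
print("hidden size:", config.hidden_size)
print("vocab size:", config.vocab_size)
print("mask token:", tokenizer.mask_token)
print("model max length:", tokenizer.model_max_length)
```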
Acknowledgments
- Hugging Face Transformers for the model architecture
- CNN/DailyMail dataset for training data
- Answer.AI for the ModernBERT base model
License
This model is licensed under the MIT License.