# mT5-LatinSummarizerModel: Fine-Tuned Model for Latin NLP

## Overview
This repository contains the trained checkpoints and tokenizer files for the mT5-LatinSummarizerModel, which was fine-tuned to improve Latin summarization and translation. It is designed to:
- Translate between English and Latin.
- Summarize Latin texts effectively.
- Leverage extractive and abstractive summarization techniques.
- Utilize curriculum learning for improved training.
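A minimal usage sketch is shown below. It assumes the published checkpoint loads directly with the `transformers` seq2seq classes and that a plain `summarize:` prefix is an acceptable prompt; adjust `model_id` to the specific checkpoint folder you actually use.

```python
# Hedged usage sketch: the prompt format and checkpoint layout are assumptions.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "LatinNLP/LatinSummarizerModel"  # or a local checkpoint directory
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

latin_text = "Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae."
inputs = tokenizer("summarize: " + latin_text, return_tensors="pt", truncation=True)
output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```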
## Installation & Usage
To download and set up the models (mT5-small and Mistral-7B-Instruct), you can directly run:
```bash
bash install_large_models.sh
```
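If you prefer to fetch the base models from Python rather than the shell script, a hedged alternative uses `huggingface_hub`; the exact Mistral variant downloaded by `install_large_models.sh` is an assumption here.

```python
# Hedged alternative to install_large_models.sh; repo ids below are assumptions.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="google/mt5-small", local_dir="models/mt5-small")
snapshot_download(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",  # assumed instruct variant
    local_dir="models/mistral-7b-instruct",
)
```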
## Project Structure
```
.
├── final_pipeline (trained for 30 light epochs with optimizations, then fine-tuned for 100 epochs on the small high-quality summaries dataset)
│   ├── no_stanza
│   └── with_stanza
├── initial_pipeline (trained for 6 epochs without optimizations)
│   └── mt5-small-en-la-translation-epoch5
├── install_large_models.sh
└── README.md
```
## Training Methodology
We fine-tuned mT5-small in three phases:
- Initial Training Pipeline (6 epochs): Used the full dataset without optimizations.
- Final Training Pipeline (30 light epochs): Used 10% of training data per epoch for efficiency.
- Fine-Tuning (100 epochs): Focused on the 4750 high-quality summaries for final optimization.
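The "light epoch" scheme above (a fresh 10% sample of the training data each epoch) can be sketched as follows; `full_dataset` and `train_one_epoch` are illustrative placeholders, not the repository's actual training code.

```python
# Illustrative sketch of light epochs: each epoch sees a new random 10% slice.
import random

def run_light_epochs(full_dataset, train_one_epoch, num_epochs=30, fraction=0.10, seed=0):
    rng = random.Random(seed)
    data = list(full_dataset)
    for epoch in range(num_epochs):
        subset = rng.sample(data, k=max(1, int(len(data) * fraction)))
        train_one_epoch(subset)  # one optimization pass over the 10% slice
```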
### Training Configurations
- Hardware: 16GB VRAM GPU (lab machines via SSH).
- Batch Size: Adaptive due to GPU memory constraints.
- Gradient Accumulation: Enabled for larger effective batch sizes.
- LoRA-based fine-tuning: LoRA Rank 8, Scaling Factor 32.
- Dynamic Sequence Length Adjustment: Increased progressively.
- Learning Rate: 5 × 10⁻⁴ with warm-up steps.
- Checkpointing: Frequent saves to mitigate power outages.
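The listed hyperparameters (LoRA rank 8, scaling factor 32, learning rate 5 × 10⁻⁴ with warm-up, gradient accumulation) translate into a configuration sketch like the one below; the target modules, warm-up step count, and batch size are assumptions not stated above.

```python
# Hedged configuration sketch; values not listed in the README are assumptions.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")
lora_config = LoraConfig(
    r=8,                        # LoRA rank (as reported)
    lora_alpha=32,              # scaling factor (as reported)
    target_modules=["q", "v"],  # assumption: mT5 attention projections
    task_type=TaskType.SEQ_2_SEQ_LM,
)
model = get_peft_model(base_model, lora_config)

training_args = Seq2SeqTrainingArguments(
    output_dir="checkpoints",
    learning_rate=5e-4,
    warmup_steps=500,               # assumption: warm-up length not reported
    per_device_train_batch_size=4,  # adaptive in practice; sized for 16 GB VRAM
    gradient_accumulation_steps=8,  # larger effective batch size
    save_steps=500,                 # frequent checkpointing against power outages
    num_train_epochs=30,
)
```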
## Evaluation & Results
We evaluated the model using ROUGE, BERTScore, and BLEU/chrF scores.
| Metric | Before Fine-Tuning | After Fine-Tuning | 
|---|---|---|
| ROUGE-1 | 0.1675 | 0.2541 | 
| ROUGE-2 | 0.0427 | 0.0773 | 
| ROUGE-L | 0.1459 | 0.2139 | 
| BERTScore-F1 | 0.6573 | 0.7140 | 
- chrF Score (en→la): 33.60 (with Stanza tags) vs 18.03 BLEU (without Stanza).
- Summarization Density: Maintained at ~6%.
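A hedged sketch of how these metrics can be computed with the `evaluate` library follows; `predictions` and `references` are placeholders, and the BERTScore backbone is an assumption.

```python
# Hedged evaluation sketch for the reported metrics (ROUGE, BERTScore, chrF).
import evaluate

predictions = ["..."]  # model-generated Latin summaries / translations
references = ["..."]   # reference summaries / translations

rouge = evaluate.load("rouge").compute(predictions=predictions, references=references)
bertscore = evaluate.load("bertscore").compute(
    predictions=predictions,
    references=references,
    model_type="bert-base-multilingual-cased",  # assumption: multilingual backbone
)
chrf = evaluate.load("chrf").compute(predictions=predictions, references=references)

print(rouge["rouge1"], sum(bertscore["f1"]) / len(bertscore["f1"]), chrf["score"])
```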
### Observations
- Pre-training on extractive summaries was crucial.
- The fine-tuned model still leans toward excessive extraction (copying source text), indicating room for further improvement.
## License
This model is released under CC-BY-4.0.
## Citation
```bibtex
@misc{LatinSummarizerModel,
  author = {Axel Delaval and Elsa Lubek},
  title = {Latin-English Summarization Model (mT5)},
  year = {2025},
  url = {https://huggingface.co/LatinNLP/LatinSummarizerModel}
}
```
