Model Card for Qwen2.5-7B-Instruct-SLDS

Model Summary

This model is Qwen2.5-7B-Instruct fine-tuned on the Swiss Landmark Decisions Summarization (SLDS) dataset.
SLDS is a multilingual dataset of 20,000 Swiss Federal Supreme Court decisions (1954–2024), each paired with headnotes in German, French, and Italian, resulting in ~60,000 decision–headnote pairs.

The model is optimized for legal abstractive summarization and is capable of producing concise, legally structured headnotes.
It can be used for both monolingual and cross-lingual summarization tasks.

This model was trained 2x faster with Unsloth and Hugging Face's TRL library.


Intended Use

  • Primary Task: Judicial summarization (decision → headnote generation).
  • Languages: German (de), French (fr), Italian (it).
  • Scenarios:
    • Monolingual summarization: e.g., German decision → German headnote.
    • Cross-lingual summarization: e.g., German decision → French headnote.
    • Legal research support: assisting in retrieval and navigation of court decisions.

Not intended for:

  • Replacing human legal expertise.
  • Serving as an authoritative legal source.
  • Automated legal advice or decision-making.
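A minimal usage sketch for monolingual or cross-lingual headnote generation. The helper below builds a chat prompt in the standard Qwen2.5-Instruct format; the system prompt and instruction wording are illustrative assumptions, not the exact prompt used during fine-tuning.

```python
def build_messages(decision_text: str, target_lang: str) -> list[dict]:
    """Build a chat prompt asking for a headnote in the target language.

    target_lang is one of the SLDS languages: "de", "fr", or "it".
    """
    lang_names = {"de": "German", "fr": "French", "it": "Italian"}
    instruction = (
        "Summarize the following Swiss Federal Supreme Court decision "
        f"as a headnote in {lang_names[target_lang]}."
    )
    return [
        {"role": "system", "content": "You are a legal summarization assistant."},
        {"role": "user", "content": f"{instruction}\n\n{decision_text}"},
    ]

# With transformers installed, generation follows the standard
# Qwen2.5-Instruct pattern (commented out: the BF16 checkpoint is ~16 GB):
#
# from transformers import AutoModelForCausalLM, AutoTokenizer
# model_id = "ipst/Qwen2.5-7B-Instruct-SLDS"
# tok = AutoTokenizer.from_pretrained(model_id)
# model = AutoModelForCausalLM.from_pretrained(
#     model_id, torch_dtype="bfloat16", device_map="auto")
# text = tok.apply_chat_template(build_messages(decision, "fr"),
#                                tokenize=False, add_generation_prompt=True)
# inputs = tok(text, return_tensors="pt").to(model.device)
# headnote = tok.decode(model.generate(**inputs, max_new_tokens=512)[0],
#                       skip_special_tokens=True)
```

Passing a German decision with `target_lang="fr"` corresponds to the cross-lingual setting described above.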

Training Data

The model was fine-tuned on SLDS: 20,000 Swiss Federal Supreme Court decisions (1954–2024), each paired with official headnotes in German, French, and Italian, yielding ~60,000 decision–headnote pairs. Decisions from 2023–2024 are held out as the test set.

Training Procedure

  • Base Models (the accompanying paper fine-tunes several; this checkpoint is the Qwen2.5-7B-Instruct variant):

    • Qwen2.5 family (0.5B–14B)
    • Llama 3.2 (3B)
    • Phi-3.5-mini
  • Fine-tuning Objective: Conditional generation (decision → headnote).

  • Evaluation Metrics:

    • Lexical: ROUGE-1/2/L, BLEU, BERTScore.
    • Domain-specific: LLM-as-a-Judge framework (DeepSeek V3) assessing five rubrics: accuracy, completeness, clarity, legal citations, and considerations.
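To make the lexical metrics concrete, here is a minimal sketch of token-level ROUGE-L (longest common subsequence) F1. Published scores are typically computed with a dedicated package such as rouge-score, which also applies stemming, so this simplified version is illustrative only.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if x == y
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 over whitespace tokens (no stemming)."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

print(round(rouge_l_f1("the court dismissed the appeal",
                       "the court rejected the appeal"), 3))  # → 0.8
```

BLEU and BERTScore follow the same candidate-vs-reference pattern but measure n-gram precision and contextual-embedding similarity, respectively.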

Model Performance

On the SLDS test set (2023–2024):

| Model | Setting | BERTScore ↑ | BLEU ↑ | ROUGE-1 ↑ | ROUGE-2 ↑ | ROUGE-L ↑ | JUDGE ↑ |
|---|---|---|---|---|---|---|---|
| Phi-3.5-mini | fine-tuned | 11.24 ± 3.82 | 34.84 ± 0.41 | 31.20 ± 2.08 | 14.11 ± 1.27 | 20.96 ± 1.35 | 15.25 ± 2.32 |
| Llama 3.2 (3B) | fine-tuned | 15.20 ± 4.40 | 21.89 ± 0.42 | 31.89 ± 2.34 | 14.87 ± 1.61 | 22.49 ± 1.60 | 18.47 ± 2.99 |
| Qwen2.5 0.5B | fine-tuned | -1.37 ± 3.85 | 32.20 ± 0.35 | 23.87 ± 1.68 | 9.46 ± 0.94 | 17.37 ± 1.09 | 5.80 ± 1.26 |
| Qwen2.5 1.5B | fine-tuned | 19.81 ± 2.72 | 36.79 ± 0.34 | 33.03 ± 1.73 | 14.14 ± 1.08 | 22.67 ± 1.13 | 15.92 ± 2.27 |
| Qwen2.5 3B | fine-tuned | 23.23 ± 2.80 | 38.42 ± 0.34 | 35.18 ± 1.79 | 15.66 ± 1.23 | 24.10 ± 1.17 | 20.31 ± 2.66 |
| Qwen2.5 7B | fine-tuned | 29.59 ± 1.97 | 41.40 ± 0.34 | 39.24 ± 1.59 | 18.26 ± 1.25 | 26.44 ± 1.15 | 28.37 ± 3.07 |
| Qwen2.5 14B | fine-tuned | 32.48 ± 1.98 | 41.80 ± 0.37 | 40.04 ± 1.74 | 19.99 ± 1.41 | 28.00 ± 1.28 | 31.38 ± 3.19 |
| GPT-4o | one-shot | 30.44 ± 1.74 | 31.89 ± 0.25 | 42.12 ± 1.79 | 18.92 ± 1.22 | 25.92 ± 1.05 | 39.70 ± 2.66 |
| Claude 3.5 Sonnet | one-shot | 5.53 ± 2.00 | 21.88 ± 0.25 | 41.86 ± 1.64 | 19.23 ± 1.19 | 27.67 ± 1.20 | 41.25 ± 2.90 |
| DeepSeek-R1 | one-shot | 20.28 ± 1.45 | 22.37 ± 0.18 | 38.30 ± 1.82 | 15.97 ± 0.85 | 21.03 ± 0.84 | 42.28 ± 2.21 |
| o3-mini | one-shot | 14.18 ± 1.31 | 20.55 ± 0.17 | 34.77 ± 1.43 | 11.92 ± 0.69 | 18.21 ± 0.67 | 34.82 ± 2.41 |

  • Lexical metrics: Fine-tuned models lead on overlap-based scores (BLEU, ROUGE).
  • LLM-judge scores: Larger proprietary and reasoning models lead on legal precision.

Limitations

  • Language imbalance: German decisions dominate, while Italian remains underrepresented.
  • Biases: Headnotes reflect judicial style and conventions, not neutral summaries.
  • Evaluation mismatch: ROUGE and BLEU may not fully capture legal accuracy.
  • Overfitting risk: Models may overfit to formulaic headnote structures.
  • Cross-lingual difficulty: Some models struggle with non-monolingual headnote generation.

Ethical Considerations

  • Sensitive information: All data is anonymized by the Swiss Federal Supreme Court before publication.
  • Legal risk: Generated headnotes must not be used as official legal advice.
  • Fair use: Ensure attribution when reusing outputs.

How to Cite

If you use this model, please cite the dataset paper:

@inproceedings{rolshoven-etal-2025-unlocking,
    title = "Unlocking Legal Knowledge: A Multilingual Dataset for Judicial Summarization in {S}witzerland",
    author = {Rolshoven, Luca  and
      Rasiah, Vishvaksenan  and
      Bose, Srinanda Br{\"u}gger  and
      Hostettler, Sarah  and
      Burkhalter, Lara  and
      St{\"u}rmer, Matthias  and
      Niklaus, Joel},
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-emnlp.832/",
    pages = "15382--15411",
    ISBN = "979-8-89176-335-7",
}