legal-passive-to-active-mistral-7b
RECOMMENDED MODEL - A LoRA fine-tuned model that rewrites legal text from passive voice to active voice, built on Mistral-7B-Instruct-v0.1. In human evaluation it scored about 15% higher than the base model while preserving semantic accuracy and legal precision.
Model Description
This is the enhanced model for legal passive-to-active transformation. Built on Mistral-7B-Instruct-v0.1, it outperforms the comparable Llama-2 variant (rafiaa/legal-passive-to-active-llama2-7b) on legal voice transformation. The model was fine-tuned on a curated dataset of 319 legal sentences drawn from UN documents, the GDPR, the Australian Fair Work Act, and Insurance Council of Australia regulations.
Key Features
- Superior Performance: ~15% improvement over base model in human evaluation
- Legal Text Simplification: Converts passive voice to active voice in legal documents
- Domain-Specific: Fine-tuned on authentic legal text from multiple jurisdictions
- Efficient Training: Uses QLoRA for memory-efficient fine-tuning
- Semantic Preservation: Maintains legal meaning while simplifying sentence structure
- Accessibility: Makes legal documents more readable and accessible
Model Details
- Developed by: Rafi Al Attrach
- Model type: LoRA fine-tuned Mistral (Enhanced)
- Language(s): English
- License: Apache 2.0
- Finetuned from: mistralai/Mistral-7B-Instruct-v0.1
- Training method: QLoRA (4-bit quantization + LoRA)
- Research Focus: Legal text simplification and accessibility (2024)
Technical Specifications
- Base Model: Mistral-7B-Instruct-v0.1
- LoRA Rank: 64
- Training Samples: 319 legal sentences
- Data Sources: UN legal documents, GDPR, Fair Work Act, Insurance regulations
- Evaluation: BERTScore metrics and human evaluation
- Performance: ~15% improvement over base model in human evaluation
Uses
Direct Use
This model is designed for:
- Legal document simplification: Converting passive legal text to active voice
- Accessibility improvement: Making legal documents more readable
- Legal writing assistance: Helping legal professionals write clearer documents
- Educational purposes: Teaching legal language transformation
- Document processing: Batch processing of legal texts
- Regulatory compliance: Simplifying complex regulatory language
Example Use Cases
# Transform a legal passive sentence to active voice
passive_sentence = "The contract shall be executed by both parties within 30 days."
# Model output: "Both parties shall execute the contract within 30 days."
# Simplify GDPR text
passive_sentence = "Personal data may be processed by the controller for legitimate interests."
# Model output: "The controller may process personal data for legitimate interests."
# Transform UN legal text
passive_sentence = "All necessary measures shall be taken by Member States to ensure compliance."
# Model output: "Member States shall take all necessary measures to ensure compliance."
How to Get Started
Installation
pip install transformers torch peft accelerate bitsandbytes
Loading the Model
GPU Usage (Recommended)
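from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch
# Load base model with 4-bit quantization
base_model = "mistralai/Mistral-7B-Instruct-v0.1"
# Recent transformers releases deprecate passing load_in_4bit directly to
# from_pretrained; a BitsAndBytesConfig carries the same setting.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    torch_dtype=torch.float16,
    device_map="auto"
)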
# Load LoRA adapter
model = PeftModel.from_pretrained(model, "rafiaa/legal-passive-to-active-mistral-7b")
tokenizer = AutoTokenizer.from_pretrained(base_model)
# Set pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
CPU Usage (Alternative)
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
# Load base model (CPU compatible)
base_model = "mistralai/Mistral-7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float32,
    device_map="cpu"
)
# Load LoRA adapter
model = PeftModel.from_pretrained(model, "rafiaa/legal-passive-to-active-mistral-7b")
tokenizer = AutoTokenizer.from_pretrained(base_model)
# Set pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
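The two loading paths differ mainly in how much memory the weights need. The following is a back-of-the-envelope sketch only: it assumes roughly 7.24 billion parameters for Mistral-7B (an approximate figure that this card does not state) and ignores activations, the KV cache, quantization metadata, and the LoRA adapter. The helper name `weight_memory_gib` is illustrative.

```python
# Rough weight-memory estimate for the loading options above.
# Assumption: ~7.24e9 parameters for Mistral-7B (approximate; not from this card).
N_PARAMS = 7.24e9


def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Memory needed for the raw weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


for label, bits in [("4-bit (GPU path)", 4), ("float16", 16), ("float32 (CPU path)", 32)]:
    print(f"{label:>20}: {weight_memory_gib(N_PARAMS, bits):5.1f} GiB")
```

Under these assumptions the float32 CPU path needs about 27 GiB of RAM for the weights alone, against about 3.4 GiB for the 4-bit path, which is one reason the GPU route is the recommended one.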
Usage Example
def transform_passive_to_active(passive_sentence, max_length=512):
    # Create instruction prompt
    instruction = """You are a legal text transformation expert. Your task is to convert passive voice sentences to active voice while maintaining the exact legal meaning and terminology.
Input: Transform the following legal sentence from passive to active voice.
Legal Sentence: """

    prompt = instruction + passive_sentence
    # Move the tokenized prompt to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=max_length,  # counts prompt tokens plus generated tokens
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode only the newly generated tokens so the prompt is not echoed back
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()
# Example usage
passive = "The agreement shall be signed by the authorized representatives."
active = transform_passive_to_active(passive)
print(active)
Advanced Usage
# Batch processing multiple legal sentences
legal_sentences = [
    "The policy was established by the board of directors.",
    "All documents must be reviewed by legal counsel.",
    "The regulations were enacted by Parliament."
]
for sentence in legal_sentences:
    transformed = transform_passive_to_active(sentence)
    print(f"Passive: {sentence}")
    print(f"Active: {transformed}\n")
Training Details
Training Data
- Dataset Size: 319 legal sentences
- Source Documents:
  - United Nations legal documents
  - General Data Protection Regulation (GDPR)
  - Fair Work Act (Australia)
  - Insurance Council of Australia regulations
- Data Split: 85% training, 15% testing (with 15% of training for validation)
- Domain: Legal text across multiple jurisdictions
- Format: Alpaca format for instruction-based training
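The data preparation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's pipeline: the template is the standard Alpaca prompt with an input field, the `INSTRUCTION` wording is hypothetical, and the helper names, the rounding rule, and the shuffle seed in `split_dataset` are all assumptions.

```python
import random

# Standard Alpaca prompt template (variant with an input field).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

# Hypothetical instruction text; the wording used for training is not given here.
INSTRUCTION = "Transform the following legal sentence from passive to active voice."


def to_alpaca(passive: str, active: str) -> str:
    """Render one passive/active sentence pair as an Alpaca-style training example."""
    return ALPACA_TEMPLATE.format(instruction=INSTRUCTION, input=passive, output=active)


def split_dataset(records, test_frac=0.15, val_frac=0.15, seed=0):
    """Shuffle, hold out test_frac for testing, then val_frac of the rest for validation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_val = round(len(rest) * val_frac)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


# Stand-in indices for the 319 sentences.
train, val, test = split_dataset(range(319))
print(len(train), len(val), len(test))
```

With this rounding rule, 319 sentences split into 230 training, 41 validation, and 48 test examples; the counts actually used may differ slightly.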
Training Procedure
- Method: QLoRA (4-bit quantization + LoRA)
- LoRA Configuration: Rank 64, Alpha 16
- Library: unsloth (2.2x faster, 62% less VRAM for Mistral)
- Hardware: Tesla T4 GPU (Google Colab)
- Validation Loss: Trended downward during training, which suggests the model generalizes rather than overfits
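To make the LoRA configuration above concrete, here is a small arithmetic sketch. It assumes the standard LoRA scaling of alpha / rank and uses a 4096 x 4096 projection (the hidden size of Mistral-7B) as an example layer; which layers were actually adapted is not stated in this card, and the function names are illustrative.

```python
# Hyperparameters stated in this card.
LORA_RANK = 64
LORA_ALPHA = 16


def lora_scaling(alpha: int, rank: int) -> float:
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out linear layer.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


# Assumption: a 4096 x 4096 projection as the example layer.
print(lora_scaling(LORA_ALPHA, LORA_RANK))   # 0.25
print(lora_params(4096, 4096, LORA_RANK))    # 524288 trainable parameters
print(4096 * 4096)                           # 16777216 frozen weights in that layer
```

For such a layer the adapter trains about 3% as many parameters as the frozen weight matrix holds, which is what makes fine-tuning feasible on a single T4.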
Evaluation Metrics
- BERTScore: Semantic similarity evaluation (Precision, Recall, F1)
- Human Evaluation: Binary correctness assessment by legal evaluators
- Performance Improvement: ~15% increase over base Mistral model
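The card does not say whether the ~15% figure is an absolute gain in accuracy points or a gain relative to the base score. The sketch below shows how binary correctness judgments aggregate into either number. The judgments are made up purely for illustration and are not the evaluation data behind this model.

```python
def accuracy(judgments):
    """Share of outputs marked correct (1) in a list of binary judgments."""
    return sum(judgments) / len(judgments)


# Made-up judgments for illustration only; these are NOT the card's results.
base_judgments = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1]    # 6 of 10 correct
tuned_judgments = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]   # 8 of 10 correct

base_acc = accuracy(base_judgments)
tuned_acc = accuracy(tuned_judgments)
absolute_gain = tuned_acc - base_acc        # difference in accuracy points
relative_gain = absolute_gain / base_acc    # difference relative to the base score

print(f"base={base_acc:.2f} tuned={tuned_acc:.2f}")
print(f"absolute gain={absolute_gain:.2f}, relative gain={relative_gain:.2f}")
```

In this toy case the same pair of scores reads as a 20-point absolute gain or a 33% relative gain, so it is worth stating which convention a reported improvement uses.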
Performance Comparison
| Model | Human Eval Score | BERTScore F1 | Performance | 
|---|---|---|---|
| Mistral-7B Base | Baseline | High | Good | 
| legal-passive-to-active-mistral-7b | +15% | Higher | Excellent | 
| legal-passive-to-active-llama2-7b | +6% | High | Good | 
Among the 7B-parameter models compared here, this model performs best on legal passive-to-active transformation.
Strengths and Characteristics
Model Strengths
- High accuracy in passive-to-active transformations
- Semantic preservation - maintains legal meaning
- Better generalization compared to Llama-2 variants
- Responsive to prompts - adapts well to instruction modifications
- Vocabulary diversity - uses appropriate legal terminology
Notable Behaviors
- Occasionally substitutes words with synonyms (trade-off for flexibility)
- Better precision compared to base model after fine-tuning
- Strong performance on complex legal constructions
Limitations and Bias
Known Limitations
- Word Position Sensitivity: Struggles with sentences where word position significantly alters meaning
- Dataset Size: Limited to 319 training samples
- Non-Determinism: LLM outputs may vary between runs
- Domain Coverage: Trained only on English-language legal text from UN, EU (GDPR), and Australian sources
- Synonym Substitution: May occasionally use synonyms instead of exact original words
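Run-to-run variation comes largely from sampling. One way to reduce it, sketched here as an assumption rather than a tested recommendation for this model, is to switch `generate()` to greedy decoding. The helper name `generation_kwargs` and the `max_new_tokens=128` default are illustrative.

```python
def generation_kwargs(deterministic: bool, max_new_tokens: int = 128) -> dict:
    """Keyword arguments for model.generate().

    deterministic=True  -> greedy decoding (do_sample=False)
    deterministic=False -> sampling with temperature 0.7, as in the usage example
    """
    kwargs = {"max_new_tokens": max_new_tokens}
    if deterministic:
        # Greedy decoding: temperature is left out because it only applies to sampling.
        kwargs["do_sample"] = False
    else:
        kwargs.update(do_sample=True, temperature=0.7)
    return kwargs


print(generation_kwargs(deterministic=True))
```

These arguments can be unpacked into the call, for example `model.generate(**inputs, **generation_kwargs(True), pad_token_id=tokenizer.eos_token_id)`. Greedy decoding should give repeatable output on the same hardware and software stack, though results may still differ across GPUs or library versions.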
Recommendations
- Validate transformed sentences for legal accuracy before use
- Use human review for critical legal documents
- Consider context and jurisdiction when applying transformations
- Test with domain-specific legal texts for best results
- Review outputs for unintended synonym substitutions in critical documents
Environmental Impact
- Training Method: QLoRA via unsloth (reported 62% less VRAM for Mistral)
- Hardware: Efficient training using 4-bit quantization
- Carbon Footprint: Significantly reduced compared to full fine-tuning
Citation
If you use this model in your research, please cite:
@misc{legal-passive-active-mistral,
  title={legal-passive-to-active-mistral-7b: An Enhanced LoRA Fine-tuned Model for Legal Voice Transformation},
  author={Rafi Al Attrach},
  year={2024},
  url={https://huggingface.co/rafiaa/legal-passive-to-active-mistral-7b}
}
Related Models
- Base Model: mistralai/Mistral-7B-Instruct-v0.1
- Alternative: rafiaa/legal-passive-to-active-llama2-7b
- This Model: rafiaa/legal-passive-to-active-mistral-7b (Recommended)
Model Card Contact
- Author: Rafi Al Attrach
- Model Repository: https://huggingface.co/rafiaa/legal-passive-to-active-mistral-7b
- Issues: Please report issues through the HuggingFace model page
Acknowledgments
- Research Project: Legal text simplification and accessibility research (2024)
- Training Data: Public legal documents and regulations
- Base Model: Mistral AI's Mistral-7B-Instruct-v0.1
- Training Method: QLoRA for efficient fine-tuning
This model represents advanced research in legal text simplification and accessibility, demonstrating superior performance in passive-to-active voice transformation for legal documents.