---
library_name: peft
base_model: mistralai/Mistral-7B-Instruct-v0.1
tags:
- legal
- legal-text
- passive-to-active
- voice-transformation
- legal-nlp
- text-simplification
- legal-documents
- sentence-transformation
- lora
- qlora
- peft
- mistral
- natural-language-processing
- legal-language
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---

# legal-passive-to-active-mistral-7b

**RECOMMENDED MODEL** - An advanced LoRA fine-tuned model for transforming legal text from passive voice to active voice, built on Mistral-7B-Instruct. This model simplifies complex legal language while maintaining semantic accuracy and legal precision.

## Model Description

This is the **enhanced model** for legal passive-to-active transformation. Built on Mistral-7B-Instruct-v0.1, it outperforms comparable models on legal voice transformation tasks. The model was fine-tuned on a curated dataset of 319 legal sentences from authoritative sources, including UN documents, the GDPR, the Fair Work Act, and insurance regulations.

### Key Features

- **Superior Performance**: ~15% improvement over the base model in human evaluation
- **Legal Text Simplification**: Converts passive voice to active voice in legal documents
- **Domain-Specific**: Fine-tuned on authentic legal text from multiple jurisdictions
- **Efficient Training**: Uses QLoRA for memory-efficient fine-tuning
- **Semantic Preservation**: Maintains legal meaning while simplifying sentence structure
- **Accessibility**: Makes legal documents more readable and accessible

## Model Details

- **Developed by**: Rafi Al Attrach
- **Model type**: LoRA fine-tuned Mistral (Enhanced)
- **Language(s)**: English
- **License**: Apache 2.0
- **Finetuned from**: [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)
- **Training method**: QLoRA (4-bit quantization + LoRA)
- **Research Focus**: Legal text simplification and accessibility (2024)

### Technical Specifications

- **Base Model**: Mistral-7B-Instruct-v0.1
- **LoRA Rank**: 64 (see the adapter sketch below)
- **Training Samples**: 319 legal sentences
- **Data Sources**: UN legal documents, GDPR, Fair Work Act, insurance regulations
- **Evaluation**: BERTScore metrics and human evaluation
- **Performance**: ~15% improvement over the base model in human evaluation
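For reference, a minimal sketch of what an equivalent `peft` adapter configuration might look like. The rank and alpha match the values reported on this card; the dropout value and `target_modules` are assumptions based on common QLoRA setups for Mistral-style models, not published training settings.

```python
from peft import LoraConfig

# Hypothetical reconstruction of the adapter configuration.
# r and lora_alpha match the values reported on this card;
# lora_dropout and target_modules are assumptions, not documented settings.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,  # assumed
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
)
```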
## Uses

### Direct Use

This model is designed for:

- **Legal document simplification**: Converting passive legal text to active voice
- **Accessibility improvement**: Making legal documents more readable
- **Legal writing assistance**: Helping legal professionals write clearer documents
- **Educational purposes**: Teaching legal language transformation
- **Document processing**: Batch processing of legal texts
- **Regulatory compliance**: Simplifying complex regulatory language

### Example Use Cases

```python
# Transform a legal passive sentence to active voice
passive_sentence = "The contract shall be executed by both parties within 30 days."
# Model output: "Both parties shall execute the contract within 30 days."
```

```python
# Simplify GDPR text
passive_sentence = "Personal data may be processed by the controller for legitimate interests."
# Model output: "The controller may process personal data for legitimate interests."
```

```python
# Transform UN legal text
passive_sentence = "All necessary measures shall be taken by Member States to ensure compliance."
# Model output: "Member States shall take all necessary measures to ensure compliance."
```

## How to Get Started

### Installation

```bash
pip install transformers torch peft accelerate bitsandbytes
```

### Loading the Model

#### GPU Usage (Recommended)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model with 4-bit quantization
base_model = "mistralai/Mistral-7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    load_in_4bit=True,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load the LoRA adapter
model = PeftModel.from_pretrained(model, "rafiaa/legal-passive-to-active-mistral-7b")
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Set the pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```

#### CPU Usage (Alternative)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model (CPU compatible)
base_model = "mistralai/Mistral-7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float32,
    device_map="cpu"
)

# Load the LoRA adapter
model = PeftModel.from_pretrained(model, "rafiaa/legal-passive-to-active-mistral-7b")
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Set the pad token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```

### Usage Example

```python
def transform_passive_to_active(passive_sentence, max_new_tokens=128):
    # Build the instruction prompt
    instruction = """You are a legal text transformation expert. Your task is to convert passive voice sentences to active voice while maintaining the exact legal meaning and terminology.

Input: Transform the following legal sentence from passive to active voice.

Legal Sentence: """
    prompt = instruction + passive_sentence

    # Move inputs to the model's device (important when device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode only the newly generated tokens, not the prompt
    generated = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(generated, skip_special_tokens=True)

# Example usage
passive = "The agreement shall be signed by the authorized representatives."
active = transform_passive_to_active(passive)
print(active)
```

### Advanced Usage

```python
# Batch processing of multiple legal sentences
legal_sentences = [
    "The policy was established by the board of directors.",
    "All documents must be reviewed by legal counsel.",
    "The regulations were enacted by Parliament."
]

for sentence in legal_sentences:
    transformed = transform_passive_to_active(sentence)
    print(f"Passive: {sentence}")
    print(f"Active: {transformed}\n")
```

## Training Details

### Training Data

- **Dataset Size**: 319 legal sentences
- **Source Documents**:
  - United Nations legal documents
  - General Data Protection Regulation (GDPR)
  - Fair Work Act (Australia)
  - Insurance Council of Australia regulations
- **Data Split**: 85% training, 15% testing (with 15% of the training set held out for validation)
- **Domain**: Legal text across multiple jurisdictions
- **Format**: Alpaca format for instruction-based training (see the sample below)
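As a rough illustration of the Alpaca format, a training sample might look like the following. The instruction wording and sentence pair are hypothetical; the actual training data is not published with this card.

```python
# Hypothetical Alpaca-format training sample; the instruction wording and
# sentence pair are illustrative, not taken from the actual dataset.
sample = {
    "instruction": "Transform the following legal sentence from passive to active voice.",
    "input": "The notice shall be provided by the employer to each affected employee.",
    "output": "The employer shall provide the notice to each affected employee.",
}
```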
### Training Procedure

- **Method**: QLoRA (4-bit quantization + LoRA)
- **LoRA Configuration**: Rank 64, Alpha 16
- **Library**: unsloth (2.2x faster training, 62% less VRAM for Mistral)
- **Hardware**: Tesla T4 GPU (Google Colab)
- **Training Loss**: Validation loss trended downward throughout training, indicating good generalization

### Evaluation Metrics

- **BERTScore**: Semantic similarity evaluation (Precision, Recall, F1; see the sketch below)
- **Human Evaluation**: Binary correctness assessment by legal evaluators
- **Performance Improvement**: ~15% increase over the base Mistral model
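To reproduce the automatic evaluation, a minimal BERTScore check might look like this, assuming the Hugging Face `evaluate` library. The sentence pair is illustrative and not drawn from the actual test set.

```python
import evaluate

# Compare a model output against a reference transformation with BERTScore.
# The sentence pair below is illustrative, not drawn from the test set.
bertscore = evaluate.load("bertscore")

predictions = ["Both parties shall execute the contract within 30 days."]
references = ["Both parties shall execute the contract within thirty days."]

results = bertscore.compute(predictions=predictions, references=references, lang="en")
print(f"Precision: {results['precision'][0]:.3f}")
print(f"Recall:    {results['recall'][0]:.3f}")
print(f"F1:        {results['f1'][0]:.3f}")
```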
## Performance Comparison

| Model | Human Eval Score | BERTScore F1 | Performance |
|-------|------------------|--------------|-------------|
| Mistral-7B Base | Baseline | High | Good |
| **legal-passive-to-active-mistral-7b** | +15% | Higher | Excellent |
| legal-passive-to-active-llama2-7b | +6% | High | Good |

This model demonstrates the best performance among 7B-parameter models for legal passive-to-active transformation.

## Strengths and Characteristics

### Model Strengths

- **High accuracy** in passive-to-active transformations
- **Semantic preservation**: maintains legal meaning
- **Better generalization** compared to Llama-2 variants
- **Responsive to prompts**: adapts well to instruction modifications
- **Vocabulary diversity**: uses appropriate legal terminology

### Notable Behaviors

- Occasionally substitutes words with synonyms (a trade-off for flexibility)
- Better precision than the base model after fine-tuning
- Strong performance on complex legal constructions

## Limitations and Bias

### Known Limitations

- **Word Position Sensitivity**: Struggles with sentences where word position significantly alters meaning
- **Dataset Size**: Limited to 319 training samples
- **Non-Determinism**: LLM outputs may vary between runs
- **Domain Coverage**: Primarily trained on English common-law and EU legal documents
- **Synonym Substitution**: May occasionally use synonyms instead of the exact original words

### Recommendations

- Validate transformed sentences for legal accuracy before use
- Use human review for critical legal documents
- Consider context and jurisdiction when applying transformations
- Test with domain-specific legal texts for best results
- Review outputs for unintended synonym substitutions in critical documents

## Environmental Impact

- **Training Method**: QLoRA reduces VRAM requirements by ~62% for Mistral
- **Hardware**: Efficient training using 4-bit quantization
- **Carbon Footprint**: Significantly reduced compared to full fine-tuning

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{legal-passive-active-mistral,
  title={legal-passive-to-active-mistral-7b: An Enhanced LoRA Fine-tuned Model for Legal Voice Transformation},
  author={Rafi Al Attrach},
  year={2024},
  url={https://huggingface.co/rafiaa/legal-passive-to-active-mistral-7b}
}
```

## Related Models

- **Base Model**: [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)
- **Alternative**: [rafiaa/legal-passive-to-active-llama2-7b](https://huggingface.co/rafiaa/legal-passive-to-active-llama2-7b)
- **This Model**: [rafiaa/legal-passive-to-active-mistral-7b](https://huggingface.co/rafiaa/legal-passive-to-active-mistral-7b) (Recommended)

## Model Card Contact

- **Author**: Rafi Al Attrach
- **Model Repository**: [HuggingFace Model](https://huggingface.co/rafiaa/legal-passive-to-active-mistral-7b)
- **Issues**: Please report issues through the HuggingFace model page

## Acknowledgments

- **Research Project**: Legal text simplification and accessibility research (2024)
- **Training Data**: Public legal documents and regulations
- **Base Model**: Mistral AI's Mistral-7B-Instruct-v0.1
- **Training Method**: QLoRA for efficient fine-tuning

---

*This model represents advanced research in legal text simplification and accessibility, demonstrating superior performance in passive-to-active voice transformation for legal documents.*