---
datasets:
- nvidia/Aegis-AI-Content-Safety-Dataset-2.0
- allenai/wildguardmix
base_model:
- TinyLlama/TinyLlama-1.1B-Chat-v1.0
---

# Sanraj/tiny_llama1.1B_finetuned

A safety-enhanced version of TinyLlama-1.1B-Chat-v1.0, fine-tuned using NEMO-RL on a combined synthetic safety dataset (Aegis + WildGuard) to improve responsible AI behavior and reduce harmful outputs.

## Model Details

### Model Description

This model is a safety-focused fine-tuned version of TinyLlama-1.1B-Chat-v1.0, trained using NVIDIA's NEMO-RL framework on a combined synthetic dataset drawn from the Aegis and WildGuard safety datasets. The fine-tuning process focused on teaching the model to recognize and appropriately handle potentially harmful or sensitive content requests, using high-quality synthetic safety data.

- **Developed by:** Sanraj
- **Model type:** Causal Language Model (Decoder-only Transformer)
- **Language(s):** English (primarily)
- **License:** Apache 2.0 (inherited from base model)
- **Finetuned from model:** TinyLlama/TinyLlama-1.1B-Chat-v1.0
- **Model size:** 1.1B parameters
- **Fine-tuning focus:** Content Safety and Responsible AI
- **Fine-tuning framework:** NVIDIA NEMO-RL
- **Model ID:** Sanraj/tiny_llama1.1B_finetuned

### Model Sources

- **Model Repository:** [Sanraj/tiny_llama1.1B_finetuned](https://huggingface.co/Sanraj/tiny_llama1.1B_finetuned)
- **Base Repository:** [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
- **Fine-tuning Framework:** NVIDIA NEMO-RL
- **Training Configuration:** safety-for-agentic-ai training pipeline

## Uses

### Direct Use

This model is designed for conversational AI applications where content safety is a priority. It can be used for:

- Safe chatbot applications
- Educational tools requiring content moderation
- Research into AI safety and alignment
- Applications requiring responsible AI behavior

### Downstream Use

The model can be further fine-tuned for specific safety-critical applications or integrated into larger systems requiring content moderation capabilities.
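As a rough sketch of what downstream fine-tuning could look like, the snippet below continues training the checkpoint on a handful of prompt-response pairs with the Hugging Face `Trainer`. The example data, output directory, and hyperparameters are placeholders, not this model's actual training setup (which used NEMO-RL, described under Training Details).

```python
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("Sanraj/tiny_llama1.1B_finetuned")
model = AutoModelForCausalLM.from_pretrained("Sanraj/tiny_llama1.1B_finetuned")

# Placeholder examples; substitute your own domain-specific safety pairs.
examples = [
    {"prompt": "How do I pick a strong password?",
     "response": "Use a long, unique passphrase and a password manager."},
]

def tokenize(example):
    # Render each pair with the chat template inherited from TinyLlama-Chat.
    text = tokenizer.apply_chat_template(
        [{"role": "user", "content": example["prompt"]},
         {"role": "assistant", "content": example["response"]}],
        tokenize=False,
    )
    return tokenizer(text, truncation=True, max_length=2048)

dataset = Dataset.from_list(examples).map(
    tokenize, remove_columns=["prompt", "response"]
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="tinyllama-safety-ft",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=2e-6,
    ),
    train_dataset=dataset,
    # Causal-LM collator: pads each batch and derives labels from input_ids.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```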
### Out-of-Scope Use

- High-stakes decision making without human oversight
- Applications where safety failures could cause significant harm
- Production systems without additional safety measures
- Use cases requiring capabilities beyond the base model's scope

## Bias, Risks, and Limitations

While this model has been specifically fine-tuned for safety, it still inherits limitations from the base TinyLlama model:

- **Model size limitations:** As a 1.1B parameter model, it may have limited knowledge and reasoning capabilities
- **Training data source:** Safety behavior reflects the combined synthetic dataset (Aegis + WildGuard), including any gaps in its coverage
- **Safety coverage:** Safety training may not cover all possible harmful scenarios despite comprehensive synthetic data
- **Language limitations:** Primarily trained and tested on English content

### Recommendations

- Always implement additional safety measures in production environments
- Regularly evaluate and monitor safety performance
- Maintain human oversight for sensitive applications
- Consider ensemble approaches with larger safety models for critical applications

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Sanraj/tiny_llama1.1B_finetuned")
model = AutoModelForCausalLM.from_pretrained(
    "Sanraj/tiny_llama1.1B_finetuned",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

def generate_safe_response(prompt, max_new_tokens=512):
    # Format the prompt with the chat template inherited from TinyLlama-Chat.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        outputs[0][inputs.shape[1]:], skip_special_tokens=True
    ).strip()

# Example
response = generate_safe_response("How can I help someone who is feeling sad?")
print(response)
```

## Training Details

### Training Data

The model was fine-tuned on a combined synthetic safety dataset derived from two prominent safety datasets:

- **Aegis Dataset:** A comprehensive safety dataset focusing on AI safety scenarios and appropriate responses
- **WildGuard Dataset:** A dataset designed to train models to recognize and handle harmful content requests

The combined dataset contains:

- Synthetic prompt-response pairs focused on safety scenarios
- Examples of appropriate responses to potentially harmful requests
- Diverse safety categories covering various types of harmful content
- Training data filtered for quality and safety relevance

**Dataset Characteristics:**

- **Source:** Combined synthetic data from the Aegis + WildGuard datasets
- **Format:** Prompt-response pairs in JSONL format, one JSON object per line
- **Training file:** `train_on_policy_data_filtered.jsonl`
- **Validation file:** `val_on_policy_data_filtered.jsonl`
- **Input key:** `"input"` (prompts/queries)
- **Output key:** `"generated_output"` (safe responses)

### Training Procedure

#### Training Configuration

- **Training regime:** bfloat16 mixed precision
- **Optimizer:** AdamW with learning rate 2e-6
- **Scheduler:** Linear warmup (5 steps) + cosine annealing
- **Max epochs:** 1
- **Training steps:** 20 (fast training configuration)
- **Batch size:** 8 (global), 2 (micro-batch)
- **Sequence length:** 2048 tokens
- **Gradient clipping:** 1.0

#### Memory Optimizations

- FSDP CPU offloading enabled
- Activation checkpointing enabled
- Gradient checkpointing enabled
- Single GPU training configuration

#### Training Hyperparameters

- **Learning rate:** 2.0e-6
- **Weight decay:** 0.01
- **Beta1:** 0.9
- **Beta2:** 0.999
- **Epsilon:** 1e-8
- **Max gradient norm:** 1.0
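For readers who want the schedule in concrete terms, the following is an illustrative plain-PyTorch reconstruction of the optimizer and scheduler settings above. It is not the actual NEMO-RL training loop; the model and loss below are stand-ins.

```python
import math

import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(8, 8)  # stand-in for the 1.1B-parameter model

optimizer = AdamW(
    model.parameters(),
    lr=2.0e-6,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.01,
)

warmup_steps, total_steps = 5, 20

def lr_lambda(step):
    # Linear warmup over the first 5 steps, then cosine annealing to zero.
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda)

for step in range(total_steps):
    loss = model(torch.randn(2, 8)).pow(2).mean()  # dummy loss
    loss.backward()
    # Gradient clipping at the configured max norm of 1.0.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```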
## Evaluation

### Safety Evaluation

The model should be evaluated on:

- Response appropriateness to harmful requests
- Ability to provide helpful alternatives to unsafe requests
- Consistency in safety behavior across diverse prompts
- Maintenance of general conversational capabilities

### Recommended Evaluation Datasets

- Custom safety evaluation benchmarks
- Conversational AI evaluation suites
- Red-teaming evaluations for safety

## Environmental Impact

Training was optimized for efficiency:

- **Hardware:** Single GPU training (reduced from dual GPU)
- **Training time:** Minimal (20 steps for proof of concept)
- **Compute efficiency:** Aggressive memory optimizations enabled

## Technical Specifications

### Model Architecture

- **Base Architecture:** Llama-based decoder-only transformer
- **Parameters:** 1.1B
- **Context length:** 2048 tokens
- **Vocabulary size:** 32,000 tokens (inherited from TinyLlama)

### Fine-tuning Infrastructure

- **Framework:** NVIDIA NEMO-RL
- **Precision:** bfloat16
- **Memory optimizations:** FSDP, activation checkpointing
- **Monitoring:** Weights & Biases integration
- **Training pipeline:** safety-for-agentic-ai framework

## Usage Notes

This model was trained with a fast configuration (20 steps), primarily for demonstration purposes. For production use, consider:

1. **Extended training:** Increase training steps and epochs
2. **Larger datasets:** Expand safety dataset coverage
3. **Comprehensive evaluation:** Thorough safety and capability testing
4. **Regular updates:** Continuous improvement based on usage patterns

## Citation

If you use this model, please cite the original TinyLlama paper and acknowledge the safety datasets used:

```bibtex
@article{zhang2024tinyllama,
  title={TinyLlama: An Open-Source Small Language Model},
  author={Zhang, Peiyuan and Zeng, Guangtao and Wang, Tianduo and Lu, Wei},
  journal={arXiv preprint arXiv:2401.02385},
  year={2024}
}
```

**Dataset Acknowledgments:**

- Aegis Dataset: Please cite the original Aegis safety dataset paper
- WildGuard Dataset: Please cite the original WildGuard dataset paper

## Model Card Contact

For questions about this model, please open an issue on the [model repository](https://huggingface.co/Sanraj/tiny_llama1.1B_finetuned).

---

**Disclaimer:** This model is provided for research and educational purposes. While fine-tuned for safety, it should not be deployed in production without thorough testing and additional safety measures.