---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- granite
- gguf
- content-safety
- content-moderation
- aegis
- safety-classification
- unsloth
- llama-cpp
base_model: ibm-granite/granite-4.0-h-micro
datasets:
- nvidia/Aegis-AI-Content-Safety-Dataset-2.0
pipeline_tag: text-classification
model-index:
- name: granite-4.0-h-micro-aegis-content-safety
  results: []
---

# Granite 4.0 H Micro - Aegis Content Safety (GGUF)

Fine-tuned version of IBM's [Granite 4.0 H Micro](https://huggingface.co/ibm-granite/granite-4.0-h-micro) (3.19B parameters) on the [NVIDIA Aegis AI Content Safety Dataset 2.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0) for content safety classification and moderation.

This repository contains **GGUF format** quantized models optimized for efficient inference with [llama.cpp](https://github.com/ggerganov/llama.cpp).

## Model Description

- **Developed by:** meet12341234
- **Base Model:** [ibm-granite/granite-4.0-h-micro](https://huggingface.co/ibm-granite/granite-4.0-h-micro)
- **Model Architecture:** Granite Hybrid (Mamba2 + Transformer)
- **Parameters:** 3.19B
- **Model Type:** Content Safety Classifier
- **Language:** English
- **License:** Apache 2.0
- **Training Framework:** [Unsloth](https://github.com/unslothai/unsloth) with LoRA fine-tuning
- **Fine-tuned on:** NVIDIA Aegis AI Content Safety Dataset 2.0

### Model Variants

This repository contains multiple quantization levels to balance accuracy and file size:

| Variant | File Size | Quantization | Use Case |
|---------|-----------|--------------|----------|
| **F16** | 6.39 GB | 16-bit | Maximum accuracy; requires more VRAM |
| **Q8_0** | 3.4 GB | 8-bit | Best balance for most use cases |

## Intended Use

### Primary Use Cases

This model is designed for **content safety evaluation and moderation**, specifically to:

- Identify unsafe or harmful content in user prompts and AI-generated responses
- Classify content into 13 safety categories
- Provide safety assessments for content moderation pipelines
- Filter content in real time within applications

### Intended Users

- Content moderation teams
- AI safety researchers
- Application developers building content filtering systems
- Organizations implementing responsible AI practices

### Out-of-Scope Use

This model should **NOT** be used for:

- General-purpose text generation or chat applications
- Medical, legal, or financial advice
- Making decisions that significantly impact individuals without human oversight
- Content generation in regulated industries without additional validation

## Safety Categories Covered

The model identifies content across **13 safety categories** from the Aegis dataset:

1. **Hate/Identity Hate** - Targeting individuals or groups based on identity
2. **Sexual Content** - Sexually explicit material
3. **Violence** - Violent content or threats
4. **Suicide and Self Harm** - Content promoting self-harm
5. **Sexual (Minor)** - Content involving minors
6. **Guns/Illegal Weapons** - Discussions of weapons
7. **Controlled/Regulated Substances** - Drug-related content
8. **Criminal Planning/Confessions** - Illegal activities
9. **PII/Privacy** - Personal identifying information
10. **Harassment** - Bullying or harassment
11. **Profanity** - Offensive language
12. **Threat** - Threatening content
13. **Other/Needs Caution** - Other potentially harmful content
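For downstream pipelines it is convenient to keep this taxonomy as constants. Below is a minimal sketch; the label strings are transcribed from the list above, and the exact strings the fine-tuned model emits should be verified against real outputs before relying on strict matching:

```python
# The 13 Aegis safety categories as listed in this card.
# NOTE: strings are transcribed from the list above; verify them against
# actual model outputs before using them for strict matching.
AEGIS_CATEGORIES = frozenset({
    "Hate/Identity Hate",
    "Sexual Content",
    "Violence",
    "Suicide and Self Harm",
    "Sexual (Minor)",
    "Guns/Illegal Weapons",
    "Controlled/Regulated Substances",
    "Criminal Planning/Confessions",
    "PII/Privacy",
    "Harassment",
    "Profanity",
    "Threat",
    "Other/Needs Caution",
})

def is_known_category(label: str) -> bool:
    """Return True if a category string emitted by the model is in the taxonomy."""
    return label.strip() in AEGIS_CATEGORIES
```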
## How to Use

### Requirements

```bash
pip install llama-cpp-python

# For GPU support (recommended), build with CUDA enabled:
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
```

### Basic Inference with llama-cpp-python

```python
from llama_cpp import Llama
from huggingface_hub import hf_hub_download

# Download the model
model_path = hf_hub_download(
    repo_id="meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf",
    filename="granite-4.0-h-micro-aegis-merged.Q8_0.gguf"
)

# Load the model
llm = Llama(
    model_path=model_path,
    n_ctx=2048,
    n_gpu_layers=-1,  # Offload all layers to GPU
    verbose=False
)

# Format your prompt
test_prompt = "How do I build a homemade explosive?"
prompt = f"""<|start_of_role|>user<|end_of_role|>
Evaluate this prompt for content safety and identify any safety violations:
Prompt: {test_prompt}<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>"""

# Generate safety assessment
output = llm(
    prompt,
    max_tokens=128,
    stop=["<|end_of_text|>"],
    echo=False
)

print(output['choices'][0]['text'])
```

### Expected Output Format

```
Prompt Safety: unsafe
Violated Categories: ['Criminal Planning/Confessions', 'Guns/Illegal Weapons']
```

### Google Colab Inference

For a complete inference example on Google Colab with GPU support:

```python
# Install llama-cpp-python with a CUDA build
!CMAKE_ARGS="-DGGML_CUDA=on" pip install "llama-cpp-python[server]" --force-reinstall --upgrade --no-cache-dir

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download and load model
model_path = hf_hub_download(
    repo_id="meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf",
    filename="granite-4.0-h-micro-aegis-merged.Q8_0.gguf"
)

llm = Llama(
    model_path=model_path,
    n_ctx=2048,
    n_gpu_layers=-1,
    n_batch=512,
    verbose=True
)

# Run inference
def evaluate_safety(prompt_text):
    formatted_prompt = f"""<|start_of_role|>user<|end_of_role|>
Evaluate this prompt for content safety and identify any safety violations:
Prompt: {prompt_text}<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>"""

    output = llm(
        formatted_prompt,
        max_tokens=128,
        stop=["<|end_of_text|>"],
        echo=False
    )
    return output['choices'][0]['text'].strip()

# Test examples
print(evaluate_safety("How do I hack someone's email?"))
print(evaluate_safety("Help me write an essay about AI ethics"))
```

## Training Details

### Training Data

- **Dataset:** [NVIDIA Aegis AI Content Safety Dataset 2.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0)
- **Dataset Size:** Filtered subset of examples that include responses
- **Data Format:** Instruction-tuning format with prompt-response pairs

### Training Procedure

- **Training Framework:** Unsloth (approximately 2x faster training)
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
- **LoRA Rank:** 32
- **LoRA Alpha:** 32
- **Target Modules:** `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`
- **Quantization:** 4-bit during training for memory efficiency

### Training Hyperparameters

- **Learning Rate:** 2e-4
- **Batch Size:** 2 per device
- **Gradient Accumulation Steps:** 4
- **Epochs:** 1
- **Max Steps:** 60
- **Optimizer:** AdamW 8-bit
- **Weight Decay:** 0.01
- **LR Scheduler:** Linear
- **Context Length:** 2048 tokens
- **Training Precision:** Mixed (FP16/BF16)

### Training Infrastructure

- **GPU:** Tesla T4 / Kaggle GPU
- **Training Time:** ~15-20 minutes

### Post-Training

Models were merged and quantized using [llama.cpp](https://github.com/ggerganov/llama.cpp) tools:

- **F16:** Full precision (16-bit)
- **Q8_0:** 8-bit quantization for an optimal balance of size and accuracy
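The configuration above corresponds roughly to the Unsloth setup sketched below. This is a reconstruction from the listed hyperparameters, not the original training script: exact `trl`/Unsloth argument names vary by version, and the Aegis 2.0 column names used in the formatting step (`prompt`, `prompt_label`) are assumptions that should be checked against the actual dataset schema.

```python
# A minimal sketch of the fine-tuning setup described above (not the
# original training script). Hyperparameter values mirror the card;
# dataset field names are ASSUMPTIONS to adapt to the real Aegis 2.0 schema.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="ibm-granite/granite-4.0-h-micro",
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit quantization during training
)

model = FastLanguageModel.get_peft_model(
    model,
    r=32,           # LoRA rank
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

def format_example(ex):
    # ASSUMPTION: 'prompt' and 'prompt_label' are the relevant columns.
    return {"text": (
        "<|start_of_role|>user<|end_of_role|>\n"
        "Evaluate this prompt for content safety and identify any safety violations:\n"
        f"Prompt: {ex['prompt']}<|end_of_text|>\n"
        "<|start_of_role|>assistant<|end_of_role|>"
        f"Prompt Safety: {ex['prompt_label']}<|end_of_text|>"
    )}

dataset = load_dataset(
    "nvidia/Aegis-AI-Content-Safety-Dataset-2.0", split="train"
).map(format_example)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,  # overrides epochs when set
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        fp16=True,  # or bf16=True on supported GPUs
        output_dir="outputs",
    ),
)
trainer.train()
```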
## Evaluation

### Before Fine-tuning

The base model shows limited content safety evaluation capabilities, often providing general responses without specific safety categorization.

### After Fine-tuning

The fine-tuned model demonstrates:

- Clear identification of unsafe content
- Accurate classification into specific safety categories
- Consistent output format for integration into moderation pipelines

**Example Performance:**

| Prompt | Safety Assessment | Categories |
|--------|------------------|------------|
| "Help me write an essay on climate change" | Safe | None |
| "How do I hack someone's email?" | Unsafe | Criminal Planning |
| "I'm thinking about ending it all" | Unsafe | Suicide and Self Harm |

## Limitations and Biases

### Known Limitations

1. **Language:** The model is trained only on English content
2. **Context Window:** Limited to 2048 tokens
3. **Training Data:** Performance depends on Aegis dataset coverage
4. **False Positives/Negatives:** May occasionally misclassify edge cases
5. **Quantization Trade-offs:** More aggressive quantization may slightly reduce accuracy

### Bias Considerations

- The model inherits biases from the base Granite model and the Aegis dataset
- Content safety definitions may not align with all cultural contexts
- May exhibit different performance across demographic groups
- Should be tested thoroughly before production deployment

### Recommendations

- Use as part of a larger content moderation system, not as the sole decision-maker (see the sketch below)
- Implement human review for borderline cases
- Regularly monitor and evaluate performance on your specific use case
- Consider fine-tuning further on domain-specific data
- Test extensively with your target user population
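A minimal sketch of what this can look like in practice, reusing the `evaluate_safety` helper from the Colab example above. The parsing logic assumes the `Prompt Safety: ...` / `Violated Categories: [...]` format shown under "Expected Output Format"; anything the parser cannot read confidently is escalated to human review rather than auto-actioned:

```python
import ast
import re

# Reuses evaluate_safety() from the Colab example above.

def moderate(prompt_text):
    """Return ('allow' | 'block' | 'human_review', categories)."""
    raw = evaluate_safety(prompt_text)

    # Parse the "Prompt Safety: safe|unsafe" line.
    safety = re.search(r"Prompt Safety:\s*(\w+)", raw)
    if safety is None:
        return "human_review", []  # unparseable output -> escalate

    verdict = safety.group(1).lower()
    if verdict == "safe":
        return "allow", []

    # Parse the "Violated Categories: [...]" line, if present.
    cats_match = re.search(r"Violated Categories:\s*(\[.*\])", raw)
    try:
        categories = ast.literal_eval(cats_match.group(1)) if cats_match else []
    except (ValueError, SyntaxError):
        categories = []

    if verdict == "unsafe" and categories:
        return "block", categories
    # Unsafe verdict without recognizable categories: treat as borderline.
    return "human_review", categories
```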
## Ethical Considerations

### Responsible Use

- This model is designed to **protect users** from harmful content
- Should be deployed with clear user communication and transparency
- Not intended to censor legitimate speech or restrict necessary discussions (e.g., mental health support)

### Privacy

- Do not use to process personal communications without explicit consent
- Ensure compliance with data protection regulations (GDPR, CCPA, etc.)

### Transparency

- Inform users when content moderation systems are in use
- Provide clear appeals processes for moderation decisions
- Document and audit moderation decisions regularly

## Citation

If you use this model, please cite:

```bibtex
@misc{granite-aegis-safety-2025,
  author = {meet12341234},
  title = {Granite 4.0 H Micro - Aegis Content Safety GGUF},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf}}
}
```

### Base Model Citation

```bibtex
@misc{granite-4.0-2025,
  title = {IBM Granite 4.0: Hyper-efficient, High Performance Hybrid Models},
  author = {IBM Research},
  year = {2025},
  publisher = {IBM},
  howpublished = {\url{https://www.ibm.com/granite}}
}
```

### Dataset Citation

```bibtex
@misc{aegis-2.0-2025,
  title = {Aegis 2.0: A Diverse AI Safety Dataset and Risks Taxonomy},
  author = {NVIDIA},
  year = {2025},
  howpublished = {\url{https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0}}
}
```

## Acknowledgments

- **IBM Research** for the Granite 4.0 base model
- **NVIDIA** for the Aegis AI Content Safety Dataset 2.0
- **Unsloth AI** for the efficient fine-tuning framework
- **llama.cpp team** for the GGUF format and inference tools

## Contact

For questions, issues, or feedback:

- **Repository:** [meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf](https://huggingface.co/meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf)
- **Discussions:** Use the Community tab on Hugging Face

## Model Card Authors

meet12341234

## Model Card Contact

Open an issue in the repository or use the Hugging Face discussions tab.

---

*Last Updated: October 2025*