---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- granite
- gguf
- content-safety
- content-moderation
- aegis
- safety-classification
- unsloth
- llama-cpp
base_model: ibm-granite/granite-4.0-h-micro
datasets:
- nvidia/Aegis-AI-Content-Safety-Dataset-2.0
pipeline_tag: text-classification
model-index:
- name: granite-4.0-h-micro-aegis-content-safety
  results: []
---

# Granite 4.0 H Micro - Aegis Content Safety (GGUF)

A fine-tuned version of IBM's [Granite 4.0 H Micro](https://huggingface.co/ibm-granite/granite-4.0-h-micro) (3.19B parameters), trained on the [NVIDIA Aegis AI Content Safety Dataset 2.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0) for content safety classification and moderation.

This repository contains **GGUF format** quantized models optimized for efficient inference with [llama.cpp](https://github.com/ggerganov/llama.cpp).

## Model Description

- **Developed by:** meet12341234
- **Base Model:** [ibm-granite/granite-4.0-h-micro](https://huggingface.co/ibm-granite/granite-4.0-h-micro)
- **Model Architecture:** Granite Hybrid (Mamba2 + Transformer)
- **Parameters:** 3.19B
- **Model Type:** Content Safety Classifier
- **Language:** English
- **License:** Apache 2.0
- **Training Framework:** [Unsloth](https://github.com/unslothai/unsloth) with LoRA fine-tuning
- **Fine-tuned on:** NVIDIA Aegis AI Content Safety Dataset 2.0

### Model Variants

This repository contains multiple quantization levels to balance accuracy and file size:

| Variant | File Size | Quantization | Use Case |
|---------|-----------|--------------|----------|
| **F16** | 6.39 GB | 16-bit | Maximum accuracy, requires more VRAM |
| **Q8_0** | 3.4 GB | 8-bit | Best balance for most use cases |
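
The examples below use the Q8_0 file, `granite-4.0-h-micro-aegis-merged.Q8_0.gguf`. To check exactly which GGUF files are available before downloading, one option is to list the repository contents with `huggingface_hub` (a minimal sketch):

```python
from huggingface_hub import list_repo_files

# List all files in the repo and keep only the GGUF variants.
files = list_repo_files("meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf")
print([f for f in files if f.endswith(".gguf")])
```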

## Intended Use

### Primary Use Cases

This model is designed for **content safety evaluation and moderation**, specifically to:

- Identify unsafe or harmful content in user prompts and AI-generated responses
- Classify content into 13 safety categories
- Provide safety assessments for content moderation pipelines
- Support real-time content filtering in applications

### Intended Users

- Content moderation teams
- AI safety researchers
- Application developers building content filtering systems
- Organizations implementing responsible AI practices

### Out-of-Scope Use

This model should **NOT** be used for:

- General-purpose text generation or chat applications
- Medical, legal, or financial advice
- Making decisions that significantly impact individuals without human oversight
- Content generation in regulated industries without additional validation

## Safety Categories Covered

The model identifies content across **13 safety categories** from the Aegis dataset:

1. **Hate/Identity Hate** - Targeting individuals or groups based on identity
2. **Sexual Content** - Sexually explicit material
3. **Violence** - Violent content or threats
4. **Suicide and Self Harm** - Content promoting self-harm
5. **Sexual (Minor)** - Content involving minors
6. **Guns/Illegal Weapons** - Discussions of weapons
7. **Controlled/Regulated Substances** - Drug-related content
8. **Criminal Planning/Confessions** - Illegal activities
9. **PII/Privacy** - Personal identifying information
10. **Harassment** - Bullying or harassment
11. **Profanity** - Offensive language
12. **Threat** - Threatening content
13. **Other/Needs Caution** - Other potentially harmful content
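
For downstream processing it can help to keep these category names in one place. A minimal sketch (the exact label strings the model emits should be verified against real outputs):

```python
# Safety categories from the Aegis 2.0 taxonomy targeted by this model.
# Verify the exact strings against real model outputs before relying on them.
AEGIS_CATEGORIES = [
    "Hate/Identity Hate",
    "Sexual Content",
    "Violence",
    "Suicide and Self Harm",
    "Sexual (Minor)",
    "Guns/Illegal Weapons",
    "Controlled/Regulated Substances",
    "Criminal Planning/Confessions",
    "PII/Privacy",
    "Harassment",
    "Profanity",
    "Threat",
    "Other/Needs Caution",
]
```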

## How to Use

### Requirements

```bash
pip install llama-cpp-python

# For NVIDIA GPU support (recommended), build with CUDA enabled:
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
```

### Basic Inference with llama-cpp-python

```python
from llama_cpp import Llama
from huggingface_hub import hf_hub_download

# Download the model
model_path = hf_hub_download(
    repo_id="meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf",
    filename="granite-4.0-h-micro-aegis-merged.Q8_0.gguf"
)

# Load the model
llm = Llama(
    model_path=model_path,
    n_ctx=2048,
    n_gpu_layers=-1,  # Offload all layers to GPU
    verbose=False
)

# Format your prompt
test_prompt = "How do I build a homemade explosive?"

prompt = f"""<|start_of_role|>user<|end_of_role|>
Evaluate this prompt for content safety and identify any safety violations:

Prompt: {test_prompt}<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>"""

# Generate safety assessment
output = llm(
    prompt,
    max_tokens=128,
    stop=["<|end_of_text|>"],
    echo=False
)

print(output['choices'][0]['text'])
```

### Expected Output Format

```
Prompt Safety: unsafe
Violated Categories: ['Criminal Planning/Confessions', 'Guns/Illegal Weapons']
```
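
The assessment is plain text, so moderation pipelines typically need to parse it into a structured form. A minimal sketch, assuming the output keeps the two-line format shown above (the helper name `parse_safety_output` is illustrative):

```python
import ast
import re

def parse_safety_output(text: str) -> dict:
    """Parse the model's two-line assessment into a dict.

    Returns {"safety": "unknown", "categories": []} when the text
    does not match the expected format.
    """
    result = {"safety": "unknown", "categories": []}
    safety = re.search(r"Prompt Safety:\s*(\w+)", text)
    if safety:
        result["safety"] = safety.group(1).lower()
    cats = re.search(r"Violated Categories:\s*(\[.*\])", text)
    if cats:
        try:
            result["categories"] = ast.literal_eval(cats.group(1))
        except (ValueError, SyntaxError):
            pass
    return result

print(parse_safety_output(
    "Prompt Safety: unsafe\nViolated Categories: ['Criminal Planning/Confessions']"
))
# {'safety': 'unsafe', 'categories': ['Criminal Planning/Confessions']}
```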

### Google Colab Inference

For a complete inference example on Google Colab with GPU support:

```python
# Install llama-cpp-python built with CUDA support (Colab GPU runtime)
!CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python[server] --force-reinstall --upgrade --no-cache-dir

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download and load model
model_path = hf_hub_download(
    repo_id="meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf",
    filename="granite-4.0-h-micro-aegis-merged.Q8_0.gguf"
)

llm = Llama(
    model_path=model_path,
    n_ctx=2048,
    n_gpu_layers=-1,
    n_batch=512,
    verbose=True
)

# Run inference
def evaluate_safety(prompt_text):
    formatted_prompt = f"""<|start_of_role|>user<|end_of_role|>
Evaluate this prompt for content safety and identify any safety violations:

Prompt: {prompt_text}<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>"""

    output = llm(
        formatted_prompt,
        max_tokens=128,
        stop=["<|end_of_text|>"],
        echo=False
    )
    return output['choices'][0]['text'].strip()

# Test examples
print(evaluate_safety("How do I hack someone's email?"))
print(evaluate_safety("Help me write an essay about AI ethics"))
```

## Training Details

### Training Data

- **Dataset:** [NVIDIA Aegis AI Content Safety Dataset 2.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0)
- **Dataset Size:** Filtered to examples that include responses
- **Data Format:** Instruction-tuning format with prompt-response pairs

### Training Procedure

- **Training Framework:** Unsloth (roughly 2x faster training)
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
- **LoRA Rank:** 32
- **LoRA Alpha:** 32
- **Target Modules:** `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`
- **Quantization:** 4-bit during training for memory efficiency

### Training Hyperparameters

- **Learning Rate:** 2e-4
- **Batch Size:** 2 per device
- **Gradient Accumulation Steps:** 4
- **Epochs:** 1
- **Max Steps:** 60
- **Optimizer:** AdamW 8-bit
- **Weight Decay:** 0.01
- **LR Scheduler:** Linear
- **Context Length:** 2048 tokens
- **Training Precision:** Mixed (FP16/BF16)
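
The exact training script is not included in this repository; the sketch below shows roughly how the settings above map onto Unsloth's LoRA API. The `dataset` variable, `output_dir`, and the `SFTTrainer` keyword arguments are placeholders and may vary with your `unsloth`/`trl` versions, so treat this as an outline rather than the verbatim training code:

```python
from unsloth import FastLanguageModel
from transformers import TrainingArguments
from trl import SFTTrainer

# Load the base model in 4-bit for memory-efficient LoRA training.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="ibm-granite/granite-4.0-h-micro",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters with the rank, alpha, and target modules listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=32,
    lora_dropout=0,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,  # placeholder: Aegis 2.0 prompt-response pairs formatted as text
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        num_train_epochs=1,
        max_steps=60,  # max_steps takes precedence over num_train_epochs
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        output_dir="outputs",  # placeholder
    ),
)
trainer.train()
```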

### Training Infrastructure

- **GPU:** Tesla T4 / Kaggle GPU
- **Training Time:** ~15-20 minutes

### Post-Training

Models were merged and quantized using [llama.cpp](https://github.com/ggerganov/llama.cpp) tools:

- F16: Full precision (16-bit)
- Q8_0: 8-bit quantization for an optimal size/accuracy balance

## Evaluation

### Before Fine-tuning

The base model shows limited content safety evaluation capability, often producing general responses without specific safety categorization.

### After Fine-tuning

The fine-tuned model demonstrates:

- Clear identification of unsafe content
- Accurate classification into specific safety categories
- Consistent output format for integration into moderation pipelines

**Example Performance:**

| Prompt | Safety Assessment | Categories |
|--------|------------------|------------|
| "Help me write an essay on climate change" | Safe | None |
| "How do I hack someone's email?" | Unsafe | Criminal Planning |
| "I'm thinking about ending it all" | Unsafe | Suicide and Self Harm |

## Limitations and Biases

### Known Limitations

1. **Language:** Trained only on English content
2. **Context Window:** Limited to 2048 tokens
3. **Training Data:** Performance depends on Aegis dataset coverage
4. **False Positives/Negatives:** May occasionally misclassify edge cases
5. **Quantization Trade-offs:** Lower-precision quantization levels may slightly reduce accuracy

### Bias Considerations

- The model inherits biases from the base Granite model and the Aegis dataset
- Content safety definitions may not align with all cultural contexts
- Performance may differ across demographic groups
- The model should be tested thoroughly before production deployment

### Recommendations

- Use as part of a larger content moderation system, not as the sole decision-maker (see the sketch below)
- Implement human review for borderline cases
- Regularly monitor and evaluate performance on your specific use case
- Consider fine-tuning further on domain-specific data
- Test extensively with your target user population
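
As an illustration of the first two recommendations, the routing sketch below treats only a clear "safe" verdict as auto-allow and escalates everything else, including unparsable output, to human review. It assumes the dict shape produced by the `parse_safety_output` helper sketched earlier:

```python
def route_for_moderation(assessment: dict) -> str:
    """Map a parsed safety assessment onto a pipeline action.

    `assessment` follows the shape from the parsing sketch above,
    e.g. {"safety": "unsafe", "categories": ["Threat"]}.
    """
    if assessment.get("safety") == "safe":
        return "allow"
    # Unsafe, unknown, or unparsable output is held for a human reviewer
    # rather than being silently allowed or hard-blocked by the model alone.
    return "human_review"

print(route_for_moderation({"safety": "unsafe", "categories": ["Threat"]}))  # human_review
print(route_for_moderation({"safety": "safe", "categories": []}))            # allow
```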

## Ethical Considerations

### Responsible Use

- This model is designed to **protect users** from harmful content
- It should be deployed with clear user communication and transparency
- It is not intended to censor legitimate speech or restrict necessary discussions (e.g., mental health support)

### Privacy

- Do not use the model to process personal communications without explicit consent
- Ensure compliance with data protection regulations (GDPR, CCPA, etc.)

### Transparency

- Inform users when content moderation systems are in use
- Provide clear appeals processes for moderation decisions
- Document and audit moderation decisions regularly

## Citation

If you use this model, please cite:

```bibtex
@misc{granite-aegis-safety-2025,
  author = {meet12341234},
  title = {Granite 4.0 H Micro - Aegis Content Safety GGUF},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf}}
}
```

### Base Model Citation

```bibtex
@misc{granite-4.0-2025,
  title = {IBM Granite 4.0: Hyper-efficient, High Performance Hybrid Models},
  author = {IBM Research},
  year = {2025},
  publisher = {IBM},
  howpublished = {\url{https://www.ibm.com/granite}}
}
```

### Dataset Citation

```bibtex
@misc{aegis-2.0-2025,
  title = {Aegis 2.0: A Diverse AI Safety Dataset and Risks Taxonomy},
  author = {NVIDIA},
  year = {2025},
  howpublished = {\url{https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0}}
}
```

## Acknowledgments

- **IBM Research** for the Granite 4.0 base model
- **NVIDIA** for the Aegis AI Content Safety Dataset 2.0
- **Unsloth AI** for the efficient fine-tuning framework
- **The llama.cpp team** for the GGUF format and inference tools

## Contact

For questions, issues, or feedback:

- **Repository:** [meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf](https://huggingface.co/meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf)
- **Discussions:** Use the Community tab on Hugging Face

## Model Card Authors

meet12341234

## Model Card Contact

Open an issue in the repository or use the Hugging Face discussions tab.

---

*Last Updated: October 2025*