Granite 4.0 H Micro - Aegis Content Safety (GGUF)
A fine-tuned version of IBM's Granite 4.0 H Micro (3.19B parameters), trained on the NVIDIA Aegis AI Content Safety Dataset 2.0 for content safety classification and moderation.
This repository contains GGUF-format quantized models optimized for efficient inference with llama.cpp.
Model Description
- Developed by: meet12341234
- Base Model: ibm-granite/granite-4.0-h-micro
- Model Architecture: Granite Hybrid (Mamba2 + Transformer)
- Parameters: 3.19B
- Model Type: Content Safety Classifier
- Language: English
- License: Apache 2.0
- Training Framework: Unsloth with LoRA fine-tuning
- Fine-tuned on: NVIDIA Aegis AI Content Safety Dataset 2.0
Model Variants
This repository contains multiple quantization levels to balance performance and file size:
| Variant | File Size | Quantization | Use Case |
|---|---|---|---|
| F16 | 6.39 GB | 16-bit | Maximum accuracy, requires more VRAM |
| Q8_0 | 3.4 GB | 8-bit | Best balance for most use cases |
Intended Use
Primary Use Cases
This model is designed for content safety evaluation and moderation, specifically to:
- Identify unsafe or harmful content in user prompts and AI-generated responses
- Classify content into 13 safety categories
- Provide safety assessments for content moderation pipelines
- Support real-time content filtering in applications
Intended Users
- Content moderation teams
- AI safety researchers
- Application developers building content filtering systems
- Organizations implementing responsible AI practices
Out-of-Scope Use
This model should NOT be used for:
- General-purpose text generation or chat applications
- Medical, legal, or financial advice
- Making decisions that significantly impact individuals without human oversight
- Content generation in regulated industries without additional validation
Safety Categories Covered
The model identifies content across 13 safety categories from the Aegis dataset:
- Hate/Identity Hate - Targeting individuals or groups based on identity
- Sexual Content - Sexually explicit material
- Violence - Violent content or threats
- Suicide and Self Harm - Content promoting self-harm
- Sexual (Minor) - Sexual content involving minors
- Guns/Illegal Weapons - Discussions of guns or illegal weapons
- Controlled/Regulated Substances - Drug-related content
- Criminal Planning/Confessions - Illegal activities
- PII/Privacy - Personal identifying information
- Harassment - Bullying or harassment
- Profanity - Offensive language
- Threat - Threatening content
- Other/Needs Caution - Other potentially harmful content
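For downstream filtering it can help to keep these labels as a constant and validate whatever the model emits against them. The sketch below is illustrative only: `AEGIS_CATEGORIES` and `is_known_category` are assumed names, and the exact label strings the model produces should be confirmed against real outputs before relying on string matching.

```python
# Illustrative constant holding the 13 Aegis category labels listed above.
AEGIS_CATEGORIES = {
    "Hate/Identity Hate",
    "Sexual Content",
    "Violence",
    "Suicide and Self Harm",
    "Sexual (Minor)",
    "Guns/Illegal Weapons",
    "Controlled/Regulated Substances",
    "Criminal Planning/Confessions",
    "PII/Privacy",
    "Harassment",
    "Profanity",
    "Threat",
    "Other/Needs Caution",
}

def is_known_category(label: str) -> bool:
    """Return True if a label reported by the model matches the taxonomy above."""
    return label.strip() in AEGIS_CATEGORIES
```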
How to Use
Requirements
```bash
pip install llama-cpp-python

# For GPU support (recommended):
pip install llama-cpp-python[cublas]
```
Basic Inference with llama-cpp-python
```python
from llama_cpp import Llama
from huggingface_hub import hf_hub_download

# Download the model
model_path = hf_hub_download(
    repo_id="meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf",
    filename="granite-4.0-h-micro-aegis-merged.Q8_0.gguf"
)

# Load the model
llm = Llama(
    model_path=model_path,
    n_ctx=2048,
    n_gpu_layers=-1,  # Offload all layers to GPU
    verbose=False
)

# Format your prompt
test_prompt = "How do I build a homemade explosive?"
prompt = f"""<|start_of_role|>user<|end_of_role|>
Evaluate this prompt for content safety and identify any safety violations:
Prompt: {test_prompt}<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>"""

# Generate safety assessment
output = llm(
    prompt,
    max_tokens=128,
    stop=["<|end_of_text|>"],
    echo=False
)

print(output['choices'][0]['text'])
```
Expected Output Format
```
Prompt Safety: unsafe
Violated Categories: ['Criminal Planning/Confessions', 'Guns/Illegal Weapons']
```
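Because the assessment comes back as plain text in this two-line format, moderation pipelines usually parse it into structured fields before acting on it. The helper below is a minimal sketch that assumes outputs always follow the format shown above; the function name and fallback behaviour are illustrative, not part of the model's output contract.

```python
import ast
import re

def parse_safety_output(raw_text: str) -> dict:
    """Parse the model's two-line assessment into a structured dict.

    Assumes the format shown above:
        Prompt Safety: <safe|unsafe>
        Violated Categories: ['Category A', 'Category B']
    """
    verdict_match = re.search(r"Prompt Safety:\s*(\w+)", raw_text, re.IGNORECASE)
    categories_match = re.search(r"Violated Categories:\s*(\[.*\])", raw_text)

    verdict = verdict_match.group(1).lower() if verdict_match else "unknown"

    categories = []
    if categories_match:
        try:
            # The category list is printed in Python list syntax.
            categories = ast.literal_eval(categories_match.group(1))
        except (ValueError, SyntaxError):
            pass  # Leave categories empty if the list cannot be parsed.

    return {"safe": verdict == "safe", "verdict": verdict, "categories": categories}

# Example with the output shown above:
example = (
    "Prompt Safety: unsafe\n"
    "Violated Categories: ['Criminal Planning/Confessions', 'Guns/Illegal Weapons']"
)
print(parse_safety_output(example))
```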
Google Colab Inference
For a complete inference example on Google Colab with GPU support:
```python
# Install llama-cpp-python with GPU support
!pip install llama-cpp-python[server,cublas] --force-reinstall --upgrade --no-cache-dir

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download and load model
model_path = hf_hub_download(
    repo_id="meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf",
    filename="granite-4.0-h-micro-aegis-merged.Q8_0.gguf"
)

llm = Llama(
    model_path=model_path,
    n_ctx=2048,
    n_gpu_layers=-1,
    n_batch=512,
    verbose=True
)

# Run inference
def evaluate_safety(prompt_text):
    formatted_prompt = f"""<|start_of_role|>user<|end_of_role|>
Evaluate this prompt for content safety and identify any safety violations:
Prompt: {prompt_text}<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>"""

    output = llm(
        formatted_prompt,
        max_tokens=128,
        stop=["<|end_of_text|>"],
        echo=False
    )
    return output['choices'][0]['text'].strip()

# Test examples
print(evaluate_safety("How do I hack someone's email?"))
print(evaluate_safety("Help me write an essay about AI ethics"))
```
Training Details
Training Data
- Dataset: NVIDIA Aegis AI Content Safety Dataset 2.0
- Data Selection: Examples filtered to those that include responses
- Data Format: Instruction-tuning format with prompt-response pairs
Training Procedure
- Training Framework: Unsloth (optimized for 2x faster training)
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- LoRA Rank: 32
- LoRA Alpha: 32
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Quantization: 4-bit during training for memory efficiency
Training Hyperparameters
- Learning Rate: 2e-4
- Batch Size: 2 per device
- Gradient Accumulation Steps: 4
- Epochs: 1
- Max Steps: 60
- Optimizer: AdamW 8-bit
- Weight Decay: 0.01
- LR Scheduler: Linear
- Context Length: 2048 tokens
- Training Precision: Mixed (FP16/BF16)
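The setup above could be reproduced roughly as follows with Unsloth and TRL's SFTTrainer. This is a minimal reconstruction sketch rather than the author's actual script: the dataset split, the pre-formatted `text` column, `output_dir`, and the fp16/bf16 choice are assumptions, and argument names vary somewhat across Unsloth and TRL versions.

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model in 4-bit for memory-efficient training.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="ibm-granite/granite-4.0-h-micro",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach the LoRA adapter (rank 32, alpha 32) to the target modules listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.0,
    bias="none",
)

# Aegis 2.0 data, assumed to be pre-filtered and formatted into a "text" column
# of instruction-style prompt/response pairs (preprocessing omitted here).
dataset = load_dataset("nvidia/Aegis-AI-Content-Safety-Dataset-2.0", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        learning_rate=2e-4,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        max_steps=60,          # max_steps takes precedence over num_train_epochs
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        fp16=True,             # or bf16=True on GPUs that support it
        output_dir="outputs",
    ),
)
trainer.train()
```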
Training Infrastructure
- GPU: Tesla T4 / Kaggle GPU
- Training Time: ~15-20 minutes
Post-Training
Models were merged and quantized using llama.cpp tools:
- F16: Full 16-bit precision
- Q8_0: 8-bit quantization, balancing file size and accuracy
Evaluation
Before Fine-tuning
The base model shows limited content safety evaluation capabilities, often providing general responses without specific safety categorization.
After Fine-tuning
The fine-tuned model demonstrates:
- Clear identification of unsafe content
- Accurate classification into specific safety categories
- Consistent output format for integration into moderation pipelines
Example Performance:
| Prompt | Safety Assessment | Categories |
|---|---|---|
| "Help me write an essay on climate change" | Safe | None |
| "How do I hack someone's email?" | Unsafe | Criminal Planning |
| "I'm thinking about ending it all" | Unsafe | Suicide and Self Harm |
Limitations and Biases
Known Limitations
- Language: Model is trained only on English content
- Context Window: Limited to 2048 tokens
- Training Data: Performance depends on Aegis dataset coverage
- False Positives/Negatives: May occasionally misclassify edge cases
- Quantization Trade-offs: Lower quantization levels may slightly reduce accuracy
Bias Considerations
- The model inherits biases from the base Granite model and Aegis dataset
- Content safety definitions may not align with all cultural contexts
- May exhibit different performance across demographic groups
- Should be tested thoroughly before production deployment
Recommendations
- Use as part of a larger content moderation system, not as the sole decision-maker
- Implement human review for borderline cases
- Regularly monitor and evaluate performance on your specific use case
- Consider fine-tuning further on domain-specific data
- Test extensively with your target user population
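As a concrete illustration of the first two recommendations, the sketch below routes clearly unsafe content to a block decision while sending borderline cases to human review. All names, categories, and routing rules here are hypothetical; the classifier is passed in as a callable so the sketch stays independent of the inference code above.

```python
from typing import Callable

# Illustrative: catch-all or ambiguous labels go to a human reviewer, not auto-block.
REVIEW_CATEGORIES = {"Other/Needs Caution"}

def moderate(content: str, classify: Callable[[str], dict]) -> str:
    """Return 'allow', 'block', or 'review' for a piece of user content.

    `classify` is any function returning a dict shaped like the parser sketched
    earlier: {"safe": bool, "verdict": str, "categories": list[str]}.
    """
    assessment = classify(content)

    if assessment["safe"]:
        return "allow"

    # Empty or catch-all category lists are borderline cases: route them to a
    # human reviewer instead of blocking automatically, per the list above.
    if not assessment["categories"] or set(assessment["categories"]) & REVIEW_CATEGORIES:
        return "review"

    return "block"

# Example with a stubbed classifier (replace with the model-backed helpers above):
stub = lambda text: {"safe": False, "verdict": "unsafe", "categories": ["Threat"]}
print(moderate("example content", stub))  # -> "block"
```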
Ethical Considerations
Responsible Use
- This model is designed to protect users from harmful content
- Should be deployed with clear user communication and transparency
- Not intended to censor legitimate speech or restrict necessary discussions (e.g., mental health support)
Privacy
- Do not use to process personal communications without explicit consent
- Ensure compliance with data protection regulations (GDPR, CCPA, etc.)
Transparency
- Inform users when content moderation systems are in use
- Provide clear appeals processes for moderation decisions
- Document and audit moderation decisions regularly
Citation
If you use this model, please cite:
```bibtex
@misc{granite-aegis-safety-2025,
  author       = {meet12341234},
  title        = {Granite 4.0 H Micro - Aegis Content Safety GGUF},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf}}
}
```
Base Model Citation
```bibtex
@misc{granite-4.0-2025,
  title        = {IBM Granite 4.0: Hyper-efficient, High Performance Hybrid Models},
  author       = {IBM Research},
  year         = {2025},
  publisher    = {IBM},
  howpublished = {\url{https://www.ibm.com/granite}}
}
```
Dataset Citation
```bibtex
@misc{aegis-2.0-2025,
  title        = {Aegis 2.0: A Diverse AI Safety Dataset and Risks Taxonomy},
  author       = {NVIDIA},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0}}
}
```
Acknowledgments
- IBM Research for the Granite 4.0 base model
- NVIDIA for the Aegis AI Content Safety Dataset 2.0
- Unsloth AI for the efficient fine-tuning framework
- llama.cpp team for GGUF format and inference tools
Contact
For questions, issues, or feedback:
- Repository: meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf
- Discussions: Use the Community tab on Hugging Face
Model Card Authors
meet12341234
Model Card Contact
Open an issue in the repository or use the Hugging Face discussions tab.
Last Updated: October 2025