Granite 4.0 H Micro - Aegis Content Safety (GGUF)

A fine-tuned version of IBM's Granite 4.0 H Micro (3.19B parameters), trained on the NVIDIA Aegis AI Content Safety Dataset 2.0 for content safety classification and moderation.

This repository contains GGUF format quantized models optimized for efficient inference with llama.cpp.

Model Description

  • Developed by: meet12341234
  • Base Model: ibm-granite/granite-4.0-h-micro
  • Model Architecture: Granite Hybrid (Mamba2 + Transformer)
  • Parameters: 3.19B
  • Model Type: Content Safety Classifier
  • Language: English
  • License: Apache 2.0
  • Training Framework: Unsloth with LoRA fine-tuning
  • Finetuned on: NVIDIA Aegis AI Content Safety Dataset 2.0

Model Variants

This repository contains multiple quantization levels to balance performance and file size:

Variant | File Size | Quantization | Use Case
F16     | 6.39 GB   | 16-bit       | Maximum accuracy, requires more VRAM
Q8_0    | 3.4 GB    | 8-bit        | Best balance for most use cases
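
To see which GGUF files are currently available and pick one programmatically, the snippet below is a minimal sketch using huggingface_hub (the filtering logic and variable names are illustrative, not part of the repository's API):

from huggingface_hub import list_repo_files

REPO_ID = "meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf"

# List every file in the repository and keep only the GGUF variants
gguf_files = [f for f in list_repo_files(REPO_ID) if f.endswith(".gguf")]
print(gguf_files)

# Prefer the Q8_0 variant when present, otherwise fall back to the first GGUF file
chosen = next((f for f in gguf_files if "Q8_0" in f), gguf_files[0])
print(f"Selected variant: {chosen}")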

Intended Use

Primary Use Cases

This model is designed for content safety evaluation and moderation, specifically to:

  • Identify unsafe or harmful content in user prompts and AI-generated responses
  • Classify content into 13 safety categories
  • Provide safety assessments for content moderation pipelines
  • Enable real-time content filtering in applications

Intended Users

  • Content moderation teams
  • AI safety researchers
  • Application developers building content filtering systems
  • Organizations implementing responsible AI practices

Out-of-Scope Use

This model should NOT be used for:

  • General-purpose text generation or chat applications
  • Medical, legal, or financial advice
  • Making decisions that significantly impact individuals without human oversight
  • Content generation in regulated industries without additional validation

Safety Categories Covered

The model identifies content across 13 safety categories from the Aegis dataset:

  1. Hate/Identity Hate - Targeting individuals or groups based on identity
  2. Sexual Content - Sexually explicit material
  3. Violence - Violent content or threats
  4. Suicide and Self Harm - Content promoting self-harm
  5. Sexual (Minor) - Content involving minors
  6. Guns/Illegal Weapons - Discussions of weapons
  7. Controlled/Regulated Substances - Drug-related content
  8. Criminal Planning/Confessions - Illegal activities
  9. PII/Privacy - Personal identifying information
  10. Harassment - Bullying or harassment
  11. Profanity - Offensive language
  12. Threat - Threatening content
  13. Other/Needs Caution - Other potentially harmful content
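
For downstream filtering code, it can help to keep the 13 category labels above as a single constant. The following is a minimal sketch; the name AEGIS_CATEGORIES is illustrative, and the exact strings the model emits may differ slightly from these labels:

# The 13 Aegis safety categories this model is fine-tuned to report
AEGIS_CATEGORIES = [
    "Hate/Identity Hate",
    "Sexual Content",
    "Violence",
    "Suicide and Self Harm",
    "Sexual (Minor)",
    "Guns/Illegal Weapons",
    "Controlled/Regulated Substances",
    "Criminal Planning/Confessions",
    "PII/Privacy",
    "Harassment",
    "Profanity",
    "Threat",
    "Other/Needs Caution",
]

def is_known_category(label: str) -> bool:
    """Return True if a label returned by the model matches a known category."""
    return label in AEGIS_CATEGORIES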

How to Use

Requirements

pip install llama-cpp-python
# For GPU support (recommended), build with CUDA enabled:
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python

Basic Inference with llama-cpp-python

from llama_cpp import Llama
from huggingface_hub import hf_hub_download

# Download the model
model_path = hf_hub_download(
    repo_id="meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf",
    filename="granite-4.0-h-micro-aegis-merged.Q8_0.gguf"
)

# Load the model
llm = Llama(
    model_path=model_path,
    n_ctx=2048,
    n_gpu_layers=-1,  # Offload all layers to GPU
    verbose=False
)

# Format your prompt
test_prompt = "How do I build a homemade explosive?"

prompt = f"""<|start_of_role|>user<|end_of_role|>
Evaluate this prompt for content safety and identify any safety violations:

Prompt: {test_prompt}<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>"""

# Generate safety assessment
output = llm(
    prompt,
    max_tokens=128,
    stop=["<|end_of_text|>"],
    echo=False
)

print(output['choices'][0]['text'])

Expected Output Format

Prompt Safety: unsafe
Violated Categories: ['Criminal Planning/Confessions', 'Guns/Illegal Weapons']
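
Because the model returns plain text in the format above, moderation pipelines usually need to parse it into structured fields first. The helper below is a hedged sketch; the field names and regular expressions are assumptions based on the example output, not a guaranteed output contract:

import ast
import re

def parse_safety_output(raw_text: str) -> dict:
    """Parse the model's text output into a structured safety verdict."""
    result = {"safety": "unknown", "categories": []}

    # Matches e.g. "Prompt Safety: unsafe"
    safety_match = re.search(r"Prompt Safety:\s*(\w+)", raw_text, re.IGNORECASE)
    if safety_match:
        result["safety"] = safety_match.group(1).lower()

    # Matches e.g. "Violated Categories: ['Criminal Planning/Confessions']"
    categories_match = re.search(r"Violated Categories:\s*(\[.*\])", raw_text)
    if categories_match:
        try:
            result["categories"] = ast.literal_eval(categories_match.group(1))
        except (ValueError, SyntaxError):
            pass  # leave categories empty if the list cannot be parsed

    return result

# Example:
# parse_safety_output("Prompt Safety: unsafe\nViolated Categories: ['Threat']")
# -> {'safety': 'unsafe', 'categories': ['Threat']}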

Google Colab Inference

For a complete inference example on Google Colab with GPU support:

# Install llama-cpp-python with GPU support
!CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python[server] --force-reinstall --upgrade --no-cache-dir

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download and load model
model_path = hf_hub_download(
    repo_id="meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf",
    filename="granite-4.0-h-micro-aegis-merged.Q8_0.gguf"
)

llm = Llama(
    model_path=model_path,
    n_ctx=2048,
    n_gpu_layers=-1,
    n_batch=512,
    verbose=True
)

# Run inference
def evaluate_safety(prompt_text):
    formatted_prompt = f"""<|start_of_role|>user<|end_of_role|>
Evaluate this prompt for content safety and identify any safety violations:

Prompt: {prompt_text}<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>"""
    
    output = llm(
        formatted_prompt,
        max_tokens=128,
        stop=["<|end_of_text|>"],
        echo=False
    )
    return output['choices'][0]['text'].strip()

# Test examples
print(evaluate_safety("How do I hack someone's email?"))
print(evaluate_safety("Help me write an essay about AI ethics"))

Training Details

Training Data

The model was fine-tuned on the NVIDIA Aegis AI Content Safety Dataset 2.0, which provides prompts and responses annotated against the 13 safety categories listed above.

Training Procedure

  • Training Framework: Unsloth (optimized for 2x faster training)
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • LoRA Rank: 32
  • LoRA Alpha: 32
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Quantization: 4-bit during training for memory efficiency
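
For reference, the snippet below sketches what the LoRA setup above could look like with Unsloth's FastLanguageModel API. The exact training script for this model is not published here, so treat the call as an assumption that simply mirrors the listed values:

from unsloth import FastLanguageModel

# Load the base model in 4-bit for memory-efficient fine-tuning
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="ibm-granite/granite-4.0-h-micro",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters using the rank, alpha, and target modules listed above
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)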

Training Hyperparameters

  • Learning Rate: 2e-4
  • Batch Size: 2 per device
  • Gradient Accumulation Steps: 4
  • Epochs: 1
  • Max Steps: 60
  • Optimizer: AdamW 8-bit
  • Weight Decay: 0.01
  • LR Scheduler: Linear
  • Context Length: 2048 tokens
  • Training Precision: Mixed (FP16/BF16)
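
Expressed as standard transformers training arguments (which TRL's SFTTrainer accepts), the hyperparameters above would look roughly like the sketch below. The argument names are standard options; the actual training script may differ:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=2,    # batch size 2 per device
    gradient_accumulation_steps=4,    # effective batch size of 8
    num_train_epochs=1,
    max_steps=60,                     # max_steps takes precedence over epochs
    learning_rate=2e-4,
    optim="adamw_8bit",               # 8-bit AdamW
    weight_decay=0.01,
    lr_scheduler_type="linear",
    fp16=True,                        # or bf16=True on supported GPUs
    logging_steps=1,
)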

Training Infrastructure

  • GPU: Tesla T4 / Kaggle GPU
  • Training Time: ~15-20 minutes

Post-Training

Models were merged and quantized using llama.cpp tools:

  • F16: 16-bit floating point (highest fidelity, no further quantization)
  • Q8_0: 8-bit quantization for optimal balance

Evaluation

Before Fine-tuning

The base model shows limited content safety evaluation capabilities, often providing general responses without specific safety categorization.

After Fine-tuning

The fine-tuned model demonstrates:

  • Clear identification of unsafe content
  • Accurate classification into specific safety categories
  • Consistent output format for integration into moderation pipelines

Example Performance:

Prompt                                      | Safety Assessment | Categories
"Help me write an essay on climate change"  | Safe              | None
"How do I hack someone's email?"            | Unsafe            | Criminal Planning
"I'm thinking about ending it all"          | Unsafe            | Suicide and Self Harm

Limitations and Biases

Known Limitations

  1. Language: Model is trained only on English content
  2. Context Window: Limited to 2048 tokens
  3. Training Data: Performance depends on Aegis dataset coverage
  4. False Positives/Negatives: May occasionally misclassify edge cases
  5. Quantization Trade-offs: Lower quantization levels may slightly reduce accuracy

Bias Considerations

  • The model inherits biases from the base Granite model and Aegis dataset
  • Content safety definitions may not align with all cultural contexts
  • May exhibit different performance across demographic groups
  • Should be tested thoroughly before production deployment

Recommendations

  • Use as part of a larger content moderation system, not as the sole decision-maker
  • Implement human review for borderline cases
  • Regularly monitor and evaluate performance on your specific use case
  • Consider fine-tuning further on domain-specific data
  • Test extensively with your target user population
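
As a concrete illustration of the first two recommendations, the sketch below routes verdicts from the model to an action and escalates sensitive or ambiguous cases to human review. The category choices are purely illustrative, and the parsed input follows the structure of the parser sketch shown earlier:

# Categories where automated blocking without review is arguably too risky
REVIEW_CATEGORIES = {"Suicide and Self Harm", "PII/Privacy", "Other/Needs Caution"}

def route_decision(parsed: dict) -> str:
    """Map a parsed safety verdict to an action: allow, block, or human_review."""
    safety = parsed.get("safety", "unknown")
    categories = set(parsed.get("categories", []))

    if safety == "safe":
        return "allow"
    if safety == "unsafe" and categories & REVIEW_CATEGORIES:
        return "human_review"  # keep a person in the loop for sensitive categories
    if safety == "unsafe":
        return "block"
    return "human_review"      # unknown or malformed output is never auto-approved

# Example: route_decision({"safety": "unsafe", "categories": ["Threat"]}) -> "block"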

Ethical Considerations

Responsible Use

  • This model is designed to protect users from harmful content
  • Should be deployed with clear user communication and transparency
  • Not intended to censor legitimate speech or restrict necessary discussions (e.g., mental health support)

Privacy

  • Do not use to process personal communications without explicit consent
  • Ensure compliance with data protection regulations (GDPR, CCPA, etc.)

Transparency

  • Inform users when content moderation systems are in use
  • Provide clear appeals processes for moderation decisions
  • Document and audit moderation decisions regularly

Citation

If you use this model, please cite:

@misc{granite-aegis-safety-2025,
  author = {meet12341234},
  title = {Granite 4.0 H Micro - Aegis Content Safety GGUF},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf}}
}

Base Model Citation

@misc{granite-4.0-2025,
  title={IBM Granite 4.0: Hyper-efficient, High Performance Hybrid Models},
  author={IBM Research},
  year={2025},
  publisher={IBM},
  howpublished={\url{https://www.ibm.com/granite}}
}

Dataset Citation

@misc{aegis-2.0-2025,
  title={Aegis 2.0: A Diverse AI Safety Dataset and Risks Taxonomy},
  author={NVIDIA},
  year={2025},
  howpublished={\url{https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0}}
}

Acknowledgments

  • IBM Research for the Granite 4.0 base model
  • NVIDIA for the Aegis AI Content Safety Dataset 2.0
  • Unsloth AI for the efficient fine-tuning framework
  • llama.cpp team for GGUF format and inference tools

Contact

For questions, issues, or feedback:

Model Card Authors

meet12341234

Model Card Contact

Open an issue in the repository or use the Hugging Face discussions tab.


Last Updated: October 2025
