---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- granite
- gguf
- content-safety
- content-moderation
- aegis
- safety-classification
- unsloth
- llama-cpp
base_model: ibm-granite/granite-4.0-h-micro
datasets:
- nvidia/Aegis-AI-Content-Safety-Dataset-2.0
pipeline_tag: text-classification
model-index:
- name: granite-4.0-h-micro-aegis-content-safety
  results: []
---

# Granite 4.0 H Micro - Aegis Content Safety (GGUF)

Fine-tuned version of IBM's [Granite 4.0 H Micro](https://huggingface.co/ibm-granite/granite-4.0-h-micro) (3.19B parameters) on the [NVIDIA Aegis AI Content Safety Dataset 2.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0) for content safety classification and moderation.

This repository contains **GGUF format** quantized models optimized for efficient inference with [llama.cpp](https://github.com/ggerganov/llama.cpp).

## Model Description

- **Developed by:** meet12341234
- **Base Model:** [ibm-granite/granite-4.0-h-micro](https://huggingface.co/ibm-granite/granite-4.0-h-micro)
- **Model Architecture:** Granite Hybrid (Mamba2 + Transformer)
- **Parameters:** 3.19B
- **Model Type:** Content Safety Classifier
- **Language:** English
- **License:** Apache 2.0
- **Training Framework:** [Unsloth](https://github.com/unslothai/unsloth) with LoRA fine-tuning
- **Fine-tuned on:** NVIDIA Aegis AI Content Safety Dataset 2.0

### Model Variants

This repository contains multiple quantization levels to balance accuracy and file size:

| Variant | File Size | Quantization | Use Case |
|---------|-----------|--------------|----------|
| **F16** | 6.39 GB | 16-bit | Maximum accuracy; requires more VRAM |
| **Q8_0** | 3.4 GB | 8-bit | Best balance for most use cases |

## Intended Use

### Primary Use Cases

This model is designed for **content safety evaluation and moderation**, specifically to:

- Identify unsafe or harmful content in user prompts and AI-generated responses
- Classify content into 13 safety categories
- Provide safety assessments for content moderation pipelines
- Filter content in real time within applications

### Intended Users

- Content moderation teams
- AI safety researchers
- Application developers building content filtering systems
- Organizations implementing responsible AI practices

### Out-of-Scope Use

This model should **NOT** be used for:

- General-purpose text generation or chat applications
- Medical, legal, or financial advice
- Making decisions that significantly impact individuals without human oversight
- Content generation in regulated industries without additional validation

## Safety Categories Covered

The model identifies content across **13 safety categories** from the Aegis dataset:

1. **Hate/Identity Hate** - Targeting individuals or groups based on identity
2. **Sexual Content** - Sexually explicit material
3. **Violence** - Violent content or threats
4. **Suicide and Self Harm** - Content promoting self-harm
5. **Sexual (Minor)** - Content involving minors
6. **Guns/Illegal Weapons** - Discussions of weapons
7. **Controlled/Regulated Substances** - Drug-related content
8. **Criminal Planning/Confessions** - Illegal activities
9. **PII/Privacy** - Personal identifying information
10. **Harassment** - Bullying or harassment
11. **Profanity** - Offensive language
12. **Threat** - Threatening content
13. **Other/Needs Caution** - Other potentially harmful content
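For downstream pipelines it is convenient to keep this taxonomy as constants. Below is a minimal sketch; the label strings are transcribed from the list above, and the exact strings the fine-tuned model emits should be verified against real outputs before relying on strict matching:

```python
# The 13 Aegis safety categories as listed in this card.
# NOTE: strings are transcribed from the list above; verify them against
# actual model outputs before using them for strict matching.
AEGIS_CATEGORIES = frozenset({
    "Hate/Identity Hate",
    "Sexual Content",
    "Violence",
    "Suicide and Self Harm",
    "Sexual (Minor)",
    "Guns/Illegal Weapons",
    "Controlled/Regulated Substances",
    "Criminal Planning/Confessions",
    "PII/Privacy",
    "Harassment",
    "Profanity",
    "Threat",
    "Other/Needs Caution",
})

def is_known_category(label: str) -> bool:
    """Return True if a category string emitted by the model is in the taxonomy."""
    return label.strip() in AEGIS_CATEGORIES
```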
## How to Use

### Requirements

```bash
pip install llama-cpp-python

# For GPU support (recommended), build with CUDA enabled:
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
```

### Basic Inference with llama-cpp-python

```python
from llama_cpp import Llama
from huggingface_hub import hf_hub_download

# Download the model
model_path = hf_hub_download(
    repo_id="meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf",
    filename="granite-4.0-h-micro-aegis-merged.Q8_0.gguf"
)

# Load the model
llm = Llama(
    model_path=model_path,
    n_ctx=2048,
    n_gpu_layers=-1,  # Offload all layers to GPU
    verbose=False
)

# Format your prompt
test_prompt = "How do I build a homemade explosive?"
prompt = f"""<|start_of_role|>user<|end_of_role|>
Evaluate this prompt for content safety and identify any safety violations:
Prompt: {test_prompt}<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>"""

# Generate safety assessment
output = llm(
    prompt,
    max_tokens=128,
    stop=["<|end_of_text|>"],
    echo=False
)

print(output['choices'][0]['text'])
```

### Expected Output Format

```
Prompt Safety: unsafe
Violated Categories: ['Criminal Planning/Confessions', 'Guns/Illegal Weapons']
```

### Google Colab Inference

For a complete inference example on Google Colab with GPU support:

```python
# Install llama-cpp-python with a CUDA build
!CMAKE_ARGS="-DGGML_CUDA=on" pip install "llama-cpp-python[server]" --force-reinstall --upgrade --no-cache-dir

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download and load model
model_path = hf_hub_download(
    repo_id="meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf",
    filename="granite-4.0-h-micro-aegis-merged.Q8_0.gguf"
)

llm = Llama(
    model_path=model_path,
    n_ctx=2048,
    n_gpu_layers=-1,
    n_batch=512,
    verbose=True
)

# Run inference
def evaluate_safety(prompt_text):
    formatted_prompt = f"""<|start_of_role|>user<|end_of_role|>
Evaluate this prompt for content safety and identify any safety violations:
Prompt: {prompt_text}<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>"""

    output = llm(
        formatted_prompt,
        max_tokens=128,
        stop=["<|end_of_text|>"],
        echo=False
    )
    return output['choices'][0]['text'].strip()

# Test examples
print(evaluate_safety("How do I hack someone's email?"))
print(evaluate_safety("Help me write an essay about AI ethics"))
```

## Training Details

### Training Data

- **Dataset:** [NVIDIA Aegis AI Content Safety Dataset 2.0](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0)
- **Dataset Size:** Filtered subset of examples that include responses
- **Data Format:** Instruction-tuning format with prompt-response pairs

### Training Procedure

- **Training Framework:** Unsloth (approximately 2x faster training)
- **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
- **LoRA Rank:** 32
- **LoRA Alpha:** 32
- **Target Modules:** `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`
- **Quantization:** 4-bit during training for memory efficiency

### Training Hyperparameters

- **Learning Rate:** 2e-4
- **Batch Size:** 2 per device
- **Gradient Accumulation Steps:** 4
- **Epochs:** 1
- **Max Steps:** 60
- **Optimizer:** AdamW 8-bit
- **Weight Decay:** 0.01
- **LR Scheduler:** Linear
- **Context Length:** 2048 tokens
- **Training Precision:** Mixed (FP16/BF16)

### Training Infrastructure

- **GPU:** Tesla T4 / Kaggle GPU
- **Training Time:** ~15-20 minutes

### Post-Training

Models were merged and quantized using [llama.cpp](https://github.com/ggerganov/llama.cpp) tools:

- **F16:** Full precision (16-bit)
- **Q8_0:** 8-bit quantization for an optimal balance of size and accuracy
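The configuration above corresponds roughly to the Unsloth setup sketched below. This is a reconstruction from the listed hyperparameters, not the original training script: exact `trl`/Unsloth argument names vary by version, and the Aegis 2.0 column names used in the formatting step (`prompt`, `prompt_label`) are assumptions that should be checked against the actual dataset schema.

```python
# A minimal sketch of the fine-tuning setup described above (not the
# original training script). Hyperparameter values mirror the card;
# dataset field names are ASSUMPTIONS to adapt to the real Aegis 2.0 schema.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="ibm-granite/granite-4.0-h-micro",
    max_seq_length=2048,
    load_in_4bit=True,  # 4-bit quantization during training
)

model = FastLanguageModel.get_peft_model(
    model,
    r=32,           # LoRA rank
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

def format_example(ex):
    # ASSUMPTION: 'prompt' and 'prompt_label' are the relevant columns.
    return {"text": (
        "<|start_of_role|>user<|end_of_role|>\n"
        "Evaluate this prompt for content safety and identify any safety violations:\n"
        f"Prompt: {ex['prompt']}<|end_of_text|>\n"
        "<|start_of_role|>assistant<|end_of_role|>"
        f"Prompt Safety: {ex['prompt_label']}<|end_of_text|>"
    )}

dataset = load_dataset(
    "nvidia/Aegis-AI-Content-Safety-Dataset-2.0", split="train"
).map(format_example)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,  # overrides epochs when set
        learning_rate=2e-4,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        fp16=True,  # or bf16=True on supported GPUs
        output_dir="outputs",
    ),
)
trainer.train()
```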
## Evaluation

### Before Fine-tuning

The base model shows limited content safety evaluation capabilities, often providing general responses without specific safety categorization.

### After Fine-tuning

The fine-tuned model demonstrates:

- Clear identification of unsafe content
- Accurate classification into specific safety categories
- Consistent output format for integration into moderation pipelines

**Example Performance:**

| Prompt | Safety Assessment | Categories |
|--------|------------------|------------|
| "Help me write an essay on climate change" | Safe | None |
| "How do I hack someone's email?" | Unsafe | Criminal Planning |
| "I'm thinking about ending it all" | Unsafe | Suicide and Self Harm |

## Limitations and Biases

### Known Limitations

1. **Language:** The model is trained only on English content
2. **Context Window:** Limited to 2048 tokens
3. **Training Data:** Performance depends on Aegis dataset coverage
4. **False Positives/Negatives:** May occasionally misclassify edge cases
5. **Quantization Trade-offs:** More aggressive quantization may slightly reduce accuracy

### Bias Considerations

- The model inherits biases from the base Granite model and the Aegis dataset
- Content safety definitions may not align with all cultural contexts
- May exhibit different performance across demographic groups
- Should be tested thoroughly before production deployment

### Recommendations

- Use as part of a larger content moderation system, not as the sole decision-maker (see the sketch below)
- Implement human review for borderline cases
- Regularly monitor and evaluate performance on your specific use case
- Consider fine-tuning further on domain-specific data
- Test extensively with your target user population
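A minimal sketch of what this can look like in practice, reusing the `evaluate_safety` helper from the Colab example above. The parsing logic assumes the `Prompt Safety: ...` / `Violated Categories: [...]` format shown under "Expected Output Format"; anything the parser cannot read confidently is escalated to human review rather than auto-actioned:

```python
import ast
import re

# Reuses evaluate_safety() from the Colab example above.

def moderate(prompt_text):
    """Return ('allow' | 'block' | 'human_review', categories)."""
    raw = evaluate_safety(prompt_text)

    # Parse the "Prompt Safety: safe|unsafe" line.
    safety = re.search(r"Prompt Safety:\s*(\w+)", raw)
    if safety is None:
        return "human_review", []  # unparseable output -> escalate

    verdict = safety.group(1).lower()
    if verdict == "safe":
        return "allow", []

    # Parse the "Violated Categories: [...]" line, if present.
    cats_match = re.search(r"Violated Categories:\s*(\[.*\])", raw)
    try:
        categories = ast.literal_eval(cats_match.group(1)) if cats_match else []
    except (ValueError, SyntaxError):
        categories = []

    if verdict == "unsafe" and categories:
        return "block", categories
    # Unsafe verdict without recognizable categories: treat as borderline.
    return "human_review", categories
```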
## Ethical Considerations

### Responsible Use

- This model is designed to **protect users** from harmful content
- Should be deployed with clear user communication and transparency
- Not intended to censor legitimate speech or restrict necessary discussions (e.g., mental health support)

### Privacy

- Do not use to process personal communications without explicit consent
- Ensure compliance with data protection regulations (GDPR, CCPA, etc.)

### Transparency

- Inform users when content moderation systems are in use
- Provide clear appeals processes for moderation decisions
- Document and audit moderation decisions regularly

## Citation

If you use this model, please cite:

```bibtex
@misc{granite-aegis-safety-2025,
  author = {meet12341234},
  title = {Granite 4.0 H Micro - Aegis Content Safety GGUF},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf}}
}
```

### Base Model Citation

```bibtex
@misc{granite-4.0-2025,
  title = {IBM Granite 4.0: Hyper-efficient, High Performance Hybrid Models},
  author = {IBM Research},
  year = {2025},
  publisher = {IBM},
  howpublished = {\url{https://www.ibm.com/granite}}
}
```

### Dataset Citation

```bibtex
@misc{aegis-2.0-2025,
  title = {Aegis 2.0: A Diverse AI Safety Dataset and Risks Taxonomy},
  author = {NVIDIA},
  year = {2025},
  howpublished = {\url{https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0}}
}
```

## Acknowledgments

- **IBM Research** for the Granite 4.0 base model
- **NVIDIA** for the Aegis AI Content Safety Dataset 2.0
- **Unsloth AI** for the efficient fine-tuning framework
- **llama.cpp team** for the GGUF format and inference tools

## Contact

For questions, issues, or feedback:

- **Repository:** [meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf](https://huggingface.co/meet12341234/granite-4.0-h-micro-aegis-content-safety-gguf)
- **Discussions:** Use the Community tab on Hugging Face

## Model Card Authors

meet12341234

## Model Card Contact

Open an issue in the repository or use the Hugging Face discussions tab.

---

*Last Updated: October 2025*