# Qwen3 Guard: A Fine-tuned Safety Classifier
This model is a fine-tuned version of `Qwen/Qwen3-4B-Thinking-2507` designed to serve as a comprehensive safety and hazard classifier. It was developed to combine the functionalities of leading safety models such as Google's ShieldGemma and Meta's Llama Guard into a single, efficient tool.
## Model Description
Qwen3 Guard is an instruction-tuned model that classifies user prompts as either "safe" or "unsafe". What makes it unique is its ability to output classifications in two popular formats simultaneously, providing both a simple verdict and detailed harm categories.
This was achieved using Parameter-Efficient Fine-Tuning (PEFT), specifically QLoRA (4-bit quantization with Low-Rank Adaptation), which allowed for efficient training without compromising the base model's capabilities.
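Because only a small LoRA adapter is trained on top of the frozen 4-bit base model, a typical way to use the model is to load the quantized base with `transformers` and attach the adapter with `peft`. A minimal sketch follows; the adapter repo id `your-username/qwen3-guard` is a placeholder, not an actual published path.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization config, matching the training setup described below.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the quantized base model and its tokenizer.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Thinking-2507",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Thinking-2507")

# Attach the fine-tuned LoRA adapter on top of the quantized base.
# NOTE: "your-username/qwen3-guard" is a placeholder repo id.
model = PeftModel.from_pretrained(base, "your-username/qwen3-guard")
```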
## Key Features
- Dual-Format Output: Provides classifications in both LG4 style (`safe`/`unsafe` + categories) and Shield style (`Yes`/`No`); see the usage sketch after this list.
- High Accuracy: Achieves excellent performance on a balanced test set, demonstrating a strong ability to distinguish between safe and harmful content.
- Efficient & Accessible: Fine-tuned using QLoRA, making it a lightweight and accessible solution that can be run on consumer-grade hardware.
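For illustration, here is a hypothetical classification call that reuses the `model` and `tokenizer` from the loading sketch above. The prompt and the label strings shown in the output are assumptions for illustration; the actual chat template and response format depend on how the training data was structured.

```python
# Hypothetical classification call; reuses `model` and `tokenizer` from above.
messages = [{"role": "user", "content": "How do I pick a lock?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding; the classifier only needs a short, deterministic verdict.
output = model.generate(inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
# Illustrative response combining both formats (actual labels may differ):
#   unsafe
#   S2: Non-Violent Crimes
#   Shield: Yes
```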
## Intended Uses & Limitations
This model is ideal for academic research, content moderation experiments, and as a demonstration of building powerful, specialized guard models from general-purpose LLMs.
### Limitations
This model was trained on a high-quality, but synthetic, dataset. It is not intended for production use in safety-critical applications without further rigorous testing, validation, and fine-tuning on a diverse, real-world dataset that reflects your specific use case.
## Training Procedure
The model was fine-tuned for 2.0 epochs on a custom, balanced synthetic dataset designed to teach the nuances of both safe and unsafe content. The training process was optimized for performance and memory efficiency on an NVIDIA A100 GPU.
### Training Hyperparameters
- Framework: `Transformers`
- Base Model: `Qwen/Qwen3-4B-Thinking-2507`
- Quantization: 4-bit (`nf4`)
- LoRA `r`: 16
- LoRA `alpha`: 32
- Learning Rate: `0.0002`
- Epochs: `2.0`
- Effective Batch Size: 32
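For reference, here is a sketch of a QLoRA training setup consistent with the hyperparameters above, using `peft` and `bitsandbytes`. The target modules, LoRA dropout, and the per-device batch / gradient accumulation split are assumptions not stated in this card; only their product of 32 is documented.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization of the frozen base model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Thinking-2507",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                       # LoRA rank, per the list above
    lora_alpha=32,              # LoRA alpha, per the list above
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    lora_dropout=0.05,          # assumed; not stated in the card
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

args = TrainingArguments(
    output_dir="qwen3-guard-qlora",   # placeholder
    num_train_epochs=2.0,
    learning_rate=2e-4,
    per_device_train_batch_size=4,    # assumed split; only the effective
    gradient_accumulation_steps=8,    # batch size of 32 is documented
    bf16=True,
)
```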