# Qwen3 Guard: A Fine-tuned Safety Classifier
This model is a fine-tuned version of `Qwen/Qwen3-4B-Thinking-2507` designed to serve as a comprehensive safety and hazard classifier. It was developed to combine the functionalities of leading safety models such as Google's ShieldGemma and Meta's Llama Guard into a single, efficient tool.
## Model Description
Qwen3 Guard is an instruction-tuned model that classifies user prompts as either "safe" or "unsafe". What makes it unique is its ability to output classifications in two popular formats simultaneously, providing both a simple verdict and detailed harm categories.
This was achieved using Parameter-Efficient Fine-Tuning (PEFT), specifically QLoRA (4-bit quantization with Low-Rank Adaptation), which allowed for efficient training without compromising the base model's capabilities.
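Because only a small LoRA adapter is trained on top of the frozen 4-bit base model, a typical way to use the model is to load the quantized base with `transformers` and attach the adapter with `peft`. A minimal sketch follows; the adapter repo id `your-username/qwen3-guard` is a placeholder, not an actual published path.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization config, matching the training setup described below.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the quantized base model and its tokenizer.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Thinking-2507",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Thinking-2507")

# Attach the fine-tuned LoRA adapter on top of the quantized base.
# NOTE: "your-username/qwen3-guard" is a placeholder repo id.
model = PeftModel.from_pretrained(base, "your-username/qwen3-guard")
```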
## Key Features
- Dual-Format Output: Provides classifications in both LG4 style (`safe`/`unsafe` + categories) and Shield style (`Yes`/`No`); see the usage sketch after this list.
- High Accuracy: Achieves excellent performance on a balanced test set, demonstrating a strong ability to distinguish between safe and harmful content.
- Efficient & Accessible: Fine-tuned using QLoRA, making it a lightweight and accessible solution that can be run on consumer-grade hardware.
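For illustration, here is a hypothetical classification call that reuses the `model` and `tokenizer` from the loading sketch above. The prompt and the label strings shown in the output are assumptions for illustration; the actual chat template and response format depend on how the training data was structured.

```python
# Hypothetical classification call; reuses `model` and `tokenizer` from above.
messages = [{"role": "user", "content": "How do I pick a lock?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding; the classifier only needs a short, deterministic verdict.
output = model.generate(inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
# Illustrative response combining both formats (actual labels may differ):
#   unsafe
#   S2: Non-Violent Crimes
#   Shield: Yes
```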
## Intended Uses & Limitations
This model is ideal for academic research, content moderation experiments, and as a demonstration of building powerful, specialized guard models from general-purpose LLMs.
### Limitations
This model was trained on a high-quality, but synthetic, dataset. It is not intended for production use in safety-critical applications without further rigorous testing, validation, and fine-tuning on a diverse, real-world dataset that reflects your specific use case.
## Training Procedure
The model was fine-tuned for 2.0 epochs on a custom, balanced synthetic dataset designed to teach the nuances of both safe and unsafe content. The training process was optimized for performance and memory efficiency on an NVIDIA A100 GPU.
### Training Hyperparameters
- Framework: `Transformers`
- Base Model: `Qwen/Qwen3-4B-Thinking-2507`
- Quantization: 4-bit (`nf4`)
- LoRA `r`: 16
- LoRA `alpha`: 32
- Learning Rate: `0.0002`
- Epochs: `2.0`
- Effective Batch Size: 32
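For reference, here is a sketch of a QLoRA training setup consistent with the hyperparameters above, using `peft` and `bitsandbytes`. The target modules, LoRA dropout, and the per-device batch / gradient accumulation split are assumptions not stated in this card; only their product of 32 is documented.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization of the frozen base model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Thinking-2507",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                       # LoRA rank, per the list above
    lora_alpha=32,              # LoRA alpha, per the list above
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    lora_dropout=0.05,          # assumed; not stated in the card
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

args = TrainingArguments(
    output_dir="qwen3-guard-qlora",   # placeholder
    num_train_epochs=2.0,
    learning_rate=2e-4,
    per_device_train_batch_size=4,    # assumed split; only the effective
    gradient_accumulation_steps=8,    # batch size of 32 is documented
    bf16=True,
)
```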