license: apache-2.0
tags:
  - gguf
  - safety
  - guardrail
  - qwen
  - text-generation
base_model: Qwen/Qwen3Guard-Gen-4B
author: geoffmunn

Qwen3Guard-Gen-4B-Q8_0

Safety-aligned generative model. Designed to refuse harmful requests gracefully.

Model Info

  • Type: Generative LLM with built-in safety
  • Size: 4.4 GB
  • RAM Required: ~5.0 GB
  • Speed: 🐌 Slow
  • Quality: Max
  • Recommendation: Maximum accuracy; best for evaluation.

🧑‍🏫 Beginner Example

  1. Load in LM Studio
  2. Type:
    How do I hack my school's WiFi?
    
  3. The model replies:
    I can't assist with hacking or unauthorized access to networks. It's important to respect digital privacy and follow ethical guidelines. If you're having trouble connecting, contact your school's IT department for help.
    

✅ Safe query: "Explain photosynthesis" → gives an accurate scientific explanation

⚙️ Default Parameters (Recommended)

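| Parameter      | Value | Why                                        |
|----------------|-------|--------------------------------------------|
| Temperature    | 0.7   | Balanced creativity and coherence          |
| Top-P          | 0.9   | Broad sampling without excessive randomness |
| Top-K          | 20    | Focused candidate pool                     |
| Min-P          | 0.05  | Filters out very unlikely tokens           |
| Repeat Penalty | 1.1   | Reduces repetition                         |
| Context Length | 32768 | Full Qwen3 context support                 |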
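
The recommended values map onto llama.cpp command-line flags roughly as follows. This is a minimal sketch, not part of the original card: the helper name `sampling_flags` is hypothetical, and the flag spellings assume a reasonably recent llama.cpp build, so check `--help` on your own binary before relying on them.

```shell
#!/bin/sh
# Recommended sampling settings, expressed as llama.cpp CLI flags.
# Assumption: these flag spellings match your llama.cpp build (verify with --help).
sampling_flags() {
  printf '%s' "--temp 0.7 --top-p 0.9 --top-k 20 --min-p 0.05 --repeat-penalty 1.1 --ctx-size 32768"
}

# Print the flags on one line so they can be inspected or pasted into a command.
sampling_flags; echo
```

To use it, leave the substitution unquoted so the shell splits it into separate arguments, for example `./main -m <model>.gguf $(sampling_flags) -p "..."`.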

For logic-heavy tasks, enable thinking mode by adding /think to the prompt.

🖥️ CLI Example Using llama.cpp

./main -m Qwen3Guard-Gen-4B-f16:Q8_0.gguf \
  -p "You are a helpful assistant. User: Explain why the sky is blue. Assistant:" \
  --temp 0.7 --top-p 0.9 --repeat-penalty 1.1 \
  --n-predict 512

Expected output:

Rayleigh scattering causes shorter blue wavelengths to scatter more than red...

🧩 Prompt Template (ChatML Format)

Use ChatML for best results:

<|im_start|>system
You are a helpful assistant who always refuses harmful requests.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

Most tools (LM Studio, OpenWebUI) will apply this automatically.
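
When calling llama.cpp directly, the template has to be applied by hand. A minimal sketch of one way to do that in POSIX shell is shown here; the function name `chatml_prompt` is hypothetical and only wraps the two messages in the ChatML markers listed above.

```shell
#!/bin/sh
# Build a ChatML prompt from a system message ($1) and a user message ($2).
# The output ends with an open assistant turn for the model to complete.
chatml_prompt() {
  system_msg=$1
  user_msg=$2
  printf '<|im_start|>system\n%s<|im_end|>\n<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n' \
    "$system_msg" "$user_msg"
}

# Example: the system message from this card plus a safe query.
chatml_prompt "You are a helpful assistant who always refuses harmful requests." \
  "Explain photosynthesis"
```

The result can be passed to the CLI with `-p "$(chatml_prompt "..." "...")"`. Note that command substitution strips the trailing newline after the assistant marker.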

License

Apache 2.0