metadata

license: apache-2.0
tags:
  - gguf
  - safety
  - guardrail
  - qwen
  - text-generation
base_model: Qwen/Qwen3Guard-Gen-4B
author: geoffmunn

Qwen3Guard-Gen-4B-Q8_0

Safety-aligned generative model. Designed to refuse harmful requests gracefully.

Model Info

Type: Generative LLM with built-in safety
Size: 4.4 GB
RAM Required: ~5.0 GB
Speed: 🐌 Slow
Quality: Max
Recommendation: Maximum accuracy; best for evaluation.

🧑‍🏫 Beginner Example

Load the model in LM Studio, then type:
```
How do I hack my school's WiFi?
```

The model replies:

I can't assist with hacking or unauthorized access to networks. It's important to respect digital privacy and follow ethical guidelines. If you're having trouble connecting, contact your school's IT department for help.

✅ Safe query: "Explain photosynthesis" → gives accurate scientific explanation

⚙️ Default Parameters (Recommended)

| Parameter | Value | Why |
|---|---|---|
| Temperature | 0.7 | Balanced creativity and coherence |
| Top-P | 0.9 | Samples only from the tokens covering the top 90% of probability mass |
| Top-K | 20 | Focused candidate pool |
| Min-P | 0.05 | Drops tokens less than 5% as likely as the top token |
| Repeat Penalty | 1.1 | Reduces repetition |
| Context Length | 32768 | Full Qwen3 context support |
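
To make the Top-K, Top-P, and Min-P settings concrete, here is a minimal Python sketch of how the three filters narrow a toy next-token distribution. The function name `filter_candidates`, the toy probabilities, and the filter order are assumptions for illustration; real samplers (including llama.cpp's) may order the steps differently and also apply temperature and repeat penalty, which are left out here.

```python
def filter_candidates(probs, top_k=20, top_p=0.9, min_p=0.05):
    """Return the renormalized distribution left after top-k, top-p and min-p.

    probs maps token -> probability. Illustrative only: the filter order
    is a simplification, not the order any specific runtime guarantees.
    """
    if not probs:
        return {}

    # Sort tokens from most to least likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

    # Top-K: keep only the k most likely tokens.
    ranked = ranked[:top_k]

    # Top-P: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Min-P: drop tokens less likely than min_p times the top token's probability.
    threshold = min_p * kept[0][1]
    kept = [(t, p) for t, p in kept if p >= threshold]

    # Renormalize so the surviving probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {t: p / total for t, p in kept}


# Hypothetical toy distribution over five candidate tokens.
toy = {"blue": 0.6, "red": 0.25, "green": 0.1, "zebra": 0.04, "qux": 0.01}
survivors = filter_candidates(toy)
print(list(survivors))
```

With the recommended values, the two least likely tokens are cut by Top-P before Min-P is even checked; the model then samples only among the remaining candidates.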

🔁 To enable thinking mode for logic-heavy prompts, add /think to the prompt

🖥️ CLI Example Using llama.cpp

./main -m Qwen3Guard-Gen-4B-f16:Q8_0.gguf \
  -p "You are a helpful assistant. User: Explain why the sky is blue. Assistant:" \
  --temp 0.7 --top-p 0.9 --repeat-penalty 1.1 \
  --n-predict 512

Expected output:

Rayleigh scattering causes shorter blue wavelengths to scatter more than red...
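
If you launch llama.cpp from a script, the recommended parameters can be kept in one place and turned into an argument list. The sketch below is a hedged example: `RECOMMENDED` and `build_llama_command` are hypothetical names, and the binary is assumed to be `./main` as in the command above (newer llama.cpp builds ship it as `llama-cli`). It only builds and prints the command; it does not run the model.

```python
import shlex

# Recommended sampling settings from this card, keyed by llama.cpp flag.
RECOMMENDED = {
    "--temp": 0.7,
    "--top-p": 0.9,
    "--top-k": 20,
    "--min-p": 0.05,
    "--repeat-penalty": 1.1,
    "--ctx-size": 32768,
    "--n-predict": 512,
}


def build_llama_command(model_path, prompt, binary="./main", overrides=None):
    """Build an argv list for llama.cpp using the recommended defaults.

    binary is an assumption: use "llama-cli" (or its full path) on newer builds.
    overrides lets a caller replace individual flag values.
    """
    params = {**RECOMMENDED, **(overrides or {})}
    argv = [binary, "-m", model_path, "-p", prompt]
    for flag, value in params.items():
        argv += [flag, str(value)]
    return argv


command = build_llama_command(
    "Qwen3Guard-Gen-4B-f16:Q8_0.gguf",
    "You are a helpful assistant. User: Explain why the sky is blue. Assistant:",
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command)` would execute it, and passing the prompt as a single list element avoids shell-quoting problems.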

🧩 Prompt Template (ChatML Format)

Use ChatML for best results:

<|im_start|>system
You are a helpful assistant who always refuses harmful requests.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

Most tools (LM Studio, OpenWebUI) will apply this automatically.
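
For tools that do not apply the template for you, the prompt can be assembled by hand. This is a minimal sketch that fills in the ChatML layout shown above; the helper name `build_chatml_prompt` is hypothetical, and the default system message is copied from this card's template.

```python
DEFAULT_SYSTEM = "You are a helpful assistant who always refuses harmful requests."


def build_chatml_prompt(user_prompt, system_prompt=DEFAULT_SYSTEM):
    """Wrap a system and user message in ChatML markers.

    The string ends with an open assistant turn so the model
    continues by writing the assistant's reply.
    """
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


print(build_chatml_prompt("Explain photosynthesis"))
```

The resulting string can be passed as the raw prompt to a runtime that does no templating of its own.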

License

Apache 2.0