---
license: apache-2.0
tags:
- gguf
- qwen
- qwen3-coder
- qwen3-coder-30b-q2
- qwen3-coder-30b-q2_k
- qwen3-coder-30b-q2_k-gguf
- llama.cpp
- quantized
- text-generation
- chat
- reasoning
- agent
- multilingual
base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
author: geoffmunn
---
# Qwen3-Coder-30B-A3B-Instruct-f16:Q2_K

Quantized version of [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) at the Q2_K level, derived from f32 base weights.
## Model Info
- Format: GGUF (for llama.cpp and compatible runtimes)
- Size: 11.30 GB
- Precision: Q2_K
- Base Model: Qwen/Qwen3-Coder-30B-A3B-Instruct
- Conversion Tool: llama.cpp
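If you use Ollama, you can pull the quantized file straight from Hugging Face. A minimal sketch; the repository path below is the one referenced in the CLI example further down, so adjust it if your copy lives elsewhere:

```bash
# Pull the Q2_K quant directly from Hugging Face via Ollama
# (repo path taken from the CLI example below).
ollama pull hf.co/geoffmunn/Qwen3-Coder-30B-A3B-Instruct-f32:Q2_K
```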
## Quality & Performance
| Metric | Value |
|---|---|
| Quality | Minimal |
| Speed | ⚡ Fast |
| RAM Required | ~20.5 GB |
| Recommendation | Minimal quality; only for extreme memory constraints. |
## Prompt Template (ChatML)

This model uses the ChatML prompt format adopted by the Qwen family:
```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
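If you drive `llama.cpp` directly, you can pass a hand-formatted ChatML prompt on the command line. This is only a sketch: the GGUF filename is a placeholder, and `-e` tells `llama-cli` to interpret the `\n` escapes.

```bash
# Minimal llama-cli invocation with a hand-built ChatML prompt.
# The model filename is a placeholder; point it at your downloaded GGUF.
llama-cli -m ./Qwen3-Coder-30B-A3B-Instruct-Q2_K.gguf \
  -e -p "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nWrite a Python one-liner that reverses a string.<|im_end|>\n<|im_start|>assistant\n" \
  -r "<|im_end|>" -n 256
```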
## Generation Parameters

Recommended defaults:
| Parameter | Value |
|---|---|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Min-P | 0.0 |
| Repeat Penalty | 1.1 |
Stop sequences: `<|im_end|>`, `<|im_start|>`
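With Ollama you can bake these defaults and the stop sequences into a Modelfile instead of passing them on every request. A minimal sketch, assuming the same Hugging Face path as the CLI example below and a hypothetical local model name:

```bash
# Create an Ollama model with the recommended sampling defaults baked in.
# The FROM path and the local name "qwen3-coder-30b-q2k" are assumptions.
cat > Modelfile <<'EOF'
FROM hf.co/geoffmunn/Qwen3-Coder-30B-A3B-Instruct-f32:Q2_K
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
PARAMETER min_p 0.0
PARAMETER repeat_penalty 1.1
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
EOF
ollama create qwen3-coder-30b-q2k -f Modelfile
```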
## 🖥️ CLI Example Using Ollama or TGI Server

Here's how to query this model over the Ollama HTTP API using curl and jq. Adjust the endpoint and payload for your own server (TGI and other runtimes expose a different API).
```bash
curl http://localhost:11434/api/generate -s -N -d '{
  "model": "hf.co/geoffmunn/Qwen3-Coder-30B-A3B-Instruct-f32:Q2_K",
  "prompt": "Respond exactly as follows: Summarize what a neural network is in one sentence.",
  "stream": false,
  "options": {
    "temperature": 0.3,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "repeat_penalty": 1.1
  }
}' | jq -r '.response'
```
🎯 Why this works well:

- The prompt is meaningful and achievable for this model size.
- Temperature is tuned appropriately: lower for factual tasks (0.3–0.5, as here), higher (~0.7) for creative ones.
- Uses `jq` to extract clean output.
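If you prefer to let Ollama apply the ChatML template for you, the `/api/chat` endpoint accepts role-based messages instead of a raw prompt. A sketch of the equivalent request:

```bash
# Same request through Ollama's chat endpoint; the chat template stored
# in the model is applied automatically to the messages array.
curl http://localhost:11434/api/chat -s -d '{
  "model": "hf.co/geoffmunn/Qwen3-Coder-30B-A3B-Instruct-f32:Q2_K",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what a neural network is in one sentence."}
  ],
  "stream": false,
  "options": {"temperature": 0.3, "top_p": 0.95, "top_k": 20, "repeat_penalty": 1.1}
}' | jq -r '.message.content'
```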
## Verification

Check integrity:

```bash
sha256sum -c ../SHA256SUMS.txt
```
## Usage
Compatible with:
- LM Studio – local AI model runner
- OpenWebUI – self-hosted AI interface
- GPT4All – private, offline AI chatbot
- Directly via `llama.cpp` (see the server sketch below)
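For `llama.cpp`, you can also serve the model over an OpenAI-compatible HTTP API with `llama-server`. A minimal sketch; the GGUF filename is a placeholder:

```bash
# Start llama.cpp's built-in server on port 8080 with a 4K context window.
llama-server -m ./Qwen3-Coder-30B-A3B-Instruct-Q2_K.gguf -c 4096 --port 8080

# Then query the OpenAI-compatible chat endpoint:
curl http://localhost:8080/v1/chat/completions -s -d '{
  "messages": [{"role": "user", "content": "Explain what GGUF is in one sentence."}],
  "temperature": 0.6
}' | jq -r '.choices[0].message.content'
```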
## License
Apache 2.0 – see base model for full terms.