---
license: apache-2.0
tags:
- gguf
- qwen
- qwen3-coder
- qwen3-coder-30b-q2
- qwen3-coder-30b-q2_k
- qwen3-coder-30b-q2_k-gguf
- llama.cpp
- quantized
- text-generation
- chat
- reasoning
- agent
- multilingual
base_model: Qwen/Qwen3-Coder-30B-A3B-Instruct
author: geoffmunn
---
# Qwen3-Coder-30B-A3B-Instruct-f32:Q2_K
Quantized version of [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) at **Q2_K** level, derived from **f32** base weights.
## Model Info
- **Format**: GGUF (for llama.cpp and compatible runtimes)
- **Size**: 11.30 GB
- **Precision**: Q2_K
- **Base Model**: [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct)
- **Conversion Tool**: [llama.cpp](https://github.com/ggerganov/llama.cpp)
## Quality & Performance
| Metric | Value |
|--------------------|-------------------------------------------------------|
| **Quality** | Minimal |
| **Speed** | ⚡ Fast |
| **RAM Required** | ~20.5 GB |
| **Recommendation** | Minimal quality; only for extreme memory constraints. |
## Prompt Template (ChatML)
This model uses Qwen's **ChatML** prompt format:
```text
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
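If you serve the model with llama.cpp's built-in server (`llama-server`), you can also send the raw ChatML prompt yourself. The sketch below is an assumption-laden example (local server on port 8080 with this GGUF already loaded) using the llama.cpp `/completion` endpoint:

```bash
# Minimal sketch: send a raw ChatML prompt to llama.cpp's /completion endpoint.
# Assumes llama-server is running locally on port 8080 with this GGUF loaded.
curl http://localhost:8080/completion -s -d '{
  "prompt": "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nExplain what GGUF quantization is in one sentence.<|im_end|>\n<|im_start|>assistant\n",
  "n_predict": 128,
  "temperature": 0.6,
  "stop": ["<|im_end|>", "<|im_start|>"]
}' | jq -r '.content'
```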
## Generation Parameters
Recommended defaults:
| Parameter | Value |
|----------------|-------|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Min-P | 0.0 |
| Repeat Penalty | 1.1 |
Stop sequences: `<|im_end|>`, `<|im_start|>`
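When running the model directly with `llama-cli`, these defaults map onto its sampling flags. A sketch (the `.gguf` filename is an assumption; point `-m` at the file you actually downloaded):

```bash
# Sketch: the recommended sampling defaults expressed as llama-cli flags.
# The .gguf filename is an assumption; adjust -m to your local file.
./llama-cli -m ./Qwen3-Coder-30B-A3B-Instruct-f32-Q2_K.gguf \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 --repeat-penalty 1.1 \
  -n 256 -p "Write a Python function that reverses a string."
```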
## 🖥️ CLI Example (Ollama API)
Here's how to query this model from the command line with `curl` and `jq`, using Ollama's `/api/generate` endpoint. Adjust the endpoint and payload if you serve the model with a different runtime.
```bash
curl http://localhost:11434/api/generate -s -N -d '{
  "model": "hf.co/geoffmunn/Qwen3-Coder-30B-A3B-Instruct-f32:Q2_K",
  "prompt": "Summarize what a neural network is in one sentence.",
  "options": {
    "temperature": 0.3,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
    "repeat_penalty": 1.1
  },
  "stream": false
}' | jq -r '.response'
```
🎯 **Why this works well**:
- The prompt is simple and well within reach of a 2-bit quantization of this model.
- Temperature is lowered to `0.3` (from the default `0.6`) for a factual answer; raise it toward `0.7` for more creative output.
- Sampling parameters are passed in Ollama's `options` object, and `jq` extracts the clean response text.
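Ollama can also apply the ChatML template for you via its `/api/chat` endpoint, so you only send role/content messages. A sketch with the same sampling options:

```bash
# Sketch: chat endpoint; Ollama applies the model's ChatML template itself.
curl http://localhost:11434/api/chat -s -d '{
  "model": "hf.co/geoffmunn/Qwen3-Coder-30B-A3B-Instruct-f32:Q2_K",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what a neural network is in one sentence."}
  ],
  "options": {"temperature": 0.3, "top_p": 0.95, "top_k": 20, "repeat_penalty": 1.1},
  "stream": false
}' | jq -r '.message.content'
```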
## Verification
Check integrity:
```bash
sha256sum -c ../SHA256SUMS.txt
```
## Usage
Compatible with:
- [LM Studio](https://lmstudio.ai) – local AI model runner
- [OpenWebUI](https://openwebui.com) – self-hosted AI interface
- [GPT4All](https://gpt4all.io) – private, offline AI chatbot
- Directly via `llama.cpp` (see the quick-start sketch below)
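For the `llama.cpp` route, one way to fetch the file and start an interactive chat. The repository ID and filename below are assumptions; check the Files tab of this repo for the exact names:

```bash
# Sketch: download the Q2_K GGUF and chat with it via llama-cli.
# Repo ID and filename are assumptions; verify them on the Hugging Face page.
huggingface-cli download geoffmunn/Qwen3-Coder-30B-A3B-Instruct-f32 \
  Qwen3-Coder-30B-A3B-Instruct-f32-Q2_K.gguf --local-dir .

./llama-cli -m ./Qwen3-Coder-30B-A3B-Instruct-f32-Q2_K.gguf -cnv --temp 0.6
```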
## License
Apache 2.0 – see base model for full terms.