Qwen3-Coder-30B-A3B-Instruct-f32-GGUF

This is a GGUF-quantized version of the Qwen/Qwen3-Coder-30B-A3B-Instruct language model.

Converted for use with llama.cpp, LM Studio, OpenWebUI, GPT4All, and more.

Why f32?

This model uses FP32 (32-bit floating point) as its base precision. This is unusual for GGUF models because:

  • FP32 doubles memory usage vs FP16.
  • Modern LLMs (including Qwen3) are trained in mixed precision and do not benefit from FP32 at inference time.
  • FP32 is mainly useful for debugging, research, or workloads requiring extreme numerical robustness.

F16 is almost always the better choice; the main use of this f32 base is to compare outputs against the quantized variants and see whether they differ at all.
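If you want to run that comparison yourself, here is a minimal sketch using the llama-cpp-python bindings. The GGUF file names are assumptions; substitute the files you actually downloaded, and note that each load needs tens of GB of RAM at these precisions.

```python
# Compare greedy (temperature 0) outputs of the f32 and f16 GGUF files.
# Assumption: both files sit in the current directory under these names.
from llama_cpp import Llama

PROMPT = "Write a Python function that reverses a linked list."

def generate(model_path: str) -> str:
    # seed=0 and temperature=0.0 make the run deterministic, so any
    # difference in output comes from the base precision, not sampling.
    llm = Llama(model_path=model_path, n_ctx=2048, seed=0, verbose=False)
    out = llm(PROMPT, max_tokens=128, temperature=0.0)
    return out["choices"][0]["text"]

f32_text = generate("Qwen3-Coder-30B-A3B-Instruct-f32.gguf")
f16_text = generate("Qwen3-Coder-30B-A3B-Instruct-f16.gguf")
print("identical" if f32_text == f16_text else "outputs differ")
```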

💡 Key Features of Qwen3-Coder-30B-A3B-Instruct:

  • Mixture-of-Experts architecture (qwen3moe): ~30B total parameters with ~3B active per token (the "A3B" in the name).
  • Instruction-tuned for coding, including agent and tool-use workflows.
  • Distributed here as GGUF quantizations derived from an f32 base.

Available Quantizations (from f32)

| Level  | Quality       | Speed     | Size    | Recommendation                                |
|--------|---------------|-----------|---------|-----------------------------------------------|
| Q2_K   | Minimal       | ⚡ Fast   | 11.3 GB | Only on severely memory-constrained systems.  |
| Q3_K_S | Low-Medium    | ⚡ Fast   | 13.3 GB | Minimal viability; avoid unless space-limited.|
| Q3_K_M | Low-Medium    | ⚡ Fast   | 14.7 GB | Acceptable for basic interaction.             |
| Q4_K_S | Practical     | ⚡ Fast   | 17.5 GB | Good balance for mobile/embedded platforms.   |
| Q4_K_M | Practical     | ⚡ Fast   | 18.6 GB | Best overall choice for most users.           |
| Q5_K_S | Max Reasoning | 🐢 Medium | 21.1 GB | Slight quality gain; good for testing.        |
| Q5_K_M | Max Reasoning | 🐢 Medium | 21.7 GB | Best quality available. Recommended.          |
| Q6_K   | Near-FP16     | 🐌 Slow   | 25.1 GB | Diminishing returns. Only if RAM allows.      |
| Q8_0   | Lossless*     | 🐌 Slow   | 32.5 GB | Maximum fidelity. Ideal for archival.         |

*Effectively lossless in practice, but not bit-identical to the FP16 base weights.
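File sizes scale roughly with each quantization's bits per weight. The sketch below reproduces the table's sizes from approximate bits-per-weight figures; these figures are back-solved from the table above and are illustrative only, since k-quants keep some tensors at higher precision.

```python
# Rough GGUF file-size estimate: total parameters x bits per weight / 8.
# Bits-per-weight values are approximations back-solved from the table;
# real files differ slightly because some tensors stay at higher precision.
PARAMS = 31e9  # Qwen3-Coder-30B-A3B total parameter count (MoE)

APPROX_BPW = {
    "Q2_K": 2.9, "Q3_K_M": 3.8, "Q4_K_M": 4.8,
    "Q5_K_M": 5.6, "Q6_K": 6.5, "Q8_0": 8.4,
}

for level, bpw in APPROX_BPW.items():
    size_gb = PARAMS * bpw / 8 / 1e9  # bytes -> GB (decimal)
    print(f"{level}: ~{size_gb:.1f} GB")
```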

💡 Recommendations by Use Case

  • 💻 Standard Laptop (i5/M1 Mac): Q5_K_M (optimal quality)
  • 🧠 Reasoning, Coding, Math: Q5_K_M or Q6_K
  • 🔍 RAG, Retrieval, Precision Tasks: Q6_K or Q8_0
  • 🤖 Agent & Tool Integration: Q5_K_M
  • 🛠️ Development & Testing: Test from Q4_K_M up to Q8_0

Usage

Load this model using:

  • OpenWebUI – self-hosted AI interface with RAG & tools
  • LM Studio – desktop app with GPU support
  • GPT4All – private, offline AI chatbot
  • Or directly via llama.cpp (see the sketch below)
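As a concrete starting point, here is a minimal sketch using the llama-cpp-python bindings. The file name, context size, and prompt are assumptions; adjust them to the quantization you downloaded and the hardware you have.

```python
# Minimal chat completion with llama-cpp-python.
# Assumption: the Q4_K_M file from this repo is in the current directory.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf",
    n_ctx=8192,        # context window; raise it if you have the RAM
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a binary search in Python."},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```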

Each quantized model includes its own README.md and shares a common MODELFILE.

Author

👤 Geoff Munn (@geoffmunn)
🔗 Hugging Face Profile

Disclaimer

This is a community conversion for local inference. Not affiliated with Alibaba Cloud or the Qwen team.
