This is my (first) attempt at quantizing this Qwen3 model (Qwen/Qwen3-14B) using auto-round, like so:

```
auto-round --model "Qwen/Qwen3-14B" --scheme "W4A16" --format "auto_gptq" --output_dir "./Quantized" --model_dtype fp16
```
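
For anyone who prefers scripting this, below is a minimal sketch of the same run through auto-round's Python API. It assumes the `AutoRound` / `quantize_and_save` interface from the auto-round documentation; argument names can differ between releases, so check the version you have installed.

```python
# Minimal sketch of the CLI call above via auto-round's Python API
# (an assumption based on auto-round's docs, not taken from this card).
from auto_round import AutoRound

autoround = AutoRound(
    "Qwen/Qwen3-14B",  # model to quantize
    scheme="W4A16",    # 4-bit weights, 16-bit activations
)

# Quantize and export in GPTQ format, matching --format "auto_gptq".
# Dtype handling (--model_dtype fp16 in the CLI) is omitted here.
autoround.quantize_and_save("./Quantized", format="auto_gptq")
```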

The primary purpose of these quants is to run on consumer AMD GPUs, and in my case they do.
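
To try the checkpoint, here is a minimal loading sketch with transformers. It assumes a PyTorch build that can see your GPU (e.g. a ROCm build for AMD) plus an installed GPTQ backend such as gptqmodel; the prompt and generation settings are illustrative, not part of this card.

```python
# Minimal sketch: load this GPTQ checkpoint and generate a reply.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pramjana/Qwen3-14B-4bit-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # place the quantized layers on the GPU
    torch_dtype="auto",  # keep the dtypes stored in the checkpoint
)

# Qwen3 models ship a chat template, so build the prompt with it.
messages = [{"role": "user", "content": "Briefly explain GPTQ quantization."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```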
