# Qwen3.5-35B-A3B EOQ v3 (PolarQuant + AWQ)

**15.6 GB** (down from 69.3 GB) | **PPL 5.36** (+0.17 vs. FP16) | **4.44x compression** | **30.1 tok/s**

A Mixture-of-Experts model (35B total parameters, 3B active). Smaller AND better quality than v1!

## Results

| Metric      | FP16    | v1 (Q5) | v3      |
|-------------|---------|---------|---------|
| Download    | 69.3 GB | 35.2 GB | 15.6 GB |
| PPL         | 5.19    | 5.39    | 5.36    |
| Compression | 1.0x    | 2.0x    | 4.44x   |
| tok/s       | 30.1    | 30.2    | 30.1    |
| GPU dequant | n/a     | ~400 s  | 8.8 s   |
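As a quick sanity check, the headline numbers are internally consistent (a minimal arithmetic check, not from the model card itself):

```python
fp16_gb, v3_gb = 69.3, 15.6
print(f"compression: {fp16_gb / v3_gb:.2f}x")  # matches the 4.44x in the table

fp16_ppl, v3_ppl = 5.19, 5.36
print(f"PPL delta: +{v3_ppl - fp16_ppl:.2f}")  # matches the +0.17 headline
```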

## How EOQ v3 Works

### PolarQuant (Inspired by TurboQuant)

1. **Normalize** each block and extract its norm
2. **Hadamard rotate** to make the distribution Gaussian (no more outliers)
3. **Lloyd-Max quantize** with optimal centroids for N(0, 1)

At Q3, this yields 54% lower MSE than absmax quantization.
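The three steps above can be sketched in NumPy. This is an illustrative toy version, not the actual EOQ kernels: the `hadamard`, `lloyd_max_centroids`, and `polar_quantize` names are hypothetical, the block size is assumed to be a power of two, and the centroids are fit with plain Lloyd's algorithm rather than whatever optimized table EOQ ships.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Orthonormal Hadamard matrix via Sylvester construction (n = power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def lloyd_max_centroids(bits: int, iters: int = 50) -> np.ndarray:
    """Fit 2**bits centroids to N(0, 1) samples with Lloyd's algorithm."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1 << 14)
    c = np.quantile(x, (np.arange(1 << bits) + 0.5) / (1 << bits))
    for _ in range(iters):
        idx = np.abs(x[:, None] - c[None, :]).argmin(axis=1)
        for k in range(len(c)):
            if (idx == k).any():
                c[k] = x[idx == k].mean()
    return np.sort(c)

def polar_quantize(block: np.ndarray, centroids: np.ndarray):
    d = len(block)
    norm = np.linalg.norm(block)                      # step 1: extract the norm
    r = hadamard(d) @ (block / norm) * np.sqrt(d)     # step 2: rotated coords ~ N(0, 1)
    idx = np.abs(r[:, None] - centroids[None, :]).argmin(axis=1)  # step 3: nearest centroid
    return norm, idx.astype(np.uint8)

def polar_dequantize(norm, idx, centroids, d):
    # Invert the pipeline: look up centroids, undo the scaling and rotation
    return norm * (hadamard(d).T @ (centroids[idx] / np.sqrt(d)))
```

With 3-bit centroids, a 64-element Gaussian block reconstructs at roughly 18-20% relative error (the Lloyd-Max distortion floor for 8 levels on N(0, 1)); the point of the rotation is that outlier-heavy blocks hit the same floor instead of blowing up a shared absmax scale.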

### AWQ Pre-Scaling

Protects the most important (salient) weight channels by rescaling them before quantization. Combined with PolarQuant, this reduces quantization error by 93%.
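The idea can be sketched as follows. This is an illustrative AWQ-style scaler, not the exact EOQ code: `awq_prescale` and its `alpha` parameter are assumptions, mirroring the common AWQ recipe of scaling by activation magnitude raised to a power.

```python
import numpy as np

def awq_prescale(W: np.ndarray, act_mean: np.ndarray, alpha: float = 0.5):
    """AWQ-style per-channel pre-scaling (illustrative).

    W: (out_features, in_features); act_mean: mean |activation| per input channel.
    Salient channels (large activations) are scaled up before quantization so
    their rounding error shrinks; the inverse scale is applied to activations
    (or folded into the previous layer), leaving the layer's output unchanged.
    """
    s = np.clip(act_mean, 1e-5, None) ** alpha
    s = s / s.mean()                   # keep overall weight magnitude stable
    return W * s[None, :], 1.0 / s     # (weights to quantize, runtime inverse scale)
```

Before any rounding, the transform is exactly lossless: `(x * inv_s) @ (W * s).T == x @ W.T`, so all of the benefit comes from how the scaled weights round.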

### Mixed-Bit for MoE

| Tensor type           | Bits |
|-----------------------|------|
| Expert `gate_up_proj` | Q3   |
| Expert `down_proj`    | Q4   |
| Attention Q/K/V       | Q5   |
| Attention O           | Q6   |
| MoE router, norms     | FP16 |

## Usage

```python
import sys

from huggingface_hub import snapshot_download

# Download the repo and put its custom loader on the import path
local = snapshot_download("caiovicentino1/Qwen3.5-35B-A3B-EOQ-v3")
sys.path.insert(0, local)

from eoq_loader import load_eoq_model

model, tokenizer = load_eoq_model("caiovicentino1/Qwen3.5-35B-A3B-EOQ-v3")
```
