EOQ Compressed Models
EOQ (Entropy-Optimal Quantization) compressed models. Mixed-bit allocation + rANS entropy coding. Smaller download, dequant at load time.
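The card names rANS entropy coding but does not spell out the coder, so the toy, renormalization-free rANS round trip below is only a sketch of the idea: quantized weight codes are low-entropy (peaked around zero), so an entropy coder shrinks them well below their fixed-width bit cost. Every name in it (`build_tables`, `rans_encode`, `rans_decode`) is illustrative and not part of the EOQ code.

```python
from collections import Counter

def build_tables(symbols, precision_bits=12):
    """Integer symbol frequencies scaled to sum to M = 2**precision_bits."""
    M = 1 << precision_bits
    counts = Counter(symbols)
    freq = {s: max(1, c * M // len(symbols)) for s, c in counts.items()}
    freq[symbols[0]] += M - sum(freq.values())  # absorb rounding slack (toy fix-up)
    cum, running = {}, 0
    for s in sorted(freq):
        cum[s] = running
        running += freq[s]
    return freq, cum, M

def rans_encode(symbols, freq, cum, M):
    x = 1  # initial state; encode in reverse so the decoder reads symbols forward
    for s in reversed(symbols):
        x = (x // freq[s]) * M + cum[s] + (x % freq[s])
    return x

def rans_decode(x, n, freq, cum, M):
    out = []
    for _ in range(n):
        slot = x % M
        s = next(k for k in freq if cum[k] <= slot < cum[k] + freq[k])
        x = freq[s] * (x // M) + (slot - cum[s])
        out.append(s)
    return out

# Quantized weight codes cluster around zero, which is exactly what rANS exploits.
codes = [0, 0, 1, -1, 0, 2, 0, -1, 0, 0, 1, 0]
freq, cum, M = build_tables(codes)
state = rans_encode(codes, freq, cum, M)
assert rans_decode(state, len(codes), freq, cum, M) == codes
print(f"{len(codes)} symbols encoded into a {state.bit_length()}-bit state")
```

(This big-int version skips the streaming renormalization a real coder needs; it only demonstrates that the round trip is lossless.)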
15.6 GB download (was 69.3 GB) | PPL 5.36 (+0.17 vs FP16) | 4.44x compression | 30.1 tok/s
MoE (35B total parameters, 3B active). Smaller AND better quality than v1!
| Metric | FP16 | v1 (Q5) | v3 |
|---|---|---|---|
| Download | 69.3 GB | 35.2 GB | 15.6 GB |
| Perplexity (PPL) | 5.19 | 5.39 | 5.36 |
| Compression | 1.0x | 2.0x | 4.44x |
| Throughput (tok/s) | 30.1 | 30.2 | 30.1 |
| GPU dequant time | n/a | ~400 s | 8.8 s |
The EOQ quantizer achieves 54% lower MSE than absmax quantization at Q3. It protects important channels; combined with PolarQuant, this gives 93% less quantization error.
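Neither the channel-selection rule nor PolarQuant is specified on this card, so the sketch below is only a toy illustration of why protecting a few high-magnitude channels lowers quantization MSE at Q3; the helper names and the outlier heuristic are assumptions, and the printed numbers will not reproduce the 54% / 93% figures above.

```python
import numpy as np

def absmax_quant_rows(w: np.ndarray, bits: int) -> np.ndarray:
    """Per-channel symmetric absmax quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def protected_quant_rows(w: np.ndarray, bits: int, keep_frac: float = 0.01) -> np.ndarray:
    """Same quantizer, but the largest-magnitude channels stay in full precision."""
    out = absmax_quant_rows(w, bits)
    channel_mag = np.abs(w).max(axis=1)
    keep = channel_mag.argsort()[-max(1, int(keep_frac * w.shape[0])):]
    out[keep] = w[keep]  # protected channels pass through unquantized
    return out

rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512)).astype(np.float32)
w[:4] *= 20.0  # inject a few synthetic outlier channels

mse_plain = np.mean((w - absmax_quant_rows(w, 3)) ** 2)
mse_prot = np.mean((w - protected_quant_rows(w, 3)) ** 2)
print(f"absmax Q3 MSE:            {mse_plain:.5f}")
print(f"channel-protected Q3 MSE: {mse_prot:.5f} ({100 * (1 - mse_prot / mse_plain):.0f}% lower)")
```

Bit widths in v3 are allocated per tensor type: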
| Tensor Type | Bits |
|---|---|
| Expert gate_up_proj | Q3 |
| Expert down_proj | Q4 |
| Attention Q/K/V | Q5 |
| Attention O | Q6 |
| MoE router, norms | FP16 |
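As a rough sketch of what the allocation above could look like in code, the pattern-to-bits map below mirrors the table; the regexes, the `BIT_MAP` / `bits_for` names, and the tensor naming scheme are assumptions for illustration, not EOQ's actual metadata format.

```python
import re

# Illustrative mapping from tensor-name patterns to bit widths (None = keep FP16),
# mirroring the allocation table above. Names follow a typical Qwen-style MoE layout.
BIT_MAP = [
    (r"experts\..*\.(gate_up_proj|gate_proj|up_proj)\.weight$", 3),
    (r"experts\..*\.down_proj\.weight$",                        4),
    (r"self_attn\.(q_proj|k_proj|v_proj)\.weight$",             5),
    (r"self_attn\.o_proj\.weight$",                             6),
    (r"(router\.weight|gate\.weight|norm\.weight)$",            None),  # FP16 passthrough
]

def bits_for(name: str):
    """Return the bit width assigned to a tensor name, or None for FP16."""
    for pattern, bits in BIT_MAP:
        if re.search(pattern, name):
            return bits
    return None  # unmatched tensors default to FP16

for name in [
    "model.layers.0.mlp.experts.7.gate_proj.weight",
    "model.layers.0.mlp.experts.7.down_proj.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.o_proj.weight",
    "model.layers.0.mlp.gate.weight",
    "model.layers.0.input_layernorm.weight",
]:
    b = bits_for(name)
    label = "FP16" if b is None else f"Q{b}"
    print(f"{name}: {label}")
```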
Usage:

```python
import sys
from huggingface_hub import snapshot_download

# Download the compressed checkpoint together with the bundled loader module.
local = snapshot_download("caiovicentino1/Qwen3.5-35B-A3B-EOQ-v3")
sys.path.insert(0, local)  # make the bundled eoq_loader importable

from eoq_loader import load_eoq_model

# Dequantizes the EOQ-compressed weights at load time (8.8 s on GPU for v3, per the table above).
model, tokenizer = load_eoq_model("caiovicentino1/Qwen3.5-35B-A3B-EOQ-v3")
```
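If `load_eoq_model` returns a standard transformers-style model and tokenizer (the card does not say otherwise, so treat this as an assumption), generation then works as usual:

```python
import torch

# Hedged usage sketch: assumes a transformers-style CausalLM and tokenizer.
prompt = "Explain rANS entropy coding in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```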