EOQ Compressed Models
EOQ (Entropy-Optimal Quantization) compressed models. Mixed-bit allocation + rANS entropy coding. Smaller download, dequant at load time.
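The card names rANS entropy coding but does not spell out the coder, so the toy, renormalization-free rANS round trip below is only a sketch of the idea: quantized weight codes are low-entropy (peaked around zero), so an entropy coder shrinks them well below their fixed-width bit cost. Every name in it (`build_tables`, `rans_encode`, `rans_decode`) is illustrative and not part of the EOQ code.

```python
from collections import Counter

def build_tables(symbols, precision_bits=12):
    """Integer symbol frequencies scaled to sum to M = 2**precision_bits."""
    M = 1 << precision_bits
    counts = Counter(symbols)
    freq = {s: max(1, c * M // len(symbols)) for s, c in counts.items()}
    freq[symbols[0]] += M - sum(freq.values())  # absorb rounding slack (toy fix-up)
    cum, running = {}, 0
    for s in sorted(freq):
        cum[s] = running
        running += freq[s]
    return freq, cum, M

def rans_encode(symbols, freq, cum, M):
    x = 1  # initial state; encode in reverse so the decoder reads symbols forward
    for s in reversed(symbols):
        x = (x // freq[s]) * M + cum[s] + (x % freq[s])
    return x

def rans_decode(x, n, freq, cum, M):
    out = []
    for _ in range(n):
        slot = x % M
        s = next(k for k in freq if cum[k] <= slot < cum[k] + freq[k])
        x = freq[s] * (x // M) + (slot - cum[s])
        out.append(s)
    return out

# Quantized weight codes cluster around zero, which is exactly what rANS exploits.
codes = [0, 0, 1, -1, 0, 2, 0, -1, 0, 0, 1, 0]
freq, cum, M = build_tables(codes)
state = rans_encode(codes, freq, cum, M)
assert rans_decode(state, len(codes), freq, cum, M) == codes
print(f"{len(codes)} symbols encoded into a {state.bit_length()}-bit state")
```

(This big-int version skips the streaming renormalization a real coder needs; it only demonstrates that the round trip is lossless.)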
15.6 GB download (was 69.3 GB) | PPL 5.36 (+0.17 vs FP16) | 4.44x compression | 30.1 tok/s
MoE (35B total parameters, 3B active). Smaller AND better quality than v1!
| Metric | FP16 | v1 (Q5) | v3 |
|---|---|---|---|
| Download | 69.3 GB | 35.2 GB | 15.6 GB |
| Perplexity (PPL) | 5.19 | 5.39 | 5.36 |
| Compression | 1.0x | 2.0x | 4.44x |
| Throughput (tok/s) | 30.1 | 30.2 | 30.1 |
| GPU dequant time | n/a | ~400 s | 8.8 s |
The EOQ quantizer achieves 54% lower MSE than absmax quantization at Q3. It protects important channels; combined with PolarQuant, this gives 93% less quantization error.
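Neither the channel-selection rule nor PolarQuant is specified on this card, so the sketch below is only a toy illustration of why protecting a few high-magnitude channels lowers quantization MSE at Q3; the helper names and the outlier heuristic are assumptions, and the printed numbers will not reproduce the 54% / 93% figures above.

```python
import numpy as np

def absmax_quant_rows(w: np.ndarray, bits: int) -> np.ndarray:
    """Per-channel symmetric absmax quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def protected_quant_rows(w: np.ndarray, bits: int, keep_frac: float = 0.01) -> np.ndarray:
    """Same quantizer, but the largest-magnitude channels stay in full precision."""
    out = absmax_quant_rows(w, bits)
    channel_mag = np.abs(w).max(axis=1)
    keep = channel_mag.argsort()[-max(1, int(keep_frac * w.shape[0])):]
    out[keep] = w[keep]  # protected channels pass through unquantized
    return out

rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512)).astype(np.float32)
w[:4] *= 20.0  # inject a few synthetic outlier channels

mse_plain = np.mean((w - absmax_quant_rows(w, 3)) ** 2)
mse_prot = np.mean((w - protected_quant_rows(w, 3)) ** 2)
print(f"absmax Q3 MSE:            {mse_plain:.5f}")
print(f"channel-protected Q3 MSE: {mse_prot:.5f} ({100 * (1 - mse_prot / mse_plain):.0f}% lower)")
```

Bit widths in v3 are allocated per tensor type: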
| Tensor Type | Bits |
|---|---|
| Expert gate_up_proj | Q3 |
| Expert down_proj | Q4 |
| Attention Q/K/V | Q5 |
| Attention O | Q6 |
| MoE router, norms | FP16 |
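As a rough sketch of what the allocation above could look like in code, the pattern-to-bits map below mirrors the table; the regexes, the `BIT_MAP` / `bits_for` names, and the tensor naming scheme are assumptions for illustration, not EOQ's actual metadata format.

```python
import re

# Illustrative mapping from tensor-name patterns to bit widths (None = keep FP16),
# mirroring the allocation table above. Names follow a typical Qwen-style MoE layout.
BIT_MAP = [
    (r"experts\..*\.(gate_up_proj|gate_proj|up_proj)\.weight$", 3),
    (r"experts\..*\.down_proj\.weight$",                        4),
    (r"self_attn\.(q_proj|k_proj|v_proj)\.weight$",             5),
    (r"self_attn\.o_proj\.weight$",                             6),
    (r"(router\.weight|gate\.weight|norm\.weight)$",            None),  # FP16 passthrough
]

def bits_for(name: str):
    """Return the bit width assigned to a tensor name, or None for FP16."""
    for pattern, bits in BIT_MAP:
        if re.search(pattern, name):
            return bits
    return None  # unmatched tensors default to FP16

for name in [
    "model.layers.0.mlp.experts.7.gate_proj.weight",
    "model.layers.0.mlp.experts.7.down_proj.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.o_proj.weight",
    "model.layers.0.mlp.gate.weight",
    "model.layers.0.input_layernorm.weight",
]:
    b = bits_for(name)
    label = "FP16" if b is None else f"Q{b}"
    print(f"{name}: {label}")
```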
Usage:

```python
import sys
from huggingface_hub import snapshot_download

# Download the compressed checkpoint together with the bundled loader module.
local = snapshot_download("caiovicentino1/Qwen3.5-35B-A3B-EOQ-v3")
sys.path.insert(0, local)  # make the bundled eoq_loader importable

from eoq_loader import load_eoq_model

# Dequantizes the EOQ-compressed weights at load time (8.8 s on GPU for v3, per the table above).
model, tokenizer = load_eoq_model("caiovicentino1/Qwen3.5-35B-A3B-EOQ-v3")
```
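If `load_eoq_model` returns a standard transformers-style model and tokenizer (the card does not say otherwise, so treat this as an assumption), generation then works as usual:

```python
import torch

# Hedged usage sketch: assumes a transformers-style CausalLM and tokenizer.
prompt = "Explain rANS entropy coding in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```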