This is a Q4_K_M GGUF quant of AesSedai/GLM-4.6-REAP-266B-A32B.

What Is This?

AesSedai/GLM-4.6-REAP-266B-A32B was created using REAP (Router-weighted Expert Activation Pruning), a novel expert pruning method that selectively removes redundant experts while preserving the router's independent control over remaining experts.

For more details, see the GLM-4.5-Air version by Cerebras: cerebras/GLM-4.5-Air-REAP-82B-A12B

The MTP tensors were not included in this quant (llama.cpp has not implemented MTP support anyway).
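The core idea behind REAP is scoring each expert by how much the router actually relies on it over a calibration set, then dropping the lowest-scoring fraction. A minimal illustrative sketch of that general idea follows; the function names and the exact saliency formula are assumptions for illustration, not Cerebras' actual implementation:

```python
# Hypothetical sketch of router-weighted expert pruning (not the REAP codebase).
import numpy as np

def expert_saliency(router_probs: np.ndarray, expert_out_norms: np.ndarray) -> np.ndarray:
    """Score each expert by its router-weighted activation over calibration tokens.

    router_probs:     (tokens, experts) gate probabilities from the router
    expert_out_norms: (tokens, experts) L2 norms of each expert's output
    """
    return (router_probs * expert_out_norms).mean(axis=0)

def prune_experts(saliency: np.ndarray, prune_ratio: float) -> np.ndarray:
    """Return sorted indices of experts to KEEP, dropping the lowest-saliency fraction."""
    n_drop = int(len(saliency) * prune_ratio)
    keep = np.argsort(saliency)[n_drop:]  # drop the n_drop least salient experts
    return np.sort(keep)

# Toy example: 8 experts, 4 calibration tokens, prune 25% -> keep 6 experts.
rng = np.random.default_rng(42)
probs = rng.dirichlet(np.ones(8), size=4)          # per-token gate distributions
norms = rng.uniform(0.5, 2.0, size=(4, 8))          # per-token expert output magnitudes
kept = prune_experts(expert_saliency(probs, norms), 0.25)
print(len(kept))  # 6
```

Because only whole experts are removed, the router's weights over the surviving experts are untouched, which is what "preserving the router's independent control" refers to.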

**Imatrix**

GLM-4.6-REAP-266B-A32B-imatrix.dat
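This imatrix file can be fed to llama.cpp's `llama-quantize` to reproduce a quant like this one from a full-precision GGUF. A sketch of the invocation, where the input/output filenames are placeholders:

```shell
# Hypothetical invocation; the F16 input and output paths are assumptions.
./llama-quantize --imatrix GLM-4.6-REAP-266B-A32B-imatrix.dat \
    GLM-4.6-REAP-266B-A32B-F16.gguf \
    GLM-4.6-REAP-266B-A32B-Q4_K_M.gguf Q4_K_M
```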

Original Model Card for GLM-4.6-REAP

Note: currently non-functional because of a missing mtp.safetensors file and its corresponding entry in model.safetensors.index.json

Forked from https://github.com/CerebrasResearch/reap to https://github.com/AesSedai/reap to hack in GLM-4.6 support.

Produced with:

```shell
bash experiments/pruning-cli.sh 0,1,2,3,4,5,6,7 zai-org/GLM-4.6 reap 42 0.25 theblackcat102/evol-codealpaca-v1 true true true false false
```
Format: GGUF · Model size: 269B params · Architecture: glm4moe