Beginning Testing: Very Promising!
I just finished testing a couple of coding prompts, and this model has done really well, way better than I originally thought it would! For example, the original GLM 4.5 Air MXFP4 GGUF that I tried a long time ago (lol) could not consistently one-shot the classic DOOM game in a single HTML file or with Pygame. With this variant of the model, however, I can consistently get what I want in one shot! I am genuinely baffled that a model made much smaller by the REAP process has been able to outdo the original MXFP4 variant.
Thank you for making this quant for me to test! I hope that, given enough time, you can do the same for GLM 4.6 (preferably the 40% reduction REAP one) and the upcoming GLM 4.6 Air REAP (once that gets finalized after release, of course).
All in due time, I of course understand that you have your own life and limits.
Once again, thank you.
I am testing it too. Anyhow, have you tested the Qwen3 80B thinking model? In Q8 it is almost on par with Sonnet 4.5.
https://github.com/cturan/llama.cpp
https://huggingface.co/lefromage/Qwen3-Next-80B-A3B-Instruct-GGUF
I'll try to quant this one as well, maybe today.
Any chance of AutoRound quants? I saw that AutoRound has experimental support for MXFP4, so I wonder if MXFP4 GGUFs are possible with AutoRound.
Yeah, I have seen Intel's very interesting AutoRound quants. I don't think llama.cpp's quantization supports them yet, but I'll look into it.