Beginning Testing: Very Promising!
I just finished testing a couple of coding prompts, and this model has done really well, way better than I originally thought it would! For example, the original GLM 4.5 Air MXFP4 GGUF that I tried a long time ago (lol) could not consistently one-shot the classic DOOM game in a single HTML file or with Pygame. With this variant of the model, however, I can consistently get what I want in one shot! I am genuinely baffled that a model made much smaller by the REAP process has been able to outdo the original MXFP4 variant.
Thank you for making this quant for me to test! I hope that, given enough time, you can do the same for GLM 4.6 (preferably the 40% reduction REAP one) and the upcoming GLM 4.6 Air REAP (once that gets finalized after release, of course).
All in due time, I of course understand that you have your own life and limits.
Once again, thank you.
I am testing it too. Anyhow, have you tested the Qwen3 80B thinking model? In Q8 it is almost on par with Sonnet 4.5.
https://github.com/cturan/llama.cpp
https://huggingface.co/lefromage/Qwen3-Next-80B-A3B-Instruct-GGUF
I'll try to quant this one as well, maybe today.
Any chance of AutoRound quants? I saw that AutoRound has experimental support for MXFP4, so I wonder if MXFP4 GGUFs are possible with AutoRound.
Yeah, I have seen Intel's very interesting AutoRound quants. I don't think llama.cpp's quantization supports them yet, but I'll look into it.