This is a MXFP4 quant of Qwen3-Next-80B-A3B-Thinking

Welcome to the bleeding edge.

I must point out that this is an experimental release.

Say it after me, EXPERIMENTAL.

This has been made possible because of the excellent work done by pwilkin and others.

He has a development branch of llama.cpp for Qwen3-Next.

It has not yet been released officially, and things are moving quite fast.

For the time being (as of 2025-10-24), I got the source code from his fork and compiled it in order to generate the GGUFs, from here: https://github.com/pwilkin/llama.cpp/tree/qwen3_next
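If you want to reproduce that build, a rough sketch (assuming a standard CMake setup; the exact flags may differ on your system) looks like this:

```shell
# Sketch, untested here: fetch pwilkin's qwen3_next branch and build llama.cpp.
git clone --branch qwen3_next https://github.com/pwilkin/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON   # drop the Vulkan flag for a CPU-only build
cmake --build build --config Release
```

The resulting binaries end up under `build/bin/`.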

This GGUF will run only with that build.
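Once built, running the model is the usual llama.cpp invocation; the filename below is a placeholder, so point `-m` at whichever GGUF file you actually downloaded:

```shell
# Hypothetical filename; adjust the path to your downloaded GGUF.
./build/bin/llama-cli -m Qwen3-Next-80B-A3B-Thinking-MXFP4_MOE.gguf -p "Hello" -n 64
```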

If you cannot compile it yourself, I have made a Windows version with Vulkan support you can find here:

llama-qwen3-next-5edfe78-bin-win-vulkan-x64.zip

I should state that this may trigger false positives from your AV. It has NO virus; I compiled it on my Windows 11 PC, which I check regularly for viruses.

If you don't trust strangers giving out binaries, you can compile it yourself to be sure.

https://www.virustotal.com/gui/file/35a134a8977488ff6b82ce3f2b5df20da742ec212859a5e0c30813c55519f4f0

When support for Qwen3-Next officially lands in mainline llama.cpp, I will check whether these files need a new quantization, and update them if needed.
