Tested the quant locally: MMLU-Pro bench, "computer science" section
Hello. Thank you for this release.
I compared it with Qwen3 25B A3B REAP by Cerebras, running the MMLU-Pro "computer science" section on both.
aquif Q6_K quant, 8 experts activated per token, 4000-token context limit: 82.2% correct answers
aquif Q6_K quant, 16 experts activated per token, 8192-token context limit: 77.6% correct answers
REAP iQ5K_M quant, 16 experts activated per token, 8192-token context limit: 69.0% correct answers
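For anyone who wants to try the same settings, here is a rough sketch of one way to set the expert count and context limit, assuming llama-cpp-python with its kv_overrides load option; the qwen3moe key prefix and the file name are assumptions on my part, so check the arch string in your GGUF metadata first:

```python
# Sketch of reproducing a run with a custom expert count and context limit.
# The "qwen3moe" prefix and the model file name are placeholders (assumptions),
# not confirmed values for this release - check your GGUF's metadata.
from llama_cpp import Llama

llm = Llama(
    model_path="aquif-Q6_K.gguf",   # hypothetical file name
    n_ctx=4000,                     # context limit used for the 8-expert run
    kv_overrides={
        # overrides how many experts are activated per token at load time
        "qwen3moe.expert_used_count": 8,
    },
)

out = llm.create_completion("Question: ...", max_tokens=512, temperature=0.0)
print(out["choices"][0]["text"])
```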
Admittedly, neither comparison is very representative, both because the cerebras/REAP model is pruned and because it was run with fewer bits per parameter.
I wonder why increasing the number of activated experts actually dropped quality.
I also noticed a tendency toward repetitive outputs, where the model starts looping through a few paragraphs or even a few tokens over and over - this happened with both models. I didn't set any repetition penalty parameters.
> I wonder why increasing the number of activated experts actually dropped quality.
that's actually common in MoE models.
when you activate more than the default number of experts per token, you can end up routing the input to non-specialized experts that dilute the quality of the output.
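here's a toy numpy sketch of top-k routing to make that concrete (an illustration, not our actual router code): with a small k, the weight mass stays on the experts the router actually picked; with a larger k, experts it scored poorly still get mixed into the output.

```python
# Toy sketch of top-k MoE routing (illustration only, not the model's router).
import numpy as np

def route(router_logits: np.ndarray, k: int) -> np.ndarray:
    """Return normalized mixing weights over experts, keeping only the top-k."""
    top = np.argsort(router_logits)[-k:]        # indices of the k best-scored experts
    weights = np.zeros_like(router_logits)
    weights[top] = np.exp(router_logits[top])   # softmax over the selected experts
    return weights / weights.sum()

logits = np.array([4.0, 3.5, 0.2, -1.0, -1.5, -2.0, -2.2, -3.0])

print(route(logits, k=2))  # mass concentrated on the two specialized experts
print(route(logits, k=6))  # weakly-scored experts now get nonzero weight too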
> I also noticed a tendency toward repetitive outputs, where the model starts looping through a few paragraphs or even a few tokens over and over - this happened with both models.
yeah, we didn't improve much on that front compared to the base model. we plan on addressing repetitions in future model releases. stay tuned!
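in the meantime, a repetition penalty on the sampler side usually breaks those loops. as a rough illustration (toy numpy code, not any particular runtime's implementation), a standard CTRL-style penalty just pushes down the logits of tokens that have already appeared:

```python
# Toy sketch of a CTRL-style repetition penalty (illustration only).
# Values around 1.05-1.2 are typical starting points.
import numpy as np

def apply_repetition_penalty(logits: np.ndarray, prev_tokens, penalty: float = 1.1) -> np.ndarray:
    out = logits.copy()
    for t in set(prev_tokens):
        # positive logits are divided, negative ones multiplied,
        # so already-seen tokens always become less likely
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out
```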