Tested the quant locally on the MMLU-Pro "computer science" benchmark section

#7
by CoruNethron - opened

Hello. Thank you for this release.

I compared it against Qwen3 25B A3B REAP by Cerebras, running the MMLU-Pro "computer science" section.

| Model | Quant | Experts/token | Context limit | Correct answers |
|-------|-------|---------------|---------------|-----------------|
| aquif | Q6_K | 8 | 4000 | 82.2% |
| aquif | Q6_K | 16 | 8192 | 77.6% |
| REAP | iQ5K_M | 16 | 8192 | 69.0% |
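
In case someone wants to reproduce this: here's roughly how the expert count can be overridden at load time. A minimal sketch, assuming llama-cpp-python; the model path is hypothetical, and the `qwen3moe` GGUF key name is an assumption that may differ for this release.

```python
# Minimal sketch, assuming llama-cpp-python. The model path is hypothetical,
# and "qwen3moe.expert_used_count" assumes the GGUF uses the qwen3moe
# architecture key -- check the model's metadata for the exact name.
from llama_cpp import Llama

llm = Llama(
    model_path="aquif-Q6_K.gguf",  # hypothetical local path
    n_ctx=8192,                    # context limit used in the 16-expert run
    kv_overrides={
        # Number of experts activated per token (default comes from the GGUF).
        "qwen3moe.expert_used_count": 16,
    },
)

out = llm.create_completion("Question: ...", max_tokens=256)
print(out["choices"][0]["text"])
```

With plain llama.cpp, `--override-kv qwen3moe.expert_used_count=int:16` should do the same thing.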

Admittedly, the comparison isn't very representative either way, both because cerebras/REAP is pruned and because it ran at fewer bits per parameter.

I wonder why increasing the number of activated experts actually dropped quality.

Also, I noticed a tendency toward repetitive outputs, where the model begins to loop over a few paragraphs or even a few tokens over and over. This happened with both models; I didn't set any repetition penalty parameters.
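
For reference, this is where a repetition penalty would go; a minimal sketch, again assuming llama-cpp-python, with an illustrative penalty value and a hypothetical model path.

```python
# Minimal sketch, assuming llama-cpp-python; the penalty value is illustrative.
from llama_cpp import Llama

llm = Llama(model_path="aquif-Q6_K.gguf", n_ctx=8192)  # hypothetical path

out = llm.create_completion(
    "Question: ...",
    max_tokens=256,
    repeat_penalty=1.1,  # values > 1.0 down-weight recently generated tokens
)
print(out["choices"][0]["text"])
```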

aquif AI org

> I wonder why increasing the number of activated experts actually dropped quality.

That's actually common in MoE models.

When you activate more than the default number of experts per token, you may be routing the input to non-specialized experts that dilute the quality of the output.
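
Here's a toy sketch of what we mean (made-up numbers, and plain top-k softmax gating as an assumption about the routing scheme): once k exceeds the set of experts the router actually prefers, the extra experts carry low weight but still mix their output into the result.

```python
# Toy illustration of top-k MoE gating (made-up numbers, not this model's
# actual router): raising k pulls in experts the router gave near-zero
# preference, so they dilute the specialized experts' contribution.
import numpy as np

# 8 experts the router strongly prefers (logit 3.0) and 56 it doesn't (logit 0).
logits = np.concatenate([np.full(8, 3.0), np.zeros(56)])

def routed_weights(logits, k):
    """Softmax over the top-k router logits; all other experts get weight 0."""
    top = np.argsort(logits)[-k:]
    w = np.exp(logits[top] - logits[top].max())
    return top, w / w.sum()

for k in (8, 16):
    top, w = routed_weights(logits, k)
    preferred = w[np.isin(top, np.arange(8))].sum()
    print(f"k={k}: weight mass on the preferred experts = {preferred:.2f}")
```

At k=8 all the gating weight sits on the preferred experts; at k=16 part of it shifts to experts the router barely wanted, which is the dilution effect.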

> Also, I noticed a tendency toward repetitive outputs, where the model begins to loop over a few paragraphs or even a few tokens over and over. This happened with both models.

Yeah, we didn't improve that much over the base model there. We plan on reducing repetition in future model releases. Stay tuned!

aquiffoo changed discussion status to closed
