Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?

This is the official Hugging Face repository for the paper "Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?".

Our extensive study (250+ MoE training runs at the 2B and 7B scales) provides strong evidence: MoE architectures with optimized backbones and activation rates consistently outperform their dense counterparts on both upstream and downstream tasks, even under strictly identical resources.

The checkpoints are released in this repository.
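The checkpoints can be fetched with the `huggingface_hub` client. The snippet below is a minimal sketch; the repository id is a placeholder (assumption), so replace it with the id shown at the top of this model page.

```python
# Minimal sketch for downloading the released checkpoints with huggingface_hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="ORG_OR_USER/REPO_NAME",  # placeholder: substitute the actual repo id of this page
    repo_type="model",                # the checkpoints are hosted as a model repository
)
print(f"Checkpoints downloaded to: {local_dir}")
```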

More details:
Paper: https://www.arxiv.org/abs/2506.12119
GitHub page: https://kamanphoebe.github.io/moe-surpass-dense.github.io/

Citation

@misc{li2025mixtureofexpertssurpassdensellms,
    title  = {Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?}, 
    author = {Houyi Li and Ka Man Lo and Ziqi Wang and Zili Wang and Wenzhen Zheng and Shuigeng Zhou and Xiangyu Zhang and Daxin Jiang},
    year   = {2025},
    eprint = {2506.12119},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CL},
    url    = {https://arxiv.org/abs/2506.12119}, 
}