Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?
This is the official Hugging Face repository for the paper: "Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?".
Our extensive study (250+ MoE training runs at the 2B and 7B scales) provides strong evidence: MoE architectures with optimized backbones and activation rates consistently outperform their dense counterparts on both upstream and downstream tasks, even under strictly equal resources.
The checkpoints are released in this repository.
More details: 
Paper: https://www.arxiv.org/abs/2506.12119 
GitHub page: https://kamanphoebe.github.io/moe-surpass-dense.github.io/
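
To try one of the released checkpoints, a minimal loading sketch is shown below. It assumes the checkpoints are standard transformers-compatible causal language models; the repository ID used here is a placeholder and should be replaced with the actual checkpoint path from this repo.

```python
# Minimal sketch: loading a released checkpoint with Hugging Face transformers.
# Assumptions: the checkpoint is transformers-compatible; the repo ID is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "org-name/moe-checkpoint"  # placeholder; replace with an actual checkpoint ID

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",       # load in the checkpoint's native precision
    device_map="auto",        # requires `accelerate`; places layers on available devices
    # trust_remote_code=True  # may be needed if the checkpoint ships a custom MoE architecture
)

prompt = "Mixture-of-Experts models route each token to"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```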
Citation
@misc{li2025mixtureofexpertssurpassdensellms,
    title  = {Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?}, 
    author = {Houyi Li and Ka Man Lo and Ziqi Wang and Zili Wang and Wenzhen Zheng and Shuigeng Zhou and Xiangyu Zhang and Daxin Jiang},
    year   = {2025},
    eprint = {2506.12119},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CL},
    url    = {https://arxiv.org/abs/2506.12119}, 
}