Mixture of Experts in Large Language Models

Zhang, Danyang; Song, Junhao; Bi, Ziqian; Yuan, Yingfang; Wang, Tianyang; Yeong, Joe; Hao, Junfeng

arXiv:2507.11181 (cs)

[Submitted on 15 July 2025]

Abstract: This paper presents a comprehensive review of the Mixture-of-Experts (MoE) architecture in large language models, highlighting its ability to significantly enhance model performance while maintaining minimal computational overhead. Through a systematic analysis spanning theoretical foundations, core architectural designs, and large language model (LLM) applications, we examine expert gating and routing mechanisms, hierarchical and sparse MoE configurations, meta-learning approaches, multimodal and multitask learning scenarios, real-world deployment cases, and recent advances and challenges in deep learning. Our analysis identifies key advantages of MoE, including superior model capacity compared to equivalent Bayesian approaches, improved task-specific performance, and the ability to scale model capacity efficiently. We also underscore the importance of ensuring expert diversity, accurate calibration, and reliable inference aggregation, as these are essential for maximizing the effectiveness of MoE architectures. Finally, this review outlines current research limitations, open challenges, and promising future directions, providing a foundation for continued innovation in MoE architecture and its applications.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2507.11181 [cs.LG]
	(or arXiv:2507.11181v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2507.11181

Submission history

From: Yingfang Yuan
[v1] Tue, 15 Jul 2025 10:36:43 UTC (8,123 KB)
