Expert Merging in Sparse Mixture of Experts with Nash Bargaining

Nguyen, Dung V.; Nguyen, Anh T.; Nguyen, Minh H.; Nguyen, Luc Q.; Jiang, Shiqi; Fetaya, Ethan; Tran, Linh Duy; Chechik, Gal; Nguyen, Tan M.

arXiv:2510.16138 (cs)

[Submitted on 17 Oct 2025]

Abstract: Existing expert merging strategies for Sparse Mixture of Experts (SMoE) typically rely on input-dependent or input-independent averaging of expert parameters, but often lack a principled weighting mechanism. In this work, we reinterpret expert merging through the lens of game theory, revealing cooperative and competitive dynamics among experts. Based on this perspective, we introduce Nash Merging of Experts (NAMEx), a novel framework that incorporates Nash Bargaining into the merging process, enabling more balanced and efficient collaboration among experts. Additionally, we incorporate complex momentum into NAMEx to accelerate expert propagation with theoretical guarantees for convergence. Extensive experiments across language modelling, text classification, image classification, and zero-shot robustness under data corruption show that NAMEx consistently outperforms competing methods while integrating seamlessly with popular MoE architectures. Finally, we demonstrate NAMEx's scalability by applying it to large-scale systems, including Qwen1.5-MoE (14B) and DeepSeek-MoE (16B), where it proves effective in both zero-shot and fine-tuning settings.

Comments:	10 pages of main text. Under review
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2510.16138 [cs.LG]
	(or arXiv:2510.16138v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2510.16138

Submission history

From: Nguyen Viet Dung
[v1] Fri, 17 Oct 2025 18:23:01 UTC (4,388 KB)
