MoEs Are Stronger than You Think: Hyper-Parallel Inference Scaling with RoE

Zibakhsh, Soheil; Samragh, Mohammad; Nishu, Kumari; Hannah, Lauren; Kundu, Arnav; Cho, Minsik

arXiv:2509.17238 (cs)

[Submitted on 21 Sep 2025]

Authors: Soheil Zibakhsh, Mohammad Samragh, Kumari Nishu, Lauren Hannah, Arnav Kundu, Minsik Cho

Abstract: The generation quality of large language models (LLMs) is often improved with inference-time, sequence-level scaling methods (e.g., Chain-of-Thought). We introduce hyper-parallel scaling, a complementary framework that improves prediction quality at the token level. Hyper-parallel scaling computes and aggregates multiple output proposals from the model for a single token. We implement this concept in Mixture-of-Experts (MoE) models and refer to the result as Roster of Experts (RoE). RoE is a training-free inference algorithm that turns a single MoE into a dynamic ensemble of MoEs. RoE injects controlled stochasticity into the expert routing mechanism, enabling it to sample multiple diverse experts for each token and aggregate their outputs for a more accurate final prediction. To overcome the computational cost, we introduce an efficient batching strategy and a specialized KV-caching mechanism that minimize compute and memory overhead. For example, RoE enables a 7B MoE model to match the performance of a 10.5B MoE model while using 30% less compute for inference. These gains are achieved without any fine-tuning of model parameters.
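The routing idea in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the abstract only says that RoE injects "controlled stochasticity" into routing and aggregates the resulting proposals, so the Gumbel noise, the gate weights taken from the clean router logits, and the plain averaging of proposals are all assumptions made here for illustration. The names `moe_forward`, `roe_forward`, `WEIGHTS`, `EXPERTS`, and `ROUTER_LOGITS` are hypothetical, and the batching and KV-caching optimizations are omitted.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_logits, experts, k, noise_scale=0.0, rng=None):
    """One MoE pass: select top-k experts and mix their outputs.

    If noise_scale > 0, Gumbel noise perturbs the router logits before
    the top-k selection (an assumed form of routing stochasticity).
    """
    if noise_scale > 0.0:
        noisy = []
        for logit in router_logits:
            # Clamp u away from 0 and 1 so both logarithms are defined.
            u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
            noisy.append(logit + noise_scale * -math.log(-math.log(u)))
    else:
        noisy = list(router_logits)
    order = sorted(range(len(noisy)), key=lambda i: noisy[i], reverse=True)
    chosen = order[:k]
    # Assumption: gate weights come from the clean (un-noised) logits.
    gates = softmax([router_logits[i] for i in chosen])
    outputs = [experts[i](x) for i in chosen]
    dim = len(outputs[0])
    return [sum(g * out[d] for g, out in zip(gates, outputs)) for d in range(dim)]


def roe_forward(x, router_logits, experts, k, n_samples, noise_scale, seed=0):
    """Average n_samples stochastic-routing proposals for a single token."""
    rng = random.Random(seed)
    proposals = [
        moe_forward(x, router_logits, experts, k, noise_scale, rng)
        for _ in range(n_samples)
    ]
    dim = len(proposals[0])
    return [sum(p[d] for p in proposals) / n_samples for d in range(dim)]


# Toy setup (all values hypothetical): 4 experts, each mapping a scalar
# input to 3 "vocabulary" logits.
WEIGHTS = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.0], [0.0, 1.0, 0.5], [-1.0, 0.2, 1.0]]
EXPERTS = [lambda x, w=w: [wj * x for wj in w] for w in WEIGHTS]
ROUTER_LOGITS = [2.0, 1.5, 1.4, -1.0]

baseline = moe_forward(2.0, ROUTER_LOGITS, EXPERTS, k=2)
ensemble = roe_forward(2.0, ROUTER_LOGITS, EXPERTS, k=2, n_samples=8, noise_scale=0.5)
print("single pass:", baseline)
print("RoE-style average:", ensemble)
```

With `noise_scale=0.0` every proposal uses the same experts, so the average reduces to the ordinary single-pass output. Raising the noise scale lets near-tied experts (here the second and third) swap in and out across samples, which is the source of ensemble diversity in this sketch.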

Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Emerging Technologies (cs.ET); Machine Learning (cs.LG)
Cite as:	arXiv:2509.17238 [cs.AI]
	(or arXiv:2509.17238v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2509.17238

Submission history

From: Soheil Zibakhsh Shabgahi
[v1] Sun, 21 Sep 2025 21:05:29 UTC (516 KB)
