Load Balancing with Network Latencies via Distributed Gradient Descent

Balseiro, Santiago R.; Mirrokni, Vahab S.; Wydrowski, Bartek

Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:2504.10693 (cs)

[Submitted on 14 Apr 2025]

Title: Load Balancing with Network Latencies via Distributed Gradient Descent

Authors: Santiago R. Balseiro, Vahab S. Mirrokni, Bartek Wydrowski

Abstract: Motivated by the growing demand for serving large language model inference requests, we study distributed load balancing for global serving systems with network latencies. We consider a fluid model in which continuous flows of requests arrive at different frontends and need to be routed to distant backends for processing whose processing rates are workload dependent. Network latencies can lead to long travel times for requests and delayed feedback from backends. The objective is to minimize the average latency of requests, composed of the network latency and the serving latency at the backends. We introduce Distributed Gradient Descent Load Balancing (DGD-LB), a probabilistic routing algorithm in which each frontend adjusts the routing probabilities dynamically using gradient descent. Our algorithm is distributed: there is no coordination between frontends, except by observing the delayed impact other frontends have on shared backends. The algorithm uses an approximate gradient that measures the marginal impact of an additional request evaluated at a delayed system state. Equilibrium points of our algorithm minimize the centralized optimal average latencies, and we provide a novel local stability analysis showing that our algorithm converges to an optimal solution when started sufficiently close to that point. Moreover, we present sufficient conditions on the step-size of gradient descent that guarantee convergence in the presence of network latencies. Numerical experiments show that our algorithm is globally stable and optimal, confirm our stability conditions are nearly tight, and demonstrate that DGD-LB can lead to substantial gains relative to other load balancers studied in the literature when network latencies are large.
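The routing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it replaces the paper's fluid queueing dynamics with a static, assumed serving latency `b[j] * load[j]` at backend `j`, uses discrete time steps, and models only delayed feedback (each frontend sees backend loads that are `delay_steps[i, j]` steps old). The names `project_simplex`, `average_latency`, `dgd_lb_sketch`, and all parameter values are hypothetical. Each frontend takes a projected gradient step on its own routing probabilities, using the marginal latency of one extra request evaluated at the delayed load.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def average_latency(x, lam, tau, b):
    """Average latency per request: network latency plus serving latency.

    Assumption (not from the paper): serving latency at backend j is
    b[j] * load[j], where load[j] is the total request rate routed to j.
    """
    load = lam @ x
    return float(np.sum(lam[:, None] * x * (tau + b * load)) / lam.sum())

def dgd_lb_sketch(lam, tau, b, delay_steps, eta=0.01, steps=2000):
    """Each frontend runs projected gradient descent on its own routing row."""
    n_front, n_back = tau.shape
    x = np.full((n_front, n_back), 1.0 / n_back)  # start from uniform routing
    max_d = int(delay_steps.max())
    # history[-1] is the current backend load, history[-1 - d] is d steps old
    history = [lam @ x] * (max_d + 1)
    for _ in range(steps):
        new_x = np.empty_like(x)
        for i in range(n_front):
            # Frontend i only observes delayed backend loads.
            seen = np.array([history[-1 - delay_steps[i, j]][j]
                             for j in range(n_back)])
            # Marginal latency of one extra request on route (i, j):
            # tau_ij + d/dr (r * b_j * r) = tau_ij + 2 * b_j * r
            grad = tau[i] + 2.0 * b * seen
            new_x[i] = project_simplex(x[i] - eta * grad)
        x = new_x
        history.append(lam @ x)
        history.pop(0)
    return x

# Hypothetical two-frontend, two-backend instance.
lam = np.array([2.0, 1.0])                  # arrival rates at the frontends
tau = np.array([[0.1, 1.0], [1.0, 0.1]])    # network latency per route
b = np.array([1.0, 0.5])                    # serving-latency slope per backend
delay_steps = np.array([[1, 3], [3, 1]])    # feedback delay per route, in steps

x_uniform = np.full((2, 2), 0.5)
x_final = dgd_lb_sketch(lam, tau, b, delay_steps)
print("uniform routing:", average_latency(x_uniform, lam, tau, b))
print("after descent:  ", average_latency(x_final, lam, tau, b))
```

The step size `eta` is kept small relative to the feedback delay here; the abstract notes that convergence under network latencies depends on a step-size condition, and this sketch does not reproduce that condition.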

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Cite as:	arXiv:2504.10693 [cs.DC]
	(or arXiv:2504.10693v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2504.10693

Submission history

From: Santiago Balseiro
[v1] Mon, 14 Apr 2025 20:30:17 UTC (105 KB)
