Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction

Tyurin, Alexander

数学 > 优化与控制

arXiv:2506.23836 (math)

[提交于 2025年6月30日 ]

标题：通过新的下界构造证明集中式分布式优化的可扩展性有限

标题： Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction

Authors:Alexander Tyurin

摘要：我们考虑经典联邦学习设置中的集中式分布式优化，其中 $n$个工作者共同寻找一个 $\varepsilon$-驻点的 $L$-光滑、 $d$-维非凸函数 $f$，仅能访问方差为 $\sigma^2$的无偏随机梯度。每个工作者计算一个随机梯度最多需要$h$秒，从服务器到工作者和从工作者到服务器的通信时间分别为每坐标$\tau_{s}$和$\tau_{w}$秒。分布式优化的主要动机之一是相对于$n$实现可扩展性。例如，众所周知，分布式版本的SGD有一个与方差相关的运行时间项$\frac{h \sigma^2 L \Delta}{n \varepsilon^2},$，它随着工作者数量$n,$的增加而改善，其中$\Delta = f(x^0) - f^*,$和$x^0 \in R^d$是起始点。同样地，使用无偏稀疏化压缩器，可以减少与方差相关的运行时间项和通信运行时间项。然而，一旦考虑从服务器到工作者的通信$\tau_{s}$，我们证明设计一种使用无偏随机稀疏化压缩器的方法变得不可行，该方法无法在$n$的多项对数范围内更好地缩放服务器端通信运行时间项$\tau_{s} d \frac{L \Delta}{\varepsilon}$和方差相关运行时间项$\frac{h \sigma^2 L \Delta}{\varepsilon^2},$，即使在同质（独立同分布）情况下也是如此，其中所有工作者访问相同分布。为了建立这个结果，我们构造了一个新的“最坏情况”函数，并开发了一个新的下界框架，将分析简化为随机和的集中性，我们为此证明了一个集中性边界。这些结果揭示了在分布式优化中的基本限制，即使在同质假设下也是如此。

摘要： We consider centralized distributed optimization in the classical federated learning setup, where $n$ workers jointly find an $\varepsilon$-stationary point of an $L$-smooth, $d$-dimensional nonconvex function $f$, having access only to unbiased stochastic gradients with variance $\sigma^2$. Each worker requires at most $h$ seconds to compute a stochastic gradient, and the communication times from the server to the workers and from the workers to the server are $\tau_{s}$ and $\tau_{w}$ seconds per coordinate, respectively. One of the main motivations for distributed optimization is to achieve scalability with respect to $n$. For instance, it is well known that the distributed version of SGD has a variance-dependent runtime term $\frac{h \sigma^2 L \Delta}{n \varepsilon^2},$ which improves with the number of workers $n,$ where $\Delta = f(x^0) - f^*,$ and $x^0 \in R^d$ is the starting point. Similarly, using unbiased sparsification compressors, it is possible to reduce both the variance-dependent runtime term and the communication runtime term. However, once we account for the communication from the server to the workers $\tau_{s}$, we prove that it becomes infeasible to design a method using unbiased random sparsification compressors that scales both the server-side communication runtime term $\tau_{s} d \frac{L \Delta}{\varepsilon}$ and the variance-dependent runtime term $\frac{h \sigma^2 L \Delta}{\varepsilon^2},$ better than poly-logarithmically in $n$, even in the homogeneous (i.i.d.) case, where all workers access the same distribution. To establish this result, we construct a new "worst-case" function and develop a new lower bound framework that reduces the analysis to the concentration of a random sum, for which we prove a concentration bound. These results reveal fundamental limitations in scaling distributed optimization, even under the homogeneous assumption.

主题：	优化与控制 (math.OC) ; 分布式、并行与集群计算 (cs.DC); 机器学习 (cs.LG)
引用方式：	arXiv:2506.23836 [math.OC]
	(或者 arXiv:2506.23836v1 [math.OC] 对于此版本)
	https://doi.org/10.48550/arXiv.2506.23836

提交历史

来自： Alexander Tyurin [查看电子邮件]
[v1] 星期一， 2025 年 6 月 30 日 13:27:39 UTC (233 KB)

数学 > 优化与控制

标题：通过新的下界构造证明集中式分布式优化的可扩展性有限

标题： Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction

提交历史

获取论文：

参考文献与引用

收藏

文献和引用工具

与本文相关的代码，数据和媒体

演示

推荐器和搜索工具

arXivLabs：与社区合作伙伴的实验项目

数学 > 优化与控制

标题： 通过新的下界构造证明集中式分布式优化的可扩展性有限 显示英文标题

标题： Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction

提交历史

获取论文：

参考文献与引用

BibTeX 格式的引用

收藏

文献和引用工具

与本文相关的代码，数据和媒体

演示

推荐器和搜索工具

arXivLabs：与社区合作伙伴的实验项目

标题：通过新的下界构造证明集中式分布式优化的可扩展性有限