Variance Reduced Local SGD with Lower Communication Complexity

Liang, Xianfeng; Shen, Shuheng; Liu, Jingchang; Pan, Zhen; Chen, Enhong; Cheng, Yifei

arXiv:1912.12844 (cs)

[Submitted on 30 Dec 2019]

Abstract: To accelerate the training of machine learning models, distributed stochastic gradient descent (SGD) and its variants, which use multiple workers in parallel to speed up training, have been widely adopted. Among them, Local SGD has gained much attention due to its lower communication cost. Nevertheless, when the data distribution across workers is non-identical, Local SGD requires $O(T^{\frac{3}{4}} N^{\frac{3}{4}})$ communications to maintain its \emph{linear iteration speedup} property, where $T$ is the total number of iterations and $N$ is the number of workers. In this paper, we propose Variance Reduced Local SGD (VRL-SGD) to further reduce the communication complexity. By eliminating the dependency on the gradient variance among workers, we theoretically prove that VRL-SGD achieves a \emph{linear iteration speedup} with a lower communication complexity of $O(T^{\frac{1}{2}} N^{\frac{3}{2}})$, even if workers access non-identical datasets. We conduct experiments on three machine learning tasks, and the results demonstrate that VRL-SGD performs markedly better than Local SGD when the data across workers are highly diverse.
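To see when the VRL-SGD bound improves on the Local SGD bound, the two communication complexities stated above can be compared directly. This is a back-of-the-envelope sketch: it ignores the constants hidden in the $O(\cdot)$ notation, so it indicates how the bounds scale, not exact communication counts.

```latex
\frac{T^{3/4} N^{3/4}}{T^{1/2} N^{3/2}} = T^{1/4} N^{-3/4} = \left(\frac{T}{N^{3}}\right)^{1/4},
\qquad \text{so the VRL-SGD bound is smaller whenever } T > N^{3}.

\text{Example: } N = 10,\ T = 10^{6}:\quad
T^{3/4} N^{3/4} = 10^{5.25} \approx 1.8 \times 10^{5},\quad
T^{1/2} N^{3/2} = 10^{4.5} \approx 3.2 \times 10^{4},\quad
\text{ratio} = 10^{0.75} \approx 5.6.
```

Because the ratio grows as $T^{1/4}$, the advantage widens for longer training runs, while adding workers (larger $N$) narrows it.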
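The difficulty the abstract describes, Local SGD degrading when worker data are non-identical, can be reproduced on a toy problem. The sketch below is illustrative only and rests on assumptions not stated in the abstract: each worker holds a scalar quadratic loss $f_i(x) = \frac{1}{2} a_i (x - b_i)^2$, gradients are exact (no sampling noise, so the drift effect is isolated), and the correction is a control-variate term `delta[i]` updated as `delta[i] += (x_avg - x_i) / (k * lr)` after each averaging step. That rule is our assumption of a VRL-SGD-style correction, not a quotation of the authors' algorithm. The helper names `grad` and `run` are hypothetical.

```python
"""Toy comparison of Local SGD and a VRL-SGD-style drift correction.

Assumptions (not from the abstract): scalar quadratic losses
f_i(x) = 0.5 * a_i * (x - b_i)^2 per worker, exact gradients (no noise),
and the hypothetical correction rule delta_i += (x_avg - x_i) / (k * lr).
"""
from typing import List


def grad(a: float, b: float, x: float) -> float:
    # Gradient of 0.5 * a * (x - b)^2 with respect to x.
    return a * (x - b)


def run(a: List[float], b: List[float], lr: float, k: int,
        rounds: int, correct: bool) -> float:
    """Return the averaged model after `rounds` communication rounds."""
    n = len(a)
    x_avg = 0.0
    delta = [0.0] * n  # per-worker correction terms; their sum stays zero
    for _ in range(rounds):
        xs = []
        for i in range(n):
            x = x_avg  # every worker starts the round from the shared model
            for _ in range(k):  # k local steps without communication
                g = grad(a[i], b[i], x)
                if correct:
                    g -= delta[i]  # remove this worker's estimated gradient bias
                x -= lr * g
            xs.append(x)
        new_avg = sum(xs) / n  # the only communication: model averaging
        if correct:
            for i in range(n):
                # Update the correction from how far worker i drifted from the average.
                delta[i] += (new_avg - xs[i]) / (k * lr)
        x_avg = new_avg
    return x_avg


a, b = [1.0, 4.0], [0.0, 1.0]  # non-identical data on two workers
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)  # global minimizer
local = run(a, b, lr=0.05, k=10, rounds=50, correct=False)
vrl = run(a, b, lr=0.05, k=10, rounds=50, correct=True)
print(f"optimum {x_star:.4f}  local SGD {local:.4f}  corrected {vrl:.4f}")
```

With these hypothetical settings the global minimizer is 0.8, but plain periodic averaging settles at a biased point (about 0.69), because each worker drifts toward its own minimizer between averaging steps. The corrected run converges to 0.8, since at that point each worker's corrected local gradient is zero. Increasing `k` (fewer communications) enlarges the Local SGD bias, which mirrors why Local SGD needs more frequent communication on diverse data.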

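Comments:	25 pages, 6 figures. This paper proposes a new variance reduction algorithm for Local SGD.
Subjects:	Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:1912.12844 [cs.LG]
	(or arXiv:1912.12844v1 [cs.LG] for this version)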
	https://doi.org/10.48550/arXiv.1912.12844

Submission history

From: Xianfeng Liang
[v1] Mon, 30 Dec 2019 08:15:21 UTC (802 KB)