Stochastic Distributed Learning with Gradient Quantization and Double Variance Reduction
Samuel Horváth, Dmitry Kovalev, Konstantin Mishchenko, Sebastian Stich, Peter Richtárik
April, 2019
Abstract
We consider distributed optimization where the objective function is spread among different devices, each sending incremental model updates to a central server. To alleviate the communication bottleneck, recent work proposed various schemes to compress (e.g. quantize or sparsify) the gradients, thereby introducing additional variance $\omega \geq 1$ that might slow down convergence. For strongly convex functions with condition number $\kappa$ distributed among $n$ machines, we (i) give a scheme that converges in $\mathcal{O}\big((\kappa + \kappa\omega/n + \omega)\log(1/\varepsilon)\big)$ steps to a neighborhood of the optimal solution. For objective functions with a finite-sum structure, each worker having fewer than $m$ components, we (ii) present novel variance reduced schemes that converge in $\mathcal{O}\big((\kappa + \kappa\omega/n + \omega + m)\log(1/\varepsilon)\big)$ steps to arbitrary accuracy $\varepsilon$.
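To make the variance parameter $\omega$ concrete, the sketch below implements one standard unbiased compression operator, random sparsification (Rand-k): it keeps k of d coordinates and rescales them so that E[Q(x)] = x and E‖Q(x) − x‖² = ω‖x‖² with ω = d/k − 1. This is only an illustrative example of the kind of operator the abstract refers to, not code from the paper; the function name rand_k and all parameter choices are assumptions made here for demonstration.

```python
import numpy as np

def rand_k(x, k, rng=None):
    """Unbiased random sparsification (Rand-k).

    Keeps k randomly chosen coordinates of x and rescales them by d/k,
    so E[Q(x)] = x and E||Q(x) - x||^2 = (d/k - 1) * ||x||^2,
    i.e. the compression variance parameter is omega = d/k - 1.
    Illustrative sketch only, not the paper's implementation.
    """
    rng = rng or np.random.default_rng()
    d = x.size
    out = np.zeros_like(x)
    idx = rng.choice(d, size=k, replace=False)  # coordinates that survive compression
    out[idx] = x[idx] * (d / k)                 # rescale to keep the estimator unbiased
    return out

if __name__ == "__main__":
    # Empirical check of unbiasedness and of omega = d/k - 1.
    rng = np.random.default_rng(0)
    x = rng.standard_normal(100)   # d = 100
    k = 10                         # expected omega = 100/10 - 1 = 9
    samples = np.stack([rand_k(x, k, rng) for _ in range(20000)])
    print("bias norm:", np.linalg.norm(samples.mean(axis=0) - x))            # close to 0
    emp_var = np.mean(np.sum((samples - x) ** 2, axis=1))
    print("E||Q(x)-x||^2 / ||x||^2:", emp_var / np.sum(x ** 2))              # close to 9
```

A larger ω (more aggressive compression, smaller k) saves bandwidth but inflates the κω/n and ω terms in the rates above, which is the trade-off the variance reduced schemes in the paper are designed to mitigate.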
Publication
Optimization Methods and Software