New paper: ProxSkip, a method for federated learning
Our new paper is now available on arXiv: abstract, pdf. We present a new proximal-gradient method capable of skipping the computation of the proximity operator, which we designed with applications in federated learning in mind. Specifically, when the proximity operator is the averaging of the local iterates (i.e., those stored on the devices), skipping it corresponds to skipping communication, with provable benefits. In fact, we show that one can accelerate the convergence in terms of communication rounds, similarly to Nesterov's acceleration but without using any momentum. Related methods, such as Scaffold, are only proven to perform comparably to gradient descent, but not better. At the same time, our method, when specialized to federated learning, is algorithmically very similar to Scaffold, so we call it Scaffnew (any guesses why? :) ).
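To make the skipping idea concrete, here is a minimal NumPy sketch of a prox-skipping loop specialised to federated averaging on a toy least-squares problem. This is only an illustration under my own simplifications: the toy objectives, the stepsize `gamma`, the communication probability `p`, and all variable names are placeholders, and the paper should be consulted for the exact method and its parameter choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: n clients, each holding a local least-squares objective
# f_i(x) = 0.5 * ||A_i x - b_i||^2.  (Purely illustrative data.)
n, d = 10, 5
A = rng.standard_normal((n, 20, d))
b = rng.standard_normal((n, 20))

def grad(i, x):
    """Gradient of the i-th local objective at x."""
    return A[i].T @ (A[i] @ x - b[i])

gamma = 0.01   # local stepsize (placeholder value for this toy problem)
p = 0.2        # probability of communicating at a given iteration
T = 2000       # total number of local iterations

x = np.zeros((n, d))  # local iterates, one row per client
h = np.zeros((n, d))  # local control variates, initialised so they sum to zero

comms = 0
for t in range(T):
    # Local gradient step, shifted by the control variate.
    g = np.stack([grad(i, x[i]) for i in range(n)])
    x_hat = x - gamma * (g - h)

    if rng.random() < p:
        # Communication round: the prox of the consensus constraint is averaging,
        # so all clients are reset to the average of the local iterates
        # (the control-variate shifts cancel because the h_i sum to zero).
        x_new = np.tile(x_hat.mean(axis=0), (n, 1))
        comms += 1
    else:
        # Skip communication and keep the purely local iterates.
        x_new = x_hat

    # Control-variate update; at the solution each h_i tracks the local gradient,
    # which is what keeps the skipped rounds from hurting convergence.
    h = h + (p / gamma) * (x_new - x_hat)
    x = x_new

# Sanity check against the exact minimiser of the global objective.
x_star = np.linalg.solve(sum(A[i].T @ A[i] for i in range(n)),
                         sum(A[i].T @ b[i] for i in range(n)))
print(f"communication rounds: {comms} out of {T} iterations")
print("distance to optimum:", np.linalg.norm(x.mean(axis=0) - x_star))
```

On this toy problem the loop communicates in only a fraction `p` of the iterations yet still drives the averaged iterate to the global minimiser, which is the behaviour the paper analyses.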
Also, check out this Twitter thread if you want an informal description.