Papers

(2024). Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference.

PDF Cite Slides arXiv

(2024). The Road Less Scheduled.

PDF Cite arXiv

(2023). When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement.

PDF Cite arXiv

(2023). Adaptive Proximal Gradient Method for Convex Optimization.

PDF Cite arXiv

(2023). Partially Personalized Federated Learning: Breaking the Curse of Data Heterogeneity.

PDF Cite arXiv

(2023). Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy. ICML.

PDF Cite arXiv ICML

(2023). Learning-Rate-Free Learning by D-Adaptation. ICML.

PDF Cite Code arXiv ICML

(2023). Convergence of First-Order Algorithms for Meta-Learning with Moreau Envelopes.

PDF Cite arXiv

(2022). Super-Universal Regularized Newton Method.

PDF Cite Code Slides arXiv

(2022). Adaptive Learning Rates for Faster Stochastic Gradient Methods.

PDF Cite arXiv

(2022). Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays.

PDF Cite Code Slides arXiv

(2022). ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally! ICML.

PDF Cite Code Video arXiv ICML

(2022). Server-Side Stepsizes and Sampling Without Replacement Provably Help in Federated Optimization.

PDF Cite arXiv

(2021). IntSGD: Adaptive Floatless Compression of Stochastic Gradients. ICLR.

PDF Cite Code Poster Slides arXiv ICLR

(2021). Proximal and Federated Random Reshuffling. ICML.

PDF Cite Code Slides Video arXiv ICML

(2020). Random Reshuffling: Simple Analysis with Vast Improvements. NeurIPS.

PDF Cite Code Poster Slides arXiv NeurIPS

(2020). Dualize, Split, Randomize: Toward Fast Nonsmooth Optimization Algorithms. JOTA.

PDF Cite Poster arXiv JOTA

(2019). Adaptive Gradient Descent without Descent. ICML.

PDF Cite Code Poster Slides Video arXiv ICML

(2019). Tighter Theory for Local SGD on Identical and Heterogeneous Data. AISTATS.

PDF Cite Slides arXiv AISTATS

(2019). First Analysis of Local GD on Heterogeneous Data.

PDF Cite Slides arXiv NeurIPS

(2019). MISO is Making a Comeback With Better Proofs and Rates.

PDF Cite arXiv

(2019). DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate. AISTATS.

PDF Cite arXiv AISTATS

(2019). Revisiting Stochastic Extragradient. AISTATS.

PDF Cite Slides arXiv AISTATS

(2019). Stochastic Distributed Learning with Gradient Quantization and Double Variance Reduction. Optimization Methods and Software.

PDF Cite arXiv

(2019). Distributed Learning with Compressed Gradient Differences.

PDF Cite arXiv

(2019). 99% of Worker-Master Communication in Distributed Optimization Is Not Needed. UAI.

PDF Cite arXiv UAI

(2018). SEGA: Variance Reduction via Gradient Sketching. NeurIPS.

PDF Cite arXiv NeurIPS

(2018). A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning. ICML.

PDF Cite ICML

(2018). A Distributed Flexible Delay-tolerant Proximal Gradient Algorithm. SIOPT.

PDF Cite arXiv SIAM