
1D and 2D s-step Methods for Convex Optimization

Dr. Aditya Devarakonda

Abstract: First-order methods, such as block coordinate descent (BCD) and stochastic gradient descent (SGD), are among the most widely used optimization methods for solving machine learning problems.  These methods typically solve an optimization problem by iteratively subsampling a few samples (SGD) or features (BCD) from the input data.  SGD computes gradients with respect to the chosen samples, while BCD solves a subproblem with respect to the chosen features.  In a parallel setting, both BCD and SGD require interprocess communication at every iteration.  This talk will introduce s-step variants of BCD and SGD that avoid communication for s iterations, where s is a tuning parameter.  We borrow ideas from s-step and CA-Krylov methods and illustrate how they can be used to reorganize first-order methods for solving non-linear convex optimization problems.  We show that these algorithms are numerically stable and attain large speedups on parallel machines.  Finally, we will show how s-step BCD/SGD can be combined with existing methods in federated learning and communication-efficient learning to yield 2D distributed-memory optimization methods.
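As a rough illustration of the idea (a sketch under simplifying assumptions, not the speaker's implementation), the code below unrolls s SGD updates on a least-squares objective: the s sample indices are chosen up front, a small s-by-s Gram matrix and residual vector are formed once (the quantities that would require communication in a distributed run), and the individual step coefficients are then recovered from a local recurrence so that no further communication is needed within the block.  The function name and parameters are illustrative only.

import numpy as np

# Illustrative sketch of the s-step idea for SGD on 0.5 * ||A w - b||^2
# (assumed example objective; not taken from the talk). Plain SGD would
# touch the data and communicate once per update; here the block's Gram
# matrix and residuals are computed once, and the s updates are recovered
# from a small communication-free recurrence.
def s_step_sgd_least_squares(A, b, s=8, blocks=200, lr=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    for _ in range(blocks):
        idx = rng.integers(0, n, size=s)   # the s samples for this block
        A_S, b_S = A[idx], b[idx]
        G = A_S @ A_S.T                    # s x s Gram matrix (one "communication")
        v = A_S @ w - b_S                  # residuals at the block start
        c = np.zeros(s)                    # per-step coefficients
        for j in range(s):                 # local recurrence, no communication
            c[j] = lr * (v[j] - G[j, :j] @ c[:j])
        w -= A_S.T @ c                     # apply all s updates at once
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((500, 10))
    w_true = rng.standard_normal(10)
    b = A @ w_true
    w = s_step_sgd_least_squares(A, b)
    print("relative error:", np.linalg.norm(w - w_true) / np.linalg.norm(w_true))

Because the coefficients c reproduce the step sizes plain SGD would have taken, the iterates match standard SGD on this objective while communication is deferred to once per block of s steps.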

Speaker’s Bio: Aditya Devarakonda is an Assistant Professor in the Department of Computer Science at Wake Forest University.  He received his Ph.D. in Computer Science from the University of California, Berkeley in 2018 and his B.S. in Computer Engineering from Rutgers University, New Brunswick in 2012.  His research interests are in high-performance computing, machine learning, and scientific computing with a focus on the design, analysis, and implementation of efficient parallel algorithms.
