
Stable Anderson acceleration for Deep Learning

Dr. Massimiliano (Max) Lupo Pasini

Abstract: Anderson acceleration (AA) is an extrapolation technique designed to speed up fixed-point iterations, such as those arising from the iterative training of Deep Learning (DL) models. Training DL models requires large datasets processed in randomly sampled batches, which introduce stochastic oscillations into the fixed-point iteration with amplitude roughly inversely proportional to the batch size. These oscillations reduce, and occasionally eliminate, the positive effect of AA. To restore AA's advantage, we combine it with an adaptive moving average procedure that smooths the oscillations and results in a more regular sequence of gradient descent updates. By monitoring the relative standard deviation between consecutive iterations, we also introduce a criterion to automatically assess whether the moving average is needed. We applied the method to the following DL instantiations: (i) multi-layer perceptrons trained on an open-source dataset for regression, (ii) physics-informed neural networks trained on source data to solve 2D and 100D Burgers' partial differential equations, and (iii) a 50-layer Residual Network trained on the open-source ImageNet1k dataset for image classification. Numerical results obtained using up to 1,536 NVIDIA V100 graphics processing units on Summit, the Oak Ridge Leadership Computing Facility supercomputer, showed the stabilizing effect of the moving average on AA for all the problems above.
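To make the idea concrete, the sketch below shows a generic Anderson(m) extrapolation combined with a moving average of recent iterates, triggered by a relative-standard-deviation criterion. This is a minimal illustration of the general technique described in the abstract, not the speaker's implementation: the function names, the window size, the tolerance, and the exact form of the criterion and smoothing step are illustrative assumptions.

```python
import numpy as np

def anderson_step(X, GX, m=5):
    """One Anderson(m) step from histories of iterates X and fixed-point
    map evaluations GX (most recent last). Returns the next iterate."""
    k = len(X) - 1
    mk = min(m, k)
    if mk == 0:
        return GX[-1]                                  # plain fixed-point step
    F = [gx - x for x, gx in zip(X, GX)]               # residuals f_i = g(x_i) - x_i
    dF = np.stack([F[k - i] - F[k - i - 1] for i in range(mk)], axis=1)
    dG = np.stack([GX[k - i] - GX[k - i - 1] for i in range(mk)], axis=1)
    # least-squares coefficients minimizing ||f_k - dF @ gamma||
    gamma, *_ = np.linalg.lstsq(dF, F[k], rcond=None)
    return GX[k] - dG @ gamma                          # extrapolated iterate

def relative_std(X, window=10):
    """Relative standard deviation of recent iterate norms (assumed form
    of the oscillation-monitoring criterion)."""
    norms = np.array([np.linalg.norm(x) for x in X[-window:]])
    return norms.std() / (norms.mean() + 1e-12)

def smoothed_aa(g, x0, n_iters=100, m=5, window=10, tol=1e-2):
    """Fixed-point iteration with AA; when the criterion detects strong
    oscillations, the latest iterate is replaced by a moving average of
    the last `window` iterates before extrapolating."""
    X, GX = [x0], [g(x0)]
    for _ in range(n_iters):
        if len(X) >= window and relative_std(X, window) > tol:
            X[-1] = np.mean(np.stack(X[-window:]), axis=0)
            GX[-1] = g(X[-1])
        x_next = anderson_step(X, GX, m)
        X.append(x_next)
        GX.append(g(x_next))
    return X[-1]

# Toy usage: a contractive map perturbed with noise, mimicking the
# stochasticity of mini-batch gradient updates (fixed point near x = 2).
g = lambda x: 0.5 * x + 1.0 + 0.01 * np.random.randn(*x.shape)
x_star = smoothed_aa(g, np.zeros(3))
```

In the DL setting of the talk, the fixed-point map would correspond to a stochastic gradient update, x -> x - lr * grad(x), where mini-batch sampling supplies the noise that the moving average is meant to damp.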

Speaker’s Bio: Massimiliano (Max) Lupo Pasini is a Computational Scientist in the Scalable Algorithms and Coupled Physics group in the Computational Sciences and Engineering Division at Oak Ridge National Laboratory (ORNL). He was previously a Postdoctoral Research Associate in the Scientific Computing group at the National Center for Computational Sciences at ORNL.

Massimiliano’s research at ORNL focuses on the development of hyperparameter optimization techniques for DL models and on the acceleration of computational physics applications using DL techniques as surrogate models. Massimiliano obtained his PhD in Applied Mathematics at Emory University in Atlanta, Georgia, in May 2018. The main topic of his doctoral work was the development of efficient and resilient linear solvers for upcoming computing architectures moving towards Exascale (10^18 floating point operations per second) capabilities. Massimiliano obtained his Bachelor of Science and Master of Science in Mathematical Engineering at the Politecnico di Milano in Milan, Italy. The focus of his undergraduate and master's studies was statistics, discretization techniques, and reduced-order models for partial differential equations.
