A team of researchers from Oak Ridge National Laboratory (ORNL) demonstrated that evolutionary algorithms can produce not only neural networks that perform a task well (e.g., highly accurate networks) but also networks that are energy efficient, small, and fast. By analyzing the networks produced by ORNL’s Multinode Evolutionary Neural Networks for Deep Learning (MENNDL) software on Summit, they found that there is often a 2x difference in energy usage, model size, and inference time between the most accurate network and one that performs nearly as well. With datacenter energy usage expected to account for 8% of worldwide energy consumption within a decade, the ability to reduce the energy usage of such a common workload could have a dramatic effect. These results indicate that there is much to be gained by incorporating these secondary objectives into the evolutionary process in order to produce networks that are even more efficient.
Significance and Impact
As deep neural networks have been deployed in more and more applications over the past half-decade and are finding their way into an ever-increasing number of operational systems, their energy consumption becomes a concern whether running in the datacenter or on edge devices. Hyperparameter optimization and automated network design for deep learning is a quickly growing field, but much of the focus has remained only on optimizing for the performance of the machine learning task. In this work, we demonstrate that the best performing networks created through this automated network design process have radically different computational characteristics (e.g. energy usage, model size, inference time), presenting the opportunity to utilize this optimization process to make deep learning networks more energy efficient and deployable to smaller devices. Optimizing for these computational characteristics is critical as the number of applications of deep learning continues to expand.
- Utilized ORNL’s MENNDL on Summit
- Analyzed the performance of the resulting networks as a function of accuracy vs. inference time, model size, and energy consumption
- Utilized 3 benchmark datasets: CIFAR-10, CIFAR-100, and CINIC-10
- Demonstrated a 2x improvement in energy usage, model size, and inference time
In this work, we discuss an approach that utilizes high-performance computing (HPC) to evolve the hyperparameters and topology of convolutional neural networks, and we investigate its ability to produce energy-efficient convolutional neural networks that can be deployed on workstations or in data centers for more efficient, rapid data analysis. In particular, we analyze the performance of all of the networks created over the course of the hyperparameter optimization in terms of three metrics related to energy efficiency: inference time, model size, and energy consumption. We show that the evolved networks can have radically different performance characteristics, and that, as a byproduct of creating hundreds to thousands of different network topologies, we can choose our “best” network from among the evolved networks to suit our computational needs. The key conclusions of this work are that (1) automated network design approaches that do not optimize for computational characteristics can produce networks with radically different energy consumption, model size, and inference time, and (2) there is an opportunity to utilize automated network design as an approach to creating more efficient networks, complementing current work in areas such as network pruning.
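The selection step described above can be sketched as a multi-objective (Pareto) filter over the evolved population: given the accuracy, energy, size, and inference-time measurements for each network, keep only those not dominated on every metric by some other network, then pick from that set according to deployment needs. This is a minimal illustrative sketch, not MENNDL code; the `EvaluatedNetwork` record, field names, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EvaluatedNetwork:
    """Metrics recorded for one evolved network (hypothetical record format)."""
    name: str
    accuracy: float        # higher is better
    energy_joules: float   # lower is better
    model_size_mb: float   # lower is better
    inference_ms: float    # lower is better

def dominates(a: EvaluatedNetwork, b: EvaluatedNetwork) -> bool:
    """True if `a` is at least as good as `b` on every objective
    (maximize accuracy, minimize the rest) and strictly better on one."""
    at_least_as_good = (a.accuracy >= b.accuracy and
                        a.energy_joules <= b.energy_joules and
                        a.model_size_mb <= b.model_size_mb and
                        a.inference_ms <= b.inference_ms)
    strictly_better = (a.accuracy > b.accuracy or
                       a.energy_joules < b.energy_joules or
                       a.model_size_mb < b.model_size_mb or
                       a.inference_ms < b.inference_ms)
    return at_least_as_good and strictly_better

def pareto_front(population: List[EvaluatedNetwork]) -> List[EvaluatedNetwork]:
    """Keep only the networks not dominated by any other network."""
    return [n for n in population
            if not any(dominates(m, n) for m in population)]

# Illustrative numbers only: a costly top-accuracy network, a near-par
# network roughly 2x cheaper, and a network dominated by the second.
population = [
    EvaluatedNetwork("net_a", 0.93, 120.0, 45.0, 9.0),
    EvaluatedNetwork("net_b", 0.92, 60.0, 22.0, 4.5),
    EvaluatedNetwork("net_c", 0.88, 65.0, 30.0, 6.0),
]
front = pareto_front(population)
print([n.name for n in front])  # → ['net_a', 'net_b']
```

A deployment can then trade along this front directly: choose `net_a` when accuracy is paramount, or `net_b` when the roughly 2x savings in energy, size, and latency matter more than the last fraction of a percent of accuracy.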