An analytical model is developed and evaluated to describe the behavior of novel HPC applications whose CPU and memory requirements are highly stochastic and dynamic. Such applications arise in fields like neuroscience, genome research and bioinformatics.
Significance and Impact
New applications that do not follow the traditional HPC model can now be deployed efficiently on large-scale systems without relying on inefficient ad-hoc solutions.
- The model describes applications as a chain of tasks, each with probabilistic resource requirements, can be built from as few as 10 runs, and can adapt to shifts in behavior.
- Applications from fields doing exploratory research that do not fit the HPC model can optimize their submissions and request more realistic resources.
- Experiments with applications from the neuroscience department at Vanderbilt University show an average improvement of 20-25% in response time.
Citation and DOI
A. Gainaru, B. Goglin, V. Honoré and G. Pallez, "Profiles of Upcoming HPC Applications and Their Impact on Reservation Strategies," in IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 5, pp. 1178-1190.
New applications with different profiles are starting to require the computational power of HPC infrastructures. This study aims to better understand the features and needs of these applications so that they can run efficiently on HPC platforms. For this purpose, the Spatially Localized Atlas Network Tiles application (SLANT, originating from the neuroscience community) was chosen for analysis, and its resource requirements were investigated in detail within one run and across multiple runs with different input data. Based on these observations, we derive a generic yet simple application model (namely, a linear sequence of stochastic jobs). We expect this model to be representative of a large set of upcoming applications from emerging fields that are starting to require the computational power of HPC clusters without fitting the typical behavior of traditional large-scale applications.
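As an illustration only, a linear sequence of stochastic jobs could be represented as in the sketch below. It is not the authors' implementation: the class and function names, the task names, the sample values, and the choice to resample each task independently from its observed runs are all assumptions made for this example.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StochasticTask:
    """One stage of the chain, described by resource usage observed in past runs."""
    name: str
    walltimes_s: List[float]   # observed walltimes, in seconds
    peak_mem_gb: List[float]   # observed peak memory, in GB

    def sample(self, rng: random.Random) -> Tuple[float, float]:
        # Assumption: time and memory are drawn independently from past observations.
        return rng.choice(self.walltimes_s), rng.choice(self.peak_mem_gb)


def sample_run(chain: List[StochasticTask], rng: random.Random) -> List[Tuple[float, float]]:
    """Draw one (walltime, peak memory) pair per task, in chain order."""
    return [task.sample(rng) for task in chain]


def run_totals(run: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Tasks run one after another: walltimes add up, memory need is the maximum."""
    total_time = sum(t for t, _ in run)
    peak_mem = max(m for _, m in run)
    return total_time, peak_mem


# Hypothetical two-stage chain with made-up observations (not SLANT measurements).
example_chain = [
    StochasticTask("preprocess", [600.0, 720.0, 650.0], [4.0, 4.5, 5.0]),
    StochasticTask("segment", [3000.0, 4200.0, 3600.0], [12.0, 20.0, 16.0]),
]
example_run = sample_run(example_chain, random.Random(42))
print(run_totals(example_run))
```

Sampling many such runs gives an empirical distribution of total walltime and peak memory for the whole chain, which is the kind of information a reservation decision needs.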
We then apply this generic model in a scheduling framework. Specifically, we consider the problem of making reservations (both time and memory) for an execution on an HPC platform based on the application's expected resource requirements. We experimentally show the robustness of the model, even with very few data points or when another application is used to generate the model, and demonstrate performance gains over standard and more recent approaches used in the neuroscience community.
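To make the reservation problem concrete, the sketch below compares two walltime reservation strategies on a handful of made-up observations. The cost model is an assumption chosen for simplicity, not the one used in the study: every attempt is charged its full reserved time, and a run that exceeds its reservation is restarted from scratch with the next, larger reservation. Memory reservations are left out. The function name and the numbers are hypothetical.

```python
from typing import Sequence


def expected_cost(reservations: Sequence[float], observed_walltimes: Sequence[float]) -> float:
    """Average reserved time paid per run when reservations are tried in order.

    Assumed cost model: each attempt costs its full reservation length,
    whether it succeeds or is killed for exceeding the reservation.
    """
    total = 0.0
    for walltime in observed_walltimes:
        paid = 0.0
        for reserved in reservations:
            paid += reserved
            if walltime <= reserved:
                break  # the run fits in this reservation
        else:
            raise ValueError("the last reservation must cover every observed walltime")
        total += paid
    return total / len(observed_walltimes)


# Made-up walltimes (in minutes) with one long outlier.
observed = [10.0, 12.0, 15.0, 40.0]

print(expected_cost([40.0], observed))        # always reserve the worst case
print(expected_cost([15.0, 40.0], observed))  # try a short reservation first
```

Under these assumptions, the two-step strategy pays less on average because most runs finish within the short reservation, and only the outlier pays for both attempts.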
Last Updated: January 20, 2021 - 10:48 am