Arghya Chatterjee (CSMD, ORNL / GA Tech), Oscar Hernandez (CSMD, ORNL), Thomas Maier (CCSD, ORNL), Vivek Sarkar (GA Tech). Preprint. 3rd International Workshop on Performance Portable Programming Models for Accelerators (P^3MA).
Scientific applications must be ported to today's highly heterogeneous HPC systems, which consist of multiple accelerators and CPUs with unified heterogeneous memories (multiple NUMA domains) and persistent non-volatile memories. New strategies need to be developed to identify the parallelism in applications and map it efficiently to the underlying architecture.
In this paper, we discuss how current state-of-the-art programming models can be used to port DCA++ (Dynamical Cluster Approximation) so that it exploits all the available compute power of Summit, the world's top supercomputer, with a theoretical peak performance of 200 petaflops. DCA++ is a high-performance research code that solves quantum many-body problems with cutting-edge quantum cluster algorithms. It has been ported successfully to the Titan system, where it reached 16 petaflops. We describe the parallelism available in DCA++ today and its performance characteristics on Titan, with the goal of learning what worked well and what needs to be improved. We also consider the new science we want to add to the application to harness the power of Summit.
Finally, we identify programming models and tools that could help port the next-generation version of DCA++ using hybrid programming models to achieve good performance on Summit.