- The authors collaborated with the TAU team (SciDAC RAPIDS Institute) to profile and analyze the DCA++ code.
- Optimizations to the code were made based on the performance bottlenecks identified by TAU.
- For this highlight, we observe a 15x speedup over the old code and up to a 47x improvement in the code's GPU utilization on Summit.
- For a full-scale production run, we observe about a 120x speedup over the old code on all 4600 nodes of Summit.
Significance and Impact
Collaborating via the RAPIDS Institute, a joint research team from ORNL and the University of Oregon has harnessed TAU’s performance feedback to assist the DCA++ team in exploiting the GPUs on ORNL’s Summit supercomputer. Specifically, TAU has enabled DCA++ developers to improve the code’s performance on the Summit system and increase GPU utilization.
- TAU provides insight into tuning DCA++ execution parameters (e.g., Monte Carlo walkers/accumulators) to run efficiently on Summit and Titan.
- Researchers developed ideas to visualize massive amounts of GPU performance data in a scalable way.
- TAU facilitates the porting and testing of DCA++ on Summit by integrating DCA++ into a continuous integration performance system.
- DCA++ is an ORNL-developed code for simulating correlated quantum materials.
- Researchers develop optimized algorithms and parallelization strategies for implementing new science capabilities in DCA++ using TAU (Tuning and Analysis Utilities), a performance and visualization tool.
- TAU, developed under SciDAC’s RAPIDS Institute, is a scalable and portable profiling and tracing toolkit for the analysis of parallel programs.