Mehmet E. Belviranli, Seyong Lee, Jeffrey S. Vetter, and Laxmi N. Bhuyan. 2018. Juggler: a dependence-aware task-based execution framework for GPUs. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '18). ACM, New York, NY, USA, 54-67.
In this study we have proposed Juggler, a new, dynamic task-based execution scheme for GPGPU applications with data dependences. Different from previous studies, Juggler implements an in-GPU runtime for applications with OpenMP 4.5–based dependences. The runtime uniquely employs in-GPU dependence resolution and task placement. Our experimental evaluation of seven scientific kernels with data dependences on an NVIDIA Tesla P100 GPU showed that Juggler improves kernel execution performance up to 31% when compared to global barrier-based implementation. Our results demonstrate that the conventional GPGPU programming paradigms relying on grid-based execution with global synchronization can be replaced with DAG-based, dependence-aware task processing to increase the performance of scientific applications.Read Publication