Highlight

The OpenACC data model: Preliminary study on its major challenges and implementations

OpenACC data model

Achievement

This paper describes how the OpenACC data model is implemented in current OpenACC compilers, ranging from research compilers (OpenUH and OpenARC) to a commercial compiler (the PGI OpenACC compiler). This includes implementation of the data directives and clauses, testing whether the data is already present on the device, managing asynchronous data transfers and memory allocations, handling aliased data, reusing device memory, managing partially present data, and supporting shared memory between the host and device.
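As a rough illustration of the directives and clauses mentioned above, the following minimal sketch shows a structured data region and a present test. The function names `scale` and `scale_and_sum` are hypothetical and do not come from the paper or from any of the three compilers; without an OpenACC compiler the pragmas are ignored and the code simply runs on the host.

```c
#include <stddef.h>

/* Hypothetical helper (not from the paper): scales x into y.
   The present clause tells the runtime that x and y were already mapped
   by an enclosing data region, so it only needs to look them up in its
   present table instead of allocating and copying again. */
static void scale(const float *x, float *y, int n, float a)
{
    #pragma acc parallel loop present(x[0:n], y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i];
}

/* Hypothetical driver (not from the paper): the data construct allocates
   device copies of x and y, copies x to the device on entry, and copies
   y back to the host on exit. */
float scale_and_sum(const float *x, float *y, int n, float a)
{
    float sum = 0.0f;

    #pragma acc data copyin(x[0:n]) copyout(y[0:n])
    {
        scale(x, y, n, a);
    }

    /* y is valid on the host again once the data region has ended. */
    for (int i = 0; i < n; ++i)
        sum += y[i];
    return sum;
}
```

How the lookup behind `present` is implemented (for example, the structure of the present table) is the kind of compiler-specific detail the paper compares; this sketch only shows the user-visible side.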

Significance and Impact

The goal of this work is to provide information and guidance for other implementations of OpenACC or similar programming models such as OpenMP. State-of-the-art devices offer an increasingly varied set of memory hierarchies. We found that directive-based, high-level programming models such as OpenACC must evolve to support many different memory hierarchy organizations in order to provide a truly performance-portable experience.

Research Details

  • Summarize the various memory architectures in today's accelerator systems.
  • Describe details and issues in implementing the OpenACC data model in the three different OpenACC compilers.
  • Measure present table lookups, device memory allocation, pinned memory allocation, and managed memory in the three OpenACC compilers using eight OpenACC applications (seven from the SPEC ACCEL benchmark suite and LULESH, a shock-hydrodynamics mini-application).

Overview

This paper describes how the OpenACC data model is implemented in current OpenACC compilers, ranging from research compilers (OpenUH and OpenARC) to a commercial compiler (the PGI OpenACC compiler). First, we summarize the various memory architectures in today's accelerator systems. We then describe details and issues in implementing the OpenACC data model in the three different OpenACC compilers. This includes managing page tables, asynchronous data transfers, asynchronous memory allocation and deallocation, the host data construct, aliasing on a data directive, reusing device memory, partially present data, and adjacent data. The paper also discusses ongoing work to manage large, complex dynamic data structures. We measured present table lookups, device memory allocation, pinned memory allocation, and managed memory in the three OpenACC compilers using eight OpenACC applications (seven from the SPEC ACCEL benchmark suite and LULESH, a shock-hydrodynamics mini-application).
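To make the asynchronous data transfers mentioned above more concrete, here is a minimal sketch that uses an unstructured data lifetime together with asynchronous updates. The function name `increment_async` and the queue number 1 are assumptions chosen for illustration, not taken from the paper. The allocation is deliberately kept synchronous, so the sketch does not rely on how a particular compiler handles asynchronous allocation.

```c
/* Hypothetical example (not from the paper): adds 1 to every element of
   buf on the device, using async queue 1 for transfers and compute. */
void increment_async(float *buf, int n)
{
    /* Synchronous allocation: buf[0:n] is entered into the present
       table before any asynchronous work is queued. */
    #pragma acc enter data create(buf[0:n])

    /* Host-to-device copy, kernel, and device-to-host copy are all
       enqueued on the same queue, so they execute in order relative to
       each other while the host thread continues. */
    #pragma acc update device(buf[0:n]) async(1)

    #pragma acc parallel loop present(buf[0:n]) async(1)
    for (int i = 0; i < n; ++i)
        buf[i] += 1.0f;

    #pragma acc update self(buf[0:n]) async(1)

    /* Block until everything on queue 1 has finished, then release the
       device copy. */
    #pragma acc wait(1)
    #pragma acc exit data delete(buf[0:n])
}
```

Compiled without OpenACC support, the pragmas are ignored and the loop runs on the host with the same result.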