Highlight

Directive-based, High-Level Programming and Optimizations for High-Performance Computing with FPGAs

Fig. 1.  Comparison of directive-based FPGA approach with directive-based CPU and GPU approaches.

Achievement

Directive-driven optimizations that enable high-performance computing on FPGAs have been implemented as compiler transformations.

Significance and Impact

FPGAs, which can offer runtime and power benefits, are traditionally difficult to program. We lower the programming barrier for these devices while still maintaining high performance.

Research Details

  • The researchers identify FPGA-specific OpenCL patterns to be expressed as OpenACC directives and develop compiler transformations within the OpenARC compiler framework to generate these patterns from the directives.
  • The researchers show that these optimizations increase the performance of OpenACC programs on FPGAs, in some cases to a level similar to hand-tuned OpenCL. They also show that directive-programmed FPGAs can perform comparably to directive-programmed CPUs and GPUs, particularly in terms of power.
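
To make the directive-based style concrete, the following is a minimal sketch of a loop annotated with standard OpenACC directives, the kind of input an OpenACC-to-FPGA translator such as OpenARC would start from. The function name saxpy and its parameters are hypothetical and chosen only for illustration. Only standard OpenACC clauses are used; the FPGA-specific directive extensions developed in this work are not named in this highlight, so they are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel: y[i] = a * x[i] + y[i].
 * The pragma is standard OpenACC. A compiler without OpenACC
 * support ignores it and runs the loop sequentially on the host. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(n >= 0);
    assert(n == 0 || (x != NULL && y != NULL));

    /* copyin: x is only read on the device.
     * copy:   y is read and written, so it is transferred both ways. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The loop body stays ordinary C, and the single directive carries the offloading and data-movement intent. Calling saxpy(4, 2.0f, x, y) from host code updates y in place whether or not an accelerator is present, so the same source can be compiled for a CPU, GPU, or FPGA target.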

Overview

Reconfigurable architectures like Field Programmable Gate Arrays (FPGAs) have been used for accelerating computations from several domains because of their unique combination of flexibility, performance, and power efficiency. However, FPGAs have not been widely used for high-performance computing, primarily because of their programming complexity and difficulties in optimizing performance. In this paper, we present a directive-based, high-level optimization framework for high-performance computing with FPGAs, built on top of an OpenACC-to-FPGA translation framework called OpenARC. We propose directive extensions and corresponding compile-time optimization techniques to enable the compiler to generate more efficient FPGA hardware configuration files. Empirical evaluation of the proposed framework on an Intel Stratix V with five OpenACC benchmarks from various application domains shows that FPGA-specific optimizations can lead to significant increases in performance across all tested applications. We also demonstrate that applying these high-level directive-based optimizations can allow OpenACC applications to perform similarly to lower-level OpenCL applications with hand-written FPGA-specific optimizations, and offer runtime and power performance benefits compared to CPUs and GPUs.