Jeffrey Vetter

Highlights

Fig. 1. Single-GPU tests on an NVIDIA P100. The results show that indirect addressing outperforms locally direct addressing, and that the CSoA memory layout outperforms the SoA and bundling memory layouts.

GPU performance of the lattice Boltzmann method (LBM) depends heavily on memory access patterns. When LBM is advanced with GPUs on complex computational domains, geometric data is typically accessed…

Fig. 1.  Comparison of directive-based FPGA approach with directive-based CPU and GPU approaches.

Reconfigurable architectures such as Field-Programmable Gate Arrays (FPGAs) have been used to accelerate computations in several domains because of their unique combination of flexibility,…

Fig. 1.  Tuyere integrates application, mapping, and system knowledge into hardware simulations.

Memory technologies are under active development. Meanwhile, workloads on contemporary computing systems are increasing rapidly in size and diversity. Such dynamics in hardware and software further…

Juggler

In this study, we propose Juggler, a new, dynamic task-based execution scheme for GPGPU applications with data dependences. Unlike previous studies, Juggler implements an in-GPU runtime…