Highlight

SharP Unified Memory Allocator: An Intent-based Memory Allocator for Extreme-scale Systems

Figure 1. Bandwidth and message rate on systems with different affinities to the NIC

Achievement

Designed and implemented the SharP Unified Memory Allocator (UMA), which performs intent-based memory allocations by composing high-level hints and constraints, and demonstrated its effectiveness by adapting Open MPI and OpenSHMEM-X to support it with minimal changes to their vanilla versions.

Significance and Impact

This work demonstrates the performance portability of the SharP programming abstraction and shows that the SharP UMA can be integrated into popular programming models, allowing many scientific simulations and applications to take advantage of hierarchical and heterogeneous memories with minimal porting effort.

Research Details

  • Classified and designed higher-level abstractions that let users allocate memory on the multiple memory types in a system while preserving data locality and affinity, and used these abstractions to design and implement the SharP Unified Memory Allocator.
  • Adapted the Open MPI and OpenSHMEM-X implementations to support the SharP Unified Memory Allocator, allowing many applications to obtain performance portability through existing interfaces (e.g., MPI_Alloc_mem) or new, simple interfaces.
  • Evaluated and demonstrated the portability advantages of the SharP Unified Memory Allocator on the Turing cluster at ORNL and the Rhea system at the Oak Ridge Leadership Computing Facility, and found performance to be similar across the differing systems; in addition, allocating memory near the NIC for distant processes improves the performance of zero-copy Put and Get operations by up to 8%.
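
The NIC-locality result in the last bullet can be illustrated with a small placement sketch. This is not the SharP API: `Topology`, `pick_numa_node`, and the `near_nic` flag are hypothetical names, and the two-socket layout is an assumed example. The sketch shows only the placement decision, in which a process on a socket far from the NIC puts its communication buffer on the NUMA node the NIC is attached to instead of on its own local node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Hypothetical node description: NUMA node ids and where the NIC attaches."""
    numa_nodes: tuple
    nic_numa_node: int


def pick_numa_node(topo: Topology, process_node: int, near_nic: bool) -> int:
    """Return the NUMA node on which to place a communication buffer."""
    if process_node not in topo.numa_nodes:
        raise ValueError(f"unknown NUMA node {process_node}")
    # With the (assumed) near_nic hint, favor the NIC's node so the NIC reaches
    # the buffer without crossing the inter-socket link; otherwise stay local.
    return topo.nic_numa_node if near_nic else process_node


# Assumed example: two sockets, NIC attached to NUMA node 0.
topo = Topology(numa_nodes=(0, 1), nic_numa_node=0)
distant = pick_numa_node(topo, process_node=1, near_nic=True)
local = pick_numa_node(topo, process_node=1, near_nic=False)
print(distant, local)
```

A real allocator would discover the topology from the system instead of taking it as a literal, and would then bind the allocation to the chosen node.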

Overview

Pre-exascale systems will soon be deployed with deep, complex memory hierarchies composed of many heterogeneous memories. This presents multiple challenges for users, including how to allocate data objects with locality between memories and devices across the various memories in these systems, which include DRAM, high-bandwidth memory (HBM), and non-volatile random-access memory (NVRAM), and how to perform these allocations while keeping their applications portable. Currently, users can draw on multiple, disjoint libraries to allocate data objects on these memories. However, it is difficult to obtain locality between memories and devices when using libraries that are unaware of each other. This paper presents the Unified Memory Allocator (UMA) of the SHARed data-structure centric Programming abstraction (SharP) library, which provides a unified interface for memory allocations across DRAM, HBM, and NVRAM and is extensible to support future memory types. In addition, the SharP UMA allows for portability between systems by supporting both explicit and implicit, intent-based memory allocations. To demonstrate the ease of use of the SharP UMA, we have extended both Open MPI and OpenSHMEM-X to support SharP. We validate this work by evaluating the performance implications and the intent-based approach with synthetic benchmarks as well as adaptations of the Graph500 benchmark.
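
The difference between explicit and implicit, intent-based allocation can be sketched as follows. This is a minimal illustration under stated assumptions, not the SharP interface: the class `UnifiedAllocator`, its methods, and the hint names in `HINT_PREFERENCES` are all hypothetical, and a `bytearray` stands in for memory obtained from a real backend. An explicit request names a memory type and fails if the system lacks it, while an intent-based request names a goal and is resolved against whatever memories the system provides.

```python
from dataclasses import dataclass
from enum import Enum


class MemKind(Enum):
    DRAM = "dram"
    HBM = "hbm"
    NVRAM = "nvram"


# Hypothetical hints, each mapped to memory kinds in order of preference.
# "persistent" acts as a constraint: it has no volatile fallback.
HINT_PREFERENCES = {
    "high_bandwidth": (MemKind.HBM, MemKind.DRAM),
    "large_capacity": (MemKind.NVRAM, MemKind.DRAM),
    "persistent": (MemKind.NVRAM,),
}


@dataclass
class Allocation:
    kind: MemKind
    buffer: bytearray  # stand-in for memory obtained from a real backend


class UnifiedAllocator:
    def __init__(self, available):
        self.available = frozenset(available)

    def alloc_explicit(self, size, kind):
        """Explicit request: the caller names the memory type."""
        if kind not in self.available:
            raise MemoryError(f"{kind.name} is not present on this system")
        return Allocation(kind, bytearray(size))

    def alloc_intent(self, size, hint):
        """Intent-based request: pick the first preferred kind that exists."""
        for kind in HINT_PREFERENCES[hint]:
            if kind in self.available:
                return self.alloc_explicit(size, kind)
        raise MemoryError(f"no memory on this system satisfies {hint!r}")


# The same call on two assumed systems, one with HBM and one without.
with_hbm = UnifiedAllocator({MemKind.DRAM, MemKind.HBM})
dram_only = UnifiedAllocator({MemKind.DRAM})
print(with_hbm.alloc_intent(4096, "high_bandwidth").kind.name)
print(dram_only.alloc_intent(4096, "high_bandwidth").kind.name)
```

In this sketch the application code is identical on both systems, which is the portability argument in miniature: only the resolution of the hint changes. A hint such as `persistent` is treated as a constraint and raises an error when no suitable memory exists, because falling back to volatile memory would not honor the request.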