Highlight

Performance Potential of Mixed Data Management Modes for Heterogeneous Memory Systems

The figure above shows results from our tool for classifying program data under mixed memory usage modes. The tool profiles last-level cache misses for individual 4 KB pages associated with each SICM-managed region. Some hot regions have a bimodal distribution (i.e., a small number of very hot pages), indicating that they can be managed efficiently by HW, while others have a uniform distribution of accesses and are better managed by SW.
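The classification idea in the caption can be illustrated with a minimal sketch. It assumes per-page miss counts are already available for a region and uses a simple skew test: if a small fraction of pages accounts for most of the misses, the region is suggested for HW management, otherwise for SW management. The function names, the 10% page fraction, and the 0.8 threshold are hypothetical choices for illustration, not the heuristic used by SICM.

```python
from typing import Sequence


def hot_page_share(page_misses: Sequence[int], top_fraction: float = 0.1) -> float:
    """Return the fraction of a region's misses that land on its hottest pages."""
    total = sum(page_misses)
    if total == 0:
        return 0.0
    # Always consider at least one page, even for very small regions.
    n_top = max(1, int(len(page_misses) * top_fraction))
    hottest = sorted(page_misses, reverse=True)[:n_top]
    return sum(hottest) / total


def suggest_management(
    page_misses: Sequence[int],
    top_fraction: float = 0.1,    # hypothetical: share of pages treated as "hot"
    skew_threshold: float = 0.8,  # hypothetical: miss share that counts as skewed
) -> str:
    """Suggest "HW" for skewed (few very hot pages) regions, "SW" for uniform ones."""
    share = hot_page_share(page_misses, top_fraction)
    return "HW" if share >= skew_threshold else "SW"


# One very hot page among twenty: misses are concentrated, so HW caching is suggested.
skewed_region = [5000] + [1] * 19
# Every page is equally hot: no small hot set exists, so SW placement is suggested.
uniform_region = [100] * 20

print(suggest_management(skewed_region))
print(suggest_management(uniform_region))
```

A real profiler would need to choose these thresholds empirically and would likely also weigh region size and total miss volume.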

Achievement

A team of researchers from Oak Ridge National Laboratory (ORNL) and the University of Tennessee (UTK) designed, implemented, and evaluated a high-performance computing (HPC) runtime system that monitors memory usage on complex multi-tier memory systems and places new memory allocations in the appropriate tier. As new memory technologies emerge in HPC, domain scientists face non-standard memory interfaces when they try to take advantage of the new tiers. In these advanced architectures, a new layer of memory is inserted into the memory hierarchy, with larger capacity than today’s typical DDR-based memories but with poorer latency, bandwidth, or both. Examples of such multi-tier systems include the Intel® Knights Landing (KNL) architecture, which allows applications to allocate and directly use one quarter or one half of the capacity of the MCDRAM tier while the remaining capacity is managed as a hardware-directed cache. The ORNL/UTK team developed automated systems that determine the capabilities of the underlying hardware and automatically manage memory allocations for optimized performance. Both offline (post-mortem) and online (dynamic) approaches are being developed. The demonstrated performance improvements were commensurate with the disparity between the faster, smaller-capacity tier and the slower, larger-capacity tier.

Significance and Impact

Heterogeneous memory architectures typically provide two options for managing and using data across the available tiers of memory: (1) hardware-directed, memory-side caching, which is transparent to the OS and runtime software but not always effective; and (2) software-directed data tiering, which gives fine-grained control of data placement but incurs high migration overhead (this is the option used by our approach). Different options work better for different applications and usage patterns. Many heterogeneous memory architectures also provide mixed data management modes, in which HW- and SW-directed management apply simultaneously to different portions of the address space. Our approach addresses an emerging problem raised by these new complex memories: mixed data management modes are currently under-utilized because of a lack of tools for determining, and a limited understanding of, which data should be managed by hardware and which by software.

Research Details

  • Used SICM to evaluate the potential of mixed data management modes
    • Created new experimental configurations for evaluating the performance potential of mixed data management modes on heterogeneous memories
    • Extended SICM with new profiling capabilities to partition application data into groups that should be HW-managed vs. SW-managed
  • Evaluation
    • Testbed: Intel KNL with 16 GB high bandwidth MCDRAM and 96 GB DDR4
    • Ran five HPC applications selected from CORAL II and SPEC CPU 2017 with:
      • Cache mode: all MCDRAM used as HW-directed cache
      • Flat mode: all MCDRAM managed with SICM-based SW guidance
      • Hybrid mode: 8 GB of MCDRAM managed with HW, 8 GB with SW
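To make the three configurations concrete, the sketch below encodes how the testbed's 16 GB of MCDRAM is split in each mode and applies a simple greedy placement for the SW-managed portion. The `Region` type, the `plan_placement` function, the "HW"/"SW" labels, and the miss-density ordering are all hypothetical illustrations; they are assumptions made for this example and do not describe SICM's actual API or placement policy.

```python
from dataclasses import dataclass
from typing import Dict, List

# GB of MCDRAM used as HW-directed cache vs. SW-managed memory in each mode,
# taken from the evaluation setup listed above (16 GB MCDRAM in total).
MODES: Dict[str, Dict[str, int]] = {
    "cache": {"hw_cache_gb": 16, "sw_managed_gb": 0},
    "flat": {"hw_cache_gb": 0, "sw_managed_gb": 16},
    "hybrid": {"hw_cache_gb": 8, "sw_managed_gb": 8},
}


@dataclass
class Region:
    name: str
    size_gb: float  # assumed to be greater than zero
    misses: int     # profiled last-level cache misses for the region
    label: str      # "HW" or "SW": hypothetical result of the profiling step


def plan_placement(regions: List[Region], mode: str) -> Dict[str, str]:
    """Greedily assign regions to SW-managed MCDRAM; everything else stays in DDR."""
    budget = MODES[mode]["sw_managed_gb"]
    placement = {r.name: "DDR" for r in regions}
    if MODES[mode]["hw_cache_gb"] == 0:
        # No HW cache exists in flat mode, so every region competes for MCDRAM.
        candidates = list(regions)
    else:
        # With a HW cache present, only SW-labelled regions are placed explicitly.
        candidates = [r for r in regions if r.label == "SW"]
    # Prefer regions with the most misses per GB of capacity they consume.
    candidates.sort(key=lambda r: r.misses / r.size_gb, reverse=True)
    for r in candidates:
        if r.size_gb <= budget:
            placement[r.name] = "MCDRAM"
            budget -= r.size_gb
    return placement


regions = [
    Region("a", 6, 600, "SW"),
    Region("b", 4, 100, "SW"),
    Region("c", 2, 900, "HW"),
]
for mode in MODES:
    print(mode, plan_placement(regions, mode))
```

In this sketch, regions left in DDR under cache or hybrid mode are implicitly served through the HW-directed cache portion, which is why HW-labelled regions are not placed explicitly in those modes.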

Citation and DOI

Effler, T. Chad, Michael R. Jantz, and Terry Jones. "Performance Potential of Mixed Data Management Modes for Heterogeneous Memory Systems." In 2020 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC), pp. 10-16. IEEE, 2020. DOI: 10.1109/MCHPC51950.2020.00007

Overview

A new tool developed at ORNL and UTK realizes the performance potential of new complex memory tiers without requiring domain scientists to study and adjust for underlying hardware peculiarities. SW-directed tiering with SICM performs well for most applications but can be slower than HW-directed caching in some cases. Hybrid mode achieves the best of both approaches for all applications evaluated.

Last Updated: February 17, 2021 - 10:07 am