Highlight

Tuyere: Enabling Scalable Memory Workloads for System Exploration

Fig. 1.  Tuyere integrates application, mapping, and system knowledge into hardware simulations.

Achievement

Represent complex application behaviors as concise data-centric abstractions, and use them to construct scalable memory workloads and flexible memory subsystems for experiments.

Significance and Impact

Supercomputers feature increasingly deep and heterogeneous memory hierarchies, while the growing diversity of applications further complicates the design of memory subsystems. Our work expedites system exploration by integrating application knowledge directly into hardware simulations.

Research Details

  • Use the Tuyere framework to express data-access traits, system specifications, and data mappings in a modeling language.
  • Translate abstractions for various configurations and workloads into representative memory traffic.
  • Coordinate the interactions among memories in user-specified memory subsystems.
  • Achieve performance improvements by reducing both storage and profiling overhead.
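
The steps above can be illustrated with a minimal Python sketch. It is not Tuyere's modeling language or API; the names `DataObject`, `Memory`, and `generate_traffic`, the two access patterns, and the base addresses are all hypothetical, chosen only to show how a compact description of data-access traits and a data-to-memory mapping might be expanded into memory traffic.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class DataObject:
    """Hypothetical data-centric description of one application data structure."""
    name: str
    num_elements: int
    element_bytes: int
    pattern: str      # "stream" (unit stride) or "strided"
    stride: int = 1   # stride in elements, used only when pattern == "strided"


@dataclass
class Memory:
    """Hypothetical memory device in a user-specified memory subsystem."""
    name: str
    base_address: int


def generate_traffic(obj: DataObject, mem: Memory) -> Iterator[Tuple[str, int]]:
    """Expand one abstraction into (memory name, byte address) read requests."""
    step = 1 if obj.pattern == "stream" else obj.stride
    for index in range(0, obj.num_elements, step):
        yield mem.name, mem.base_address + index * obj.element_bytes


# Illustrative mapping: a streamed vector placed in "HBM",
# a strided matrix placed in "DDR" (both names are assumptions).
hbm = Memory("HBM", base_address=0x0000_0000)
ddr = Memory("DDR", base_address=0x1000_0000)
vector = DataObject("x", num_elements=8, element_bytes=8, pattern="stream")
matrix = DataObject("A", num_elements=16, element_bytes=8, pattern="strided", stride=4)

mapping = [(vector, hbm), (matrix, ddr)]
trace: List[Tuple[str, int]] = [
    request for obj, mem in mapping for request in generate_traffic(obj, mem)
]
```

In this sketch, changing `num_elements` rescales the generated traffic without storing or re-profiling a full trace, which is the kind of saving the last bullet refers to. A real framework would pass such requests to a simulator rather than collect them in a list.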

Overview

Memory technologies are under active development. Meanwhile, workloads on contemporary computing systems are increasing rapidly in size and diversity. Such dynamics in hardware and software further widen the gap between memory system design and performance evaluation. In this work, we propose a data-centric abstraction of high-performance computing applications for fast exploration of new memory technologies. We also provide a framework that uses a formal modeling language to describe the abstraction, automatically translates abstractions into memory traffic, and directly interfaces with cycle-accurate simulators. We evaluated the framework using 20 workloads and validated the memory traffic profile, the simulation results, and the relative memory changes of four memory technologies. Our results show that the data-centric abstraction can accurately capture application behavior, adapt to different input problems, and expedite system exploration.