Highlight

Are We Witnessing the Spectre of an HPC Meltdown?

Job size relative to system size
Results on Titan for job sizes ranging from a single compute node to the full system. Post-patch performance is reported normalized to the pre-patch measurement; values below 1.0 indicate a performance degradation.
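The normalization used in the figure can be sketched as a simple ratio. This is an illustrative sketch only: the function name, the `higher_is_better` flag, and the sample numbers are hypothetical and are not taken from the study, which does not state here whether the underlying metric is a runtime or a throughput.

```python
def normalized_performance(before, after, higher_is_better=True):
    """Return post-patch performance relative to the pre-patch baseline.

    A result below 1.0 means the patched system performed worse.
    """
    if before <= 0 or after <= 0:
        raise ValueError("measurements must be positive")
    if higher_is_better:
        # Throughput-style metric (e.g. GB/s): larger is better.
        return after / before
    # Time-style metric (e.g. seconds): smaller is better, so invert.
    return before / after


# Hypothetical numbers for illustration only, not results from the study.
runtime_ratio = normalized_performance(100.0, 104.0, higher_is_better=False)
bandwidth_ratio = normalized_performance(50.0, 45.0)
print(f"runtime-based ratio:   {runtime_ratio:.3f}")
print(f"bandwidth-based ratio: {bandwidth_ratio:.3f}")
```

Inverting the ratio for time-based metrics keeps the convention of the figure: whatever the metric, a value below 1.0 reads as a slowdown after patching.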

Achievement

Determined the performance impact of security patches for the Spectre and Meltdown vulnerabilities on Oak Ridge Leadership Computing Facility (OLCF) supercomputers.

Significance and Impact

Beyond the serious security risk the vulnerabilities pose, initial reports show that the available fixes can reduce performance by anywhere from a few percent to nearly 50%, depending on the application. The highest impacts are expected in applications that perform a large number of system calls (e.g., I/O operations). Fortunately, our study shows that HPC applications are largely unaffected by the performance degradations observed in other scenarios such as cloud computing.

Research Details

  • Studied performance of a wide range of HPC benchmarks and applications before and after security patches
  • Focused on HPC systems supplied by Cray, which covered a wide range of processors and interconnect technologies

Overview

This study summarizes the performance impacts observed on four different Cray architectures available at the OLCF. The applications and benchmarks were selected because they are commonly used at the OLCF. In addition, applications with different characteristics were included to give a comprehensive view of the potential impacts of the Spectre and Meltdown patches.

Due to the significant security risk that the vulnerabilities present, systems had to be patched immediately following the release of the patches from Cray. In some cases, this limited the number of tests that could be conducted before a system was patched.

The results show that the overall impact of the available patches is minimal, and in some cases performance actually improved due to processor firmware updates. This is an encouraging result for the OLCF’s user community, which can be assured that the deployed vulnerability mitigations will not decrease their scientific productivity.

Publication

Veronica G. Vergara Larrea et al., “Are We Witnessing the Spectre of an HPC Meltdown?”, Cray User Group 2018, Stockholm, Sweden, May 2018. (Selected for publication in a special issue of the journal Concurrency and Computation: Practice and Experience.)