Design and implementation of an LLVM instrumentation plugin for Score-P
Significance and Impact
This plugin substantially improves on automatic compiler instrumentation: it does not interfere with optimization passes, and it allows functions to be selectively included.
- Add support for Clang and POWER8 to Score-P.
- Leverage our knowledge from developing a GCC instrumentation plugin to bring the benefits of a custom instrumentation pass to LLVM.
- Enable selective instrumentation via Score-P’s filter format.
- The plugin correctly handles exceptions, which automatic compiler instrumentation and the new XRay do not.
- Evaluate its effectiveness on Summitdev.
To investigate software performance, we need a way to intercept an application at runtime, measure what it is doing, store that in a log, and finally analyze and/or visualize the resulting performance data. There are many ways to intercept a running application, e.g. sampling, instrumentation, library wrapping, and tool interfaces. One of Score-P’s main sources of interception is instrumentation. Unlike sampling, which probes at a given frequency, instrumentation provides callbacks before and after a function is called. This enables exact per-function-call measurements of time, performance counters, and invocation counts.
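The enter/exit callback pairing described above can be sketched in plain C++. This is only an illustration of the concept, not how Score-P or the plugin is implemented: `RegionStats`, `region_table`, `open_regions`, and `ScopedRegion` are hypothetical names, and the guard is written by hand here, whereas real instrumentation inserts the callbacks at compile time. Because the exit callback lives in a destructor, it also runs during stack unwinding, so enter and exit events stay balanced when a function throws.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical per-region record; not Score-P's actual data model.
struct RegionStats {
    std::uint64_t calls = 0;                // invocation count
    std::chrono::nanoseconds inclusive{0};  // accumulated inclusive time
};

// Global measurement table (single-threaded sketch, no locking).
inline std::map<std::string, RegionStats>& region_table() {
    static std::map<std::string, RegionStats> table;
    return table;
}

// Number of regions that were entered but not yet exited.
inline int& open_regions() {
    static int depth = 0;
    return depth;
}

// Constructor plays the role of the enter callback, destructor the exit callback.
class ScopedRegion {
public:
    explicit ScopedRegion(const char* name)
        : name_(name), start_(std::chrono::steady_clock::now()) {
        ++region_table()[name_].calls;
        ++open_regions();
    }

    ~ScopedRegion() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        region_table()[name_].inclusive +=
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed);
        --open_regions();
    }

    ScopedRegion(const ScopedRegion&) = delete;
    ScopedRegion& operator=(const ScopedRegion&) = delete;

private:
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

// An "instrumented" function: sums the integers 0 .. n-1.
int work(int n) {
    ScopedRegion region("work");
    assert(n >= 0);
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += i;
    }
    return sum;
}

// An "instrumented" function that leaves via an exception.
void failing_work() {
    ScopedRegion region("failing_work");
    throw std::runtime_error("simulated failure");
}
```

After any sequence of calls, `region_table()` holds an exact call count and accumulated time per region, and `open_regions()` returns to zero even if `failing_work()` was called and its exception caught further up the stack.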
One of the prime providers of instrumentation is automatic compiler instrumentation (-finstrument-functions). Unfortunately, its implementation in most compilers is weak: it switches off parts of the optimization and in general gives no guarantees about implementation details. As a result, with most compilers C++ templates are instantiated but the resulting tiny, multi-level function calls are not inlined, which makes C++ applications very challenging to analyze. A second downside of automatic compiler instrumentation is that excluding certain functions and source files from instrumentation is compiler-specific and often poorly supported; Clang, specifically, has no facility for this. Compile-time filtering is therefore badly needed to reduce instrumentation overhead.
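To make the idea of compile-time filtering concrete, the following is a minimal sketch of an include/exclude decision over function names. It is not the plugin's code: `FilterRule` and `should_instrument` are hypothetical names, only shell-style wildcards are modeled (via POSIX `fnmatch`), and the "last matching rule wins, unmatched functions are instrumented" semantics is an assumption made for this sketch; the Score-P documentation defines the real filter format and its evaluation order.

```cpp
#include <fnmatch.h>  // POSIX shell-style pattern matching

#include <string>
#include <vector>

// Hypothetical rule type: one INCLUDE or EXCLUDE entry with a wildcard pattern.
struct FilterRule {
    bool include;         // true = INCLUDE, false = EXCLUDE
    std::string pattern;  // shell-style wildcard, e.g. "cg_solve*"
};

// Decide whether `function` should be instrumented.
// Assumption of this sketch: rules are applied in order, the last matching
// rule wins, and functions matched by no rule are instrumented.
bool should_instrument(const std::vector<FilterRule>& rules,
                       const std::string& function) {
    bool instrument = true;
    for (const FilterRule& rule : rules) {
        // fnmatch returns 0 when the string matches the pattern.
        if (fnmatch(rule.pattern.c_str(), function.c_str(), 0) == 0) {
            instrument = rule.include;
        }
    }
    return instrument;
}
```

With the rules `{false, "*"}` followed by `{true, "cg_solve*"}`, only functions whose names start with `cg_solve` would be instrumented. Making this decision inside a compiler pass means excluded functions receive no callbacks at all, rather than being filtered at runtime after the call overhead has already been paid.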
The addition of the GCC plugin to Score-P in 2014 marked a major improvement in our ability to avoid excessive instrumentation and the resulting measurement overhead, and only instrument what the compiler would not optimize away. Since Clang and Flang are two of the major compilers for Summit and Sierra, and more compiler vendors are adopting LLVM-based backends for their compilers, we want to bring the benefits of custom instrumentation to LLVM.
We demonstrate the plugin's effectiveness using the miniFE app on 20 threads. The plugin reduces the maximum slowdown from 133.3 to 4.5 and the minimum from 23.3 to 1.2 (Table 2).
Adding Flang support should be straightforward once a Clang-like frontend for Flang is available.