Proposal for a new OpenSHMEM API that provides persistent, reusable active Sets and an extension of them, called Groups, that hides the management of symmetric scratchpad data from the application programmer.
Significance and Impact
Collective operations in the OpenSHMEM programming model are defined over an active Set, a grouping of Processing Elements (PEs) described by a triple: the starting PE, a log2 stride, and the size of the active Set. In addition to the active Set, collectives require Users to allocate and initialize synchronization (i.e., pSync) and scratchpad (i.e., pWrk) buffers for use by the collective operations. While active Sets and user-defined buffers were useful given earlier hardware and algorithmic considerations, future systems and applications require us to re-evaluate these concepts. We propose APIs for Set creation (stride, list, union, intersection, and difference), query, index, translate, and free. These operations are local, and the Set handle is private to the PE. A Set is then used to create a Group when the Set needs to perform a collective. By decoupling Sets from Groups in OpenSHMEM, we delay the expensive group creation operation until the application actually needs it, and we avoid the unnecessary cost of group creation in embarrassingly parallel, compute-intensive applications where Sets alone are sufficient for work distribution. Groups abstract Sets and couple them with memory spaces, which removes the allocation and initialization burden from the User. The Groups API supports group creation (from a Set), split on color, duplicate, free, and query. In this paper, we present the API, the implementation logic, and the performance results.
We implement the proposed API in the OSH-X reference implementation, which uses the UCX communication library. To evaluate Sets and Groups, we run several micro-benchmarks that measure the overhead of these abstractions, and we demonstrate their utility by implementing a distributed All-Pairs Shortest Path (APSP) application, which we evaluate on multiple synthetic and real-world graphs. For our evaluations, we used the Eos and Titan machines at the Oak Ridge Leadership Computing Facility (OLCF).
Sets and Groups provide useful abstractions for the User to create and manipulate persistent groups of PEs for collective operations. However, existing collective operations expect the User to supply an active set, which limits the participating PEs to those reachable with a power-of-two (log2) stride. To support the APIs presented in this paper, we have extended the collective interfaces with Sets- and Groups-based equivalents. The barrier operation illustrates this extension. The existing barrier requires the User to define an active set (i.e., a triple defining a starting index, log2 stride, and size) and to allocate and initialize a synchronization buffer (i.e., a pSync array), for a total of four parameters. In the Sets-based interface, the Set replaces the three active-set parameters, but the synchronization buffer parameter remains. In the Groups-based interface, the call is further reduced to a single parameter, the Group, because the Group couples the Set with the resources that the collective needs.