A team of scientists from CSMD and NCCS collaborated to benchmark Summit on a graph operation. Summit ranks 4th on the Graph500.org list by the ranking metric, giga-traversed edges per second (GTEPS), and 1st in other important metrics commonly tracked by the community: GTEPS per node, mega-TEPS per core, and GTEPS per memory bandwidth. Summit ran breadth-first search on a very large graph with 1.1 trillion vertices, which takes 281.5 terabytes of memory at 64 bits/edge, using only 2048 nodes and 86016 CPU cores, compared with 10.5 million cores on 40768 nodes for the first-place system. Summit was awarded a certificate at the birds-of-a-feather session at Supercomputing 2019 in Denver, which was received by Jack Wells, Director of Oak Ridge Leadership Computing Facility.
Significance and Impact
Summit has been shown to perform dense linear algebra operations well using graphics processing units (GPUs), and it ranked number 1 in the Top500. However, Summit’s capability for sparse operations such as graph analytics had not been explored. For the first time in the literature, the ORNL team has demonstrated the capability of Summit’s CPUs for sparse graph operations on a standard kernel, breadth-first search. This places Summit’s per-node capability for graph operations at the top among the world’s supercomputers. ORNL scientists are exploring how to leverage this capability for scientific graphs from disciplines such as biology and transportation.
- Breadth-first search (BFS) explores all neighbor vertices of a given root vertex at the current depth before moving on to the vertices at the next depth level. The synthetic input graph consists of 2^40 vertices (~1.1 trillion) and 2^44 edges (~17.6 trillion), and BFS is performed from 64 randomly selected root vertices.
- The implementation used a parallel 2D direction-optimized breadth-first search algorithm, as detailed in 
- The reordered input graph was partitioned across 2048 nodes and 86016 MPI processes arranged as a 336x256 2D grid.
- Graph construction took 454 seconds. The kernel took 2.294 seconds on average, ranging from 1.4 to 9.9 seconds across MPI processes.
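To make the kernel and the process layout described above concrete, here is a minimal single-process sketch in Python. It is not the team's implementation: it shows only a plain top-down, level-synchronous BFS, without the 2D distribution or the direction optimization. The function names `bfs_levels` and `grid_coords` are hypothetical, and the row-major mapping of MPI ranks onto the 336x256 grid is an assumption made for illustration.

```python
from collections import defaultdict


def bfs_levels(edges, root):
    """Level-synchronous BFS on an undirected edge list.

    Returns a dict mapping each reachable vertex to its depth from root.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    depth = {root: 0}
    frontier = [root]
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        # Visit every neighbor of the current depth before going deeper.
        for u in frontier:
            for v in sorted(adjacency[u]):
                if v not in depth:
                    depth[v] = level
                    next_frontier.append(v)
        frontier = next_frontier
    return depth


# Process grid from the text: 336 x 256 = 86016 MPI processes.
GRID_ROWS, GRID_COLS = 336, 256


def grid_coords(rank):
    """Map an MPI rank to (row, col); row-major order is an assumption."""
    return divmod(rank, GRID_COLS)


# Toy graph for illustration only; the benchmark graph has 2^40 vertices.
toy_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
levels = bfs_levels(toy_edges, root=0)
print(levels)             # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
print(grid_coords(86015))  # (335, 255), the last rank in the grid
```

In a 2D decomposition, each process would hold one block of the adjacency matrix, so the frontier exchange is confined to a process row and a process column rather than spanning all 86016 processes.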
Graph500 is an established set of large-scale benchmarks for data-intensive applications; these high-performance applications cannot be improved without a meaningful benchmark. Graphs are a core part of most analytics workloads. Backed by a steering committee of over 30 international HPC experts from academia, industry, and national laboratories, the Graph500 specification establishes a large-scale benchmark for these applications. It is the first serious approach to augment the Top500 with data-intensive applications. The intent of the benchmark problems (“Search” and “Shortest-Path”) is to develop a compact application that has multiple analysis techniques (multiple kernels) accessing a single data structure representing a weighted, undirected graph. The team of scientists from CSMD and NCCS worked for over 9 months to place Summit 4th in Graph500 based on reported GTEPS and 1st in other important metrics commonly tracked by the community: GTEPS per node, mega-TEPS per core, and GTEPS per memory bandwidth.
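The figures quoted in this article can be related to each other with some simple arithmetic, sketched below in Python. Two points are assumptions rather than statements from the text: that each edge is stored as two 64-bit vertex IDs (16 bytes), which is one way to arrive at roughly 281.5 terabytes, and that dividing the total edge count by the mean kernel time gives a usable rough rate. The official Graph500 figure is a harmonic mean over the individual searches and counts only the edges in the traversed component, so the rate computed here is a back-of-envelope estimate, not the reported result.

```python
# Figures quoted in the text.
SCALE = 40                    # graph has 2^40 vertices
LOG2_EDGES = 44               # graph has 2^44 edges
NODES = 2048
MPI_PROCESSES = 86016         # one MPI process per CPU core
MEAN_KERNEL_SECONDS = 2.294   # average BFS kernel time

# Assumption (not stated in the text): an edge is two 64-bit vertex IDs.
BYTES_PER_EDGE = 16

num_vertices = 2 ** SCALE
num_edges = 2 ** LOG2_EDGES
processes_per_node = MPI_PROCESSES // NODES

edge_list_bytes = num_edges * BYTES_PER_EDGE
edge_list_terabytes = edge_list_bytes / 1e12

# Rough estimate only: all edges divided by the mean kernel time.
rough_gteps = num_edges / MEAN_KERNEL_SECONDS / 1e9
rough_gteps_per_node = rough_gteps / NODES
rough_mteps_per_core = rough_gteps * 1e3 / MPI_PROCESSES

print(f"vertices:            {num_vertices:,}")
print(f"edges:               {num_edges:,}")
print(f"processes per node:  {processes_per_node}")
print(f"edge list size:      {edge_list_terabytes:.1f} TB")
print(f"rough GTEPS:         {rough_gteps:.0f}")
print(f"rough GTEPS/node:    {rough_gteps_per_node:.2f}")
print(f"rough MTEPS/core:    {rough_mteps_per_core:.1f}")
```

The per-node and per-core values are simply the aggregate rate divided by the node and core counts, which is why a system with far fewer nodes can lead those rankings while placing lower on aggregate GTEPS.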
Last Updated: April 6, 2021 - 11:59 am