Recent decades have witnessed an explosive growth of complex systems, such as the Internet, biological networks, social networks, and various infrastructure networks. To model these systems, synthetic networks are typically being used. Many of the networks are scale-free and massive in size. Here, we present a novel parallel algorithm for generating scale-free synthetic networks using the Preferential Attachment model. Our algorithm, named cuPPA, is custom-designed for "single instruction multiple data (SIMD)" style of parallel processing supported by modern processors such as graphical processing units (GPUs). To the best of our knowledge, our algorithm is one of the fastest implementations available today to generate scale-free networks. In one of the best cases, execution on an NVidia Tesla V100 GPU, cuPPA can generate a scale-free network of over four billion edges in a matter of seconds. We also present the challenges and techniques to address those challenges in designing and developing a multi GPU version of the algorithm.
Significance and Impact
Simulation of complex systems require realistic networks. cuPPA provides the first GPU based networks using the preferential-attachment model with the underlying assumptions of real-world network growth. As more and more simulations are applications are being developed in systems consisted of distributed computing nodes with GPUs, cuPPA provide a viable on-the-fly realistic random networks to test the scalability and performance of the developed systems. Therefore, cuPPA can provide a consistent benchmark with the ability to generate networks with billions of vertices and edges.
- Designed and implemented distributed memory parallel algorithm for generating billions of vertices and edges
- Provided exhaustive theoretical and experimental analysis of the system
A novel GPU-based algorithm, named cuPPA, has been presented, with a detailed performance study, and its combination of its scale and speed has been tested by achieving the ability to generate networks with up to two billion edges in under three seconds of wall clock time. The algorithm is customizable with respect to the structure of the network by varying a single parameter, namely, a probability measure that captures the preference style of new edges in the preferential attachment model. Also, a high amount of concurrency in the generator's workload per thread or processor is observed when that probability is at very small fractions greater than zero.