We show that our data compression technique achieves both storage reduction and performance improvement in a distributed embedded key-value store (KVS) by exploiting the deep memory hierarchy of an HPC system. Our data encryption technique enables HPC users to share sensitive data securely with others at nearly imperceptible cost. We evaluate the benefits and costs of incorporating these capabilities at different points in the dataflow path. Our experimental results on CSCS's Grand Tave and NERSC's Cori show that there are distinct advantages to integrating data compression and encryption in persistent KVSs for HPC. This paper has been published in the International Journal of High Performance Computing Applications (IJHPCA).
Significance and Impact
This is the first work to (1) integrate data compression and encryption into a distributed embedded key-value store, (2) compress in-use data in key-value stores, (3) introduce a new data structure that exploits high bandwidth memory in key-value stores, (4) enable users to configure the compression and encryption algorithms for the different dataflow paths, and (5) show how the deep memory hierarchies of modern HPC systems affect the performance of data compression and encryption.
- We evaluate the benefits and costs of several designs for implementing data compression and encryption capabilities at different points in the dataflow path of a distributed embedded key-value store (DEKVS), illustrating differences in effective bandwidth, latency, and additional computational expense.
- We present a data compression implementation that exploits the deep memory hierarchy of HPC systems to achieve both storage reduction and performance improvement.
- We propose a data encryption technique that adds an extra layer of security to data management in scientific workflows.
- We integrate our data compression and encryption techniques into PapyrusKV and empirically evaluate these techniques on two HPC systems (CSCS's Grand Tave and NERSC's Cori).
Persistent data structures such as key-value stores (KVSs), which are stored in an HPC system's nonvolatile memory, have recently emerged as an attractive solution to a number of challenges, such as limited I/O performance. Data compression and encryption are two well-known techniques for improving several properties of such data-oriented systems. This paper investigates how to efficiently integrate data compression and encryption into persistent KVSs for HPC, with the ultimate goal of hiding their costs and complexity in terms of performance and ease of use. Our compression technique exploits the deep memory hierarchy of an HPC system to achieve both storage reduction and performance improvement. Our encryption technique provides a practical level of security and enables secure sharing of sensitive data in complex scientific workflows at nearly imperceptible cost. We implement the proposed techniques on top of a distributed embedded KVS to evaluate the benefits and costs of incorporating these capabilities at different points in the dataflow path, illustrating differences in effective bandwidth, latency, and additional computational expense on CSCS's Grand Tave and NERSC's Cori.
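As a rough illustration of what placing compression and encryption in a KVS dataflow path can look like, the following is a minimal single-process sketch. It is not PapyrusKV's API or the paper's implementation: `ToyKVS`, `toy_keystream_xor`, and all parameters are hypothetical names introduced here, `zlib` stands in for whichever compression algorithm a user would configure, and the XOR keystream is an insecure placeholder for a real cipher such as AES.

```python
import hashlib
import zlib


def toy_keystream_xor(data: bytes, key: bytes) -> bytes:
    """Placeholder cipher (assumption, NOT secure): XOR the data with a
    keystream built from SHA-256 over key || counter. Applying it twice
    with the same key restores the original bytes."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKVS:
    """Hypothetical in-memory store with configurable put/get transforms."""

    def __init__(self, key: bytes, compress: bool = True, encrypt: bool = True):
        self._key = key
        self._compress = compress
        self._encrypt = encrypt
        self._store = {}

    def put(self, k: str, value: bytes) -> None:
        blob = value
        # Compress first: well-encrypted data looks random and barely compresses.
        if self._compress:
            blob = zlib.compress(blob)
        if self._encrypt:
            blob = toy_keystream_xor(blob, self._key)
        self._store[k] = blob

    def get(self, k: str) -> bytes:
        blob = self._store[k]
        # Undo the transforms in reverse order.
        if self._encrypt:
            blob = toy_keystream_xor(blob, self._key)
        if self._compress:
            blob = zlib.decompress(blob)
        return blob

    def stored_size(self, k: str) -> int:
        """Number of bytes actually held for key k after the transforms."""
        return len(self._store[k])


if __name__ == "__main__":
    kvs = ToyKVS(key=b"demo-key")
    payload = b"temperature=300.0;" * 500
    kvs.put("sample", payload)
    print(len(payload), kvs.stored_size("sample"))
    assert kvs.get("sample") == payload
```

The sketch applies both transforms on every `put`; a real distributed store could instead apply them at other points in the dataflow path (for example, only when data leaves memory for persistent storage), which is the kind of trade-off the evaluation above compares.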