Highlight

Querying Large Scientific Data Sets with Adaptable IO System ADIOS

Fig. 1. Query processing time with an S3D data set (1.67×10⁹ records in total, organized into a 3D array of 1100×1080×1408; 14 GB total size). Both minmax and Fastbit+Read can be orders of magnitude faster when the data of interest is a small fraction of the total data size.

Achievement

We designed a query interface for ADIOS that allows arbitrary combinations of range conditions on known variables, implemented several mechanisms for resolving these selection conditions, and devised strategies to reduce the time needed to retrieve the scattered data records. In many cases, the query mechanisms retrieve the selected data records orders of magnitude faster than a brute-force approach. Our work relies heavily on the in situ data processing feature of ADIOS, which allows user functions to be executed in its data transport pipeline. This feature allowed us to build indexes for efficient query processing and to perform other intricate analyses while the data is still in memory.
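To make "arbitrary combinations of range conditions" concrete, here is a minimal Python sketch of a query expression tree resolved by a brute-force scan, the baseline that the indexed mechanisms aim to beat. The classes `Range`, `And`, `Or` and the function `select` are hypothetical illustrations, not the ADIOS Query API; the half-open interval semantics and the variable names `temperature` and `pressure` are also assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Union

Data = Dict[str, Sequence[float]]  # variable name -> flattened values (hypothetical layout)


@dataclass(frozen=True)
class Range:
    """Hypothetical range condition: selects records where lo <= data[var] < hi."""
    var: str
    lo: float
    hi: float

    def evaluate(self, data: Data) -> List[bool]:
        return [self.lo <= v < self.hi for v in data[self.var]]


@dataclass(frozen=True)
class And:
    """Both sub-conditions must hold for a record to be selected."""
    left: "Condition"
    right: "Condition"

    def evaluate(self, data: Data) -> List[bool]:
        return [a and b for a, b in zip(self.left.evaluate(data), self.right.evaluate(data))]


@dataclass(frozen=True)
class Or:
    """At least one sub-condition must hold for a record to be selected."""
    left: "Condition"
    right: "Condition"

    def evaluate(self, data: Data) -> List[bool]:
        return [a or b for a, b in zip(self.left.evaluate(data), self.right.evaluate(data))]


Condition = Union[Range, And, Or]


def select(cond: Condition, data: Data) -> List[int]:
    """Brute-force resolution: scan every record and return the indices that satisfy cond."""
    return [i for i, hit in enumerate(cond.evaluate(data)) if hit]


# Example query: temperature in [1500, 2000) AND (pressure in [0.9, 1.1) OR pressure in [2.0, 3.0))
data = {
    "temperature": [1200.0, 1600.0, 1800.0, 2100.0, 1550.0],
    "pressure": [1.0, 2.5, 0.5, 1.0, 1.05],
}
query = And(Range("temperature", 1500, 2000),
            Or(Range("pressure", 0.9, 1.1), Range("pressure", 2.0, 3.0)))
print(select(query, data))  # [1, 4]
```

Keeping the query as a tree separates what is selected from how each condition is resolved, so a full scan, a block filter, or an index lookup could stand in for `Range.evaluate` without changing the query itself.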

The paper received the Outstanding Technical Paper Award at the Asian Conference on Supercomputing Frontiers (SCFA 2018).

Research Details

  • We designed the ADIOS Query interface, which allows users to search for data of interest using arbitrary combinations of range conditions on known variables in their data sets.
  • We implemented multiple query mechanisms in ADIOS, including one that uses FastBit indexing to pinpoint the exact data cells that satisfy the query conditions, and one that uses min/max values recorded for each block of data to quickly find the blocks that contain data of interest.
  • We tested the query mechanisms on scientific data sets of accelerator particle data (from IMPACT) and combustion data (from S3D), and examined the trade-offs of each mechanism.
  • We showed that the query interface can speed up the retrieval of data of interest from the large output data sets of scientific applications by orders of magnitude, particularly when the selected data is a small fraction of the total.
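The min/max mechanism described above can be sketched as block pruning: the minimum and maximum of each block are recorded at write time, and a block is read only if its [min, max] interval can overlap the query range. This is a simplified illustration that assumes a single variable stored as 1D blocks laid end to end; the function names are hypothetical and do not correspond to ADIOS functions.

```python
from typing import List, Sequence, Tuple


def block_minmax(blocks: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Record (min, max) for each block, as a writer could do when the block is output."""
    return [(min(b), max(b)) for b in blocks]


def minmax_query(blocks: Sequence[Sequence[float]],
                 stats: Sequence[Tuple[float, float]],
                 lo: float, hi: float) -> Tuple[List[int], int]:
    """Return (global indices i with lo <= value < hi, number of blocks read).

    Blocks are assumed to be laid out one after another in a 1D global index space.
    A block whose [min, max] interval cannot overlap [lo, hi) is skipped unread.
    """
    hits: List[int] = []
    blocks_read = 0
    offset = 0
    for block, (bmin, bmax) in zip(blocks, stats):
        if bmax >= lo and bmin < hi:
            # Candidate block: in a real system this is where it would be read from storage.
            blocks_read += 1
            hits.extend(offset + i for i, v in enumerate(block) if lo <= v < hi)
        offset += len(block)
    return hits, blocks_read


blocks = [[1.0, 2.0, 3.0], [10.0, 12.0, 11.0], [4.0, 9.5, 5.0], [20.0, 21.0]]
stats = block_minmax(blocks)
print(minmax_query(blocks, stats, 9.0, 11.0))  # ([3, 7], 2)
```

Here only 2 of the 4 blocks are touched. Every block that is read is still scanned in full, which is the trade-off relative to an index such as FastBit that pinpoints the exact matching cells.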

Overview

In this paper, we design a query interface for ADIOS that allows arbitrary combinations of range conditions on known variables, implement a number of different mechanisms for resolving these selection conditions, and devise strategies to reduce the time needed to retrieve the scattered data records. In many cases, the query mechanism can retrieve the selected data records orders of magnitude faster than the brute-force approach. Our work relies heavily on the in situ data processing feature of ADIOS, which allows user functions to be executed in the data transport pipeline. This feature allows us to build indexes for efficient query processing and to perform other intricate analyses while the data is still in memory.
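Building an index in situ can be illustrated with a simplified binned bitmap index, created by a hook that runs on each block while it is still in memory. Bins that lie entirely inside the query range are answered from their bitmaps alone, and only records in partially covered edge bins are checked against raw values. This sketch is written in the spirit of FastBit's binned bitmap indexes but is uncompressed and far simpler than FastBit itself; `BinnedBitmapIndex`, `on_block_written`, `index_store`, and the bin edges are hypothetical, and values are assumed to be finite.

```python
import bisect
from typing import Dict, Iterator, List, Sequence


def bit_positions(bm: int) -> Iterator[int]:
    """Yield the positions of the set bits of a Python int used as a bitmap."""
    i = 0
    while bm:
        if bm & 1:
            yield i
        bm >>= 1
        i += 1


class BinnedBitmapIndex:
    """Uncompressed, binned bitmap index over the (finite) values of one data block.

    Bin k covers [edges[k], edges[k + 1]); two open-ended bins catch values
    below the first or above the last user-supplied edge.
    """

    def __init__(self, values: Sequence[float], inner_edges: Sequence[float]):
        self.values = list(values)  # kept for candidate checks on partially covered bins
        self.edges = [float("-inf"), *sorted(inner_edges), float("inf")]
        self.bitmaps = [0] * (len(self.edges) - 1)  # bit i of bitmaps[k] set => record i in bin k
        for i, v in enumerate(self.values):
            k = bisect.bisect_right(self.edges, v) - 1
            self.bitmaps[k] |= 1 << i

    def query(self, lo: float, hi: float) -> List[int]:
        """Return the indices i with lo <= values[i] < hi."""
        result = 0
        for k, bm in enumerate(self.bitmaps):
            a, b = self.edges[k], self.edges[k + 1]
            if b <= lo or a >= hi:
                continue  # bin lies entirely outside [lo, hi)
            if lo <= a and b <= hi:
                result |= bm  # bin lies entirely inside [lo, hi): no raw values needed
            else:
                # Edge bin: check the raw values of its records (candidate check).
                for i in bit_positions(bm):
                    if lo <= self.values[i] < hi:
                        result |= 1 << i
        return list(bit_positions(result))


# Hypothetical in situ hook: imagine the I/O pipeline invoking it for every block
# while the data is still in memory. This is not an actual ADIOS plugin signature.
index_store: Dict[int, BinnedBitmapIndex] = {}


def on_block_written(block_id: int, values: Sequence[float]) -> None:
    index_store[block_id] = BinnedBitmapIndex(values, inner_edges=[0.0, 5.0, 10.0, 15.0])


on_block_written(0, [2.0, 7.5, 12.0, 4.9, 16.0, -1.0])
print(index_store[0].query(3.0, 11.0))  # [1, 3]
```

In the query above, bin [5, 10) is fully covered and contributes record 1 without any value check, while the edge bins [0, 5) and [10, 15) require candidate checks, which is where reads of the raw data would occur in a real system.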