### Oak Ridge, TN

#### August 01, 2018

**Date/Time:** August 01, 2018, 2:00 – 3:00 p.m.

**Location:** Building 4100, Room J302

**Abstract:**

Maximum likelihood estimation is an important statistical technique for estimating missing data, for example in climate and environmental applications, where datasets are usually large and feature irregularly spaced data points. In particular, the Gaussian log-likelihood function is the de facto model, and it operates on the resulting sizable dense covariance matrix. The advent of high-performance systems with advanced computing power and memory capacity has enabled full simulations only for rather small-dimensional climate problems, solved to machine-precision accuracy. The challenge for high-dimensional problems lies in the computational requirements of the log-likelihood function, which necessitates O(n^2) storage and O(n^3) operations, where n represents the number of given spatial locations. This prohibitive computational cost may be reduced by using approximation techniques that not only enable large-scale simulations that would otherwise be intractable but also maintain the accuracy and fidelity of the spatial statistics model.
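To make the O(n^2) storage and O(n^3) operation counts concrete, the following is a minimal pure-Python sketch of an exact Gaussian log-likelihood evaluation via a Cholesky factorization. It is an illustration only, not ExaGeoStat code: the exponential covariance kernel, its `variance` and `length_scale` parameters, and all function names are assumptions made for this example.

```python
import math

def exp_covariance(locs, variance, length_scale):
    """Dense n x n covariance matrix (O(n^2) storage).

    Assumed kernel for illustration: variance * exp(-distance / length_scale).
    """
    n = len(locs)
    return [[variance * math.exp(-math.dist(locs[i], locs[j]) / length_scale)
             for j in range(n)] for i in range(n)]

def cholesky(a):
    """Lower-triangular L with L L^T = a (O(n^3) operations)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            t = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = t / L[j][j]
    return L

def forward_solve(L, z):
    """Solve L y = z for y by forward substitution."""
    y = []
    for i in range(len(z)):
        y.append((z[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def gaussian_loglik(locs, z, variance, length_scale):
    """Zero-mean Gaussian log-likelihood:
    -1/2 * (n*log(2*pi) + log|Sigma| + z^T Sigma^{-1} z).
    """
    n = len(z)
    L = cholesky(exp_covariance(locs, variance, length_scale))
    y = forward_solve(L, z)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))  # log|Sigma|
    quad = sum(v * v for v in y)                              # z^T Sigma^{-1} z
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)
```

A maximum likelihood fit would call `gaussian_loglik` repeatedly inside an optimizer over the kernel parameters, so the cubic factorization cost is paid at every iteration.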

We present ExaGeoStat, high-performance software for geospatial statistics in climate and environment modeling using full-tile (i.e., exact) computation on manycore architectures. We also extend this software to support the Tile Low-Rank (TLR) approximation, which exploits the data sparsity of the dense covariance matrix by compressing the off-diagonal tiles up to a user-defined accuracy threshold. The underlying linear algebra operations may then be carried out on this compressed data format, which may ultimately reduce the arithmetic complexity of the maximum likelihood estimation and the corresponding memory footprint. We analyze the performance of the proposed software across the most recent Intel architectures. Using the TLR approximation, we achieve up to 4.48X speedup compared to a full-tile solution on an Intel 56-core scalable Skylake chip, 9.16X speedup on an Intel 28-core Broadwell chip, 6.21X speedup on an Intel 36-core Haswell chip, and 13X speedup on a Knights Landing (KNL) chip. On a distributed system, TLR-based computations on Shaheen-II attain up to 5X speedup compared to full-accuracy simulations using synthetic and real datasets (up to 2M), while ensuring adequate prediction accuracy.
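As a rough sketch of the tile compression idea described above, an off-diagonal tile of the covariance matrix can be replaced by a low-rank product. The notation below (tile size n_b, per-tile rank k_ij, threshold ε, and the choice of the Frobenius norm) is an assumption made for illustration and is not taken from the talk:

$$
\Sigma_{ij} \approx U_{ij} V_{ij}^{\top}, \qquad
U_{ij}, V_{ij} \in \mathbb{R}^{n_b \times k_{ij}}, \qquad
\lVert \Sigma_{ij} - U_{ij} V_{ij}^{\top} \rVert_F \le \varepsilon
\quad (i \ne j)
$$

Under these assumptions, a compressed tile stores 2 n_b k_ij values instead of n_b^2, so compression saves memory whenever k_ij < n_b / 2, and tighter thresholds ε generally require larger ranks.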

**Biography:**

Sameh Abdulah is a Postdoctoral Fellow at the Extreme Computing Research Center, King Abdullah University of Science and Technology, Saudi Arabia. Sameh received his MS and PhD degrees from The Ohio State University, Columbus, in 2014 and 2016, respectively. His work is centered around High Performance Computing (HPC) applications, bitmap indexing in big data, large spatial datasets, parallel statistical applications, algorithm-based fault tolerance, and Machine Learning and Data Mining algorithms.

Host: Scott Klasky, 865-241-9980, klasky@ornl.gov