Highlight

Advanced Health Information Technology Analytic Framework and Application to Hazard Detection

We use Apache Kafka, Spark Structured Streaming Engine, Delta Lake, and Power BI to meet the requirements of the advanced HIT-HD system.

Achievement

A team of researchers from Oak Ridge National Laboratory (ORNL) developed a health information technology (HIT) data and compute platform that supports multi-granularity real-time analytics from heterogeneous data sources. The work first identifies functional requirements and then proposes a framework that satisfies them using state-of-the-art big data technologies, including Apache Kafka, the Spark Structured Streaming engine, and Delta Lake. To demonstrate the framework's ability to support analytics at multiple time granularities, a statistical process control-based hazard detection algorithm was implemented on top of it to detect unexpected hazards in order cancellation data from the US Department of Veterans Affairs (VA) in near real time. Initial evaluation shows a significant difference in execution time between sequential and parallel hazard detection.
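As a rough illustration of what statistical process control-based detection can look like, the sketch below derives Shewhart-style control limits from a baseline of counts and flags new counts that fall outside them. The highlight does not say which control chart the HIT-HD system uses, so the three-sigma rule, the function names `control_limits` and `detect_hazards`, and the sample counts are all assumptions made for illustration.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (center, lower, upper) limits computed from baseline counts.

    Assumption: a simple mean +/- k * sigma rule; the actual HIT-HD
    algorithm may use a different chart or rule set.
    """
    center = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation
    lower = max(0.0, center - k * sigma)  # counts cannot be negative
    upper = center + k * sigma
    return center, lower, upper


def detect_hazards(baseline, observed, k=3.0):
    """Return the indices of observed counts outside the control limits."""
    _, lower, upper = control_limits(baseline, k)
    return [i for i, x in enumerate(observed) if x < lower or x > upper]


# Hypothetical daily order-cancellation counts (made-up numbers).
baseline_counts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 20]
new_counts = [21, 19, 35, 22]

flagged = detect_hazards(baseline_counts, new_counts)
print(flagged)
```

With these made-up numbers, only the third new count (35) lies above the upper limit. In a streaming setting, the same check would presumably run on each newly aggregated window rather than on a fixed list.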

Significance and Impact

Health Information Technology (HIT) aims to improve healthcare outcomes by organizing and analyzing various health-related data. With data accumulating at a staggering rate, real-time analytics has become dramatically more important, shifting the focus of informatics from batch processing to streaming analytics. HIT also faces unprecedented challenges in adapting to this new requirement and in leveraging advanced IT technologies. This work introduces a HIT data and compute platform that supports multi-granularity real-time analytics from heterogeneous data sources. The work first identifies functional requirements and then proposes a framework that satisfies them using state-of-the-art big data technologies, including Apache Kafka, the Spark Structured Streaming engine, and Delta Lake. To demonstrate the framework's ability to support analytics at multiple time granularities, a statistical process control-based hazard detection algorithm was implemented on top of it to detect unexpected hazards in order cancellation data from the US Department of Veterans Affairs (VA) in near real time.

Research Details

  • Introduces a real-time, multi-granular HIT streaming data analytic platform, designed and implemented by identifying functional requirements and selecting the technologies evaluated as the best fit.
  • Demonstrates the analytic platform with an end-to-end HIT hazard detection (HIT-HD) system developed at Oak Ridge National Laboratory for the Department of Veterans Affairs (VA).
  • Initial evaluation shows a significant execution time difference between sequential and parallel HIT-HD.

Citation and DOI:

Mohit Kumar, Sangkeun Lee, Byung Hoon Park, James Blum, Merry Ward, and Jonathan R. Nebeker. Advanced Health Information Technology Analytic Framework and Application to Hazard Detection. In Proceedings of the 2nd International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD) 2020, Atlanta, GA, USA, December 10, 2020.

Overview

Health Information Technology (HIT) aims to improve healthcare outcomes by organizing and analyzing various health-related data. With data accumulating at a staggering rate, real-time analytics has become dramatically more important, shifting the focus of informatics from batch processing to streaming analytics. HIT also faces unprecedented challenges in adapting to this new requirement and in leveraging advanced IT technologies. This work introduces a HIT data and compute platform that supports multi-granularity real-time analytics from heterogeneous data sources. The work first identifies functional requirements and then proposes a framework that satisfies them using state-of-the-art big data technologies, including Apache Kafka, the Spark Structured Streaming engine, and Delta Lake. To demonstrate the framework's ability to support analytics at multiple time granularities, a statistical process control-based hazard detection algorithm was implemented on top of it to detect unexpected hazards in order cancellation data from the US Department of Veterans Affairs (VA) in near real time. Initial evaluation shows a significant execution time difference between sequential and parallel HIT-HD.
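To make "multiple time granularities" concrete, the following minimal sketch counts the same events per hour and per day. It is an in-memory illustration only, not the platform's implementation: a streaming engine would typically do this with windowed aggregations over incoming data. The `GRANULARITIES` table, the function name `count_by_granularity`, and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Assumed bucket formats, one per granularity (illustrative choice).
GRANULARITIES = {
    "hour": "%Y-%m-%d %H:00",
    "day": "%Y-%m-%d",
}


def count_by_granularity(timestamps):
    """Count events per time bucket, once for each granularity."""
    counts = {name: Counter() for name in GRANULARITIES}
    for ts in timestamps:
        for name, fmt in GRANULARITIES.items():
            # Truncate the timestamp to its bucket label and count it.
            counts[name][ts.strftime(fmt)] += 1
    return counts


# Hypothetical order-cancellation event times (made-up values).
events = [
    datetime(2020, 12, 10, 9, 5),
    datetime(2020, 12, 10, 9, 40),
    datetime(2020, 12, 10, 14, 15),
    datetime(2020, 12, 11, 9, 30),
]

counts = count_by_granularity(events)
print(dict(counts["hour"]))
print(dict(counts["day"]))
```

Each granularity yields its own count series, so a detection rule can be applied to the hourly and daily series independently.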

Last Updated: January 17, 2021 - 3:16 pm