Dr. Christian Engelmann is a Senior Scientist and the Intelligent Systems and Facilities Group Leader at Oak Ridge National Laboratory, the US Department of Energy’s (DOE) largest multiprogram science and technology laboratory, with an annual budget of $2.2 billion. He has more than 20 years of experience in software research and development for extreme-scale high-performance computing (HPC) systems and a strong funding and publication record. In collaboration with other laboratories and universities, Dr. Engelmann’s research addresses computer science challenges in HPC software, such as scalability, dependability, energy efficiency, and portability.
His primary expertise is in HPC resilience, i.e., providing efficiency and correctness in the presence of faults, errors, and failures through avoidance, masking, and recovery. Dr. Engelmann is a leading expert in HPC resilience and was a member of the DOE Technical Council on HPC Resilience from 2013 to 2015. He received the 2015 DOE Early Career Award for research in resilience design patterns for extreme-scale HPC. His secondary expertise is in lightweight simulation of future-generation extreme-scale supercomputers with millions of processing units, studying the impact of hardware and software properties on the key HPC system design factors: performance, resilience, and power consumption.
Dr. Engelmann’s ongoing research program targets computer science challenges in machine-in-the-loop operational intelligence (OI) for smart systems, instruments, and facilities. Leveraging operational data analytics in a control loop, machine-in-the-loop OI maximizes productivity and minimizes costs through adaptive autonomous operation. Application areas in HPC include optimizing (i) scientific application performance and productivity, (ii) system performance and productivity, and (iii) system and center operational costs and productivity. Application areas in federated instruments, laboratories, and facilities include (i) autonomous operation of instruments and laboratories, (ii) optimizing the orchestration and utilization of federated instruments, laboratories, and facilities, and (iii) autonomous operation of federated instruments, laboratories, and facilities. Some of the computer science research challenges are (i) identification and collection of relevant operational data, (ii) combining offline with online data analytics, learning, and decision making using artificial intelligence, (iii) understanding and modeling the involved trade-offs for decision making, (iv) design of experiments, and (v) leveraging community software tools for reusability and maintainability.
Dr. Engelmann earned a Dipl.-Ing. (FH), a German engineering degree equivalent to an M.Sc., in Computer Systems Engineering from the University of Applied Sciences Berlin, Germany, in 2001; an M.Sc. in Computer Science from the University of Reading, UK, also in 2001, as a conjoint degree; and a Ph.D. in Computer Science from the University of Reading in 2008. He is a Senior Member of the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE). He is also a Member of the Society for Industrial and Applied Mathematics (SIAM) and the Advanced Computing Systems Association (USENIX).
Last Updated: January 11, 2021 - 9:16 am