Highlight

DSSE Researcher Leads Team Effort to Rearchitect ORNL DAAC Ingest System with IaC

Achievement

Chris Lindsley, Data System Sciences and Engineering (DSSE) research scientist and Data Architect at the Oak Ridge National Laboratory Distributed Active Archive Center for Biogeochemical Dynamics (ORNL DAAC), recently led a successful team effort to deploy a new data ingest system built using Infrastructure as Code (IaC). The new system replaces a nearly 10-year-old system based on the Red Hat Enterprise Linux 6 (RHEL6) operating system (OS). The IaC architecture provides system documentation within the code, uniformity between the Test and Production environments, and the ability to separate services into functional configurations.

Overview

The researchers were faced with a system that had grown organically over the years. It included mismatched Test and Production environments, a Development environment that had been repurposed to serve as Test, multiple unrelated services clustered together, and system documentation that existed only in the minds of the researchers.

Lindsley and the ORNL DAAC ingest development team used Puppet, an open-source software configuration management tool that supports the current OS, RHEL7. Puppet combines its Ruby-based configuration language with YAML data files, and these files serve as the documentation of the system infrastructure. Versions of the infrastructure are tracked with the Git version control system. When the in-code system documentation (i.e., the manifest) is updated in Git, the change is automatically deployed to the Test environment; once approved, it is deployed to the Production environment. Puppet deploys application code changes the same way, which keeps both the infrastructure and the application code at the specified configuration. New system features include the following:

  • Identical Test and Production environments, yielding a “true Test environment.” New features can be added into Test with confidence that they will deploy properly in Production.
  • Ability to create multiple true Development environments with configurations matching Test and Production.
  • Logical, functional grouping of services, with configuration documentation.
  • In-code system documentation, including enforceable permissions: users must be defined within the system's code to be granted access.
  • Quicker deployment of changes, e.g., upgrading an OS or creating a new server. For instance, if service usage grows enough to require load balancing, an exact duplicate machine can be “stood up” in a matter of minutes.
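The workflow described above rests on declarative, desired-state configuration: the manifest states what each server should look like, and the tool converges each machine to that state. The Python sketch that follows is a hypothetical, simplified illustration of that idea, not Puppet's implementation or API; the resource names, the `plan_changes` and `apply` functions, and the `purge` option are assumptions made for illustration. It shows why applying one Git-tracked manifest to both a Test and a Production machine leaves them identical, and why a second run finds nothing left to change.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: each resource (a user, a file, a service) maps to its desired
# properties. This only illustrates the idea; it is not Puppet's API or data model.
State = Dict[str, Dict[str, str]]
Action = Tuple[str, str, Optional[Dict[str, str]]]


def plan_changes(desired: State, actual: State, purge: bool = False) -> List[Action]:
    """List the actions needed to bring `actual` in line with `desired`."""
    actions: List[Action] = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name, desired[name]))
        elif actual[name] != desired[name]:
            actions.append(("update", name, desired[name]))
    if purge:
        # Assumed option: resources not declared in the manifest (e.g., unlisted users) are removed.
        for name in sorted(actual.keys() - desired.keys()):
            actions.append(("remove", name, None))
    return actions


def apply(desired: State, actual: State, purge: bool = False) -> State:
    """Converge `actual` to `desired` in place; a second run finds nothing to change."""
    for action, name, props in plan_changes(desired, actual, purge):
        if action == "remove":
            del actual[name]
        else:
            # Store a copy so later edits to a machine's state never alter the manifest.
            actual[name] = dict(props or {})
    return actual


# One manifest drives both environments (hypothetical resource names and values).
manifest: State = {
    "user:ingest": {"ensure": "present", "shell": "/bin/bash"},
    "file:/etc/ingest.conf": {"owner": "ingest", "mode": "0640"},
}
test_env: State = {"user:olduser": {"ensure": "present"}}
prod_env: State = {"file:/etc/ingest.conf": {"owner": "root", "mode": "0644"}}

for env in (test_env, prod_env):
    apply(manifest, env, purge=True)

print(test_env == prod_env == manifest)  # True
print(plan_changes(manifest, prod_env, purge=True))  # []
```

Here the `purge` flag stands in for the requirement that users be defined in code: anything not declared is removed. A real configuration management tool also handles resource ordering, dependencies, and reporting, which this sketch omits.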

The ORNL DAAC is funded by the National Aeronautics and Space Administration (NASA) and primarily archives and distributes NASA-funded Terrestrial Ecology data. These architectural improvements help staff simplify the flow of data through the ingest system for distribution to the general public.

Last Updated: January 15, 2021 - 2:10 pm