Berkeley Lab and ESnet Document Flow, Performance of 56 Terabytes Climate Data Transfer

Visualization by Prabhat (Berkeley Lab).
The simulated storms seen in this visualization are generated from the finite volume version of NCAR’s Community Atmosphere Model. Visualization by Prabhat (Berkeley Lab).

In a recent paper entitled “An Assessment of Data Transfer Performance for Large‐Scale Climate Data Analysis and Recommendations for the Data Infrastructure for CMIP6,” experts from Lawrence Berkeley National Laboratory (Berkeley Lab) and ESnet (the Energy Sciences Network, document the data transfer workflow, data performance, and other aspects of transferring approximately 56 terabytes of climate model output data for further analysis.

The data, required for tracking and characterizing extratropical storms, needed to be moved from the distributed Coupled Model Intercomparison Project (CMIP5) archive to the National Energy Research Supercomputing Center (NERSC) at Berkeley Lab.

The authors found that there is significant room for improvement in the data transfer capabilities currently in place for CMIP5, both in terms of workflow mechanics and in data transfer performance. In particular, the paper notes that performance improvements of at least an order of magnitude are within technical reach using current best practices.

To illustrate this, the authors used Globus to transfer the same raw data set between NERSC and Argonne Leadership Computing Facility (ALCF) at Argonne National Lab.

Read the Globus story: https://www.globus.org/user-story-lbl-and-esnet
Read the paper: https://arxiv.org/abs/1709.09575