Here are some plots of data transfers posted by Alex Sim of the Scientific Data Management Group at Lawrence Berkeley National Lab, which is headed by Arie Shoshani. Sim et al. have been deeply engaged with data movements for the Earth System Grid (ESG) and Climate100 projects.
The Australian National University/NERSC data transfer rates are impressive. As you can see from this graph, about a third of a terabyte of data flows over the network in under 10 minutes. ESnet carries the data from the Bay Area to Pacific Wave in Seattle, and from there it continues across the Pacific to Australia on AARNET. Three groups at ESnet are providing assistance with various aspects of this effort.
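To put that figure in network terms, here is a back-of-the-envelope conversion. It is only a sketch: it assumes decimal units (1 TB = 10^12 bytes) and treats "a third of a terabyte in 10 minutes" as exact, whereas the graph shows the transfer finishing in under 10 minutes, so the result is a lower bound on the average rate. The function name is ours, purely for illustration.

```python
def average_rate_gbps(terabytes: float, seconds: float) -> float:
    """Average throughput in gigabits per second (decimal units: 1 TB = 10**12 bytes)."""
    bits = terabytes * 1e12 * 8      # bytes -> bits
    return bits / seconds / 1e9      # bits per second -> gigabits per second


# Roughly a third of a terabyte moved in (at most) ten minutes.
rate = average_rate_gbps(1 / 3, 10 * 60)
print(f"at least {rate:.1f} Gb/s on average")
```

In other words, the transfer sustained an average somewhere above 4 Gb/s across the Pacific.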
Notice that the combined plot runs off the graph. We like to see that sort of thing, as it sets a good example of what we are all striving for: data transfer performance of scientific utility. How do you get it? It is a matter of figuring out how to use systems correctly and optimizing the infrastructure.
The Australians were savvy in figuring out their system before they launched huge data flows. Initial qualification of the network path between the BAMAN sites and ANU was done using the ESnet disk performance tester, lbl-diskpt1. Before the ESG nodes went live at NERSC, the Australians were testing against lbl-diskpt1 to qualify their network and storage system performance for long-distance transfers from the San Francisco Bay Area. So, by the time the NERSC ESG Gateway and Data Node came up, they knew the data transfer infrastructure was relatively clean.
Let’s just say that our test and measurement infrastructure is continuing to show its value…
The data flow from Livermore to NERSC is pretty impressive as well. Recent data movement from the British Atmospheric Data Centre (BADC) in the UK to LBNL/NERSC marked another milestone for the ESG and Climate100 projects.
These data replications were managed by Bulk Data Mover (BDM), a scalable data transfer management tool developed by the SDM group at LBNL under the ESG project. BDM manages data transfers efficiently using optimized transfer queue and concurrency management algorithms. GridFTP is used as the main underlying transfer protocol.
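For readers wondering what "transfer queue and concurrency management" means in practice, here is a minimal sketch of the general idea: a queue of files drained by a bounded number of simultaneous transfers. This is not BDM's actual algorithm or code. The `transfer` function is a hypothetical placeholder standing in for a real GridFTP client call, and `run_queue` and its `max_concurrency` parameter are names we made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def transfer(path: str) -> str:
    """Hypothetical placeholder for a real transfer (e.g. a GridFTP client call)."""
    return path


def run_queue(paths, max_concurrency=4, transfer_fn=transfer):
    """Transfer every path, keeping at most max_concurrency transfers in flight."""
    done, failed = [], []
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # Queue everything up front; the pool limits how many run at once.
        futures = {pool.submit(transfer_fn, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                fut.result()
                done.append(path)
            except Exception:
                # Record the failure so the file can be retried later.
                failed.append(path)
    return done, failed


done, failed = run_queue([f"file{i}.nc" for i in range(10)], max_concurrency=3)
print(len(done), "transferred,", len(failed), "failed")
```

On a long, high-bandwidth path, the concurrency limit is the knob that matters: too few simultaneous streams leaves the pipe idle, while too many can overwhelm the storage systems at either end.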