Research and Education Networks (REN) capacity planning and user requirements differ from those faced by commodity internet service providers for home users. One key difference is that scientific workflows can require the REN to move large, unscheduled, high-volume data transfers, or “bursts” of traffic. Experiments may be impossible to duplicate and even one underperforming network link can cause the entire data transfer to fail. Another set of challenges stem from the federated nature of scientific collaboration and networking. Because network performance standards cannot be centrally enforced, performance is obtained as a result of the entire REN community working together to identify best practices and resolve issues. For example:
- Data Transfer Nodes (DTN), which connect network endpoints to local data storage systems are owned by individual institutions, facilities, or labs. DTNs can be deployed with various equipment configurations, with local or networked storage configurations, and connected to internal networks in many different ways.
- Research institutions have diverse levels of resources and varied data transfer requirements; DTNs and local networks are maintained and operated based on these local considerations.
- Devising performance benchmarks for “how fast a data transfer should be” is difficult as capacity, flexibility, and general capabilities of networks linking scientists and resources constantly evolve and are not consistent across the entire research ecosystem.
ESnet has long been focused on developing ways to streamline workflows and reduce network operational burdens on the scientific programs, researchers, and others both those we directly serve and on behalf of the entire R&E network community. Building on the successful Science DMZ design pattern and the Petascale DTN project, the Data Mobility Exhibition (DME) was developed to improve the predictability of data movement between research sites and universities. Many sites use perfSONAR to test end-to-end network performance. The DME allows sites to take this a step farther and test end to end data transfer performance.
DME is a resource that enables the calibration of data transfer performance for a site’s DTNs to ensure that they are performing well by using ESnet’s own test environment, at scale. As part of the DME, system/storage administrators and network engineers have a wide variety of resources available to analyze data transfer performance against ESnet’s standard DTNs, obtain help from ESnet Science Engagement (or from universities, Engagement and Performance Operation Centers) to tune equipment, and to share performance data and network designs with the community to help others. For instance, a 10Gbps DTN should be capable of – at a minimum – transferring one Terabyte per hour. However, we would like to see DTNs > 10G or a cluster of 10G DTNs transfer at PetaScale rates of 6TB/hr or 1PB/week.
Currently, the DME has geographically dispersed benchmarking DTNs in three research locations:
- Cornell Center for Advanced Computing in Ithaca, NY, connected through NYSERnet
- NCAR GLADE in Boulder, CO, connected through Front Range Gigapop
- Petrel system at Argonne National Lab, connected through ESnet
Benchmarking DTNs are also deployed in two commercial cloud environments: Google Drive and Box. All five DME DTN can be used for both upload and download testing allowing users to calibrate and compare their network’s data transfer performance. Additional DTNs are being considered for future capacity. Next generation ESnet6 DTNs will be added in FY22-23, supporting this data transfer testing framework.
DME provides calibrated data sets ranging in size from 100MB to 5TB, so that performance of different sized transfers can be studied.
DOE scientists or infrastructure engineers can use the DME testing framework, built from the Petascale DTN model, with their peers to better understand the performance that institutions are achieving in practice. Here are examples of how past Petascale DTN data mobility efforts have helped large scientific data transfers:
- 768 TB of DESI data sent via ESnet, between OLCF and NERSC automatically via Globus over 20 hours. Despite the interruption of a maintenance activity at ORNL, the transfer was seamlessly reconnected without any user involvement.
- Radiation-damage-free high-resolution SARS-CoV-2 main protease SFX structures obtained at near-physiological-temperature offer invaluable information for immediate drug-repurposing studies for the treatment of COVID19. This Work required near-real-time collaboration and data movement between LCLS, NERSC via ESnet.
To date, over 100 DTN operators have used DME benchmarking resources to tune their own data transfer performance. In addition, the DME has been added to the NSF-funded Engagement and Performance Operations Center (EPOC) program’s six main scientific networking consulting support services, bringing this capability to a wide set of US Research Universities.
As the ESnet lead for this project, I invite you to contact me for more info (email@example.com). We also have information up on our knowledge-base website fasterdata.es.net. DME is an easy, effective way to ensure your network, data transfer, and storage resources are operating at peak efficiency!