ESnet’s Data Mobility Exhibition: Moving to petascale with the research community

Research and Education Network (REN) capacity planning and user requirements differ from those faced by commodity internet service providers serving home users. One key difference is that scientific workflows can require a REN to carry large, unscheduled, high-volume data transfers, or “bursts” of traffic. Experiments may be impossible to repeat, and even one underperforming network link can cause an entire data transfer to fail. Another set of challenges stems from the federated nature of scientific collaboration and networking. Because network performance standards cannot be centrally enforced, good performance depends on the entire REN community working together to identify best practices and resolve issues. For example:

  • Data Transfer Nodes (DTNs), which connect network endpoints to local data storage systems, are owned by individual institutions, facilities, or labs. DTNs can be deployed with varying equipment, with local or networked storage, and connected to internal networks in many different ways.
  • Research institutions have diverse levels of resources and varied data transfer requirements; DTNs and local networks are maintained and operated based on these local considerations.
  • Devising performance benchmarks for “how fast a data transfer should be” is difficult because the capacity, flexibility, and general capabilities of the networks linking scientists and resources evolve constantly and are not consistent across the research ecosystem.

ESnet has long focused on developing ways to streamline workflows and reduce network operational burdens on scientific programs, researchers, and others, both those we directly serve and the broader R&E network community. Building on the successful Science DMZ design pattern and the Petascale DTN project, the Data Mobility Exhibition (DME) was developed to improve the predictability of data movement between research sites and universities. Many sites use perfSONAR to test end-to-end network performance; the DME allows sites to take this a step further and test end-to-end data transfer performance.

The DME is a resource that lets a site calibrate the data transfer performance of its DTNs, at scale, against ESnet’s own test environment. As part of the DME, system and storage administrators and network engineers have a wide variety of resources available: they can analyze data transfer performance against ESnet’s standard DTNs, obtain help tuning equipment from ESnet Science Engagement (or, for universities, from the Engagement and Performance Operations Center), and share performance data and network designs with the community to help others. For instance, a 10 Gbps DTN should be capable of transferring at least one terabyte per hour; we would like to see faster-than-10G DTNs, or clusters of 10G DTNs, transfer at petascale rates of 6 TB/hour, or roughly 1 PB/week.
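The arithmetic behind these targets is easy to reproduce. The short Python sketch below (an illustration, not an ESnet tool) converts a nominal line rate into terabytes per hour under an assumed end-to-end efficiency factor and compares the result with the DME targets; the efficiency value is an assumption, not a measured ESnet figure.

```python
# Rough throughput arithmetic behind the DME targets (illustrative sketch only).
# The 0.3 efficiency factor, which stands in for protocol overhead, storage I/O,
# and tuning losses, is an assumption and not an ESnet figure.

def tb_per_hour(line_rate_gbps: float, efficiency: float = 0.3) -> float:
    """Convert a nominal line rate (Gbit/s) into terabytes moved per hour."""
    bytes_per_sec = line_rate_gbps * 1e9 / 8 * efficiency
    return bytes_per_sec * 3600 / 1e12

for rate in (10, 40, 100):
    tb_hr = tb_per_hour(rate)
    print(f"{rate:>3} Gbps ~ {tb_hr:5.1f} TB/hr ~ {tb_hr * 24 * 7 / 1000:4.2f} PB/week")

# A well-tuned 10 Gbps DTN should clear the 1 TB/hr floor; the 6 TB/hr petascale
# target implies faster-than-10G hardware or a cluster of 10G DTNs.
```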

Currently, the DME has geographically dispersed benchmarking DTNs in three research locations:

  • Cornell Center for Advanced Computing in Ithaca, NY, connected through NYSERnet
  • NCAR GLADE in Boulder, CO, connected through Front Range Gigapop
  • Petrel system at Argonne National Lab, connected through ESnet

Benchmarking DTNs are also deployed in two commercial cloud environments: Google Drive and Box. All five DME DTNs can be used for both upload and download testing, allowing users to calibrate and compare their network’s data transfer performance. Additional DTNs are being considered to expand future capacity, and next-generation ESnet6 DTNs will be added in FY22-23 to support this data transfer testing framework.

The DME provides calibrated data sets ranging in size from 100 MB to 5 TB, so that the performance of different-sized transfers can be studied.

DOE scientists or infrastructure engineers can use the DME testing framework, built from the Petascale DTN model, with their peers to better understand the performance that institutions are achieving in practice. Here are examples of how past Petascale DTN data mobility efforts have helped large scientific data transfers:

  1. 768 TB of DESI data were sent via ESnet between OLCF and NERSC, automatically via Globus, over 20 hours. Despite the interruption of a maintenance activity at ORNL, the transfer reconnected seamlessly without any user involvement.
  2. Radiation-damage-free, high-resolution SARS-CoV-2 main protease SFX structures obtained at near-physiological temperature offer invaluable information for immediate drug-repurposing studies for the treatment of COVID-19. This work required near-real-time collaboration and data movement between LCLS and NERSC via ESnet.

To date, over 100 DTN operators have used DME benchmarking resources to tune their own data transfer performance. In addition, the DME has been added to the NSF-funded Engagement and Performance Operations Center (EPOC) program’s six main scientific networking consulting support services, bringing this capability to a wide set of US research universities.

As the ESnet lead for this project, I invite you to contact me for more information (consult@es.net). We also have information available on our knowledge-base website, fasterdata.es.net. The DME is an easy, effective way to ensure your network, data transfer, and storage resources are operating at peak efficiency!

Creating the Tokamak Superfacility: Fusion with the ScienceDMZ

5.5 Questions with Eli Dart (ESnet), C.S. Chang, and Michael Churchill (PPPL)

In 2025, when the International Thermonuclear Experimental Reactor (ITER) generates “first plasma”, it will be the culmination of almost 40 years of effort. First started in 1985, the project has grown to include the scientific talents of seven members (China, the EU, India, Japan, Korea, Russia, and the US, with EU membership bringing the total to 35 countries) and, if successful, it will mark the first time that a large-scale fusion reactor generates more thermal power than is used to heat isotopes of hydrogen gas to a plasma state.

ESnet is supporting this international scientific community as this dream of limitless, clean energy is pursued. When operational at full capacity, ITER will generate approximately a petabyte of data per day, much of which will need to be analyzed and fed back in near real time to optimize the fusion reaction, and distributed to a federated framework of geographically distributed “remote control rooms” (RCRs). To prepare for this demanding combination of data distribution and analytics, ESnet’s Eli Dart and the Princeton Plasma Physics Laboratory’s (PPPL) Michael Churchill and C.S. Chang recently co-authored a test exercise performed with collaborators at Pacific Northwest National Laboratory (PNNL), PPPL, Oak Ridge National Laboratory (ORNL), and the Korean KREONET, KSTAR, National Fusion Research Institute, and Ulsan National Institute of Science and Technology. This study (https://doi.org/10.1080/15361055.2020.1851073) successfully demonstrated the use of ESnet and the Science DMZ architecture for trans-Pacific large data transfers and near-real-time movie creation and analysis of KSTAR electron cyclotron emission images, over multiple paths at high sustained speeds.


Q 1: This was a complex test, involving several sites and analytic workflows.  Can you walk our readers through the end-to-end workflow? 

Figure: End-to-end workflow of the demonstration, comparing real-time streaming data from the KSTAR ECEI diagnostic with a side-by-side movie from the XGC1 gyrokinetic turbulence code.

Eli Dart: The data were streamed from a system at KSTAR, encoded into ADIOS format, streamed to PPPL, rendered into movie frames, and visualized at PPPL. One of the key attributes of this workflow is that it is a streaming workflow. Specifically, this means that the data passes through the workflow steps (encoding in ADIOS format, transfer, rendering movie frames, showing the movie) without being written to non-volatile storage. This allows for performance improvements, because no time is spent on storage I/O. It also removes the restriction of storage allocations from the operation of the workflow – only the final data products need to be stored (if desired). 
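To make the streaming pattern concrete, here is a minimal Python sketch of a pipeline in which data moves from acquisition through encoding and rendering entirely in memory, with nothing written to disk along the way. It is an illustration of the pattern only; the actual demonstration used ADIOS streams between KSTAR and PPPL, and the frame shape and stage functions below are hypothetical.

```python
# Minimal sketch of a streaming pipeline: each stage consumes and produces data
# in memory, and nothing is written to non-volatile storage until (optionally)
# the final product. Illustration only; the actual demonstration used ADIOS
# streams between KSTAR and PPPL, and the shapes and stages here are hypothetical.

from typing import Iterator
import numpy as np

def acquire_frames(n_frames: int) -> Iterator[np.ndarray]:
    """Stand-in for the diagnostic: yield 2-D image frames as they are produced."""
    for _ in range(n_frames):
        yield np.random.rand(24, 8)            # hypothetical ECEI-like channel grid

def encode(frames: Iterator[np.ndarray]) -> Iterator[bytes]:
    """Stand-in for ADIOS encoding: serialize each frame for transport."""
    for frame in frames:
        yield frame.astype(np.float32).tobytes()

def render(chunks: Iterator[bytes]) -> Iterator[np.ndarray]:
    """Stand-in for the movie-frame renderer on the receiving side."""
    for chunk in chunks:
        yield np.frombuffer(chunk, dtype=np.float32).reshape(24, 8)

# Chain the stages; frames flow through without touching disk.
count = 0
for movie_frame in render(encode(acquire_frames(100))):
    count += 1                                  # display or analyze movie_frame here

print(f"streamed {count} frames without intermediate storage")
```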

Q 2: A big portion of this research supports the idea of federated, near real-time analysis of data.  In order to make these data transfers performant, flexible, and adaptable enough to meet the requirements for a future ITER RCR, you had to carefully engineer and coordinate with many parties.  What was the hardest part of this experiment, and what lessons does it offer ITER?

Eli Dart: It is really important to ensure that the network path is clean. By “clean” I mean that the network needs to provide loss-free IP service for the experiment traffic. Because the fusion research community is globally distributed, the data transfers cover long distances, which greatly magnifies the negative impact of packet loss on transfer performance. Test and measurement (using perfSONAR) is very important to ensure that the network is clean, as is operational excellence to ensure that problems are fixed quickly if they arise. KREONET is an example of a well-run production network – their operational excellence contributed significantly to the success of this effort.
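The penalty that distance imposes on lossy paths can be seen in the classic Mathis et al. approximation, which bounds single-stream TCP throughput by the maximum segment size divided by the round-trip time and the square root of the loss rate. The sketch below evaluates that bound for a few path lengths; the MSS, RTT, and loss values are illustrative assumptions, not measurements from this experiment.

```python
# Mathis et al. approximation for single-stream TCP throughput:
#   throughput <= (MSS * C) / (RTT * sqrt(loss_rate)), with C ~ 1.22.
# The MSS, RTT, and loss values below are illustrative assumptions.

import math

def mathis_gbps(mss_bytes: int, rtt_s: float, loss_rate: float, c: float = 1.22) -> float:
    """Upper bound on TCP throughput in Gbit/s for a given path."""
    return (mss_bytes * 8 * c) / (rtt_s * math.sqrt(loss_rate)) / 1e9

MSS = 8948                                    # assumes a 9000-byte MTU (jumbo frames)
for rtt_ms in (10, 80, 160):                  # metro, cross-country, trans-Pacific (rough)
    for loss in (1e-6, 1e-4):                 # nearly clean path vs. 1 loss in 10,000 packets
        print(f"RTT {rtt_ms:>3} ms, loss {loss:.0e}: "
              f"<= {mathis_gbps(MSS, rtt_ms / 1000, loss):7.2f} Gbit/s")
```

The same loss rate that barely matters on a metro path can cap a trans-Pacific single stream at a small fraction of the link rate, which is why a “clean” loss-free path matters so much here.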

Q 3: One of the issues you had to work around was a firewall at one institution. What was involved in working with their site security, and how should those operating a Science DMZ work through similar issues?

Eli Dart: Building and operating a Science DMZ involves a combination of technical and organizational work. Different institutions have different policies and need different levels of assurance depending on the nature of the work being done on the Science DMZ. The key is to understand that security policy is there for a reason, and to work with the parties involved in the context that makes sense from their perspective. Then, it’s just a matter of working together to find a workable solution that preserves safety from a cybersecurity perspective and also allows the science mission to succeed.

Q 4: How did you build this collaboration, and how did you keep everyone on the same page? Is there any advice you can offer other experiments facing the same need to coordinate multinational efforts?

Eli Dart: From my perspective, this result demonstrates the value of multi-institution, multi-disciplinary collaborations for achieving important scientific outcomes. Modern science is complex, and we are increasingly in a place where only teams can bring all the necessary expertise to bear on a complex problem. The members of this team have worked together in smaller groups on a variety of projects over the years – those relationships were very valuable in achieving this result.

Q 5: In the paper you present a model for a federated remote framework workflow. Looking beyond ITER, are there other applications you can see for the lessons learned from this experiment?

C.S. Chang: Lessons learned from this experiment can be applied to many other distributed scientific, industrial, and commercial applications that require collaborative data analysis and decision making. We do not need to look too far. Expensive scientific studies on exascale computers will most likely be collaborative efforts among geographically distributed scientists who want to analyze the simulation data and share or combine the findings in near real time, for speedy scientific discovery and for steering ongoing or subsequent simulations. The lessons learned here can influence the remote collaboration workflows used in high energy physics, climate science, space physics, and other fields.

Q 5.5: What’s next? You mention quite a number of possible follow-on activities in the paper. Which of these most interest you, and what might follow?

Michael Churchill: Continued work by this group has led to the recently developed open-source Python framework DELTA, for streaming data from experiments to remote compute centers, using ADIOS for streaming over wide-area networks and, on the receiver side, asynchronous Message Passing Interface (MPI) to process the data streams in parallel. We’ve used this to stream data from KSTAR to the NERSC Cori supercomputer and complete a spectral analysis in parallel in less than 10 minutes; run serially, the same analysis would normally take 12 hours. Frameworks such as this, which connect experiments to remote high-performance computers, will open up the quality and quantity of analysis workflows that experimental scientists can run. It’s exciting to see how this can help accelerate the progress of science around the world.
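As a rough illustration of the receiver-side pattern (not the DELTA framework itself), the sketch below uses mpi4py to split a batch of channel time series across MPI ranks and compute a power spectrum for each channel in parallel; the channel count, record length, and random data are hypothetical stand-ins for a chunk received from a stream.

```python
# Illustrative sketch of receiver-side parallel spectral analysis with MPI.
# This is not the DELTA framework; it only shows the scatter-compute-gather
# pattern that lets many ranks work on different channels of a streamed chunk.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N_CHANNELS, N_SAMPLES = 192, 4096      # hypothetical channel count / record length

if rank == 0:
    # Stand-in for a chunk received from the wide-area stream.
    data = np.random.rand(N_CHANNELS, N_SAMPLES)
    chunks = np.array_split(data, size, axis=0)
else:
    chunks = None

# Each rank receives its share of the channels.
local = comm.scatter(chunks, root=0)

# Compute a simple power spectrum per channel on this rank.
local_spectra = np.abs(np.fft.rfft(local, axis=1)) ** 2

# Gather results so rank 0 can render or archive the final product.
spectra = comm.gather(local_spectra, root=0)
if rank == 0:
    print("assembled spectra for", sum(s.shape[0] for s in spectra), "channels")
```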

Congratulations on your success! This is a significant step forward in building the data management capability that ITER will need.  

40G Data Transfer Node (DTN) now Available for User Testing!

ESnet’s first 40 Gb/s public data transfer node (DTN) has been deployed and is now available for community testing. It is the first of a new generation of publicly available networking test units provided by ESnet to the global research and engineering network community as part of promoting high-speed scientific data mobility. This 40G DTN offers four times the speed of previous-generation DTN test units, as well as the opportunity to test a variety of network transfer tools and calibrated data sets.

The 40G DTN server, located at ESnet’s El Paso location, is based on an updated reference implementation of our Science DMZ architecture. This new DTN (and others that will soon follow in other locations) will allow our collaborators throughout the global research and engineering network community to test high-speed, large, demanding data transfers as part of improving their own network performance. The deployment provides a resource that enables the global science community to reach levels of data networking performance first demonstrated in 2017 as part of the ESnet Petascale DTN project.

The El Paso 40G DTN has Globus installed for GridFTP and parallel file transfer testing; additional data transfer applications may be installed in the future. To facilitate user evaluation of their own network capabilities with the ESnet Data Mobility Exhibition (DME), test data sets will be loaded on this new 40G DTN shortly.
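For readers who script their transfer testing, a transfer against an ESnet DTN can be submitted programmatically with the Globus Python SDK along the lines sketched below. The endpoint UUIDs, paths, and access token are placeholders, not real ESnet identifiers; substitute the values shown in the Globus file manager for the DTN you want to test against.

```python
# Minimal sketch of submitting a test transfer with the Globus Python SDK.
# All UUIDs, paths, and the access token below are placeholders (hypothetical),
# not real ESnet endpoint identifiers; substitute your own before running.

import globus_sdk

TRANSFER_TOKEN = "REPLACE_WITH_YOUR_TRANSFER_ACCESS_TOKEN"
ESNET_DTN_ID   = "00000000-0000-0000-0000-000000000000"   # placeholder source endpoint
MY_DTN_ID      = "11111111-1111-1111-1111-111111111111"   # placeholder destination endpoint

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(TRANSFER_TOKEN)
)

# Describe the transfer: pull a calibrated test data set to local scratch.
task = globus_sdk.TransferData(
    tc, ESNET_DTN_ID, MY_DTN_ID, label="DME calibration test"
)
task.add_item("/path/to/test-dataset/", "/scratch/dme-test/", recursive=True)

result = tc.submit_transfer(task)
print("submitted transfer task:", result["task_id"])
```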

All ESnet DTN public servers can be found at https://app.globus.org/file-manager. ESnet will continue to support existing 10G DTNs located at Sunnyvale, Starlight, New York, and CERN. 

ESnet’s 40G DTN Reference Architecture Block Diagram

The full 40G DTN reference architecture and more information on the design of these new DTNs can be found here:

A second 40G DTN will be available in the next few weeks and will be deployed in Boston. It will feature Google’s Bottleneck Bandwidth and Round-trip propagation time (BBR2) congestion control, allowing improved round-trip-time measurement and giving users the ability to explore BBR2 enhancements to standard TCP congestion control algorithms.
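For context, on a Linux host that has the relevant module available, an application can request a specific congestion control algorithm per socket, which is how tools can compare BBR-family algorithms against the system default. The sketch below shows the pattern; whether "bbr" (or a BBRv2 build) is actually offered depends on the kernel, so treat the algorithm name as an assumption.

```python
# Sketch: selecting a TCP congestion control algorithm per socket on Linux.
# Requires a kernel with the requested module available (e.g. loaded via
# "modprobe tcp_bbr"); "bbr" here stands in for whatever algorithm the host offers.

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"bbr")
    algo = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
    print("congestion control in use:", algo.split(b"\x00")[0].decode())
except OSError as err:
    print("could not select bbr, falling back to the system default:", err)
finally:
    sock.close()
```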

In an upcoming blog post, I will describe the Boston/BBR2-enabled 40G DTN and perfSONAR servers. In the meantime, ESnet and the deployment team hope that the new El Paso DTN will be of great use to the global research community!  

Into the Medical Science DMZ

Speeding research. The Medical Science DMZ expedites data transfers for scientists working on large-scale research such as biomedicine and genomics while maintaining federally-required patient privacy.

In a new paper, Lawrence Berkeley National Laboratory (Berkeley Lab) computer scientist Sean Peisert, Energy Sciences Network (ESnet) researcher Eli Dart, and their collaborators outline a “design pattern” for deploying specialized research networks and ancillary computing equipment for HIPAA-protected biomedical data that provides both high-throughput network data transfers and high-security protections.

“The original Science DMZ model provided a way of securing high-throughput data transfer applications without the use of enterprise firewalls,” says Dart. “You can protect data transfers using technical controls that don’t impose performance limitations.”

Read More at Science Node: https://sciencenode.org/feature/into-the-science-dmz.php 

Left: Eli Dart, ESnet Engineer | Right:  Sean Peisert, Berkeley Lab Computer Scientist

CENIC Honors Astrophysics Link to NERSC via ESnet

A star-forming region of the Large Magellanic Cloud (Credit: European Space Agency via the Hubble Telescope)

An astrophysics project connecting UC Santa Cruz’s Hyades supercomputer cluster to NERSC via ESnet and other networks won the CENIC 2018 Innovations in Networking Award for Research Applications announced last week.

Through a consortium of Science DMZs and links to NERSC via CENIC’s CalREN and the DOE’s ESnet, the connection enables UCSC to carry out high-speed transfers of large data sets produced at NERSC, which supports the Dark Energy Spectroscopic Instrument (DESI) and AGORA galaxy simulations, at speeds up to five times previous rates. These speeds could be increased to 20 times the previous rates in 2018. Peter Nugent, an astronomer and cosmologist in the Computational Research Division, was pivotal in the effort. Read UC Santa Cruz’s press release.

Science DMZ is Focus of Latest Library of Network Training Videos

ESnet, Network Startup Resource Center Combine Expertise to Spread the Word

The Network Startup Resource Center and ESnet have created a Science DMZ video library.

For members of the established research and education (R&E) networking community, attending conferences or sitting in on workshop sessions is the normal way to learn about the latest equipment, architecture, tools and technologies.

But for network engineers striving to establish basic R&E infrastructure where bandwidth and other resources are scarce, the University of Oregon’s Network Startup Resource Center (NSRC) is often the primary information conduit. NSRC staff travel to emerging nations in Africa, the Asia-Pacific, the Middle East, and South America, where they hold intensive hands-on training courses combined with direct engineering assistance to bring institutions up to speed.

And for the second time in a year, ESnet and the NSRC have produced and released a library of short explanatory videos to help network engineers around the world gain basic knowledge, set up basic systems and drill down into areas of specific interest. In December, 15 videos detailing the Science DMZ network architecture were posted, covering the background and structure, specific designs, and techniques and technology.

The Science DMZ video library complements the 29-video perfSONAR library released in July 2016.

“The goal is to make the information more accessible to networking staff, in the U.S. and particularly in emerging economic areas where institutions are trying to bootstrap a research network,” said ESnet Network Engineer Eli Dart, who developed the Science DMZ concept with Brent Draney of the National Energy Research Scientific Computing Center (NERSC). Both ESnet and NERSC are DOE Office of Science User Facilities managed by Lawrence Berkeley National Laboratory.

Read the full story.

How the World’s Fastest Science Network Was Built

Created in 1986, the U.S. Department of Energy’s (DOE’s) Energy Sciences Network (ESnet) is a high-performance network built to support unclassified science research. ESnet connects more than 40 DOE research sites—including the entire National Laboratory system, supercomputing facilities and major scientific instruments—as well as hundreds of other science networks around the world and the Internet.

Funded by DOE’s Office of Science and managed by Lawrence Berkeley National Laboratory (Berkeley Lab), ESnet moves about 51 petabytes of scientific data every month. This is a 13-step guide to how ESnet has evolved over 30 years.

Step 1: When fusion energy scientists inherit a cast-off supercomputer, add 4 dialup modems so the people at the Princeton lab can log in. (1975)


Step 2: When landlines prove too unreliable, upgrade to satellites! Data screams through space. (1981)


Step 3: Whose network is best? High Energy Physics (HEPnet)? Fusion Physics (MFEnet)? Why argue? Merge them into one, the Energy Sciences Network (ESnet), run by the Department of Energy! Go ESnet! (1986)


Step 4: Make it even faster with DUAL Satellite links! We’re talking 56 kilobits per second! Except for the Princeton fusion scientists – they get 112 Kbps! (1987)


Step 5:  Whoa, when an upgrade to 1.5 MEGAbits per second isn’t enough, add ATM (not the money machine, but Asynchronous Transfer Mode) to get more bang for your buck. (1995)


Step 6: Duty now for the future—roll out the very first IPv6 address to ensure there will be enough Internet addresses for decades to come. (2000)


Step 7: Crank up the fastest links in the network to 10 GIGAbits per second—16 times faster than the old gear—a two-generation leap in network upgrades at one time. (2003)


Step 8: Work with other networks to develop really cool tools, like the perfSONAR toolkit for measuring and improving end-to-end network performance, and OSCARS (On-Demand Secure Circuits and Advance Reservation System), so you can reserve a high-speed, end-to-end connection to make sure your data is delivered on time. (2006)


Step 9: Why just rent fiber? Pick up your own dark fiber network at a bargain price for future expansion. In the meantime, boost your bandwidth to 100G for everyone. (2012)


Step 10: Here’s a cool idea, come up with a new network design so that scientists moving REALLY BIG DATASETS can safely avoid institutional firewalls, call it the Science DMZ, and get research moving faster at universities around the country. (2012)



Step 11: We’re all in this science thing together, so let’s build faster ties to Europe. ESnet adds three 100G lines (and a backup 40G link) to connect researchers in the U.S. and Europe. (2014)


Step 12: 100G is fast, but it’s time to get ready for 400G. To pave the way, ESnet installs a production 400G network between facilities in Berkeley and Oakland, Calif., and even provides a 400G testbed so network engineers can get up to speed on the technology. (2015)


Step 13: Celebrate 30 years as a research and education network leader, but keep looking forward to the next level. (2016)


BioTeam and ESnet Partner on Science DMZ Webinar

BioTeam and ESnet are partnering to offer a webinar on the Science DMZ architectural paradigm. By streamlining network design to provide “friction free” research data paths, the Science DMZ has been widely adopted by the research and education (R&E) community and is being implemented at many locations around the world. Using this approach, data mobility becomes less of a mystery and more of a routine part of scientific networking.

This event will occur on Monday, May 18th, between 2pm and 4pm EDT and is open to the general public.  We would like to encourage network operators and researchers (including, but not limited to, life science researchers) to attend this no-cost event.  For complete information on registration and logistical details, visit: http://bioteam.net/2015/04/science-dmz-101/. Registration will close when the number of registration slots has been exhausted.

BioTeam is a high-performance consulting practice. They are dedicated to delivering objective, technology agnostic solutions to life science researchers by leveraging technologies customized for scientific objectives.

The Energy Sciences Network (ESnet) is a high-performance, unclassified network built to support scientific research. ESnet provides services to more than 40 DOE research sites, and peers with over 140 research and commercial networks.

Berkeley Lab Staff to Present Super-facility Science Model at Internet2 Conference

Berkeley Lab staff from five divisions will share their expertise in a panel discussion on “Creating Super-facilities: a Coupled Facility Model for Data-Intensive Science” at the Internet2 Global Summit to be held April 26-30 in Washington, D.C. The panel was organized by Lauren Rotman of ESnet and includes Alexander Hexemer of the Advanced Light Source (ALS), Craig Tull of CRD, David Skinner of NERSC, and Rune Stromsness of the IT Division.

The session will highlight the concept of a coupled science facility or “super-facility,” a new model that links together experimental facilities like the ALS with computing facilities like NERSC via a Science DMZ architecture and advanced workflow and analysis software, such as SPOT Suite developed by Tull’s group. The session will share best practices, lessons learned and future plans to expand this effort.

Also at the conference, ESnet’s Brian Tierney will speak in a session on “perfSONAR: Meeting the Community’s Needs.” Co-developed by ESnet, perfSONAR is a tool for end-to-end monitoring and troubleshooting of multi-domain network performance. The session will give an overview of the perfSONAR project, including the 3.4 release, a preview of the 3.5 release, the product plan, and the perfSONAR training plan.

ESnet’s Tierney, Zurawski to Present at Workshop on perfSONAR Best Practices

ESnet’s Brian Tierney and Jason Zurawski will be the featured speakers at a workshop on “perfSONAR Deployment Best Practices, Architecture, and Moving the Needle.” The Jan. 21-22 workshop, one in a series of Focused Technical Workshops organized by ESnet and Internet2, will be held at the Ohio Supercomputer Center in Columbus. Read more (http://es.net/news-and-publications/esnet-news/2015/esnet-s-tierney-zurawski-to-present-at-workshop-on-perfsonar-best-practices/)

A joint effort between ESnet, Internet2, Indiana University, and GEANT, the pan-European research network, perfSONAR is a tool for end-to-end monitoring and troubleshooting of multi-domain network performance. In January 2014, perfSONAR reached a milestone with 1,000 instances of the diagnostic software installed on networking hosts around the U.S. and in 13 other countries. perfSONAR provides network engineers with the ability to test and measure network performance, as well as to archive data in order to pinpoint and solve service problems that may span multiple networks and international boundaries.

At the workshop, Tierney will give an introduction to perfSONAR and present a session on debugging using the software. Zurawski will talk about maintaining a perfSONAR node, describe some user case studies and success stories, discuss “Pulling it All Together – perfSONAR as a Regional Asset” and conclude with “perfSONAR at 10 Years: Cleaning Networks & Disrupting Operation.”

ESnet’s Jason Zurawski and Brian Tierney