Creating the Tokamak Superfacility: Fusion with the ScienceDMZ

2021-05-04 ~ Andrew Wiedlea

5.5 Questions with Eli Dart (ESnet), C.S. Chang, and Michael Churchill (PPPL)

In 2025, when the International Thermonuclear Experimental Reactor (ITER) generates “first plasma”, it will be the culmination of almost 40 years of effort. First started in 1985, the project has grown to include the scientific talents of seven members (China, EU, India, Japan, Korea, Russia, and the US, with EU membership bringing the total to 35 countries) and if successful, will mark the first time that a large scale fusion reactor generates more thermal power than is used to heat isotopes of hydrogen gas to a plasma state.

ESnet is supporting this international scientific community as it pursues the dream of limitless, clean energy. When operating at full capacity, ITER will generate approximately a petabyte of data per day, much of which will need to be analyzed and fed back in near real time to optimize the fusion reaction, and distributed to a federated framework of geographically distributed “remote control rooms” (RCRs). To prepare for this demanding need to distribute both data and analytics, ESnet’s Eli Dart and the Princeton Plasma Physics Laboratory’s (PPPL) Michael Churchill and C.S. Chang recently co-authored a study of a test exercise performed with collaborators at Pacific Northwest National Laboratory (PNNL), PPPL, Oak Ridge National Laboratory (ORNL), and with the Korean KREONET, KSTAR, National Fusion Research Institute, and the Ulsan National Institute of Science and Technology. This study (https://doi.org/10.1080/15361055.2020.1851073) successfully demonstrated the use of ESnet and the ScienceDMZ architecture as part of trans-Pacific large data transfer, and near real-time movie creation and analysis of the KSTAR electron cyclotron emission images, via links between multiple paths at high sustained speeds.

Q 1: This was a complex test, involving several sites and analytic workflows. Can you walk our readers through the end-to-end workflow?

End-to-end workflow of the demonstration comparing real-time streaming data from the KSTAR ECEI diagnostic to side-by-side movie from XGC1 gyrokinetic turbulence code.

Eli Dart: The data were streamed from a system at KSTAR, encoded into ADIOS format, streamed to PPPL, rendered into movie frames, and visualized at PPPL. One of the key attributes of this workflow is that it is a streaming workflow. Specifically, this means that the data passes through the workflow steps (encoding in ADIOS format, transfer, rendering movie frames, showing the movie) without being written to non-volatile storage. This allows for performance improvements, because no time is spent on storage I/O. It also removes the restriction of storage allocations from the operation of the workflow – only the final data products need to be stored (if desired).
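To make the idea of a streaming workflow concrete, here is a minimal Python sketch in which each stage hands data to the next in memory and nothing is written to disk along the way. It is an illustration only, not the code used in the demonstration: the stage names (acquire, encode, transfer, render) are hypothetical, the "encoding" is plain byte packing rather than ADIOS, the "transfer" is an in-memory pass-through rather than a wide-area network, and the data are synthetic.

```python
import struct
from typing import Iterator, List


def acquire(n_frames: int, n_channels: int) -> Iterator[List[float]]:
    """Hypothetical stand-in for a diagnostic: one list of samples per time step."""
    for t in range(n_frames):
        yield [float(t + c) for c in range(n_channels)]


def encode(samples: Iterator[List[float]]) -> Iterator[bytes]:
    """Pack each sample list into bytes (a placeholder, not the ADIOS format)."""
    for s in samples:
        yield struct.pack(f"<{len(s)}d", *s)


def transfer(packets: Iterator[bytes]) -> Iterator[bytes]:
    """Placeholder for the network hop: packets pass through in memory."""
    for p in packets:
        yield p


def render(packets: Iterator[bytes], n_channels: int) -> Iterator[str]:
    """Decode each packet and produce a tiny text 'frame' summarizing it."""
    for p in packets:
        values = struct.unpack(f"<{n_channels}d", p)
        yield f"frame: min={min(values):.1f} max={max(values):.1f}"


def run_pipeline(n_frames: int = 3, n_channels: int = 4) -> List[str]:
    """Chain the stages; only the final products are collected."""
    stream = render(transfer(encode(acquire(n_frames, n_channels))), n_channels)
    return list(stream)


if __name__ == "__main__":
    for frame in run_pipeline():
        print(frame)
```

Because the stages are generators, each time step flows through encoding, transfer, and rendering before the next one is produced, so intermediate data never needs a storage allocation. Only the list returned by run_pipeline, the final product, would need to be kept.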

Q 2: A big portion of this research supports the idea of federated, near real-time analysis of data. In order to make these data transfers performant, flexible, and adaptable enough to meet the requirements for a future ITER RCR, you had to carefully engineer and coordinate with many parties. What was the hardest part of this experiment, and what lessons does it offer ITER?

Eli Dart: It is really important to ensure that the network path is clean. By “clean” I mean that the network needs to provide loss-free IP service for the experiment traffic. Because the fusion research community is globally distributed, the data transfers cover long distances, which greatly magnifies the negative impact of packet loss on transfer performance. Test and measurement (using perfSONAR) is very important to ensure that the network is clean, as is operational excellence to ensure that problems are fixed quickly if they arise. KREONET is an example of a well-run production network – their operational excellence contributed significantly to the success of this effort.
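One way to see why distance magnifies the cost of packet loss is the widely cited Mathis model for loss-based TCP, which bounds throughput at roughly (MSS / RTT) × (C / √p), where p is the loss rate. The sketch below is an illustration under that assumption; the model, the segment size, the loss rate, and the round-trip times are not taken from the interview or the paper, and modern congestion control algorithms can behave differently.

```python
import math


def mathis_throughput_bps(mss_bytes: float, rtt_s: float, loss_rate: float,
                          c: float = math.sqrt(1.5)) -> float:
    """Approximate upper bound on loss-based TCP throughput, in bits per second.

    Uses the Mathis et al. approximation: rate <= (MSS / RTT) * (C / sqrt(p)).
    """
    if rtt_s <= 0 or loss_rate <= 0:
        raise ValueError("rtt_s and loss_rate must be positive")
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


if __name__ == "__main__":
    # Illustrative values only: 1460-byte segments, 0.01% packet loss.
    for label, rtt in [("campus (1 ms)", 0.001), ("long-haul (150 ms)", 0.150)]:
        rate = mathis_throughput_bps(1460, rtt, 1e-4)
        print(f"{label}: {rate / 1e6:.1f} Mbit/s")
```

Under these illustrative numbers, the same loss rate that still allows more than a gigabit per second on a 1 ms path limits a single flow to under 10 Mbit/s on a 150 ms path, a factor of 150. This is why a loss-free path and continuous measurement matter so much for intercontinental transfers.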

Q 3: One of the issues you had to work around was a firewall at one institution. What was involved in working with their site security, and how should those working with Science DMZ work through these issues?

Eli Dart: Building and operating a Science DMZ involves a combination of technical and organizational work. Different institutions have different policies, and the need for different levels of assurance depending on the nature of the work being done on the Science DMZ. The key is to understand that security policy is there for a reason, and to work with the parties involved in the context that makes sense from their perspective. Then, it’s just a matter of working together to find a workable solution that preserves safety from a cybersecurity perspective and also allows the science mission to succeed.

Q 4: How did you build this collaboration, and how did you keep everyone on the same page? Is there any advice you can offer other experiments facing the same need to coordinate multi-national efforts?

Eli Dart: From my perspective, this result demonstrates the value of multi-institution, multi-disciplinary collaborations for achieving important scientific outcomes. Modern science is complex, and we are increasingly in a place where only teams can bring all the necessary expertise to bear on a complex problem. The members of this team have worked together in smaller groups on a variety of projects over the years – those relationships were very valuable in achieving this result.

Q 5: In the paper you present a model for a federated remote framework workflow. Looking beyond ITER, are there other applications you can see for the lessons learned from this experiment?

C.S. Chang: Lessons learned from this experiment can be applied to many other distributed scientific, industrial, and commercial applications which require collaborative data analysis and decision making. We do not need to look too far. Expensive scientific studies on exascale computers will most likely be collaborative efforts among geographically distributed scientists who want to analyze the simulation data and share/combine the findings in near-real-time for speedy scientific discovery and for steering of ongoing or next simulations. The lessons learned here can influence the remote collaboration workflow used in high energy physics, climate science, space physics, and others.

Q 5.5: What’s next? You mention quite a number of possible follow-on activities in the paper. Which of these most interest you, and what might follow?

Michael Churchill: Continued work by this group has led to the recently developed open-source Python framework, DELTA, for streaming data from experiments to remote compute centers, using ADIOS for streaming over wide-area networks, and on the receiver side using asynchronous Message Passing Interface to do parallel processing of the data streams. We’ve used this for streaming data from KSTAR to the NERSC Cori supercomputer and completing a spectral analysis in parallel in less than 10 minutes, which normally in serial would take 12 hours. Frameworks such as this, enabling connecting experiments to remote high-performance computers, will open up the quality and quantity of analysis workflows that experimental scientists can run. It’s exciting to see how this can help accelerate the progress of science around the world.
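The pattern described here, splitting a data stream into chunks and analyzing the chunks in parallel on the receiving side, can be sketched in a few lines of Python. This is not DELTA's API and does not use ADIOS or MPI: it substitutes a standard-library thread pool for asynchronous MPI, a naive discrete Fourier transform for the real spectral analysis, and synthetic sine waves for KSTAR data. All function names are hypothetical.

```python
import cmath
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def dominant_bin(chunk: List[float]) -> int:
    """Return the index of the strongest non-DC frequency bin (naive DFT)."""
    n = len(chunk)
    best_k, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(chunk))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return best_k


def make_chunk(freq_bin: int, n: int = 64) -> List[float]:
    """Synthetic chunk: a sine wave whose energy sits in one frequency bin."""
    return [math.sin(2 * math.pi * freq_bin * i / n) for i in range(n)]


def analyze_stream(chunks: Iterable[List[float]], workers: int = 4) -> List[int]:
    """Hand each incoming chunk to a worker; results come back in stream order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(dominant_bin, chunks))


if __name__ == "__main__":
    stream = (make_chunk(k) for k in (3, 5, 8))
    print(analyze_stream(stream))
```

Python threads illustrate the structure rather than the speedup, since pure-Python arithmetic does not run concurrently across threads; a real deployment would spread chunks across processes or compute nodes. For scale, the figures quoted above (12 hours in serial versus less than 10 minutes in parallel) correspond to a speedup of at least 72 times.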

Congratulations on your success! This is a significant step forward in building the data management capability that ITER will need.

Attending SC15? Get a Close-up Look at Virtualized Science DMZs as a Service

2015-11-15 ~ ESnetComms

ESnet, NERSC and RENCI are pooling their expertise to demonstrate “Virtualized Science DMZs as a Service” at the SC15 conference being held Nov. 15-20 in Austin. They will be giving the demos at 2:30-3:30 p.m. Tuesday and Wednesday and 1:30-2:30 p.m. Thursday in the RENCI booth #181.

Here’s the background: Many campuses are installing ScienceDMZs to support efficient large-scale scientific data transfers. There’s a need to create custom configurations of ScienceDMZs for different groups on campus. Network function virtualization (NFV) combined with compute and storage virtualization enables a multi-tenant approach to deploying virtual ScienceDMZs. It makes it possible for campus IT or NREN organizations to quickly deploy well-tuned ScienceDMZ instances targeted at a particular collaboration or project. This demo shows a prototype implementation of ScienceDMZ-as-a-Service using ExoGENI racks (ExoGENI is part of the NSF GENI federation of testbeds) deployed at the StarLight facility in Chicago and at NERSC.

The virtual ScienceDMZs deployed on demand in these racks use the SPOT software suite developed at Lawrence Berkeley National Laboratory to connect to a data source at Argonne National Lab and a compute cluster at NERSC. This provides seamless end-to-end high-speed transfers of data acquired from Argonne’s Advanced Photon Source (APS) to be processed at NERSC. The ExoGENI racks dynamically instantiate the necessary virtual compute resources for ScienceDMZ functions and connect to each other on demand using ESnet’s OSCARS and Internet2’s AL2S system.

ESnet’s Zurawski Helps Advance Campus Cyberinfrastructure in Pennsylvania

2015-08-05 ~ ESnetComms

Jason Zurawski of ESnet’s Science Engagement Team gave presentations on the Science DMZ architecture and perfSONAR network measurement toolkit at a two-day workshop held last month at Penn State. The workshop, which aimed to strengthen campus cyberinfrastructure, drew more than 30 network engineers representing 11 higher education institutions.

The workshop was a collaboration of ESnet, the Keystone Initiative for Network Based Education and Research (KINBER) and Penn State with funding from the National Science Foundation. Zurawski and other members of ESnet’s Science Engagement Team regularly participate in similar workshops and give webinars to share ESnet’s expertise and experience with campuses and regional networks as they handle increasingly large data flows.
Jason Zurawski

NSF Funds Upgraded Network Linking Labs, Universities and Research Networks Based on Science DMZ

2015-08-03 ~ ESnetComms

For the last three years, the National Science Foundation (NSF) has made a series of competitive grants to over 100 U.S. universities to aggressively upgrade their campus network capacity for greatly enhanced science data access, with many incorporating ESnet’s Science DMZ architecture. NSF is now building on that distributed investment by funding a $5 million, five-year award to UC San Diego and UC Berkeley to establish a Pacific Research Platform (PRP), a science-driven high-capacity data-centric “freeway system” on a large regional scale.

The PRP is basing its initial deployment on a proven and scalable network design model for optimizing science data transfers developed by ESnet. “ESnet developed the Science DMZ concept to help address common network performance problems encountered at research institutions by creating a network architecture designed for high-performance applications, where the data science network is distinct from the commodity shared Internet,” said ESnet Director Greg Bell. “As part of its extensive national and international outreach, ESnet is committed to working closely with the Pacific Research Platform to leverage the Science DMZ and Science Engagement concepts to enable collaborating scientists to advance their research.”

In the PRP the Science DMZ model will be extended from a set of heterogeneous campus-level DMZs to an interoperable regional model.

Image courtesy Calit2.

ESnet to Demonstrate Science DMZ as a Service, Create Virtual Superfacility at GENI Conference

2015-03-23 ~ ESnetComms

At the twenty-second GENI Engineering Conference being held March 23-26 in Washington, D.C., ESnet staff will conduct a demonstration of the Science DMZ as a service and show how the technique for speeding the flow of large datasets can be created on demand. The conference is tailor-made for the demonstration as GENI, the Global Environment for Network Innovations, provides a virtual laboratory for networking and distributed systems research and education.

The Science DMZ architecture, developed by ESnet, is a specialized network architecture to speed up the flow of large datasets. The Science DMZ is a portion of a network, usually at a university campus, that is configured to take optimal advantage of the campus’ advanced networks. A Science DMZ provides “frictionless” network paths that connect computational power and storage to scientific big data.

ESnet’s Science DMZ Architecture is Foundation for New Infrastructure Linking California’s Top Research Institutions

2015-03-09 ~ ESnetComms

The Pacific Research Platform, a cutting-edge research network infrastructure based on ESnet’s Science DMZ architecture, will link together the Science DMZs of dozens of top research institutions in California. The Pacific Research Platform was announced Monday, March 9, at the CENIC 2015 Annual Conference.

The new platform will link the sites via three advanced networks: the Department of Energy’s Energy Sciences Network (ESnet), CENIC’s California Research & Education Network (CalREN) and Pacific Wave. Initial results for the new infrastructure will be announced in a panel discussion during the conference featuring Eli Dart (ESnet), John Haskins (UC Santa Cruz), John Hess (CENIC), Erik McCroskey (UC Berkeley), Paul Murray (Stanford), Larry Smarr (Calit2), and Michael van Norman (UCLA). The presentation will be live-streamed at 4:20 p.m. Pacific Time on Monday, March 9, and can be watched for free at cenic2015.cenic.org.

Science DMZs are designed to create secure network enclaves for data-intensive science and high-speed data transport. The Science DMZ design was developed by ESnet and NERSC.

“CENIC designed CalREN to have a separate network tier reserved for data-intensive research from the beginning, and the development of the Science DMZ concept by ESnet has enabled that to reach into individual laboratories, linking them together into a single advanced statewide fabric for big-science innovation,” said CENIC President and CEO Louis Fox. “Of course, CENIC itself also functions as a way to create a fabric of innovation by bringing researchers together to share ideas, making the timing of this announcement at our annual conference just right.”

ESnet Takes Science DMZ Architecture to Pennsylvania R&E Community

2015-03-02 ~ ESnetComms

Jason Zurawski of ESnet’s Science Engagement team will lead a March 4 webinar on “Upgrading Campus Cyberinfrastructure: An Introduction to the Science DMZ Architecture” for research and education organizations in Pennsylvania.

Zurawski will introduce ESnet’s Science DMZ Architecture, a network design pattern designed to streamline the process of science and improve the outcomes for researchers. Included in this design are network monitoring concepts via the perfSONAR framework, as well as functional components used to manage security and manage the transfer of data. This design pattern has roots in high speed networks at major computing facilities, but is flexible enough to be deployed and used by institutions of any size. This solution has been successfully deployed on numerous campuses involved in the NSF CC-IIE and CC-NIE programs, and is a focus area for the upcoming CC-DNI program.

The workshop is presented by the Keystone Initiative for Network Based Education and Research (KINBER), a not-for-profit membership organization that provides broadband connectivity, fosters collaboration, and promotes the innovative use of digital technologies for the benefit of Pennsylvania.

ESnet, Berkeley Lab to Host Feb. 27-28 Workshop on Operating Innovative Networks

2014-02-25 ~ ESnetComms

The next installment of the Operating Innovative Networks (OIN) workshop series will be held Feb. 27-28 at Lawrence Berkeley National Laboratory, drawing 50 university and laboratory network engineers from as far away as New Jersey, Texas, Illinois and Colorado. The workshop is co-hosted by ESnet, Berkeley Lab and CENIC, the Corporation for Education Network Initiatives in California. The program is designed to give attendees the knowledge and training needed to build next-generation campus networks that are optimized for data-intensive science.

Presented by experts from the Department of Energy’s ESnet, Indiana University and Internet2, the workshop series will focus on Science DMZ network architectures, perfSONAR performance measurement software, data transfer nodes and emerging software-defined networking techniques. Combined, these technologies are proven to support high-performance, big data science applications, including high-volume bulk data transfer, remote experiment control, and data visualization. The workshops will consist of two days of presentation material, along with hands-on sessions to encourage immediate familiarity with these technologies.

The Berkeley workshop is the fourth in the series and future sessions are planned in Atlanta (March), Boston (May) and Oregon (July). Read more at: http://oinworkshop.com/

ESnet’s Brian Tierney Discusses Network Performance in HPCwire Podcast

2014-02-13 ~ ESnetComms

In an interview with HPCwire editor Nicole Hemsoth, Brian Tierney, leader of the ESnet Advanced Network Technologies Group at ESnet’s 100G Network Testbed Project, discusses network performance issues, including the 1,000th deployment of the perfSONAR network measurement software.

In the 20-minute interview, Brian talks about how he originally got interested in network performance, the current state of ESnet and how institutions can assess and improve the performance of their network connections, such as referring to the fasterdata.es.net knowledge base. He also describes how Science DMZs can safely transfer massive datasets around firewalls and touches on the potential of Software Defined Networking.

Take a listen at: http://www.hpcwire.com/soundbite/milestones-hpc-network-performance/

ESnet’s Brian Tierney

ESnet Staff to Share Expertise in SC13 Tutorials

2013-10-31 ~ ESnetComms

At the SC13 conference to be held Nov. 17-22, networking experts from the Department of Energy’s ESnet will share their expertise and experience in two tutorials designed to help high performance computing center managers and IT staff simplify and accelerate data transfers on the network. Not only does ESnet operate the nation’s fastest network for scientific research at 100 gigabits per second, but ESnet staff also work with their counterparts at computing facilities to help them make the most effective use of their high-speed connections.

The tutorials are: