ESnet6 Investment Supports Next Generation Exascale Earth System Model

Scientists at Oak Ridge, Argonne, and Lawrence Livermore National Laboratories are collaborating on the next generation of integrated Earth climate models using Exascale Computing Project computers and simulation models. The Earth System Grid Federation program manages and shares the vast volumes of simulation output and observational data about our planet, gathered at all levels from space to far below the surface. Predictions from these models are vital to our understanding of climate, ocean, and other complex systems that make life possible. Read more about this work and ESnet’s role in this important international science conversation in a new phys.org article from Oak Ridge National Laboratory.

Visualization from the Earth System Model, one component of the Earth System Grid Federation program. ESnet provides the data connectivity necessary to stitch teams and computers at different labs together. Credit: LLNL, U.S. Dept. of Energy

The ESnet6 Unveiling Ceremony is four days away! Come celebrate our new network and the great science we support, like the Earth System Grid Federation. Join us from 9 a.m. to 12 p.m. PT on 11 October at https://streaming.lbl.gov.

Why We Designed and Deployed ESnet6: It is All About the Science!

We’re just a few days away from the ESnet6 unveiling and Confab22!

Here’s a great video interview with Ann Almgren, Senior Scientist in the Center for Computational Sciences and Engineering (CCSE) and Department Head of Applied Mathematics in the Applied Mathematics and Computational Research Division at Berkeley Lab. In it, she discusses her research into wind power generation and distribution, and how she will use ESnet6.

Ann Almgren, Berkeley Lab

To watch the unveiling of ESnet6 and learn more about Ann’s research, join us 11 October from 9 a.m. to 12 p.m. PT at streaming.lbl.gov!

ESnet’s Wireless Edge: Extending Our Network to Support Field Science

Throughout the world, earth and environmental scientists are deploying new kinds of sensors to measure and understand how the climate is changing and how we can best manage key infrastructure and resources in response. 

Operating these sensors and analyzing their data can be challenging: they are often deployed in areas with limited power, sometimes with no data connectivity beyond the periodic physical collection of memory cards. Sensors may also sit where weather and other factors make access laborious, such as at the top of a mountain, down a borehole, or under dense forest canopy.

Solar-powered meteorological and hydrological sensors deployed at the Snodgrass Field Site, Crested Butte, July 2022 at approximately 9,000 ft. elevation. (Photo: Andrew Wiedlea)

As the number, types, and capabilities of these sensors increase, the U.S. Department of Energy’s (DOE) Energy Sciences Network (ESnet) is working on ways to extend its high-speed network to support scientists working in remote, resource-challenged environments where our fiber backbone cannot reach. Using advanced wireless technologies such as low-Earth orbit satellite constellations, 5G and private Citizens Broadband Radio Service (CBRS) cellular, mmWave, and Internet-of-Things tools like long-range (LoRa) mesh networks, we are developing ways to free field scientists from geographical constraints, just as we have traditionally sought to do for laboratory scientists around the DOE complex.

In early July of this year, ESnet took a step forward in these efforts by installing a private cellular network near Crested Butte, Colorado, supporting sensor fields used by Earth and environmental scientists in Lawrence Berkeley National Laboratory’s (Berkeley Lab’s) Surface Atmosphere Integrated Field Laboratory (SAIL) program.

The purpose of this effort is to assess the requirements for operating a private 4G/5G wireless network in a remote and changing environment, one that can carry ESnet capabilities and services for scientific research out beyond our performant 13,000 km optical backbone. We are also using this research to identify the specific operational, workflow, and data movement needs of the Earth and environmental science community, as we build up the logistics, operations, and staffing resources ESnet can offer in support of that mission.

Our system, which is currently being configured, is built around a Nokia Digital Automation Cloud private cellular capability, with antennas placed across a valley from the sensor fields at the Snodgrass Field Site in Crested Butte. The intent is to use this cellular service to automate and improve the efficiency of data collection from the sensors, using cellular routers and radios matched to the capabilities of each sensor system. For sensor systems that cannot connect directly to a cellular network, we are establishing solar-powered sensor stations that provide local bridge connectivity (over several hundred meters) to nearby sensors via Wi-Fi, LoRa, or direct Ethernet cable.

Once data is backhauled from a sensor field through our private cellular network, it will be transmitted to ESnet via SpaceX’s Starlink low-Earth orbit satellite system, connecting to ESnet at a peering location in Seattle, Washington, and then traveling over our optical backbone to the National Energy Research Scientific Computing Center (NERSC) at Berkeley Lab for processing and storage.
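For readers curious what the gateway side of this backhaul could look like in software, here is a minimal, hypothetical sketch: a solar-powered station polls a nearby sensor and pushes readings toward a collection endpoint over whatever uplink (cellular, then Starlink) sits in front of it. The endpoint URL, station name, and cadence are invented for illustration; this is not ESnet’s actual field code.

```python
"""Illustrative gateway sketch for the sensor backhaul described above.
The ingest URL, station name, and reading format are hypothetical."""
import json
import random
import time

import requests  # assumes a standard Python environment on the gateway

INGEST_URL = "https://example-nersc-ingest.invalid/api/v1/readings"  # hypothetical


def read_local_sensor() -> dict:
    """Stand-in for polling a meteorological/hydrological sensor over
    Wi-Fi, LoRa, or direct Ethernet; here we fabricate a reading."""
    return {
        "station": "snodgrass-01",
        "timestamp": time.time(),
        "soil_temp_c": round(random.uniform(-5, 20), 2),
    }


def forward(reading: dict) -> None:
    """Push one reading through the private cellular uplink; the Starlink
    and ESnet hops are transparent to the gateway."""
    try:
        requests.post(
            INGEST_URL,
            data=json.dumps(reading),
            headers={"Content-Type": "application/json"},
            timeout=10,
        )
    except requests.RequestException:
        # In the field, readings would be buffered locally and retried
        # when the link returns; omitted here for brevity.
        pass


if __name__ == "__main__":
    while True:
        forward(read_local_sensor())
        time.sleep(300)  # five-minute cadence, chosen arbitrarily
```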

With fantastic assistance and collaboration from the Atmospheric Radiation Measurement (ARM) program, the Rocky Mountain Biological Laboratory, and Dan Feldman and Charuleka Varadharajan in the Watershed Function Science Focus Area at Berkeley Lab, our first field campaign was both great fun and extremely productive.

We will return later in the fall to complete network configuration and connect the sensors to the network. Once this is done, we can begin the next phase of this research: studying the operational performance and service requirements needed to support field science through the demanding conditions of winter in the Colorado High Rockies. We will also begin to develop standard equipment specifications and deployment practices that we can reuse for ESnet wireless edge deployments supporting science in other regions and for other purposes.

This effort is being made possible by teamwork across ESnet and Berkeley Lab, including outstanding support from Chris Tracy and Jackson Gor in ESnet Network Engineering, and from Steve Nobles and many others in IT Telephone Services. The Colorado deployment’s success depended on the hard (often physical) work of Stijn Wielandt (EESA), Kate Robinson (ESnet Network Engineering), Jeff D’Ambrogia (IT, Science IT), and Jeff Chavez of Nokia.

CERN’s Large Hadron Collider Begins Run 3 

Today, the world’s most powerful particle accelerator, the Large Hadron Collider (LHC), begins Run 3, a new period of data taking! This comes after more than three years of upgrade and maintenance work.

Image: CERN

ESnet is looking forward to continuing to support the US LHC community for Run 3 through its connectivity to CERN. ESnet carries all LHC traffic from Europe to the US: to the two Tier 1 centers at Fermilab and Brookhaven National Laboratory, and onward to the physicists at university-based Tier 2 centers. You can see the list of LHCONE collaborators at my.es.net/lhcone/list.

View the traffic on our trans-Atlantic links live at my.es.net.

Check out CERN’s livestream of the event here.

Apply for the 2022 Women in IT Networking at SC program and help build SCinet!

The Women in IT Networking at SC (WINS) program is now accepting applications from US-based early- to mid-career women for their 2022 program. Those selected for the program will be given full travel funding to attend the Supercomputing Conference (SC) in Dallas, TX from November 13-18, 2022, where they’ll have a chance to help construct SCinet, a unique multi-terabit-per-second network built annually to support demonstrations by SC attendees.

The WINS program was developed in 2015 to combat the gender gap in the network engineering and high performance computing fields.  WINS is a joint effort between the Department of Energy’s Energy Sciences Network (ESnet), the Keystone Initiative for Network Based Education and Research (KINBER), and the University Corporation for Atmospheric Research (UCAR) and works collaboratively with the SC program committee.

What the program entails

If selected for the program, you’ll be matched with a SCinet team and a high-profile mentor based on your interests and background. You’ll also get to dive in and work side-by-side with top engineers building SCinet.

Those selected for the program will also receive: 

  • Travel funds for attending staging, setup, and live support of the SC conference as a SCinet volunteer
  • Complimentary conference registration
  • Professional development support before, during, and after the conference

Who should apply

Early- and mid-career engineers and technologists who: 

  • Want to work side-by-side with the world’s leading network, software, and systems engineers and top network technology vendors.
  • Identify as women at the time of application.
  • Are able to travel to Dallas, TX during the following dates (assuming COVID doesn’t interfere):
    • SCinet Staging: Oct. 20-28, 2022
    • SCinet Setup: Nov. 7-13, 2022
    • SCinet Live Operations/SC22 Conference and SCinet Teardown: Nov. 13-19, 2022

WINS is especially interested in applications from historically underrepresented groups in the Information Technology field, including Black or African-American, American Indian or Alaska Native, Hispanic or Latinx women. 

Learn more and submit your application here. Applications are due by January 21, 2022, at 11:59 pm. If you want to participate in SCinet but don’t fit the above criteria, you can contact SCinet to learn more about other volunteer opportunities.

Arecibo Support Wins SC21 HPCwire Readers’ Choice Award!

Arecibo dish after the collapse

As part of a team spanning 15 government, academic, and industrial partners, the Engagement and Performance Operations Center (EPOC) – a collaboration between Indiana University and ESnet – was awarded the “Best HPC Collaboration (Academia/Government/Industry)” HPCwire Readers’ Choice award on Tuesday, Nov. 16. The award, presented at the High Performance Computing, Networking, Storage and Analysis (SC21) conference, recognizes the effort and collaboration required to move and safeguard irreplaceable data (over 50 years of astronomical observations) from the Arecibo Observatory following the structural collapse of this scientific resource in December 2020.

At ESnet, Ken Miller, George Robb, and Jason Zurawski supported these efforts as both full members of EPOC and ESnet staff. Jason and Ken are both part of ESnet’s Science Engagement Team, while George is with ESnet’s Infrastructure Systems group. LightBytes caught up with Jason Zurawski to get his thoughts on the project and award, and an update on the Arecibo effort since our April 2021 post on this project.


Now that data from Arecibo has been migrated to the Texas Advanced Computing Center (TACC), what happens next, and how will this data be used?

The team at the University of Central Florida has been working with TACC on several ways to build up the capabilities needed for their data analysis and sharing requirements. They are working to deploy a portal that will give researchers access to the data, as well as building workflows to investigate and process it using computation provided by TACC.

The team at Arecibo is also still working through much older data that resides on tape. Due to the delicate state of the media, it is being carefully read and transferred to on-island storage before being transmitted to TACC for archiving. This work will take several more months to complete.

What do you think the lessons from this effort are in terms of getting so many different organizations to work together to support this very challenging problem?

The collapse that Arecibo experienced sent ripples through the R&E community because researchers and technology professionals alike knew there was a limited window to act on replicating important observations gathered over the years. The partners in this effort were motivated to act, and that removed many barriers to putting some solutions in place. Everyone collaborated efficiently with their core competencies, and we continue to work together as the next steps for the scientific collaboration are planned.

Plans are starting to emerge for a “next generation” Arecibo to follow the loss of this instrument. How might the next generation of data management resources be shaped by this collaboration?

Now that there has been some time to evaluate the work, it has also spurred UCF and Arecibo to plan for the future with respect to computation, storage, and network connectivity both in Puerto Rico and in Florida. With these improvements planned, they will be well positioned to serve the scientific data for years to come. New instruments will no doubt increase data demands by many orders of magnitude; addressing all aspects of the data pipeline now, and then gradually increasing capabilities over time, will help prepare for these emerging challenges.

Congratulations to all of the organizations and staff who helped prevent the loss of this data!

Science begins as a Conversation! See how ESnet creates a world where conversations become discovery. Watch our new video now!

Ever wonder how big research data moves around the globe? ESnet plays a significant role in supporting the great scientific conversations, collaborations, and experiments underway, wherever and whenever they occur. We move exabytes of data around the world, creating a global laboratory that accelerates scientific discovery.

To meet scientists’ needs, we are constantly looking for opportunities to expand our capabilities with our next-generation network, ESnet6, as well as intelligent edge analytics, advanced network testbeds, 5G wireless, quantum networking, and more.

https://www.es.net/scienceconversation/

High Energy Physics Requirements Review Now Available: The Data Deluge Shows No Sign of Cresting!

Lauren Rotman and Jason Zurawski


Across the physical sciences, new instruments and capabilities are driving relentless growth in data production and in the need for high-speed networking and analysis resources.

ESnet stays on top of these trends via the Network Requirements Review process, which for the past 15 years has been a remarkable and useful collaboration between the DOE Office of Advanced Scientific Computing Research (ASCR), ESnet, and science programs across the DOE Office of Science.

The latest Network Requirements Review for the Office of Science High Energy Physics (HEP) program office is now available. Among many other findings, it confirms that the exponential growth of scientific data generation will continue unabated as we proceed into what may well be a new golden age for high energy physics research. Some examples:

The upcoming High Luminosity era of the Large Hadron Collider (beyond 2027, or Run 4) will require multi-Tbps network speeds to support globally dispersed “Tier 1” HPC resources. Scientists will use the LHC to uncover how the Higgs boson interacts and gives mass to other particles, and to explore emerging evidence for particle behaviors not explained by current physics models. Each data-taking year, the ATLAS and CMS experiments combined are expected to accumulate roughly 1 EB of new data, and complete data set sizes are estimated to routinely exceed 100 PB.

Expected peak and integrated luminosity for the LHC as a function of calendar year; the data produced tracks the improved luminosity and resolution.

Scientists at the Deep Underground Neutrino Experiment (DUNE) in South Dakota and at Fermilab in Illinois will use high-speed data transfer to identify supernova events as part of ongoing measurements of neutrino interactions. Supernovae measured by DUNE will generate over 200 TB of compressed data per event, and research and education networks (RENs) must supply highly reliable, predictable data transfer capabilities to provide telescope targeting data to global arrays.

10kt DUNE Far-Detector SP module, showing the alternating 58 m long (into the page), 12 m high anode (A) and cathode (C) planes, as well as the field cage that surrounds the drift regions between the anode and cathode planes. The blank area on the left side was added to show the profile of a single anode plane assembly (APA). Person included for scale.

The Cosmic Microwave Background, Stage 4 (CMB-S4) experiment will require data management and transfer capabilities in some of the most demanding locations on Earth. Operating multiple telescopes at two observational sites, with a combined total of 500,000 cryogenically cooled superconducting detectors at the South Pole and in the Chilean Atacama Desert, CMB-S4 will provide an unprecedented picture back to the start of the Universe. Over seven years of operation in these conditions, roughly 22 TB of data (~8 TB at the South Pole and ~14 TB in Chile) will be generated daily, leading to an accrual of 3 PB annually and as much as 100 PB over the full program lifecycle.

Two Cross-Dragone (CD) telescopes (one is pictured above) with six meter diameter input apertures will be deployed at the Chilean site to map roughly 70% of the sky every day to support the dark universe, matter-mapping, and time-varying mm-wave sky science goals. Image and caption courtesy of the CMB-S4 Project

Network Requirements Reviews analyze the current, near-term, and long-term needs of the HEP community, providing a network- and data-centric understanding of the scientific process used by the researchers and scientists. These requirements reviews drive ESnet’s investments in new services and capabilities, and enable ESnet to build strong partnerships with Office of Science (SC) programs, PIs, and user facilities. More information on ESnet’s requirements review process can be found here.

We would like to thank the 13 HEP projects and all of the HEP and DOE Office of Science collaborators who generously gave their time, expertise, and, most importantly, their enthusiasm for the future of high energy physics as part of creating this report.

We want to especially thank the entire Science Engagement Team, as well as Kate Robinson and Dale Carder of our Network Engineering group, who all provided outstanding support and technical expertise.

Creating the Tokamak Superfacility: Fusion with the Science DMZ

5.5 Questions with Eli Dart (ESnet), C.S. Chang, and Michael Churchill (PPPL)

In 2025, when the International Thermonuclear Experimental Reactor (ITER) generates “first plasma,” it will be the culmination of almost 40 years of effort. First started in 1985, the project has grown to include the scientific talents of seven members (China, the EU, India, Japan, Korea, Russia, and the US, with EU membership bringing the total to 35 countries). If successful, it will mark the first time that a large-scale fusion reactor generates more thermal power than is used to heat isotopes of hydrogen gas to a plasma state.

ESnet is supporting this international scientific community as it pursues this dream of limitless, clean energy. When operational at full capacity, ITER will generate approximately a petabyte of data per day, much of which will need to be analyzed and fed back in near real time to optimize the fusion reaction and to manage the distribution of data to a federated framework of geographically distributed “remote control rooms” (RCRs). To prepare for this demanding combination of data distribution and analytics, ESnet’s Eli Dart and the Princeton Plasma Physics Laboratory’s (PPPL’s) Michael Churchill and C.S. Chang recently co-authored a test exercise performed with collaborators at Pacific Northwest National Laboratory (PNNL), PPPL, Oak Ridge National Laboratory (ORNL), and Korea’s KREONET, KSTAR, National Fusion Research Institute, and Ulsan National Institute of Science and Technology. The study (https://doi.org/10.1080/15361055.2020.1851073) successfully demonstrated the use of ESnet and the Science DMZ architecture for trans-Pacific large data transfer and near real-time movie creation and analysis of KSTAR electron cyclotron emission imaging (ECEI) data, via multiple network paths at high sustained speeds.


Q 1: This was a complex test, involving several sites and analytic workflows.  Can you walk our readers through the end-to-end workflow? 

End-to-end workflow of the demonstration comparing real-time streaming data from the KSTAR ECEI diagnostic to side-by-side movie from the XGC1 gyrokinetic turbulence code.

Eli Dart: The data were streamed from a system at KSTAR, encoded into ADIOS format, streamed to PPPL, rendered into movie frames, and visualized at PPPL. One of the key attributes of this workflow is that it is a streaming workflow. Specifically, this means that the data passes through the workflow steps (encoding in ADIOS format, transfer, rendering movie frames, showing the movie) without being written to non-volatile storage. This allows for performance improvements, because no time is spent on storage I/O. It also removes the restriction of storage allocations from the operation of the workflow – only the final data products need to be stored (if desired). 
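To make the “never written to non-volatile storage” point concrete, here is a minimal sender-side sketch of that streaming pattern using the adios2 Python bindings (the library behind the ADIOS format mentioned above). The stream name, variable name, array shape, and engine choice are placeholders rather than the actual KSTAR ECEI configuration, and the calls follow the adios2 2.x bindings API.

```python
"""Sender-side sketch of a memory-to-memory ADIOS stream (no files written).
Names, shapes, and the SST engine choice are illustrative placeholders."""
import numpy as np
import adios2  # adios2 2.x Python bindings

adios = adios2.ADIOS()
io = adios.DeclareIO("ecei_out")
io.SetEngine("SST")  # staging engine: data goes straight to the reader

nrows, ncols = 24, 8  # placeholder channel grid, not the real ECEI layout
frame = np.zeros((nrows, ncols), dtype=np.float32)
var = io.DefineVariable("frame", frame, [nrows, ncols], [0, 0],
                        [nrows, ncols], adios2.ConstantDims)

writer = io.Open("ecei_stream", adios2.Mode.Write)
for step in range(100):  # stand-in for the live diagnostic stream
    frame[:] = np.random.rand(nrows, ncols)
    writer.BeginStep()
    writer.Put(var, frame)
    writer.EndStep()  # handed to the remote reader, never stored locally
writer.Close()
```

A matching reader at the far end would open the same stream in read mode and pull frames step by step, rendering each into a movie frame as it arrives.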

Q 2: A big portion of this research supports the idea of federated, near real-time analysis of data.  In order to make these data transfers performant, flexible, and adaptable enough to meet the requirements for a future ITER RCR, you had to carefully engineer and coordinate with many parties.  What was the hardest part of this experiment, and what lessons does it offer ITER?

Eli Dart: It is really important to ensure that the network path is clean. By “clean” I mean that the network needs to provide loss-free IP service for the experiment traffic. Because the fusion research community is globally distributed, the data transfers cover long distances, which greatly magnifies the negative impact of packet loss on transfer performance. Test and measurement (using perfSONAR) is very important to ensure that the network is clean, as is operational excellence to ensure that problems are fixed quickly if they arise. KREONET is an example of a well-run production network – their operational excellence contributed significantly to the success of this effort.
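A quick back-of-the-envelope calculation shows why a long path magnifies loss so dramatically. The sketch below uses the well-known Mathis et al. model for loss-limited TCP throughput with representative numbers, not measurements from this experiment.

```python
"""Why long paths magnify packet loss: Mathis et al. TCP throughput model,
throughput <= (MSS / RTT) * C / sqrt(p), with C ~ 1.22.
RTTs, loss rate, and MSS below are representative examples only."""
from math import sqrt


def mathis_throughput_gbps(mss_bytes: float, rtt_s: float, loss: float) -> float:
    c = 1.22  # constant from the Mathis model
    return (mss_bytes * 8 / rtt_s) * (c / sqrt(loss)) / 1e9


MSS = 8948    # bytes, typical MSS with 9000-byte jumbo frames
LOSS = 1e-5   # one packet lost per 100,000

for label, rtt in [("metro path, 2 ms", 0.002),
                   ("trans-Pacific path, 150 ms", 0.150)]:
    print(f"{label}: <= {mathis_throughput_gbps(MSS, rtt, LOSS):.1f} Gbps")

# The same tiny loss rate caps the 150 ms path at roughly 75x lower throughput
# than the 2 ms path, which is why loss-free, "clean" service matters so much.
```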

Q 3: One of the issues you had to work around was a firewall at one institution. What was involved in working with their site security, and how should those working with a Science DMZ work through these issues?

Eli Dart: Building and operating a Science DMZ involves a combination of technical and organizational work. Different institutions have different policies and different assurance needs, depending on the nature of the work being done on the Science DMZ. The key is to understand that security policy is there for a reason, and to work with the parties involved in the context that makes sense from their perspective. Then, it’s just a matter of working together to find a workable solution that preserves safety from a cybersecurity perspective and also allows the science mission to succeed.

Q 4: How did you build this collaboration, and how did you keep everyone on the same page? Any advice you can offer other experiments facing the same need to coordinate multi-national efforts?

Eli Dart: From my perspective, this result demonstrates the value of multi-institution, multi-disciplinary collaborations for achieving important scientific outcomes. Modern science is complex, and we are increasingly in a place where only teams can bring all the necessary expertise to bear on a complex problem. The members of this team have worked together in smaller groups on a variety of projects over the years – those relationships were very valuable in achieving this result.

Q 5: In the paper you present a model for a federated remote framework workflow. Looking beyond ITER, are there other applications you can see for the lessons learned from this experiment?

C.S. Chang: Lessons learned from this experiment can be applied to many other distributed scientific, industrial, and commercial applications which require collaborative data analysis and decision making.  We do not need to look too far.  Expensive scientific studies on exascale computers will most likely be collaborative efforts among geographically distributed scientists who want to analyze the simulation data and share/combine the findings in near-real-time for speedy scientific discovery and for steering of ongoing or next simulations.  The lessons learned here can influence the remote collaboration workflow used in high energy physics, climate science, space physics, and others.

Q 5.5: What’s next? You mention quite a number of possible follow-on activities in the paper. Which of these most interest you, and what might follow?

Michael Churchill: Continued work by this group has led to the recently developed open-source Python framework, DELTA, for streaming data from experiments to remote compute centers, using ADIOS for streaming over wide-area networks and, on the receiver side, asynchronous Message Passing Interface (MPI) calls to process the data streams in parallel. We’ve used this to stream data from KSTAR to the NERSC Cori supercomputer and complete a spectral analysis in parallel in less than 10 minutes, something that would normally take 12 hours in serial. Frameworks such as this, connecting experiments to remote high-performance computers, will open up the quality and quantity of analysis workflows that experimental scientists can run. It’s exciting to see how this can help accelerate the progress of science around the world.
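As a rough illustration of that receiver-side pattern (a generic mpi4py/NumPy sketch, not DELTA’s actual code), the example below splits channels across MPI ranks and computes their spectra in parallel; the channel count and sample length are made up.

```python
"""Toy receiver-side analysis: MPI ranks divide channels and compute
spectra in parallel. Generic mpi4py/NumPy sketch, not the DELTA framework."""
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_channels, n_samples = 192, 4096  # placeholder dimensions
rng = np.random.default_rng(rank)

# Each rank pretends to receive its share of the channel streams.
my_channels = range(rank, n_channels, size)
my_spectra = [np.abs(np.fft.rfft(rng.standard_normal(n_samples))) ** 2
              for _ in my_channels]

# Gather per-rank results on rank 0 as a stand-in for assembling the analysis.
all_spectra = comm.gather(my_spectra, root=0)
if rank == 0:
    total = sum(len(s) for s in all_spectra)
    print(f"computed {total} channel spectra across {size} ranks")
```

Run under MPI (for example, mpiexec -n 8 python spectra_sketch.py), the script gives each rank a proportional share of the channels, which is the basic mechanism behind the serial-hours-to-parallel-minutes speedup Michael describes.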

Congratulations on your success! This is a significant step forward in building the data management capability that ITER will need.  

Graduate students publish on network telemetry with ESnet

Two graduate students working with ESnet have recently published papers at IEEE and ACM workshops.

Bibek Shrestha, a graduate student at the University of Nevada, Reno, and his advisor Engin Arslan worked with Richard Cziva of ESnet to publish “INT Based Network-Aware Task Scheduling for Edge Computing.” In the paper, Bibek investigated the use of in-band network telemetry (INT) for real-time, in-network task scheduling. Bibek’s experimental analysis using various workload types and network congestion scenarios revealed that enhancing edge-computing task scheduling with high-precision network telemetry can lead to up to a 40% reduction in data transfer times and up to a 30% reduction in total task execution times by favoring edge servers in uncongested (or mildly congested) sections of the network when scheduling tasks. The paper will appear at the 3rd Workshop on Parallel AI and Systems for the Edge (PAISE), co-located with the IEEE IPDPS 2021 conference, to be held on May 21, 2021, in Portland, Oregon.
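The core scheduling idea can be sketched in a few lines: given per-path congestion metrics reported through INT, prefer the edge server behind the least congested path. The server names, telemetry fields, and weighting below are invented for illustration and are not taken from the paper.

```python
"""Illustrative network-aware scheduler: pick the edge server whose path
reports the least congestion in its in-band telemetry. Hypothetical fields."""
from dataclasses import dataclass


@dataclass
class EdgeServer:
    name: str
    queue_depth: int       # switch queue occupancy along the path, from INT
    hop_latency_us: float  # sum of per-hop latencies along the path, from INT


def pick_server(servers: list[EdgeServer]) -> EdgeServer:
    # Favor servers behind uncongested (or mildly congested) network sections:
    # queue depth dominates, path latency breaks ties.
    return min(servers, key=lambda s: (s.queue_depth, s.hop_latency_us))


servers = [
    EdgeServer("edge-a", queue_depth=480, hop_latency_us=95.0),
    EdgeServer("edge-b", queue_depth=12, hop_latency_us=130.0),
    EdgeServer("edge-c", queue_depth=12, hop_latency_us=88.0),
]
print("schedule task on:", pick_server(servers).name)  # -> edge-c
```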

Zhang Liu, a former ESnet intern and a current graduate student at the University of Colorado Boulder, worked with the ESnet High Touch Team – Chin Guok, Bruce Mah, Yatish Kumar, and Richard Cziva – on fastcapa-ng, ESnet’s telemetry processing software. In the paper “Programmable Per-Packet Network Telemetry: From Wire to Kafka at Scale,” Zhang showed the scaling and performance characteristics of fastcapa-ng and highlighted the most critical performance considerations that allow it to push 10.4 million telemetry packets per second to Kafka with only 5 CPU cores, which is more than enough to handle 170 Gbit/s of original traffic with a 1512 B MTU. The paper will appear at the 4th International Workshop on Systems and Network Telemetry and Analytics (SNTA 2021), held with the ACM HPDC 2021 conference in Stockholm, Sweden, June 21-25, 2021.
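On the consuming side, telemetry records published to Kafka can be read back with any standard client. The sketch below uses the kafka-python library with a hypothetical broker address, topic name, and record handling; fastcapa-ng’s real topics and record schema are described in the paper.

```python
"""Minimal consumer for per-packet telemetry records landing in Kafka.
Broker address, topic name, and record handling are hypothetical."""
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "pkt-telemetry",                          # hypothetical topic
    bootstrap_servers="broker.example.org:9092",
    auto_offset_reset="latest",
    value_deserializer=lambda raw: raw,       # keep raw bytes; schema-specific parsing goes here
)

count = 0
for record in consumer:
    count += 1
    if count % 1_000_000 == 0:
        # At ~10M records/s on the producer side, a single consumer like this
        # would need partitioning and batching to keep up; this is just a probe.
        print(f"seen {count} telemetry records, last offset {record.offset}")
```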

Congratulations Bibek and Zhang!


If you are a networked systems research student looking to collaborate with us on network measurements, please reach out to Richard Cziva. If you are interested in a summer internship with ESnet, please visit this page.