Under Budget and Ahead of Schedule, ESnet6 Project Receives Final CD-4 Approval

The Department of Energy’s Office of Project Assessment recently issued its final CD-4 Review Report on the Energy Sciences Network (ESnet)’s ESnet6 upgrade project. The review, held on July 12 – 13, 2022, and conducted at the request of Barbara Helland, Associate Director of Science for Advanced Scientific Computing Research (ASCR), assessed the project’s readiness to proceed to the approval of project completion. The project completed all threshold key performance parameters (KPPs) six months ahead of the early finish date, two years ahead of the CD-4 Level 1 milestone date, and well under budget. The committee assessed that the project was ready to proceed to CD-4 approval, which was achieved on July 29, 2022. The Final Closeout report and Lessons Learned are being submitted next week within the specified 90-day window.

“I want to congratulate the entire ESnet organization, especially the ESnet6 project director, Kate Mace, and the project team,” said Inder Monga, executive director of ESnet. “When the team set out to deliver on a project scope as vast as the ESnet6 launch, we did not imagine a global pandemic would interrupt the process. Despite that, the team delivered the entire project ahead of the deadline and, even with supply-chain issues, managed to complete the scope below the projected budget. Most of the team attended the ESnet6 Unveiling Event on October 11 and heard their accomplishments praised by the lab directorate as well as congresspeople and DOE staff.”

The committee commended the project team for their “unique and innovative approach” in completing the project objectives and complimented ESnet for their agility in following through with the project scope while dealing with the difficult environment generated by the COVID-19 pandemic. The report stated that COVID-19 restrictions, limitations, and supply-chain issues presented “no significant impact” on the project’s critical path. The report also identified the distributed nature of operations and ESnet’s support for a remote workforce as an “invaluable” approach and a best practice to be shared beyond the DOE complex.

The final reports required for the official Project Closeout will be submitted to DOE this week. The ESnet team has continued to keep up the pace as they work toward additional enhancements to the ESnet6 Facility. “The #ESnet6Week festivities the week of October 9 energized the team. Not only were the project accomplishments celebrated at the ESnet6 Unveiling event, but the team also heard firsthand about the impact the project has already had on scientific discovery,” said Kathryn Mace, the ESnet6 project director and Network Engineering group lead. “Hearing about the expansion of scientific collaborations made easier with the ESnet6 network and automated operations provided the team with newfound motivation to keep moving full speed ahead. ESnet6 sets the foundation for global scientific innovation over the next 10 years.”

The ESnet6 project team, DOE staff, IPR Committee, and members of the Berkeley Lab directorate during the final CD-4 IPR closeout session.

ESnet6 Unveiled Tomorrow!

We’re getting things set up for the ESnet6 Unveiling tomorrow – our tent has gone up, we’re holding final rehearsals for the presentations, printing badges, and doing a thousand other small things.  

The only thing missing from our pictures is you! 

See you tomorrow for the big day! If you are visiting in person, travel safe. If you are joining us virtually, the show starts at 9:00 AM on https://streaming.lbl.gov.

ESnet6 Investment Supports Next Generation Exascale Earth System Model

Scientists at Oak Ridge, Argonne, and Lawrence Livermore National Laboratories are collaborating on the next generation of integrated Earth climate models using Exascale Computing Project computers and simulation models. The Earth System Grid Federation program is building vast simulation models using data collected about our planet at all levels, from space to far below the surface. Predictions from these models are vital to our understanding of climate, ocean, and other complex systems that make life possible. Read more about this and ESnet’s role in this important international science conversation in a new phys.org article from Oak Ridge National Laboratory.

Visualization from the Earth System Model, one component of the Earth System Grid Federation program. ESnet provides the data connectivity necessary to stitch teams and computers at different labs together. Credit: LLNL, U.S. Dept. of Energy

The ESnet6 Unveiling Ceremony is 4 days away! Come celebrate our new network and the great science we support, like the Earth System Grid Federation. Join us from 9 a.m. – 12 p.m. PT, 11 October on https://streaming.lbl.gov.

Why We Designed and Deployed ESnet6: It is All About the Science!

We’re just a few days away from the ESnet6 unveiling and Confab22!

Here’s a great video interview with Ann Almgren, Senior Scientist in CCSE and the Department Head of the Applied Mathematics Department in the Applied Mathematics and Computational Research Division at Berkeley Lab. In it she discusses her research into wind power generation/distribution, and how she will use ESnet6.

Ann Almgren, Berkeley Lab

To watch the unveiling of ESnet6 and learn more about Ann’s research, join us 11 October from 9:00 AM – 12 PM PT at streaming.lbl.gov!

ESnet6 Unveiling in Seven Days!

On October 11, 2022, we will welcome the newest generation of our high-performance scientific network, ESnet6, at an unveiling ceremony hosted by Berkeley Lab.

ESnet6 marks a new era of our high-performance network supporting the needs of scientists. We’re able to handle massive flows of data in a reliable, nimble way, and we can specifically configure our setup to match the needs of individual experiments. The upgrade ensures that ESnet is ready to support the future of science today, including the significant increase in the amount of data produced by scientific experiments and the increasingly complex needs of scientists and the way they interact with our network. 

Come watch the ESnet6 unveiling ceremony 9 AM – 12 PM PT, October 11, at streaming.lbl.gov!

Watch our latest video: Serving Conversations That Matter

Title card from the Serving Conversations That Matter video

ESnet exists to support research into some of the most important questions of our time. The traffic that travels over our network on a daily basis contains data from tens of thousands of researchers – data that could lead to the next major discovery or scientific breakthrough.

In our latest video, learn about just a few of the revolutionary research collaborations we support, and the questions they’re working to answer.

Making the Research and Educational Community SAFER: Adam Slagell on the creation of a new global collaboration to combat cyberthreats.

Adam Slagell is ESnet’s Chief Security Officer and a founding member of the newly formed Security Assistance For Education & Research (SAFER) trust group.

SAFER is an operational security entity focused on fighting computer misuse and defending the academic, research, and education (R&E) mission globally.  SAFER brings together expertise and resources from organizations across the Research and Educational cybersecurity community, including CERN, DFN-CERT, ESET, ESnet, LBNL, STFC, and WLCG.

More information can be found at https://www.safer-trust.org/.


What motivates the creation of SAFER and what do you think success will look like for the community?

There are many cybersecurity trust groups out there, some even dedicated to R&E like REN-ISAC or XSEDE’s trust group consisting of current and former TeraGrid and XSEDE site members. However, there really isn’t anything like this, both permanent and truly international, even though attacks are almost always transnational. So each time there is a new, major campaign, an international group connecting all these regional responders must be created again. What we are trying to do is create that permanent backbone with a core set of highly connected individuals who are a part of these regional and project-specific trust groups around the world.

If we are successful, we will see several things. First, I believe we will see more international cooperation and information sharing, leading to an earlier notice of new attack campaigns. Second, we will be able to activate a response more quickly, pulling in the expertise needed from a broad pool of SAFER members and their trusted colleagues. Finally, it is our hope that we can provide surge capabilities when a member is under attack. Many R&E organizations have limited resources and small teams. It is a tremendous asset if they can get help from their peers, maybe with unique expertise as they are facing a disruptive attack.

What kind of security resources will SAFER provide?

I alluded to some of the services when discussing what success will look like. But ultimately, our security resources will be determined by community needs. The founding members will serve as the steering committee for the first year until we elect the next steering committee. 

One of our first steps is setting up a Malware Information Sharing Platform (MISP) instance to share Indicators of Compromise, e.g., IP addresses, file hashes, and domain names. Usually, there is no requirement for members to share such data as the rules and regulations differ so much across organizations. But even on day one, we will have enough organizations that can contribute to making this service useful.

There is also a secure messaging and chat service using decentralized cryptography that all of our members can participate in. These ad hoc conversations about what people are seeing on their networks will hopefully help detect trends early.

Finally, many of the founding members come from large institutions with more resources, and I believe we can quickly help those projects and institutions that might struggle with an attack by providing our expertise while helping to train the next generation of security professionals.

What excites you most about this effort and what is the opportunity to do the most good?

I love the community-building aspect. In a past life, I created the Bro (now Zeek) Leadership Team and really worked hard to build a vibrant community around that software. I think this expertise is where I can be most helpful as I am less technical in my roles today.

I will also say, I am excited about getting young people involved, too. Organizations that contribute time from their teams will really benefit. There is no training for incident response like jumping in, and I expect the variety of issues we will see will prove very useful just from a training and development perspective.

LBL has a long history supporting cybersecurity research, from the early days of Clifford Stoll and The Cuckoo’s Egg to the creation of Bro.  What does the future of cybersecurity look like, and how will that shape the REN community?

Indeed, LBL’s security team is also a SAFER founding member. One of the things I love about working here and at ESnet is that our mission is outward-focused, and when we help the community we raise all boats, so to speak.

Fortune telling, however, is a dangerous game. We have anticipated some things, like cryptocurrency mining coming to HPCs. However, the threat landscape and tools available keep changing. That is part of what makes this job interesting. The important thing that I hope we keep in mind is that security is not done for its own sake, but to enable our mission of scientific research. To me, this means that we must always work to make risk-based security decisions, even when that might challenge pushes for compliance and simple one-size-fits-all solutions.

Next Generation ESnet6 Routers Installed and Accepted!

ESnet6 took a major step forward last week with the completed installation and acceptance of all 40 “greenfield” routers on the network backbone. These new routers will enable ESnet to operate at speeds up to 400 Gbps across our national fiber network, and provide the backbone infrastructure behind our next generation scientific data mobility capabilities.

A new ESnet6 backbone router in its native habitat.

The installation and acceptance process at each location across the continental US required careful coordination between subcontractors, colocation facility personnel, Lab site staff, and multiple teams across ESnet. Following local health regulations and access requirements, ESnet arranged physical access for the subcontractors at each location, and all parties participated in a turn-up conference call as the routers were installed and brought online.

In addition to networking capabilities, the ESnet6 team implemented new software automation capabilities simplifying the installation and acceptance process.  These capabilities included enhancements to the ESnet inventory system to support bulk planning data import, automatic bill of materials generation, automatic site survey generation, and automated generation of all backbone links within the network.  In addition, the team introduced new workflow orchestration, automated provisioning, and inventory discovery capabilities to help with the installation process.

The acceptance of the ESnet6 greenfield routers is a major milestone for the ESnet6 Project and the team has already migrated a significant portion of customer traffic onto the new routers. Despite the extra challenges presented by the COVID-19 pandemic, the project has made steady progress and is on track to finish ahead of schedule. 

Meeting the Challenge of High Availability through the HASS

Operating a highly optimized network across two continents that meets the needs of very demanding scientific endeavors requires a tremendous amount of automation, orchestration, security, and monitoring.  Any failure to provide these services can create serious operational challenges. 

As we enter the ESnet6 era, ESnet is dedicated to ensuring that we continue to relentlessly push the boundaries of operational excellence and obsessively seek out and reduce operational risks. Our new High Availability Services Site (HASS) in San Jose, CA, will be critical to realizing those goals in our computing platforms and services. ESnet’s HASS will soon provide fully redundant network operations platforms, allowing us to seamlessly maintain services if our operations at LBL are disrupted.

For about a decade, ESnet has augmented its data center operations at Berkeley Lab in California with a small footprint at Brookhaven National Laboratory in New York. This has allowed us to synchronize important information across two sites and to run multiple instances of important services to ensure operational continuity in the case of a failure. While this has provided great stability and reliability, there are limitations. In particular, the 2,500-mile gap across a continent does not let ESnet restore operations without some degree of delay, as some key services must be manually transitioned. HASS will enable seamless operational continuity, since the shorter distance between Berkeley and San Jose will let us automatically maintain the active synchronization of operational platforms.

Deployment of HASS involves a team effort of our ESnet Computing Infrastructure, Network Engineering, and Security teams, working together to architect and deploy the next evolution in our computing and service reliability strategy. After finalizing our requirements, we are now working with Equinix, a commercial colocation provider, to deploy a site adjacent to the ESnet6 network. Equinix was able to provide a secure suite in their San Jose facility, and this location gives us the capacity and physical adjacency we require to directly connect this suite to ESnet6 and reach our Berkeley data center comfortably within our demanding latency goals (10 ms or less).
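To see why distance matters so much for keeping two sites actively synchronized, a rough propagation-delay estimate helps. The short Python sketch below is only a back-of-the-envelope illustration: it assumes light travels through fiber at about 200,000 km/s, and the 100 km Berkeley to San Jose path length is a hypothetical figure chosen for the example, not a measured ESnet route. Real round-trip times are higher because fiber routes are not straight lines and equipment adds delay.

```python
# Rough best-case propagation delay for light in optical fiber.
# Assumption: signals travel at about 2/3 the speed of light in vacuum,
# roughly 200,000 km/s (about 5 microseconds per km, one way).
SPEED_IN_FIBER_KM_PER_S = 200_000.0
KM_PER_MILE = 1.609344


def round_trip_ms(distance_km: float) -> float:
    """Best-case round-trip time in milliseconds over a fiber path."""
    return 2 * distance_km / SPEED_IN_FIBER_KM_PER_S * 1000.0


# 2,500 miles is the figure quoted in the text. The 100 km path between
# Berkeley and San Jose is a hypothetical value used only for illustration.
cross_country_km = 2500 * KM_PER_MILE
bay_area_km = 100.0

cross_country_rtt = round_trip_ms(cross_country_km)
bay_area_rtt = round_trip_ms(bay_area_km)

print(f"Berkeley to Brookhaven: {cross_country_rtt:.1f} ms round trip")
print(f"Berkeley to San Jose:   {bay_area_rtt:.1f} ms round trip")
```

Under these assumptions the cross-country path costs roughly 40 ms per round trip before any equipment delay, while the Bay Area path costs about 1 ms, which leaves plenty of headroom under a 10 ms goal.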

As part of standing up HASS, we’ll be installing a new routing platform with a 100G upstream connection to ESnet6 in both San Jose and Berkeley. We’ll also be installing new high performance switching platforms, security services (high throughput firewalls, tapping, black hole routing, etc.), virtualization resources, and several other redundant internal operational platforms. Our existing virtualization platform (ESXi/vSAN) will “stretch” into the new space as part of the same logical cluster we operate in Berkeley. Once this is deployed, even networking services that lack native high availability capabilities will be able to simply “float” between the two physical data centers with data mirrored and striped across both sites.

We’re very excited by the addition of the San Jose HASS. In combination with existing reliability resources at Brookhaven, HASS will continue to ensure that ESnet6 has the ability to meet scientific networking community needs for service hosting, disaster recovery, and offsite data replication.

Programmable Per-Packet Network Telemetry: From Wire to Kafka at Scale

High-speed intelligent Research and Educational Networks (RENs), such as the one we’re building as part of the ESnet6 program, will require a greater ability to understand and manage traffic flows. One research program underway to provide this capability is the High Touch effort, a programmable, scalable, and expressive hardware and software solution that produces and analyzes per-packet telemetry information with nanosecond-accurate timing. Along with Zhang Liu, Bruce Mah, Yatish Kumar, and Chin Guok, I have just released a presentation for the Proceedings of the 2021 Virtual Meeting on Systems and Network Telemetry and Analytics, describing work underway to create a programmable, very high-speed packet monitoring and telemetry capability as part of bringing High Touch to life.
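To make the idea of per-packet telemetry a little more concrete, here is a minimal Python sketch of what a single record might look like just before it is handed to a Kafka producer. Everything in it is an assumption made for illustration: the field names, the JSON encoding, and the choice of the 5-tuple as the message key are hypothetical and do not describe the actual High Touch record format. No Kafka client is used.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PacketRecord:
    """Hypothetical per-packet telemetry record (not the High Touch format)."""
    timestamp_ns: int   # capture time in nanoseconds since the epoch
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int       # IP protocol number, e.g. 6 for TCP
    length_bytes: int

    def flow_key(self) -> bytes:
        """Build a message key from the 5-tuple so one flow shares one key."""
        parts = (self.src_ip, self.dst_ip, self.src_port,
                 self.dst_port, self.protocol)
        return "|".join(str(p) for p in parts).encode("utf-8")

    def to_message(self) -> bytes:
        """Serialize the record as compact JSON for the message value."""
        return json.dumps(asdict(self), separators=(",", ":")).encode("utf-8")


# Example record using documentation-reserved IP addresses.
record = PacketRecord(
    timestamp_ns=1_620_000_000_123_456_789,
    src_ip="192.0.2.10", dst_ip="198.51.100.20",
    src_port=50000, dst_port=443, protocol=6, length_bytes=1500,
)
key, value = record.flow_key(), record.to_message()
decoded = json.loads(value)  # round-trip check of the serialized value
```

With a real Kafka client, `key` and `value` would be supplied as the key and value of a produced message. Keying by flow is one possible design choice: Kafka's default partitioning sends messages with the same key to the same partition, which keeps the packets of a flow in order for downstream consumers.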

Richard Cziva presenting at the SNTA ’21: Proceedings of the 2021 on Systems and Network Telemetry and Analytics

For more information on this talk, please see this link.