ESnet staff attend strategic on-site meetings for the first time in years!

Last week, over 50 ESnet employees gathered at Berkeley Lab for a week of strategizing and socializing. Here are some pictures from their adventures!

Jealous of all the fun we had? Want to hang out with us, too? Good news – Registration will open soon for Confab22, ESnet’s first user meeting! Keep an eye on the blog or pre-register for updates!

Save the date for ESnet’s first annual user meeting – Oct 12-13, 2022!

Join us 12–13 October 2022 for ESnet’s first science user meeting!

Science is a conversation! Come join the conversation and help shape the future of scientific networking at the inaugural ESnet yearly science user meeting – Confab22.  

What to expect at Confab22:

  • Co-design the future of data management and networking with peers across the scientific community and ESnet staff
  • Share with colleagues from other research programs and identify common needs and solutions
  • Learn about the latest networking trends and capabilities
  • Collaborate and enjoy stimulating professional discussions

More information (including the event location, our exciting agenda, and a registration link) will be released soon.

Interested in attending Confab22 (either in-person, or virtually)? Pre-register now to be notified once registration opens.

A word from Inder Monga: The Road to ESnet6 (Part 1)

Inder Monga, Executive Director of ESnet.

Dear Friends, Well-wishers, Colleagues, and all of ESnet,

In October of this year we will launch ESnet6, a next-generation network featuring an entirely new, software-driven network design that enhances the ability to rapidly invent, test, and deploy new innovations to meet the data needs of the Office of Science/DOE.

We put forth the vision for ESnet6 in 2016. Since then, this $151M project (total project cost – DOE 413.3 parlance including contingency) has overcome pandemic-induced issues like site lockdowns, differing vaccination and inter-state travel policies, and variable supply chain delays, and is now in its final stages of implementation. As I prepare for this historic unveiling, I can’t help but look back at what the team accomplished last year.

This is the first post in a series of blog posts about the people, partnerships, and innovations that have paved the road to ESnet6.

2021 was a year for growth within ESnet. We have 100+ people in the organization now—a 30% increase from last year—and it has been great to have new employees on-boarded, integrated, and productive in this challenging environment. 

A diagram showing the dimensions of growth within ESnet: Foundations, Innovation, Co-design, and Culture. Foundations, Innovation, and Co-design all point outward in separate directions, while Culture lies alongside all three Axes, growing in tandem with them.
The dimensions of growth for ESnet

Looking towards the future, we think of ESnet growing around four dimensions. The three spatial axes are: 

  • Foundations: Next Generation Network and Services 
  • Innovation: Testbeds and Advanced Research and Development
  • and Co-design: Partnerships with Science for new data and network solutions. 

The fourth axis, Culture, is pervasive across all three dimensions. 

The main reason for choosing this very technical representation is to illustrate that these are not independent thrusts: success in each of these dimensions depends on the capabilities of the others.

In this post, I’d like to focus on that first axis: Foundations. In the next few posts, I will focus on the Innovation and Co-Design dimensions and share more thoughts about our focus for 2022 and beyond.

Major capacity improvements

In 2021, we installed a brand new routing infrastructure on our network backbone, while decommissioning a portion of the previous generation packet processors in parallel. We seamlessly transitioned all ESnet customers and peers onto the forty new backbone routers before the holidays, and the remaining router upgrades at our customer sites are in progress and scheduled through 2022.

The greenfield optical infrastructure (installed at 300 locations in 2020, another noteworthy accomplishment) is getting a wonderful upgrade: 400G wavelengths are being standardized across our national backbone as we complete the second phase of optical upgrades.

In addition to our team’s intricate efforts to decommission the existing network, we added another 100G on the ring in Europe (thanks to our collaboration with GEANT). This ensured that the first Large Hadron Collider Data Challenge had enough bandwidth to accommodate both ESnet scientific data and LHC data challenge (test) streams. We also established a new point of presence in Dallas to support new peerings and the FABRIC project.

ESnet network map showing LHC data challenge traffic sending nearly 100Gbps from Amsterdam to Boston.

Creating a smarter network

The vision laid out in 2016 focused not only on capacity, but also on improving the essential framework of how we operate with the network. 

We made a significant investment in building out a high-availability site within 10 ms of our main data center, in addition to our disaster-recovery site on the east coast, so that any planned or unplanned power outages can be handled without a scramble. Although supply chain issues have so far prevented the site from being ready for operations, we are making steady progress and look forward to completing it this year.

The software orchestration team made tremendous progress in laying down the vision and framework for automation. They were supported by strong internal collaboration with the engineering team. Many repetitious deployments were automated, and I know it took diligent effort to make these tools available in the right time frame, aligned with evolving constraints of the deployments. A few examples of where automation was used include:

  • Deployment of optical wavelengths on our backbone
  • Deployment of routers and base configurations, and service provisioning
  • Automatic generation of customer migration configurations, from the old network to the new equipment, out of the ESnet Database (ESDB)
  • Development of a virtualized test environment for trying out new tools and services before actual in-field deployment
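
As a rough illustration of what database-driven configuration generation can look like, here is a minimal Python sketch. It is not ESnet's tooling: the record fields (`site`, `port`, `vlan`, `address`), the template text, and the function name are all hypothetical, and the stanza syntax does not correspond to any vendor's CLI.

```python
from string import Template

# Hypothetical template: illustrative syntax only, not a real router CLI.
INTERFACE_TEMPLATE = Template(
    "interface $port\n"
    "  description \"$site customer handoff\"\n"
    "  vlan $vlan\n"
    "  address $address\n"
)


def render_migration_config(records):
    """Render one config stanza per record, sorted for stable output."""
    stanzas = []
    for record in sorted(records, key=lambda r: (r["site"], r["port"])):
        # substitute() raises KeyError when a field is missing, so an
        # incomplete record fails loudly instead of yielding a partial config.
        stanzas.append(INTERFACE_TEMPLATE.substitute(record))
    return "\n".join(stanzas)


# Made-up records standing in for rows pulled from an inventory database.
records = [
    {"site": "site-b", "port": "1/1/2", "vlan": 200, "address": "192.0.2.5/30"},
    {"site": "site-a", "port": "1/1/1", "vlan": 100, "address": "192.0.2.1/30"},
]

if __name__ == "__main__":
    print(render_migration_config(records))
```

Generating every stanza from a single source of truth means a correction made in the database propagates to the next render, rather than having to be repeated by hand on each device.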

This year, we prepare to bring the official DOE 413.3 ESnet6 project to a close, but as you know the network never sleeps, data never stops growing, and we have to constantly evolve the network. I can proudly say that we have the core foundations of the enduring ESnet user facility ready to handle the next big challenges of Data, AI, and Integrated multi-facility research that the scientists and National Labs are actively pursuing.

Wishing you all a very Happy New Year from ESnet. 

Inder

This post is part of a series of posts reflecting on the road to ESnet6. Check back soon to see upcoming posts from Inder focusing on innovation, co-design, and his vision for ESnet6 and beyond.

ESnet Highlights from ZeekWeek’21

Fatema Bannat Wala presenting at ZeekWeek21

Slides and videos from ZeekWeek have just been made available — here are links to ESnet highlights.


ZeekWeek, an annual fall conference organized by the Zeek Project, took place online October 13-15 this year. Over 2,000 registered participants from the open source user community got together to share the latest and greatest about this cybersecurity and network monitoring software tool.

Berkeley Lab staff member Vern Paxson developed the precursor to the Zeek intrusion detection software, then called Bro, in 1994. As an early adopter, ESnet’s cybersecurity team has strong relationships with the Zeek community, and this ZeekWeek was an opportunity to showcase advances in, and uses of, the software by ESnet and the entire research and education networking community.


The talk “DNS and Spoofed traffic investigation with Zeek,” presented by Fatema Bannat Wala, discussed how Zeek is being used to do network traffic analysis/investigations at ESnet by triaging abnormal activities when these occur on our network.

The talks “A Better Way to Capture Packets with DPDK” and “Details for DPDK plugin development and performance measurement,” presented by Vlad Grigorescu and Scott Campbell, detailed the development process of the plugin and the performance enhancements it brings to network packet capture.

Fatema Bannat Wala also led a training session, “Introduction to Zeek,” which provided hands-on experience with Zeek tools and information about how to get involved with the collaboration.

ESnet’s cybersecurity team looks forward to continued collaboration with the Zeek community, attending next year’s ZeekWeek, and to contributing future code enhancements to this great software ecosystem.

ESnet Machine Learning Researchers Win Best Paper at MLN ‘2021!

MLN '2021 Best Paper Award Notification

Sheng Shen, Mariam Kiran, and Bashir Mohammed have just been awarded the Best Paper award at the International Conference on Machine Learning for Networking (MLN). Sponsored by the Conservatoire National des Arts et Métiers (CNAM), the École Supérieure d’Ingénieurs en Électrotechnique et Électronique (ESIEE), and Laboratoire d’Informatique Gaspard-Monge (LIGM), MLN is being held virtually 1-3 December 2021.

The paper, “DynamicDeepFlow: An Approach for Identifying Changes in Network Traffic Flow Using Unsupervised Clustering,” uses a hybrid of a deep learning variational autoencoder and shallow-learning k-means clustering to help identify unique traffic patterns across ESnet. These unique patterns can help identify whether a new experiment has started or whether current network bandwidth is changing.
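
For readers unfamiliar with the clustering half of such a hybrid, here is a minimal Python sketch of k-means (Lloyd's algorithm) applied to low-dimensional vectors of the kind an autoencoder's encoder might produce. It is an illustration only: the paper's actual features, model, and number of clusters are not described in this post, the autoencoder itself is not shown, and the `latent_vectors` values below are made up.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    # Deterministic initialisation for the sketch: first k points as centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Made-up 2-D vectors standing in for autoencoder latent encodings of flows.
latent_vectors = [
    (0.1, 0.2), (5.0, 5.1), (0.0, 0.1),
    (5.2, 4.9), (0.2, 0.0), (4.9, 5.0),
]

if __name__ == "__main__":
    labels, centroids = kmeans(latent_vectors, k=2)
    print(labels)
    print(centroids)
```

In a setup like this, a flow whose encoding lands in a different cluster than before is a candidate signal that the traffic pattern has changed.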

DynamicDeepFlow (DDF) model structure

“We’re very excited to receive this recognition and the conference was a wonderful opportunity to exchange thoughts and ideas with peers in France. MLN is a conference dedicated to discussing machine learning applications in networks. Our next task is to integrate DynamicDeepflow with Netpredict to show real-time information in ESnet data” — Mariam Kiran

Papers from MLN will be published as post-proceedings in Springer’s Lecture Notes in Computer Science (LNCS).

ESnet Highlights from the National Science Foundation’s Cybersecurity Summit ’21

The National Science Foundation (NSF) Cybersecurity Center of Excellence, Trusted CI Project hosts a yearly cybersecurity summit, inviting people from various NSF-funded research organizations to share innovations and ideas. Here are some videos of ESnet presentations.

Scott Campbell presented “ESnet Security Group Impact on Network Architecture” where he discussed some of the social, technical, and architectural outcomes of the ESnet6 network upgrade that were beneficial to the organization. By being involved early, security design elements were incorporated into workflows at early stages and were both tightly integrated and vetted during the core design process. This early involvement also heightened the security group’s visibility, which led to a better understanding of how the various groups interact and their different methods of problem-solving and time management.

Eli Dart and Fatema Bannat Wala presented “Best practices for securing Science DMZ,” focusing on disentangling security policies and enforcement for science flows from traditional security approaches for business systems, and use of the Science DMZ model to protect high-performance science flows. They discussed thinking of the Science DMZ as a security architecture that provides useful and implementable security controls without impacting performance. 

ESnet Scientists awarded best paper at SC21 INDIS!

A combined team from ESnet and Lehigh University was awarded the best paper for “Exploring the BBRv2 Congestion Control Algorithm for use on Data Transfer Nodes” at the 8th IEEE/ACM International Workshop on Innovating the Network for Data-Intensive Science (INDIS 2021), which was held in conjunction with the 2021 IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC21) on Monday, November 15, 2021.

The team consisted of:

  • Brian Tierney, Energy Sciences Network (ESnet)
  • Eli Dart, Energy Sciences Network (ESnet)
  • Ezra Kissel, Energy Sciences Network (ESnet)
  • Eashan Adhikarla, Lehigh University

The paper can be found here. Slides from the presentation are here. In this Q+A, ESnet spoke with the award-winning team about their research — answers are from the team as a whole.

INDIS 21 Best Paper Certificate

The paper is based on extensive testing and controlled experiments with the BBR (Bottleneck Bandwidth and Round-trip propagation time), BBRv2, and Cubic Function Binary Increase Congestion Control (CUBIC) Transmission Control Protocol (TCP) congestion control algorithms. What was the biggest lesson from this testing?

BBRv2 represents a fundamentally different approach to TCP congestion control. CUBIC (as well as Hamilton, Reno, and many others) are loss-based, meaning that they interpret packet loss as congestion and therefore require significant network engineering effort to achieve high performance. BBRv2 is different in that it measures the network path and builds a model of the path – it then paces itself to avoid loss and queueing. In practical terms, this means that BBRv2 is resilient to packet loss in a way that CUBIC is not. This comes through loud and clear in our data.
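
To make the idea of “building a model of the path” concrete, here is a simplified Python sketch of the two estimates at the heart of the BBR family: bottleneck bandwidth (the highest recently observed delivery rate) and minimum round-trip time. It is an illustration of the concept, not the BBRv2 implementation; the function names and sample values are hypothetical, and BBRv2's additional handling of loss and ECN signals is omitted entirely.

```python
def path_model(delivery_rate_samples_bps, rtt_samples_s):
    """Estimate the path from recent samples, in the spirit of BBR.

    Returns (bottleneck bandwidth in bit/s, minimum RTT in seconds,
    bandwidth-delay product in bytes).
    """
    btl_bw = max(delivery_rate_samples_bps)  # best delivery rate seen
    min_rtt = min(rtt_samples_s)             # RTT with the least queueing
    bdp_bytes = btl_bw * min_rtt / 8         # data needed in flight to fill the path
    return btl_bw, min_rtt, bdp_bytes


def pacing_rate(btl_bw, gain=1.0):
    """Sending rate as a multiple of the estimated bottleneck bandwidth."""
    return gain * btl_bw


if __name__ == "__main__":
    # Made-up samples: roughly a 10 Gbit/s path with a 50 ms round trip.
    rates = [9.4e9, 10e9, 9.8e9]
    rtts = [0.052, 0.050, 0.051]
    btl_bw, min_rtt, bdp = path_model(rates, rtts)
    print(f"bottleneck bandwidth: {btl_bw / 1e9:.1f} Gbit/s")
    print(f"minimum RTT: {min_rtt * 1000:.0f} ms")
    print(f"bandwidth-delay product: {bdp / 1e6:.1f} MB")
    print(f"pacing rate at gain 1.0: {pacing_rate(btl_bw) / 1e9:.1f} Gbit/s")
```

Because the sender paces itself from these estimates instead of reacting to each dropped packet, an isolated loss does not by itself force a sharp rate reduction the way it does for a loss-based algorithm.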

What part of the testing was the most difficult and/or interesting?

We ran a large number of tests in a wide range of scenarios. It can be difficult to keep track of all the test configurations, so we wrote a “test harness” in Python that allowed us to keep track of all the testing parameters and resulting data sets.

The harness also allowed us to better compare results collected over real-world paths to those in our testbed environments. Managing the deployment of the testing environment through containers also allowed for rapid setup and improved reproducibility.
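
The team's harness is not reproduced in this post, but the bookkeeping idea can be sketched in a few lines of Python. Everything below is hypothetical: the parameter names, their values, and the `placeholder_run` function, which deliberately measures nothing and only marks where a real harness would launch a transfer.

```python
import itertools
import json


def build_test_matrix(parameters):
    """Expand {name: [values]} into one config dict per combination."""
    names = sorted(parameters)
    value_lists = (parameters[name] for name in names)
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


def run_matrix(matrix, run_test):
    """Run every config and keep each result next to the config that produced it."""
    results = []
    for test_id, config in enumerate(matrix):
        outcome = run_test(config)
        results.append({"test_id": test_id, "config": config, "result": outcome})
    return results


def placeholder_run(config):
    # Placeholder only: a real harness would start a transfer tool here
    # and return its measurements.
    return {"status": "not-run"}


# Hypothetical sweep; names and values are illustrative.
parameters = {
    "algorithm": ["bbr2", "cubic"],
    "parallel_streams": [1, 8],
    "path": ["testbed", "wan"],
}

if __name__ == "__main__":
    matrix = build_test_matrix(parameters)
    results = run_matrix(matrix, placeholder_run)
    print(json.dumps(results[:2], indent=2))
```

Storing the full configuration alongside every result is what makes it possible to line up testbed runs against real-world runs later without relying on file names or memory.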

You provide readers with links to great resources so they can do their own testing and learn more about BBRv2. What do you hope readers will learn?

We hope others will test BBRv2 in high-performance research and education environments. There are still some things that we don’t fully understand; for example, there are some cases where CUBIC outperforms BBRv2 on paths with very large buffers. It would be great for this to be better characterized, especially in R&E network environments.

What’s the next step for ESnet research into BBRv2? How will you top things next year?

We want to further explore how well BBRv2 performs at 100G and 400G. We would also like to spend additional time performing a deeper analysis of the current (and newly generated) results to gain insights into how BBRv2 performs compared to other algorithms across varied networking infrastructure. Ideally we would like to provide strongly substantiated recommendations on where it makes sense to deploy BBRv2 in the context of research and educational network applications.

Arecibo Support Wins SC21 HPCwire Readers’ Choice Award!

Arecibo dish after the collapse

As part of a team spanning 15 government, academic, and industrial partners, the Engagement and Performance Operations Center (EPOC) – a collaboration between Indiana University and ESnet – was awarded the “Best HPC Collaboration (Academia/Government/Industry)” HPCwire Readers’ Choice award on Tuesday, Nov. 16. The award, which was made at the High Performance Computing, Networking, Storage and Analysis (SC21) conference, recognizes the effort and collaboration required to move and safeguard irreplaceable data (over 50 years of astronomical observations) from the Arecibo observatory following the structural collapse of this scientific resource in 2020.

At ESnet, Ken Miller, George Robb, and Jason Zurawski supported these efforts as both full members of EPOC and ESnet staff. Jason and Ken are part of ESnet’s Science Engagement Team, while George is with ESnet’s Infrastructure Systems group. LightBytes caught up with Jason Zurawski to get his thoughts on the project and award, and an update on the Arecibo effort since our April 2021 post on this project.


Now that data from Arecibo has been migrated to the Texas Advanced Computing Center (TACC), what happens next, and how will this data be used?

The team at the University of Central Florida has been engaged with TACC on several ways to build up the capabilities for their data analysis and sharing requirements. They are working to deploy a portal that will allow researchers access to the data, as well as to build workflows to investigate and process it using computation provided by TACC.

The team at Arecibo is also still going to process much older data that still resides on tape. Due to the delicate state of the media, it is carefully being read and transferred to on-island storage before being transmitted to TACC for archiving. This work will take several more months to complete.

What do you think the lessons from this effort are in terms of getting so many different organizations to work together to support this very challenging problem?

The collapse that Arecibo experienced sent ripples through the R&E community because researchers and technology professionals alike knew there was a limited window to act on replicating important observations gathered over the years. The partners in this effort were motivated to act, and that removed many barriers to putting some solutions in place. Everyone collaborated efficiently with their core competencies, and we continue to work together as the next steps for the scientific collaboration are planned.

Plans are starting to emerge for a “next generation” Arecibo based on the loss of this instrument. How might the next generation of data management resources be shaped by this collaboration?

Now that there has been some time to evaluate the work, it has also spurred UCF and Arecibo to plan for the future with respect to computation, storage, and network connectivity both in Puerto Rico and in Florida.  With these improvements planned, they will be well-positioned to serve the scientific data for years to come.  New instruments will no doubt increase the data demands by many orders of magnitude – addressing all aspects of the data pipeline now, and then gradually increasing the capabilities over time, will help to prepare for these emerging challenges. 

Congratulations to all of the organizations and staff who helped prevent the loss of this data!

Looking back at ESnet’s 2020

Advancing our strategy and shaping our position on the board.
Some thoughts from Inder on the year-that-was.

Miniature from Alfonso X’s Libro del axedrez dados et tablas (Book of chess, dice and tables), c. 1283. Public domain, via Wikimedia Commons

Dear Friends, Well-wishers, Colleagues, and all of ESnet,

Chess! 2020 has been much more challenging than this game. It’s also been a year where we communicated through the squares on our Zoom screens, filled with faces of our colleagues, collaborators, and loved ones.

In January, Research and Education leaders came together in Hawaii at the Pacific Telecommunications Council meeting to discuss the future of networking across the oceans. It was impossible to imagine then that we would not be able to see each other again for such a long time. Though thanks to those underwater cables, we have been able to communicate seamlessly across the globe.

Looking back at 2020, we not only established a solid midgame position on our ESnet chessboard, but succeeded in ‘winning positions’ despite the profound challenges. The ESnet team successfully moved our network operations to be fully remote (and 24/7) and accomplished several strategic priorities. 

ESnet played some really interesting gambits this year: 

  1. Tackled COVID-related network growth and teleworking issues for the DOE complex
    • We saw a 4x spike in remote traffic and worked closely across several Labs to upgrade their connectivity. We continue to address the ever-growing demand in a timely manner. 
    • As we all shifted to telework from home, ESnet engineers developed an impromptu guide that was valuable for troubleshooting our home connectivity issues. 
  2. Progressed greatly on implementing our next-generation network, ESnet6
    • We deployed and transitioned to the ESnet6 optical backbone network, with 300 new site installations and hundreds of 100G waves provisioned, in just six months of effort, and while following pandemic safety constraints. I am grateful to our partners Infinera (Carahsoft) and Lumen for working with our engineers to make this happen. Check out below how we decommissioned the ESnet5 optical network and lit up the ESnet6 network.
    • Installed a brand new management network and security infrastructure upgrades along with significant performance improvements.
    • We awarded the new ESnet6 router RFP (Congratulations Nokia and IMPRES!); the installs start soon.
    • Issued another RFP for optical transponders, and will announce the winner shortly.
  3. Took initiative on several science collaborations to address current and future networking needs
    • We brainstormed new approaches with the Rubin Observatory project team, Amlight, DOE and NSF program managers to meet the performance and security goals for traffic originating in Chile. That traffic crosses several countries in South America before reaching the continental U.S. in Florida (Amlight), and eventually the U.S. Data Facility at SLAC via ESnet.
    • Drew insights through deep engagement of ESnet engineers with the High Energy Physics program physicists, for serving the data needs of their current and planned experiments expediently.
      Due to the pandemic, a two-day immersive in-person meeting turned into a multi-week series of Zoom meetings, breakouts, and discussions.
    • When an instrument produces tons of data, how do you build the data pipeline reliably? ESnet engineers took on this challenge, and worked closely with the GRETA team to define and develop the networking architecture and data movement design for this instrument. This contributed to a successful CD 2/3 review of the project—a challenging enough milestone during normal times, and particularly tough when done remotely. 
    • Exciting opening positions were created with EMSL, FRIB, DUNE/SURF, LCLS-II…these games are still in progress, more will be shared soon. 
  4. Innovated to build a strong technology portfolio with a series of inspired moves
    • AI/ML
      • We demonstrated Netpredict, a tool using deep learning models and real-time traffic statistics to predict when and where the network will be congested. Mariam’s web page showcases some of the other exciting investigations in progress. 
      • Richard and his collaborators published Real-time flow classification by applying AI/ML to detailed network telemetry.
    • High-touch ESnet6 project
      • Ever dream of having the ability to look at every packet, a “packetscope”, at your fingertips? An ability to create new ways to troubleshoot, performance engineer, and gain application insights? We demonstrated a working prototype of that vision at the SC20 XNET workshop.
    • SENSE
      • We deployed a beta version of software that provides science applications the ability to orchestrate large data flows across administrative domains securely. What started as a small research project five years ago (Thanks ASCR!) is now part of the AutoGOLE project initiative, in addition to being used for the Exascale Computing Project (ECP) project ExaFEL.
    • TCP
      • Initiated the Q-Factor project this year, a research collaboration with Amlight, funded by NSF. The project will enable ultra-high-speed data transfer optimization by TCP parameter tuning through the use of programmable dataplane telemetry: https://q-factor.io/
      • We thoroughly tested, on our testbed, the interactions between the TCP congestion control algorithms BBRv2 and CUBIC. A detailed conversation with Google, the authors of the BBRv2 implementation, is in progress.
  5. Initiated strategic new games, with a high potential for impact
    • FABRIC/FAB
      • Executed on the vision and design of a nationwide @scale research testbed working alongside a superstar multi-university team.
      • With the new FAB grant, FABRIC went international with plans to put nodes in Bristol, Amsterdam, Tokyo and Geneva. More locations and partners are possibilities for the future.  
    • Edge Computing
      • Created a prototype FPGA-based edge-computing platform for data-intensive science instruments in collaboration with the Computational Research Division and Xilinx. Look for exciting news on the blog as we complete the prototype deployment of this platform.
    • Quantum
    • 5G
      • What are the benefits of widespread deployment of 5G technology on science research? We contributed to the development of this important vision at a DOE workshop. New and exciting pilots are emerging that will change the game on how science is conducted. Stay tuned. 

Growth certainly has its challenges. But, as we grew, we evolved from our old game into an adept new playing style. I am thankful for the trust that all of you placed in ESnet leadership, vital for our numerous, parallel successes. Our 2020 reminds me of the scene in The Queen’s Gambit where the young Beth Harmon played all the members of a high-school chess team at the same time. 

Several achievements could not make it to this blog, but are important pieces on the ESnet chess board. They required immense support from all parts of ESnet, CS Area staff, Lab procurement, Finance, HR, IT, Facilities, and Communications partners.

I am especially grateful to the DOE Office of Science, Advanced Scientific Computing Research leadership, NSF, and our program manager Ben Brown, whose unwavering support has enabled us to adapt and execute swiftly despite blockades. 

All this has only been possible due to the creativity, resolve, and resilience of ESnet staff; I am truly proud of each one of you. I am appreciative of the new hires that trusted their careers with us and joined us remotely, without shaking hands or even setting foot at the lab.

My wish is for all to stay safe this holiday season, celebrate your successes, and enjoy that extra time with your immediate family. In 2021, I look forward to killer moves on the ESnet chessboard, while humanity checkmates the virus. 

Signing off for the year, 

Inder Monga

Charting a resilient path for the future

The ESnet Site Coordinating Committee (ESCC) meeting on 22-23 October was attended by over 50 members representing all of the major DOE sites and projects supported by our team. This was the first ESCC meeting held via Zoom.

The meeting focused on network resiliency, both on lessons learned from adapting to working from home, as well as longer term plans for ESnet6. 

A highlight of the sessions was ESnet director Inder Monga’s presentation on the ways we ensured operational continuity during the pandemic. 

Inder Monga presents on ESnet’s support for the DOE complex during the pandemic

The DOE ESnet program manager, Ben Brown, provided a vision for future research opportunities as well as future operational needs for the nation’s scientific complex.

DOE’s Ben Brown presents on future objectives and priorities

Attendees identified several key activities for ESCC collaboration as part of advancing shared resilience goals. Foremost among these is the creation of a working group to develop improved metrics for ESnet resilience, to identify ways that resilience features can be better incorporated into infrastructure funding and planning, and to establish better ways to engage scientific programs in risk management processes.

ESnet thanks ESCC participants for attending and we look forward to returning to in-person ESCC meetings in future!

A sample of attendees – the first ESCC via Zoom