Diagram of ESnet6’s peering points for the new Cloud Connect Service
By Joshua Stewart, ESnet
Part of managing a network dedicated to handling vast swaths of scientific data is ensuring that it adapts to trends in how data is created, stored, and computed. A pattern has emerged in recent years allowing for access to elastic and scalable systems on demand. Nebulously titled “The Cloud,” it refers to software and services that run over the public internet. For ESnet, this is just another place where science intends to happen.
To drill down more on the nebulosity of the term “The Cloud,” there are different flavors of how the services and software are consumed. “Public Cloud” refers to services and software that are open to all users and subscribers around the world: for example, those provided by Dropbox, Slack, Salesforce, and Office 365. Meanwhile, as its name suggests, a Virtual Private Cloud (VPC) is an environment in which all virtualized hardware and software resources are dedicated exclusively to, and accessible only by, a single organization. The intention of a VPC is to emulate the on-premises data centers of old while removing the headaches of managing their physicality (space and power constraints), and offering the added benefit of instantaneous access to scale when needed. Although some organizations decided to go all-in on the new virtual environments by adopting a cloud-native posture, others took a more measured approach by seamlessly blending their on-premises infrastructure with the new virtualized territory, in a format also known as a hybrid cloud.
As usage of virtual private clouds grew, it became apparent that connectivity over the public internet was too unreliable, slow, and insecure: dedicated, high-bandwidth connectivity was a must-have. In response, every major Cloud Service Provider (CSP) launched an offering. Amazon Web Services (AWS) was first, launching “Direct Connect” in 2012; Azure followed in 2014 with its “ExpressRoute”; and in 2017, Google launched Cloud Interconnect. (Read more about the history.)
These virtual circuits are the driver behind the new ESnet Cloud Connect service aimed at supporting both scientific and enterprise workloads. The goal is to carve out a dedicated, high-bandwidth path (up to 10 Gbps) across ESnet’s 400GE-capable backbone from any supported user facility to the nearest cloud on-ramp by utilizing two interim network service providers: Packet Fabric and Equinix. From there, ESnet would help provision the major CSPs’ (Azure, AWS, GCP) aforementioned flavor of dedicated connectivity into your Virtual Private Cloud.
This solution is designed to scale from simple dedicated connectivity and a singular cloud provider to a virtual routed network utilizing multiple cloud providers, onramps, and interconnecting user facilities. This series of blog posts will focus on a few suggested use cases for utilizing ESnet’s new service offering. For questions or to learn more, email Joshua Stewart.
Throughout the world, earth and environmental scientists are deploying new kinds of sensors to measure and understand how the climate is changing and how we can best manage key infrastructure and resources in response.
Operation and data analysis of these sensors can often be challenging, as they are deployed in areas with limited power, sometimes with no data connectivity beyond the periodic physical collection of memory cards. Sensors may be in areas where weather and other factors make access laborious and challenging, such as at the top of a mountain, down a borehole, or under dense forest canopy.
Solar-powered meteorological and hydrological sensors deployed at the Snodgrass Field Site, Crested Butte, July 2022 at approximately 9,000 ft. elevation. (Photo: Andrew Wiedlea)
As the number, types, and capabilities of these sensors increase, the U.S. Department of Energy’s (DOE) Energy Sciences Network (ESnet) is working on ways to extend its high-speed network to support the needs of scientists working in remote, resource-challenged environments where our fiber backbone cannot be extended. Using advanced wireless technologies such as low-Earth orbit satellite constellations, 5G and Citizens Broadband Radio Service (CBRS) private cellular, mmWave, and Internet-of-Things tools like long-range (LoRa) mesh networks, we are developing ways to free field scientists from geographical constraints, just as we have traditionally sought to do for laboratory scientists around the DOE complex.
The purpose of this effort is to assess requirements for operation of a private 4G/5G wireless network in a remote and changing environment, which can pull ESnet capabilities and services supporting scientific research out beyond our performant 13,000 km optical backbone. We are also using this research to identify specific operational, workflow, and data movement needs for the Earth and environmental science community as part of building ESnet’s logistics, operational, and human capital resources available to support the Earth and environmental science mission.
Our system, which is currently being configured, is built around a Nokia Digital Automation Cloud private cellular capability, with antennas being placed across a valley from sensor fields at the Snodgrass Field Site in Crested Butte. The intent is to use this cellular service to automate and improve the efficiency of data collection from sensors, using cellular routers and radios, depending on the specific capabilities of each sensor system. For those sensor systems that cannot be directly connected to a cellular network, we are establishing solar-powered sensor stations that will provide local-area bridge connectivity (over several hundred meters) to nearby sensors via Wi-Fi, LoRa, or direct Ethernet cable.
Once data is backhauled from a sensor field through our private cellular network, it will be transmitted back to ESnet via SpaceX’s Starlink low earth orbit satellite system, connecting to ESnet at a peering location in Seattle, Washington, and then through our optical backbone to the National Energy Research Scientific Computing Center at Berkeley Lab for processing and storage.
With fantastic assistance and collaboration from the Atmospheric Radiation Measurement (ARM) program, the Rocky Mountain Biological Laboratory, and Dan Feldman and Charulekha Varadarajan in the Watershed Function Science Focus Area at Berkeley Lab, our first field campaign was both great fun and extremely productive.
We will return later in the Fall to complete network configuration and connection of sensors to the network. Once this is done, we can begin the next phase of this research: studying the operational performance and service requirements necessary to support field science through the demanding conditions provided by winter in the Colorado High Rockies. We will also begin to develop standard deployment equipment specifications and practices that we can use to support ESnet wireless edge deployments supporting science in other regions and for other purposes.
This effort is being made possible by teamwork across ESnet and Berkeley Lab, including outstanding support at Berkeley Lab from Chris Tracy and Jackson Gor with ESnet network engineering, and Steve Nobles and many others with IT Telephone Services. The success of the Colorado deployment depended on the hard (often physical) work of Stijn Wielandt (EESA), Kate Robinson (ESnet Network Engineering), Jeff D’Ambrogia (IT-Science IT), and Jeff Chavez with Nokia.
An ESnet Private Cellular Antenna on the side of the ARM Crested Butte Meteorological Radar. (Photo: Andrew Wiedlea)
The view back toward Snodgrass from the ARM Radar Site (Photo: Andrew Wiedlea).
ARM shipping container containing the ESnet cellular system. Note the Starlink Antenna being used for data backhaul to ESnet’s optical backbone. (Photo: Andrew Wiedlea)
Learning about the rigors of field science, part 1: A sudden storm at altitude traps Jeff D’Ambrogia (Berkeley Lab-Science IT) and Stijn Wielandt (Berkeley Lab-EESA) in the ARM trailer as we deploy the CBRS system. (Photo: Andrew Wiedlea)
Learning about the rigors of field science, part 2: Kate Robinson (ESnet) and Andrew Wiedlea (ESnet) drip dry in the ARM trailer as the storm passes. (Photo: Andrew Wiedlea)
A completed solar sensor station, providing Cellular, Wifi, and (soon) LoRa backhaul to the sensors located in the field via our ESnet Private Cellular/CBRS. (Photo: Andrew Wiedlea)
Learning about the rigors of field science, part 3: Stijn Wielandt (LBNL-EESA) loads solar cellular sensor station equipment onto an ATV for deployment at Snodgrass. (Photo: Andrew Wiedlea)
Learning about the rigors of field science, part 4: Jeff D’Ambrogia, Stijn Wielandt, and a member of RMBL staff set up a solar sensor station at Snodgrass. Everything is heavy, and there’s always a need to improvise when building atop a mountain at altitude. (Photo: Andrew Wiedlea)
ZeekWeek, an annual fall conference organized by the Zeek Project, took place online October 13-15 this year. The conference drew more than 2,000 registered participants from the open source user community, who gathered to share the latest developments in this cybersecurity and network monitoring tool.
Berkeley Lab staff member Vern Paxson developed the precursor to the Zeek intrusion detection software, then called Bro, in 1994. As an early adopter, ESnet’s cybersecurity team has strong relationships with the Zeek community, and this ZeekWeek was an opportunity to showcase advances in, and uses of, the software by ESnet and the entire research and education networking community.
Fatema Bannat Wala also led a training session, “Introduction to Zeek,” which provided hands-on experience with Zeek tools and information about how to get involved with the collaboration.
ESnet’s cybersecurity team looks forward to continued collaboration with the Zeek community, attending next year’s ZeekWeek, and to contributing future code enhancements to this great software ecosystem.
Sheng Shen, Mariam Kiran, and Bashir Mohammed have just been awarded the Best Paper award at the International Conference on Machine Learning for Networking (MLN). Sponsored by the Conservatoire National des Arts et Métiers (CNAM), the École Supérieure d’Ingénieurs en Électrotechnique et Électronique (ESIEE), and Laboratoire d’Informatique Gaspard-Monge (LIGM), MLN is being held virtually 1-3 December 2021.
The paper, “DynamicDeepFlow: An Approach for Identifying Changes in Network Traffic Flow Using Unsupervised Clustering,” combines a deep-learning variational autoencoder with shallow-learning k-means clustering to help identify unique traffic patterns across ESnet. These patterns can help indicate whether a new experiment has started or whether current network bandwidth use is changing.
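As a rough illustration of the clustering half of that hybrid, the sketch below runs plain k-means (Lloyd's algorithm) over a handful of made-up two-dimensional "latent" vectors. It is not the DynamicDeepFlow implementation: the variational autoencoder that would produce the latent vectors is omitted, and the function name, the sample data, and the deterministic initialization are all assumptions made for this example.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means over a list of equal-length numeric tuples.

    Centroids are initialized to the first k points so the result is
    deterministic (an assumption for this sketch; real systems usually
    use randomized or k-means++ initialization).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical latent vectors, e.g. encoder outputs for four traffic windows.
    latent = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9)]
    labels, centroids = kmeans(latent, k=2)
    print(labels, centroids)
```

In a pipeline of this shape, a traffic window whose latent vector lands in a previously unseen or sparsely populated cluster would be a candidate for "something changed", which is the kind of signal the paper describes.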
“We’re very excited to receive this recognition, and the conference was a wonderful opportunity to exchange thoughts and ideas with peers in France. MLN is a conference dedicated to discussing machine learning applications in networks. Our next task is to integrate DynamicDeepFlow with Netpredict to show real-time information in ESnet data,” said Mariam Kiran.
Papers from MLN will be published as post-proceedings in Springer’s Lecture Notes in Computer Science (LNCS).
Scott Campbell presented “ESnet Security Group Impact on Network Architecture” where he discussed some of the social, technical, and architectural outcomes of the ESnet6 network upgrade that were beneficial to the organization. By being involved early, security design elements were incorporated into workflows at early stages and were both tightly integrated and vetted during the core design process. This early involvement also heightened the security group’s visibility, which led to a better understanding of how the various groups interact and their different methods of problem-solving and time management.
Eli Dart and Fatema Bannat Wala presented “Best practices for securing Science DMZ,” focusing on disentangling security policies and enforcement for science flows from traditional security approaches for business systems, and use of the Science DMZ model to protect high-performance science flows. They discussed thinking of the Science DMZ as a security architecture that provides useful and implementable security controls without impacting performance.
A combined team from ESnet and Lehigh University received the best paper award for “Exploring the BBRv2 Congestion Control Algorithm for use on Data Transfer Nodes” at the 8th IEEE/ACM International Workshop on Innovating the Network for Data-Intensive Science (INDIS 2021), which was held in conjunction with the 2021 IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC21) on Monday, November 15, 2021.
The team consisted of:
Brian Tierney, Energy Sciences Network (ESnet)
Eli Dart, Energy Sciences Network (ESnet)
Ezra Kissel, Energy Sciences Network (ESnet)
Eashan Adhikarla, Lehigh University
The paper can be found here. Slides from the presentation are here. In this Q+A, ESnet spoke with the award-winning team about their research — answers are from the team as a whole.
The paper is based on extensive testing and controlled experiments with the BBR (Bottleneck Bandwidth and Round-trip propagation time), BBRv2 and the Cubic Function Binary Increase Congestion Control (CUBIC) Transmission Control Protocol (TCP) Internet congestion algorithms. What was the biggest lesson from this testing?
BBRv2 represents a fundamentally different approach to TCP congestion control. CUBIC, like Hamilton, Reno, and many others, is loss-based, meaning that it interprets packet loss as congestion and therefore requires significant network engineering effort to achieve high performance. BBRv2 is different in that it measures the network path and builds a model of the path; it then paces itself to avoid loss and queueing. In practical terms, this means that BBRv2 is resilient to packet loss in a way that CUBIC is not. This comes through loud and clear in our data.
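To see why loss-based algorithms are so sensitive to packet loss on long paths, the well-known Mathis et al. approximation for Reno-style TCP is a useful back-of-the-envelope model: throughput is bounded by roughly (MSS / RTT) × (C / √p), where p is the loss rate and C ≈ √(3/2). The sketch below simply evaluates that formula. It is an illustration only: it is not taken from the paper, CUBIC's actual response function differs from Reno's, the model says nothing about BBRv2, and the function name and sample values are assumptions.

```python
import math


def loss_based_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate throughput ceiling (bits/s) for a Reno-style loss-based flow.

    Uses the Mathis et al. approximation: (MSS / RTT) * (C / sqrt(p)),
    with C = sqrt(3/2). Only meaningful for loss_rate > 0.
    """
    if loss_rate <= 0:
        raise ValueError("loss_rate must be positive for this model")
    c = math.sqrt(3.0 / 2.0)
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


if __name__ == "__main__":
    # Hypothetical paths: same loss rate, increasing round-trip time.
    for rtt_ms in (1, 10, 100):
        bps = loss_based_throughput_bps(1460, rtt_ms / 1000, 1e-5)
        print(f"RTT {rtt_ms:>3} ms, loss 0.001%: ~{bps / 1e9:.3f} Gbit/s")
```

Under this model the ceiling falls in direct proportion to RTT and with the square root of the loss rate, which is why even tiny loss rates matter on long-distance science paths.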
What part of the testing was the most difficult and/or interesting?
We ran a large number of tests in a wide range of scenarios. It can be difficult to keep track of all the test configurations, so we wrote a “test harness” in Python that allowed us to keep track of all the testing parameters and resulting data sets.
The harness also allowed us to better compare results collected over real-world paths to those in our testbed environments. Managing the deployment of the testing environment through containers also allowed for rapid setup and improved reproducibility.
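The team's harness itself is not shown here, but the general idea of tracking test parameters alongside results can be sketched in a few lines of Python. Everything below is hypothetical: the `RunConfig` fields, the parameter values, and the JSON record layout are assumptions chosen for illustration, not the structure of the actual ESnet tool.

```python
import itertools
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunConfig:
    """One point in the test matrix (all fields are illustrative)."""
    algorithm: str      # e.g. "cubic" or "bbr2"
    rtt_ms: int         # emulated or measured round-trip time
    loss_pct: float     # injected packet loss, in percent
    streams: int        # number of parallel flows


def build_matrix(algorithms, rtts_ms, loss_pcts, stream_counts):
    """Expand every combination of parameters into a list of RunConfig."""
    return [
        RunConfig(a, r, l, s)
        for a, r, l, s in itertools.product(algorithms, rtts_ms, loss_pcts, stream_counts)
    ]


def record_result(config: RunConfig, throughput_gbps: float) -> str:
    """Serialize a result together with the exact parameters that produced it."""
    return json.dumps({**asdict(config), "throughput_gbps": throughput_gbps}, sort_keys=True)


if __name__ == "__main__":
    matrix = build_matrix(["cubic", "bbr2"], [10, 50, 100], [0.0, 0.01], [1])
    print(len(matrix), "configurations")
    # The throughput value here is a placeholder, not a measurement.
    print(record_result(matrix[0], throughput_gbps=0.0))
```

Keeping the parameters and the result in one record is what makes it practical to compare runs later, for example testbed results against real-world paths.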
You provide readers with links to great resources so they can do their own testing and learn more about BBRv2. What do you hope readers will learn?
We hope others will test BBRv2 in high-performance research and education environments. There are still some things that we don’t fully understand. For example, there are some cases where CUBIC outperforms BBRv2 on paths with very large buffers. It would be great for this to be better characterized, especially in R&E network environments.
What’s the next step for ESnet research into BBRv2? How will you top things next year?
We want to further explore how well BBRv2 performs at 100G and 400G. We would also like to spend additional time performing a deeper analysis of the current (and newly generated) results to gain insights into how BBRv2 performs compared to other algorithms across varied networking infrastructure. Ideally we would like to provide strongly substantiated recommendations on where it makes sense to deploy BBRv2 in the context of research and educational network applications.
Ever want to know how big research data moves around the globe? ESnet plays a significant role in supporting the great scientific conversations, collaborations, and experiments underway, wherever and whenever they occur. We move exabytes of data around the world, creating a global laboratory that accelerates scientific discovery.
To meet the needs of scientists, we are constantly looking for opportunities to expand our capabilities with our next-generation network ESnet6, intelligent edge analytics, advanced network testbeds, 5G wireless, quantum networking, and more.
High-speed intelligent Research and Educational Networks (RENs), such as the one we’re building as part of the ESnet6 program, will require a greater ability to understand and manage traffic flows. One research program underway to provide this capability is the High Touch effort, a programmable, scalable, and expressive hardware and software solution that produces and analyzes per-packet telemetry information with nanosecond-accurate timing. Along with Zhang Liu, Bruce Mah, Yatish Kumar, and Chin Guok, I have just released a presentation for the Proceedings of the 2021 Virtual Meeting on Systems and Network Telemetry and Analytics, describing work underway to create a programmable, very-high-speed packet monitoring and telemetry capability as part of bringing High Touch to life.
Two graduate students working with ESnet have published their papers recently in IEEE and ACM workshops.
Bibek Shrestha, a graduate student at the University of Nevada, Reno, and his advisor Engin Arslan worked with Richard Cziva from ESnet to publish “INT Based Network-Aware Task Scheduling for Edge Computing.” In the paper, Bibek investigated the use of in-band network telemetry (INT) for real-time in-network task scheduling. Bibek’s experimental analysis using various workload types and network congestion scenarios revealed that enhancing edge computing task scheduling with high-precision network telemetry can lead to up to a 40% reduction in data transfer times and up to a 30% reduction in total task execution times by favoring edge servers in uncongested (or mildly congested) sections of the network when scheduling tasks. The paper will appear in the 3rd Workshop on Parallel AI and Systems for the Edge (PAISE), held in conjunction with the IEEE IPDPS 2021 conference, to be held on May 21, 2021, in Portland, Oregon.
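The core idea of favoring edge servers behind uncongested paths can be sketched as a simple selection rule. This is not the scheduler from the paper: the `EdgeServer` fields, the 0-to-1 congestion score (imagined here as derived from INT queue-occupancy reports), and the threshold are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class EdgeServer:
    name: str
    queue_occupancy: float  # hypothetical 0..1 congestion score for the path, e.g. from INT
    path_latency_ms: float  # hypothetical telemetry-reported path latency
    cpu_load: float         # 0..1 fraction of compute already in use


def pick_server(servers, congestion_limit=0.5):
    """Choose a server for the next task.

    Servers reached over uncongested or mildly congested paths are preferred;
    among those, the least-loaded (then lowest-latency) server wins. If every
    path is congested, fall back to the least congested one.
    """
    eligible = [s for s in servers if s.queue_occupancy <= congestion_limit]
    if eligible:
        return min(eligible, key=lambda s: (s.cpu_load, s.path_latency_ms))
    return min(servers, key=lambda s: s.queue_occupancy)


if __name__ == "__main__":
    candidates = [
        EdgeServer("edge-a", queue_occupancy=0.9, path_latency_ms=2.0, cpu_load=0.1),
        EdgeServer("edge-b", queue_occupancy=0.2, path_latency_ms=5.0, cpu_load=0.4),
        EdgeServer("edge-c", queue_occupancy=0.3, path_latency_ms=4.0, cpu_load=0.3),
    ]
    print(pick_server(candidates).name)
```

Note that the nominally best server by CPU load alone (`edge-a` in the sample data) is skipped because its path is congested, which is the trade-off a network-aware scheduler makes and a compute-only scheduler cannot.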
Zhang Liu, a former ESnet intern and a current graduate student at the University of Colorado at Boulder, worked with the ESnet High Touch Team (Chin Guok, Bruce Mah, Yatish Kumar, and Richard Cziva) on fastcapa-ng, ESnet’s telemetry processing software. In the paper “Programmable Per-Packet Network Telemetry: From Wire to Kafka at Scale,” Zhang showed the scaling and performance characteristics of fastcapa-ng, and highlighted the most critical performance considerations that allow the pushing of 10.4 million telemetry packets per second to Kafka with only 5 CPU cores, which is more than enough to handle 170 Gbit/s of original traffic with a 1512-byte MTU. This paper will appear in the 4th International Workshop on Systems and Network Telemetry and Analytics (SNTA 2021), held at the ACM HPDC 2021 conference in Stockholm, Sweden, from 21 to 25 June 2021.
Congratulations Bibek and Zhang!
If you are a networked systems research student looking to collaborate with us on network measurements, please reach out to Richard Cziva. If you are interested in a summer internship with ESnet, please visit this page.