EJFAT prototype demonstrates proof of concept for connecting scientific instruments with remote high-performance computing for rapid data processing
Scientists at Thomas Jefferson National Accelerator Facility (Jefferson Lab) clicked a button and held their collective breath. Moments later, they exulted as a monitor showed steady saturation of their new 100 gigabit-per-second connection with raw data from a nuclear physics experiment. Across the country, their collaborators at Energy Sciences Network (ESnet) were also cheering: the data torrent was streaming flawlessly in real time from 3,000 miles away, across the ESnet6 network backbone, and into the National Energy Research Scientific Computing Center's (NERSC's) Perlmutter supercomputer at Lawrence Berkeley National Laboratory (Berkeley Lab).
Once it reached NERSC, 40 Perlmutter nodes (more than 10,000 cores) processed the data stream in massively parallel fashion and sent the results back to Jefferson Lab in real time for validation, persistence, and final physics analysis. This was achieved without any buffering or temporary storage and without data loss or latency-related problems. (In this context, "real time" means the data were streamed continuously while processing was performed, with no significant delays or storage bottlenecks.)
This was only a test — but not just any test. “This was a major breakthrough for the transmission and processing of scientific data,” said Graham Heyes, Technical Director of the High Performance Data Facility (HPDF). “Capturing this data and processing it in real time is challenging enough; doing it when the data source and destination are separated by distances on continental scales is very difficult. This proof-of-concept test shows that it can be done and will be a game changer.”
Twenty-five years ago Saturday, on August 3, 1999, IPv6 history was made: ESnet was issued its IPv6 production netblock, which is still in use today. The American Registry for Internet Numbers (ARIN) assigned the very first block out of its 2001:400::/23 allocation, and since then, ESnet has numbered its production IPv6 services out of 2001:400::/32. (The netblock would originally have been a /35; the RIRs later automatically expanded such allocations to /32.)
This was the first production IPv6 address allocation in North America, and possibly the world. Akira Kato, a global Internet pioneer, member of the WIDE Project, and a professor at Keio University, notes that the WIDE Project did receive the first allocation in the Asia-Pacific region eight days after the allocation of ESnet's space. WIDE, along with its sub-project KAME, would go on to develop critical IPv6 software, some of which is still in use today. (WIDE continues to hold the numerically lowest globally routable production IPv6 address block, 2001:200::/32.)
IPv6 is the current generation Internet Protocol, and it has been in an extended process of supplanting the now-legacy IPv4. Developed in the early 1990s, it was adopted by ESnet as part of a larger effort to lead by example–by embracing cutting-edge protocols and technologies. IPv6’s massive address space was originally seen as a way of effectively “saving the Internet” from potential collapse due to the exhaustion of IPv4’s more limited address space. While mechanisms such as network address translation (NAT) were developed to extend IPv4’s life, ESnet quickly recognized the drawbacks that these mechanisms created, especially in the high-performance computing and networking environments ESnet supported. Hence, the move to IPv6 was an obvious one for ESnet, and the organization encouraged other National Labs to follow suit.
Before Production
ESnet’s involvement in IPv6 predates our effective move to production by at least three years. As we noted in 2021:
Bob Fink, Tony Hain, and Becca Nitzan [all ESnet staff at the time, circa 1996] spearheaded early IPv6 adoption processes, and their efforts reached far beyond ESnet and the Department of Energy (DOE). The trio were instrumental in establishing a set of operational practices and testbeds under the auspices of the Internet Engineering Task Force–the body where IPv6 was standardized–and this led to the development of a worldwide collaboration known as the 6bone. 6bone was a set of tunnels that allowed IPv6 “islands” to be connected, forming a global overlay network. More importantly, it was a collaboration that brought together commercial and research networks, vendors, and scientists, all with the goal of creating a robust internet protocol for the future.
In addition to 6bone, ESnet was one of three principals that created and managed 6tap, one of the first IPv6 internet exchanges. Collocated at the Starlight research and education network exchange in Chicago, 6tap represented a collaboration between ESnet and Canadian partners CANARIE and Viagenie.
Moving to Native IPv6
With experience gained in the 6bone and 6tap projects, ESnet staff, together with other partners across the Internet, began to advance the cause of a production IPv6 network. The goal would ultimately be a native IPv6 network–no overlays, no tunnels. This would require new blocks of address space to be delegated to the still-nascent Regional Internet Registries (at the time, ARIN in the Americas, APNIC in Asia-Pacific, and RIPE NCC in Europe, the Middle East, and Africa). Those RIRs would need policies and procedures for allocating IPv6 address space. But first, they needed to get their own blocks from the Internet Assigned Numbers Authority (IANA). After nearly two years of deliberation (and much input from ESnet staff to bodies like the IETF and the RIRs), IANA made the assignments on July 14, 1999:
After much discussion concerning the policy guidelines for the deployment of IPv6 addresses, in addition to the years of technical development done throughout the Internet community, the IANA has delegated the initial IPv6 address space to the regional registries in order to begin immediate worldwide deployment of IPv6 addresses.
We would like to thank the current Regional Internet Registries (RIR) for their invaluable work in the construction of the policy guidelines, which seem to have general consensus from the Internet community. We would also like to thank the efforts of the IETF community and the support of the IAB in making this effort a reality.
The stage was now set for ESnet to receive the first public production IPv6 addresses, and help usher in a new era for the Internet. Thus, while Tony Hain and Bob Fink were busy chairing IETF working groups and writing RFCs, Becca Nitzan began building a production IPv6 network within ESnet. Nitzan configured her DEC Alpha workstation, hershey.es.net, to be dual stack–possibly the first workstation in North America, if not the world, to be on a production IPv6 network.
hershey.es.net, a DEC Alpha Personal Workstation, is reputed to be the first computer on the production IPv6 network. It still runs (OpenBSD now instead of Digital Unix) and also serves as a valuable foot-rest and space-heater for the author.
ESnet continues to maintain hershey’s historic EUI64-based IPv6 AAAA entry in the public es.net domain.
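For the curious, the record is easy to check. The following minimal Python sketch (standard library only) resolves the host's IPv6 address; the output is simply whatever AAAA entry ESnet currently publishes:

```python
import socket

# Resolve the IPv6 (AAAA) address for hershey.es.net using the system resolver.
# Purely illustrative: the printed address is whatever record is published today.
for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        "hershey.es.net", None, socket.AF_INET6):
    print(sockaddr[0])  # the IPv6 address portion of the socket address
```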
2008 – Present: New Generation(s) Take Over
Eventually, Hain, Fink, and Nitzan all moved on from ESnet, either into the private sector or retirement, but a new generation took over to continue IPv6 development in the Department of Energy’s research network:
Kevin Oberman carried the torch as ESnet's IPv6 evangelist, promoting the protocol and educating DOE sites about the virtues of the new technology.
Michael O’Connor developed the first tools for network monitoring and topology discovery in ESnet–and made them exclusively run over IPv6.
When it came time for Oberman to retire, Michael Sinatra transferred from UC Berkeley to ESnet to become its IPv6 and DNS subject matter expert. He proceeded to unite IPv6 and IPv4 routing under a single interior routing protocol, developed tools to manage the IPv6 side of ESnet's rich peering infrastructure, and continued educating research and education and government networks on the risks of not adopting IPv6. He is currently involved in building a segment-routed network capable of carrying both IPv4 and IPv6 traffic while using only IPv6–eliminating IPv4–for the control plane.
Nick Buraglio joined ESnet in 2013 and has continued in the tradition of Hain and Fink, spearheading DOE and other US Government agency adoption of IPv6–and, more importantly, abatement of IPv4 within those agencies. He is now a co-chair of the Internet Engineering Task Force (IETF) IPv6 Operations (v6ops) working group and continues to advance and evangelize for the protocol.
Buraglio, Paul Wefel, Dylan Jacob, and John J. Christman worked to design and construct an IPv6-only management plane for the ESnet6 network, requiring substantial cooperation from vendors, some of whom had not previously had any requirements for IPv6-only management capabilities. It has allowed ESnet to inch closer to achieving the US Government’s IPv4 abatement mandates.
Dale Carder joined ESnet in 2016 and has recently worked on an innovative use of the IPv6 packet header's flow label field as a mechanism to mark IPv6 traffic based on the scientific project that is generating and receiving the traffic. This provides valuable information on, for example, how the various Large Hadron Collider research projects are using the network and the relative capacity demands they place on it.
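For readers unfamiliar with the field: the flow label is a 20-bit value carried in the fixed IPv6 header, so any collector that sees the first four bytes of a packet can recover the marking. The sketch below is a generic illustration of where the label lives, not ESnet's actual marking tooling:

```python
def ipv6_flow_label(header: bytes) -> int:
    """Return the 20-bit flow label from the start of a raw IPv6 header.

    Byte 0 holds the version and the upper half of the traffic class;
    the label occupies the low 4 bits of byte 1 plus all of bytes 2-3.
    """
    if len(header) < 4:
        raise ValueError("need at least the first 4 bytes of an IPv6 header")
    if header[0] >> 4 != 6:
        raise ValueError("not an IPv6 packet")
    return ((header[1] & 0x0F) << 16) | (header[2] << 8) | header[3]

# Example: a header whose first four bytes carry flow label 0x12345.
print(hex(ipv6_flow_label(bytes([0x60, 0x01, 0x23, 0x45]))))  # prints 0x12345
```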
Going into the second quarter-century of production IPv6 in ESnet, IPv6 is no longer a core component of ESnet’s services. It is the core component. Nick and I continue to work together in our respective roles, along with other ESnet engineers, to advance the protocol, both from a strategic perspective and in the day-to-day operations of our organization.
As we move into another new era–one characterized by the movement away from dual stack toward an IPv6-only world–expect Nick, myself, and others to continue our work in bringing the Internet beyond IPv4. In the next few weeks, I plan to revisit my "Risks of not deploying IPv6" post to determine which risks have been realized and which have been mitigated over the past dozen years. We will also be doing an overview of the most recent LHC Data Challenge, with a focus on IPv6's nearly exclusive role as the data-transfer substrate for this major high-energy physics project. Nick will continue to advance IPv6 operations via his role in the IETF. Members of ESnet's Data-centers and Facilities Team are inching closer to IPv6-only data centers and zero-touch provisioning (ZTP). My hope is that we will make as much progress in the next five years as we have in the previous twenty-five. Regardless, ESnet's role in the history of IPv6 has prepared us well to lead the way into a bright, IPv6-only future.
A recent post on Microsoft's Networking blog, provocatively titled "Three Reasons Why You Should Not Use iPerf3 on Windows," caused a mini-kerfuffle in the world of network speed measurement and a lively discussion in the comments section. Although the post has since been updated with some important disclaimers — the no. 1 reason being that ESnet has never supported, and still does not support, Windows for iperf3 — ESnet's iperf3 team wanted to set the record straight publicly on a few additional points for anyone who might still be confused.
A little background on ESnet and iperf3
ESnet (www.es.net) provides scientific networking services to support the U.S. Department of Energy and its national laboratories, user facilities, and scientific instruments. We developed iperf3 as a rewrite of iperf2 in order to test the end-to-end performance of networks doing large transfers of scientific data. The primary consumer of iperf3 is the perfSONAR measurement system, which is widely used in the research and education (R&E) networking community. iperf3 is of course also usable as a standalone tool, which is one of the reasons it's been released separately on GitHub. Many large corporations, including SpaceX's Starlink and Comcast, use it to measure their own networks. It has been downloaded from GitHub nearly 100,000 times according to a third-party tool; that figure does not count the other ways users can acquire iperf3.
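For readers who have never run the tool, a typical standalone test looks something like the sketch below: start an iperf3 server with "iperf3 -s" on a machine you control, then drive a client from another host and parse the JSON report. The hostname here is a placeholder, not a real ESnet endpoint:

```python
import json
import subprocess

# Run a 10-second TCP test against an iperf3 server you operate
# (start one elsewhere with `iperf3 -s`). The hostname is a placeholder.
result = subprocess.run(
    ["iperf3", "-c", "iperf3.example.org", "-t", "10", "--json"],
    capture_output=True, text=True, check=True,
)
report = json.loads(result.stdout)

# The "end" section of the JSON report summarizes overall throughput.
bps = report["end"]["sum_received"]["bits_per_second"]
print(f"received {bps / 1e9:.2f} Gbit/s")
```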
Work on iperf3 started in 2009 (the first commit was an import of the iperf-2.0 sources), with the first public release in 2013. The commit history (and the original iperf2 project maintainer) will confirm that iperf3 was intended essentially as an iperf2 replacement; there was a time during which iperf2 was basically abandonware. Fortunately, Bob McMahon of Broadcom has since assumed maintainership of that code base and is actively developing it.
Linux vs. other operating systems
Most of the high-performance networking that we see in the R&E networking space comes from Linux hosts, so it was natural that Linux became the main supported platform. Supporting iperf3 on FreeBSD and macOS has ensured some level of cross-platform support, at least across other UNIX and UNIX-like systems. While we have had many requests to make iperf3 work under Windows, we didn't have the developer skills or resources to support that — and we still don't. The fact that iperf3 works on Windows at all is a result of code contributions from the community, which we gratefully acknowledge.
There are many facets to end-to-end application network performance. These include, of course, routers, switches, NICs, and network links, but also the end-host operating system, runtime libraries, and the application itself. To that extent, iperf3 characterizes the performance of a certain set of applications designed for UNIX but running (with some emulation or adaptation) in a Windows environment. We completely agree that this may not produce the highest throughput numbers on Windows compared to a program that uses native APIs.
iperf3 and Windows
ESnet is happy to see that iperf.fr has removed the old, obsolete binaries from their Web site. This is a problem that can affect any open-source project, not just iperf3.
As mentioned earlier, we’ve generally accepted patches for iperf3 to run on Windows (or other not-officially-supported operating systems such as Android, iOS, or various commercial UNIXes). These changes have allowed Windows hosts to run iperf3 tests (apparently with sub-optimal performance) against any other instance of iperf3, regardless of operating system.
If there’s interest on the part of Microsoft in making a more-Windows-friendly version of iperf3, we’d welcome a conversation on that topic. Feel free to reach out to me (Bruce Mah) anytime.
The Advanced North Atlantic (ANA) collaboration has added three 400-Gbps spectrum circuits between exchange points in the U.S., U.K., and France, boosting ANA's combined trans-Atlantic network capacity to 2.4 Tbps.
Leading research and education (R&E) networking organizations Energy Sciences Network (ESnet), GÉANT, GlobalNOC at Indiana University, Internet2, and Texas Advanced Computing Center (TACC) have joined forces to form MetrANOVA, a consortium for Advancing Network Observation, Visualization, and Analysis. MetrANOVA's goal is to develop and disseminate common network measurement and analysis tools, tactics, and techniques that can be applied throughout the global R&E community.
The software automation system OSCARS, one of the key innovations powering ESnet’s high-speed network for Department of Energy–funded scientific research, has just gotten a major update: OSCARS 1.1, which is designed to take advantage of the capabilities offered by ESnet6, the latest iteration of the network.
ESnet has released iperf-3.16, a significant new version of the open-source network performance measurement tool iperf3. Part of perfSONAR, iperf3 can also be used as a stand-alone tool for measuring network performance in general. The new version gives better insights into high-speed network behaviors.
In a TNC23 workshop organized by ESnet’s Chris Cummings and SURFnet’s Hans Trompert, NREN administrators hailing from six continents took the Workflow Orchestrator for a test drive.
If the world’s scientific research networks transported people instead of data packets, in early June you would have seen a traffic spike transiting Tirana, Albania – the site of TNC23, the prestigious research and education networking conference put on by GÉANT. More than 800 participants from 70-plus countries, representing regional and national research and education networks (NRENs), schools and universities, technology providers, and world-changing scientific projects, came together in southeastern Europe for three days of discussion and collaboration.
A sizable delegation from the Department of Energy’s Energy Sciences Network (ESnet) was there, both to share with and learn from their peers. As the United States’ foremost scientific research network, ESnet partners with GÉANT, a federation of European NRENs, as well as with multiple individual NRENs such as SURF in the Netherlands. They’re united by a similar goal: to provide innovative networking infrastructure and services that support and advance scientific research all over the world.
Cummings (standing) and ESnet’s Nemi McCarter-Ribakoff (seated, center) kick off the Workflow Orchestrator workshop.
Sharing Lessons Learned and Best Practices
In that vein, ESnet Orchestration and Core Data Software Engineer Chris Cummings, with help from ESnet colleagues Nemi McCarter-Ribakoff and Brian Eschen, teamed up with SURFnet Senior Network Architect & Innovation Hans Trompert and Peter Boers for the well-received session "From Zero to Orchestrated — A Workflow Orchestrator Workshop." This was the first time that ESnet and SURF had jointly shared the in-depth lessons and hard-earned knowledge gained by their network and software engineers.
Network orchestration and intent-based networking refer to the design and centralized coordination of network resources so that higher-level services can be realized on the network. This is in contrast to the legacy approach of individually configuring and provisioning routers, switches, firewalls, and other network devices to deliver a network service. The open-source Workflow Orchestrator tool developed by SURF and ESnet helps network administrators both automate (execute repetitive tasks reliably and easily) and orchestrate (add a layer of intelligence to the tasks being automated and keep a complete audit log of changes), as sketched in the example below.
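As a purely conceptual illustration of that automate-plus-orchestrate idea (this is not the Workflow Orchestrator's actual API, and all names are made up), a workflow can be thought of as an ordered list of automated steps whose every execution is recorded for later audit:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

@dataclass
class Workflow:
    """An ordered list of automated steps plus an audit trail of what ran."""
    name: str
    steps: list[Callable[[dict], dict]]
    audit_log: list[str] = field(default_factory=list)

    def run(self, state: dict) -> dict:
        for step in self.steps:
            state = step(state)  # each step transforms the service's state
            self.audit_log.append(
                f"{datetime.now(timezone.utc).isoformat()} {self.name}: ran {step.__name__}"
            )
        return state

# Two toy steps for a hypothetical layer-2 circuit service.
def allocate_vlan(state: dict) -> dict:
    state["vlan"] = 3001           # placeholder value for illustration
    return state

def push_router_config(state: dict) -> dict:
    state["provisioned"] = True    # a real step would template and deploy config
    return state

wf = Workflow("create_l2_circuit", [allocate_vlan, push_router_config])
final_state = wf.run({"endpoints": ["site-a", "site-b"]})
print(final_state, *wf.audit_log, sep="\n")
```

The real Workflow Orchestrator layers much more on top of this basic pattern, including the product and workflow modeling that the workshop introduced.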
Many NRENs would like to add more orchestration, but getting started can be a daunting task requiring a lot of forethought and domain knowledge. Representatives from more than 20 NRENs on six continents attended the all-day, interactive workshop at TNC, which began with introductions to product and workflow modeling, continued with interactive development sessions, and ended with an open discussion about tailoring the Workflow Orchestrator to theoretical use cases. The goal was for attendees to leave with a locally running version of the Workflow Orchestrator and some example workflows on their laptops, with the organizers providing guided troubleshooting and showing how to make code changes to fix bugs.
While the attendees appreciated having a working environment to take home with them, “it was also very beneficial for us – learning how to think in an orchestration-forward manner by spending time planning out theoretical product designs with other R&E community members,” says Cummings.
There were challenges: some attendees had trouble getting the workshop environment running, due to unfamiliarity with the Docker containerization platform or a lack of administrative rights to install Docker on their systems. Pulling resources over the hotel Wi-Fi was also difficult, but Cummings reports that "Karl Newell from Internet2 came up with some really clever solutions to help his fellow workshop-mates access the images locally – a great example of cross-R&E teamwork!" Cummings, Trompert, Nemi McCarter-Ribakoff, and other ESnet engineers will be applying these lessons learned to the next edition of the Workflow Orchestrator workshop, planned for Internet2's TechEx conference in late September.
ESnet’s Tom Lehman explained the advantages of SENSE orchestration to the TNC23 audience.
ESnet at TNC23
Among ESnet’s other speakers and presenters were Chief Technology Officer and Planning & Innovation Group Lead Chin Guok and ESnet/Berkeley Lab Networked Systems Researcher and Developer Tom Lehman, who took the massive concert hall stage to share an overview of Managed Network Services for Large Data Transfers, focusing on the integration work between the SENSE [SDN for End-to-End Networking at Exascale] orchestration and Rucio data management systems. Their goal was to demystify the often opaque role of the network in science workflow processes by showing how advanced wide area network traffic engineering, end site infrastructure awareness/control, and domain science workflow intelligence can improve research results and planning abilities.
Science Engagement Acting Group Lead Eli Dart gave an update on the high-performance network design pattern Science DMZ.
ESnet Science Engagement Acting Group Lead Eli Dart presented on The Strategic Future of the Science DMZ, the science-focused high-performance network design pattern created by ESnet, highlighting new environments and applications such as Streaming DTNs, Zero Trust, and Exascale HPC, and workflows that couple experimental and computing facilities to achieve previously unachievable results. Dart also teamed up with Karl Newell from Internet2 to talk about Identifying and Understanding Scientific Network Flows, in particular the effort underway from the High-Energy Physics (HEP) and Worldwide LHC Computing Grid (WLCG) communities to mark packets/flows so they can be correlated with specific research projects. This approach, which allows identification of flows for troubleshooting and gives network providers visibility into the research flows they support, can be leveraged by any research organization and network provider willing to participate in packet marking.
Chin Guok shared an assessment of the effectiveness of ESnet’s pilot cache system.
Planning & Architecture Computer Systems Engineer Nick Buraglio chaired a session titled “If It Was Easy, We’d Have Done it By Now” about innovations in networking that included Guok summarizing the findings of ESnet’s In-Network Caching Pilot. Guok also co-chaired a workshop, Planning and Development in R&E Networks, that included strategy discussion for approaches to intercontinental connectivity, packet layer renewal, automation, and Big Science requirements.
It was an intense couple of days of networking. Another half-dozen ESnetters, including Executive Director Inder Monga, were also in Albania to attend TNC23. A group of them unwound after the conference ended by going on a challenging hike above stunning Lake Bovilla.
“We all learned a lot,” said Cummings. “And it was great to be able to contribute in a concrete way to the workflow orchestration community.”
After TNC ended, a group of ESnetters hiked up above Lake Bovilla, a reservoir northeast of Tirana within Mount Dajt National Park. Photo: Brian Eschen.
Among this summer's cohort of 53 Experiences in Research high school interns are three from Hawai'i and two from the Bay Area who are working on similar but distinct network data visualization projects for ESnet.
“Much of the things I am doing in the project were things I could not have imagined were in my ability to try a month ago,” said Ella Jeon, a rising junior in Pleasanton, CA. “One significant new mindset I have experienced over the course of this internship is the whole ‘being able to try things that I didn’t think were really possible or something I was really capable of’ type of realization. The boost of guidance and support in this internship has made me realize how much more I could go on to try and achieve on my own as well.”
Diagram of ESnet6’s peering points for the new Cloud Connect Service
By Joshua Stewart, ESnet
Part of managing a network dedicated to handling vast volumes of scientific data is ensuring that it adapts to trends in how data is created, stored, and computed. A pattern has emerged in recent years that allows access to elastic, scalable systems on demand. Nebulously titled "The Cloud," it refers to software and services that run over the public internet. For ESnet, this is just another place where science intends to happen.
To drill down on the nebulous term "The Cloud": there are different flavors of how these services and software are consumed. "Public Cloud" refers to services and software that are open to all users and subscribers around the world: for example, those provided by Dropbox, Slack, Salesforce, and Office 365. Meanwhile, as its name suggests, a Virtual Private Cloud (VPC) is an environment in which all virtualized hardware and software resources are dedicated exclusively to, and accessible only by, a single organization. The intention of a VPC is to emulate the on-premises data centers of old while removing the headaches of managing their physicality (space and power constraints) and offering the added benefit of instantaneous access to scale when needed. While some organizations decided to go all-in on the new virtual environments by adopting a cloud-native posture, others took a more measured approach, blending their on-premises infrastructure with the new virtualized territory in a format known as hybrid cloud.
As usage of virtual private clouds grew, it became apparent that connectivity over the public internet was too unreliable, slow, and insecure: dedicated, high-bandwidth connectivity was a must-have. In response, every major Cloud Service Provider (CSP) launched an offering. Amazon Web Services (AWS) was first, launching “Direct Connect” in 2012; Azure followed in 2014 with its “ExpressRoute”; and in 2017, Google launched Cloud Interconnect. (Read more about the history.)
These virtual circuits are the driver behind the new ESnet Cloud Connect service, which is aimed at supporting both scientific and enterprise workloads. The goal is to carve out a dedicated, high-bandwidth path (up to 10 Gbps) across ESnet's 400GE-capable backbone from any supported user facility to the nearest cloud on-ramp by utilizing two interim network service providers: PacketFabric and Equinix. From there, ESnet helps provision each major CSP's (AWS, Azure, GCP) aforementioned flavor of dedicated connectivity into your Virtual Private Cloud.
This solution is designed to scale from simple dedicated connectivity with a single cloud provider up to a virtual routed network utilizing multiple cloud providers, on-ramps, and interconnected user facilities. This series of blog posts will focus on a few suggested use cases for ESnet's new service offering. For questions or to learn more, email Joshua Stewart.