EJFAT prototype demonstrates proof of concept for connecting scientific instruments with remote high-performance computing for rapid data processing
Scientists at Thomas Jefferson National Accelerator Facility (Jefferson Lab) clicked a button and held their collective breath. Moments later, they exulted as a monitor showed steady saturation of their new 100 gigabit-per-second connection with raw data from a nuclear physics experiment. Across the country, their collaborators at the Energy Sciences Network (ESnet) were also cheering: the data torrent was streaming flawlessly in real time from 3,000 miles away, across the ESnet6 network backbone, and into the National Energy Research Scientific Computing Center’s (NERSC’s) Perlmutter supercomputer at Lawrence Berkeley National Laboratory (Berkeley Lab).
Once it reached NERSC, 40 Perlmutter nodes (more than 10,000 cores) processed the data stream in parallel and sent the results back to Jefferson Lab in real time for validation, persistence, and final physics analysis. This was achieved without any buffering or temporary storage, and without data loss or latency-related problems. (In this context, “real time” means streamed continuously while processing is performed, with no significant delays or storage bottlenecks.)
This was only a test — but not just any test. “This was a major breakthrough for the transmission and processing of scientific data,” said Graham Heyes, Technical Director of the High Performance Data Facility (HPDF). “Capturing this data and processing it in real time is challenging enough; doing it when the data source and destination are separated by distances on continental scales is very difficult. This proof-of-concept test shows that it can be done and will be a game changer.”
A recent post on Microsoft’s Networking blog, provocatively titled “Three Reasons Why You Should Not Use iPerf3 on Windows,” caused a mini-kerfuffle in the world of network speed measurement and a lively discussion in the comments section. Although the post has since been updated with some important disclaimers — the no. 1 reason being that ESnet has never supported, and still does not support, Windows for iperf3 — ESnet’s iperf3 team wanted to set the record straight publicly on a few additional points for anyone who might still be confused.
A little background on ESnet and iperf3
ESnet (www.es.net) provides scientific networking services to support the U.S. Department of Energy and its national laboratories, user facilities, and scientific instrument complex. We developed iperf3 as a rewrite of iperf2 in order to test the end-to-end performance of networks doing large transfers of scientific data. The primary consumer of iperf3 is the perfSONAR measurement system, which is widely used in the research and education (R&E) networking community. iperf3 is of course also usable as a standalone tool, which is one of the reasons it’s been released separately on GitHub. Many large corporations, including SpaceX’s Starlink and Comcast, are using it to measure their own networks. It has been downloaded from GitHub nearly 100,000 times according to a third-party tool; this count does not include the other ways that users could acquire iperf3.
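At its core, a tool like iperf3 times a bulk transfer between two sockets and divides bytes moved by elapsed time. The sketch below illustrates that idea over loopback with plain Python sockets — it is not iperf3 itself, and the buffer size and test duration are arbitrary choices (only the port number, 5201, echoes iperf3's default):

```python
import socket
import threading
import time

PORT = 5201  # iperf3's default port, reused here for flavor

def sink(srv, result):
    """Server side: accept one connection and count bytes until EOF."""
    conn, _ = srv.accept()
    total = 0
    while chunk := conn.recv(65536):
        total += len(chunk)
    conn.close()
    result["bytes"] = total

# Create the listening socket up front so the client can't race the accept().
srv = socket.create_server(("127.0.0.1", PORT))
result = {}
t = threading.Thread(target=sink, args=(srv, result))
t.start()

# Client side: send zeros as fast as possible for a fixed wall-clock
# interval, then report bytes moved per second.
sock = socket.create_connection(("127.0.0.1", PORT))
payload = b"\x00" * 65536
sent, deadline = 0, time.monotonic() + 0.5
while time.monotonic() < deadline:
    sock.sendall(payload)
    sent += len(payload)
sock.close()
t.join()
srv.close()

print(f"sent {sent} bytes; throughput ~ {sent * 8 / 0.5 / 1e9:.2f} Gbit/s")
```

A loopback number like this mostly measures the host's memory and kernel, which is exactly the point made below: end-to-end throughput depends on the operating system and application stack, not just the network.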
Work on iperf3 started in 2009 (the first commit was an import of the iperf-2.0 sources), with the first public release in 2013. The commit history (and the original iperf2 project maintainer) will confirm that iperf3 was intended essentially as an iperf2 replacement. Thus there was a time during which iperf2 was basically abandonware. Fortunately, Bob McMahon from Broadcom has assumed maintainership of that code base and is actively developing it.
Linux vs. other operating systems
Most of the high-performance networking that we see in the R&E networking space comes from Linux hosts, so it was natural that Linux became the main supported platform. Supporting iperf3 on FreeBSD and macOS has ensured some level of cross-platform support, at least to the extent of other UNIX and UNIX-like systems. While we have had many requests to make iperf3 work under Windows, we didn’t have the developer skills or resources to support that — and we still don’t. The fact that iperf3 works on Windows at all is a result of code contributions from the community, which we gratefully acknowledge.
There are many facets to end-to-end application network performance. These include, of course, routers, switches, NICs, and network links, but also the end-host operating system, runtime libraries, and the application itself. To that extent, iperf3 characterizes the performance of a certain class of applications: designed for UNIX, but running (with some emulation or adaptation) in a Windows environment. We completely agree that this may not produce the highest throughput numbers on Windows, compared to a program that uses native APIs.
iperf3 and Windows
ESnet is happy to see that iperf.fr has removed the old, obsolete binaries from their Web site. This is a problem that can affect any open-source project, not just iperf3.
As mentioned earlier, we’ve generally accepted patches for iperf3 to run on Windows (or other not-officially-supported operating systems such as Android, iOS, or various commercial UNIXes). These changes have allowed Windows hosts to run iperf3 tests (apparently with sub-optimal performance) against any other instance of iperf3, regardless of operating system.
If there’s interest on the part of Microsoft in making a more-Windows-friendly version of iperf3, we’d welcome a conversation on that topic. Feel free to reach out to me (Bruce Mah) anytime.
The Advanced North Atlantic (ANA) collaboration has added three 400-Gbps spectrum circuits between exchange points in the U.S., U.K., and France, boosting ANA’s combined trans-Atlantic network capacity to 2.4 Tbps.
Leading research and education (R&E) networking organizations Energy Sciences Network (ESnet), GÉANT, GlobalNOC at Indiana University, Internet2, and Texas Advanced Computing Center (TACC) have joined forces to form MetrANOVA, a consortium for Advancing Network Observation, Visualization, and Analysis. MetrANOVA’s goal is to develop and disseminate common network measurement and analysis tools, tactics, and techniques that can be applied throughout the global R&E community.
The software automation system OSCARS, one of the key innovations powering ESnet’s high-speed network for Department of Energy–funded scientific research, has just gotten a major update: OSCARS 1.1, which is designed to take advantage of the capabilities offered by ESnet6, the latest iteration of the network.
ESnet has released iperf-3.16, a significant new version of the open-source network performance measurement tool iperf3. Part of perfSONAR, iperf3 can also be used as a stand-alone tool for measuring network performance in general. The new version gives better insights into high-speed network behaviors.
In a TNC23 workshop organized by ESnet’s Chris Cummings and SURFnet’s Hans Trompert, NREN administrators hailing from six continents took the Workflow Orchestrator for a test drive.
If the world’s scientific research networks transported people instead of data packets, in early June you would have seen a traffic spike transiting Tirana, Albania – the site of TNC23, the prestigious research and education networking conference put on by GÉANT. More than 800 participants from 70-plus countries, representing regional and national research and education networks (NRENs), schools and universities, technology providers, and world-changing scientific projects, came together in southeastern Europe for three days of discussion and collaboration.
A sizable delegation from the Department of Energy’s Energy Sciences Network (ESnet) was there, both to share with and learn from their peers. As the United States’ foremost scientific research network, ESnet partners with GÉANT, a federation of European NRENs, as well as with multiple individual NRENs such as SURF in the Netherlands. They’re united by a similar goal: to provide innovative networking infrastructure and services that support and advance scientific research all over the world.
Cummings (standing) and ESnet’s Nemi McCarter-Ribakoff (seated, center) kick off the Workflow Orchestrator workshop.
Sharing Lessons Learned and Best Practices
In that vein, ESnet Orchestration and Core Data Software Engineer Chris Cummings, with help from ESnet colleagues Nemi McCarter-Ribakoff and Brian Eschen, teamed up with SURF Senior Network Architect Hans Trompert and SURF’s Peter Boers for the well-received session “From Zero to Orchestrated — A Workflow Orchestrator Workshop.” This was the first time that ESnet and SURF have together shared the in-depth lessons and hard-earned knowledge gained by their network and software engineers.
Network orchestration and intent-based networking refer to the design and centralized coordination of network resources so that higher-level services can be realized on the network. This contrasts with the legacy approach of individually configuring and provisioning routers, switches, firewalls, and other network devices to deliver a network service. The open-source Workflow Orchestrator tool developed by SURF and ESnet helps network administrators both automate (execute repetitive tasks reliably and easily) and orchestrate (add a layer of intelligence to the automated tasks and keep a complete audit log of changes).
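The automate/orchestrate distinction can be sketched in a few lines. The following is an illustrative toy in Python, not the Workflow Orchestrator's actual API; the service model, device names, and `configure_port` step are all hypothetical:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical device-level task standing in for any automated change.
def configure_port(device: str, port: str, vlan: int) -> str:
    return f"{device}:{port} set to VLAN {vlan}"

@dataclass
class Orchestrator:
    """Toy sketch: automation = run a step reliably; orchestration = expand a
    higher-level intent into ordered steps and audit every change made."""
    audit_log: list = field(default_factory=list)

    def run(self, step, **kwargs):
        outcome = step(**kwargs)
        # Every executed step is recorded: when, what, with which arguments.
        self.audit_log.append(
            (datetime.now(timezone.utc).isoformat(), step.__name__, kwargs, outcome)
        )
        return outcome

    def realize_intent(self, service: dict):
        # A service-level intent ("VLAN between these ports") expands into
        # individual device-level steps, executed in order.
        for endpoint in service["endpoints"]:
            self.run(configure_port, device=endpoint["device"],
                     port=endpoint["port"], vlan=service["vlan"])

orch = Orchestrator()
orch.realize_intent({
    "vlan": 100,
    "endpoints": [
        {"device": "rtr-a", "port": "et-0/0/1"},
        {"device": "rtr-b", "port": "et-0/0/5"},
    ],
})
print(len(orch.audit_log), "audited changes")
```

The real tool adds product modeling, state machines, and failure handling on top of this basic shape, but the core inversion is the same: operators express the service, and the orchestrator decides and records the device changes.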
Many NRENs would like to add more orchestration, but getting started can be a daunting task requiring a lot of forethought and domain knowledge. Representatives from more than 20 NRENs from six continents attended the all-day, interactive workshop at TNC, which began with introductions to product and workflow modeling, followed by interactive development sessions, and ended with an open discussion around tailoring the Workflow Orchestrator to theoretical use cases. The goal was for attendees to leave with a locally running version of the Workflow Orchestrator on their laptops, along with example workflows; the organizers provided guided troubleshooting and showed how to make code changes to fix bugs.
While the attendees appreciated having a working environment to take home with them, “it was also very beneficial for us – learning how to think in an orchestration-forward manner by spending time planning out theoretical product designs with other R&E community members,” says Cummings.
There were challenges: some attendees had trouble getting the workshop environment running, due to unfamiliarity with the Docker containerization platform or because they lacked administrative rights to install Docker on their systems. Pulling resources over the hotel Wi-Fi was also difficult, but Cummings reports that “Karl Newell from Internet2 came up with some really clever solutions to help his fellow workshop-mates access the images locally – a great example of cross-R&E teamwork!” Cummings, Trompert, McCarter-Ribakoff, and other ESnet engineers will be applying these lessons to the next edition of the Workflow Orchestrator workshop, planned for Internet2’s TechEx conference in late September.
ESnet’s Tom Lehman explained the advantages of SENSE orchestration to the TNC23 audience.
ESnet at TNC23
Among ESnet’s other speakers and presenters were Chief Technology Officer and Planning & Innovation Group Lead Chin Guok and ESnet/Berkeley Lab Networked Systems Researcher and Developer Tom Lehman, who took the massive concert hall stage to share an overview of Managed Network Services for Large Data Transfers, focusing on the integration work between the SENSE [SDN for End-to-End Networking at Exascale] orchestration and Rucio data management systems. Their goal was to demystify the often opaque role of the network in science workflow processes by showing how advanced wide area network traffic engineering, end site infrastructure awareness/control, and domain science workflow intelligence can improve research results and planning abilities.
Science Engagement Acting Group Lead Eli Dart gave an update on the high-performance network design pattern Science DMZ.
ESnet Science Engagement Acting Group Lead Eli Dart presented on The Strategic Future of the Science DMZ, the science-focused high-performance network design pattern created by ESnet, highlighting new environments and applications such as Streaming DTNs, Zero Trust, and Exascale HPC, and workflows that couple experimental and computing facilities to achieve previously unachievable results. Dart also teamed up with Karl Newell from Internet2 to talk about Identifying and Understanding Scientific Network Flows, in particular the effort underway from the High-Energy Physics (HEP) and Worldwide LHC Computing Grid (WLCG) communities to mark packets/flows so they can be correlated with specific research projects. This approach, which allows identification of flows for troubleshooting and gives network providers visibility into the research flows they support, can be leveraged by any research organization and network provider willing to participate in packet marking.
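As one hedged illustration of what marking a flow can look like at the host: the HEP/WLCG effort ("scitags") uses mechanisms such as IPv6 flow labels and UDP "firefly" packets, but the simplest marking primitive to demonstrate is setting a DSCP codepoint on a socket, which stamps every packet the socket emits. The codepoint below is an arbitrary example value, not an assigned scitags identifier:

```python
import socket

# Hypothetical marking scheme: a DSCP codepoint identifying a research flow.
DSCP_RESEARCH = 10          # example codepoint only, chosen for illustration
TOS = DSCP_RESEARCH << 2    # DSCP occupies the upper six bits of the TOS byte

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS)

# Every packet this socket emits now carries the mark, which devices and
# collectors along the path can observe and correlate with a project.
assert sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS) == TOS
sock.close()
```

In practice, network providers then need measurement infrastructure that reads these marks in flight — which is what makes the effort a community exercise rather than a purely host-side change.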
Chin Guok shared an assessment of the effectiveness of ESnet’s pilot cache system.
Planning & Architecture Computer Systems Engineer Nick Buraglio chaired a session titled “If It Was Easy, We’d Have Done it By Now” about innovations in networking that included Guok summarizing the findings of ESnet’s In-Network Caching Pilot. Guok also co-chaired a workshop, Planning and Development in R&E Networks, that included strategy discussion for approaches to intercontinental connectivity, packet layer renewal, automation, and Big Science requirements.
It was an intense couple of days of networking. Another half-dozen ESnetters, including Executive Director Inder Monga, were also in Albania to attend TNC23. A group of them unwound after the conference ended by going on a challenging hike above stunning Lake Bovilla.
“We all learned a lot,” said Cummings. “And it was great to be able to contribute in a concrete way to the workflow orchestration community.”
After TNC ended, a group of ESnetters hiked up above Lake Bovilla, a reservoir northeast of Tirana within Mount Dajt National Park. Photo: Brian Eschen.
Among this summer’s cohort of 53 Experiences in Research high school interns are three from Hawai’i and two from the Bay Area who are working on related but distinct network data visualization projects for ESnet.
“Much of the things I am doing in the project were things I could not have imagined were in my ability to try a month ago,” said Ella Jeon, a rising junior in Pleasanton, CA. “One significant new mindset I have experienced over the course of this internship is the whole ‘being able to try things that I didn’t think were really possible or something I was really capable of’ type of realization. The boost of guidance and support in this internship has made me realize how much more I could go on to try and achieve on my own as well.”
Diagram of ESnet6’s peering points for the new Cloud Connect Service
By Joshua Stewart, ESnet
Part of managing a network dedicated to handling vast swaths of scientific data is ensuring it adapts to trends in how data is created, stored, and computed. In recent years, a pattern has emerged that allows on-demand access to elastic, scalable systems. Nebulously titled “The Cloud,” it refers to software and services that run over the public internet. For ESnet, this is simply another place where science intends to happen.
To drill down on the nebulosity of the term “The Cloud,” there are different flavors of how the services and software are consumed. “Public Cloud” refers to services and software that are open to all users and subscribers around the world: for example, those provided by Dropbox, Slack, Salesforce, and Office 365. Meanwhile, as its name suggests, a Virtual Private Cloud (VPC) is an environment in which all virtualized hardware and software resources are dedicated exclusively to, and accessible only by, a single organization. The intention of a VPC is to emulate the on-premises data centers of old while removing the headaches of managing their physicality (space and power constraints) and offering the added benefit of instantaneous access to scale when needed. While some organizations went all-in on the new virtual environments by adopting a cloud-native posture, others took a more measured approach, blending their on-premises infrastructure with the new virtualized territory in a format known as a hybrid cloud.
As usage of virtual private clouds grew, it became apparent that connectivity over the public internet was too unreliable, slow, and insecure: dedicated, high-bandwidth connectivity was a must-have. In response, every major Cloud Service Provider (CSP) launched an offering. Amazon Web Services (AWS) was first, launching “Direct Connect” in 2012; Azure followed in 2014 with “ExpressRoute”; and in 2017, Google launched “Cloud Interconnect.”
These virtual circuits are the driver behind the new ESnet Cloud Connect service, aimed at supporting both scientific and enterprise workloads. The goal is to carve out a dedicated, high-bandwidth path (up to 10 Gbps) across ESnet’s 400GE-capable backbone from any supported user facility to the nearest cloud on-ramp, utilizing two interim network service providers: PacketFabric and Equinix. From there, ESnet helps provision the appropriate flavor of dedicated connectivity from the major CSPs (AWS, Azure, GCP) into your Virtual Private Cloud.
This solution is designed to scale from simple dedicated connectivity and a singular cloud provider to a virtual routed network utilizing multiple cloud providers, onramps, and interconnecting user facilities. This series of blog posts will focus on a few suggested use cases for utilizing ESnet’s new service offering. For questions or to learn more, email Joshua Stewart.
ESnet Measurement & Analysis Intern Felix Renken is a senior at Technische Universität Berlin, majoring in Computer Science with a focus on Media Technologies and Signal Processing. Originally from a rural area near Hamburg in northern Germany, he moved to Berlin to pursue his university education. He arrived in Berkeley in March and will be going home in early July.
During his internship, Felix worked on developing an open-source Grafana plugin for visualizing network data that can be used in ESnet’s Stardust system, which collects precise network measurement data and allows users to retrieve information about specific equipment over a given time range. (Learn more about Stardust via this talk by Ed Balas and Andy Lake.) Felix’s plugin enables users to visualize various data collected by Stardust: it reveals relationships between pairs of destinations for a single source, showcases common attributes of nodes and links, and offers the option to visualize AS paths. The plugin is currently undergoing the Grafana community plugin review process; the source code is available on GitHub. It is also installed on Stardust, for anyone who wants to check it out.
During my search for interesting internship opportunities, I came across ESnet’s student program and contacted Marc Körner and Katrina Turner to get more information on the projects they supervise. I eventually applied for the “Data Visualization of Network Measurement Data” project. It encompasses the development of an open-source tool that visualizes network data in an exciting way. The opportunity of getting work experience in a research environment greatly appealed to me. And, of course, the chance to spend time in California!
What is the most exciting aspect of your field right now?
The cross-disciplinary nature of visualizing data is particularly interesting to me. It utilizes principles from design, statistics, and computer science, offering opportunities to learn from diverse perspectives.
How was Berkeley different from Berlin? What fun things did you do here?
Berkeley and Berlin are distinct in so many aspects. Berkeley is, of course, much smaller in size than Berlin, and I really enjoyed being in a city that is less hectic. People here seem more relaxed. And the fact that Berkeley is somewhat shaped by its university was also something that I’m not used to from Berlin or any other German city. Cycling here was scarier than in Berlin though. Another thing is the accessibility to the fantastic nature around Berkeley. I went hiking a lot and will definitely miss being in close proximity to beautiful trails when going back to Germany. Other fun things I did were camping and eating a lot of burritos.