Two graduate students working with ESnet have published their papers recently in IEEE and ACM workshops.
Bibek Shrestha, a graduate student at the University of Nevada, Reno, and his advisor Engin Arslan worked with Richard Cziva from ESnet to publish a work on “INT Based Network-Aware Task Scheduling for Edge Computing”. In the paper, Bibek investigated the use of in-band network telemetry (INT) for real-time in-network task scheduling. Bibek’s experimental analysis using various workload types and network congestion scenarios revealed that enhancing task scheduling of edge computing with high-precision network telemetry can lead up to a 40% reduction in data transfer times and up to 30% reduction in total task execution times by favoring edge servers in uncongested (or mildly congested) sections of the network when scheduling tasks. The paper will appear in the 3rd Workshop on Parallel AI and Systems for the Edge (PAISE) co-conducted with IEEE IPDPS 2021 conference to be held on May 21st, 2021, in Portland, Oregon.
Zhang Liu, a former ESnet intern and a current graduate student at the University of Colorado at Boulder, worked with the ESnet High Touch Team – Chin Guok, Bruce Mah, Yatish Kumar, and Richard Cziva – on fastcapa-ng, ESnet’s telemetry processing software. In the paper “Programmable Per-Packet Network Telemetry: From Wire to Kafka at Scale,” Zhang showed the scaling and performance characteristics of fastcapa-ng, and highlighted the most critical performance considerations that allow the pushing of 10.4 million telemetry packets per second to Kafka with only 5 CPU cores, which is more than enough to handle 170 Gbit/s of original traffic with 1512B MTU. This paper will appear in the 4th International Workshop on Systems and Network Telemetry and Analytics (SNTA 2021) held at the ACM HPCD 2021 conference in Stockholm, Sweden between 21-25 June 2021.
Congratulations Bibek and Zhang!
If you are a networked systems research student looking to collaborate with us on network measurements, please reach out to Richard Cziva. If you are interested in a summer internship with ESnet, please visit this page.
Created in 1986, the U.S. Department of Energy’s (DOE’s) Energy Sciences Network (ESnet) is a high-performance network built to support unclassified science research. ESnet connects more than 40 DOE research sites—including the entire National Laboratory system, supercomputing facilities and major scientific instruments—as well as hundreds of other science networks around the world and the Internet.
Step 9: Why just rent fiber? Pick up your own dark fiber network at a bargain price for future expansion. In the meantime, boost your bandwidth to 100G for everyone. (2012)
Step 10: Here’s a cool idea, come up with a new network design so that scientists moving REALLY BIG DATASETS can safely avoid institutional firewalls, call it the Science DMZ, and get research moving faster at universities around the country. (2012)
Step 12: 100G is fast, but it’s time to get ready for 400G. To pave the way, ESnet installs a production 400G network between facilities in Berkeley and Oakland, Calif., and even provides a 400G testbed so network engineers can get up to speed on the technology. (2015)
Step 13: Celebrate 30 years as a research and education network leader, but keep looking forward to the next level. (2016)
About 100 networking experts from academia, industry, national labs and federal agencies met for a two-day workshop at the National Science foundation to plan a path forward to develop, deploy and operate a prototype SDN network. SDN, or Software Defined Networking, is an upcoming technology paradigm aimed at making it easier for software applications to automatically configure and control the various layers of the network to improve flexibility, predictability and reliability.
ESnet Chief Technologist Inder Monga was the lead organizer of the workshop and ESnet network engineer Erich Pouyoul gave a talk on science drivers for SDN. Monga also led a breakout session on “Technology and Operational Gap Analysis.” ESnet has been a pioneer in developing and deploying SDN technology in support of data-intensive science for almost a decade, starting with research on virtual network circuits that eventually culminated in the facility’s OSCARS project, recipient of a 2013 R&D100 award.
The invitation-only workshop was held at the National Science Foundation in Arlington, Va., and included speakers from the White House Office on Science and Technology Policy (OSTP), Google, DARPA, Internet2, SRI and Brocade, as well as ESnet. Among the areas covered were transparency and interoperation among SDN domains, security and identity management, and the participation of equipment vendors to advance technology transfer.
The workshop was organized after the OSTP directed federal agencies participating in the Networking and Information Technology Research and Development (NITRD) Subcommittee’s Large Scale Networking (LSN) Coordinating Group to plan and hold an LSN workshop. The goal was to have participation by representatives from federal agencies, the commercial sector, researchers, and other networking and distributed systems research community participants to explore and report on the need for a prototype SDN network.
The workshop participants will draft a report documenting recommendations for needed R&D, resources and collaboration to deploy and operate the prototype SDN network and to identify future SDN research needs.
On the eve of the workshop, Federal Computer Week magazine published an article about federal agencies looking into SDN. Monga was among the sources interviewed for the article, which describes SDN as the next major architectural change looming for the IT community.
Widely recognized as a mark of excellence, the R&D 100 Awards are the only industry-wide competition rewarding the practical applications of science. Among the 2013 winners of this prestigious award is the latest version of OSCARS, the On-demand Secure Circuits and Reservation System, the development of which was led by ESnet. OSCARS is a software service that creates dedicated bandwidth channels for scientists who need to move massive, time-critical data sets around the world.
What makes OSCARS so useful is that it can automatically create end-to-end circuits, crossing multiple network domains. Before OSCARS, this was a time consuming process–in 2010, for example, ESnet engineers needed 10 hours of phone calls and about 100 emails over three months to do what one person can do in five minutes using OSCARS. The automation of this complex process—through a technique known as Software Defined Networking—is accelerating scientific discovery in high-energy physics and many other disciplines.
“It’s wonderful to see the innovation that’s gone into the latest version of OSCARS recognized with an R&D100 Award,” said ESnet Director Greg Bell. “This early example of Software Defined Networking is yet another example of research networks taking the lead. But OSCARS is not just an ESnet achievement. We’re grateful for collaborations with many partners over the last decade, as the project matured from a bold idea into a production software suite.”
ESnet and its collaborators successfully completed three days of demonstrating its End-to-End Circuit Service at Layer 2 (ECSEL) software at the Open Networking Summit held at Stanford a couple of weeks ago. Our goal is to build “zero-configuration circuits” to help science applications seamlessly use networks for optimized end-to-end data transport. ECSEL, developed in collaboration with NEC, Indiana University, and the University of Delaware builds on some exciting new conceptual thinking in networking.
Wrangling Big Data
To put ECSEL in context, the proliferating tide of scientific data flows – anticipated at 2 petabytes per second as planned large-scale experiments get in motion – is already challenging networks to be exponentially more efficient. Wide area networks have vastly increased bandwidth and enable flexible, distributed, scientific workflows that involve connecting multiple scientific labs to a supercomputing site, a university campus, or even a cloud data center.
The increasing adoption of distributed, service-oriented computing means that resource and vendor independence for service delivery is a key priority for users. Users expect seamless end-to-end performance and want the ability to send data flows on demand, no matter how many domains and service providers are involved. The hitch is that even though the Wide Area Network (WAN) can have turbocharged bandwidth, at these exponentially increasing rates of network traffic even a small blockage in the network can seriously impair the flow of data, trapping users in a situation resembling commute conditions on sluggish California freeways. These scientific data transport challenges that we and other R&E networks face are just a taste of what the commercial world will encounter with the increasing popularity of cloud computing and service-driven cloud storage.
Abstracting a solution
One of the key feedback from application developers, scientists and end-users is that they do not want to deal with the complexity at the infrastructure level while still accomplishing their mission. At ESnet, we are exploring various ways to make networks work better for users. A couple of concepts could be game-changers, according to Open Network Summit conference presenter and Berkeley professor Scott Shenker: 1) using abstraction to manage network complexity, and 2) extracting and exposing simplicity out of the network. Shenker himself cites Barbara Liskov’s Turing Lecture as inspiration.
ECSEL is leveraging OSCARS and OpenFlow within the Software Defined Networking (SDN) paradigm to elegantly prevent end-to-end network traffic jams. OpenFlow is an open standard to allow application-driven manipulation of network flows. ECSEL is using OSCARS-controlled MPLS virtual circuits with OpenFlow to dynamically stitch together a seamless data plane delivering services over multi-domain constructs. ECSEL also provides an additional level of simplicity to the application, as it can discover host-network interconnection points as necessary, removing the requirement of applications being “statically configured” with their network end-point connections. It also enables stitching of the paths end-to-end, while allowing each administrative entity to set and enforce its own policies. ECSEL can be easily enhanced to enable users to verify end-to-end performance, and dynamically select application-specific protocol forwarding rules in each domain.
The OpenFlow capabilities, whether it be in an enterprise/campus or within the data center, were demonstrated with the help of NEC’s ProgrammableFlow Switch (PFS) and ProgrammableFlow Controller (PFC). We leveraged a special interface developed by them to program a virtual path from ingress to egress of the OpenFlow domain. ECSEL accessed this special interface programmatically when executing the end-to-end path stitching workflow.
Our anticipated next step is to develop ECSEL as an end-to-end service by making it an integral part of a scientific workflow. The ECSEL software will essentially act as an abstraction layer, where the host (or virtual machine) doesn’t need to know how it is connected to the network–the software layer does all the work for it, mapping out the optimum topologies to direct data flow and make the magic happen. To implement this, ECSEL is leveraging the modular architecture and code of the new release of OSCARS 0.6. Developing this demonstration yielded sufficient proof that well-architected and modular software with simple APIs, like OSCARS 0.6, can speed up the development of new network services, which in turn validates the value-proposition of SDN. But we are not the only ones who think that ECSEL virtual circuits show promise as a platform for spurring further innovation. Vendors such as Brocade and Juniper, as well as other network providers attending the demo were enthusiastic about the potential of ECSEL.
But we are just getting started. We will reprise the ECSEL demo at SC11 in Seattle, this time with a GridFTP application using Remote Direct Memory Access (RDMA) which has been modified to include the XSP (eXtensible Session Protocol) that acts as a signaling mechanism enabling the application to become “network aware.” XSP, conceived and developed by Martin Swany and Ezra Kissel of Indiana University and University of Delaware, can directly interact with advanced network services like OSCARS – making the creation of virtual circuits transparent to the end user. In addition, once the application is network aware, it can then make more efficient use of scalable transport mechanisms like RDMA for very large data transfers over high capacity connections.
We look forward to seeing you there and exchanging ideas. Until Seattle, any questions or proposals on working together on this or other solutions to the “Big Data Problem,” don’t hesitate to contact me.
Eric Pouyoul, Vertika Singh (summer intern), Brian Tierney: ESnet
Last month was the first in which the ESnet network crossed a major threshold – over 10 petabytes of traffic! Traffic volume was 40% higher than the prior month and 10 times higher than just a little over 4 years ago. But what’s behind this dramatic increase in network utilization? Could it be the extreme loads ESnet circuits carried for SC10, we wondered?
Breaking down the ESnet traffic highlighted a few things. Turns out it wasn’t all that demonstration traffic sent across thousands of miles to the Supercomputing Conference in New Orleans (151.99 TB delivered), since that accounted for only slightly more than 1% of November’s ESnet-borne traffic. We observed for the first time significant volumes of genomics data traversing the network as the Joint Genome Institute sent over 1 petabyte of data to NERSC. JGI alone accounted for about 10% of last month’s traffic volume. And as we’ve seen since it went live in March, the Large Hadron Collider continues to churn out massive datasets as it increases its luminosity, which ESnet delivers to researchers across the US.
Summary of Total ESnet Traffic, Nov. 2010
Total Bytes Delivered: 10.748 PB
Total Bytes OSCARS Delivered: 5.870 PB
Pecentage of OSCARS Delivered: 54.72%
What is is really going on is quite prosaic, but to us, exciting. We can follow the progress of distributed scientific projects such as the LHC by tracking the proliferation of our network traffic, as the month-to-month traffic volume on ESnet correlates to the day-to-day conduct of science. Currently, Fermi and Brookhaven LHC data continue to dominate the volume of network traffic, but as we see, production and sharing of large data sets by the genomics community is picking up steam. What the stats are predicting: as science continues to become more data-intensive, the role of the network will become ever more important.
This award represents teamwork on several fronts. For example, earlier this year, ESnet’s engineering chops were tested when the Joint Genome Institute, one of Magellan’s first users, urgently needed increased computing resources at short notice.
Within a nailbiting span of several hours, technical staff at both centers collaborated with ESnet engineers to establish a dedicated 9 Gbps virtual circuit between JGI and NERSC’s Magellan system over ESnet’s Science Data Network (SDN). Using the ESnet-developed On-Demand Secure Circuits and Advance Reservation System (OSCARS), the virtual circuit was set up within an hour after the last details were finalized.
NERSC raided its closet spares for enough networking components to construct a JGI@NERSC local area network and migrated a block of Magellan cores over to JGI control. This allowed NERSC and JGI staff to spend the next 24 hours configuring hundreds of processor cores on the Magellan system to mimic the computing environment of JGI’s local compute clusters.
With computing resources becoming more distributed, complex networking challenges will occur more frequently. We are constantly solving high-stakes networking problems in our job connecting DOE scientists with their data. But thanks to OSCARS, we now have the ability to expand virtual networks on demand. And OSCARS is just getting better as more people in the community refine its capabilities.
The folks at JGI claim they didn’t feel a thing. They were able to continue workflow and no data was lost in the transition.
Which makes us very encouraged about the prospects for Magellan, and cloud computing in general. Everybody is hoping that putting data out there in the cloud will expand capacity. At ESnet, we just want to make the ride as seamless and secure as possible.
Kudos to Magellan. We’re glad to back you up, whatever the weather.
It is midafternoon Wednesday at SC10 and the demos are going strong. Jon Dugan supplied an automatically updating graph in psychedelic colors http://bit.ly/9HUrqL of the traffic ESnet is able to carry with all the circuits we set up. Getting this far required many hours of work from a lot of ESnet folk to accommodate the virtual circuit needs of both ESnet sites and SCinet customers using the OSCARS IDC software. As always, the SCinet team has put in long hours in a volatile environment to deliver a high performance network that meets the needs of the exhibitors.
At 1 p.m. at Berkeley Lab booth 2448, catch ESnet’s Inder Monga’s round-table discussion on OSCARS virtual circuits. OSCARS, the acronym for On- demand Secure Circuits and Advance Reservation System, allows users to reserve guaranteed bandwidth. Many of the demos at SC10 are being carried by OSCARS virtual circuits which were developed by ESnet with DOE support. Good things to come: ESnet anticipates the rollout of OSCARS 0.6 in early 2011. Version 0.6 will offer greatly expanded capabilities and versatility, such as a modular architecture enabling easy plug and play of the various functional modules and a flexible path computation engine (PCE) workflow architecture.
Then, stick around, because next at 2 p.m. Brian Tierney from ESnet will lead a roundtable on the research being produced from the ARRA-funded Advanced Networking Initiative (ANI) testbed.
In 2009, the DOE Office of Science awarded ESnet $62 million in recovery funds to establish ANI, a next generation 100Gbps network connecting DOE’s largest unclassified supercomputers, as well as a reconfigurable network testbed for researchers to test new networking concepts and protocols.
Brian will discuss progress on the 100Gbps network, update you on the several research projects already underway on the testbed, discuss testbed capabilities and how to get access to the testbed. He will also answer your questions on how to submit proposals for the next round of testbed network research.
In the meantime, some celeb-spotting at the LBNL booth at SC10.
It’s Wednesday at 10 a.m. in the SCSD booth, and Rick Wagner is testing simulations of cosmic matter and gases streamed in from Argonne National Lab. Wagner about to run a real time volume-rendering application at Argonne. The application renders data in real time, which will stream the results across a wide area (from Argonne to New Orleans) and display it on the tiled screen in the SDSC booth. To do so, SDSC is using OSCARS, ESnet’s on-demand reservation software to schedule data movement on demand.
Aside from the sheer technical feat of rendering data in real time and streaming massive amounts of it across long distances, on-demand data scheduling enables scientists to be more versatile–easily working with the data as needed. For Wagner and his collaborators, improvements in data streaming are all about new capabilities. “We’ve never had this functionality,” said Wagner. “We want to be able to compare the data sets side by side.”
Wagner will next add in variables such as radiation, to the images depicting gasses and matter from the early moments of the universe. This kind of demo illustrates what ESnet is all about. It is our mission to link scientists to collaborators and their data. But we are always striving for improvements in functionality, so that our end users will be more effective in their research.