Looking back at ESnet’s 2020

Advancing our strategy and shaping our position on the board.
Some thoughts from Inder on the year-that-was.

Miniature from Alfonso X’s Libro del axedrez dados et tablas (Book of chess, dice, and tables), c. 1283. Public domain, via Wikimedia Commons.

Dear Friends, Well-wishers, Colleagues, and all of ESnet,

Chess! 2020 has been much more challenging than this game. It’s also been a year in which we communicated through the squares on our Zoom screens, filled with the faces of our colleagues, collaborators, and loved ones.

In January, Research and Education leaders came together in Hawaii at the Pacific Telecommunications Council meeting to discuss the future of networking across the oceans. It was impossible to imagine then that we would not be able to see each other again for such a long time. Yet thanks to those undersea cables, we have been able to communicate seamlessly across the globe.

Looking back at 2020, we not only established a solid midgame position on our ESnet chessboard, but also secured winning positions despite profound challenges. The ESnet team successfully moved our network operations to be fully remote (and 24/7) and accomplished several strategic priorities.

ESnet played some really interesting gambits this year: 

  1. Tackled COVID-related network growth and teleworking issues for the DOE complex
    • We saw a 4x spike in remote traffic and worked closely with several Labs to upgrade their connectivity. We continue to address the ever-growing demand in a timely manner.

    • As we all shifted to telework from home, ESnet engineers developed an impromptu guide that proved valuable for troubleshooting home connectivity issues.
  2. Made great progress on implementing our next-generation network, ESnet6
    • We deployed and transitioned to the ESnet6 optical backbone network, with 300 new site installations and hundreds of 100G waves provisioned in just six months, all while following pandemic safety constraints. I am grateful to our partners Infinera (Carahsoft) and Lumen for working with our engineers to make this happen. Check out below how we decommissioned the ESnet5 optical network and lit up the ESnet6 network.
    • We installed a brand-new management network and made security infrastructure upgrades, along with significant performance improvements.
    • We awarded the new ESnet6 router RFP (Congratulations Nokia and IMPRES!); the installs start soon.
    • We issued another RFP for optical transponders and will announce the winner shortly.
  3. Took initiative on several science collaborations to address current and future networking needs
    • We brainstormed new approaches with the Rubin Observatory project team, Amlight, and DOE and NSF program managers to meet the performance and security goals for traffic originating in Chile. This traffic crosses several countries in South America before reaching the continental U.S. in Florida (Amlight) and, eventually, the U.S. Data Facility at SLAC via ESnet.
    • Drew insights from deep engagement between ESnet engineers and High Energy Physics program physicists on how best to serve the data needs of their current and planned experiments.
      Due to the pandemic, a two-day immersive in-person meeting turned into a multi-week series of Zoom meetings, breakouts, and discussions.
    • When an instrument produces tons of data, how do you build the data pipeline reliably? ESnet engineers took on this challenge, and worked closely with the GRETA team to define and develop the networking architecture and data movement design for this instrument. This contributed to a successful CD 2/3 review of the project—a challenging enough milestone during normal times, and particularly tough when done remotely. 
    • Exciting opening positions were created with EMSL, FRIB, DUNE/SURF, LCLS-II…these games are still in progress, and more will be shared soon.
  4. Innovated to build a strong technology portfolio with a series of inspired moves
    • AI/ML
      • We demonstrated Netpredict, a tool using deep learning models and real-time traffic statistics to predict when and where the network will be congested. Mariam’s web page showcases some of the other exciting investigations in progress. 
      • Richard and his collaborators published Real-time flow classification by applying AI/ML to detailed network telemetry.
    • High-touch ESnet6 project
      • Ever dream of having the ability to look at every packet, a “packetscope”, at your fingertips? An ability to create new ways to troubleshoot, performance engineer, and gain application insights? We demonstrated a working prototype of that vision at the SC20 XNET workshop.
    • SENSE
      • We deployed a beta version of software that gives science applications the ability to orchestrate large data flows across administrative domains securely. What started as a small research project five years ago (thanks, ASCR!) is now part of the AutoGOLE initiative, in addition to being used by ExaFEL, an Exascale Computing Project (ECP) project.
    • TCP
      • We initiated the Q-Factor project this year, an NSF-funded research collaboration with Amlight. The project will enable ultra-high-speed data transfer optimization through TCP parameter tuning informed by programmable dataplane telemetry: https://q-factor.io/
      • We thoroughly tested, on our testbed, the interactions between the TCP congestion control algorithms BBRv2 and CUBIC (a minimal sketch of this kind of comparison appears after this list). A detailed conversation with Google, the authors of the BBRv2 implementation, is in progress.
  5. Initiated strategic new games, with a high potential for impact
    • FABRIC/FAB
      • Executed on the vision and design of a nationwide at-scale research testbed, working alongside a superstar multi-university team.
      • With the new FAB grant, FABRIC went international with plans to put nodes in Bristol, Amsterdam, Tokyo and Geneva. More locations and partners are possibilities for the future.  
    • Edge Computing
      • Created a prototype FPGA-based edge-computing platform for data-intensive science instruments in collaboration with the Computational Research Division and Xilinx. Look for exciting news on the blog as we complete the prototype deployment of this platform.
    • Quantum
    • 5G
      • What are the benefits of widespread deployment of 5G technology on science research? We contributed to the development of this important vision at a DOE workshop. New and exciting pilots are emerging that will change the game on how science is conducted. Stay tuned. 
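
As a concrete illustration of the congestion-control testing mentioned in item 4, here is a minimal sketch of the kind of throughput comparison involved. It is not our test harness: it assumes a Linux sender where both the cubic and bbr2 congestion-control modules are available (BBRv2 is currently distributed via Google’s kernel branch rather than mainline) and an iperf3 server already running on a placeholder host.

```python
"""
Minimal sketch: compare goodput of two TCP congestion control algorithms
(CUBIC vs. BBRv2) using iperf3. Assumes a Linux sender where both algorithms
are available as kernel modules and an iperf3 server is already listening on
RECEIVER. Illustrative only; not ESnet's actual test harness.
"""
import json
import subprocess

RECEIVER = "dtn-receiver.example.net"   # placeholder hostname
DURATION_S = 30
ALGORITHMS = ["cubic", "bbr2"]          # "bbr2" requires a kernel that ships BBRv2

def run_iperf3(algo: str) -> float:
    """Run one iperf3 test with the given congestion control; return Gb/s sent."""
    result = subprocess.run(
        ["iperf3", "-c", RECEIVER, "-t", str(DURATION_S), "-C", algo, "-J"],
        capture_output=True, text=True, check=True,
    )
    report = json.loads(result.stdout)
    bps = report["end"]["sum_sent"]["bits_per_second"]
    return bps / 1e9

if __name__ == "__main__":
    for algo in ALGORITHMS:
        gbps = run_iperf3(algo)
        print(f"{algo:6s}: {gbps:6.2f} Gbps over {DURATION_S} s")
```

Studying how BBRv2 and CUBIC interact goes further than this, of course: it means running flows of both types concurrently through a shared bottleneck and examining per-flow fairness, which is where the testbed work above digs much deeper.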

Growth certainly has its challenges. But as we grew, we evolved from our old game into an adept new playing style. I am thankful for the trust that all of you placed in ESnet leadership; it was vital to our numerous, parallel successes. Our 2020 reminds me of the scene in The Queen’s Gambit where the young Beth Harmon played all the members of a high-school chess team at the same time.

Several achievements could not make it into this blog but remain important pieces on the ESnet chessboard. They required immense support from all parts of ESnet, as well as from CS Area staff and our Lab Procurement, Finance, HR, IT, Facilities, and Communications partners.

I am especially grateful to the DOE Office of Science, Advanced Scientific Computing Research leadership, NSF, and our program manager Ben Brown, whose unwavering support has enabled us to adapt and execute swiftly despite blockades. 

All this has only been possible due to the creativity, resolve, and resilience of ESnet staff; I am truly proud of each one of you. I am appreciative of the new hires who entrusted their careers to us and joined us remotely, without shaking hands or even setting foot in the lab.

My wish is for all to stay safe this holiday season, celebrate your successes, and enjoy that extra time with your immediate family. In 2021, I look forward to killer moves on the ESnet chessboard, while humanity checkmates the virus. 

Signing off for the year, 

Inder Monga

Charting a resilient path for the future

The ESnet Site Coordinating Committee (ESCC) meeting on 22-23 October was attended by over 50 members representing all of the major DOE sites and projects supported by our team. This was the first ESCC meeting held via Zoom.

The meeting focused on network resiliency, covering both lessons learned from adapting to working from home and longer-term plans for ESnet6.

Highlights of the sessions included ESnet Director Inder Monga’s presentation on the ways we ensured operational continuity during the pandemic.

Inder Monga presents on ESnet’s support for the DOE complex during the pandemic

The DOE ESnet program manager, Ben Brown, provided a vision for future research opportunities as well as future operational needs for the nation’s scientific complex.

DOE’s Ben Brown presents on future objectives and priorities

Attendees identified several key activities for ESCC collaboration as part of advancing shared resilience goals. Foremost among these is the creation of a working group to develop improved metrics for ESnet resilience, to identify ways that resilience features can be better incorporated into infrastructure funding and planning, and to establish better ways to engage scientific programs in risk management processes.

ESnet thanks the ESCC participants for attending, and we look forward to returning to in-person ESCC meetings in the future!

A sample of attendee Zoom screenshots – the first ESCC held via Zoom

How the World’s Fastest Science Network Was Built

Created in 1986, the U.S. Department of Energy’s (DOE’s) Energy Sciences Network (ESnet) is a high-performance network built to support unclassified science research. ESnet connects more than 40 DOE research sites—including the entire National Laboratory system, supercomputing facilities and major scientific instruments—as well as hundreds of other science networks around the world and the Internet.

Funded by DOE’s Office of Science and managed by the Lawrence Berkeley National Laboratory (Berkeley Lab), ESnet moves about 51 petabytes of scientific data every month. Here is a 13-step guide to how ESnet has evolved over 30 years.

Step 1: When fusion energy scientists inherit a cast-off supercomputer, add 4 dialup modems so the people at the Princeton lab can log in. (1975)


Step 2: When landlines prove too unreliable, upgrade to satellites! Data screams through space. (1981)


Step 3: Whose network is best? High Energy Physics (HEPnet)? Fusion Physics (MFEnet)? Why argue? Merge them into one, the Energy Sciences Network (ESnet), run by the Department of Energy! Go ESnet! (1986)


Step 4: Make it even faster with DUAL Satellite links! We’re talking 56 kilobits per second! Except for the Princeton fusion scientists – they get 112 Kbps! (1987)


Step 5:  Whoa, when an upgrade to 1.5 MEGAbits per second isn’t enough, add ATM (not the money machine, but Asynchronous Transfer Mode) to get more bang for your buck. (1995)


Step 6: Duty now for the future—roll out the very first IPv6 address to ensure there will be enough Internet addresses for decades to come. (2000)


Step 7: Crank up the fastest links in the network to 10 GIGAbits per second—16 times faster than the old gear—a two-generation leap in network upgrades at one time. (2003)


Step 8: Work with other networks to develop really cool tools, like the perfSONAR toolkit for measuring and improving end-to-end network performance and OSCARS (On-Demand Secure Circuits and Advance Reservation System), so you can reserve a high-speed, end-to-end connection to make sure your data is delivered on time. (2006)


Step 9: Why just rent fiber? Pick up your own dark fiber network at a bargain price for future expansion. In the meantime, boost your bandwidth to 100G for everyone. (2012)


Step 10: Here’s a cool idea, come up with a new network design so that scientists moving REALLY BIG DATASETS can safely avoid institutional firewalls, call it the Science DMZ, and get research moving faster at universities around the country. (2012)


Step 11: We’re all in this science thing together, so let’s build faster ties to Europe. ESnet adds three 100G lines (and a backup 40G link) to connect researchers in the U.S. and Europe. (2014)


Step 12: 100G is fast, but it’s time to get ready for 400G. To pave the way, ESnet installs a production 400G network between facilities in Berkeley and Oakland, Calif., and even provides a 400G testbed so network engineers can get up to speed on the technology. (2015)


Step 13: Celebrate 30 years as a research and education network leader, but keep looking forward to the next level. (2016)


Attending SC15? Get a Close-up Look at Virtualized Science DMZs as a Service

ESnet, NERSC, and RENCI are pooling their expertise to demonstrate “Virtualized Science DMZs as a Service” at the SC15 conference being held Nov. 15-20 in Austin. They will be giving the demos at 2:30-3:30 p.m. Tuesday and Wednesday and 1:30-2:30 p.m. Thursday in RENCI booth #181.

Here’s the background: Many campuses are installing ScienceDMZs to support efficient large-scale scientific data transfers, and there is a need to create custom ScienceDMZ configurations for different groups on campus. Network function virtualization (NFV), combined with compute and storage virtualization, enables a multi-tenant approach to deploying virtual ScienceDMZs. It makes it possible for campus IT or NREN organizations to quickly deploy well-tuned ScienceDMZ instances targeted at a particular collaboration or project. This demo shows a prototype implementation of ScienceDMZ-as-a-Service using ExoGENI racks (ExoGENI is part of the NSF GENI federation of testbeds) deployed at the StarLight facility in Chicago and at NERSC.

The virtual ScienceDMZs deployed on demand in these racks use the SPOT software suite developed at Lawrence Berkeley National Laboratory to connect a data source at Argonne National Laboratory with a compute cluster at NERSC, providing seamless, high-speed end-to-end transfers of data acquired from Argonne’s Advanced Photon Source (APS) for processing at NERSC. The ExoGENI racks dynamically instantiate the compute resources needed for ScienceDMZ functions and connect to each other on demand using ESnet’s OSCARS and Internet2’s AL2S systems.
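
To make that workflow easier to picture, here is a purely illustrative sketch of the ScienceDMZ-as-a-Service provisioning steps: instantiate a tenant’s virtual ScienceDMZ on a rack, then reserve a circuit between sites for the transfer. Every class, function, endpoint, and identifier below is a hypothetical placeholder; the actual demo drives ExoGENI, SPOT, OSCARS, and AL2S, whose real APIs are not shown here.

```python
"""
Illustrative sketch (not the demo code) of the ScienceDMZ-as-a-Service workflow
described above. All names here are hypothetical placeholders.
"""
from dataclasses import dataclass

@dataclass
class CircuitRequest:
    src_endpoint: str      # e.g., ExoGENI rack at StarLight
    dst_endpoint: str      # e.g., ExoGENI rack at NERSC
    bandwidth_mbps: int
    duration_hours: int

class VirtualScienceDMZ:
    """Hypothetical wrapper around one tenant's virtual ScienceDMZ instance."""
    def __init__(self, rack: str, tenant: str):
        self.rack = rack
        self.tenant = tenant
        self.dtns = []

    def add_dtn(self, name: str, tuning_profile: str = "wan-transfer"):
        # A real deployment would boot a VM here and apply host/network tuning.
        self.dtns.append({"name": name, "profile": tuning_profile})
        return name

def reserve_circuit(req: CircuitRequest) -> str:
    # Stand-in for a call to a circuit-reservation service (OSCARS/AL2S in the demo).
    print(f"Requesting {req.bandwidth_mbps} Mb/s "
          f"{req.src_endpoint} <-> {req.dst_endpoint} for {req.duration_hours} h")
    return "circuit-demo-001"   # hypothetical reservation ID

if __name__ == "__main__":
    # One tenant: the APS -> NERSC workflow from the demo.
    dmz = VirtualScienceDMZ(rack="starlight-rack", tenant="aps-nersc-demo")
    dmz.add_dtn("dtn-starlight-01")
    circuit_id = reserve_circuit(CircuitRequest(
        src_endpoint="starlight-rack",
        dst_endpoint="nersc-rack",
        bandwidth_mbps=10000,
        duration_hours=8,
    ))
    print(f"Virtual ScienceDMZ for {dmz.tenant} ready; data can flow over {circuit_id}")
```

The point of the multi-tenant design is exactly this separation: each collaboration gets its own well-tuned virtual ScienceDMZ instance, while the underlying racks and wide-area circuits are provisioned and released dynamically.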

NRL and Collaborators Conduct 100 Gigabit/Second Remote I/O Demonstration

The Naval Research Laboratory (NRL), in collaboration with the DOE’s Energy Sciences Network (ESnet), the International Center for Advanced Internet Research (iCAIR) at Northwestern University, the Center for Data Intensive Science (CDIS) at the University of Chicago, the Open Cloud Consortium (OCC), and significant industry support, has conducted a 100 gigabits per second (100G) remote I/O demonstration at the SC14 supercomputing conference in New Orleans, LA.

The remote I/O demonstration illustrates a pipelined distributed processing framework and software defined networking (SDN) between distant operating locations. The demonstration shows the capability to dynamically deploy a production quality 4K Ultra-High Definition Television (UHDTV) video workflow across a nationally distributed set of storage and computing resources that is relevant to emerging Department of Defense data processing challenges.

Visit the My ESnet Portal at https://my.es.net/demos/sc14#/nrl to view real-time network traffic on ESnet.

Read more: http://www.nrl.navy.mil/media/news-releases/2014/nrl-and-collaborators-conduct-100-gigabit-second-remote-io-demonstration#sthash.35f9S8Wy.dpu

ESnet gives Cisco Nerd Lunch talk, learns televangelism is harder than it seems

As science transitions from a lab-oriented activity to a distributed, computational, and data-intensive one, the research and education (R&E) networking community is tracking the growing data needs of scientists. Huge instruments like the Large Hadron Collider are being planned and built. These projects require global-scale collaborations and contributions from thousands of scientists, and as the data deluge from the instruments grows, even more scientists are interested in analyzing it for the next breakthrough discovery. Suffice it to say that even though worldwide video consumption on the Internet is driving a similar increase in commercial bandwidth, the scale, characteristics, and requirements of scientific data traffic are quite different.

And this is why ESnet got invited to Cisco Systems’ headquarters last week to talk about how we handle data, as part of their regular Nerd Lunch talk series. What I found interesting, although not surprising, was that with Cisco being a big evangelist of telepresence, more employees attended the talk from their desks than in person. This was a first for me, and I came away with a new appreciation for the challenges of collaborating across distances.

From a speaker’s perspective, the lesson I learned was to brush up on my acting skills. My usual preparation is to rehearse the difficult transitions and focus on remembering the few important points to make on every slide. When presenting, the slide-presentation portion of my brain goes on autopilot while my focus turns toward evaluating the impact on the audience. When speaking at a podium, one can observe when someone in the audience opens a notebook to jot down a thought, when their attention drifts to email on the laptop in front of them, or when a puzzled look appears on someone’s face as they try to figure out the impact of the point I’m trying to make. But these visual cues go missing with a largely webcast audience, making it harder to know when to stop driving home a point or when to explain it further. In the future, I’ll have to be better at keeping the talk interesting without the usual cues from my audience.

Maybe the next innovation in virtual-reality telepresence is just waiting to happen?

Notwithstanding the challenges of presenting to a remote audience, enabling remote collaboration is extremely important to ESnet. Audio, video, and web collaboration is a key service we offer to the DOE labs. ESnet employees use video extensively in our day-to-day operations. The “ESnet watercooler”, a 24×7 open video bridge, is used internally by our distributed workforce to discuss technical issues as well as to hold ad hoc meetings on topics of interest. As science goes increasingly global, scientists are also using this important ESnet service for their collaborations.

With my brief stint in front of a stage now over, it is back to ESnet and then on to the 100G invited panel/talk at the IEEE ANTS conference in Mumbai. Wishing all of you a very Happy New Year!

Inder Monga

Why this spiking network traffic?

ESnet November 2010 Traffic

Last month was the first in which the ESnet network crossed a major threshold – over 10 petabytes of traffic! Traffic volume was 40% higher than the prior month and 10 times higher than just a little over 4 years ago. But what’s behind this dramatic increase in network utilization?  Could it be the extreme loads ESnet circuits carried for SC10, we wondered?

Breaking down the ESnet traffic highlighted a few things. It turns out it wasn’t all that demonstration traffic sent across thousands of miles to the Supercomputing Conference in New Orleans (151.99 TB delivered), since that accounted for only slightly more than 1% of November’s ESnet-borne traffic. We observed for the first time significant volumes of genomics data traversing the network, as the Joint Genome Institute sent over 1 petabyte of data to NERSC; JGI alone accounted for about 10% of last month’s traffic volume. And as we’ve seen since it went live in March, the Large Hadron Collider continues to churn out massive datasets as it increases its luminosity, and ESnet delivers them to researchers across the US.

Summary of Total ESnet Traffic, Nov. 2010

Total Bytes Delivered: 10.748 PB
Total Bytes OSCARS Delivered: 5.870 PB
Percentage of OSCARS Delivered: 54.72%

What is really going on is quite prosaic but, to us, exciting. We can follow the progress of distributed scientific projects such as the LHC by tracking the growth of our network traffic, as the month-to-month traffic volume on ESnet correlates to the day-to-day conduct of science. Currently, Fermi and Brookhaven LHC data continue to dominate the volume of network traffic, but as we see, production and sharing of large data sets by the genomics community is picking up steam. What the stats are predicting: as science continues to become more data-intensive, the role of the network will become ever more important.
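
For readers who like to check the arithmetic, here is a minimal sketch that recomputes the shares quoted above from the rounded figures in this post (the published 54.72% OSCARS share was presumably derived from unrounded byte counts, so the ratio of the rounded numbers comes out slightly lower).

```python
# Recompute the traffic shares quoted above from the rounded figures in this post.
total_pb = 10.748    # total ESnet traffic delivered in Nov. 2010 (petabytes)
oscars_pb = 5.870    # portion delivered over OSCARS circuits
sc10_tb = 151.99     # SC10 demonstration traffic (terabytes)
jgi_pb = 1.0         # approximate JGI -> NERSC genomics transfer

print(f"SC10 demos:   {sc10_tb / (total_pb * 1000):.1%}")  # ~1.4%  ("slightly more than 1%")
print(f"JGI genomics: {jgi_pb / total_pb:.1%}")             # ~9.3%  ("about 10%")
print(f"OSCARS share: {oscars_pb / total_pb:.1%}")          # ~54.6% (vs. the published 54.72%)
```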


A few grace notes to SC10

As SC10 wound down, ESnet started disassembling the network of connections that brought experimental data from the rest of the country to New Orleans (and at least a bit of the universe as well). We detected harbingers of 100Gbps in all sorts of places. We will be sharing our observations on promising and significant networking technologies with you in blogs to come.

We were impressed by the brilliant young people we saw at the SC Student Cluster Competition, organized collaboratively as part of SC Communities, which brings together programs designed to support emerging leaders and groups that have traditionally been under-represented in computing. Teams came from U.S. universities, including Purdue, Florida A&M, SUNY Stony Brook, and the University of Texas at Austin, as well as universities from China and Russia.

Florida A&M team

Nizhni Novgorod State University team

At ESnet, we are always looking for bright, committed students interested in networking internships (paid!). We are also still hiring.

 

As SC10 concluded, the computer scientists and network engineers on the streets of the city dispersed, replaced by a conference of anthropologists. SC11 is scheduled for Seattle. But before we go, a note of appreciation to New Orleans.

Katrina memorial

Across from the convention center is a memorial to the people lost to Katrina: a sculpture of a wrecked house pinioned in a tree. But if you walk down the street to the corner of Bourbon and Canal, each night you will hear the trumpets of the ToBeContinued Brass Band. The band is a group of friends who met in their high school marching bands and played together for years until they were scattered by Katrina. Like the city, they are regrouping, and they are profiled in a new documentary.

Our mission at ESnet is to help scientists to collaborate and share research. But a number of ESnet people are also musicians and music lovers, and we draw personal inspiration from the energy, technical virtuosity and creativity of artists as well as other engineers and scientists. We are not alone in this.

New Orleans is a great American city, and we wish it well.

100G: it may be voodoo, but it certainly works

SC10, Thursday morning.

During the SC10 conference, NASA, NOAA, ESnet, the Dutch Research Consortium, US LHCNet, and CANARIE announced that they would transmit 100 Gbps of scientific data between Chicago and New Orleans. Through the use of 14 10GigE interconnects, researchers attempted to completely utilize the full 100 Gbps of bandwidth by producing up to twelve 8.5-to-10 Gbps individual data flows.

Brian Tierney reports: “We are very excited that a team from NASA Goddard completely filled the 100G connection from the show floor to Chicago. It is certainly the first time at the supercomputing conference that a single wavelength over the WAN has achieved 100Gbps. The other thing that is so exciting about it is that they used a single sending host to do it.”

“Was this just voodoo?” asked NERSC’s Brent Draney.

Tierney assures us that indeed it must have been… but whatever they did, it certainly works.

The circuits behind all those SC10 demos

It is midafternoon Wednesday at SC10 and the demos are going strong. Jon Dugan supplied an automatically updating graph in psychedelic colors (http://bit.ly/9HUrqL) of the traffic ESnet is able to carry over all the circuits we set up. Getting this far required many hours of work from a lot of ESnet folks to accommodate the virtual circuit needs of both ESnet sites and SCinet customers using the OSCARS IDC software. As always, the SCinet team has put in long hours in a volatile environment to deliver a high-performance network that meets the needs of the exhibitors.