Why this spiking network traffic?

ESnet November 2010 Traffic

Last month was the first in which the ESnet network crossed a major threshold – over 10 petabytes of traffic! Traffic volume was 40% higher than the prior month and 10 times higher than just a little over 4 years ago. But what’s behind this dramatic increase in network utilization?  Could it be the extreme loads ESnet circuits carried for SC10, we wondered?

Breaking down the ESnet traffic highlighted a few things.  Turns out it wasn’t all that demonstration traffic sent across thousands of miles to the Supercomputing Conference in New Orleans (151.99 TB delivered), since that accounted for only slightly more than 1% of November’s ESnet-borne traffic.  We observed for the first time significant volumes of genomics data traversing the network as the Joint Genome Institute sent over 1 petabyte of data to NERSC. JGI alone accounted for about 10% of last month’s traffic volume. And as we’ve seen since it went live in March, the Large Hadron Collider continues to churn out massive datasets as it increases its luminosity, which ESnet delivers to researchers across the US.

Summary of Total ESnet Traffic, Nov. 2010

Total Bytes Delivered: 10.748 PB
Total Bytes OSCARS Delivered: 5.870 PB
Pecentage of OSCARS Delivered: 54.72%

What is is really going on is quite prosaic, but to us, exciting. We can follow the progress of distributed scientific projects such as the LHC  by tracking the proliferation of our network traffic, as the month-to-month traffic volume on ESnet correlates to the day-to-day conduct of science. Currently, Fermi and Brookhaven LHC data continue to dominate the volume of network traffic, but as we see, production and sharing of large data sets by the genomics community is picking up steam. What the stats are predicting: as science continues to become more data-intensive, the role of the network will become ever more important.

Cheers for Magellan

We were glad to see DOE’s Magellan project getting some well-deserved recognition by the HPCwire Readers’ and Editors’ Choice Award at SC10 in New Orleans. Magellan investigates how cloud computing can help DOE researchers to manage the massive (and increasing) amount of data they generate in scientific collaborations. Magellan is a joint research project at NERSC at Berkeley Lab in California and Argonne Leadership Computing Facility in Illinois.

This award represents teamwork on several fronts. For example, earlier this year, ESnet’s engineering chops were tested when the Joint Genome Institute, one of Magellan’s first users, urgently needed increased computing resources at short notice.

Within a nailbiting span of several hours, technical staff at both centers collaborated with ESnet engineers to establish a dedicated 9 Gbps virtual circuit between JGI and NERSC’s Magellan system over ESnet’s Science Data Network (SDN). Using the ESnet-developed On-Demand Secure Circuits and Advance Reservation System (OSCARS), the virtual circuit was set up within an hour after the last details were finalized.

NERSC raided its closet spares for enough networking components to construct a JGI@NERSC local area network and migrated a block of Magellan cores over to JGI control.  This allowed NERSC and JGI staff to spend the next 24 hours configuring hundreds of processor cores on the Magellan system to mimic the computing environment of JGI’s local compute clusters.

With computing resources becoming more distributed, complex networking challenges will occur more frequently. We are constantly solving high-stakes networking problems in our job connecting DOE scientists with their data. But thanks to OSCARS, we now have the ability to expand virtual networks on demand. And OSCARS is just getting better as more people in the community refine its capabilities.

The folks at JGI claim they didn’t feel a thing. They were able to continue workflow and no data was lost in the transition.

Which makes us very encouraged about the prospects for Magellan, and cloud computing in general. Everybody is hoping that putting data out there in the cloud will expand capacity.  At ESnet, we just want to make the ride as seamless and secure as possible.

Kudos to Magellan. We’re glad to back you up, whatever the weather.

A few reasons why ESnet matters to scientists.

Keith Jackson, ESnet Guest Blogger

Recently we’ve been testing the ability to move huge amounts of scientific data in and out of commercial cloud providers like Amazon and Google. We were doing this because if you want to do scientific computation in the cloud, you need to be able to move data in and out efficiently or it will never be useful for science.

Recently we’ve been working with engineers at Google to test the performance of their cloud storage solution. We were in the midst of transferring data between Berkeley Lab servers and the Google cloud when we noticed the data wasn’t moving as fast as it should.

We tried to figure out the root of the problem. The Google folks talked to their networking people and we talked to our engineers at ESnet.

We found there was a bottleneck in the path between Berkeley and Google on the ESnet side. One path was still only 1 gigabit and was scheduled to be upgraded to 10 gigabit in the next week or so. But it limited us to no more than a gigabit per second data transfers.

Using OSCARS, not only did we find the bottleneck, but as Vangelis talked about in a prior blogpost, we were able to find a way to reroute traffic to avoid the slow link, completely bypassing the problem. ESnet was not only able to help me diagnose the problem right away, but were able to suggest and quickly deploy a solution.

In thinking about that problem, a few things occurred to me. For a scientist just concerned with getting data through the network, it is probably easier to work with ESnet than a commercial provider for several reasons.

As a research network, ESnet is completely accessible. A commercial provider would have been completely opaque because of proprietary issues and have no incentive to grant access into its network for troubleshooting by outsiders. Since serving scientists is not its main mission, its sense of urgency would be different. Moreover, a commercial network’s interfaces are not designed for the particular needs of scientists.

But ESnet exists solely to support science, and scientists. Sometimes we need to be reminded that to scientists, quite literally, the “network matters.”

Direct Wormhole to Google Cloud

Earlier this week our network engineers were presented with an interesting problem: researchers from Lawrence Berkeley National Laboratory were moving data in and out of the Google Cloud service, but it looked like the transfers were “slow”, running at a mere 1 gigabit per second. Most people wouldn’t call that slow – but we know that we can do better!

After some investigation, it turned out that all these transfers were going through a bottleneck in the network: an outdated 1Gbps connection to a commercial internet exchange located in San Jose, CA, that hasn’t yet been upgraded to the usual 10Gbps.

To resolve this, we decided to do a bit of traffic engineering: create a network “wormhole” that would suck in data from LBNL, move it through the Science Data Network, and drop it off to a different internet exchange point thousands of miles away – in Chicago, IL.

This is a picky wormhole, by the way; it will only suck in data that needs to travel between the researchers’ computers and Google Cloud, leaving other data flows alone. And, as long as the data is traveling in the wormhole, other traffic can’t cause any congestion that would limit throughput. We call these virtual circuits, and the OSCARS software developed here at ESnet provides the ability for our engineers to easily create and manage them.

Keith Jackson, a scientist in Advanced Computing for Science and Computational Research division at LBNL, had this to say:

”It was really impressive that we were able, in a matter of hours to set up a circuit and route this traffic to the Google cloud to avoid this network bottleneck. From my perspective as a researcher, the process looked seamless. This allowed us to conduct tests that we couldn’t have done otherwise. “

ESnet has a lot of virtual circuits snaking through our network – about 30 at last count. This one, though, is special: it’s the first one that connects up one of ESnet’s sites with a commercial service such as Google Cloud.

Jackson and other researchers are examining how commercial networks can be used for data driven computation. They are exploring with Google how fast we will be able to move data and what infrastructure is necessary to do this—one virtual circuit at a time.

ESnet's latest method of selective data transmission