ECSEL leverages OpenFlow to demonstrate new network directions

ESnet and its collaborators successfully completed three days of demonstrating its End-to-End Circuit Service at Layer 2 (ECSEL) software at the Open Networking Summit held at Stanford a couple of weeks ago. Our goal is to build “zero-configuration circuits” to help science applications seamlessly use networks for optimized end-to-end data transport. ECSEL, developed in collaboration with NEC, Indiana University, and the University of Delaware, builds on some exciting new conceptual thinking in networking.

Wrangling Big Data 

To put ECSEL in context, the proliferating tide of scientific data flows – anticipated at 2 petabytes per second as planned large-scale experiments get in motion – is already challenging networks to be exponentially more efficient. Wide area networks have vastly increased bandwidth and enable flexible, distributed, scientific workflows that involve connecting multiple scientific labs to a supercomputing site, a university campus, or even a cloud data center.

Heavy network traffic to come

The increasing adoption of distributed, service-oriented computing means that resource and vendor independence for service delivery is a key priority for users. Users expect seamless end-to-end performance and want the ability to send data flows on demand, no matter how many domains and service providers are involved.  The hitch is that even though the Wide Area Network (WAN) can have turbocharged bandwidth, at these exponentially increasing rates of network traffic even a small blockage in the network can seriously impair the flow of data, trapping users in a situation resembling commute conditions on sluggish California freeways. These scientific data transport challenges that we and other R&E networks face are just a taste of what the commercial world will encounter with the increasing popularity of cloud computing and service-driven cloud storage.

Abstracting a solution

One key piece of feedback from application developers, scientists and end-users is that they want to accomplish their mission without having to deal with complexity at the infrastructure level. At ESnet, we are exploring various ways to make networks work better for users. A couple of concepts could be game-changers, according to Open Networking Summit presenter and Berkeley professor Scott Shenker: 1) using abstraction to manage network complexity, and 2) extracting and exposing simplicity out of the network. Shenker himself cites Barbara Liskov’s Turing Lecture as inspiration.

ECSEL is leveraging OSCARS and OpenFlow within the Software Defined Networking (SDN) paradigm to elegantly prevent end-to-end network traffic jams.  OpenFlow is an open standard to allow application-driven manipulation of network flows. ECSEL is using OSCARS-controlled MPLS virtual circuits with OpenFlow to dynamically stitch together a seamless data plane delivering services over multi-domain constructs.  ECSEL also provides an additional level of simplicity to the application, as it can discover host-network interconnection points as necessary, removing the requirement of applications being “statically configured” with their network end-point connections. It also enables stitching of the paths end-to-end, while allowing each administrative entity to set and enforce its own policies. ECSEL can be easily enhanced to enable users to verify end-to-end performance, and dynamically select application-specific protocol forwarding rules in each domain.
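To make the idea of stitching per-domain segments into one end-to-end path concrete, here is a minimal sketch. It is not ECSEL code: the `Segment` type, the `stitch` function, and the domain and interconnection-point names are all hypothetical, and the only rule it illustrates is that each domain's segment must begin where the previous one ended.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """One domain's piece of the path (hypothetical model, not an ECSEL type)."""
    domain: str   # administrative domain that provisions this segment
    ingress: str  # interconnection point where traffic enters the domain
    egress: str   # interconnection point where traffic leaves the domain


def stitch(segments: List[Segment]) -> List[str]:
    """Return the end-to-end list of interconnection points.

    Raises ValueError if two adjacent segments do not meet at the same point.
    """
    if not segments:
        raise ValueError("no segments to stitch")
    hops = [segments[0].ingress]
    for seg in segments:
        if seg.ingress != hops[-1]:
            raise ValueError(
                f"{seg.domain}: ingress {seg.ingress} does not meet {hops[-1]}"
            )
        hops.append(seg.egress)
    return hops


# Hypothetical three-domain path: OpenFlow campus, WAN circuit, OpenFlow campus.
path = stitch([
    Segment("campus-a", "host-a", "edge-a"),
    Segment("wan", "edge-a", "edge-b"),
    Segment("campus-b", "edge-b", "host-b"),
])
print(" -> ".join(path))
```

In this toy model each domain stays free to decide how it builds its own segment; the stitching step only checks that the hand-off points line up.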

The OpenFlow capabilities, whether in an enterprise/campus network or within the data center, were demonstrated with the help of NEC’s ProgrammableFlow Switch (PFS) and ProgrammableFlow Controller (PFC). We leveraged a special interface developed by NEC to program a virtual path from ingress to egress of the OpenFlow domain. ECSEL accessed this special interface programmatically when executing the end-to-end path stitching workflow.

Our anticipated next step is to develop ECSEL as an end-to-end service by making it an integral part of a scientific workflow. The ECSEL software will essentially act as an abstraction layer, where the host (or virtual machine) doesn’t need to know how it is connected to the network–the software layer does all the work for it, mapping out the optimum topologies to direct data flow and make the magic happen. To implement this, ECSEL is leveraging the modular architecture and code of the new release of OSCARS 0.6.  Developing this demonstration showed that well-architected, modular software with simple APIs, like OSCARS 0.6, can speed up the development of new network services, which in turn validates the value proposition of SDN. But we are not the only ones who think that ECSEL virtual circuits show promise as a platform for spurring further innovation. Vendors such as Brocade and Juniper, as well as other network providers attending the demo, were enthusiastic about the potential of ECSEL.

But we are just getting started. We will reprise the ECSEL demo at SC11 in Seattle, this time with a GridFTP application that uses Remote Direct Memory Access (RDMA) and has been modified to include XSP (eXtensible Session Protocol), which acts as a signaling mechanism enabling the application to become “network aware.” XSP, conceived and developed by Martin Swany and Ezra Kissel of Indiana University and the University of Delaware, can directly interact with advanced network services like OSCARS, making the creation of virtual circuits transparent to the end user. In addition, once the application is network aware, it can make more efficient use of scalable transport mechanisms like RDMA for very large data transfers over high-capacity connections.

We look forward to seeing you there and exchanging ideas. Until Seattle, if you have any questions or proposals on working together on this or other solutions to the “Big Data Problem,” don’t hesitate to contact me.

–Inder Monga

imonga@es.net

ECSEL Collaborators:

Eric Pouyoul, Vertika Singh (summer intern), Brian Tierney: ESnet

Samrat Ganguly, Munehiro Ikeda: NEC

Martin Swany, Ahmed Hassany: Indiana University

Ezra Kissel: University of Delaware

Why this spiking network traffic?

ESnet November 2010 Traffic

Last month was the first in which the ESnet network crossed a major threshold – over 10 petabytes of traffic! Traffic volume was 40% higher than the prior month and 10 times higher than just a little over 4 years ago. But what’s behind this dramatic increase in network utilization?  Could it be the extreme loads ESnet circuits carried for SC10, we wondered?

Breaking down the ESnet traffic highlighted a few things.  Turns out it wasn’t all that demonstration traffic sent across thousands of miles to the Supercomputing Conference in New Orleans (151.99 TB delivered), since that accounted for only slightly more than 1% of November’s ESnet-borne traffic.  We observed for the first time significant volumes of genomics data traversing the network as the Joint Genome Institute sent over 1 petabyte of data to NERSC. JGI alone accounted for about 10% of last month’s traffic volume. And as we’ve seen since it went live in March, the Large Hadron Collider continues to churn out massive datasets as it increases its luminosity, which ESnet delivers to researchers across the US.

Summary of Total ESnet Traffic, Nov. 2010

Total Bytes Delivered: 10.748 PB
Total Bytes OSCARS Delivered: 5.870 PB
Percentage of OSCARS Delivered: 54.72%
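As a quick sanity check, the shares quoted in this post can be recomputed from the rounded totals. This is a minimal sketch with illustrative names, and it assumes decimal units (1 PB = 1000 TB). Because the inputs are rounded to three decimals, the OSCARS share comes out at about 54.6% rather than exactly the 54.72% reported, which was presumably calculated from unrounded byte counts.

```python
# Rounded figures quoted in the post. Assumption: decimal units, 1 PB = 1000 TB.
TOTAL_PB = 10.748    # total bytes delivered in November 2010
OSCARS_PB = 5.870    # bytes delivered over OSCARS circuits
SC10_TB = 151.99     # demonstration traffic delivered to SC10


def share_percent(part_pb: float, total_pb: float) -> float:
    """Return part as a percentage of total."""
    return 100.0 * part_pb / total_pb


oscars_share = share_percent(OSCARS_PB, TOTAL_PB)
sc10_share = share_percent(SC10_TB / 1000.0, TOTAL_PB)

print(f"OSCARS share of traffic: {oscars_share:.2f}%")
print(f"SC10 demo share of traffic: {sc10_share:.2f}%")
```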

What is really going on is quite prosaic, but to us, exciting. We can follow the progress of distributed scientific projects such as the LHC by tracking the growth of our network traffic, as the month-to-month traffic volume on ESnet correlates with the day-to-day conduct of science. Currently, Fermi and Brookhaven LHC data continue to dominate the volume of network traffic, but as we see, production and sharing of large data sets by the genomics community is picking up steam. What the stats are predicting: as science continues to become more data-intensive, the role of the network will become ever more important.


Cheers for Magellan

We were glad to see DOE’s Magellan project getting some well-deserved recognition by the HPCwire Readers’ and Editors’ Choice Award at SC10 in New Orleans. Magellan investigates how cloud computing can help DOE researchers to manage the massive (and increasing) amount of data they generate in scientific collaborations. Magellan is a joint research project at NERSC at Berkeley Lab in California and Argonne Leadership Computing Facility in Illinois.

This award represents teamwork on several fronts. For example, earlier this year, ESnet’s engineering chops were tested when the Joint Genome Institute, one of Magellan’s first users, urgently needed increased computing resources at short notice.

Within a nailbiting span of several hours, technical staff at both centers collaborated with ESnet engineers to establish a dedicated 9 Gbps virtual circuit between JGI and NERSC’s Magellan system over ESnet’s Science Data Network (SDN). Using the ESnet-developed On-Demand Secure Circuits and Advance Reservation System (OSCARS), the virtual circuit was set up within an hour after the last details were finalized.

NERSC raided its closet spares for enough networking components to construct a JGI@NERSC local area network and migrated a block of Magellan cores over to JGI control.  This allowed NERSC and JGI staff to spend the next 24 hours configuring hundreds of processor cores on the Magellan system to mimic the computing environment of JGI’s local compute clusters.

With computing resources becoming more distributed, complex networking challenges will occur more frequently. We are constantly solving high-stakes networking problems in our job connecting DOE scientists with their data. But thanks to OSCARS, we now have the ability to expand virtual networks on demand. And OSCARS is just getting better as more people in the community refine its capabilities.

The folks at JGI claim they didn’t feel a thing. They were able to continue workflow and no data was lost in the transition.

Which makes us very encouraged about the prospects for Magellan, and cloud computing in general. Everybody is hoping that putting data out there in the cloud will expand capacity.  At ESnet, we just want to make the ride as seamless and secure as possible.

Kudos to Magellan. We’re glad to back you up, whatever the weather.

The circuits behind all those SC10 demos

It is midafternoon Wednesday at SC10 and the demos are going strong. Jon Dugan supplied an automatically updating graph in psychedelic colors (http://bit.ly/9HUrqL) of the traffic ESnet is able to carry with all the circuits we set up. Getting this far required many hours of work from a lot of ESnet folk to accommodate the virtual circuit needs of both ESnet sites and SCinet customers using the OSCARS IDC software.  As always, the SCinet team has put in long hours in a volatile environment to deliver a high performance network that meets the needs of the exhibitors.

Catch ESnet roundtable discussions today at SC10, 1 and 2 p.m.

Wednesday Nov. 17 at SC10:

At 1 p.m. at Berkeley Lab booth 2448, catch ESnet’s Inder Monga’s round-table discussion on OSCARS virtual circuits. OSCARS, the acronym for On-demand Secure Circuits and Advance Reservation System, allows users to reserve guaranteed bandwidth. Many of the demos at SC10 are being carried by OSCARS virtual circuits, which were developed by ESnet with DOE support. Good things to come: ESnet anticipates the rollout of OSCARS 0.6 in early 2011. Version 0.6 will offer greatly expanded capabilities and versatility, such as a modular architecture enabling easy plug and play of the various functional modules and a flexible path computation engine (PCE) workflow architecture.
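For readers wondering what reserving guaranteed bandwidth in advance involves, here is a minimal sketch of an admission check on a single link. It is not how OSCARS is implemented: the `Reservation` type, the function names, and the capacity figures are hypothetical. A new request is accepted only if, at every moment of its time window, the bandwidth already committed plus the requested bandwidth fits within the link capacity.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Reservation:
    """A bandwidth reservation on one link (hypothetical model)."""
    start: int  # start time, e.g. seconds since some epoch
    end: int    # end time, exclusive
    mbps: int   # guaranteed bandwidth in Mbps


def peak_reserved(reservations: List[Reservation], start: int, end: int) -> int:
    """Peak bandwidth already committed at any moment during [start, end)."""
    events = []
    for r in reservations:
        lo, hi = max(r.start, start), min(r.end, end)
        if lo < hi:  # the reservation overlaps the window
            events.append((lo, r.mbps))
            events.append((hi, -r.mbps))
    # At equal timestamps, negative deltas sort first, so bandwidth released
    # at time t is freed before bandwidth starting at time t is counted.
    events.sort()
    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def can_admit(reservations: List[Reservation], request: Reservation,
              capacity_mbps: int) -> bool:
    """True if the request fits alongside the existing reservations."""
    committed = peak_reserved(reservations, request.start, request.end)
    return committed + request.mbps <= capacity_mbps


# Hypothetical 10 Gbps link with two back-to-back 6 Gbps reservations.
existing = [Reservation(0, 100, 6000), Reservation(100, 200, 6000)]
print(can_admit(existing, Reservation(50, 150, 4000), 10000))  # fits
print(can_admit(existing, Reservation(50, 150, 5000), 10000))  # does not fit
```

A real service would run a check like this on every link of the candidate path, and across every domain the circuit crosses.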

Then, stick around, because next at 2 p.m.  Brian Tierney from ESnet will lead a roundtable on the research being produced from the ARRA-funded Advanced Networking Initiative (ANI) testbed.

In 2009, the DOE Office of Science awarded ESnet $62 million in recovery funds to establish ANI, a next generation 100Gbps network connecting DOE’s largest unclassified supercomputers, as well as a reconfigurable network testbed for researchers to test new networking concepts and protocols.

Brian will discuss progress on the 100Gbps network, update you on the several research projects already underway on the testbed, discuss testbed capabilities and how to get access to the testbed. He will also answer your questions on how to submit proposals for the next round of testbed network research.

In the meantime, some celeb-spotting at the LBNL booth at SC10.

Inder Monga
Brian Tierney

Depicting the early universe closer to home at SC10

Rick Wagner in front of the early universe

It’s Wednesday at 10 a.m. in the SDSC booth, and Rick Wagner is testing simulations of cosmic matter and gases streamed in from Argonne National Lab. Wagner is about to run a real-time volume-rendering application at Argonne. The application renders data in real time, streams the results across a wide area (from Argonne to New Orleans), and displays them on the tiled screen in the SDSC booth. To do so, SDSC is using OSCARS, ESnet’s on-demand reservation software, to schedule data movement on demand.

Aside from the sheer technical feat of rendering data in real time and streaming massive amounts of it across long distances, on-demand data scheduling enables scientists to be more versatile–easily working with the data as needed. For Wagner and his collaborators, improvements in data streaming are all about new capabilities. “We’ve never had this functionality,” said Wagner. “We want to be able to compare the data sets side by side.”

Wagner will next add variables such as radiation to the images depicting gases and matter from the early moments of the universe. This kind of demo illustrates what ESnet is all about. It is our mission to link scientists to collaborators and their data. But we are always striving for improvements in functionality, so that our end users will be more effective in their research.

ESnet recognized for outstanding performance

ESnet’s Evangelos Chaniotakis and Chin Guok received Berkeley Lab’s Outstanding Performance Award for their work in promoting technical standards for international scientific networking. Their work is notable because the implementation of open-source  software development and new technical standards for network interoperability sets the stage for scientists around the world to better share research and collaborate.

Guok and Chaniotakis worked extensively within the DICE community on development of the Inter-domain Controller Protocol (IDCP). They are taking the principles and lessons gained from years of development and applying them to efforts in international standards bodies such as the Open Grid Forum (OGF), as well as consortia such as the Global Lambda Integrated Facility (GLIF).

So far, the IDCP has been adopted by more than a dozen Research and Education (R&E) networks around the world, including Internet2 (the leading US higher education network), GEANT (the trans-European R&E network), NORDUnet (Scandinavian R&E network) and USLHCNet (high speed trans-Atlantic network for the LHC community).

Guok and Chaniotakis have also advanced the wide-scale deployment of ESnet’s virtual circuit service, OSCARS (On-Demand Secure Circuits and Advance Reservation System). OSCARS, developed with DOE support, enables networks to schedule and move the increasingly vast amounts of data generated by large-scale scientific collaborations. Since last year, ESnet has seen a 30% increase in the use of virtual circuits. OSCARS virtual circuits now carry over 50% of ESnet’s monthly production traffic.  The increased use of virtual circuits was a major factor enabling ESnet to easily handle a nearly 300% rise in traffic from June 2009 to May 2010.

Why are we reincarnating OSCARS?

OSCARS ESnet traffic patterns

Several recent developments in virtual circuits, including Fenius and cloud computing with Google, Internet2’s announcement of its ION service, and the recently funded DYNES proposal, are all powered by OSCARS, the On-Demand Secure Circuits and Advance Reservation System, a software engine developed with DOE funding. This open-source software engine provides us with the capability of building a network with highly dynamic, traffic-engineered flows that meet the research data transport needs of scientists. The current release, 0.5.2, has run as a production service within ESnet for the past 3 years. We are currently enhancing 0.5.3 and plan to release the software in Q4 2010.

In the course of running this software as a production service and interacting with scientists, network researchers, and the standards community at OGF, we realized we had to redesign the software architecture to be a much more robust and extensible platform. We wanted to be able to easily add new features to the OSCARS platform that would cater to a variety of network engineers and researchers.  With this in mind, the re-architected OSCARS is planned as release version 0.6. As with any successful product, transitioning from a deployed release to a new one involves thorny issues like backward compatibility and feature parity. Hence the current balancing act of taking something that is quite good and proven (0.5.2) and making it even better (0.6).

Here are four good reasons why OSCARS 0.6 is the way to go:

1. It can meet production requirements: The modular architecture enables features to be added through the use of distinct modules. This allows specific deployment requirements to be easily integrated into the service. For example, if it is necessary to support a federated authentication and authorization (AA) implementation, the AA modules can be replaced with ones that are compliant with that AA framework (e.g. Shibboleth).  Another example would be High Availability (HA). The 0.6 architecture helps provide HA on a component basis, ensuring that the critical components do not fail.

2. It provides new complex features: As end-sites and their operators become comfortable with point-to-point provisioning of virtual circuits, we are getting more requests for complex feature enhancements. The OSCARS 0.5 software architecture is not especially suitable for new features like multi-point circuits and/or multi-layer provisioning. These new feature requests increase the urgency of moving to the 0.6 release, which has been designed with such enhancements in mind. Moreover, the multi-layer ARCHSTONE research project funded by DOE will use 0.6 as the base research platform.

3. Research/GENI and other testbeds: The research community is a major constituent for OSCARS and its continuing development. This community is now conducting experiments on real infrastructure testbeds like ANI and GENI. To really take advantage of those testbeds, the research community wants to build on the OSCARS software base/framework while researching, innovating on, and testing particular algorithms. The OSCARS 0.6 platform’s modular architecture enables the researcher to replace any component with a new algorithmic research module. For example, with the new PCE engine redesign, one can write a flexible workflow of custom PCEs. This flexibility does not exist with the purpose-built but monolithic architecture of the OSCARS 0.5 codebase.

4. NSI Protocol/Standards: As the European and Asian research and education communities move towards interoperability with the US, it is important to build on the common understanding that standards bring. The NSI protocol standardization being discussed in the OGF NSI working group (http://ogf.org/gf/group_info/view.php?group=nsi-wg) needs to be implemented by open-source network middleware projects like OSCARS. We feel that 0.6 is the right platform to upgrade to the standard NSI protocol whenever it is ready.
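The idea of a flexible workflow of custom PCEs from the list above can be pictured with a small sketch. This is not the OSCARS 0.6 API: the `PCE` type alias and the functions `drop_paths_through`, `shortest_first` and `run_workflow` are hypothetical names chosen for illustration. Each stage takes a list of candidate paths and returns a pruned or reordered list, so a researcher could swap one stage out without touching the others.

```python
from typing import Callable, Iterable, List

Path = List[str]                          # a path as a list of hop names
PCE = Callable[[List[Path]], List[Path]]  # a stage prunes or reorders candidates


def drop_paths_through(banned: str) -> PCE:
    """Build a stage that removes any path crossing the banned hop."""
    def stage(paths: List[Path]) -> List[Path]:
        return [p for p in paths if banned not in p]
    return stage


def shortest_first(paths: List[Path]) -> List[Path]:
    """A stage that orders the remaining paths by hop count."""
    return sorted(paths, key=len)


def run_workflow(stages: Iterable[PCE], candidates: List[Path]) -> List[Path]:
    """Pass the candidate paths through each stage in order."""
    for stage in stages:
        candidates = stage(candidates)
    return candidates


# Hypothetical topology: three candidate paths from hop "a" to hop "b".
candidates = [["a", "x", "b"], ["a", "y", "z", "b"], ["a", "w", "b"]]
result = run_workflow([drop_paths_through("x"), shortest_first], candidates)
print(result)
```

Replacing `shortest_first` with an experimental ranking function is a one-line change to the list of stages, which is the kind of substitution a modular design is meant to allow.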

At ESnet, we invest considerable time in new technology development, but balance this with our operational responsibilities. We invite the community to join in developing OSCARS 0.6, which has greatly improved capabilities over OSCARS 0.5.2. With your participation in the development process, we can accelerate the 0.6 architected software to production-quality as soon as possible.  If this excites you, we welcome you to contribute to the next stage of the OSCARS open source project.

–Chin Guok

A few reasons why ESnet matters to scientists.

Keith Jackson, ESnet Guest Blogger

Recently we’ve been testing the ability to move huge amounts of scientific data in and out of commercial cloud providers like Amazon and Google. We were doing this because if you want to do scientific computation in the cloud, you need to be able to move data in and out efficiently or it will never be useful for science.

Recently we’ve been working with engineers at Google to test the performance of their cloud storage solution. We were in the midst of transferring data between Berkeley Lab servers and the Google cloud when we noticed the data wasn’t moving as fast as it should.

We tried to figure out the root of the problem. The Google folks talked to their networking people and we talked to our engineers at ESnet.

We found there was a bottleneck in the path between Berkeley and Google on the ESnet side. One path was still only 1 gigabit and was scheduled to be upgraded to 10 gigabit in the next week or so. But it limited us to no more than a gigabit per second data transfers.

Using OSCARS, not only did we find the bottleneck, but as Vangelis talked about in a prior blog post, we were able to find a way to reroute traffic to avoid the slow link, completely bypassing the problem. ESnet was not only able to help me diagnose the problem right away, but was also able to suggest and quickly deploy a solution.
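The reasoning behind the reroute is simple arithmetic: a path can move data no faster than its slowest link. The sketch below illustrates this with made-up numbers. The 1 Gbps link matches the one described above, but the other link capacities, the path names, and the alternate route are hypothetical.

```python
from typing import Dict, List


def bottleneck_gbps(link_capacities: List[float]) -> float:
    """A path's achievable rate is capped by its slowest link."""
    return min(link_capacities)


# Hypothetical per-link capacities (Gbps) along two candidate paths.
paths: Dict[str, List[float]] = {
    "default": [10.0, 1.0, 10.0],      # includes the link not yet upgraded
    "alternate": [10.0, 10.0, 10.0],   # avoids the slow link
}

# Choose the path whose slowest link is fastest.
best = max(paths, key=lambda name: bottleneck_gbps(paths[name]))

for name, links in paths.items():
    print(f"{name}: bottleneck {bottleneck_gbps(links):g} Gbps")
print(f"preferred path: {best}")
```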

In thinking about that problem, a few things occurred to me. For a scientist just concerned with getting data through the network, it is probably easier to work with ESnet than a commercial provider for several reasons.

As a research network, ESnet is completely accessible. A commercial provider would have been completely opaque because of proprietary issues and have no incentive to grant access into its network for troubleshooting by outsiders. Since serving scientists is not its main mission, its sense of urgency would be different. Moreover, a commercial network’s interfaces are not designed for the particular needs of scientists.

But ESnet exists solely to support science, and scientists. Sometimes we need to be reminded that to scientists, quite literally, the “network matters.”

See us at Joint Techs / ESCC

ESnet will be presenting at the Summer Joint Techs / ESCC meeting next week, July 11-15, in Columbus, Ohio.  On July 11, Joe Metzger and Brian Tierney will be giving a tutorial at 3 pm on “Improving End to End Bulk Data Transfer Rates” that focuses on the problems of moving terabyte-scale data sets, and Jon Dugan will talk about Iperf in the Network Tools Tutorial.  On July 12, at 2:40 pm, Brian Tierney will also be giving a Status Update on the DOE ANI Network Testbed.  On July 13, at 10 am, ESnet’s Inder Monga will be replacing Chris Tracy on the panel “Dynamic Provisioning in Multi-Layer, Multi-Vendor Networks”, and Jon Dugan will give another presentation on ESxSNMP.  And finally, at 8:20 am on the 14th, Steve Cotter will give an ESnet Update.

Immediately following Joint Techs, the ESnet Site Coordinating Committee Meeting begins at 1 pm (agenda: http://indico.fnal.gov/conferenceOtherViews.py?view=standard&confId=3428). The ESnet team’s talks will outline the nuts and bolts behind 4 key areas integral to ESnet’s overall strategy:

1. Being an essential scientific resource for DOE. ESnet is making great strides in providing optimal connectivity between DOE labs as well as further developing dedicated network resources, such as our securing of dark fiber at Brookhaven. We are laying the groundwork to manage rapidly accelerating increases in DOE scientific networking traffic.  The first afternoon, Steve Cotter will give a more detailed update on ESnet’s activities at 2:10 pm and Greg Bell will lead the discussion about the ESnet implications of site reliance on cloud or externally-hosted services at 3:55 pm.

2. Knowing our users better than anyone. During his talk, Steve Cotter will describe new ways we will be reaching out to our users and listening to their needs.

3. Setting a global standard for user experience.  We may not have invented the seamless user experience, but end to end data transmission is all our users care about. To that end we will be talking about our work on Graphite, URL and Weathermap.  Also, Thursday starting at 9:40 am Joe Metzger will report on the PerfSONAR Joint Interagency Demonstration Project followed by Evangelos Chaniotakis’s presentation on ESnet’s virtual circuit services status.

4. Efficiency. We help our users optimize their networking resources for collaborations, access to instrumentation, and exascale computing needs in the most energy-efficient ways possible.  Be sure not to miss Wednesday evening’s Focus Session on improving WAN performance with Eli Dart and Joe Metzger, beginning at 6:30 pm.

See you in Columbus!