perfSONAR is a set of tools that help engineers actively monitor and troubleshoot multi-domain network issues. It provides the ability to pinpoint and solve service problems quickly, restoring the highest levels of network performance.
Debugging network performance problems on long multi-domain paths that cross the Atlantic has just become easier for ESnet users. The GÉANT network, a pan-European communications infrastructure serving Europe’s research and education community, has improved the perfSONAR performance monitoring software architecture and toolset it co-developed with ESnet, enhancing the ability of users in the U.S. and Europe to spot red flags in network performance.
perfSONAR software (or, as GÉANT calls its European implementation, perfSONAR MDM, for Multi-Domain Monitoring) enables users to pinpoint trouble spots and bolster end-to-end network performance across network domains, and now across the Atlantic. This improvement to perfSONAR gives ESnet users enhanced visibility and interoperability, speeds the identification of performance issues, and helps prevent delays caused by engineers working across multiple time zones. European and U.S. users can now monitor network performance via the same intuitive interface using consistent data formats. Specifically, ESnet users can run tests from 8 of the more than 70 ESnet measurement points across the country, including Fermilab in Batavia, IL, Brookhaven National Laboratory in Upton, NY, and Lawrence Berkeley National Laboratory in Berkeley, CA, to measurement points distributed across the GÉANT European network, and vice versa.
This combination of ESnet and European perfSONAR measurement points provides a comprehensive end-to-end view of network performance, invaluable for global data exchange in projects such as CERN’s Large Hadron Collider.
Is your network up to the data demands of the LHC?
This spring ESnet achieved something akin to global presence, figuratively with our network, and in person at conferences as we traded ideas with the technical community on topics such as the limitations of bandwidth on demand and how to compose services that are easy for end users to understand and use. In May Steve Cotter, Bill Johnston, and Inder Monga were invited speakers at the TERENA Networking Conference in Prague.
Inder Monga followed with a presentation, “Network Service Interface: Concepts and Architecture,” discussing the motivation, concepts, and architecture of the upcoming Open Grid Forum standard, which promises to give researchers simple, abstract constructs to dynamically create and manage the communication infrastructure that serves their science. During the talk he explained some of the protocol’s differentiating attributes, including a recursive, flexible request-and-response framework and the abstraction of physical topology into a service-layer representation, and declared that composable services are the next logical step in network design. The key to dealing with complex infrastructure is to abstract it into objects users can understand, but that is just the beginning. A composable services model contains essential elements such as technical requirements abstracted into a language all users can understand, failsafe backups, transparent service changes, transport efficiency, and monitoring for “soft” failures. He pointed out that a Topology Service would be the next target for standardization once the Connection Service was fully specified.
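To make the recursive request-and-response idea concrete, here is a toy sketch in which a connection request is decomposed into child requests to the networks along the path, each answered independently. The function names and structure are invented for illustration; the actual protocol is defined in the OGF NSI standard.

```python
# Toy illustration of recursive connection requests across domains.
# Names and structure here are hypothetical, not real NSI messages.
def request_connection(network, path):
    """Recursively request a segment from each domain along the path."""
    if not path:
        return []
    here, rest = path[0], path[1:]
    segment = f"{network}: provisioned segment to {here}"
    # Each downstream domain handles the remainder of the path itself.
    return [segment] + request_connection(here, rest)

for step in request_connection("ESnet", ["GEANT", "NORDUnet"]):
    print(step)
```

The point of the recursion is that each domain only needs to understand its own segment and hand the rest of the request downstream, which is what makes the framework composable.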
Steve Cotter talked about meeting user expectations in “Fighting a Culture of ‘Bad is Good Enough’,” asserting that bandwidth on demand on its own is inadequate to meet the growing needs of science. In ESnet’s surveys, scientists report that while the technology is often there, they don’t know how to access it or how to make it work. As a result, poor network performance is often the norm at many sites, and scientists are left to fend for themselves without technical assistance. Frustrated, many simply give up on sending data over the network and instead use ‘sneakernet’. But it doesn’t have to be this way. Cotter cited the LHC as one example of the investment of time and commitment needed to do networking right. “For us as a community to succeed, we need to provide intuitive services to researchers, and documentation and assistance to make it easy for them,” said Cotter, before launching into a rundown of new ESnet tools and ventures.
Now that OSCARS version 0.6 is code-complete, ESnet is taking offers to help test the code. ESnet is also working with its sites to build secure, dedicated enclaves on the perimeters of networks, dubbed Science DMZs, which are fully instrumented with perfSONAR. Separating the campus science traffic from converged network services like VoIP makes it easier to debug and improves performance across the WAN. To make it easier to test and troubleshoot infrastructure, ESnet has created a community knowledge base, http://fasterdata.es.net, that regularly receives more than 2500 hits a week. ESnet is also developing a multi-function web portal called MyESnet that it will launch at ESCC/Joint Techs in a few weeks. MyESnet will have many tools and new features for the scientist and networker, including traffic flow visualizations, high-level information about ‘site health’, the ESnet maintenance calendar, a discussion forum and idea repository, as well as a one-stop shop where users will be able to log in with Shibboleth or OpenID, initiate perfSONAR tests, and open trouble tickets.
Going beyond just bandwidth on demand
Three weeks later at the NORDUnet conference in Reykjavik, Iceland, Inder Monga discussed the ins and outs of developing composable network services on demand. Given new developments in network virtualization, co-scheduling, cloud services and 100G bandwidth, the network is playing an ever larger role in providing scientists new services.
Incidentally, Inder used high-speed networking to accomplish the enviable feat of being in two places at once without violating any laws of physics. Upon landing in Iceland, Inder promptly presented a talk on green networking from Iceland for the conference on Green and Sustainable ICT in Delhi, India.
Designing “greener” networks is one of ESnet’s key priorities, and something you will be hearing more about from ESnet in the future.
Last week ESnet and Berkeley Lab Computing Sciences sponsored a talk on network anomaly detection by Prasad Calyam, Ph.D., from the Ohio Supercomputer Center/OARnet and The Ohio State University. For the last year and a half, Calyam has worked on a DOE-funded perfSONAR-related project. He emphasized that accurate measurement is necessary for troubleshooting across multiple domains and layers. Calyam’s group is developing metrics and adaptive performance sampling techniques to analyze network performance across multiple layers for better network status awareness and performance, and to determine optimal paths for large data sets.
To do active measurements you need intelligent sampling, Calyam says, but the types of measurements necessary are difficult to accomplish due to policies and other constraints.
Calyam’s group is currently developing two new perfSONAR tools:
The OnTimeDetect Tool detects network anomalies. It leverages perfSONAR lookup services to query for projects or sites, then pulls data on the path of interest from the perfSONAR Measurement Archives. The OnTimeDetect Tool uses this data to accurately and rapidly detect anomalies as they occur.
The OnTimeSample Tool does intelligent forecasting to help plan and manage network infrastructure, validating its forecasts with enterprise monitoring data from the E-Center portal led by Fermilab as well as ESnet’s more than 60 perfSONAR deployments. The next step is to integrate the tools to present information in a more user-friendly way. Useful network performance information can be collected from user logs, anomaly alerts, and measurement data from applications such as perfSONAR, PingER, and ESxSNMP (ESnet’s SNMP database).
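As a rough illustration of the kind of detection OnTimeDetect performs, here is a minimal sketch that flags throughput samples deviating sharply from a rolling baseline. Everything below is a hypothetical example, not OnTimeDetect code; the group’s actual adaptive algorithms are considerably more sophisticated.

```python
# Hypothetical sketch: flag throughput anomalies against a rolling
# baseline. Not OnTimeDetect's actual algorithm.
from statistics import mean, stdev

def detect_anomalies(samples, window=10, threshold=3.0):
    """Return indices of samples far outside the rolling baseline."""
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(samples[i] - mu) > threshold * sigma:
            anomalies.append(i)
    return anomalies

# e.g. steady ~900 Mbps throughput with a sudden drop at the end
series = [900, 905, 898, 902, 897, 903, 901, 899, 904, 900, 450]
print(detect_anomalies(series))  # [10]: the drop is flagged
```

The trade-off in any such detector is the window and threshold: too tight and normal jitter triggers alerts, too loose and real plateaus slip through, which is exactly why adaptive schemes are of interest.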
Active vs Passive Measurement
Calyam is also building a mechanism that will enable network infrastructure providers to continuously check networks using both active measurements of end- to-end performance, and passive measurements collected by instrumentation deployed at strategic points in a network.
According to Calyam, one cannot accurately assess network status without adaptive and random sampling techniques. Calyam’s group is trying to determine the optimal sampling frequency and distribution to monitor networks and to forecast and detect anomalies.
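A toy sketch of the adaptive idea: tighten the sampling interval when recent measurements look volatile, and back off when they are stable. The thresholds and logic here are invented for illustration and are not Calyam’s actual scheme.

```python
# Illustrative sketch (not Calyam's algorithm): adapt the measurement
# interval to recent variability -- sample more often when the link
# looks unstable, back off when it is quiet.
def next_interval(recent, base=300, floor=30, ceiling=3600, tolerance=0.05):
    """Pick the next sampling interval (seconds) from recent samples."""
    if len(recent) < 2:
        return base
    spread = (max(recent) - min(recent)) / max(recent)
    if spread > tolerance:           # volatile: tighten the interval
        return max(floor, base // 4)
    return min(ceiling, base * 2)    # quiet: relax the interval

print(next_interval([900, 450, 910]))   # volatile -> 75
print(next_interval([900, 905, 902]))   # stable -> 600
```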
“You can do sampling in a particular domain that you control, but the real challenge is in multi-network domains controlled by multiple entities,” Calyam comments. “For this approach to work, you need a federation of ISPs to share network performance information.” For example, the perfSONAR measurement federation (e.g., ESnet, Internet2, and GÉANT) can share measurement topologies, policies, and measurement exchange formats for mutual troubleshooting.
His group is working on an application for scientists who use instrumentation remotely and experience lag in data. Suitable sampling can indicate whether the lag in instrument control they are experiencing is due to the movement of physical instrument components or to network latency.
However, perfSONAR cannot yet handle strict sampling patterns. It is engineered for ease of use, which means a trade-off in sampling precision and sophistication. Its current tools (ping, traceroute, owamp, and bwctl) can conflict with each other, or with other active measurement tools, when run concurrently. Calyam advocates a meta-scheduling function to control measurement tools, as well as new regulation policies and semantic priorities. His group is also building some model user interfaces, including a GUI tool, a Twitter publishing API, the Google Charts that perfSONAR uses, and Graphite charts.
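The meta-scheduling idea can be sketched as a priority queue plus a lock that serializes active tests so concurrent runs do not skew each other’s results. The tool names and priorities below are illustrative and are not part of any perfSONAR API.

```python
# Minimal sketch of a measurement meta-scheduler: active tests are
# queued by priority and run one at a time behind a lock.
# Tool names and priorities are illustrative only.
import threading
from queue import PriorityQueue

measurement_lock = threading.Lock()
queue = PriorityQueue()

def schedule(priority, tool, run):
    """Enqueue a test; lower priority number runs sooner."""
    queue.put((priority, tool, run))

def run_next():
    priority, tool, run = queue.get()
    with measurement_lock:  # one active test at a time
        return tool, run()

schedule(1, "bwctl", lambda: "throughput: 940 Mbps")
schedule(0, "owamp", lambda: "one-way delay: 12 ms")
print(run_next())  # owamp runs first: latency tests are cheap and brief
```

A real scheduler would also account for test duration and per-path conflicts (two bandwidth tests sharing a bottleneck link), which is where the policy and semantic-priority work comes in.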
Calyam’s group was the first to query perfSONAR measurements on 480 paths and 65 sites worldwide. The group has so far developed an adaptive anomaly detection algorithm, demonstrated a new adaptive sampling scheme, and released a set of algorithms to the perfSONAR user and developer community at http://ontimedetect.oar.net. Tools are also available at the perfSONAR website.
This guest blog is contributed by Warren Matthews, Cyber-Infrastructure Chief Engineer at the Deep Underground Science and Engineering Lab (DUSEL).
The Deep Underground Science and Engineering Laboratory (DUSEL) is a research lab being constructed in the former Homestake gold mine in Lead, South Dakota, now resurrected to mine data about the earth, new life forms, and the universe itself. When finished, DUSEL will explore fundamental questions in particle physics, nuclear physics and astrophysics. Biologists will study life in extreme environments. Geologists will study the structure of the earth’s crust. Early science programs have already begun to explore some of these questions. In addition, DUSEL education programs are underway to inspire students to pursue careers in science, technology, engineering, and mathematics. This interdisciplinary collaboration of scientists and engineers is led by the University of California at Berkeley and the South Dakota School of Mines and Technology.
I am the cyberinfrastructure chief engineer for DUSEL. As such, my concern is the research environment and advanced services that will be needed to accomplish our scientific goals. To enable future discoveries, scientists will need to capture, analyze, and exchange their data. We will have to deploy and perhaps even develop new technologies to provide the scientists with the technical and logistical support for their research. We expect that the unique research opportunities and instrumentation that will be established at DUSEL will draw scientific teams from all over the world to South Dakota, so high-speed national and international network connectivity will also be critical.
National laboratories have made many important contributions in the development of IT and networking technology. I’m very pleased that DUSEL is the newest member of the ESnet community and I have no doubt that we’ll be leveraging their expertise. In conversations with numerous colleagues at other labs it has become apparent that although DUSEL is starting with a clean slate and there are no legacy systems to support, we still have common issues and some difficult decisions to consider. All the labs have the challenges of meeting the needs of both large and small scientific collaborations. We all feel the budget crunch and are streamlining our support infrastructure. We are all wondering how we can optimize our use of the Cloud.
At DUSEL we have our own particular challenges, starting with an extreme underground environment. On the surface, the Black Hills of South Dakota may be freezing, but the further you go down in the mine, the hotter it gets. Rock temperatures at the 4850′ level, where the mid-level campus is under construction, are around 70°F (21°C) and humidity is around 88%. At the 7400′ level, where the deep-level campus is planned, temperatures hover around 120°F (49°C). The high levels of temperature and humidity have a significant impact on computer equipment. We’ll figure out our challenges as we go, depending on shared expertise. After all, national labs were created to focus effort and advance knowledge where no one university could marshal the resources required. Our goal is to provide a platform where science, technology, and innovation are able to flourish.
We anticipate technology partnerships with the many experiments going underground at DUSEL. Currently we are expanding IPv6 and deploying perfSONAR. We are leveraging HD video conferencing. We are worrying about identity management and cyber security. We are establishing the requirements for dynamic network provisioning. And at the same time we’re wondering what other technologies will emerge in the next 20 or 30 years and what will be required to dig for new discoveries. You can keep track of our progress at the Sanford Laboratory YouTube Channel.
ESnet’s performance knowledge base, fasterdata.es.net, had grown organically for almost 15 years, and was in serious need of an overhaul. Out-of-date information needed to be removed, and a general reorganization was needed. As ESnet is in the middle of converting its web site to a CMS (Content Management System) that can be easily referenced and searched, we decided to use this opportunity to upgrade fasterdata as the first experiment with the new CMS.
Check out http://fasterdata.es.net and see what you think. “Some of the content and formatting is still not finalized, but I think you’ll find this a huge improvement over the old site,” said Brian Tierney, the author of much of the site’s content. There are over 85 pages of information and advice. It was a big job to rework all of the old site content, but we are glad to say that it is completed and posted for everyone to access and use. We anticipate later upgrades, but the new CMS will allow us to update information quickly and make it far easier to locate pages of interest than in the past.
This site gets over 3000 hits/week from all over the world, and is used by folks in all industries and R&E to improve their network performance and troubleshoot problems. Try it out and let Brian know your opinion at firstname.lastname@example.org.
Our take on ANI, OSCARS, perfSONAR, and the state of things to come.
In 2010 ESnet led the technology curve in the testbed, putting together a great multi-layer design, deploying specially tuned 10G I/O testers, becoming an early investor in the OpenFlow protocol by deploying NEC switches, and building a research breadboard of end hosts leveraging open-source virtualization and cloud technologies.
The first phase of the ANI testbed is concluding. After 6+ months of operational life, with exciting research projects like ARCHSTONE, Flowbench, HNTES, climate studies, and more leveraging the facilities, we are preparing to move the testbed to its second phase on the dark fiber ring in Long Island. Our call for proposals that closed October 1st garnered excellent ideas from researchers and was reviewed by a panel of academic and industry stalwarts. We are tying up loose ends as we light the next phase of testbed research.
This year the OSCARS team has been extremely productive. We added enhancements to create the next version (0.5.3) of the current production OSCARS software, made progress architecting and developing a highly modular and flexible platform for the next-generation OSCARS (0.6), built a PCE-SDK targeted at network researchers creating complex path computation algorithms, and developed FENIUS to support the GLIF Automated GOLE demonstrator.
Not only did the ESnet team multitask on various ANI, operational network, and OSCARS deliverables, it also spent significant time supporting our R&E partners like Internet2, SURFnet, NORDUnet, RNP, and others interested in investigating the capabilities of this open-source software. We also appreciate Internet2’s participation in dedicating testing resources for OSCARS 0.6 starting next year to ensure a thoroughly vetted and stable platform in the April timeframe. This is just one example of what the R&E community can accomplish by committing to partnership and collaboration.
perfSONAR kept up its rapid pace of feature additions and new releases in collaboration with Internet2 and others. In addition to rapid progress in software capabilities, ESnet is aggressively rolling out perfSONAR nodes at its 10G and 1G POPs, creating an infrastructure where the network can be tuned to hum. With multiple thorny network problems now solved, perfSONAR has proven to be a great tool delivering value. This year we focused on making perfSONAR easily deployable and adding the operational features to transform it into a production service. An excellent workshop in August succinctly captured the challenges and opportunities in leveraging perfSONAR for operational troubleshooting, and for researchers seeking to understand how to improve networks further. Joint research projects continue to stimulate further development, with a focus on solving end-to-end performance issues.
Life in technology tends to be interesting, even though people keep warning about the commoditization of networking gear. The focus area for innovation just shifts, but never goes away. Some areas of interest as we evaluate our longer term objectives next year:
Enabling the end-to-end world: What new enhancements or innovations are needed to deploy performance measurement, and control techniques to enable a seamless end-to-end application performance?
Life in a Terabit digital world: What network innovations are needed to fully exploit the requirement for Terabit connectivity between supercomputer centers in the 2015-2018 timeframe?
Life in a carbon economy: What is the low-hanging fruit for networks to become more energy-efficient and/or enable energy efficiency in the IT ecosystem in which they operate?

Cloud-y or Clear?
ESnet’s Jon Dugan will lead a BoF on network measurement at 12:15 p.m. Thursday in rooms 278-279 at SC10. Functional networks are critical to high performance computing, but achieving optimal performance requires measuring networks accurately. Jon will open up the session to discuss ideas in measurement tools such as perfSONAR, emerging standards, and the latest research directions.
This year’s conference, SC10 is in New Orleans, LA, from November 13 until November 19. Planning for each year’s show starts a few years ahead of time. Not long after one year’s show ends, the serious planning for the next year’s SCinet begins. It’s a cycle that I’ve been through many times now, and it’s a bit like an old friend at this point. Most of the time we enjoy each other’s company immensely but when things get stressful, we can really irritate each other.
SCinet is a pretty amazing network. After a year of planning, there are three weeks of concentrated effort to set it up. It’s operational for about one week and it takes about two days to tear down. This year we will have 270 Gbps of wide area network connectivity with dedicated circuits to ESnet, Internet2, NLR, and NOAA. We will deliver over 200 network connections to the various booths on the show floor.
As amazing as the network is, the people who build it are even more amazing. They are drawn from universities, national laboratories, network equipment vendors and nationwide research and education networks. It’s not just Americans; there are people from several different countries with strong showings from the Netherlands and Germany most years. Many of these folks are leaders in their areas of expertise and all of them are bright, capable people. Each of them has given up a fair bit of their own time to participate (while most have some sponsorship from their employers, it’s not unheard of for people to take vacation time to participate).
Why would people give up many evenings and weekends every fall to be a part of SCinet? Because it’s an amazing opportunity to learn about the state of the art in computer networking and to expand your professional network as well. I consider myself extremely fortunate to work with each of the people that make up SCinet.
So what is ESnet doing with SCinet this year? Glad you asked. First off, we are bringing three 10G circuits to the show floor. As of Friday, October 29th, all three were up and operational. One of these circuits will be used for general IP traffic, but the other two will be used to carry dynamic circuits managed by the OSCARS IDC software.
These circuits will provide substantial support for various demonstrations by exhibitors, connecting scientific and computational resources at labs and universities to the booths on the show floor.
Finally, ESnet has four people who are volunteering in various capacities within SCinet. Evangelos Chaniotakis and I will be working with the routing team. The routing team provides IP, IPv6, and IP multicast service; manages the wide area peerings and wide area layer 2 circuits; configures the interfaces that face the booths on the show floor; and works closely with several other teams to provide a highly scalable and reliable network. John Christman is working with the fiber team, building the optical fiber infrastructure to support SCinet (all booth connections are delivered over optical fiber, which allows booths to be connected to the network using the highest-speed interfaces available). Brian Tierney will be working with the SCinet measurement team, collecting network telemetry and using it to provide useful and meaningful visualizations of what’s happening inside SCinet, as well as providing tools and hosts for making active network measurements with tools such as iperf, nuttcp, and OWAMP. The measurement data is also made accessible using the perfSONAR suite of tools. They’re also using the SNMP polling software I wrote for ESnet, called ESxSNMP.
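To give a small taste of what the measurement tooling involves, here is a hedged sketch of extracting the reported bandwidth from classic iperf text output. The line format is assumed from typical iperf 2.x runs; production systems like ESxSNMP collect structured data rather than scraping text.

```python
# Hedged sketch: pull the reported bandwidth out of iperf-style text
# output. The line format is an assumption based on common iperf 2.x
# runs, not a guaranteed interface.
import re

def parse_iperf_bandwidth(output):
    """Return (value, unit) of the last bandwidth figure in the output."""
    matches = re.findall(r"([\d.]+)\s+([KMG]bits/sec)", output)
    return (float(matches[-1][0]), matches[-1][1]) if matches else None

sample = "[  3]  0.0-10.0 sec  1.09 GBytes   938 Mbits/sec"
print(parse_iperf_bandwidth(sample))  # (938.0, 'Mbits/sec')
```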
Important spots to visit:
If you are coming to SC10 this year, be sure to come by the SCinet NOC in booth 3351. I’d be happy to meet anyone who’s read this; feel free to ask for me at the SCinet help desk at the same booth. LBNL (ESnet’s parent organization) is located in booth 2448. Finally, I am hosting a Birds of a Feather (BoF) session on network measurement during the show; the details are here.
And check out the other ESnet demos: You can download a map of ESnet at SC10: SC 2010_floormapFL
LBNL Booth 2448, ESnet roundtable discussions
Inder Monga, Advanced Network Technologies Group, ESnet, will lead a roundtable discussion on: On-demand Secure Circuits and Advance Reservation System (OSCARS), 1-2 p.m., Wednesday, Nov. 17
Many of the demos at SC10 are being carried by OSCARS virtual circuits developed by ESnet with DOE support. OSCARS enables networks to reserve and schedule virtual circuits that provide bandwidth and service guarantees to support large-scale science collaborations. In the first quarter of 2011, ESnet expects to unveil OSCARS 0.6, which will offer vastly expanded capabilities, such as a modular architecture allowing for easy plug and play of the various functional modules and a flexible path computation engine (PCE) workflow architecture. Adoption of OSCARS has been accelerating as 2010 has seen deployments at Internet2 and other domestic and international research and education networks. Since last year, ESnet saw a 30% increase in the use of virtual circuits. OSCARS virtual circuits now carry over 50% of ESnet’s monthly production traffic. Increased use of virtual circuits was a major factor enabling ESnet to easily handle a nearly 300% rise in traffic from June 2009 to May 2010.
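Conceptually, a virtual circuit reservation is a bandwidth guarantee between two endpoints over a time window. The data model below is purely hypothetical, invented to illustrate the concept; it is not the OSCARS API, and the endpoint names are made up.

```python
# Purely illustrative data model for an OSCARS-style virtual circuit
# reservation. Field names and endpoints are hypothetical.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CircuitReservation:
    src: str
    dst: str
    bandwidth_mbps: int
    start: datetime
    end: datetime

    def overlaps(self, other):
        """True if the two reservation windows intersect."""
        return self.start < other.end and other.start < self.end

r1 = CircuitReservation("fnal-demo", "bnl-demo", 5000,
                        datetime(2010, 11, 15, 8), datetime(2010, 11, 15, 20))
r2 = CircuitReservation("fnal-demo", "bnl-demo", 2000,
                        datetime(2010, 11, 15, 18), datetime(2010, 11, 16, 6))
print(r1.overlaps(r2))  # True: the windows share 18:00-20:00
```

Overlap detection like this is the seed of the scheduling problem a real reservation system must solve: overlapping guarantees on a shared path cannot exceed the link capacity.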
Brian Tierney, Advanced Network Technologies Group, ESnet, will lead a roundtable discussion on: ARRA-funded Advanced Networking Initiative (ANI) Testbed, 2-3 p.m. Wednesday, Nov. 17
The research and education community’s needs for managing and transferring data are exploding in scope and complexity. In 2009 the DOE Office of Science awarded ESnet $62 million in Recovery Act funds to create the Advanced Networking Initiative (ANI). This next-generation, 100 Gbps network will connect DOE’s largest unclassified supercomputers. ANI is also establishing a high performance, reconfigurable network testbed for researchers to experiment with advanced networking concepts and protocols. ESnet has now opened the testbed to researchers, and a variety of experiments pushing the boundaries of current network technology are underway. Another round of proposals is in the offing. The testbed will be moving from Lawrence Berkeley National Laboratory to ESnet’s dark fiber ring on Long Island (LIMAN: Long Island Metropolitan Area Network) in January 2011, and eventually to the 100 Gbps national prototype network ESnet is building to accelerate deployment of 100 Gbps technologies and provide a platform for the DOE experimental facilities at Oak Ridge National Laboratory and the Magellan resources at the National Energy Research Scientific Computing Center (NERSC) and Argonne National Laboratory.
The perfSONAR collaboration has just announced the release of version 3.2 of the pS Performance Toolkit. The latest version of perfSONAR-PS is available at http://psps.perfsonar.net/toolkit/
“perfSONAR is critical to helping our constituents achieve acceptable network performance for scientific applications” commented Eli Dart, network engineer at ESnet, who uses perfSONAR routinely. “The continued progress by the perfSONAR collaboration in developing and deploying practical test and measurement infrastructure is helping our scientists conduct critical research collaborations in many areas including climate, energy, biology and genomics.”
Release 3.2 offers features including:
– Adoption of the CentOS Linux platform
– Improved admin GUIs to guide the user through setup and maintenance of the host
– Updated versions of perfSONAR-PS monitoring software
– Updated versions of performance software including OWAMP, BWCTL, NDT, NPAD, Cacti, and Iperf
– Stability and bugfix improvements
– Ability to install directly to the hard disk
“Large-scale science pushes the limits of computing, storage and networking” said ESnet’s Chris Tracy, who is actively engaged in ongoing testing of trans-Atlantic circuits with US LHCNet for use by the Large Hadron Collider experiments. “perfSONAR is used daily in monitoring the network infrastructure that supports the LHC experiments, and perfSONAR provides the go-to toolset for finding, characterizing, and locating performance problems so that they can be fixed and the science can proceed unhindered.”
At the BES requirements workshop that I led last week in Washington D.C. for scientists and program managers, I saw a significant result of the impending data explosion that will be produced by the next generation of light sources and instruments at BES facilities.
The sheer quantity of data involved is going to completely change the scientific process for the scientists who use them. The current model for data transport used by most scientists at light sources does not use networks at all – scientists travel to the light source, run their experiments, and travel home with a USB hard drive loaded with a few hundred gigabytes of data (perhaps a terabyte or two, but even that is tractable with portable media). This model has worked well for this community for years.
However, as instruments are upgraded with new detectors and as new data analysis methods are employed, data sets are going to increase in size by up to a factor of 100 over the next few years: scientists that might carry home 700 gigabytes of data today will need to move 70 terabytes in the near future. I don’t know about you, but my briefcase isn’t up to the task.
Data will have to be transferred home over the network, or scientists will have to perform the computational analysis on site at the facilities. Other options include streaming data to supercomputer centers for real time or semi-real time analysis. Whatever happens, the scientists will need more from their networks and from the systems connected to them.
The increase in data as instrumentation capacity improves will mean a significant change in the science process for these communities. Transferring data will require network capacity upgrades at the scientific facilities and the laboratories that support them, as well as network test and measurement tools such as perfSONAR.
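A quick back-of-the-envelope calculation shows why network capacity matters here: the time to move 70 terabytes at a few common line rates, assuming ideal throughput with no protocol overhead (real transfers are slower):

```python
# Back-of-the-envelope: hours to move a data set at a given line rate,
# assuming ideal throughput with no protocol overhead.
def transfer_time_hours(terabytes, gbps):
    bits = terabytes * 1e12 * 8          # decimal terabytes to bits
    return bits / (gbps * 1e9) / 3600    # seconds to hours

for rate in (1, 10, 100):
    print(f"{rate:>3} Gbps: {transfer_time_hours(70, rate):.1f} hours")
```

Even at a full, uncontended 10 Gbps, 70 terabytes takes the better part of a day, which is why capacity upgrades and well-tuned end systems both matter.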
ESnet is ready to help, with pilot projects already underway.