Using perfSONAR to Find Network Anomalies

Last week ESnet and Berkeley Lab Computing Sciences sponsored a talk on network anomaly detection by Prasad Calyam, Ph.D., of the Ohio Supercomputer Center/OARnet and The Ohio State University. For the past year and a half, Calyam has worked on a DOE-funded, perfSONAR-related project. He emphasized that accurate measurement is essential for troubleshooting across multiple domains and layers. Calyam’s group is developing metrics and adaptive performance-sampling techniques that analyze network performance across multiple layers. The goals are to improve network status awareness and performance, and to determine optimal paths for moving large data sets.

Active measurement requires intelligent sampling, Calyam says, but the kinds of measurements needed are difficult to carry out because of policies and other constraints.

Calyam’s group is currently developing two new perfSONAR tools:

  • The OnTimeDetect Tool detects network anomalies. It uses the perfSONAR lookup services to query for projects or sites, and then retrieves data on the path of interest from the perfSONAR Measurement Archives. OnTimeDetect uses this data to detect anomalies accurately and rapidly as they occur.
  • The OnTimeSample Tool performs intelligent forecasting to help plan and manage network infrastructure. The tools are being validated with enterprise monitoring data from the E-Center portal led by Fermilab, as well as data from ESnet’s more than 60 perfSONAR deployments.

The next step is to integrate the tools so that they present information in a more user-friendly way. Useful network performance information can be collected from user logs, anomaly alerts, and measurement data from sources such as perfSONAR, PingER, and ESxSNMP (ESnet’s SNMP database).
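
To make the forecasting-and-detection idea in the list above more concrete, here is a minimal Python sketch of one generic approach: forecast each new sample with an exponentially weighted moving average (EWMA) and flag samples whose forecast error is much larger than the recent typical error. This is not the OnTimeDetect algorithm, whose details are not described here; the function name `detect_anomalies` and the parameters `alpha`, `k`, and `warmup` are hypothetical choices for illustration.

```python
from typing import List, Sequence


def detect_anomalies(samples: Sequence[float], alpha: float = 0.2,
                     k: float = 4.0, warmup: int = 5) -> List[int]:
    """Return indices of samples that deviate sharply from an EWMA forecast.

    Hypothetical parameters:
      alpha  -- smoothing weight for the forecast and for the typical error
      k      -- how many "typical errors" a sample may miss by before it is flagged
      warmup -- number of initial samples used only to settle the estimates
    """
    anomalies: List[int] = []
    forecast = None     # EWMA forecast of the next sample
    typical_err = 0.0   # EWMA of absolute forecast errors
    for i, x in enumerate(samples):
        if forecast is None:
            forecast = float(x)  # seed the forecast with the first sample
            continue
        error = x - forecast
        if i >= warmup and abs(error) > k * typical_err:
            anomalies.append(i)
        # Update both estimates after the check, so each test uses the prior forecast.
        forecast += alpha * error
        typical_err += alpha * (abs(error) - typical_err)
    return anomalies


if __name__ == "__main__":
    throughput_mbps = [900, 905, 898, 902, 901, 899, 903, 450, 900, 902]
    print(detect_anomalies(throughput_mbps))  # [7]
```

In this example, a throughput series hovering near 900 Mb/s is flagged only at its single dip to 450 Mb/s. Because the baseline keeps adapting after an alert, a sustained level shift would be flagged at its onset and then gradually absorbed into the forecast, which may or may not be what an operator wants.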

[Figure: Measuring the layers of network intelligence]

Active vs Passive Measurement

Calyam is also building a mechanism that will let network infrastructure providers monitor their networks continuously, using both active measurements of end-to-end performance and passive measurements collected by instrumentation deployed at strategic points in the network.
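
One common source of passive measurements is interface counters polled over SNMP. As an illustrative sketch (not the interface of ESxSNMP or any perfSONAR component), the snippet below turns two readings of a 64-bit octet counter, such as the IF-MIB `ifHCInOctets` object, into link utilization, using modular arithmetic to absorb a counter wraparound. The function names and example numbers are hypothetical.

```python
COUNTER64_MODULUS = 2 ** 64  # value range of a 64-bit SNMP counter such as ifHCInOctets


def octets_between(prev: int, curr: int, modulus: int = COUNTER64_MODULUS) -> int:
    """Octets counted between two polls; modular arithmetic absorbs one wraparound."""
    return (curr - prev) % modulus


def utilization(prev: int, curr: int, interval_s: float, link_bps: float,
                modulus: int = COUNTER64_MODULUS) -> float:
    """Fraction of link capacity used between two counter polls."""
    bits = octets_between(prev, curr, modulus) * 8
    return bits / (interval_s * link_bps)


if __name__ == "__main__":
    # 3.75 GB counted over 30 s on a 10 Gb/s link is about 10% utilization.
    print(utilization(1_000_000_000, 4_750_000_000, 30, 10e9))
```

An active test, by contrast, injects its own traffic (for example, a bwctl throughput test) to measure what the end-to-end path can deliver. Pairing the two views helps separate a busy link from a path that performs poorly even when lightly loaded.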

According to Calyam, network status cannot be assessed accurately without adaptive and random sampling techniques. His group is working to determine the optimal sampling frequency and distribution for monitoring networks and for forecasting and detecting anomalies.
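
The sketch below shows one way adaptive and random sampling can be combined, under assumptions of our own. The mean gap between measurements shrinks when recent samples vary a lot (measured by their coefficient of variation) and grows when they are stable. Each actual gap is then drawn from an exponential distribution, so that for a fixed mean the measurement times form a Poisson process, the style of sampling RFC 2330 discusses for avoiding synchronization with periodic network behavior. The names `mean_gap` and `next_gap` and all default values are hypothetical; this is not the scheme Calyam's group developed.

```python
import random
import statistics
from typing import Sequence


def mean_gap(recent: Sequence[float], base_s: float = 60.0, min_s: float = 10.0,
             max_s: float = 600.0, target_cv: float = 0.05) -> float:
    """Average seconds until the next measurement, adapted to recent variability.

    A coefficient of variation (stdev / mean) above target_cv shortens the gap;
    one below it lengthens the gap. The result is clamped to [min_s, max_s].
    """
    if len(recent) >= 2 and statistics.fmean(recent) != 0:
        cv = statistics.stdev(recent) / abs(statistics.fmean(recent))
        scale = target_cv / max(cv, 1e-9)  # avoid dividing by zero on flat data
    else:
        scale = 1.0  # not enough history: fall back to the base interval
    return min(max(base_s * scale, min_s), max_s)


def next_gap(recent: Sequence[float], rng: random.Random) -> float:
    """Draw an exponentially distributed gap whose mean is mean_gap(recent)."""
    return rng.expovariate(1.0 / mean_gap(recent))


if __name__ == "__main__":
    rng = random.Random(42)
    stable = [100.0, 100.0, 100.0, 100.0]
    volatile = [100.0, 50.0, 150.0, 60.0, 140.0]
    print(mean_gap(stable), mean_gap(volatile))  # 600.0 10.0
    print(round(next_gap(volatile, rng), 1))
```

Clamping only the mean, not each random draw, keeps the individual gaps exponentially distributed; clamping the draws as well would bound the gaps but break the Poisson property.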

“You can do sampling in a particular domain that you control, but the real challenge is in multi-network domains controlled by multiple entities,” Calyam comments. “For this approach to work, you need a federation of ISPs to share network performance information.” For example, members of the perfSONAR measurement federation (e.g., ESnet, Internet2, and GEANT) can share measurement topologies, policies, and measurement exchange formats for mutual troubleshooting.

His group is working on an application for scientists who operate instruments remotely and experience lag in their data. Suitable sampling can show users whether the lag they see in instrument control comes from the movement of physical instrument components or from network latency.
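
As a simplified illustration of this idea, assume that a control command and its feedback cross the network once in each direction, so one round-trip time (RTT) approximates the network's share of the lag. The hypothetical helper below takes the median of RTT samples collected around the same time (the median resists outlier probes), attributes the rest of the observed lag to the instrument, and reports which source dominates. The function name and the 50% threshold are assumptions for illustration only.

```python
import statistics
from typing import Sequence, Tuple


def attribute_lag(control_lag_ms: float, rtt_samples_ms: Sequence[float],
                  network_share: float = 0.5) -> Tuple[float, float, str]:
    """Split observed control lag into network and instrument parts (hypothetical model).

    Assumes one network round trip per command/feedback cycle, so the median RTT
    measured around the same time stands in for the network's share of the lag.
    """
    network_ms = statistics.median(rtt_samples_ms)  # median resists outlier probes
    instrument_ms = max(control_lag_ms - network_ms, 0.0)
    cause = "network" if network_ms >= network_share * control_lag_ms else "instrument"
    return network_ms, instrument_ms, cause


if __name__ == "__main__":
    # 400 ms of lag but RTTs near 40 ms: most of the delay is in the instrument.
    print(attribute_lag(400.0, [40.0, 42.0, 38.0, 300.0, 41.0]))
    # (41.0, 359.0, 'instrument')
```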

However, perfSONAR cannot yet handle strict sampling patterns. It is engineered for ease of use, which comes at the cost of sampling precision and sophistication. Its current tools (ping, traceroute, owamp, and bwctl) can conflict with one another, or with other active measurement tools, when run concurrently. Calyam advocates a meta-scheduling function to control measurement tools, along with new regulation policies and semantic priorities. His group is also building several model user interfaces, including a GUI tool, a Twitter publishing API, the Google charts that perfSONAR uses, and Graphite charts.
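
At its simplest, a meta-scheduler of the kind Calyam advocates could serialize active tests on a path so that no two run at once. The greedy Python sketch below is one hypothetical way to do that: it places requests in priority order, each into the earliest free slot at or after its requested start time. The `MeasurementRequest` type, its fields, and the priority values are invented for illustration and are not part of perfSONAR.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class MeasurementRequest:
    """A hypothetical request to run one active test on a shared path."""
    tool: str            # label only, e.g. "owamp" or "bwctl"
    earliest_s: float    # earliest acceptable start, in seconds from now
    duration_s: float
    priority: int        # higher values are placed first


def schedule(requests: List[MeasurementRequest]) -> List[Tuple[str, float, float]]:
    """Greedily assign non-overlapping (tool, start, end) slots, highest priority first."""
    busy: List[Tuple[float, float, str]] = []
    for req in sorted(requests, key=lambda r: (-r.priority, r.earliest_s)):
        start = req.earliest_s
        for slot_start, slot_end, _ in sorted(busy):
            if start + req.duration_s <= slot_start:
                break  # the test fits in the gap before this busy slot
            start = max(start, slot_end)  # otherwise try just after the slot
        busy.append((start, start + req.duration_s, req.tool))
    return [(tool, s, e) for s, e, tool in sorted(busy)]


if __name__ == "__main__":
    plan = schedule([
        MeasurementRequest("bwctl", 0, 20, priority=1),
        MeasurementRequest("owamp", 0, 10, priority=5),
        MeasurementRequest("traceroute", 5, 2, priority=3),
    ])
    for tool, start, end in plan:
        print(f"{tool:<10} {start:>5} -> {end}")
```

With these example requests, the high-priority owamp test runs first and the traceroute and bwctl tests are pushed back until the path is free. A real meta-scheduler would also have to honor the regulation policies and semantic priorities mentioned above, which this sketch does not model.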

Calyam’s group was the first to query perfSONAR measurements on 480 paths across 65 sites worldwide. So far, the group has developed an adaptive anomaly detection algorithm, demonstrated a new adaptive sampling scheme, and released a set of algorithms to the perfSONAR user and developer community (http://ontimedetect.oar.net). The tools are also available on the perfSONAR website.