Last week ESnet and Berkeley Lab Computing Sciences sponsored a talk on network anomaly detection by Prasad Calyam, Ph.D., from the Ohio Supercomputer Center/OARnet and The Ohio State University. For the last year and a half, Calyam has worked on a DOE-funded perfSONAR-related project. He emphasized that accurate measurement is necessary for troubleshooting across multiple domains and layers. Calyam’s group is developing metrics and adaptive performance sampling techniques to analyze network performance across multiple layers, with the aim of improving network status awareness and performance and determining optimal paths for large data sets.
To do active measurements you need intelligent sampling, Calyam says, but the types of measurements necessary are difficult to accomplish due to policies and other constraints.
Calyam’s group is currently developing two new perfSONAR tools:
- The OnTimeDetect Tool detects network anomalies. It leverages perfSONAR lookup services to query for projects or sites, and then uses the perfSONAR Measurement Archives to obtain data on the path of interest. The tool uses this data to accurately and rapidly detect anomalies as they occur.
- The OnTimeSample Tool does intelligent forecasting to manipulate and plan network infrastructure.
Both tools are being validated with enterprise monitoring data from the E-Center portal led by Fermilab as well as ESnet’s more than 60 perfSONAR deployments. The next step is to integrate the tools to present information in a more user-friendly way. Useful network performance information can be collected from user logs, anomaly alerts, and measurement data from applications such as perfSONAR, PingER, and ESxSNMP (ESnet’s SNMP database).
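To make the idea of flagging anomalies in a measurement series concrete, here is a minimal Python sketch. It is not the OnTimeDetect algorithm, which this summary does not describe. It assumes a simple trailing-window rule instead: a sample is flagged when it deviates from the mean of the previous few samples by more than a set number of standard deviations. The function name, the window size, the threshold, and the throughput values are all hypothetical.

```python
from statistics import mean, stdev


def detect_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples that deviate from the trailing-window
    mean by more than `threshold` sample standard deviations.

    Illustrative rule only; not the method used by OnTimeDetect.
    """
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all stands out.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) > threshold * sigma:
            anomalies.append(i)
    return anomalies


# Made-up throughput readings in Mbit/s with one sharp drop.
throughput_mbps = [940, 938, 942, 941, 939, 940, 410, 941, 939]
print(detect_anomalies(throughput_mbps))
```

With these made-up readings only the drop to 410 is flagged. The samples that follow it are not, because the outlier inflates the standard deviation of the trailing window, which is one reason a fixed rule like this is only a starting point.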
Active vs Passive Measurement
Calyam is also building a mechanism that will enable network infrastructure providers to continuously check networks using both active measurements of end-to-end performance and passive measurements collected by instrumentation deployed at strategic points in a network.
According to Calyam, one cannot accurately assess network status without adaptive and random sampling techniques. Calyam’s group is trying to determine the optimal sampling frequency and distribution to monitor networks and to forecast and detect anomalies.
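As a rough illustration of what adapting the sampling frequency can mean, the Python sketch below shortens the probe interval when recent measurements vary a lot and lengthens it when they are stable. This is an assumed rule chosen for simplicity, not the scheme developed by Calyam’s group, and it leaves out the random component mentioned above. The function name, the interval bounds, and the variability threshold are hypothetical.

```python
from statistics import mean, pstdev


def next_interval(recent, current, min_s=60.0, max_s=3600.0, cv_threshold=0.05):
    """Choose the next probe interval in seconds from recent measurements.

    Halve the interval when the coefficient of variation of `recent`
    exceeds `cv_threshold`, otherwise double it, then clamp the result
    to the range [min_s, max_s]. Illustrative rule only.
    """
    mu = mean(recent)
    # Treat a zero mean as maximally variable to avoid dividing by zero.
    cv = pstdev(recent) / mu if mu else float("inf")
    proposed = current / 2 if cv > cv_threshold else current * 2
    return max(min_s, min(max_s, proposed))


# Stable path: back off and probe less often.
print(next_interval([940, 941, 939, 940], current=600.0))
# Unstable path: probe more often.
print(next_interval([940, 410, 941, 939], current=600.0))
```

The clamp keeps a noisy path from driving the probe rate up without bound, which matters because active measurements add load to the network being measured.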
“You can do sampling in a particular domain that you control, but the real challenge is in multi-network domains controlled by multiple entities,” Calyam comments. “For this approach to work, you need a federation of ISPs to share network performance information.” For example, members of the perfSONAR measurement federation (e.g., ESnet, Internet2 and GEANT) can share measurement topologies, policies, and measurement exchange formats for mutual troubleshooting.
Calyam’s group was the first to query perfSONAR measurements on 480 paths and 65 sites worldwide. The group has so far developed an adaptive anomaly detection algorithm, demonstrated a new adaptive sampling scheme, and released a set of algorithms to the perfSONAR user and developer community at http://ontimedetect.oar.net. Tools are also available at the perfSONAR website.