Light Bytes: Blogging for Science

Using perfSONAR to Find Network Anomalies

Last week ESnet and Berkeley Lab Computing Sciences sponsored a talk on network anomaly detection by Prasad Calyam, Ph.D., of the Ohio Supercomputer Center/OARnet and The Ohio State University. For the last year and a half, Calyam has worked on a DOE-funded perfSONAR-related project. He emphasized that accurate measurement is necessary for troubleshooting across multiple domains and layers. Calyam’s group is developing metrics and adaptive performance sampling techniques to analyze network performance across multiple layers, with the aim of improving network status awareness and performance and determining optimal paths for large data sets.

To do active measurements you need intelligent sampling, Calyam says, but the types of measurements necessary are difficult to accomplish due to policies and other constraints.

Calyam’s group is currently developing two new perfSONAR tools:

Measuring the layers of network intelligence

Active vs Passive Measurement

Calyam is also building a mechanism that will enable network infrastructure providers to continuously check networks using both active measurements of end-to-end performance and passive measurements collected by instrumentation deployed at strategic points in a network.

According to Calyam, one cannot accurately assess network status without adaptive and random sampling techniques. Calyam’s group is trying to determine the optimal sampling frequency and distribution to monitor networks and to forecast and detect anomalies.
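As a rough illustration of what adaptive, randomized sampling can look like, here is a minimal Python sketch. It is not the algorithm Calyam's group developed. It assumes a simple z-score test against recent history and a probe interval that is halved after an anomaly and lengthened otherwise. The function names, thresholds, and interval bounds are all hypothetical choices made for this example.

```python
import random
import statistics


def is_anomalous(history, value, threshold=3.0):
    """Flag `value` if it is more than `threshold` standard deviations
    from the mean of `history` (a plain z-score test, assumed here)."""
    if len(history) < 5:
        return False  # too little data to judge
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold


def next_interval(current, anomalous, min_s=10.0, max_s=600.0, rng=random):
    """Pick the delay (seconds) before the next probe.

    Halve the interval after an anomaly, otherwise back off by 1.5x.
    Apply +/-10% random jitter so probes do not fall into a fixed
    pattern, then clamp the result to [min_s, max_s].
    """
    base = current / 2 if anomalous else current * 1.5
    jittered = base * rng.uniform(0.9, 1.1)
    return max(min_s, min(max_s, jittered))
```

With a history of one-way delay samples near 10 ms, a new sample of 50 ms would be flagged, and the next probe would be scheduled roughly twice as soon as the previous one.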

“You can do sampling in a particular domain that you control, but the real challenge is in multi-network domains controlled by multiple entities,” Calyam comments. “For this approach to work, you need a federation of ISPs to share network performance information.” For example, the perfSONAR measurement federation (e.g., ESnet, Internet2, and GEANT) can share measurement topologies, policies, and measurement exchange formats for mutual troubleshooting.

His group is working on an application for scientists who are using instrumentation remotely and experiencing lag in data. Suitable sampling can indicate to users whether the lag in instrument control they are experiencing is due to lags in the movement of physical instrument components, or due to network latency.

However, perfSONAR cannot yet handle strict sampling patterns. It is engineered for ease of use, but that means a trade-off in sampling precision and sophistication. Its current set of tools (ping, traceroute, owamp, and bwctl) can potentially conflict with each other, or with any other active measurement tools running concurrently. Calyam advocates a meta-scheduling function to control measurement tools, as well as new regulation policies and semantic priorities. His group is also building some model user interfaces, including a GUI tool, a Twitter publishing API, and charts based on Google charts, which perfSONAR uses, as well as Graphite charts.
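To make the meta-scheduling idea concrete, here is a small Python sketch of one way to keep active tests from overlapping on a host. It is not the scheduler Calyam proposes, nor anything shipped with perfSONAR. The `Request` class, its fields, and the greedy highest-priority-first policy are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    tool: str          # label only, e.g. "bwctl" or "owamp"
    earliest: float    # earliest acceptable start time, in seconds
    duration: float    # how long the test occupies the host, in seconds
    priority: int = 0  # higher priority wins when requests contend


def schedule(requests):
    """Return (start, end, tool) slots in which no two tests overlap.

    Greedy policy (an assumption of this sketch): at each point in time,
    run the highest-priority request that is already eligible; if none
    is eligible, jump ahead to the next earliest start time.
    """
    pending = list(requests)
    slots = []
    now = 0.0
    while pending:
        ready = [r for r in pending if r.earliest <= now]
        if not ready:
            now = min(r.earliest for r in pending)
            continue
        chosen = max(ready, key=lambda r: r.priority)
        slots.append((now, now + chosen.duration, chosen.tool))
        now += chosen.duration
        pending.remove(chosen)
    return slots
```

For instance, a 20-second throughput test and a 5-second latency test that both want to start at time zero are run back to back, with the higher-priority one first, rather than concurrently where they could skew each other's results.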

Calyam’s group was the first to query perfSONAR measurements on 480 paths and 65 sites worldwide. The group has so far developed an adaptive anomaly detection algorithm, demonstrated a new adaptive sampling scheme, and released a set of algorithms to the perfSONAR user and developer community at http://ontimedetect.oar.net. Tools are also available at the perfSONAR website.
