Charting a resilient path for the future

The ESnet Site Coordinating Committee (ESCC) meeting on 22-23 October was attended by over 50 members representing all of the major DOE sites and projects supported by our team. This was the first ESCC meeting held via Zoom.

The meeting focused on network resiliency, covering both lessons learned from adapting to working from home and longer-term plans for ESnet6.

A highlight of the sessions was ESnet director Inder Monga’s presentation on how we ensured operational continuity during the pandemic.

Inder Monga presents on ESnet’s support for the DOE complex during the pandemic

The DOE ESnet program manager, Ben Brown, provided a vision for future research opportunities as well as future operational needs for the nation’s scientific complex.

DOE’s Ben Brown presents on future objectives and priorities

Attendees identified several key activities for ESCC collaboration as part of advancing shared resilience goals. Foremost among these is the creation of a working group to develop improved metrics for ESnet resilience, to identify ways that resilience features can be better incorporated into infrastructure funding and planning, and to establish better ways to engage scientific programs in risk management processes.

ESnet thanks ESCC participants for attending and we look forward to returning to in-person ESCC meetings in future!

A sample of attendees – the first ESCC via Zoom

Zeekurity at ESnet

Zeek is open source network security monitoring software used extensively by ESnet. Zeek (formerly called Bro) was initially developed by researchers at Berkeley Lab. It lets users identify and manage cyber threats by monitoring network traffic. As a passive network security monitor (NSM), it gives a holistic view of what is happening on the network and visibility into its traffic.

To better understand network behavior and provide flexible security services, we use Zeek as an important part of our data center security architecture, and we are experimenting with placing Zeek clusters at various high-value points on the WAN. This work is yielding technical insights as well as posing significant challenges.

In this post we present some of our efforts to approach WAN security using Zeek for network monitoring, including the successes, the challenges we hit along the way, and the interesting things we learned.

Zeek on the ESnet LAN:

Monitoring local area and data center networks is a familiar and comparatively simple traffic monitoring design, and ESnet's setup is no different. LAN traffic is currently monitored by two Zeek clusters, one at Brookhaven National Lab and another for the west coast at Berkeley Lab. We have implemented black hole routing (BHR) on our data center routers to block external actors that violate our established policies, based on Zeek detections, on both the IPv4 and IPv6 protocol stacks.
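As a rough illustration of the dual-stack aspect of black hole routing, the sketch below turns a single offending address into the host-route prefix a router would discard traffic for (a /32 for IPv4, a /128 for IPv6). This is a minimal sketch and not ESnet's implementation: the function name `blackhole_prefix` and the `PROTECTED_NETWORKS` allowlist are hypothetical, the networks shown are documentation ranges, and how a prefix is actually announced to the routers is out of scope here.

```python
import ipaddress

# Hypothetical allowlist: networks that must never be black-holed,
# for example a site's own address space (documentation ranges shown).
PROTECTED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8:1::/48"),
]


def blackhole_prefix(address):
    """Return the host-route prefix to black-hole, or None if protected.

    Works for both IPv4 and IPv6: the prefix length is taken from the
    address family, so IPv4 yields a /32 and IPv6 yields a /128.
    """
    ip = ipaddress.ip_address(address)
    # Membership tests across address families simply return False.
    if any(ip in net for net in PROTECTED_NETWORKS):
        return None
    return f"{ip}/{ip.max_prefixlen}"
```

Checking the allowlist before emitting a prefix is a common safety measure in automated blocking, since a false detection should not be able to cut off a site's own hosts.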

Apart from network security monitoring using “standard” Zeek detection scripts, many enhancements and custom scripts written by ESnet Security team members play a vital role in detecting various kinds of suspicious activity. Recently, a Zeek package, Zeek-Known-outbound, contributed by Michael “Dop” Dopheide, won first prize in Zeek Package Contest-2, held in May 2020. The package tracks and alerts on outbound service usage to a list of ‘watched’ countries. It also adds country codes for the origin and recipient hosts to conn.log, the log file in which Zeek records all connection attempts seen on the network. The motivation for this work came from routine log analysis, which revealed a few systems contacting hosts in foreign countries for package updates and DNS services.
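To make the idea concrete, here is a small offline sketch of the same kind of check, run over conn.log records written in Zeek's JSON format. It is not the package itself, which is a Zeek script running inside Zeek. The field names `id.orig_h`, `id.resp_h`, `id.resp_p`, `local_orig` and `local_resp` are standard conn.log fields; the country-code field name `resp_cc` is an assumption about what the package adds, and the watched codes are placeholders.

```python
import json

# Placeholder country codes; a real deployment would configure its own list.
WATCHED_COUNTRIES = {"XX", "YY"}


def watched_outbound(lines, watched=WATCHED_COUNTRIES):
    """Return outbound connections whose responder is in a watched country.

    `lines` is an iterable of JSON-formatted conn.log records, one per line.
    `resp_cc` is an assumed name for the responder country-code field.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        # Outbound: originated locally, answered by a non-local host.
        outbound = rec.get("local_orig") and not rec.get("local_resp", False)
        if outbound and rec.get("resp_cc") in watched:
            hits.append(
                (rec["id.orig_h"], rec["id.resp_h"], rec["id.resp_p"], rec["resp_cc"])
            )
    return hits
```

Passing an open log file to `watched_outbound` yields one tuple per matching connection, which could then be grouped by responder port to see which services are being used.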

Zeek on the ESnet WAN:

To extend our LAN efforts to a wider scale, we have been experimenting with using Zeek to monitor traffic on the WAN side of the network, in order to gain more visibility and provide improved security and network services. Most of this work is experimental: the design changes iteratively as we apply what we learn at each stage, from stage 1 to stage 3 and beyond.

  • Some notable differences and challenges compared with a typical LAN:
    • Data Volume: There are a large number of WAN links, running at speeds from 1 to 400 Gb/s.
    • Data Encapsulation: Data with variable-length headers is problematic, so we have been employing a load balancer to address this problem.
    • Asymmetric Data Flows: This is a hard problem to solve, especially when the network is distributed across the country. When the packets of the inbound and outbound flows between two network nodes follow different paths, it can be challenging to reconcile the two halves of a conversation as part of network monitoring.
    • Technical Integration: Coordinating activities between geographically distributed teams introduces challenges, which we are developing ways to overcome.
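To illustrate the asymmetric flow problem, the sketch below shows one generic way to pair up the two directions of a conversation when each direction is seen by a different sensor: derive a direction-independent key from the 5-tuple and group observations by it. This is an illustrative assumption, not the approach ESnet is using; the record layout and the names `flow_key` and `reconcile` are hypothetical.

```python
from collections import defaultdict


def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Return a key that is identical for both directions of a flow."""
    # Sorting the two endpoints removes the notion of direction.
    low, high = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (proto, low, high)


def reconcile(observations):
    """Group one-directional observations from many sensors by flow."""
    flows = defaultdict(lambda: {"sensors": set(), "packets": 0})
    for obs in observations:
        key = flow_key(
            obs["src_ip"], obs["src_port"],
            obs["dst_ip"], obs["dst_port"], obs["proto"],
        )
        flows[key]["sensors"].add(obs["sensor"])
        flows[key]["packets"] += obs["packets"]
    return dict(flows)
```

A flow whose `sensors` set contains more than one entry was observed on different paths. Grouping records after the fact is the easy part; the harder problem in practice is getting both halves of a conversation to the same analysis point in time for stateful inspection, which this sketch does not attempt.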

At ESnet we strive to push boundaries and try innovative ways to address challenges, and Zeek on the WAN is an example of that. In my next article I will discuss some of the ways we have been experimenting with to address the complex problems noted above, going into detail on the research being done on asymmetric data flows on the WAN.