Three questions with a new staff member: James Kafader with Software Engineering.

Please welcome James Kafader to ESnet! James comes to us from Internet Archive (IA), where he worked on the Archive-It team, which develops and maintains a turnkey archiving platform. Archive-It partners with external institutions and national libraries to capture data on their behalf. It is essentially the project incubator at IA and is focused on high-quality, large-scale archiving. The data collected by Archive-It represents about 30% of the available captures in the global Wayback Machine.

Question 1: What brought you to ESnet?

In 2020, I spent a lot of time thinking about the interconnectedness of natural systems, and how they relate to the earth’s climate. It strikes me that it’s imperative, as a planet and nation, to focus on reducing the impact of climate change in short order. This line of thinking led me to dedicate my time to science, which could have a positive impact on the global climate.

Question 2: What is the most exciting thing going on in your field right now?

This is a good question. I consider myself very much a generalist in terms of how I approach software development, as well as in my overall view of reality. My view of computational systems is very conservative as well — I like to understand the algorithms involved with any new technology as intimately as possible before selecting it for use. I’d say in many ways that the most exciting thing going on in my field is renewed interest in how large-scale systems affect equitability for their participants; that is, how the networks, systems, and structures that we build affect outcomes for each of us.

Question 3: What book would you recommend?

I recently read Breath by James Nestor. It was an engaging read and helped a lot with my mood and stability, even if it is not the most scientifically accurate thing I’ve ever read. Another favorite is Difficult Conversations by Sheila Heen, Douglas Stone, and Bruce Patton.

Creating the Tokamak Superfacility: Fusion with the ScienceDMZ

5.5 Questions with Eli Dart (ESnet), C.S. Chang, and Michael Churchill (PPPL)

In 2025, when the International Thermonuclear Experimental Reactor (ITER) generates “first plasma,” it will be the culmination of almost 40 years of effort. Started in 1985, the project has grown to include the scientific talents of seven members (China, EU, India, Japan, Korea, Russia, and the US, with EU membership bringing the total to 35 countries). If successful, it will mark the first time that a large-scale fusion reactor generates more thermal power than is used to heat isotopes of hydrogen gas to a plasma state.

ESnet is supporting this international scientific community as it pursues this dream of limitless, clean energy. When operational at full capacity, ITER will generate approximately a petabyte of data per day, much of which will need to be analyzed and fed back in near real-time to optimize the fusion reaction and manage distribution of data to a federated framework of geographically distributed “remote control rooms” (RCRs). To prepare for this demanding need to distribute both data and analytics, ESnet’s Eli Dart and the Princeton Plasma Physics Laboratory’s (PPPL) Michael Churchill and C.S. Chang recently co-authored a study of a test exercise performed with collaborators at Pacific Northwest National Laboratory (PNNL), PPPL, Oak Ridge National Laboratory (ORNL), and with the Korean KREONET, KSTAR, National Fusion Research Institute, and the Ulsan National Institute of Science and Technology. This study successfully demonstrated the use of ESnet and the ScienceDMZ architecture as part of trans-Pacific large data transfer, and near real-time movie creation and analysis of the KSTAR electron cyclotron emission images, via links between multiple paths at high sustained speeds.

Q 1: This was a complex test, involving several sites and analytic workflows.  Can you walk our readers through the end-to-end workflow? 

End-to-end workflow of the demonstration comparing real-time streaming data from the KSTAR ECEI diagnostic to side-by-side movie from XGC1 gyrokinetic turbulence code.

Eli Dart: The data were streamed from a system at KSTAR, encoded into ADIOS format, streamed to PPPL, rendered into movie frames, and visualized at PPPL. One of the key attributes of this workflow is that it is a streaming workflow. Specifically, this means that the data passes through the workflow steps (encoding in ADIOS format, transfer, rendering movie frames, showing the movie) without being written to non-volatile storage. This allows for performance improvements, because no time is spent on storage I/O. It also removes the restriction of storage allocations from the operation of the workflow – only the final data products need to be stored (if desired). 
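As a rough illustration of the streaming property described above, the following Python sketch chains generator stages so that each frame flows from source to display entirely in memory. The stage names (`read_frames`, `encode`, `render`) and the text-based "encoding" are hypothetical stand-ins; this is not the ADIOS API and not the code used in the demonstration.

```python
from typing import Iterator, List


def read_frames(n_frames: int, width: int = 4) -> Iterator[List[float]]:
    """Hypothetical data source: yields one small frame of samples at a time."""
    for i in range(n_frames):
        yield [float(i + j) for j in range(width)]


def encode(frames: Iterator[List[float]]) -> Iterator[bytes]:
    """Stand-in for the encoding step (NOT the ADIOS format): pack as text bytes."""
    for frame in frames:
        yield ",".join(f"{x:.1f}" for x in frame).encode("ascii")


def render(packets: Iterator[bytes]) -> Iterator[str]:
    """Stand-in for rendering: decode each packet and summarize it as a 'movie frame'."""
    for packet in packets:
        values = [float(v) for v in packet.decode("ascii").split(",")]
        yield f"frame mean={sum(values) / len(values):.2f}"


def run_pipeline(n_frames: int) -> List[str]:
    """Chain the stages lazily; no intermediate result is written to disk."""
    return list(render(encode(read_frames(n_frames))))


if __name__ == "__main__":
    for line in run_pipeline(3):
        print(line)
```

Because each stage pulls one item at a time from the stage before it, only the final products need to be kept, which mirrors the point about storage allocations made above.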

Q 2: A big portion of this research supports the idea of federated, near real-time analysis of data.  In order to make these data transfers performant, flexible, and adaptable enough to meet the requirements for a future ITER RCR, you had to carefully engineer and coordinate with many parties.  What was the hardest part of this experiment, and what lessons does it offer ITER?

Eli Dart: It is really important to ensure that the network path is clean. By “clean” I mean that the network needs to provide loss-free IP service for the experiment traffic. Because the fusion research community is globally distributed, the data transfers cover long distances, which greatly magnifies the negative impact of packet loss on transfer performance. Test and measurement (using perfSONAR) is very important to ensure that the network is clean, as is operational excellence to ensure that problems are fixed quickly if they arise. KREONET is an example of a well-run production network – their operational excellence contributed significantly to the success of this effort.
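To see why distance magnifies the cost of packet loss, it can help to plug numbers into the widely used Mathis et al. approximation for steady-state TCP throughput, rate ≤ (MSS / RTT) · (C / √p). This formula does not come from the interview; it is a standard rule of thumb, and the MSS, round-trip times, and loss rate below are illustrative assumptions rather than measurements from the KSTAR to PPPL path.

```python
import math

MATHIS_C = math.sqrt(3.0 / 2.0)  # constant from the Mathis et al. model (about 1.22)


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on TCP throughput in bits per second.

    rate <= (MSS / RTT) * (C / sqrt(p))
    """
    if rtt_s <= 0 or not (0 < loss_rate <= 1):
        raise ValueError("rtt_s must be > 0 and loss_rate in (0, 1]")
    return (mss_bytes * 8 / rtt_s) * (MATHIS_C / math.sqrt(loss_rate))


if __name__ == "__main__":
    mss = 1460    # assumed MSS in bytes (typical for a 1500-byte MTU)
    loss = 1e-5   # assumed loss rate: 1 packet in 100,000
    paths = [
        ("campus, 1 ms", 0.001),
        ("cross-country, 70 ms", 0.070),
        ("trans-Pacific, 150 ms", 0.150),
    ]
    for label, rtt in paths:
        gbps = mathis_throughput_bps(mss, rtt, loss) / 1e9
        print(f"{label:>22}: <= {gbps:.3f} Gbit/s")
```

The bound scales as 1/RTT, so under this model the same loss rate that is barely noticeable on a 1 ms campus path lowers the ceiling by a factor of 150 on a 150 ms path.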

Q 3: One of the issues you had to work around was a firewall at one institution.  What was involved in working with their site security, and how should those working with Science DMZ work through these issues?

Eli Dart: Building and operating a Science DMZ involves a combination of technical and organizational work. Different institutions have different policies and need different levels of assurance, depending on the nature of the work being done on the Science DMZ. The key is to understand that security policy is there for a reason, and to work with the parties involved in the context that makes sense from their perspective. Then, it’s just a matter of working together to find a workable solution that preserves safety from a cybersecurity perspective and also allows the science mission to succeed.

Q 4: How did you build this collaboration, and how did you keep everyone on the same page? Is there any advice you can offer other experiments facing the same need to coordinate multi-national efforts?

Eli Dart: From my perspective, this result demonstrates the value of multi-institution, multi-disciplinary collaborations for achieving important scientific outcomes. Modern science is complex, and we are increasingly in a place where only teams can bring all the necessary expertise to bear on a complex problem. The members of this team have worked together in smaller groups on a variety of projects over the years – those relationships were very valuable in achieving this result.

Q 5: In the paper you present a model for a federated remote framework workflow. Looking beyond ITER, are there other applications you can see for the lessons learned from this experiment?

C.S. Chang: Lessons learned from this experiment can be applied to many other distributed scientific, industrial, and commercial applications which require collaborative data analysis and decision making.  We do not need to look too far.  Expensive scientific studies on exascale computers will most likely be collaborative efforts among geographically distributed scientists who want to analyze the simulation data and share/combine the findings in near-real-time for speedy scientific discovery and for steering of ongoing or next simulations.  The lessons learned here can influence the remote collaboration workflow used in high energy physics, climate science, space physics, and others.

Q 5.5: What’s next? You mention quite a number of possible follow-on activities in the paper. Which of these most interest you, and what might follow?

Michael Churchill: Continued work by this group has led to the recently developed open-source Python framework, DELTA, for streaming data from experiments to remote compute centers. It uses ADIOS for streaming over wide-area networks and, on the receiver side, asynchronous Message Passing Interface (MPI) to process the data streams in parallel. We’ve used this to stream data from KSTAR to the NERSC Cori supercomputer and complete a spectral analysis in parallel in less than 10 minutes, which would normally take 12 hours in serial. Frameworks such as this, which connect experiments to remote high-performance computers, will expand the quality and quantity of analysis workflows that experimental scientists can run. It’s exciting to see how this can help accelerate the progress of science around the world.

Congratulations on your success! This is a significant step forward in building the data management capability that ITER will need.  

Graduate students publish on network telemetry with ESnet

Two graduate students working with ESnet have published their papers recently in IEEE and ACM workshops.

Bibek Shrestha, a graduate student at the University of Nevada, Reno, and his advisor Engin Arslan worked with Richard Cziva from ESnet to publish a paper titled “INT Based Network-Aware Task Scheduling for Edge Computing”. In the paper, Bibek investigated the use of in-band network telemetry (INT) for real-time in-network task scheduling. Bibek’s experimental analysis using various workload types and network congestion scenarios revealed that enhancing task scheduling of edge computing with high-precision network telemetry can lead to up to a 40% reduction in data transfer times and up to a 30% reduction in total task execution times by favoring edge servers in uncongested (or mildly congested) sections of the network when scheduling tasks. The paper will appear in the 3rd Workshop on Parallel AI and Systems for the Edge (PAISE), co-located with the IEEE IPDPS 2021 conference, to be held on May 21st, 2021, in Portland, Oregon.
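The general idea of favoring edge servers on uncongested paths can be sketched as follows. This is a minimal illustration, not the algorithm from the paper: the `EdgeServer` fields, the cost model, and the example numbers are all hypothetical, and simply assume that telemetry provides a per-path bandwidth and queuing-delay estimate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EdgeServer:
    name: str
    available_bw_mbps: float  # hypothetical: path bandwidth inferred from telemetry
    queue_delay_ms: float     # hypothetical: queuing delay observed along the path
    compute_time_s: float     # hypothetical: expected execution time on this server


def estimated_completion_s(server: EdgeServer, task_size_mb: float) -> float:
    """Transfer time + path queuing delay + compute time, in seconds."""
    transfer_s = (task_size_mb * 8) / server.available_bw_mbps
    return transfer_s + server.queue_delay_ms / 1000.0 + server.compute_time_s


def pick_server(servers: List[EdgeServer], task_size_mb: float) -> EdgeServer:
    """Choose the server with the earliest estimated finish time."""
    if not servers:
        raise ValueError("no edge servers available")
    return min(servers, key=lambda s: estimated_completion_s(s, task_size_mb))


if __name__ == "__main__":
    servers = [
        EdgeServer("edge-a", available_bw_mbps=100.0, queue_delay_ms=40.0, compute_time_s=2.0),
        EdgeServer("edge-b", available_bw_mbps=900.0, queue_delay_ms=1.0, compute_time_s=2.5),
    ]
    # A large input favors the uncongested path even though edge-b computes more slowly.
    print(pick_server(servers, task_size_mb=500.0).name)
    # A tiny input is dominated by compute time, so the faster server wins.
    print(pick_server(servers, task_size_mb=0.1).name)
```

In this toy model the network term dominates for data-heavy tasks, which is the situation where telemetry-informed placement would be expected to matter most.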

Zhang Liu, a former ESnet intern and a current graduate student at the University of Colorado at Boulder, worked with the ESnet High Touch Team – Chin Guok, Bruce Mah, Yatish Kumar, and Richard Cziva – on fastcapa-ng, ESnet’s telemetry processing software. In the paper “Programmable Per-Packet Network Telemetry: From Wire to Kafka at Scale,” Zhang showed the scaling and performance characteristics of fastcapa-ng, and highlighted the most critical performance considerations that allow the pushing of 10.4 million telemetry packets per second to Kafka with only 5 CPU cores, which is more than enough to handle 170 Gbit/s of original traffic with 1512B MTU. This paper will appear in the 4th International Workshop on Systems and Network Telemetry and Analytics (SNTA 2021), held at the ACM HPDC 2021 conference in Stockholm, Sweden, on 21-25 June 2021.

Congratulations Bibek and Zhang!

If you are a networked systems research student looking to collaborate with us on network measurements, please reach out to Richard Cziva. If you are interested in a summer internship with ESnet, please visit this page.

Arecibo Data Recovery: Behind the Scenes with Jason Zurawski

The dramatic collapse of the Arecibo Observatory Radio Telescope in Puerto Rico in December was a terrible loss for global science. The 305-meter dish had served for over 50 years, supporting a wide range of cosmic and earth science applications, including transmission of the famed “Arecibo Message” to globular star cluster M13 by a team led by Frank Drake and Carl Sagan in 1974.

When the 900-ton instrument platform crashed onto the observatory dish, the National Science Foundation was faced with a variety of challenges. Most immediately, how to ensure that several petabytes of historic (and now irreplaceable) data at the Arecibo Observatory (AO) data center, in the form of tapes, hard drives, and other physical media, could be preserved and moved off-site as an approximately $50M site cleanup and environmental remediation project begins to demobilize the iconic observatory.

This data recovery effort has required rapid mobilization of a team from the University of Central Florida (UCF), the Texas Advanced Computing Center  (TACC), the University of Puerto Rico (UPR), the University of Chicago, and others. A more detailed description of this overall effort has just been released here. In this blog, I will describe the key role that ESnet and the Engagement and Performance Operations Center (EPOC) played in this effort to save valuable scientific data.

My colleagues Hans Addleman (Indiana University International Networks), George Robb, and I became part of science use case discussions with AO and UCF as part of an ESnet requirements review and EPOC Deep Dive support to Arecibo in early 2020. In the summer of 2020, these efforts became much more active after the first suspension cable failed and AO began activities to migrate data storage and processing to a commercial cloud. We provided support to the Arecibo team for data movement hardware and software deployment.

With the failure of a second cable in November 2020, it became apparent that the facility had become unstable; this increased pressure on the team to find a faster solution. The UCF site management team decided that migration to the commercial cloud over the available 1Gbps connection (a previous 10Gbps connection was damaged by Hurricane Maria in 2017) would not meet requirements, so another data migration strategy was needed.
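A back-of-the-envelope calculation shows why the 1Gbps link was a bottleneck. The sketch below assumes a hypothetical 3 PB data set (the text says only "several petabytes") and an assumed 80% sustained link efficiency; both numbers are illustrative, not figures from the recovery effort.

```python
def transfer_days(size_pb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Days needed to move size_pb petabytes (decimal, 10**15 bytes) over a link."""
    if link_gbps <= 0 or not (0 < efficiency <= 1):
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    bits = size_pb * 1e15 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86_400


if __name__ == "__main__":
    size_pb = 3.0  # assumed data set size
    for gbps in (1, 10):
        print(f"{gbps:>2} Gbit/s: about {transfer_days(size_pb, gbps):.0f} days")
```

Under these assumptions the transfer would take roughly 347 days at 1Gbps versus roughly 35 days at 10Gbps, which illustrates why driving storage to better-connected sites was attractive.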

By December, the team had developed an alternative data migration approach leveraging a timely offer of storage capacity at TACC. Because of the urgency, the team decided to move data using physical Network Attached Storage (NAS) appliances; data on tapes and other original sources were loaded onto the NAS at Arecibo. The NAS were then driven to data centers on the island: either the UPR campuses at Mayagüez or Río Piedras, or a commercial data center, each of which was connected to the global R&E network at 10Gbps. Using Globus data transfer software, the AO team then began transferring the data to TACC. Using multiple devices, and by setting up a constantly moving supply line, they were able to fill a disk, transport it to the better-connected locations, start a transfer, take back a completed disk, and return to the AO data center to start the process all over.

EPOC team members (specifically George) spent a lot of time working with AO and Globus technical support to tune the NAS appliances (which are usually used in commercial/enterprise settings) to be able to transfer the data at higher rates of performance than the factory settings allowed. EPOC, AO, UCF, TACC, and UPR staff also ran perfSONAR tests to ensure the entire path was able to deliver on these faster speeds that were necessary. George will be presenting a talk at Globus World in May, and those interested in more information about how this networking and disk NAS tuning was done should plan to attend.

The data transfer operation started in late December 2020 and is expected to continue through the spring of 2021, as stored data (on disks and tapes at AO) is transferred to TACC. As data flows into TACC’s storage cluster from Arecibo’s holdings, ESnet and the entire collaboration team will ensure that it is made widely available to the scientific community to perform new studies with this valuable research data. 

The destruction of the AO Radio Telescope was a catastrophe for global science; however, the quick response of the entire data recovery team helped prevent the loss of much of the valuable data collected by Arecibo over its lifetime. I’m very proud of this accomplishment: the work of the entire ESnet team and our data infrastructure ensured that we had the right capabilities at the right time to make a difference for science.

Three Questions with Joseph Nasal

Three questions with a new staff member! Today we welcome Joseph (Joe) Nasal, who has joined our Business Office as a Project Manager.

After graduating from Temple University, Joe began his career designing broadband Radio Frequency-hybrid fiber networks and management software for some of the first residential cable modem deployments in the country.  Early on, he also worked in defense and designed and operated private secure communications networks for federal contractors.  He spent the past two decades supporting higher education through roles in engineering, technical architecture, project management, and leadership. His work helped transform data communication at Pennsylvania State University, preparing the campus for tremendous growth in teaching and research. 

What brought you to ESnet?

I’ve been architecting and managing very large communication network design and implementation projects for most of my career.  After nearly 20 years at Penn State, it was time for a career change.  One of my close colleagues recently came to ESnet in support of Science Engagement, and when I learned through him of an opportunity to help with such exciting and important growth on a national scale I was very happy to find a place in the organization.  I’ll be operating out of my home office in State College, PA.

What is the most exciting thing going on in your field right now?

In data communications, it’s about getting more for less: more throughput, more distance, more fidelity, for less cost. Cost is measured in units like dollars, or time, or energy, or human effort, and those of us who work in this space are always trying to optimize these resources. This is an exciting time because it seems like we’re on the cusp of training machines to give us an order-of-magnitude leap forward in efficiency via automated processes and learning algorithms. But it’s going to take clear human vision to get us to where we want to be, which means that, as engineers, we will continue to have fun solving big problems.

What book would you recommend?

The Man Who Loved Only Numbers, a biography of Paul Erdős.  Paul was one of the great mathematicians of the 20th century whose work has implications for both computer science and information theory.  He was an eccentric genius and his personal story is a fascinating one to follow. As engineers, I think it’s important to be aware of and appreciate the great thinkers who exist at the very base level of abstraction with respect to the technologies we use and build upon. 

IPv6 past, present, future with Michael Sinatra and Nick Buraglio

In March 2020, the U.S. Government Office of Management and Budget (OMB) released a draft memo outlining a required migration to IPv6 only. Memorandum M-21-07 was made official on November 19, 2020. Among other things, this memo mandates that 80% of IP-enabled assets on Federal networks are operating in IPv6-only environments by the end of FY 2025.

ESnet is in the process of planning this transition now, to ensure that we provide our users with the support and resources they need to continue their work uninterrupted and unimpeded by the transition. Practically speaking, this means for ESnet that by 2025, all of our nodes will be transitioned to IPv6 address space, and we will not support dual-stacking with IPv4 and IPv6 addresses. 

Transitioning to an IPv6-only network has been over a quarter-century in the making for ESnet. Here’s a look back at our history with IPv6.

IPv6: Past and Present

ESnet’s history of helping to develop, support, and operationalize new protocols begins well before the advent of IPv6.  

In the early 1990s, Cathy Aronson, an employee of Lawrence Livermore National Laboratory working on ESnet, helped establish a production implementation and support plan for the Open Systems Interconnect (OSI) Connectionless-mode Network Service (CLNS) suite of network protocols. Crucially, Aronson developed a scalable network addressing plan that provided a model for the utilization of the kinds of massive address spaces that OSI CLNS and, later, IPv6 would come to use. CLNS itself was a logical progression from DECnet which had been embraced and supported by ESnet’s precursors (MFEnet and HEPnet).  

As the IPv6 draft standard (RFC2460) developed in the 1990s, ESnet staff created an operational support model for the new protocol. The stakes were high; if IPv6 were to succeed in supplanting IPv4, and prevent the ill effects of IPv4 address exhaustion, it would need a smooth roll-out. Bob Fink, Tony Hain, and Becca Nitzan spearheaded early IPv6 adoption processes, and their efforts reached far beyond ESnet and the Department of Energy (DOE).  The trio were instrumental in establishing a set of operational practices and testbeds under the auspices of the Internet Engineering Task Force–the body where IPv6 was standardized–and this led to the development of a worldwide collaboration known as the 6bone.  6bone was a set of tunnels that allowed IPv6 “islands” to be connected, forming a global overlay network.  More importantly, it was a collaboration that brought together commercial and research networks, vendors, and scientists, all with the goal of creating a robust internet protocol for the future.

Not only were Fink, Hain, and Nitzan critical in this development of what would become a production IPv6 network (their names appear on a number of IETF RFCs), they would also spearhead the adoption of the protocol within ESnet and DOE. In the summer of 1996, ESnet was officially connected to the 6bone; by 1999, the Regional Internet Registries had received their production allocations of IPv6 address space. Just one month later, the first US allocation of that space was made, to ESnet. ESnet has the distinction of holding the first IPv6 allocation from ARIN, assigned on August 3, 1999, with the prefix 2001:0400::/32.
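To give a sense of the scale of a /32 allocation, the short sketch below uses Python's standard `ipaddress` module to count how many smaller prefixes fit inside it. The helper `subnet_count` is introduced here purely for illustration; only the prefix itself comes from the text.

```python
import ipaddress

# ESnet's allocation as given in the text
net = ipaddress.IPv6Network("2001:0400::/32")


def subnet_count(network: ipaddress.IPv6Network, new_prefix: int) -> int:
    """Number of subnets of length new_prefix that fit inside network."""
    if not (network.prefixlen <= new_prefix <= 128):
        raise ValueError("new_prefix must be between the network prefix and 128")
    return 2 ** (new_prefix - network.prefixlen)


if __name__ == "__main__":
    print(net.compressed)          # canonical short form of the prefix
    print(subnet_count(net, 48))   # number of /48 blocks
    print(subnet_count(net, 64))   # number of /64 subnets
```

A single /32 therefore contains 65,536 /48 blocks, or about 4.3 billion /64 subnets, which is as many subnets as there are individual addresses in the whole IPv4 address space.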

Nitzan continued her pioneering work, establishing native IPv6 support on ESnet, and placing what we believe was the first workstation on a production IPv6 network. This was part of becoming the first production network in North America to adopt IPv6 in tandem with IPv4 via the use of an IPv6 “dual-stack.” As US Government requirements and mandates developed in 2005, 2012, and 2014, the ESnet team met these requirements for increased IPv6 adoption, while also providing support and consultation for the DOE community. 

Although Aronson, Fink, Hain, and Nitzan have all moved on from ESnet, a new generation of staff continued the spirit of innovation and early adoption. In the early 2010s, ESnet’s internal routing protocols were consolidated around the use of multi-topology Intermediate System to Intermediate System or IS-IS. This allowed for the deployment of flexible and disparate IPv4 and IPv6 topologies, paving the way for the creation of IPv6-only portions of ESnet, allowing the use of optimized routing protocols for the entire network.  ESnet’s acquisition strategy has long emphasized IPv6 support and feature parity between IPv4 and IPv6.  

All IPv6: Switching over, and the future

As ESnet moves into ESnet6, it is well-positioned to build and expand an IPv6-only network, while retaining legacy support for IPv4 where needed. ESnet will soon finish a two-year project to switch our management plane entirely over to IPv6.

For our customers and those connected to us, here’s what this means:

  • ESnet will be ready, willing, and able to support connectors, constituents, and partners in their journey to deploying IPv6-only across our international network. 
  • ESnet planning and architecture team members have been included in the Department of Energy Integration and Product Team (DOE IPT) for migration to IPv6-only, and are supporting planning and documentation efforts for the DOE Complex.
  • We look forward to supporting our customers and users, as we all make this change to IPv6 together.

Three Questions with John Hess

John comes to ESnet’s Network Engineering team from the Corporation for Education Network Initiatives in California (CENIC), operator of the State of California Research and Education Network (CalREN). At CENIC, John was involved in a number of projects, including Pacific Wave, the Pacific Research Platform, and participation in Global Network Advancement Group (GNA-G) teams exploring AutoGOLE/NSI and Data Intensive Science, as well as other collaborative efforts.

Prior to CENIC, John worked as a network engineer for UC, as well as a systems engineer during a brief stint with Cisco.  Among his interests are interconnection protocols, network performance, and data movement.

What brought you to ESnet? 

Through my activities with CENIC and Pacific Wave, I had the opportunity to collaborate with colleagues at ESnet on a variety of projects. The ESnet folks with whom I have worked consistently impressed me with their depth of expertise, willingness to share their knowledge, and their commitment to advancing the interests of researchers and multidisciplinary, data-intensive initiatives involving the national labs and institutions across the broader R&E community.  I wanted to join and contribute as part of that team.      

What is the most exciting thing going on in your field right now?

Due to the COVID-19 pandemic, there is a more general realization of the need to close the digital divide, and a greater sense of urgency about doing so. I am excited about (further) democratizing access to technology for under-served communities and for a more diverse set of potential researchers and contributors. As much as I am excited by the advances in basic research in HEP (high energy physics), astrophysics, genomics, earth sciences, and other domains, I am excited about advancing pervasive access to technology.

What book would you recommend?

Burch, David. Celestial Navigation, A Complete Home Study Course, Second Edition. Seattle, Starpath Publications, 2019. 

I began sailing as a kid with our family’s 13’ SunFish on a lake near our home in NJ.  This began what has become a life-long passion for sailing and the realization of a childhood dream of someday owning and living on a sailboat.   Though my boat is equipped with a state-of-the-art  navigation system, among my current dreams is to become conversant with celestial navigation and to complete ocean passages relying exclusively on this centuries-old  technology. 

Cyber Infrastructure Engineering Lunch & Learn Series Hits 100!

In 2017, ESnet, in collaboration with the National Science Foundation, created a series of bi-weekly talks on network engineering and research engagement topics. These “Cyberinfrastructure (CI) Engineering Lunch & Learn” presentations, held every other Friday afternoon at 2:00pm ET, have become an important way for engineers from the research and education community to share technical best practices for deploying and operating laboratory and campus networks. The series has also served as a social event for a community of common interest, especially during the pandemic.

A representative slide from Jason’s 4 May 2020 CI Lunch & Learn Talk on “TCP Basics and Science DMZ” — networking science with a healthy dose of LoLcats.

On March 12th, ESnet’s Jason Zurawski, who developed and still leads the events, will convene the 100th CI Engineering Lunch and Learn. A complete set of recordings of past sessions is available on the EPOC YouTube Channel located here. An anniversary is always a chance to look back on what has been accomplished; here are 5 Questions with Jason to get his thoughts on the Lunch and Learn series.

Thinking back over the past 100 talks, which have particularly stuck in your mind?

The best turn-out and feedback that I receive from the participants comes from either “hot topics” or engaging speakers.  

For instance, we have had a number of popular, well-attended talks on the development of the BBR protocol (going as far back as 2017). Other well-attended sessions focused on topics like perfSONAR, Science DMZ, and Data Transfer; all of these are critical to building an effective and high-performing cyberinfrastructure that supports data transfers in service of global science collaborations.

Other critical talks come from innovative and important voices from the R&E community. Hyojoon Kim from Princeton talking about P4 and how it is used on their campus to facilitate network research, and the perfSONAR project’s use of new measurement protocols such as TWAMP, are great examples of this kind of talk. Before last year, many of these folks would have given a talk ‘in person’ at a conference, but they have not been able to do so due to the pandemic. We have also done a number of tutorials and project updates that remain popular. For example, tutorials by Fatema Bannat Wala on Zeek Use Cases and by Alan Whinery of the University of Hawaii on IPv6 Deployment have been especially notable.

Have you seen a change in attendance or role for these CI events from before the pandemic and now?

We have seen moderate (10-15%) increases for both the live and recorded sessions during the week of a talk. We have also seen a similar increase in subscriptions to our membership list since its inception in 2017. Some of the “tutorial” content has increased viewership over time, perhaps because the pandemic lets our audience review content from home that they were not previously able to study due to a lack of time. This is a net positive, as it points to a general trend that it is easier or more desirable to watch a video on a topic (e.g. deploying software) than to read documentation or follow instructions.

What makes for a successful CI talk?

Passion from the speaker is very important. We want to hear from community members that are excited about what they are presenting: a research project, a new operational component, or a problem they want to solve (or have solved). Speaking from experience is also valuable, as the audience wants to know deep technical details for most of the talks. 

What do you think has been the biggest challenge keeping this series going?

We’ve always had willing presenters, and to date we have always been able to schedule between 20 and 30 talks over the course of the year. The primary challenge is making sure we can continue to find fresh perspectives that hit on some core values:

  • Supporting the diversity of voices (gender, ethnicity, institutional background). When reflecting on the prior 100 talks, we unfortunately skewed strongly away from these diverse categories; this is a trend that must be reversed. Recruitment to address this is already underway for 2021 and beyond.  
  • Focusing on talks that address the needs of modern CI: operational best practices, policy choices, translation of research to production, etc.  
  • Ensuring our audience is growing. These talks assist in bringing new contributors up to speed vis a vis retirements and other attrition where knowledge may not be passed down to newcomers.

What do you think will be major themes in the next 100 CI sessions?

A theme we have encouraged from the start is to share what we know, and acknowledge what we don’t know. We want to see the major institutions and facilities pass on the lessons they have fought hard to learn and implement so that smaller campuses with limited CI knowledge can benefit. Similarly, we want those individuals who are not as experienced to be vocal and ask (potentially hard) questions of the community to drive what needs to be presented and discussed.

I believe that CI policy (e.g. long-term care, maintenance, upgrades, sustainability) will be an ongoing concern as we approach 10 or more years of operations for some facilities. Security is always a hot area, as the threats continue and adapt over time. Technology continues to evolve and upgrade rapidly, so hearing about the ‘latest and greatest’ will also drive content and speakers for the talks.

Jason, thank you for running the CI series, and all the hard work associated with keeping a regular technical exchange going like clockwork during a pandemic. I look forward to the next 100 CI Lunch & Learn!

Defending ESnet with ZoMbis!

Zeek is powerful open source network security monitoring software used extensively by ESnet. Zeek (formerly called Bro) was initially developed by researchers at Berkeley Lab; it allows users to identify and manage cyber threats by tracking and logging network traffic activity. Zeek operates as a passive monitor, providing a holistic view of what is transpiring on the network across all network traffic.

In a previous post, I presented some of our efforts in approaching WAN security using Zeek for general network monitoring, along with the successes and challenges found during the process. In this blog post I’ll focus on our efforts in using Zeek as part of security monitoring for the ESnet6 management network: ZoMbis (Zeek on Management based information system).

ZoMbis on the ESnet6 management network:

Most research and educational networks employ a dedicated management network as a best practice. The management network provides a configuration command and control layer, as well as conduits for all of the inter-routing communications between the devices used to move our critical customer data. Because of the sensitive nature of these communications, the management network needs to be protected from external and general user network traffic (websites, file transfers, etc.), and our staff needs to have detailed visibility on management network activity.

At ESnet, we typically use real IP addresses for all internal network resources, and our management network is allocated a fairly large address space block advertised in our global routing table, to help protect against opportunistic hijacking attacks. Isolating our management network from user data streams vastly reduces the amount of routine background noise, making the use of Zeek, or any network security monitoring capability, much more effective.

The above diagram shows an overview of the deployment strategy of Zeek on the ESnet6 management network. The blue dots in the diagram show the locations that will have equipment running Zeek instances for monitoring the network traffic on the management network. The traffic from the routers at those locations is mirrored to the Zeek instances using a spanning port, and the Zeek logs generated are then aggregated in our central security information and event logging and management system (SIEM).

At the virtual Zeek Week held last year, Scott Campbell presented ‘Using Zeek in ESnet6 Management Network Security Monitoring’, which explained the overall strategy for deploying Zeek on the management network in greater detail. Some ZoMbis deployment highlights are:

ESnet6’s new management network will use only IPv6. From a monitoring perspective, this change from traditional IPv4 poses a number of interesting challenges. In particular, IPv6 employs more multicast and link-local traffic for local subnet communications. Accordingly, we are in the process of adjusting and adding to Zeek’s policy-based detection scripts to support these changes in network patterns. The enhancements and custom scripts being written by our cybersecurity team to support IPv6 will be of interest to other Zeek users, and we will release them to the entire Zeek community soon.
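As a rough illustration of why multicast and link-local traffic matter to a monitor, the sketch below tallies destination addresses by scope using Python's standard `ipaddress` module. It is not part of the ESnet Zeek scripts; the function names `address_scope` and `scope_counts` are hypothetical, and the sample addresses come from the IPv6 documentation and well-known multicast ranges.

```python
import ipaddress
from collections import Counter


def address_scope(addr: str) -> str:
    """Classify an IPv6 address as multicast, link-local, or other."""
    ip = ipaddress.IPv6Address(addr)
    if ip.is_multicast:       # ff00::/8
        return "multicast"
    if ip.is_link_local:      # fe80::/10
        return "link-local"
    return "other"


def scope_counts(destinations):
    """Count how many destination addresses fall into each scope."""
    return Counter(address_scope(d) for d in destinations)


if __name__ == "__main__":
    sample = ["ff02::1", "fe80::1", "2001:db8::1", "ff02::2"]
    for scope, count in scope_counts(sample).items():
        print(scope, count)
```

A tally like this, run over a traffic sample, gives a quick sense of how much local-subnet chatter a detection policy has to account for.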

The Zeek policies created for this project can be broken out into two general groups. The first of these is protocol mechanics, particularly looking more closely at the boundary between layers 2 and 3, where there are a number of interesting security behaviors with IPv6. A subset of the notices that these protocol mechanics policies will provide are:

  • ICMP6_RtrSol_NotMulticat – Router solicitation not multicast
  • ICMP6_RtrAnn_NotMulticat – Router announcement should be a multicast request
  • ICMP6_RtrAnn_NewMac – Router announcement from an unknown MAC
  • ICMP6_MacIPChange – If the MAC <-> IP mapping changes
  • ICMP6_NbrAdv_NotRouter – Advertisement comes from non-router
  • ICMP6_NbrAdv_UnSolicit – Advertisement is not solicited
  • ICMP6_NbrAdv_OverRide – Advertisement without override
  • ICMP6_NbrAdv_NoRequest – Advertisement without known request
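To make a few of these notices concrete, here is a minimal Python sketch of the kind of checks they describe. It is not the ESnet Zeek policy itself: the `NdMessage` and `NdMonitor` names, the field layout, and the exact conditions are assumptions based only on the one-line descriptions above, and only five of the listed notices are modeled.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NdMessage:
    """Hypothetical, simplified view of an ICMPv6 neighbor discovery message."""
    kind: str                 # "router_solicit", "router_advert", or "neighbor_advert"
    src_mac: str
    src_ip: str
    dst_ip: str
    solicited: bool = False   # solicited flag, meaningful for neighbor adverts


class NdMonitor:
    """Tracks MAC <-> IP mappings and returns notice names for odd messages."""

    def __init__(self, known_router_macs):
        self.known_router_macs = {m.lower() for m in known_router_macs}
        self.ip_to_mac = {}

    def check(self, msg: NdMessage):
        notices = []
        src = ipaddress.IPv6Address(msg.src_ip)
        dst = ipaddress.IPv6Address(msg.dst_ip)
        mac = msg.src_mac.lower()

        if msg.kind == "router_solicit" and not dst.is_multicast:
            notices.append("ICMP6_RtrSol_NotMulticat")
        if msg.kind == "router_advert":
            if not dst.is_multicast:
                notices.append("ICMP6_RtrAnn_NotMulticat")
            if mac not in self.known_router_macs:
                notices.append("ICMP6_RtrAnn_NewMac")
        if msg.kind == "neighbor_advert" and not msg.solicited:
            notices.append("ICMP6_NbrAdv_UnSolicit")

        # Skip the unspecified address (::), which many hosts share
        # while they are still configuring an address.
        if not src.is_unspecified:
            previous = self.ip_to_mac.get(str(src))
            if previous is not None and previous != mac:
                notices.append("ICMP6_MacIPChange")
            self.ip_to_mac[str(src)] = mac
        return notices


if __name__ == "__main__":
    monitor = NdMonitor(known_router_macs={"aa:bb:cc:00:00:01"})
    print(monitor.check(NdMessage("router_advert", "aa:bb:cc:00:00:01", "fe80::1", "ff02::1")))
    print(monitor.check(NdMessage("router_advert", "de:ad:be:ef:00:02", "fe80::1", "fe80::99")))
```

In this sketch the second message raises three notices at once: it is sent to a unicast address, it comes from a MAC that is not a known router, and it claims an IP address previously seen with a different MAC.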

The second set of Zeek policies developed in support of ZoMbis takes advantage of predictable management network behavioral patterns: we build policy to model anticipated behaviors and let us know if something is amiss. For example, by looking at DNS and NTP behavior we can identify unexpected hosts and data volumes, since we know which systems are supposed to be communicating with one another, and what patterns the traffic between these components should follow.
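The idea of modeling anticipated behavior can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ZoMbis implementation: the `Flow` and `BehaviorModel` names, the per-window byte limit, and the documentation-range addresses are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical summary of one observed DNS or NTP exchange."""
    service: str      # e.g. "dns" or "ntp"
    client: str
    server: str
    bytes_sent: int


class BehaviorModel:
    """Flags peers and data volumes that fall outside an expected baseline."""

    def __init__(self, expected_pairs, max_bytes_per_window):
        self.expected_pairs = expected_pairs      # service -> set of (client, server)
        self.max_bytes = max_bytes_per_window     # service -> byte limit per window
        self.volume = defaultdict(int)            # (service, client) -> bytes so far

    def observe(self, flow: Flow):
        alerts = []
        allowed = self.expected_pairs.get(flow.service, set())
        if (flow.client, flow.server) not in allowed:
            alerts.append(f"unexpected {flow.service} peer: {flow.client} -> {flow.server}")

        key = (flow.service, flow.client)
        self.volume[key] += flow.bytes_sent
        limit = self.max_bytes.get(flow.service)
        if limit is not None and self.volume[key] > limit:
            alerts.append(
                f"{flow.service} volume from {flow.client} exceeds {limit} bytes in the current window"
            )
        return alerts

    def reset_window(self):
        """Start a new measurement window."""
        self.volume.clear()


if __name__ == "__main__":
    model = BehaviorModel(
        expected_pairs={"ntp": {("2001:db8::10", "2001:db8::123")}},
        max_bytes_per_window={"ntp": 1000},
    )
    print(model.observe(Flow("ntp", "2001:db8::10", "2001:db8::123", 400)))
    print(model.observe(Flow("dns", "2001:db8::66", "2001:db8::53", 10)))
```

Because a management network has a small, known set of talkers, an explicit allowlist like `expected_pairs` stays manageable; on a general user network the same approach would be far noisier.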

Stay tuned for part II of this blog post, where I will discuss ways of using sinkholing, together with ZoMbis, to provide better understanding and visibility of unwanted traffic on the management network.

Three Questions with Karim Benbourenane

Three questions with a new staff member on our Software Engineering – Orchestration and Core Data Team!

Karim comes to us from Carnegie Mellon University, where he served as a Software Engineer in the Network Services group. In that role, he designed, implemented, deployed, and maintained numerous applications to provide support to the campus network infrastructure. He has worked on a diverse set of network computing problems with a focus on automation and self-service utilities. Karim is proficient in a multitude of application development stacks but has a special place in his heart for those that put Python in the mix.

Karim Benbourenane in the “Pre-Pandemic-Mask” era

What brought you to ESnet?

I’ve always had a profound curiosity for the intersection of mathematics, science, and technology. Starting with a strong foundation in mathematics, I learned how to better apply my problem-solving skills by pursuing graduate work in computational biology. It was there that I discovered how next-generation computing technologies could radically transform and elevate entire scientific fields. I’ve been seeking to utilize the skills I’ve built up over my 15 years of industry experience to help build tools for scientists, to empower them, and help them achieve discoveries in a world that is becoming increasingly complex. The work being done at ESnet lines up perfectly with this goal.

What is the most exciting thing going on in software engineering right now?

I would say the rapid proliferation of containerization technologies and the use of cloud infrastructure for distributed computing problems, as well as advancements in machine learning libraries and toolkits that make it easier for scientists to manipulate and analyze large datasets. Many of these concepts were in their infancy or early stages only a decade ago, and now they’re everywhere and I’m happy to see how fast they’ve been adopted.

What book would you recommend?

Time Travel in Einstein’s Universe by J. Richard Gott. An accessible read for laymen like me, about how one would — given some ridiculous assumptions — go about creating various time machines.