100G DTN Experiment: Testing Technologies for Next-Generation File Transfer

ESnet has recently completed an experiment testing high-performance, file-based data transfers using Data Transfer Nodes (DTNs) on the 100G ESnet Testbed. Within ESnet, new ways to provide optimized, on-demand data movement tools to our network users are being prototyped. One such potential new data movement tool is offered by Zettar, Inc. Zettar’s “zx” product integrates with several storage technologies with an API for automation. This ESnet data movement experiment allowed us to test the use of tools like zx on our network. 

Two 100Gbps capable DTNs were deployed on the ESnet Testbed for this work, each with 8 x NVMe SSDs for fast disk-to-disk transfers, and connected using an approximately 90ms round trip time network path.  As many readers are aware, this combination of fast storage and fast networking requires careful tuning from both a file I/O and network protocol standpoint to achieve expected end-to-end transfer rates, and this evaluation was no exception. With the help of a storage throughput baseline achieved using the freely available elbencho tool, a single tuning profile for zx was found that struck an impressive performance balance when moving a sweep of hyperscale data sets (>1TB total size or >1M total files or both, see figure below) between the testbed DTNs.

A combined line chart with the measured storage throughput for each file size (blue line), together with both the Zettar zxtransfer data rates attained with a single run carried out by Zettar (orange line), and the average of five runs carried out by ESnet (green line)

To keep things interesting, the DTN software under evaluation was configured and launched within Docker containers to understand any performance and management impacts, and to establish a potential use case for more broadly deploying DTNs as-a-Service using containerization approaches. Spoiler: the testing was a great success! When configured appropriately, our evaluation has shown that modern container namespaces using performance-oriented Linux networking impart little to no impact on achievable storage and network performance at the 100Gbps scale while enabling a great deal of potential for distributed deployment of DTNs.  More critically, the problem of service orchestration and automation becomes the next great challenge when considering any large-scale deployment of dynamic data movement endpoints.

Our takeaways:

  • When properly provisioned and configured, a containerized environment has a high potential to provide an optimized, on-demand data movement service.
  • Data movers such as zx demonstrate that when modern TCP is used efficiently to move data at scale and speed, network latency becomes less of a factor – the same level of data rates are attainable over LAN, Metro, and WAN as long as packet loss rates can be effectively kept low
  • Finally, creating a holistic data movement solution demands integrated consideration of storage, computing, networking, and highly concurrent and intrinsically scale-out data mover software that incorporates a proper understanding of the variety in data movement scenarios.

For more information, a project report detailing the testing environment, performance comparisons, and best practices may be found here