22 | November | 2021 | Light Bytes: Blogging for Science

A combined team from ESnet and Lehigh University was awarded the best paper for Exploring the BBRv2 Congestion Control Algorithm for use on Data Transfer Nodes at the 8th IEEE/ACM International Workshop on Innovating the Network for Data-Intensive Science (INDIS 2021), which was held in conjunction with the 2021 IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC21) on Monday, November 15, 2021.

The team was comprised of:

Brian Tierney, Energy Sciences Network (ESnet)
Eli Dart, Energy Sciences Network (ESnet)
Ezra Kissel, Energy Sciences Network (ESnet)
Eashan Adhikarla, Lehigh University

The paper can be found here. Slides from the presentation are here. In this Q+A, ESnet spoke with the award-winning team about their research — answers are from the team as a whole.

The paper is based on extensive testing and controlled experiments with the BBR (Bottleneck Bandwidth and Round-trip propagation time), BBRv2 and the Cubic Function Binary Increase Congestion Control (CUBIC) Transmission Control Protocol (TCP) Internet congestion algorithms. What was the biggest lesson from this testing?

BBRv2 represents a fundamentally different approach to TCP congestion control. CUBIC (as well as Hamilton, Reno, and many others) are loss-based, meaning that they interpret packet loss as congestion and therefore require significant network engineering effort to achieve high performance. BBRv2 is different in that it measures the network path and builds a model of the path – it then paces itself to avoid loss and queueing. In practical terms, this means that BBRv2 is resilient to packet loss in a way that CUBIC is not. This comes through loud and clear in our data.

What part of the testing was the most difficult and/or interesting?

We ran a large number of tests in a wide range of scenarios. It can be difficult to keep track of all the test configurations, so we wrote a “test harness” in python that allowed us to keep track of all the testing parameters and resulting data sets.

The harness also allowed us to better compare results collected over real-world paths to those in our testbed environments. Managing the deployment of the testing environment though containers also allowed for rapid setup and improved reproducibility.

You provide readers with links to great resources so they can do their own testing and learn more about BBRv2. What do you hope readers will learn?

We hope others will test BBRv2 in high-performance research and education environments. There are still some things that we don’t fully understand, for example there are some cases where CUBIC outperforms BBRv2 on paths with very large buffers. It would be great for this to be better characterized, especially in R&E network environments.

What’s the next step for ESnet research into BBRv2? How will you top things next year?

We want to further explore how well BBRv2 performs at 100G and 400G. We would also like to spend additional time performing a deeper analysis of the current (and newly generated) results to gain insights into how BBRv2 performs compared to other algorithms across varied networking infrastructure. Ideally we would like to provide strongly substantiated recommendations on where it makes sense to deploy BBRv2 in the context of research and educational network applications.