Arecibo Support Wins SC21 HPCwire Readers’ Choice Award!

Arecibo dish after the collapse

As part of a team spanning 15 government, academic, and industrial partners, the Engagement and Performance Operations Center (EPOC) – a collaboration between Indiana University and ESnet – was awarded the “Best HPC Collaboration (Academia/Government/Industry)” HPCwire Readers’ Choice award on Tuesday, Nov. 16. The award, presented at the High Performance Computing, Networking, Storage and Analysis (SC21) conference, recognizes the effort and collaboration required to move and safeguard irreplaceable data (over 50 years of astronomical observations) from the Arecibo Observatory following the structural collapse of this scientific resource in December 2020.

At ESnet, Ken Miller, George Robb, and Jason Zurawski supported these efforts as both full members of EPOC and ESnet staff. Jason and Ken divide their time between EPOC and ESnet’s Science Engagement Team, while George is with ESnet’s Infrastructure Systems group. LightBytes caught up with Jason Zurawski to get his thoughts on the project and award, and an update on the Arecibo effort since our April 2021 post on this project.

Now that data from Arecibo has been migrated to the Texas Advanced Computing Center (TACC), what happens next, and how will this data be used?

The team at the University of Central Florida (UCF) has been working with TACC on several fronts to build up the capabilities needed for their data analysis and sharing requirements. They are deploying a portal that will give researchers access to the data, as well as building workflows to investigate and process it using computation provided by TACC.

The team at Arecibo also has much older data, still residing on tape, left to process. Due to the delicate state of the media, it is being carefully read and transferred to on-island storage before being transmitted to TACC for archiving. This work will take several more months to complete.

What do you think the lessons from this effort are in terms of getting so many different organizations to work together to support this very challenging problem?

The collapse that Arecibo experienced sent ripples through the R&E community because researchers and technology professionals alike knew there was a limited window to act on replicating important observations gathered over the years. The partners in this effort were motivated to act, and that removed many barriers to putting solutions in place. Each partner contributed its core competencies efficiently, and we continue to work together as the next steps for the scientific collaboration are planned.

Plans are starting to emerge for a “next generation” Arecibo following the loss of this instrument. How might the next generation of data management resources be shaped by this collaboration?

Now that there has been some time to evaluate the work, the experience has also spurred UCF and Arecibo to plan for the future with respect to computation, storage, and network connectivity both in Puerto Rico and in Florida. With these improvements planned, they will be well positioned to serve the scientific data for years to come. New instruments will no doubt increase data demands by many orders of magnitude; addressing all aspects of the data pipeline now, and then gradually increasing capabilities over time, will help prepare for these emerging challenges.

Congratulations to all of the organizations and staff who helped prevent the loss of this data!

Arecibo Data Recovery: Behind the Scenes with Jason Zurawski

The dramatic collapse of the Arecibo Observatory Radio Telescope in Puerto Rico in December 2020 was a terrible loss for global science. The 305-meter dish had served for over 50 years, supporting a wide range of cosmic and earth science applications, including transmission of the famed “Arecibo Message” to globular star cluster M13 by a team led by Frank Drake and Carl Sagan in 1974.

When the 900-ton instrument platform crashed onto the observatory dish, the National Science Foundation was faced with a variety of challenges. Most immediate among them: how to ensure that several petabytes of historic (and now irreplaceable) data at the Arecibo Observatory (AO) data center, stored on tapes, hard drives, and other physical media, could be preserved and moved off-site as an approximately $50M site cleanup and environmental remediation project begins to demobilize the iconic observatory.

This data recovery effort has required rapid mobilization of a team from the University of Central Florida (UCF), the Texas Advanced Computing Center (TACC), the University of Puerto Rico (UPR), the University of Chicago, and others. A more detailed description of this overall effort has just been released here. In this blog, I will describe the key role that ESnet and the Engagement and Performance Operations Center (EPOC) played in this effort to save valuable scientific data.

My colleagues Hans Addleman (Indiana University International Networks), George Robb, and I joined science use case discussions with AO and UCF as part of an ESnet requirements review and EPOC Deep Dive support for Arecibo in early 2020. In the summer of 2020, these efforts became much more active after the first suspension cable failed and AO began activities to migrate data storage and processing to a commercial cloud. We provided support to the Arecibo team for data movement hardware and software deployment.

With the failure of a second cable in November 2020, it became apparent that the facility was unstable; this increased pressure on the team to find a faster solution. The UCF site management team decided that migration to the commercial cloud over the available 1Gbps connection (a previous 10Gbps connection was damaged by Hurricane Maria in 2017) would not meet requirements, so another data migration strategy was needed.
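
Some rough arithmetic shows why the 1Gbps path was a non-starter. The data volume is described only as “several petabytes,” so the sketch below assumes 3 PB purely for illustration; even at full, uninterrupted line rate, the transfer would have taken the better part of a year.

```python
# Back-of-envelope estimate (illustrative numbers, not the project's figures):
# time to move ~3 PB over a 1 Gbps link at full, uninterrupted line rate.
PB = 10**15                 # bytes in a petabyte
data_bytes = 3 * PB         # assumed data volume ("several petabytes")
link_bps = 1 * 10**9        # 1 Gbps connection

seconds = data_bytes * 8 / link_bps
print(f"{seconds / 86400:.0f} days")   # ~278 days, ignoring protocol overhead
```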

By December, the team had developed an alternative data migration approach leveraging a timely offer of storage capacity at TACC. Because of the urgency, the team decided to move data using physical Network Attached Storage (NAS) appliances; data on tapes and other original sources was loaded onto the NAS at Arecibo. The appliances were then driven to better-connected data centers on the island: either the UPR campuses at Mayagüez or Río Piedras, or a commercial data center, each connected to the global R&E network at 10Gbps. Using Globus data transfer software, the AO team then began transferring the data to TACC. With multiple devices in rotation, they set up a constantly moving supply line: fill a disk, transport it to a better-connected location, start a transfer, take back a completed disk, and return to the AO data center to start the process all over.
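
For readers curious what a Globus-based transfer looks like in practice, here is a minimal sketch using the Globus Python SDK. It is illustrative only, not the team’s actual tooling; the client ID, endpoint UUIDs, and paths are hypothetical placeholders.

```python
# Illustrative sketch: submitting a recursive Globus transfer from an
# on-island endpoint to a TACC endpoint with the globus-sdk.
import globus_sdk

CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"   # hypothetical Globus app ID
SRC_ENDPOINT = "source-endpoint-uuid"     # e.g. a NAS staged at a UPR campus
DST_ENDPOINT = "dest-endpoint-uuid"       # e.g. TACC archive storage

# Interactive native-app login; paste the code from the printed URL.
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_start_flow()
print("Log in at:", auth_client.oauth2_get_authorize_url())
tokens = auth_client.oauth2_exchange_code_for_tokens(input("Auth code: "))
transfer_tokens = tokens.by_resource_server["transfer.api.globus.org"]

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_tokens["access_token"])
)

# Checksum verification guards irreplaceable data against silent corruption.
tdata = globus_sdk.TransferData(
    source_endpoint=SRC_ENDPOINT,
    destination_endpoint=DST_ENDPOINT,
    label="Arecibo NAS batch (example)",
    sync_level="checksum",
    verify_checksum=True,
)
tdata.add_item("/nas/batch_001/", "/archive/arecibo/batch_001/", recursive=True)

task = tc.submit_transfer(tdata)
print("Submitted transfer task:", task["task_id"])
```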

EPOC team members (specifically George) spent a lot of time working with AO and Globus technical support to tune the NAS appliances (which are usually used in commercial/enterprise settings) so they could transfer data at higher rates than the factory settings allowed. EPOC, AO, UCF, TACC, and UPR staff also ran perfSONAR tests to ensure the entire path could deliver the faster speeds required. George will be presenting a talk at GlobusWorld in May; those interested in how this network and NAS tuning was done should plan to attend.
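
The core idea behind that tuning is the bandwidth-delay product: on a long, fast path, TCP needs buffers large enough to keep the pipe full, and appliance defaults are usually sized for short LAN hops. The sketch below assumes the 10Gbps path mentioned above and a roughly 60ms round-trip time between Puerto Rico and Texas (my estimate, not a measured value).

```python
# Illustrative bandwidth-delay product (BDP) calculation motivating larger
# TCP buffers than typical appliance/OS defaults. The RTT is an assumption.
link_bps = 10 * 10**9    # 10 Gbps R&E path (from the post)
rtt_s = 0.060            # assumed ~60 ms round-trip time, Puerto Rico <-> Texas

bdp_bytes = (link_bps / 8) * rtt_s
print(f"BDP ~= {bdp_bytes / 2**20:.0f} MiB")   # ~72 MiB must be in flight
# Default TCP windows of a few MiB would cap a single stream far below
# 10 Gbps, which is why host and NAS settings had to be raised.
```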

The data transfer operation started in late December 2020 and is expected to continue through the spring of 2021, as stored data (on disks and tapes at AO) is transferred to TACC. As data flows into TACC’s storage cluster from Arecibo’s holdings, ESnet and the entire collaboration team will ensure that it is made widely available to the scientific community to perform new studies with this valuable research data. 

The destruction of the AO Radio Telescope was a catastrophe for global science; however, the quick response of the entire data recovery team helped prevent the loss of much of the valuable data collected by Arecibo over its lifetime. I’m very proud of this accomplishment: the work of the entire ESnet team and our data infrastructure ensured that we had the right capabilities at the right time to make a difference for science.