Graduate students publish on network telemetry with ESnet

Two graduate students working with ESnet have recently published papers in IEEE and ACM workshops.

Bibek Shrestha, a graduate student at the University of Nevada, Reno, and his advisor Engin Arslan worked with Richard Cziva from ESnet to publish a paper on “INT Based Network-Aware Task Scheduling for Edge Computing”. In the paper, Bibek investigated the use of in-band network telemetry (INT) for real-time, in-network task scheduling. His experimental analysis across various workload types and network congestion scenarios showed that enhancing edge-computing task scheduling with high-precision network telemetry can reduce data transfer times by up to 40% and total task execution times by up to 30% by favoring edge servers in uncongested (or mildly congested) sections of the network. The paper will appear in the 3rd Workshop on Parallel AI and Systems for the Edge (PAISE), held in conjunction with the IEEE IPDPS 2021 conference on May 21, 2021, in Portland, Oregon.
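As a rough illustration of the idea (not the paper’s actual algorithm), a telemetry-aware scheduler can score each candidate edge server by combining an estimated transfer time over its network path, with usable bandwidth shrinking as INT-reported utilization rises, and the server’s own queueing delay. The sketch below is a minimal Python example; the server names, telemetry fields, and cost model are assumptions made for illustration.

```python
# Illustrative sketch of telemetry-aware task placement for edge computing.
# The servers, telemetry fields, and cost model below are made-up examples;
# the paper's actual scheduling policy may differ.

def estimate_transfer_seconds(task_bytes, capacity_bps, path_utilization):
    """Rough transfer-time estimate: usable bandwidth shrinks as the
    telemetry-reported path utilization (0.0-1.0) grows."""
    usable_bps = capacity_bps * max(1.0 - path_utilization, 0.05)
    return task_bytes * 8 / usable_bps

def pick_edge_server(task_bytes, candidates):
    """Pick the server with the lowest estimated completion time, combining
    network transfer time with the server's own queueing delay."""
    return min(
        candidates,
        key=lambda s: estimate_transfer_seconds(
            task_bytes, s["capacity_bps"], s["path_utilization"]) + s["queue_delay_s"],
    )

if __name__ == "__main__":
    # Hypothetical snapshot of INT-derived path state for two edge servers.
    candidates = [
        {"name": "edge-a", "capacity_bps": 10e9, "path_utilization": 0.85, "queue_delay_s": 0.5},
        {"name": "edge-b", "capacity_bps": 10e9, "path_utilization": 0.20, "queue_delay_s": 1.0},
    ]
    chosen = pick_edge_server(task_bytes=5e9, candidates=candidates)
    print("schedule task on", chosen["name"])  # prefers the uncongested path (edge-b)
```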

Zhang Liu, a former ESnet intern and a current graduate student at the University of Colorado Boulder, worked with the ESnet High Touch Team – Chin Guok, Bruce Mah, Yatish Kumar, and Richard Cziva – on fastcapa-ng, ESnet’s telemetry processing software. In the paper “Programmable Per-Packet Network Telemetry: From Wire to Kafka at Scale,” Zhang showed the scaling and performance characteristics of fastcapa-ng and highlighted the most critical performance considerations that allow it to push 10.4 million telemetry packets per second to Kafka with only 5 CPU cores, which is more than enough to handle 170 Gbit/s of original traffic with a 1512B MTU. This paper will appear in the 4th International Workshop on Systems and Network Telemetry and Analytics (SNTA 2021), held at the ACM HPDC 2021 conference in Stockholm, Sweden, June 21-25, 2021.
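For readers who want to poke at such a telemetry stream downstream of the capture pipeline, the records fastcapa-ng publishes can be read back from Kafka like any other topic. The sketch below uses the kafka-python client; the broker address, topic name, and record handling are placeholders for illustration, not fastcapa-ng’s actual configuration or schema.

```python
# Minimal sketch of a Kafka consumer for a packet-telemetry topic.
# Broker address, topic name, and record handling are illustrative
# placeholders, not fastcapa-ng's actual deployment settings.
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "packet-telemetry",               # hypothetical topic name
    bootstrap_servers="broker:9092",  # hypothetical broker
    auto_offset_reset="latest",
)

for message in consumer:
    # Each message value is a raw telemetry record (or batch of records);
    # hand the bytes to whatever parser matches the deployed schema.
    print(f"received {len(message.value)} bytes of telemetry")
```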

Congratulations Bibek and Zhang!


If you are a networked systems research student looking to collaborate with us on network measurements, please reach out to Richard Cziva. If you are interested in a summer internship with ESnet, please visit this page.

Arecibo Data Recovery: Behind the Scenes with Jason Zurawski

The dramatic collapse of the Arecibo Observatory Radio Telescope in Puerto Rico in December was a terrible loss for global science. The 305-meter dish had served for over 50 years, supporting a wide range of cosmic and earth science applications, including transmission of the famed “Arecibo Message” to globular star cluster M13 by a team led by Frank Drake and Carl Sagan in 1974.

When the 900-ton instrument platform crashed onto the observatory dish, the National Science Foundation was faced with a variety of challenges. The most immediate was how to ensure that several petabytes of historic (and now irreplaceable) data at the Arecibo Observatory (AO) data center, stored on tapes, hard drives, and other physical media, could be preserved and moved off-site as an approximately $50M site cleanup and environmental remediation project begins to demobilize the iconic observatory.

This data recovery effort has required rapid mobilization of a team from the University of Central Florida (UCF), the Texas Advanced Computing Center (TACC), the University of Puerto Rico (UPR), the University of Chicago, and others. A more detailed description of this overall effort has just been released here. In this blog, I will describe the key role that ESnet and the Engagement and Performance Operations Center (EPOC) played in this effort to save valuable scientific data.

My colleagues Hans Addleman (Indiana University International Networks), George Robb, and I became part of science use case discussions with AO and UCF as part of an ESnet requirements review and EPOC Deep Dive support to Arecibo in early 2020. In the summer of 2020, these efforts became much more active after the first suspension cable failed and AO began activities to migrate data storage and processing to a commercial cloud. We provided support to the Arecibo team for data movement hardware and software deployment.

With the failure of a second cable in November 2020, it became apparent that the facility had become unstable; this increased pressure on the team to find a faster solution. The UCF site management team decided that migration to the commercial cloud over the available 1Gbps connection (a previous 10Gbps connection was damaged by Hurricane Maria in 2017) would not meet requirements, so another data migration strategy was needed.

By December, the team had developed an alternative data migration approach, leveraging a timely offer of storage capacity at TACC. Because of the urgency, the team decided to move data using physical Network Attached Storage (NAS) appliances; data on tapes and other original sources was loaded onto the NAS at Arecibo. The NAS appliances were then driven to data centers on the island: either at the UPR campuses located at Mayagüez or Río Piedras, or at a commercial data center, each of which was connected to the global R&E network at 10Gbps. Using Globus data transfer software, the AO team then began the process of transferring the data to TACC. Using multiple devices and a constantly moving supply line, they were able to fill a disk, transport it to one of the better-connected locations, start a transfer, take back a completed disk, and return to the AO data center to start the process all over.
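For readers curious what the transfer submissions look like in practice, the sketch below shows a minimal Globus transfer request using the Globus Python SDK (transfers can equally be driven from the Globus web app or CLI). The access token, endpoint UUIDs, and paths are placeholders, not the actual Arecibo or TACC endpoints.

```python
# Minimal Globus transfer sketch using the globus_sdk Python client.
# The access token, endpoint UUIDs, and paths are placeholders.
import globus_sdk

ACCESS_TOKEN = "..."                      # obtained via a Globus Auth flow
SOURCE_ENDPOINT = "SOURCE-ENDPOINT-UUID"  # e.g. a NAS staging endpoint
DEST_ENDPOINT = "DEST-ENDPOINT-UUID"      # e.g. the destination storage endpoint

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(ACCESS_TOKEN))

# Describe the transfer: checksum verification guards against silent corruption.
tdata = globus_sdk.TransferData(
    tc, SOURCE_ENDPOINT, DEST_ENDPOINT,
    label="archive batch 01", sync_level="checksum")
tdata.add_item("/staging/batch01/", "/archive/batch01/", recursive=True)

task = tc.submit_transfer(tdata)
print("submitted Globus transfer task:", task["task_id"])
```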

EPOC team members (specifically George) spent a lot of time working with AO and Globus technical support to tune the NAS appliances (which are usually used in commercial/enterprise settings) so they could transfer data at higher rates than the factory settings allowed. EPOC, AO, UCF, TACC, and UPR staff also ran perfSONAR tests to ensure the entire path could deliver the faster speeds that were needed. George will be presenting a talk at GlobusWorld in May; those interested in more detail about how this network and NAS tuning was done should plan to attend.
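As a rough illustration of the path verification involved, an on-demand perfSONAR throughput test can be requested with the pscheduler command-line tool; the minimal wrapper below simply invokes it between two measurement hosts, whose names here are hypothetical.

```python
# Sketch: request an on-demand perfSONAR throughput test between two
# measurement hosts via the pscheduler CLI. Hostnames are hypothetical.
import subprocess

def run_throughput_test(source_host, dest_host):
    """Run 'pscheduler task throughput' and return its text output."""
    cmd = [
        "pscheduler", "task", "throughput",
        "--source", source_host,
        "--dest", dest_host,
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

if __name__ == "__main__":
    print(run_throughput_test("perfsonar-site-a.example.edu",
                              "perfsonar-site-b.example.edu"))
```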

The data transfer operation started in late December 2020 and is expected to continue through the spring of 2021, as stored data (on disks and tapes at AO) is transferred to TACC. As data flows into TACC’s storage cluster from Arecibo’s holdings, ESnet and the entire collaboration team will ensure that it is made widely available to the scientific community to perform new studies with this valuable research data. 

The destruction of the AO Radio Telescope was a catastrophe for global science; however, the quick response of the entire data recovery team helped prevent the loss of much of the valuable data collected by Arecibo over its lifetime. I’m very proud of this accomplishment: the work of the entire ESnet team and our data infrastructure ensured that we had the right capabilities at the right time to make a difference for science.

New Ground-Shaking Science with Dark Fiber

As a Network Engineer at ESnet, I am no stranger to the importance of designing and maintaining a robust fiber-optic network. To operate a network that will “enable and accelerate scientific discovery by delivering unparalleled network infrastructure, capabilities, and tools,” ESnet has acquired an impressive US continental footprint of more than 21,000 kilometers of leased fiber-optic cable. We spend a great deal of effort designing and sourcing redundant fiber-optic paths to support network data connectivity between scores of DOE Office of Science facilities and research collaborators across the country.

But network data transfer is only one of the uses for fiber-optic cable. What about using buried fiber-optic cable for some truly “ground-shaking” science? The answer is “Yes, absolutely!” – and I was fortunate to play a part in exploring new uses for fiber-optic cable networks this past year.

Back in 2017, the majority of our 21,000 km fiber footprint was still considered “dark fiber,” meaning it was not yet in use. At that time, ESnet was actively working on the design to upgrade from our current production network “ESnet5” to our next-generation network “ESnet6,” but we hadn’t yet put our fiber into production.

At the same time, Dr. Jonathan Ajo-Franklin, then-graduate students Nate Lindsey and Shan Dou, and Berkeley Lab’s Earth and Environmental Sciences Area (EESA) were exploring the use of distributed acoustic sensing (DAS) technology to detect seismic waves by sending laser pulses across buried fiber-optic cable. The timing was perfect to expand on the short-range tests that Dr. Ajo-Franklin and his team had been performing at the University of California’s Richmond Field Station by using a section of the unused ESnet dark fiber footprint in the West Sacramento area for more extensive testing. ESnet’s own Chris Tracy worked with Dr. Ajo-Franklin and his team to demonstrate how the underground fiber-optic cables running from West Sacramento northwest toward Woodland in California’s Central Valley made an excellent sensor platform for early earthquake detection, monitoring groundwater, and mapping new sources of potential geothermal energy.
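For readers unfamiliar with DAS, localization along the fiber works much like optical time-domain reflectometry: the interrogator fires a laser pulse down the cable and records the Rayleigh backscatter, and the round-trip time of that backscatter maps to a position along the fiber. The small sketch below shows that mapping, assuming a typical group refractive index for single-mode fiber.

```python
# Sketch: map the round-trip time of Rayleigh backscatter to a position along
# the fiber, the basic bookkeeping behind locating DAS channels.
# The group refractive index is an assumed typical value for single-mode fiber.

C_VACUUM_M_PER_S = 299_792_458
GROUP_INDEX = 1.468  # assumed typical value

def backscatter_position_m(round_trip_seconds):
    """Distance along the fiber of the scattering point for backscatter that
    returns after the given round-trip time (divide by 2: out and back)."""
    return (C_VACUUM_M_PER_S / GROUP_INDEX) * round_trip_seconds / 2

if __name__ == "__main__":
    # Backscatter arriving 200 microseconds after the pulse was launched
    # originated roughly 20 km down the cable.
    print(f"{backscatter_position_m(200e-6) / 1000:.1f} km")
```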

The Sacramento ESnet Dark Fiber Route (left) and seismic events recorded on the array from around the world including the massive M 8.1 earthquake in Chiapas, Mexico.

Fast forward to May 2019, and Dr. Ajo-Franklin was heading up a new collaborative scientific research project for the DOE’s Geothermal Technologies Office, based on his prior DAS experimentation successes using ESnet fiber. The intent was to map potential geothermal energy locations in California’s Imperial Valley south of the Salton Sea, near Calipatria and El Centro. The team, including scientists at EESA, Lawrence Livermore National Laboratory (LLNL), and Rice University, needed a fiber path to conduct the experiment. It would make sense to assume that ESnet’s fiber footprint, which runs through that area, would be an excellent candidate for this experiment. Fortunately for ESnet’s other users, but unfortunately for the DAS team, by 2018 the ESnet6 team was already “lighting” this previously dark fiber.

However, just because ESnet fiber in the Imperial Valley was no longer a candidate for DAS-based experiments, that didn’t mean there weren’t ways to gain access to unused dark fiber. For every piece of fiber that has been put into production to support ESnet6, there are dozens if not hundreds of other fibers running right alongside it. When fiber-optic providers install new fiber paths, they pull large cables consisting of many individual fibers to lease or sell to as many customers as possible. Because the ESnet fiber footprint was running right through the Imperial Valley, we knew that there was likely unused fiber in the ground, and only had to find a provider that would be willing to lease a small section to Berkeley Lab for Dr. Ajo-Franklin’s experiment. 

Making the search a little more complicated, the DAS equipment utilized for this experiment has an effective sensing range that is limited to less than 30 kilometers. Most fiber providers expect to lease long sections of fiber connecting metropolitan areas. For example, the fiber circuits that run through the Imperial Valley are actually intended to connect metropolitan areas of Arizona to large cities in Southern California. Finding a provider that would be willing to break up a continuous 600 km circuit connecting Phoenix to Los Angeles just to sell a 30 km piece for a year-long research project would be a difficult task.  

One of my contributions to the ESnet6 project was sourcing new dark fiber circuits and data center colocation spaces to “fill out” our existing footprint and get ready for our optical system deployments. Because of those efforts, I knew that there were often entire sections of fiber that had been damaged across the country and would likely not be repaired until there was a new customer that wanted to lease the fiber. I was asked to assist Dr. Ajo-Franklin and his team to engineer a new fiber solution for the experiment. I just had to find someone willing to lease us one of these small damaged sections.

After speaking with many providers in the area, the communications company Zayo was able to find a section of fiber starting in Calipatria, heading south through El Centro and then west to Plaster City, that was a great candidate for DAS use. This section of fiber had been accidentally cut near Plaster City and was considered unusable for networking purposes. Working with Zayo, we were able to negotiate a lease on this “broken” fiber span along with a small amount of rack space and power to house the DAS equipment that Dr. Ajo-Franklin’s team would need to move forward with their research.  

The Imperial Valley Dark Fiber Array: (A) Team Co-PI Veronica Rodriguez Tribaldos (LBNL) turning on the DAS system. (B) The ILA used to house the equipment in Calipatria. (C) The Zayo fiber section currently being used in the experiment. (D) The corresponding DAS data showing a magnitude 2.6 earthquake located near the Salton Sea, to the north. 

This cut fiber segment was successfully “turned up” for the project on November 10, 2020 by a team including Co-PI Veronica Rodriguez Tribaldos, Michelle Robertson, and Todd Wood (EESA/LBNL), and seismic data collection equipment is now up and running. The figure above (D) shows some great initial data recorded on the array, a small earthquake many miles to the north. There will be many more articles and reports from the Imperial Valley Dark Fiber Team as they continue to gather data and perform their experiments, and I’m sure we’ll begin to see fiber across the country put to use for this type of sensing and research.

I’ve had a great experience working with the different groups that were assembled for this project. By seeing how new technologies and methods are being developed to use fiber-optic cable for important research outside of computing science, I’ve developed a greater appreciation for how our labs and universities are tackling some of our biggest energy and public safety challenges.

Into the Medical Science DMZ

Speeding research. The Medical Science DMZ expedites data transfers for scientists working on large-scale research such as biomedicine and genomics while maintaining federally-required patient privacy.

In a new paper, Lawrence Berkeley National Laboratory (Berkeley Lab) computer scientist Sean Peisert, Energy Sciences Network (ESnet) researcher Eli Dart, and their collaborators outline a “design pattern” for deploying specialized research networks and ancillary computing equipment for HIPAA-protected biomedical data that provides high-throughput network data transfers and high-security protections.

“The original Science DMZ model provided a way of securing high-throughput data transfer applications without the use of enterprise firewalls,” says Dart. “You can protect data transfers using technical controls that don’t impose performance limitations.”

Read More at Science Node: https://sciencenode.org/feature/into-the-science-dmz.php 

Left: Eli Dart, ESnet Engineer | Right: Sean Peisert, Berkeley Lab Computer Scientist

CENIC Honors Astrophysics Link to NERSC via ESnet

A star-forming region of the Large Magellanic Cloud (Credit: European Space Agency via the Hubble Telescope)

An astrophysics project connecting UC Santa Cruz’s Hyades supercomputer cluster to NERSC via ESnet and other networks won the CENIC 2018 Innovations in Networking Award for Research Applications announced last week.

Through a consortium of Science DMZs and links to NERSC via CENIC’s CalREN and the DOE’s ESnet, the connection enables UCSC to carry out high-speed transfers of large data sets produced at NERSC, which supports the Dark Energy Spectroscopic Instrument (DESI) and AGORA galaxy simulations, at speeds up to five times previous rates. These speeds could be increased to 20 times the previous rates in 2018. Peter Nugent, an astronomer and cosmologist from the Computational Research Division, was pivotal in the effort. Read UC Santa Cruz’s press release.

Berkeley Lab and ESnet Document Flow, Performance of 56 Terabytes Climate Data Transfer

The simulated storms seen in this visualization are generated from the finite volume version of NCAR’s Community Atmosphere Model. Visualization by Prabhat (Berkeley Lab).

In a recent paper entitled “An Assessment of Data Transfer Performance for Large‐Scale Climate Data Analysis and Recommendations for the Data Infrastructure for CMIP6,” experts from Lawrence Berkeley National Laboratory (Berkeley Lab) and ESnet (the Energy Sciences Network) document the data transfer workflow, transfer performance, and other aspects of moving approximately 56 terabytes of climate model output data for further analysis.

The data, required for tracking and characterizing extratropical storms, needed to be moved from the distributed Coupled Model Intercomparison Project (CMIP5) archive to the National Energy Research Scientific Computing Center (NERSC) at Berkeley Lab.

The authors found that there is significant room for improvement in the data transfer capabilities currently in place for CMIP5, both in terms of workflow mechanics and in data transfer performance. In particular, the paper notes that performance improvements of at least an order of magnitude are within technical reach using current best practices.

To illustrate this, the authors used Globus to transfer the same raw data set between NERSC and Argonne Leadership Computing Facility (ALCF) at Argonne National Lab.
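For a rough sense of scale (my back-of-envelope arithmetic, not a figure from the paper), here is how long roughly 56 terabytes takes to move at a few sustained transfer rates:

```python
# Back-of-envelope transfer times for a ~56 TB dataset at several sustained
# throughputs. Illustrative arithmetic only, not figures from the paper.

DATASET_BYTES = 56e12  # ~56 terabytes

def transfer_hours(rate_gbps):
    """Hours to move the dataset at a sustained rate given in gigabits/s."""
    return DATASET_BYTES * 8 / (rate_gbps * 1e9) / 3600

for rate_gbps in (0.5, 1, 5, 10):
    print(f"{rate_gbps:>4} Gbit/s sustained -> {transfer_hours(rate_gbps):7.1f} hours")

# ~0.5 Gbit/s sustained takes about 10 days; ~10 Gbit/s sustained takes about
# 12 hours, roughly the order-of-magnitude difference the paper points to.
```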

Read the Globus story: https://www.globus.org/user-story-lbl-and-esnet
Read the paper: https://arxiv.org/abs/1709.09575

National Science Foundation & Department of Energy’s ESnet Launch Innovative Program for Women Engineers

Women in Networking @SC (WINS) Kicks off this week in Salt Lake City!

WINS Participants
(Left to Right) Julia Locke (LANL), Debbie Fligor (SC15 WINS returning participant, University of Illinois at Urbana-Champaign), Jessica Schaffer (Georgia Tech), Indira Kassymkhanova (LBNL), Denise Grayson (Sandia), Kali McLennan (Univ. of Oklahoma), Angie Asmus (CSU). Not in photo:  Amber Rasche (N. Dakota State) and Julie Staats (CENIC).

Salt Lake City, UT – October 26, 2016 – The University Corporation for Atmospheric Research (UCAR) and The Keystone Initiative for Network Based Education and Research (KINBER), together with the Department of Energy’s (DOE) Energy Sciences Network (ESnet), today announced the official launch of the Women in Networking at SC (WINS) program.

Funded through a grant from the National Science Foundation (NSF) and directly by ESnet, the program funds eight early- to mid-career women in the research and education (R&E) network community to participate in the 2016 setup, build-out, and live operation of SCinet, the Supercomputing Conference’s (SC) ultra-high-performance network. SCinet supports large-scale computing demonstrations at SC, the premier international conference on high performance computing, networking, data storage and data analysis, which is attended by over 10,000 of the leading minds in these fields.

The SC16 WINS program kicked off this week as the selected participants from across the U.S. headed to Salt Lake City, the site of the 2016 conference, to begin laying the groundwork for SCinet inside the Salt Palace Convention Center. The WINS participants join the more than 250 volunteers who make up the SCinet engineering team and will work side by side with the team and their mentors to put the network into full production service when the conference begins on November 12. The women will return to Salt Lake City a week before the conference to complete the installation of the network.

“We are estimating that SCinet will be outfitted with a massive 3.5 Terabits per second (Tbps) of bandwidth for the conference and will be built from the ground up with leading edge network equipment and services (even pre-commercial in some instances) and will be considered the fastest network in the world during its operation,” said Corby Schmitz, SC16 SCinet Chair.

The WINS participants will support a wide range of technical areas that comprise SCinet’s incredible operation, including wide area networking, network security, wireless networking, routing, network architecture and other specialties. 

Several WINS participants hard at work with their mentors configuring routers & switches

“While demand for jobs in IT continues to increase, the number of women joining the IT workforce has been on the decline for many years,” said Marla Meehl, Network Director from UCAR and co-PI of the NSF grant. “WINS aims to help close this gap and help to build and diversify the IT workforce giving women professionals a truly unique opportunity to gain hands-on expertise in a variety of networking roles while also developing mentoring relationships with recognized technical leaders.”

Funds are being provided by the NSF through a $135,000 grant and via direct funding from ESnet supported by Advanced Scientific Computing Research (ASCR) in the DOE Office of Science. Funding covers all travel expenses related to participating in the setup and operation of SCinet and will also provide travel funds for the participants to share their experiences at events like The Quilt Member Meetings, Regional Networking Member meetings, and the DOE National Lab Information Technology Annual Meeting.

“Not only is WINS providing hands-on engineering training to the participants, but also the opportunity to present their experiences to the broader networking community throughout the year. This experience helps to expand important leadership and presentation skills and grow their professional connections with peers and executives alike,” said Wendy Huntoon, president and CEO of KINBER and co-PI of the NSF grant.

The program also represents a unique cross-agency collaboration between the NSF and DOE.  Both agencies recognize that the pursuit of knowledge and science discovery that these funding organizations support depends on bringing the best ideas from people of various backgrounds to the table.  

“Bringing together diverse voices and perspectives to any team in any field has been proven to lead to more creative solutions to achieve a common goal,” says Lauren Rotman, Science Engagement Group Lead, ESnet. “It is vital to our future that we bring every expert voice, every new idea to bear if our community is to tackle some of our society’s grandest challenges from understanding climate change to revolutionizing cancer treatment.”

2016 WINS Participants are:

  • Denise Grayson, Sandia National Labs (Network Security Team), DOE-funded
  • Julia Locke, Los Alamos National Lab (Fiber and Edge Network Teams), DOE-funded
  • Angie Asmus, Colorado State (Edge Network Team), NSF-funded
  • Kali McLennan, University of Oklahoma (WAN Transport Team), NSF-funded
  • Amber Rasche, North Dakota State University (Communications Team), NSF-funded
  • Jessica Shaffer, Georgia Institute of Technology (Routing Team), NSF-funded
  • Julia Staats, CENIC (DevOps Team), NSF-funded
  • Indira Kassymkhanova, Lawrence Berkeley National Lab (DevOps and Routing Teams), DOE-funded

The WINS Supporting Organizations:
The University Corporation for Atmospheric Research (UCAR)
http://www2.ucar.edu/

The Keystone Initiative for Network Based Education and Research (KINBER)
http://www.kinber.org

The Department of Energy’s Energy Sciences Network (ESnet)
http://www.es.net

How the World’s Fastest Science Network Was Built

Created in 1986, the U.S. Department of Energy’s (DOE’s) Energy Sciences Network (ESnet) is a high-performance network built to support unclassified science research. ESnet connects more than 40 DOE research sites—including the entire National Laboratory system, supercomputing facilities and major scientific instruments—as well as hundreds of other science networks around the world and the Internet.

Funded by DOE’s Office of Science and managed by Lawrence Berkeley National Laboratory (Berkeley Lab), ESnet moves about 51 petabytes of scientific data every month. This is a 13-step guide to how ESnet has evolved over 30 years.

Step 1: When fusion energy scientists inherit a cast-off supercomputer, add 4 dialup modems so the people at the Princeton lab can log in. (1975)


Step 2: When landlines prove too unreliable, upgrade to satellites! Data screams through space. (1981)


Step 3: Whose network is best? High Energy Physics (HEPnet)? Fusion Physics (MFEnet)? Why argue? Merge them into one, the Energy Sciences Network (ESnet), run by the Department of Energy! Go ESnet! (1986)


Step 4: Make it even faster with DUAL Satellite links! We’re talking 56 kilobits per second! Except for the Princeton fusion scientists – they get 112 Kbps! (1987)


Step 5:  Whoa, when an upgrade to 1.5 MEGAbits per second isn’t enough, add ATM (not the money machine, but Asynchronous Transfer Mode) to get more bang for your buck. (1995)


Step 6: Duty now for the future—roll out the very first IPv6 address to ensure there will be enough Internet addresses for decades to come. (2000)


Step 7: Crank up the fastest links in the network to 10 GIGAbits per second—16 times faster than the old gear—a two-generation leap in network upgrades at one time. (2003)


Step 8: Work with other networks to develop really cool tools, like the perfSONAR toolkit for measuring and improving end-to-end network performance and OSCARS (On-Demand Secure Circuits and Advance Reservation System), so you can reserve a high-speed, end-to-end connection to make sure your data is delivered on time. (2006)


Step 9: Why just rent fiber? Pick up your own dark fiber network at a bargain price for future expansion. In the meantime, boost your bandwidth to 100G for everyone. (2012)


Step 10: Here’s a cool idea, come up with a new network design so that scientists moving REALLY BIG DATASETS can safely avoid institutional firewalls, call it the Science DMZ, and get research moving faster at universities around the country. (2012)


Step 11: We’re all in this science thing together, so let’s build faster ties to Europe. ESnet adds three 100G lines (and a backup 40G link) to connect researchers in the U.S. and Europe. (2014)


Step 12: 100G is fast, but it’s time to get ready for 400G. To pave the way, ESnet installs a production 400G network between facilities in Berkeley and Oakland, Calif., and even provides a 400G testbed so network engineers can get up to speed on the technology. (2015)


Step 13: Celebrate 30 years as a research and education network leader, but keep looking forward to the next level. (2016)


Berkeley Lab Staff to Present Super-facility Science Model at Internet2 Conference

Berkeley Lab staff from five divisions will share their expertise in a panel discussion on “Creating Super-facilities: a Coupled Facility Model for Data-Intensive Science” at the Internet2 Global Summit to be held April 26-30 in Washington, D.C. The panel was organized by Lauren Rotman of ESnet and includes Alexander Hexemer of the Advanced Light Source (ALS), Craig Tull of CRD, David Skinner of NERSC and Rune Stromsness of the IT Division.

The session will highlight the concept of a coupled science facility or “super-facility,” a new model that links together experimental facilities like the ALS with computing facilities like NERSC via a Science DMZ architecture and advanced workflow and analysis software, such as SPOT Suite developed by Tull’s group. The session will share best practices, lessons learned and future plans to expand this effort.

Also at the conference, ESnet’s Brian Tierney will speak in a session on “perfSONAR: Meeting the Community’s Needs.” Co-developed by ESnet, perfSONAR is a tool for end-to-end monitoring and troubleshooting of multi-domain network performance. The session will give an overview of the perfSONAR project, including the 3.4 release, a preview of the 3.5 release, the product plan, and perfSONAR training plans.

Across the Universe: Cosmology Data Management Workshop Draws Stellar Crowd

ESnet’s Eli Dart (left), Salman Habib (center) of Argonne National Lab and Joel Brownstein of the University of Utah compare ideas during a workshop break.

ESnet and Internet2 hosted last week’s CrossConnects Workshop on “Improving Data Mobility & Management for International Cosmology,” a two-day meeting ESnet Director Greg Bell described as the best one yet in the series. More than 50 members of the cosmology and networking research community turned out for the event hosted at Lawrence Berkeley National Laboratory, while another 75 caught the live stream from the workshop.

The Feb. 10-11 workshop provided a forum for discussing the growing data challenges associated with the ever-larger cosmological and observational data sets, which are already reaching the petabyte scale. Speakers noted that network bandwidth is no longer the bottleneck into the major data centers, but storage capacity and performance from the network to storage remain a challenge. In addition, network connectivity to telescope facilities is often limited and expensive due to the remote location of the facilities. Science collaborations use a variety of techniques to manage these issues, but improved connectivity to telescope sites would have a significant scientific benefit in many cases.

In his opening keynote talk, Peter Nugent of Berkeley Lab’s Computational Research Division said that astrophysics is transforming from a data-starved to a data-swamped discipline. Today, when searching for supernovae, one object in the database consists of thousands of images, each 32 MB in size. That data needs to be processed and studied quickly so when an object of interest is found, telescopes around the world can begin tracking it in less than 24 hours, which is critical as the supernovae are at their most visible for just a few weeks. Specialized pipelines have been developed to handle this flow of images to and from NERSC.
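To get a feel for the data volumes involved, a quick back-of-envelope calculation helps; the image counts below are assumptions chosen to match the “thousands of 32 MB images per object” figure, not numbers from the talk.

```python
# Rough data-volume arithmetic for one supernova candidate. The image counts
# are assumptions chosen to illustrate "thousands of 32 MB images per object".

IMAGE_MB = 32

for n_images in (1_000, 5_000, 10_000):
    total_gb = n_images * IMAGE_MB / 1000        # decimal gigabytes
    seconds_at_10g = total_gb * 8 / 10           # gigabits / (10 Gbit/s)
    print(f"{n_images:>6} images -> {total_gb:6.0f} GB, "
          f"~{seconds_at_10g:.0f} s to move at a sustained 10 Gbit/s")
```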

Salman Habib of Argonne National Laboratory’s High Energy Physics and the Mathematics and Computer Science Divisions opened the second day of the workshop, focused on cosmology simulations and workflows. Habib leads DOE’s Computation-Driven Discovery for the Dark Universe project. Habib pointed out that large-scale simulations are critical for understanding observational data and that the size and scale of simulation datasets far exceed those of observational data. “To be able to observe accurately, we need to create accurate simulations,” he said. Simulations will soon create 100 petabyte sets of raw data, and the limiting factor for handling these will be the amount of available storage, so smaller “snapshots” of the datasets will need to be created. And while one person can run the simulation itself, analyzing the resulting data will involve the whole community.

Reijo Keskitalo of Berkeley Lab’s Computational Cosmology Center described how computational support for the Planck Telescope has relied on HPC to generate the largest and most complete simulation maps of the cosmic microwave background, or CMB. In 2006, the project was the first to run on all 6,000 CPUs of Seaborg, NERSC’s IBM flagship at the time. It took six hours on the machine to produce one map. Now, running on 32,000 CPUs on Edison, the project can generate 10,000 maps in just one hour.
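Normalizing those figures per CPU-hour makes the improvement concrete; the quick calculation below (my arithmetic) uses only the numbers quoted above and works out to roughly four orders of magnitude.

```python
# Maps per CPU-hour, using only the figures quoted above.
seaborg_2006 = 1 / (6_000 * 6)         # one map on 6,000 CPUs in 6 hours
edison_now = 10_000 / (32_000 * 1)     # 10,000 maps on 32,000 CPUs in 1 hour

print(f"Seaborg (2006): {seaborg_2006:.2e} maps per CPU-hour")
print(f"Edison:         {edison_now:.3f} maps per CPU-hour")
print(f"Improvement:    ~{edison_now / seaborg_2006:,.0f}x per CPU-hour")
```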

Mike Norman, head of the San Diego Supercomputer Center, offered that high performance computing can become distorted by “chasing the almighty FLOP,” or floating point operations per second. “We need to focus on science outcomes, not TOP500 scores.”

Over the course of the workshop, ESnet Director Greg Bell noted that observation and simulation are no longer separate scientific endeavors.

The workshop drew a stellar group of participants. In addition to the leading lights mentioned above, attendees included Larry Smarr, founder of NCSA and current leader of the California Institute for Telecommunications and Information Technology, a $400 million academic research institution jointly run by the University of California, San Diego and UC Irvine; and Ian Foster, who leads the Computation Institute at the University of Chicago and is a senior scientist at Argonne National Lab. Foster is also recognized as one of the inventors of grid computing.

The next step for the workshop organizers is to publish a report and identify areas for further study and collaboration. Looming over them will be the thoughts of Steven T. Myers of the National Radio Astronomy Observatory after describing the data challenges coming with the Square Kilometer Array radio telescope: “The future is now. And the data is scary. Be afraid. But resistance is futile.”