Attending SC15? Get a Close-up Look at Virtualized Science DMZs as a Service
ESnet, NERSC and RENCI are pooling their expertise to demonstrate “Virtualized Science DMZs as a Service” at the SC15 conference being held Nov. 15-20 in Austin. They will be giving the demos from 2:30-3:30 p.m. Tuesday and Wednesday and 1:30-2:30 p.m. Thursday in the RENCI booth (#181).
Here’s the background: Many campuses are installing Science DMZs to support efficient large-scale scientific data transfers, and there is a growing need to create custom Science DMZ configurations for different groups on campus. Network function virtualization (NFV), combined with compute and storage virtualization, enables a multi-tenant approach to deploying virtual Science DMZs. It makes it possible for campus IT or NREN organizations to quickly deploy well-tuned Science DMZ instances targeted at a particular collaboration or project. This demo shows a prototype implementation of Science DMZ-as-a-Service using ExoGENI racks (ExoGENI is part of the NSF GENI federation of testbeds) deployed at the StarLight facility in Chicago and at NERSC.
The virtual Science DMZs deployed on demand in these racks use the SPOT software suite developed at Lawrence Berkeley National Laboratory to connect a data source at Argonne National Laboratory to a compute cluster at NERSC, providing seamless, high-speed end-to-end transfer of data acquired at Argonne’s Advanced Photon Source (APS) for processing at NERSC. The ExoGENI racks dynamically instantiate the necessary virtual compute resources for Science DMZ functions and connect to each other on demand using ESnet’s OSCARS and Internet2’s AL2S.
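The provisioning flow described above — carve out a per-project virtual Science DMZ at each site, then stitch the sites together with an on-demand circuit — can be sketched as a plain data structure. Everything in this sketch (function name, field names, the request format, and the bandwidth figure) is hypothetical and for illustration only; it is not the actual ExoGENI, OSCARS, or AL2S API.

```python
# Hypothetical sketch of a multi-tenant Science-DMZ-as-a-Service request.
# The site names come from the demo description; all field names and the
# bandwidth figure are invented for illustration.

def build_dmz_request(project, bandwidth_gbps):
    """Assemble a slice request: one tuned virtual data-transfer node per
    site, plus a dynamic inter-site circuit connecting them."""
    sites = ["StarLight-Chicago", "NERSC"]
    return {
        "project": project,
        "nodes": [
            {"site": s, "role": "data-transfer-node", "tuned": True}
            for s in sites
        ],
        # In the demo, the inter-site circuit is established on demand
        # via ESnet's OSCARS and Internet2's AL2S.
        "circuit": {
            "endpoints": sites,
            "bandwidth_gbps": bandwidth_gbps,
            "on_demand": True,
        },
    }

request = build_dmz_request("APS-imaging", bandwidth_gbps=10)
print(len(request["nodes"]))            # one DTN per site -> 2
print(request["circuit"]["on_demand"])  # True
```

The point of the multi-tenant model is that each collaboration gets its own request like this, so one physical rack can host several isolated, independently tuned Science DMZ instances.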
ESnet’s Eli Dart (left), Salman Habib (center) of Argonne National Lab and Joel Brownstein of the University of Utah compare ideas during a workshop break.
ESnet and Internet2 hosted last week’s CrossConnects Workshop on “Improving Data Mobility & Management for International Cosmology,” a two-day meeting ESnet Director Greg Bell described as the best one yet in the series. More than 50 members of the cosmology and networking research community turned out for the event hosted at Lawrence Berkeley National Laboratory, while another 75 caught the live stream from the workshop.
The Feb. 10-11 workshop provided a forum for discussing the growing data challenges associated with ever-larger cosmological simulation and observational data sets, which are already reaching the petabyte scale. Speakers noted that network bandwidth is no longer the bottleneck into the major data centers, but storage capacity and performance from the network to storage remain a challenge. In addition, network connectivity to telescope facilities is often limited and expensive because of their remote locations. Science collaborations use a variety of techniques to manage these issues, but improved connectivity to telescope sites would have a significant scientific benefit in many cases.
In his opening keynote talk, Peter Nugent of Berkeley Lab’s Computational Research Division said that astrophysics is transforming from a data-starved to a data-swamped discipline. Today, when searching for supernovae, one object in the database consists of thousands of images, each 32 MB in size. That data needs to be processed and studied quickly so that when an object of interest is found, telescopes around the world can begin tracking it within 24 hours; that speed is critical because supernovae are at their most visible for just a few weeks. Specialized pipelines have been developed to handle this flow of images to and from NERSC.
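The volumes Nugent described add up quickly. As a back-of-envelope check, assuming 2,000 images per object (an illustrative count; the talk said only “thousands”) at the stated 32 MB each:

```python
# Rough data-volume estimate for one supernova-search object.
# The talk cited "thousands of images, each 32 MB in size";
# 2,000 is an assumed illustrative count.
images_per_object = 2_000
image_size_mb = 32

total_gb = images_per_object * image_size_mb / 1_000  # decimal GB
print(total_gb)  # 64.0 GB for a single object
```

At tens of gigabytes per object, moving and processing candidates fast enough for 24-hour follow-up is exactly the kind of workflow the specialized NERSC pipelines exist to support.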
Salman Habib, of Argonne National Laboratory’s High Energy Physics and Mathematics and Computer Science divisions, opened the second day of the workshop, which focused on cosmology simulations and workflows. Habib leads DOE’s Computation-Driven Discovery for the Dark Universe project. He pointed out that large-scale simulations are critical for understanding observational data and that the size and scale of simulation datasets far exceed those of observational data. “To be able to observe accurately, we need to create accurate simulations,” he said. Simulations will soon create 100-petabyte sets of raw data, and the limiting factor in handling them will be the amount of available storage, so smaller “snapshots” of the datasets will need to be created. And while one person can run the simulation itself, analyzing the resulting data will involve the whole community.
Reijo Keskitalo of Berkeley Lab’s Computational Cosmology Center described how computational support for the Planck Telescope has relied on HPC to generate the largest and most complete simulation maps of the cosmic microwave background, or CMB. In 2006, the project was the first to run on all 6,000 CPUs of Seaborg, NERSC’s IBM flagship at the time. It took six hours on the machine to produce one map. Now, running on 32,000 CPUs on Edison, the project can generate 10,000 maps in just one hour.
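Taken at face value, the Seaborg and Edison figures Keskitalo quoted imply an enormous jump in map-making throughput. A rough calculation from the numbers in the talk:

```python
# Throughput comparison from the figures quoted in the talk:
#   Seaborg (2006): 1 CMB map in 6 hours on 6,000 CPUs.
#   Edison (now):   10,000 maps in 1 hour on 32,000 CPUs.
seaborg_maps_per_hour = 1 / 6
edison_maps_per_hour = 10_000 / 1

speedup = edison_maps_per_hour / seaborg_maps_per_hour
print(round(speedup))  # 60000x higher map throughput
```

Note that only about 5x of that gain comes from the larger CPU count; the rest reflects a decade of faster hardware and improved algorithms and software.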
Mike Norman, head of the San Diego Supercomputer Center, cautioned that high performance computing can become distorted by “chasing the almighty FLOP,” or floating point operations per second. “We need to focus on science outcomes, not TOP500 scores.”
Over the course of the workshop, ESnet Director Greg Bell noted that observation and simulation are no longer separate scientific endeavors.
The workshop drew a stellar group of participants. In addition to the leading lights mentioned above, attendees included Larry Smarr, founder of NCSA and current leader of the California Institute for Telecommunications and Information Technology, a $400 million academic research institution jointly run by the University of California, San Diego and UC Irvine; and Ian Foster, who leads the Computation Institute at the University of Chicago and is a senior scientist at Argonne National Lab. Foster is also recognized as one of the inventors of grid computing.
The next step for the workshop organizers is to publish a report and identify areas for further study and collaboration. Looming over them will be the thoughts of Steven T. Myers of the National Radio Astronomy Observatory, who, after describing the data challenges coming with the Square Kilometre Array radio telescope, said: “The future is now. And the data is scary. Be afraid. But resistance is futile.”
Registration is now open for a workshop on “Improving Data Mobility and Management for International Cosmology” to be held Feb. 10-11 at Lawrence Berkeley National Laboratory in California. The workshop, one in a series of Cross-Connects workshops, is sponsored by the Dept. of Energy’s ESnet and Internet2.
Early registration is encouraged, as attendance is limited and the past two workshops filled up and had waiting lists. Registration is $200 and includes breakfast, lunch and refreshments for both days. Visit the Cross-Connects Workshop website for more information.
Cosmology data sets are already reaching the petabyte scale, and this trend will only continue, if not accelerate. The data comes from sources ranging from supercomputing centers, where large-scale cosmological modeling and simulations are performed, to telescopes that produce data daily. The workshop is aimed at helping cosmologists and data managers who struggle with data workflows, especially as the need for real-time analysis of cosmic events increases.
Renowned cosmology experts Peter Nugent and Salman Habib will give keynote speeches at the workshop.
Nugent, Senior Scientist and Division Deputy for Science Engagement in the Computational Research Division at Lawrence Berkeley National Laboratory, will deliver a talk on “The Palomar Transient Factory” and how observational data in astrophysics, integrated with high-performance computing resources, benefits the discovery pipeline for science.
Habib, a member of the High Energy Physics and Mathematics and Computer Science Divisions at Argonne National Laboratory, a Senior Member of the Kavli Institute for Cosmological Physics at the University of Chicago, and a Senior Fellow in the Computation Institute, will give the second keynote on “Cosmological Simulations and the Data Big Crunch.”
Registration is now open for the Focused Technical Workshop on Improving Data Mobility & Management for International Climate Science to be held July 14-16 in Boulder, Colo. The workshop is part of a series sponsored by ESnet and Internet2 and is co-sponsored by Indiana University, the National Center for Atmospheric Research (NCAR) and the National Oceanic and Atmospheric Administration (NOAA).
The workshop, hosted by NOAA, brings together network experts with scientists in the domain of international climate sciences to discuss their most pressing network-related issues and requirements. The format is designed to encourage lively, interactive discussions with the goal of developing a set of tangible next steps for supporting this data-intensive science community.