Astro-DISC: Astronomy and cosmology applications of distributed super computing

Purpose
A toolkit for massive astronomical and cosmological computations on large clusters, which will include software tools and algorithmic methods.

Universe
The visible universe is large:
• 28 billion light years
• 10 million galaxy superclusters
• 25 billion galaxy clusters
• 350 billion large galaxies
• 7 trillion dwarf galaxies
• 30 sextillion (3 ∙ 10^22) stars

Data sets
Sky surveys:
• Billions of objects (stars, galaxies, …)
• Multiple images of the same object
• Sloan Digital Sky Survey (2000–2008): 230 million objects, 50 TByte
• Pan-STARRS (started in 2008): Half-order of magnitude larger than Sloan
• Large Synoptic Survey Telescope (2016): Order of magnitude larger than Sloan
Simulations:
• Billions of objects
• Multiple runs
• Multiple time points
• McWilliams Center at CMU: Black holes and dark matter, 15B particles, 14 TByte / run
• LANL Coyote universe: 1B particles, 1 TByte / run, 30 runs
• Many other projects

Data sets
The sizes of modern survey and simulation datasets are between 1 and 100 billion objects. Even larger sets are coming soon. Their analysis requires distributed computing.

Astronomers vs. computer scientists

DISC Cloud cluster at Carnegie Mellon
Sixty-four nodes
Each node:
• Eight 2.83 GHz cores
• Four 1 TB disks
• 16 GByte memory
10 GBit / second network

Specific problems
• Friends of Friends: Identification of galaxy clusters
• Correlation functions: Analyzing distribution of distances between galaxies
• Spatial matching: Identification of observed objects in the catalog
[Figure labels: observed object, catalog, distance, 40%, 60%]

More problems
• Quasar detection: Identifying quasars based on the five passband fluxes
• Particle history: Tracking the history of particles in astrophysics simulations

Future work
• Distributed computation for other standard astronomy problems: Density distribution, photometric calibration, asteroid detection, ...
• General-purpose astronomy toolkit: Massive spatial indices of celestial objects, integrated with distributed algorithms
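
Illustration: Friends of Friends
The Friends of Friends problem listed under "Specific problems" can be sketched on a single machine as follows. This is a minimal illustration, not the Astro-DISC implementation: it assumes that two objects are "friends" when they lie within a chosen linking length of each other, and that a cluster is a connected group of friends. The function name friends_of_friends, the linking_length parameter, and the brute-force pair search are assumptions made for the example; at the dataset sizes quoted above, the pair search would have to be replaced by a spatial index and distributed across nodes.

```python
from itertools import combinations
from math import dist


def friends_of_friends(points, linking_length):
    """Group points into clusters of mutually reachable 'friends'.

    Two points are friends if their Euclidean distance is at most
    linking_length; clusters are the connected components of that relation.
    Returns a sorted list of clusters, each a list of point indices.
    """
    parent = list(range(len(points)))  # union-find forest, one tree per cluster

    def find(i):
        # Follow parent links to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force O(n^2) pair search (hypothetical; fine only for small inputs).
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= linking_length:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i  # merge the two clusters

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up coordinates, purely for illustration.
sample = [(0, 0, 0), (0.5, 0, 0), (1.0, 0, 0), (5, 5, 5), (5.2, 5, 5), (9, 9, 9)]
clusters = friends_of_friends(sample, linking_length=0.6)
print(clusters)
```

In this sample, points 0 and 2 are 1.0 apart, farther than the linking length, but they end up in the same cluster because both are friends of point 1. That transitive linking is what distinguishes Friends of Friends from a plain distance cut.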