Next Generation Sky Surveys: Astronomical Opportunities and Computational Challenges

Bob Mann
Wide-Field Astronomy Unit, School of Physics & Astronomy, University of Edinburgh

Outline
- Survey Astronomy 101
- Next Generation Sky Surveys
- Astronomical Opportunities
- Computational Challenges
- eSI Theme
- Summary and Conclusions

Observational astronomy
Old style:
- Many small programmes
- Target specific objects
- Manual data reduction
- Data ends up in astronomer’s desk drawer
- Cold nights in the dome
New style:
- Few large surveys
- Map large areas of sky
- Automated pipelines
- Data ends up in queryable database
- Days at the computer

What is driving these changes?
- Policy: “common user instruments”
  - Software & archive part of instrument project
- Economics: more science per night of telescope time
- Technology:
  - Detectors capable of higher throughput
  - IT can handle the resultant higher data rates

How big is a sky survey dataset?
1. How big is the sky?
- dΩ = sinθ dθ dφ
- ∫dΩ = 4π steradians = 4π (180/π)^2 square degrees = 41,253 square degrees
- c.f. area of full moon ~ 0.2 square degrees

How big is a sky survey dataset?
2. How detailed a map?
- Resolution of ground-based images limited by “seeing”
- “Point-source” disk ~ 0.5 arcsec (1 arcsec = 1/3600th of a degree)
- Sample images adequately: few 100 million pixels per square degree (i.e. cover full moon with few 10s of million pixels)

How big is a sky survey dataset?
3. How much storage?
- 2-4 Bytes per pixel adequate for dynamic range
- Full sky image map: few x 10 TB
- Catalogue ~10% of image size
- Full sky catalogue: few TB

Comparing survey systems
- Figure-of-merit: étendue = A x Ω
  - A: area of telescope primary mirror
  - Ω: field of view
- Quantifies speed to map a given area of sky to a given depth under fixed observing conditions
- Conventional optics: A Ω

Three generations of sky surveys
1.
The Photographic Era: 1950-2000
- Schmidt Telescope
  - Ω: huge
  - A: modest
- Digitisation
  - SuperCOSMOS: 1 plate = 2GB image, 10^5-10^6 objects
  - Hubble Guide Star Catalog
- 2,500 SuperCOSMOS requests per day

Three generations of sky surveys
2. First Born-Digital Era: 1995-2015
- 1997-2001: 2MASS (near-IR)
- 2000-2014: SDSS (optical)
- 2005-2012: UKIDSS (near-IR)
- 2009-2015: VISTA (near-IR)
- Smaller AΩ than Schmidts, but digital detectors much more sensitive than photographic emulsions

Three generations of sky surveys
3. Synoptic surveys: 2009-2030
- Map observable sky every few nights: huge AΩ
- Pan-STARRS: PS1-2009; PS2-2012; PS4-2015?; PS16-??
- LSST: 2017-2027
- Mass production of detectors: can afford to cover large Ω
- Pan-STARRS: 1.4 Gigapixel camera
- LSST: 3.2 Gigapixel camera
- World’s largest camera in civilian use
[Figure: bar chart of étendue (m^2 deg^2), axis from 0 to 320, with bars labelled LSST, PS4, PS1, Subaru, CFHT, SDSS, MMT, DES, VISTA, 4m, VST, VISTA IR and SNAP; scale annotations “x0.3” and “x2” appear on the chart.]

Three generations of sky surveys: Data Volumes
- Schmidt surveys:
  - ~60 years of observing time
  - ~10 years of digitisation by SuperCOSMOS
  - ~20TB of image data
- VISTA: ~20TB of image data per year
- LSST: ~20TB of image data per night for a decade
- How come?
- “Full sky image map: few x 10 TB”

Astronomical discovery space
- Axes: Area, Temporal Resolution, Polarization, Wavelength, Angular Resolution, Depth
- Different science goals require coverage of different regions of this space
- Surveys covering a larger region of this space can address more science goals

Examples
[Figure: one diagram each for LSST, Gaia and Euclid, with axes Area, Temporal Resolution, Wavelength, Depth and Angular Resolution.]
LSST:
- Five optical wavelength bands
- Large area
- Deep
- ~1000 visits per field
Gaia:
- Large area
- Good positional accuracy
- ~100 visits per field
Euclid:
- Wide wavelength coverage
- Large area
- Good image quality

Summary of Survey Astronomy 101
- Systematic survey astronomy > 50 years old
  - UK world-leaders throughout this history
- Progress through advances in detector technology
  - Photographic -> Digital -> Cheap(er) Digital
- Multi-dimensional discovery space
  - Specific science goals target specific regions of it
  - High-grasp telescopes cover greater volume: more science
- Data volumes increasing dramatically
  - Importance of computation increasing as a result

Next Generation Sky Surveys
- Ground-based
  - Pan-STARRS: PS1, PS2, PS4, …
  - Dark Energy Survey
  - LSST
- Space-based
  - Gaia
  - Euclid
- All large international projects
  - UK share in each would be 10s of £M
  - Can we afford a significant role in all of them?

Outline
- Survey Astronomy 101
- Next Generation Sky Surveys
- Astronomical Opportunities
- Computational Challenges
  Illustrate with LSST
- eSI Theme
- Summary and Conclusions

Astronomical Opportunities
- Survey science is statistical in nature
- Describing properties of populations
  - e.g. clustering of galaxies, stellar populations within galaxies
  - Need large samples
- Detecting outliers from those populations, e.g.
very distant quasars, very low mass stars
  - Rare
  - Need to sample large volume

Science with LSST
- Four themes
  - Probing dark energy & dark matter
  - Taking an inventory of the solar system
  - Exploring the transient optical sky
  - Mapping the Milky Way
- Quantify scientific goals from themes
- Parameterize survey system
  - Mirror size, pixel scale, cadence of observations
- Optimise system parameters
- Ivezic et al: http://arxiv.org/abs/0805.2366

LSST: opportunities & challenges
Opportunities:
- Catalogue will contain
  - 10 billion stars
  - 10 billion galaxies
  - 1 million supernovae
  - 5 million asteroids
  - New phenomena!
Challenges:
- ~60PB of image data
- ~6PB of catalogue
- How to ship and store all the data?
- How to keep up with data processing?
- How to find transients in real-time?
- How to provide data to user community?
- How to recognise new classes of variable?

Example challenges: 1. Data Management
- Users will want to analyse subsets of LSST data that are too large to download
  - Must run data analysis code at the data centre
- Relational model doesn’t support all sorts of astronomical analysis well
- “SciDB”: generalisation of relational model based on multidimensional arrays
  - Better coupling to analysis code

Example challenges: 2. Data Analysis
- Many classes of transient require rapid follow-up observations for identification
- Requirement: issue alerts for transient discovery within 1 minute of observation being made
- High-performance data reduction system, both hardware and software: ~2TB/hour data rate
- Real-time pattern-matching algorithms, yielding few false positives

It’s clear that astronomers need interaction with computer scientists, but is the converse true?

Jim Gray’s answer

Three strands
Scientific Prioritisation
- We can’t afford significant roles in all surveys: which should we go for?
Data Management
- Can we retain the RDBMS-based approach we’re used to, or do we need “SciDB”?
Data Analysis
- Can we produce scalable algorithms for the kinds of analysis we want to run?

Goals of the theme
- Prepare a Road Map for future survey astronomy in the UK for STFC
- Identify those computational topics where further R&D is required
- Engage computer science community in addressing those problems

Summary & Conclusions
- Next-generation sky surveys are different in kind
  - Enabling new kinds of science: time domain
  - Requiring new computational techniques
- To prepare for them the astronomy community must
  - Agree on its priorities amongst them
  - Assess the feasibility of the desired options
  - Identify the problems needing additional R&D
  - Engage the computer science community in solving them
- This Theme should make progress on all these
- Many thanks to eSI for giving us the opportunity to do so!
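
Worked check of the dataset-size estimates

The back-of-envelope numbers in “How big is a sky survey dataset?” and “Data Volumes” can be reproduced with a few lines of arithmetic. The sketch below is illustrative only and rests on two assumptions that are not in the slides: a pixel scale of 0.25 arcsec (two pixels across the ~0.5 arcsec seeing disk) and 300 observing nights per year for LSST. The function names are hypothetical.

```python
import math

ARCSEC_PER_DEG = 3600.0


def full_sky_sq_deg() -> float:
    """Area of the whole sky: 4*pi steradians converted to square degrees."""
    return 4.0 * math.pi * (180.0 / math.pi) ** 2


def pixels_per_sq_deg(pixel_scale_arcsec: float) -> float:
    """Number of square pixels of the given angular size in one square degree."""
    return (ARCSEC_PER_DEG / pixel_scale_arcsec) ** 2


def image_map_terabytes(area_sq_deg: float,
                        pixel_scale_arcsec: float,
                        bytes_per_pixel: float) -> float:
    """Storage for a single-pass image map of the given area, in TB (10^12 bytes)."""
    n_pixels = area_sq_deg * pixels_per_sq_deg(pixel_scale_arcsec)
    return n_pixels * bytes_per_pixel / 1e12


def survey_total_petabytes(tb_per_night: float,
                           nights_per_year: float,
                           years: float) -> float:
    """Total image volume for a repeated (synoptic) survey, in PB."""
    return tb_per_night * nights_per_year * years / 1000.0


if __name__ == "__main__":
    PIXEL_SCALE = 0.25      # arcsec per pixel: ASSUMPTION, not from the slides
    NIGHTS_PER_YEAR = 300   # ASSUMPTION, not from the slides

    sky = full_sky_sq_deg()
    print(f"Full sky: {sky:,.0f} square degrees")
    print(f"Pixels per square degree: {pixels_per_sq_deg(PIXEL_SCALE):.3g}")
    for bytes_per_pixel in (2, 4):
        tb = image_map_terabytes(sky, PIXEL_SCALE, bytes_per_pixel)
        print(f"Full-sky image map at {bytes_per_pixel} B/pixel: {tb:.0f} TB "
              f"(catalogue at ~10%: {0.1 * tb:.1f} TB)")
    total = survey_total_petabytes(20, NIGHTS_PER_YEAR, 10)
    print(f"LSST at 20 TB/night for a decade: {total:.0f} PB")
```

Under these assumptions a single pass over the whole sky comes to a few tens of TB, consistent with the “few x 10 TB” estimate, while a survey producing ~20TB every night for a decade reaches tens of PB. The difference comes from repeated coverage of the sky, not from a more detailed map.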