Overview of SKA & ASKAP Sky Survey Data Management Workshop
Research A/Prof Kevin Vinsen

[Chart: numbers recorded per night vs. year, 1500-2000, log scale from 10^2 to 10^12 - doubling time < 1 year]

[Diagram: collecting area vs. distance - seeing roughly 10x further out needs ~100x the collecting area; 1,000,000 square m = 1 square km, compared with today's 1,000-10,000 square m telescopes]

Who will build the SKA?
• 20 countries working together in a global project
• Cost = $3 billion AUD
Adapted from Quinn

Where to build it?
• Away from radio interference (cities, towns and people!)
• Flat open spaces for 10s-1000s of km
• Dry and geologically stable
• Good global location for astronomy
• Access to high-technology industry and infrastructure
• Access to a technical and scientific community
• Stable economy and government
Candidate sites: Southern Africa and Western Australia
Adapted from Quinn

South Africa & Australia
[Map of the African candidate site: Kenya, Zambia, Mozambique, Namibia, Botswana, Mauritius, Madagascar, South Africa]
• Final international decision on the site by 2012

How quiet do we need to be?
• Energy of a falling snowflake: < 30 microjoules
• Energy collected by ALL radio telescopes, ever: less than a falling snowflake

Murchison Radio-astronomy Observatory (MRO)
[Video courtesy of CSIRO]

An illustration of the speed of the Pathfinder
• Ilana's image of Centaurus A
• Required 1200 hours of observing on the Australia Telescope Compact Array in Narrabri
• The Pathfinder will take about 10 minutes
Adapted from Cornwell

ASKAP
[Photo: antenna A1]

SKA Facts
• The SKA could detect airport radars on planets 50 light years away
• The dishes of the SKA will produce 20 times the current global internet traffic
• The aperture arrays in the SKA will produce 250 times the current global internet traffic
• If the raw data produced by the SKA were saved, it would require about one thousand million 1 GB memory sticks per day
• SKA processing power is equivalent to 1 billion top-of-the-range PCs

Petascale Data Flow
[Diagram: data rates and compute loads at successive stages of the signal and data path (CSIRO / ICRAR); adapted from Cornwell & Quinn]
• ASKAP 70 Tb/s, MWA 320 Gb/s, SKA 7 Pb/s
• ASKAP 1.5 PF/s, SKA > 1 EF/s
• ASKAP 5 GB/s, MWA 8 GB/s, SKA 300 GB/s - 1 PB/s
• ASKAP 13 PB, SKA 800-8,000 PB
• ASKAP 100 TF/s, SKA 30 PF/s
• ASKAP & MWA 6 TB/day, SKA 900 TB/day
• ASKAP 100 TF/s, SKA 30-300 PF/s
• ASKAP 3 PB/year, SKA 18-180 PB/year

SKA ICT Challenges
[Figure: the "Grand Challenge Space" - axes of bandwidth (b/s), processing (FLOPS) and storage (bytes), each reaching ~10^20, with SKA, ASKAP, Google, the sum of all PCs, LSST and the LHC plotted; the SKA sits in a corner no other project reaches - the unique SKA challenge. Adapted from Cornwell]

Bandwidth
• SKA Pathfinder cubes are ~5 TB, which implies a 500-second read time at 10 GB/s
• A typical survey consists of ~1,000 cubes = 5-6 days read time
• We would like 100-1,000 GB/s for on-demand processing of single cubes and cube groups

Bandwidth
• SKA Pathfinder cubes of ~36 TB imply a 3,600-second read time at 10 GB/s
• A typical survey consists of ~1,000 cubes = 42 days read time
• We would like 100-1,000 GB/s for on-demand processing of single cubes and cube groups (a quick arithmetic check follows below)
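A minimal arithmetic sketch (Python; not part of the original slides) that reproduces the cube read-time figures above. The 5 TB and 36 TB cube sizes, the 10 GB/s read rate and the ~1,000-cube survey are taken from the slides; the 100 and 1,000 GB/s rates are the on-demand targets mentioned above.

```python
# Back-of-the-envelope check of the cube read times quoted on the Bandwidth slides.
# Cube sizes, the 10 GB/s baseline rate and the 1000-cube survey come from the
# slides; everything else is simple arithmetic.

def survey_read_time(cube_tb, rate_gb_s, n_cubes=1000):
    """Return (seconds to read one cube, days to read the whole survey)."""
    seconds_per_cube = cube_tb * 1000 / rate_gb_s    # 1 TB = 1000 GB
    total_days = seconds_per_cube * n_cubes / 86400  # 86400 seconds per day
    return seconds_per_cube, total_days

for cube_tb in (5, 36):
    for rate in (10, 100, 1000):                     # GB/s
        per_cube, days = survey_read_time(cube_tb, rate)
        print(f"{cube_tb:>3} TB cube @ {rate:>4} GB/s: "
              f"{per_cube:>6.0f} s per cube, {days:>6.1f} days per survey")

# At 10 GB/s a 5 TB cube takes 500 s (survey ~5.8 days) and a 36 TB cube takes
# 3600 s (survey ~41.7 days), matching the slide figures; at 100-1000 GB/s the
# survey read time drops to hours, which is why that bandwidth is the target.
```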
[Image: ≈500 billion galaxies]

Data Storage
• Database structure and performance
• LSST project: 1,000 billion objects, 10^12 rows
  – Map-reduce, BigTable (Google), SciDB, shared-nothing open-source SQL
• SKA pathfinders: ~10 million objects
• SDSS project
• Balance of compute power and query complexity (Amdahl number) - the GrayWulf project

Powering the beast
• Storage power consumption: 2-5 megawatts per exabyte
• Computing power consumption: 350 MW per exaflop
• SKA central processing ~50 MW = $150 million per year
• Need GREEN solutions and better power efficiency

Making discoveries
• SKA data will need world-leading (unique) processing and storage facilities
• These facilities will not be replicated and will need to be accessed and used remotely
• "Move only what you need": process and query remotely and move the results, not the raw data

Discovery fabric
• Joining of HPC, high-performance storage and database technology = a REAL-TIME resource
• Remote query and data discovery
• Virtual Observatory developments and standards are critical
[Diagram: long-term storage, high-availability storage/DB, on-demand processing, and VO services and query interface]

The EXA-Scale SKA Challenges
• Gathering, storing and processing at the exa scale = SPECIAL FACILITIES in the 2020s
• Accessing and mining the data needs new DB approaches and the marriage of DB, storage and HPC
• Access from "home" means the VO and resource hierarchies
• We need to do better at powering the beasts and become greener

Producing the current annual data product of all mankind in one day

My thanks to the following:
• ICRAR
  – Prof Peter Quinn
  – Dr Chris Harris
  – Prof Andreas Wicenec
  – Pete Wheeler
• CSIRO
  – Tim Cornwell
  – Ben Humphries
• iVEC
  – Guy Robinson

Thanks for listening
• Questions
  – I'm deaf (the hearing aids aren't a fashion statement) & on the other side of the planet
  – Please speak clearly.
• Kevin.Vinsen@icrar.org
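A closing back-of-the-envelope note, not part of the original slides: the "Powering the beast" figures quoted earlier can be cross-checked in a few lines of Python. The 350 MW per exaflop, ~50 MW and $150 million per year values come from the slides; the hours-per-year constant and the derived per-kWh price are my own assumptions and derivations.

```python
# Rough check of the figures on the "Powering the beast" slide.
# 350 MW/EFLOP and the ~50 MW / $150M-per-year numbers are slide figures;
# the implied electricity price is derived here, not a figure from the talk.

MW_PER_EXAFLOP = 350.0                        # computing power consumption (slide figure)

# ~50 MW of central processing corresponds to roughly 0.14 EFLOP/s of compute
compute_eflops = 50.0 / MW_PER_EXAFLOP
print(f"50 MW buys about {compute_eflops * 1000:.0f} PFLOP/s at 350 MW/EFLOP")

# Energy used by a 50 MW facility running continuously for a year
hours_per_year = 8766                         # average year, including leap days (assumption)
energy_mwh = 50.0 * hours_per_year
print(f"Annual energy: {energy_mwh:,.0f} MWh")

# The slide's $150M/year then implies an all-in cost of roughly $0.34 per kWh,
# which underlines the call for greener, more power-efficient solutions.
implied_price = 150e6 / (energy_mwh * 1000)   # dollars per kWh
print(f"Implied all-in electricity cost: ${implied_price:.2f}/kWh")
```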