Observational Astrophysics Data Management Issues: Wide-field Time-Domain Surveys
Kem Cook
Lawrence Livermore National Laboratory and National Optical Astronomy Observatory
For the LSST Collaboration
KHC SLAC March 16, 2004

Wide-field Surveys
• Advances in CCD technology have made 64 Mpixel arrays common, with a few 300+ Mpixel arrays
• Modern precision optics produce well-corrected fields of 1+ square degree
• Supernovae, microlensing, Kuiper Belt Objects, Potentially Hazardous Asteroids: all require wide-field surveys, but in the time domain
• Huge focal planes and dedicated surveys yield huge data sets
    Microlensing (MACHO, OGLE, EROS): > 20 Tbyte
    2MASS: 14 Tbyte (little time resolution)
    Sloan Digital Sky Survey: 20 Tbyte (little time resolution)
• Some surveys need area and sacrifice time; maybe they don't need to!

Why focus on Wide-field, Time-domain surveys?
Endorsed by three separate National Academy reports:
    Astronomy & Astrophysics in the New Millennium
    New Frontiers in the Solar System
    Connecting Quarks with the Cosmos

The Large Synoptic Survey Telescope – Massively Parallel Astronomy
Survey the entire sky every 2-3 nights, to simultaneously detect:
    Potentially hazardous near-Earth asteroids
    Tracers of the formation of the solar system
    Fireworks in the heavens – GRBs, quasars…
    Periodic and transient phenomena
    Dark matter via weak gravitational lensing
    Thousands of supernovae per year, in multiple passbands
    ......

LSST Requirements
Covering 15,000 square degrees in 2 days with two visits requires:
    Reaching V mag 24 in <20 sec over a ~5-10 sq deg field
    100x current capabilities

LSST static survey of 10,000 sq deg
• stellar population studies
• a billion galaxies
• 100,000 mass clusters
• large scale mass structures
• physics of dark energy

LSST will probe the time domain.
• LSST will discover ~200,000 supernovae that will trace departures from the smooth Hubble flow at small redshift and measure cosmological parameters at large redshift.
• LSST will enable photometric monitoring of 100 million stars to detect extrasolar planets by transits and microlensing.
• LSST will measure proper motions of millions of nearby stars and determine parallax distances out to about 1 kpc.
• LSST will probe the unknown variable universe (GRBs, …)
LSST performs massively parallel science.

LSST Planetary Reach
Increase inventory of solar system x100
10,000 NEAs, 90% complete >250 m
Over 10 million MBAs
Cometary nuclei >15 km @ Saturn
Extend size-n of comets to <100 m
TNOs beyond 100 AU
rare new objects

Science requirements for speed
• Near-Earth asteroids: ~10 sec exposures, due to trailing loss & matching confusion.
• Optical bursters: <20 sec, due to sources.
• Weak lensing: <20 sec, for control of PSF systematics.
• Photometry: 1-20 sec exposures for transfer standards and dynamic range.
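As a rough cross-check of these exposure times against the coverage requirement stated earlier (15,000 square degrees in 2 days with two visits, over a ~5-10 sq deg field), the sketch below estimates the average time available per visit. The 7 sq deg field is one value inside the quoted range. The 8 usable hours per night, and the neglect of slew time, weather and field overlap, are assumptions made only for illustration and are not figures from the talk.

```python
def seconds_per_visit(area_sq_deg, field_sq_deg, visits_per_field,
                      nights, usable_hours_per_night):
    """Average wall-clock time available for each visit, in seconds."""
    pointings = area_sq_deg / field_sq_deg        # fields needed to tile the area
    total_visits = pointings * visits_per_field   # each field is visited this many times
    total_seconds = nights * usable_hours_per_night * 3600.0
    return total_seconds / total_visits


# Requirement from the talk: 15,000 sq deg, two visits, 2 days.
# Assumed for illustration: 7 sq deg field, 8 usable hours per night.
budget = seconds_per_visit(15_000, 7.0, 2, 2, 8.0)
print(f"{budget:.2f} s per visit")  # prints "13.44 s per visit"
```

Under these assumptions each visit gets a little over 13 seconds in total, so exposure plus readout has to fit well inside the <20 sec limits listed above.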

Relative survey power
[Figure: bar chart of relative survey power. Vertical axis: Figure of Merit (0 to 160). Series: Time (x10), Stellar, Galactic (x2). Surveys compared: LSST, SNAP, Pan-STARRS, Subaru, CFHT, SDSS, MMT.]

How to effect the LSST
Basic idea:
    8.4 m diameter primary mirror
    Survey entire sky every few nights
    Subtract images to find variability
    7 square degree field of view
    > 12 Terabytes of data per night
    Real-time analysis
    Add images to go deeper
First light in 2011

LSST Focal Plane
• 55 cm diameter
• 2.3 GPixels on 10 micron centers
• 10 sec exposures and 2 sec readout
• 1 Mpix/sec max read rate for low noise
• Array of CCD or hybrid CMOS

LSST Data Rates
• 2.3 billion pixels read out in less than 2 sec, every 12 sec
• 1 pixel = 2 Bytes (raw)
• Over 3 GBytes/sec peak raw data from camera
• Real-time processing and transient detection: < 10 sec
• Dynamic range: 4 Bytes / pixel
• > 0.6 GB/sec average in pipeline
• 5000 floating point operations per pixel
• 2 TFlop/s average, 9 TFlop/s peak

Large Synoptic Survey Telescope Information Flow Diagram
[Figure: LSST image pipeline flow diagram. Labels in the diagram: Science Goals; Scheduler; Telescope Controller; Sky; Optics; Detectors; Camera Controller; Flatten (crosstalk, linearity, etc.); Diagnostics (CTE, noise); Camera QA; Calibration Library; Image Archive; Image QA; Source Detection; Astrometry; WCS update; Astrometric Catalog Table; Astrometrically Calibrated Flatfielded Image; Photometrically & Astrometrically Calibrated Template; Register and Convolve; Image Subtraction; Difference Object Detection; Known Variable Table; Known?; Classifier; Photometry Pipeline; Photometric Standards; Calculate Phot. Calib. Coeff's; Phot. Calibration Table; Image coaddition; Co-added Image Archive; Source and noise insertion for detection efficiency determination; Fake Object Table; Detections Table; Alert Table; Alert Protocol; Aggregation Utility; Match and Orbits Table; Analysis Packages 1-N; QA.]

What comes out of the pipelines?
• 4 Pbyte images, 4 Tbyte of DB per year
• 100 billion objects with about 70-100 attributes for each epoch
• Alerts for transient sources
• Orbits for moving objects
Access needs span from a single data point to correlations across the whole data set!

LSST Data Processing Infrastructure – circa 2012
[Figure: data processing infrastructure diagram. Labels in the diagram:
    Focal plane (2.3 GPixels @ 2 B/pixel in 2 sec)
    Telescope site (20 TB/night), real-time processing: 10 TB disk, 1 TFLOP; telescope site portal, user
    Alert Processing Center: 100 TB disk, 10 TFLOP
    Archive Campus – LSST Science Center, Archive Computing Center: 2 PB disk, 20 PB tape, 20+ TFLOP
    Mirror sites, portals, users
    Links: 10^-1 km at 20 Gb/sec; 1-10^2 km at 10 Gb/sec; 10^3 km at 4 Gb/sec (GRID, Internet 2), shown twice]

Software
Rate-limiting aspect of many past projects
Budget-busting potential
Has traditionally been written by practicing scientists
Management challenge is to capture the best of this
Optimize interaction between astronomers & software professionals
Need informed choices as project scope evolves
HEP has experience with this type of project
Typically evolves over the project
Must be able to re-reduce data with new algorithms
Results change with re-processing
The developers are best able to maintain the code

LSST Data Management
• LSST goal is to provide useful, timely and accurate data to a diversity of communities:
    • Scientists
    • Data archive engineers
    • Educators and their students
    • General public
• Analysis software, data management and access tools are major challenges
• Data management system will be developed and deployed in a distributed environment

Data management challenges
Analysis Pipelines
Database
Access
Change detection
Aggregation of “detections” into “objects”
Useful and effective data access
Schema
Fusion with other data sets
Classification of variability
Optimal co-adding
Calibrations
Optimal indexing
Data pedigree
Scope
NVO interface
Detection Efficiencies

Example Algorithm Challenge
• LSST will provide detections of all moving objects across the entire sky
• These need to be sorted and “aggregated” into objects with orbits...
N_observations * N_objects is a big number!
Operationally, how do you execute an SQL query that knows about Kepler’s laws?

How does one find unexpected correlations in space and time?
MACHO found an impossibly large number of low signal-to-noise microlensing events clustered in a small region of the sky using object based photometry
The events were clustered in a region of nebulosity: star forming activity?
Concentric rings of ‘microlensing’ centered on SN1987a! MACHO discovered the light echo.
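One simple way to look for this kind of unexpected spatial clustering is to bin event positions on a grid and flag cells whose counts far exceed the average. The sketch below only illustrates that idea and is not the method MACHO used. The function name `overdense_cells`, the flat-sky approximation (no cos(dec) correction), and the Poisson-style threshold are all assumptions.

```python
import math
from collections import Counter


def overdense_cells(positions, cell_size_deg, n_sigma=5.0):
    """Return {cell: count} for grid cells with an excess of events.

    positions: iterable of (ra_deg, dec_deg) pairs. A flat-sky
    approximation is assumed, so this only makes sense for small fields.
    """
    counts = Counter(
        (math.floor(ra / cell_size_deg), math.floor(dec / cell_size_deg))
        for ra, dec in positions
    )
    if not counts:
        return {}
    # Mean count per cell over the bounding box of occupied cells,
    # so that empty cells inside the surveyed area are included.
    xs = [ix for ix, _ in counts]
    ys = [iy for _, iy in counts]
    n_cells = (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)
    mean = sum(counts.values()) / n_cells
    # Flag cells more than n_sigma Poisson standard deviations above the mean.
    threshold = mean + n_sigma * math.sqrt(mean)
    return {cell: n for cell, n in counts.items() if n > threshold}


# Synthetic example (made-up data): one event per 1-degree cell on a
# 10x10 grid, plus 20 extra events piled into a single cell.
background = [(i + 0.5, j + 0.5) for i in range(10) for j in range(10)]
cluster = [(3.5, 7.5)] * 20
print(overdense_cells(background + cluster, cell_size_deg=1.0))  # prints {(3, 7): 21}
```

A real search would also have to account for stellar density and detection efficiency varying across the field, which the uniform expectation used here ignores.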

SN 1987a light echo via Image Subtraction

Current LSST precursor projects for developing data management tools
ESSENCE supernova survey
SuperMacho microlensing survey
Deep Lens Survey
4 meter telescopes
½ sq deg cameras
Real-time analysis
LSST prototyping
Throughput AΩ = 4 m² deg²
LSST will capture lessons from these.

What is needed in the next 5 years?
• New bridge projects to stress data management system development
    Current precursor projects end in 2 years
    Need scientific participation for developers
• Need to capture HEP experience in distributed software development
• Efficient parallel file systems
• Fault tolerant real-time systems for alerts
• Efficient database algorithms for space-time queries
• Orbit calculation center with capability to handle millions of orbit calculations
• Change in sociology to data mining rather than observing

LSST is a new approach for Astronomy
• Genuinely public access through multipurpose data set: no proprietary data period whatsoever
• Merges 3 enabling technologies
• Private-Public (Multi-agency) partnership
• Widely distributed development
• Parallel execution of multiple compelling science programs, from solar system to cosmology