Sky Survey Data Management
Bob Mann
Wide-Field Astronomy Unit
University of Edinburgh

Logistics
Introductions
Wireless
Toilets
Dinner
Tea, coffee & lunch breaks
Presentations

e-Science Institute
Mission: “To facilitate the e-Science community”
First phase: community building
  Training events: lectures and hands-on
Now: supporting the community
  Focus on longer-term issues – esp. research
  Themes – series of connected workshops

Theme Programme
Kick-Off Meeting
The Transient Sky
Sky Survey Data Management
Future Directions of N-Body Simulations in Cosmology
Sky surveys and data mining
Weak lensing shape measurement

Outcomes from Theme
Presentations on the web
New understanding & collaborations
Special issue of New Astronomy Reviews
  ~6-8 mini-review articles of ~10-15 pages each
  Targeting major topics addressed during Theme
  Who’d like to co-author one on data management?

Motivations for this workshop
Next generation of sky surveys will be different qualitatively, as well as quantitatively
“SDSS model” is being stretched & must break
SDSS model (also GALEX, UKIDSS, VISTA, PS1?,…)
  Static catalogues in a relational database on a single server
  Data accessed via webforms: VO access available, but less well used, as less useful
  Most users download data to desktop for analysis
  Large statistical analyses done with downloaded copies
But data volumes will soon be too great for this… (when? – VISTA and PS1 mark the boundary)

What will be different?
Network speed not keeping up with data volumes
  At some point people will stop being able to download the size of dataset they want to work with
  Analysis code must be run at the data centre
Catalogues too large for a single-server RDBMS
  Partitioning databases over a cluster poses new problems: not all RDBMSs do this well
Science drivers changing
  Time domain – enabled by greatly increased étendue
  Weak lensing – does this fit within standard pipeline?

Data management issues
Do we stick with standard RDBMSs?
Are there better technologies?
How do we support new science drivers?
What are their requirements in detail?
What role does the VO play in all this?
What about forthcoming radio surveys?

Workshop Programme
Three components
  The state of the art
    What we do now – and where it’s starting to fail
  Planning for future sky surveys
    What we’ll have to do & where the problems lie
  Enabling technologies
    What new technologies and techniques might help
(Practical constraints have messed up order somewhat)

Data management issues
Do we stick with standard RDBMSs?
Are there better technologies?
How do we support new science drivers?
What are their requirements in detail?
What role does the VO play in all this?
What about forthcoming radio surveys?

VDFS – Nigel Hambly
Astro-Wise – Gijs Verdoes Kleijn
MonetDB – Martin Kersten
SDSS – Ani Thakar
SciDB – Jacek Becla
Hadoop – Miles Osborne
PS1 Transients – Ken Smith
Euclid – Tom Kitching
Virtual Observatory – Keith Noddle
LSST status – Tim Axelrod
ASKAP and SKA – Kevin Vinsen
LOFAR – Michael Wise

Workshop programme
Plenty of time for discussion
  45 min slots, but speakers aiming for ~30-35
  All tomorrow afternoon for discussion
Discuss as we go along and identify topics to discuss at greater length tomorrow
  With an eye to New Astronomy Reviews paper