Astronomical data curation and the Wide-Field Astronomy Unit Bob Mann Wide-Field Astronomy Unit

Astronomical data curation and
the Wide-Field Astronomy Unit
Bob Mann
Wide-Field Astronomy Unit
Institute for Astronomy
School of Physics
University of Edinburgh
(rgm@roe.ac.uk)
Outline
- Who we are
  - Introduction to the Wide-Field Astronomy Unit
- What we do
  - Sky survey data curation: past, present and future
  - Data curation and the Virtual Observatory
- What we could do with you
  - What WFAU could do for the Digital Curation Centre (DCC)
  - What the DCC could do for WFAU
  - Questions
2/15
Outline
- Who we are
  - Introduction to the Wide-Field Astronomy Unit
- What we do
  - Sky survey data curation: past, present and future
  - Data curation and the Virtual Observatory
- What we could do with you
  - What WFAU could do for the DCC
  - What the DCC could do for WFAU
  - Questions
3/15
Wide-Field Astronomy Unit
- Funded to curate optical and near-infrared sky survey data for the UK (and European) community
- Based at the Royal Observatory Edinburgh (ROE)
- ~35 years of sky survey data curation at ROE
- Evolving data holdings:
  - Photographic plates
  - Digital scans of photographic plates
  - Born-digital data
- WFAU formed in 1999, when the group moved into the University of Edinburgh
- Currently 12 grant-funded + 2 academic staff
- Mix of astronomers, IT professionals & hybrids
4/15
Outline
- Who we are
  - Introduction to the Wide-Field Astronomy Unit
- What we do
  - Sky survey data curation: past, present and future
  - Data curation and the Virtual Observatory
- What we could do with you
  - What WFAU could do for the DCC
  - What the DCC could do for WFAU
  - Questions
5/15
Sky survey data life-cycle: e.g. WFCAM
- Images taken at the telescope
  - UKIRT, in Hawaii
- Data reduction pipeline run in Cambridge, on a per-night basis
  - Removes instrumental signatures
  - Produces final, clean images
  - Detects and characterises sources in images
- Data transferred to Edinburgh
  - Ingest source catalogues and image metadata into a relational database; store image files on disk
  - Combine data from multiple nights: new images, catalogues
  - Publish release databases via a web interface
6/15
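The Edinburgh end of this life-cycle (ingest per-night catalogues into a relational database, then combine nights into a release) can be sketched with Python's standard-library sqlite3 module. This is a minimal sketch, not the WFCAM archive's actual schema or code: the table and column names are hypothetical, and each detection is assumed to arrive with a source_id already assigned, whereas a real archive must cross-match positions between nights to decide which detections belong to the same source.

```python
import sqlite3

# Hypothetical per-night catalogues: night -> [(source_id, ra_deg, dec_deg, mag), ...].
# Assumption: source_id is already assigned; a real archive cross-matches positions.
NIGHTLY_CATALOGUES = {
    "2007-01-01": [(1, 150.001, 2.200, 17.1), (2, 150.050, 2.210, 18.4)],
    "2007-01-02": [(1, 150.001, 2.200, 17.3), (3, 150.100, 2.250, 19.0)],
}


def build_release(nightly):
    """Ingest each night's catalogue, then combine nights into a release table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE detection ("
        "night TEXT, source_id INTEGER, ra_deg REAL, dec_deg REAL, mag REAL)"
    )
    # Per-night ingest: one row per detection, tagged with its night.
    for night, rows in nightly.items():
        db.executemany(
            "INSERT INTO detection VALUES (?, ?, ?, ?, ?)",
            [(night, *row) for row in rows],
        )
    # Multi-night combination: one row per source, averaged over its detections.
    db.execute(
        "CREATE TABLE release_source AS "
        "SELECT source_id, AVG(ra_deg) AS ra_deg, AVG(dec_deg) AS dec_deg, "
        "AVG(mag) AS mean_mag, COUNT(*) AS n_nights "
        "FROM detection GROUP BY source_id"
    )
    db.commit()
    return db


db = build_release(NIGHTLY_CATALOGUES)
for source_id, mean_mag, n_nights in db.execute(
    "SELECT source_id, mean_mag, n_nights FROM release_source ORDER BY source_id"
):
    print(source_id, round(mean_mag, 2), n_nights)
```

In this sketch the raw per-night detections are kept and the release table is derived from them with SQL, so a release can be rebuilt without re-ingesting any night's data.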
WFAU’s main survey archives
- Past: SuperCOSMOS
  - Based on digital scans of photographic plates
  - Database: ~5TB; largest tables ~10^9 rows
  - Images: ~35,000 user requests (10GB) per month
- Present (2005-2012): WFCAM
  - Near-infrared: ~700 registered users
  - ~500 million rows of database results per month
  - ~125GB of flat-file image data per month
- Near future (2008-2020): VISTA
  - ~3x the data rates/volume of WFCAM
7/15
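Turning the monthly usage figures into rates gives a feel for the load on each archive. The sketch below assumes a 30-day month and decimal units (1 GB = 10^9 bytes). For VISTA it applies the slide's ~3x factor to the ~20TB per year of WFCAM image data quoted under WFAU's future plans, so the VISTA number is an extrapolation, not a measured figure.

```python
# Back-of-envelope rates from the archive figures above.
# Assumptions: 30-day month, decimal units (1 GB = 1e9 bytes, 1 TB = 1e12 bytes).
SECONDS_PER_MONTH = 30 * 24 * 3600

# SuperCOSMOS image service: ~35,000 requests totalling ~10 GB per month.
supercosmos_kb_per_request = 10e9 / 35_000 / 1e3

# WFCAM archive: ~500 million database rows and ~125 GB of image files served per month.
wfcam_rows_per_second = 500e6 / SECONDS_PER_MONTH
wfcam_images_served_tb_per_year = 125e9 * 12 / 1e12

# VISTA: ~3x WFCAM, applied (as an assumption) to WFCAM's ~20 TB/year of image data taken.
WFCAM_IMAGE_TB_PER_YEAR = 20
vista_image_tb_per_year = 3 * WFCAM_IMAGE_TB_PER_YEAR

print(f"SuperCOSMOS: ~{supercosmos_kb_per_request:.0f} kB per image request")
print(f"WFCAM: ~{wfcam_rows_per_second:.0f} rows/s served, "
      f"~{wfcam_images_served_tb_per_year:.1f} TB/year of images served")
print(f"VISTA: ~{vista_image_tb_per_year} TB/year of image data taken")
```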
WFAU’s future plans
- Large Synoptic Survey Telescope (LSST)
  - US-led public/private project
  - We’re trying to get the UK to buy into it
- The data challenges are immense
  - WFCAM takes ~20TB of image data per year
  - LSST will take ~20TB of image data per night: ~60PB of images, ~8PB of database (2016-2025)
- LSST is stimulating a lot of data management R&D in the US:
  - Commercial: Google
  - Academic: “Sci-DB” (M. Stonebraker, D. DeWitt)
8/15
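The LSST figures hang together, and a few lines of arithmetic show the jump in scale. Assuming decimal units and counting 2016-2025 as ten survey years, ~60PB of images at ~20TB per night implies about 300 observing nights per year, i.e. roughly 300 times WFCAM's yearly image volume. The night count is derived here, not stated on the slide.

```python
# Consistency check on the LSST figures above.
# Assumptions: decimal units (1 PB = 1000 TB); 2016-2025 counted as 10 survey years.
WFCAM_TB_PER_YEAR = 20
LSST_TB_PER_NIGHT = 20
LSST_IMAGE_PB_TOTAL = 60
SURVEY_YEARS = 2025 - 2016 + 1  # counting both end years

lsst_tb_per_year = LSST_IMAGE_PB_TOTAL * 1000 / SURVEY_YEARS
implied_nights_per_year = lsst_tb_per_year / LSST_TB_PER_NIGHT
growth_vs_wfcam = lsst_tb_per_year / WFCAM_TB_PER_YEAR

print(f"LSST: ~{lsst_tb_per_year:.0f} TB/year of images")
print(f"Implied observing nights per year: ~{implied_nights_per_year:.0f}")
print(f"Yearly image volume relative to WFCAM: ~{growth_vs_wfcam:.0f}x")
```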
The Virtual Observatory
- Goal: an interoperable federation of all the world’s astronomical data resources
- International Virtual Observatory Alliance (IVOA)
  - Coordinates VO development worldwide
  - Acts as a W3C-like standards body for the VO
- AstroGrid
  - The only project to have developed a full VO system
9/15
Virtual Observatory components
- Registry
  - Metadata for all data published to the VO
- Standard data access protocols
  - For tabular data, images, spectra, time series, etc.
- Standard web service wrappers for application code
  - Enabling asynchronous calls, workflow, etc.
- Distributed data storage system
  - Presenting a transparent, aggregated logical view to the user
10/15
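The first two components can be illustrated with a toy example: look up services in a registry by capability, then query one in the style of the IVOA Simple Cone Search protocol, where a position (RA, DEC) and search radius (SR), all in decimal degrees, are appended to the service's access URL and the service replies with matching sources (as a VOTable, in the real protocol). This is a sketch only: the registry identifiers and URLs are hypothetical, and a real VO Registry is queried through its own standard interfaces rather than a Python dict.

```python
from urllib.parse import urlencode

# Toy stand-in for a VO Registry: resource metadata keyed by identifier.
# All identifiers and access URLs below are hypothetical placeholders.
REGISTRY = {
    "ivo://example.org/survey-a": {
        "title": "Example survey A source catalogue",
        "capability": "ConeSearch",
        "access_url": "https://archive-a.example.org/scs?",
    },
    "ivo://example.org/survey-b": {
        "title": "Example survey B image archive",
        "capability": "SimpleImageAccess",
        "access_url": "https://archive-b.example.org/sia?",
    },
}


def find_services(capability):
    """Return the identifiers of registered resources offering a capability."""
    return sorted(rid for rid, meta in REGISTRY.items() if meta["capability"] == capability)


def cone_search_url(access_url, ra_deg, dec_deg, radius_deg):
    """Build a cone-search query URL: RA, DEC and SR in decimal degrees."""
    if not (0.0 <= ra_deg < 360.0 and -90.0 <= dec_deg <= 90.0 and radius_deg > 0):
        raise ValueError("position or search radius out of range")
    return access_url + urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})


for rid in find_services("ConeSearch"):
    print(rid, cone_search_url(REGISTRY[rid]["access_url"], 150.0, 2.2, 0.05))
```

Because every cone-search service accepts the same three parameters, a client written once can query any archive the registry returns; that shared contract is what the standard protocols provide.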
Curation challenges for WFAU
- More data analysis services in the data centre
  - Data volumes too large for users to download
  - WFAU must provide data analysis services & hardware
- Integration of data and knowledge
  - Third-party annotations which can be used in queries
    - “Object X in database Y is a quasar”
    - “X-ray source A is the same object as radio source B”
  - Better linkage between archives and online literature
- Keeping staff up to date on technologies/techniques
  - We mostly learn by doing: do we make the best choices?
11/15
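One way to make third-party annotations usable in queries is to store them as rows in their own table, recording who made each assertion, and join them against the survey catalogue. The sketch below uses Python's sqlite3 with hypothetical table, archive and author names; it illustrates the idea only and is not the design of any WFAU or VO annotation service. A cross-identification such as “X-ray source A is the same object as radio source B” fits the same table with a same_as predicate.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE source (source_id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL, mag REAL);
-- Third-party assertions about objects, recording who made each one.
CREATE TABLE annotation (
    archive TEXT, object_id INTEGER, predicate TEXT, value TEXT, author TEXT
);
""")
db.executemany(
    "INSERT INTO source VALUES (?, ?, ?, ?)",
    [(1, 150.001, 2.200, 17.2), (2, 150.050, 2.210, 18.4), (3, 150.100, 2.250, 19.0)],
)
# Hypothetical annotations: a classification and a cross-identification.
db.executemany(
    "INSERT INTO annotation VALUES (?, ?, ?, ?, ?)",
    [
        ("survey_y", 2, "classification", "quasar", "group_p"),
        ("survey_y", 3, "same_as", "xray_cat:A17", "group_q"),
    ],
)


def sources_annotated(db, predicate, value):
    """Return (source_id, mag, author) for sources carrying a given annotation."""
    return db.execute(
        "SELECT s.source_id, s.mag, a.author "
        "FROM source AS s JOIN annotation AS a "
        "ON a.archive = 'survey_y' AND a.object_id = s.source_id "
        "WHERE a.predicate = ? AND a.value = ? "
        "ORDER BY s.source_id",
        (predicate, value),
    ).fetchall()


print(sources_annotated(db, "classification", "quasar"))
```

Keeping the author column lets a query choose whose assertions to trust, which matters once many groups annotate the same archive.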
Outline
- Who we are
  - Introduction to the Wide-Field Astronomy Unit
- What we do
  - Sky survey data curation: past, present and future
  - Data curation and the Virtual Observatory
- What we could do with you
  - What WFAU could do for the DCC
  - What the DCC could do for WFAU
  - Questions
12/15
WFAU and DCC:
What we can do for you
- Case studies, exemplars, etc.
  - WFAU is a well-established, competent group
  - Astronomy is a relatively small, cohesive community, used to interdisciplinary collaboration
  - Astronomers are early adopters of IT and recognise the value of data curation
  - The VO is a rich, functional e-Science infrastructure
- Collaborations to date:
  - Raj Bose: distributed annotation service
  - James Cheney: paper on data centre security
13/15
WFAU and DCC:
What you can do for us
- Policy advice
  - We increasingly need to convince research councils of the benefits of long-term data curation: cost/benefit
- Technical advice, from the DCC or its Associates
  - Should we use iRODS for LSST?
  - Do any XML databases have decent performance?
  - Do the VO metadata standards make sense?
- Curation manual
  - When will the remaining sections appear?
- Training
  - e.g. NeSC (National e-Science Centre) course on relational database design
14/15
WFAU and DCC:
Questions
- What is the DCC’s model for collaboration?
  - We can’t collaborate with everyone on everything
- Scientists & digital librarians live in different worlds: how do you bridge that divide?
  - Interdisciplinary work requires sustained interaction
- What do you want from scientific data curators?
  - What can you offer us in return?
- Few of my colleagues know anything about the DCC
  - Does that surprise you?
15/15