Introduction to Sky Survey Problems Bob Mann

advertisement
Introduction to Sky Survey
Problems
Bob Mann
Introduction to sky survey
database problems
Astronomical data
Astronomical databases
– The Virtual Observatory – concept & status
– Large sky survey databases
Spatial indexing in astronomical databases
Case Study: SDSS & SkyServer
Observational Astronomy
Electromagnetic spectrum
ROSAT ~keV DSS Optical 2MASS 2µ IRAS 25µ IRAS 100µ
GB 6cm NVSS 20cm WENSS 92cm
Astronomical data – in original form
Optical
– Image: array of pixel values
X-ray
– Event list: positions, arrival times, energies
of all detected photons
Radio
– Interferometric visibilities: sparse Fourier
transform of a region of the sky
Very different types of data
Astronomical data – in final form
Most research done using catalogue data
– i.e. tables of attributes of detected sources –
mainly discrete sources (stars, galaxies, etc)
– Data compression
Catalogue - few% of image data volume
– Amenable to representation in relational DB
Natural indexing by location in sky
Astronomical Databases
Sky survey archives
– Homogeneous data, standard reduction pipeline
– “Science Archive” – do science on DB
Telescope archives
– Semi-indexed collections of raw data files from all
observations taken – heterogeneous
– Download data for reduction and analysis
Specialist data centres – collections of catalogues
Bibliographic databases– scans of major journals
The Virtual Observatory
Concept:
– Interoperable federation of all the world’s
significant astronomical databases
– Facilitate multi-wavelength astronomy
Status:
– Several projects underway – AstroGrid in UK
– 5+ years’ work to create a fully working VO
The VO sets the context for the design of
new sky survey databases
AstroGrid: www.astrogrid.org
Consortium:
– Edinburgh, Leicester, Cambridge, RAL, MSSL,
Jodrell Bank, Queens Belfast
3 year (~£4M) project:
– 1 yr Phase A Study – finished end of 2002
– 2 yr Phase B Implementation – to end 2004
Web (later Grid) service framework; in Java
Currently building web services, portals, etc
- researching OGSA and OGSA-DAI
Large sky survey databases
Major science driver for AstroGrid – and VO
– New science – mining multi-wavelength data
Largest are optical/near-infrared sky surveys
Largest of these hosted in Edinburgh:
– current - SuperCOSMOS, SDSS (mirror)
– future - WFCAM, VISTA
– Each yield 1-10TB of catalogue data in RDBMS
Spatial queries in astronomy
Two important types:
– Select entries (with predicate) in area of sky
– Match entries (esp. between two tables)
Second is special case of first
– i.e. both boil down to “point-within-distance-ofpoint”
– but distances in two cases can be very different
Advantage in using a hierarchical spatial
indexing scheme
– Perform spatial query at appropriate granularity
Spatial Indexing
in Astronomy
The Celestial Sphere
Many coordinate systems
Most common is the
equatorial system, with
Right Ascension and
Declination as analogues
of Longitude & Latitude
Spatial indexing in
astronomical databases
Basic DBMS indexes are 1-D – e.g. B-trees
Some DBMSs support general 2-D indexing
– Usually using R-trees (or variants) – rectangles:
astronomical experiments not too successful: [Clive]
Some DBMSs have native spatial indexing
– Little knowledge of this in astronomy - want to know more
But
The Celestial Sphere is a sphere(!)
– Many geographical spatial DBs use planar projections
So, astronomers have felt the need to develop
spatial indexing prescriptions of their own
Hierarchical Triangular Mesh HTM
Developed by Sloan survey archive team at JHU
Start with projection of octahedron on sphere
and subdivide triangles at their midpoints
Generate unique pixel ID code based on
position in the sky and level in hierarchy
– can index that with B-tree
Hierarchical Equal Area IsoLatitude Pixelisation (HEALPix)
Developed by Kris Gorski (now JPL/Caltech)
Start with division of sphere into twelve equal area
curvilinear quadrilaterals,
then divide each into four
Like HTM, produces a
pixel code on which a
B-tree index can be made
(Ian – HEALPix in Oracle?)
Sky survey DB case study:
SkyServer for SDSS
Sloan Digital Sky Survey (SDSS):
– first of new generation of sky surveys
US-led team, dedicated telescope & camera
Image half of northern sky in 5 optical bands
Then obtain optical
spectra for 1,000,000
galaxies
Estimated ~1TB of
catalogue data
SDSS Archive
First of new generation of sky survey archives
– Represents the state-of-the-art in sky survey databases
Developed by Alex Szalay’s team at Johns Hopkins
Project started in earnest in about 1996
– OODBMSs seen as the coming thing
– SDSS chose Objectivity/DB for their archive:
~15 staff-years of effort later, they’d rewritten
much of the DBMS themselves…and then jumped
ship and started using MS SQL Server! SkyServer (in collaboration with Jim Gray, MS
Research)
SkyServer design considerations
Power & flexibility to pose arbitrary queries
Simple – astronomers ignorant of SQL!
Hide messy spherical trigonometry
– Distance on sphere between (a1,d1) and (a2,d2) is
given in SQL by
2.0*asin(sqrt(square(sin(0.5*(radians(d1-d2)))) +
cos(radians(d1))*cos(radians(d2))*
square(sin(0.5*(radians(a1-a2)))))
– Don’t want users typing this
– Don’t really want DBMS to evaluate expressions like
this often
SkyServer spatial queries
Simple table-valued functions exposed to user:
– E.g. select count(*)
from fGetNearbyObjEq(a,d,radius)
(a,d)=(Right Ascension, Declination)
Functions call SQL Server Extended Stored
Procedure
– HTM index manipulation routines, implemented in a
Dynamically Linked Library (DLL)
– DLL generated from HTM package in C++
Lessons from HTM
implementation in SkyServer
SQL is not great for spherical trigonometry
– Messy to write, slow to compute
Have to define stored procedures/functions
– Expose a clean interface to users
– Let them pose queries the way they want to
Replace trig operations by integer arithmetic
– Library of HTM index operations underneath
Precompute tables of neighbouring objects
– Far fewer spatial match operations at query time
Problems with this approach
How easy to develop stored procedures, etc?
– Needs detailed knowledge of DBMS
– Extended Stored Procedure calls slow
How well will query optimiser use HTM?
– …less well than built-in spatial index?…
…but that might be poorly suited to astronomical
applications…
How easy to implement all this in DBMSs
other than SQL Server?
But this works reasonably well in practice!
Download