THE US NATIONAL VIRTUAL OBSERVATORY
The Virtual Observatory: Core
Capabilities and Support for
Statistical Analyses in Astronomy
Robert Hanisch
US National Virtual Observatory
Space Telescope Science Institute
25 Jan 2006
Overview
The objective of the Virtual Observatory is to enable
new science by greatly enhancing access to data and
computing resources. The VO is intended to make it
easy to locate, retrieve, and analyze data from
archives and catalogs worldwide.
• Motivation; who’s involved; technical challenges and
approach; capabilities
• Statistics in the VO
• Future
Astronomy is
facing a data avalanche
Multi-terabyte
(soon: multi-petabyte) sky
surveys and
archives over a
broad range of
wavelengths
1 microSky (DPOSS)
Billions of sources,
hundreds of
attributes per source
1 nanoSky (HDF-S)
The changing face
of observational astronomy
• Large digital sky surveys are becoming the dominant source of data
in astronomy: > 100 TB, growing rapidly
– SDSS, 2MASS, DPOSS, GSC, FIRST, NVSS, RASS, IRAS,
QUEST, GALEX, SST; CMBR experiments; Microlensing
experiments; NEAT, LONEOS, and other searches for Solar system
objects
– Digital libraries: ADS, astro-ph, NED, CDS, NSSDC
– Observatory archives: HST, CXO, space and ground-based
– Future: PanSTARRS, LSST, and other synoptic surveys;
astrometric missions, GW detectors
• Data sets orders of magnitude larger, more complex, more
homogeneous than in the past
• Roughly 1 TB/Sky/band/epoch
– Human Genome is < 1 GB, Library of Congress ~ 20 TB
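The 1 TB figure can be checked with back-of-the-envelope arithmetic. The sketch below assumes 1-arcsecond pixels and 2 bytes per pixel; both values are illustrative assumptions, not numbers taken from the slides.

```python
import math

# Area of the whole sky in square degrees: 4*pi steradians converted to deg^2.
SKY_AREA_DEG2 = 4 * math.pi * (180 / math.pi) ** 2  # about 41,253 deg^2


def survey_bytes(pixel_arcsec=1.0, bytes_per_pixel=2):
    """Raw size of one all-sky image in one band at one epoch.

    pixel_arcsec and bytes_per_pixel are assumed values for illustration.
    """
    pixels_per_deg2 = (3600.0 / pixel_arcsec) ** 2
    return SKY_AREA_DEG2 * pixels_per_deg2 * bytes_per_pixel


size_tb = survey_bytes() / 1e12
print(f"{size_tb:.2f} TB per sky per band per epoch")  # prints 1.07 TB ...
```

Halving the pixel size quadruples the volume, so the estimate is only good to an order of magnitude.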
Threads of the VO Fabric
Data Standards
The Internet
Survey astronomy
Digital detectors
Archival Research
Moore’s Law
Multiwavelength
astrophysics
NGC3104
Temporal astronomy
Who is the
National Virtual Observatory?
• US NVO development project, funded by NSF
Information Technology Program and managed by
NSF Astronomy Division, is in final year of 5-year
project
• Funding is $10M+ over the 5 years
• 17 organizations (astro, CS, IT) involved
– JHU (PI Alex Szalay), STScI, Caltech (Astronomy, IPAC,
CACR), HEASARC, SAO, NRAO, NOAO, NCSA, SDSC,
FNAL, USNO, et al.
• Collaboration with Gemini, LSST, et al.
International collaboration
• NVO is co-founder of the
International Virtual Observatory Alliance
• IVOA now has 16 member projects
• Adopted a standards process based on W3C
• Forum for discussion and sharing of experience
http://ivoa.net
Interoperability
VizieR: Contains more
than 4000 astronomical
catalogues consisting of
one or several tables.
Problem: as the catalogues come from
many different sources, the original
descriptions are very heterogeneous: “Give
me all tables containing the V magnitude in
the Johnson system.”
144 different names for Johnson V !
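One way the VO addresses this is to tag each column with a Unified Content Descriptor (UCD) so that queries match on meaning rather than on the column name. The sketch below is a minimal illustration: the table and column names are invented, and the UCD1+ string "phot.mag;em.opt.V" is used here as the assumed tag for a V-band magnitude.

```python
# Hypothetical column metadata; table and column names are invented.
COLUMNS = [
    {"table": "catalog_a", "name": "Vmag", "ucd": "phot.mag;em.opt.V"},
    {"table": "catalog_b", "name": "V_JOHNSON", "ucd": "phot.mag;em.opt.V"},
    {"table": "catalog_c", "name": "mv", "ucd": "phot.mag;em.opt.V"},
    {"table": "catalog_c", "name": "Bmag", "ucd": "phot.mag;em.opt.B"},
    {"table": "catalog_d", "name": "Kmag", "ucd": "phot.mag;em.IR.K"},
]


def tables_with_ucd(columns, ucd):
    """Return the sorted names of tables having at least one column with this UCD."""
    return sorted({col["table"] for col in columns if col["ucd"] == ucd})


v_tables = tables_with_ucd(COLUMNS, "phot.mag;em.opt.V")
print(v_tables)
```

The three V-band columns carry three different names, but one UCD comparison finds all of them.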
Interoperability challenges
• Metadata standards
• Data discovery
• Data requests
• Data delivery
• Units
• Database queries
• Distributed applications; web services
• Authentication and authorization
Architecture
Portals, User Interfaces, Tools
NVO Data Access Layer
Computational
Services
Virtual Data
HTTP, Web, & Grid Services
Registries
Data Models, UCDs, Metadata
NVO Resource
Discovery
Catalogs, Archives, Collections, Models
NVO Components
• Data discovery and location
– Resource Registry: Organizations, archives, tables, databases,
services
– Footprint Services
• Data access
– Simple tables and observation logs: Cone Search
– Images: Simple Image Access Protocol (SIAP)
– Spectra: Simple Spectrum Access Protocol (SSAP)
VOTables and FITS used to exchange data throughout the VO
– Databases: SkyNode, with Astronomical Data Query Language
(ADQL)
– Transient events: VOEvent protocol
– Data models, Space-Time Coordinates (STC)
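Of the access protocols listed above, Cone Search is the simplest: an HTTP GET request carrying a position and a search radius in decimal degrees (parameters RA, DEC, and SR), answered with a VOTable. The sketch below only builds the request URL; the service address is a placeholder, not a real endpoint.

```python
from urllib.parse import urlencode


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a Cone Search request URL from a position and radius in degrees."""
    if not 0.0 <= ra_deg < 360.0:
        raise ValueError("RA must be in [0, 360) degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("DEC must be in [-90, 90] degrees")
    if radius_deg < 0.0:
        raise ValueError("search radius must be non-negative")
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    # Join with '?' or '&' depending on what the base URL already contains.
    if base_url.endswith(("?", "&")):
        separator = ""
    elif "?" in base_url:
        separator = "&"
    else:
        separator = "?"
    return f"{base_url}{separator}{query}"


# Placeholder service address, for illustration only.
url = cone_search_url("http://example.org/cone", 180.0, -30.5, 0.1)
print(url)
```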
NVO Components
• Distributed data storage
– VOStore, VOSpace
– Authentication and authorization
• Distributed computing
– Web services
– Grid services
– Scalability
NVO Applications
• Registry Interface
• DataScope
• Coverage Maps
• Open SkyQuery
• WESIX (SExtractor)
• WCS Fixer
• Spectrum Services
• VOEvent Net
• Montage mosaics
• Integration with legacy software systems
NVO Registry Portal
Find source catalogs, image archives, and other
astronomical resources registered with the NVO
A Registry is a distributed database of
Virtual Observatory resources:
primarily access services for catalog,
image, and spectral data, but also
descriptions of organizations and data
collections. There are several
coordinated registry implementations
that share information by harvesting
each other's resources. This registry is
at STScI in Baltimore, MD.
Resources can be searched by keyword, or with advanced queries expressed
in SQL. The registry is open to humans through web forms and to machines
through SOAP web services.
DataScope
Discover and explore data in the Virtual Observatory
Using the NVO DataScope scientists can
discover and explore hundreds of data
resources available in the Virtual
Observatory. DataScope uses the VO
registry and VO access protocols to link to
archives and catalogs around the world.
Users can immediately discover what is
known about a given region of the sky: they
can view survey images from the radio
through the X-ray, explore archived
observations from multiple archives, find
recent articles describing analysis of data in
the region, find known interesting or peculiar
objects and survey datasets that cover the region. A summary page provides a quick
précis of all of the available data. Users can download images and tables for further
analysis on their local machines, or they can go directly to a growing set of VO
enabled analysis tools, including Aladin, OASIS, VOPlot and VOStat.
Open SkyQuery
Cross-match your data with numerous catalogs
OpenSkyQuery allows
you to cross-match
astronomical catalogs
and select subsets of
catalogs with a general
and powerful query
language. You can also
import a personal catalog
of objects and cross-match it against selected
databases.
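At its simplest, a positional cross-match pairs each source in one catalog with the nearest source in another, provided the two lie within a chosen angular radius. The brute-force sketch below illustrates that idea only; it is not Open SkyQuery's implementation, and the catalog entries are invented.

```python
import math


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(cat_a, cat_b, radius_arcsec):
    """Nearest neighbour in cat_b for each cat_a source, within the radius.

    Catalogs are lists of (id, ra_deg, dec_deg). Returns a list of
    (id_a, id_b, separation_arcsec). O(N*M): fine for a sketch only.
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        best = None
        for id_b, ra_b, dec_b in cat_b:
            sep = angular_sep_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (id_b, sep)
        if best is not None:
            matches.append((id_a, best[0], best[1] * 3600.0))
    return matches


# Invented sources for illustration.
personal = [("a1", 10.0, 20.0), ("a2", 50.0, -5.0)]
survey = [("b1", 10.0, 20.0003), ("b2", 200.0, 0.0)]
matches = cross_match(personal, survey, radius_arcsec=2.0)
print(matches)
```

Production services replace the double loop with a spatial index so that the cost does not grow as the product of the catalog sizes.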
Spectrum Services
Search, plot, and retrieve SDSS, 2dF, and other spectra
The Spectrum Services web site is
dedicated to spectrum related VO
services. On this site you will find
tools and tutorials on how to access
close to 500,000 spectra from the
Sloan Digital Sky Survey (SDSS DR1)
and the 2 degree Field redshift survey
(2dFGRS). The services are open to
everyone to publish their own spectra
in the same framework. Reading the
tutorials on XML Web Services, you
can learn how to integrate the 45 GB
spectrum and passband database
into your programs with a few lines of
code.
Web Enabled Source Identification
with Cross-Matching (WESIX)
Upload images to SExtractor and cross-correlate the
objects found with selected survey catalogs.
This NVO service does source
extraction and cross-matching
for any astrometric FITS image.
The user uploads a FITS
image, and the remote service
runs the SExtractor software for
source extraction. The
resulting catalog can be cross-matched with any of several
major surveys, and the results
returned as a VOTable. The
web page also allows use of
Aladin or VOPlot to visualize
results.
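A VOTable is an XML document whose FIELD elements describe the columns and whose TR/TD elements hold the rows. The sketch below reads a tiny hand-written table with the Python standard library. The sample data are invented, and the sample omits the XML namespace that real VOTables normally declare, so this reader is an illustration rather than a general parser.

```python
import xml.etree.ElementTree as ET

# Invented sample table; real VOTables usually declare an XML namespace.
SAMPLE_VOTABLE = """<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE name="matches">
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <FIELD name="mag" datatype="float" unit="mag"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>10.68</TD><TD>41.27</TD><TD>17.2</TD></TR>
          <TR><TD>10.70</TD><TD>41.25</TD><TD>18.9</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def read_votable_rows(xml_text):
    """Return the first table's rows as dicts keyed by FIELD name.

    Assumes a namespace-free document with numeric TABLEDATA cells.
    """
    root = ET.fromstring(xml_text)
    table = root.find("./RESOURCE/TABLE")
    names = [field.get("name") for field in table.findall("FIELD")]
    rows = []
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        values = [float(td.text) for td in tr.findall("TD")]
        rows.append(dict(zip(names, values)))
    return rows


rows = read_votable_rows(SAMPLE_VOTABLE)
print(rows)
```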
Coverage Maps
View catalog
coverage maps
and source
inventories for a
position or object
of interest.
The NVO Sky Statistics
Service generates source
counts, coverage maps, and
links to downloadable data
for catalog holdings
available through the NVO
protocols, including IRSA,
NED, and CDS VizieR.
WCS Fixer
Repair image coordinates in images with inaccurate or
misaligned coordinate systems.
VOEvent Net
Explore the multiwavelength sky in
the vicinity of transient events.
Montage Mosaics
Make mosaics
from 2MASS,
DPOSS, or
SDSS images.
VO Tools
• VOTable display and analysis
– VOPlot, TOPCAT, Mirage
• Image display and analysis
– Aladin, OASIS
– Other standard display tools for downloaded data
• Spectrum display and analysis
– VOSpec, SpecView
Statistics and the VO…
From single objects…
• Observations of small, carefully selected samples
(often with a priori prejudices) of objects in one or a
few wavelength bands
…to large-scale statistical studies
• Multi-wavelength data for millions of objects, allowing
us to:
– Discover significant patterns from the analysis of statistically
rich and unbiased image/catalog databases
– Understand complex astrophysical systems via confrontation
between data and sophisticated numerical simulation
VO-aware statistics tools: VOPlot
VO-aware statistics tools: TOPCAT
Box plots
Pairwise plots
K-means
Statistics in the VO
• What more is needed?
– Scalability: Extend analysis to 10^9 measurements or
more
– Operate on data where it resides
– VOSpace
• Probabilistic cross-matching
2 


1
1
2
2
2
2
2
2

x

x

y

y

z

z


x

y

z
1








n
n
n
n
2 n
2
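As a sketch of what a probabilistic cross-match can compute, the code below assumes one common chi-square formulation: each detection is a unit vector on the sky weighted by the inverse square of its positional error, the best common position is the normalized weighted sum, and the minimum chi-square measures how well the detections agree. This formulation, the function names, and the sample positions are all assumptions for illustration; it is not presented as the equation from the slide.

```python
import math


def unit_vector(ra_deg, dec_deg):
    """Cartesian unit vector (x, y, z) for a sky position."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def match_chi2(detections):
    """Best common position and minimum chi-square for a set of detections.

    detections: list of (ra_deg, dec_deg, sigma_arcsec).
    Minimizes sum_i w_i * |r_i - r|^2 over unit vectors r, with
    w_i = 1 / sigma_i^2 (sigma in radians). Algebraically the minimum
    equals 2 * (sum_i w_i - |sum_i w_i r_i|); it is summed directly here
    for better numerical behaviour.
    """
    weighted = []
    ax = ay = az = 0.0
    for ra, dec, sigma_arcsec in detections:
        sigma_rad = math.radians(sigma_arcsec / 3600.0)
        w = 1.0 / sigma_rad ** 2
        x, y, z = unit_vector(ra, dec)
        weighted.append((w, x, y, z))
        ax, ay, az = ax + w * x, ay + w * y, az + w * z
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    best = (ax / norm, ay / norm, az / norm)
    chi2 = sum(w * ((x - best[0]) ** 2 + (y - best[1]) ** 2 + (z - best[2]) ** 2)
               for w, x, y, z in weighted)
    return best, chi2


# Two invented detections 1 arcsec apart, each with a 1 arcsec error.
_, chi2_near = match_chi2([(10.0, 0.0, 1.0), (10.0, 1.0 / 3600.0, 1.0)])
# The same pair moved 10 arcsec apart: a much poorer match.
_, chi2_far = match_chi2([(10.0, 0.0, 1.0), (10.0, 10.0 / 3600.0, 1.0)])
print(chi2_near, chi2_far)
```

A candidate association is accepted when its chi-square falls below a chosen threshold, so larger positional errors tolerate larger separations.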
Statistics in the VO
• Image stacking
– White et al. (2006) detect radio counterparts to SDSS
QSOs at a level 30x below the rms noise via median stacking of
41,000 FIRST images
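Stacking works because the noise in a combined image falls roughly as the square root of the number of images. The single-pixel simulation below illustrates this with Gaussian noise and a signal set to 1/30 of the rms; it is a toy model under those assumptions, not the procedure used by White et al.

```python
import math
import random
import statistics


def median_stack(n_images, signal, rms, rng):
    """Median of n_images noisy measurements of the same faint pixel."""
    samples = [signal + rng.gauss(0.0, rms) for _ in range(n_images)]
    return statistics.median(samples)


rng = random.Random(42)          # fixed seed so the run is repeatable
rms = 1.0                        # noise in a single image (arbitrary units)
signal = rms / 30.0              # source 30x below the single-image rms
n_images = 41_000

stacked = median_stack(n_images, signal, rms, rng)

# For Gaussian noise the median is noisier than the mean by sqrt(pi/2).
expected_noise = math.sqrt(math.pi / 2) * rms / math.sqrt(n_images)

print(f"true signal      : {signal:.4f}")
print(f"stacked median   : {stacked:.4f}")
print(f"expected 1-sigma : {expected_noise:.4f}")
print(f"expected S/N     : {signal / expected_noise:.1f}")
```

With these numbers the expected noise of the stack is about 0.006 of the single-image rms, which puts a signal of 0.033 at roughly 5 sigma.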
Next steps
• NVO Facility
– Joint NASA/NSF program, operations to begin in 2007
– Partnership between NASA data centers, major ground-based observatories, and university research groups.
Need to define the requirements for VO-enabled statistical
analysis.
• VO concept is being adopted broadly
– AGU special session on VOs
– NASA solicitation for Virtual Observatories for Solar and
Space Physics Data (VxOs); AISRP support for VO
technology
http://us-vo.org/