VIRTUAL OBSERVATORY TECHNOLOGIES Tamás Budavári / The Johns Hopkins University 7/30/2010

Moore’s Law, Big Data!
Outline
SQL for Big Data
 ▪ Computing where the bytes are

Database and GPU integration
 ▪ CUDA from SQL

Data intensive Web services
 ▪ Behind the scenes

Working examples
 ▪ Sloan Digital Sky Survey
 ▪ Virtual Observatory tools and services

The Virtual Observatory
“The Virtual Observatory is a framework that enables new
astronomical research by greatly enhancing access to
worldwide data and computing resources.” http://us-vo.org/

How it works
How to build it
How to use it
What’s next

Hierarchy of Services
Atomic services
 ▪ Access to observations, simulations
 ▪ Access to models

Higher level services
 ▪ Combine for more functionality

User and analysis tools
 ▪ Can be a high level service, too

Heterogeneous Datasets
Blobs: images, spectra, etc.
 ▪ Access, transfer

Catalogs
 ▪ Fast searches, indexes

Structured Query Language
SQL'92 standard
 ▪ Almost in English

SELECT <columns>
FROM <table>
WHERE <conditions>

Astronomical Data Query Language
 ▪ An extended subset
 ▪ GIS-like spatial

Example:

SELECT RA, Dec
FROM Stars
WHERE r < 15

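As a minimal sketch, the example query can be tried with Python's standard sqlite3 module. The Stars rows below are made-up values for illustration, SQLite merely stands in for whatever database an archive actually runs, and the ORDER BY clause is added only to make the output order predictable.

```python
import sqlite3

# In-memory database with a tiny, made-up Stars table (illustrative values only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Stars (RA REAL, Dec REAL, r REAL)")
conn.executemany(
    "INSERT INTO Stars VALUES (?, ?, ?)",
    [(10.0, -5.0, 14.2), (20.5, 12.3, 16.8), (185.1, 0.4, 12.9)],
)

# The query from the slide: positions of stars brighter than r = 15.
# ORDER BY is an addition so that the result order is deterministic.
bright = conn.execute(
    "SELECT RA, Dec FROM Stars WHERE r < 15 ORDER BY RA"
).fetchall()
print(bright)
```

Smaller magnitudes mean brighter objects, so only the two rows with r below 15 come back.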
Joining Tables
Sources in observation fields: 2 tables

SELECT f.FieldID, …
       s.ObjID, s.RA, s.Dec, …
FROM Fields AS f
INNER JOIN Sources AS s
  ON s.FieldID = f.FieldID
WHERE f.ExpTime > 1000
  AND s.Rmag > 16

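The join can be sketched in the same way with sqlite3. The table layouts follow the column names used in the query; the rows, and any columns the slide elides with "…", are assumptions made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Fields  (FieldID INTEGER PRIMARY KEY, ExpTime REAL);
CREATE TABLE Sources (ObjID INTEGER PRIMARY KEY, FieldID INTEGER,
                      RA REAL, Dec REAL, Rmag REAL);
-- Made-up sample rows: one long and one short exposure.
INSERT INTO Fields  VALUES (1, 1500.0), (2, 600.0);
INSERT INTO Sources VALUES
    (101, 1, 10.1, -5.0, 17.2),
    (102, 1, 10.2, -5.1, 15.0),
    (201, 2, 50.3, 20.0, 18.4);
""")

# Each source row is paired with the field it was observed in,
# then filtered on columns from both tables.
rows = conn.execute("""
    SELECT f.FieldID, s.ObjID, s.RA, s.Dec
    FROM Fields AS f
    INNER JOIN Sources AS s ON s.FieldID = f.FieldID
    WHERE f.ExpTime > 1000 AND s.Rmag > 16
""").fetchall()
print(rows)
```

Only source 101 survives: it is the one source fainter than Rmag 16 in a field exposed for more than 1000 seconds.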
Calculations in SQL
Computed columns
 ▪ Use J-H in SELECT and/or WHERE
 ▪ Similarly functions, e.g., POWER(10,-0.4*Rmag)

Grouping

SELECT FieldID, AVG(J), STDEV(J)
FROM Sources
GROUP BY FieldID

 ▪ Can use for histogramming, etc.

E.g., the SDSS Catalog Archive

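A sketch of computed columns and grouping, again using sqlite3 as a stand-in. SQLite has no built-in STDEV, and POWER is only present in some builds, so both are registered here from Python; the Stdev helper class and all data values are assumptions for illustration.

```python
import math
import sqlite3
import statistics

class Stdev:
    """Sample standard deviation, registered below under the name STDEV."""
    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:
            self.values.append(value)

    def finalize(self):
        return statistics.stdev(self.values) if len(self.values) > 1 else None

conn = sqlite3.connect(":memory:")
conn.create_function("POWER", 2, math.pow)  # not guaranteed in every SQLite build
conn.create_aggregate("STDEV", 1, Stdev)    # SQLite has no built-in STDEV

conn.execute(
    "CREATE TABLE Sources (ObjID INTEGER, FieldID INTEGER, J REAL, H REAL, Rmag REAL)"
)
conn.executemany(
    "INSERT INTO Sources VALUES (?, ?, ?, ?, ?)",
    [   # made-up magnitudes
        (1, 1, 14.0, 13.2, 15.0),
        (2, 1, 16.0, 15.8, 17.5),
        (3, 2, 15.0, 14.1, 16.0),
        (4, 2, 15.0, 14.9, 16.2),
        (5, 2, 18.0, 17.0, 20.0),
    ],
)

# Computed columns: a color (J-H) and a relative flux from the magnitude.
colors = conn.execute("""
    SELECT ObjID, J - H AS color, POWER(10, -0.4 * Rmag) AS flux
    FROM Sources
    WHERE J - H > 0.5
    ORDER BY ObjID
""").fetchall()

# Grouping: per-field mean and scatter of J.
stats = conn.execute("""
    SELECT FieldID, AVG(J), STDEV(J)
    FROM Sources
    GROUP BY FieldID
    ORDER BY FieldID
""").fetchall()
print(colors)
print(stats)
```

Grouping on a binned expression instead of FieldID gives a histogram with the same pattern.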
Surveys in Astronomy
Sloan Digital Sky Survey 2001-2008
 ▪ 8TB Catalog Archive Server
 ▪ Custom tools and indices

Upcoming Surveys
 ▪ PanSTARRS: 100TB 2010
 ▪ LSST: 1PB+ 201?

New Moore’s Law
In the number of cores
Faster than ever (for now)

New Programming Paradigm
100s of cores, 27k parallel threads per GPU
 ▪ Running a billion threads a second

Forget the fancy old algorithms
 ▪ Built on wrong assumptions

Today CPU is free, RAM is slow
 ▪ GPU has >50GB/s bandwidth
 ▪ Still difficult to occupy the cores

Hybrid Architecture
[Figure: launch, run, sync]

Extending SQL Server
Dedicated service for direct access
 ▪ Shared memory IPC w/ on-the-fly data transform

[Figure labels: IPC, SQL]

Spatial Statistics
Correlation functions
 ▪ From pair-counts

State of the art
 ▪ Dual-tree traversal

High resolution bins?
 ▪ Just like brute force

[Figure label: 8 bins]

Sloan DR7
800×800 bins

All Done Inside the Database
Pair counts computed on GPU
 ▪ Returns 2D histogram as a table (i, j, cts)

Calculate the correlation function in SQL

Can also do async parallel GPU jobs

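The workflow can be sketched on the CPU, under several assumptions that the slides do not confirm: a brute-force Python loop stands in for the GPU kernel, the two histogram axes are taken to be the absolute x and y separations of each pair, the points are random mock data, and the "natural" estimator DD/RR - 1 is used for the correlation function. Table names DD and RR are hypothetical.

```python
import random
import sqlite3
from collections import Counter
from itertools import combinations

NBINS = 4  # bins per axis; a toy value, far coarser than a real analysis

def pair_counts(points, nbins=NBINS):
    """Brute-force 2D histogram of pair separations, as (i, j, cts) rows."""
    hist = Counter()
    for (x1, y1), (x2, y2) in combinations(points, 2):
        i = int(abs(x1 - x2) * nbins)  # bin index along the first axis
        j = int(abs(y1 - y2) * nbins)  # bin index along the second axis
        hist[(i, j)] += 1
    return [(i, j, cts) for (i, j), cts in sorted(hist.items())]

rng = random.Random(42)
data = [(rng.random(), rng.random()) for _ in range(200)]     # mock data points
randoms = [(rng.random(), rng.random()) for _ in range(400)]  # unclustered reference

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DD (i INTEGER, j INTEGER, cts INTEGER)")
conn.execute("CREATE TABLE RR (i INTEGER, j INTEGER, cts INTEGER)")
conn.executemany("INSERT INTO DD VALUES (?, ?, ?)", pair_counts(data))
conn.executemany("INSERT INTO RR VALUES (?, ?, ?)", pair_counts(randoms))

# Natural estimator per bin: (normalized DD) / (normalized RR) - 1.
norm = (len(randoms) * (len(randoms) - 1)) / (len(data) * (len(data) - 1))
xi = conn.execute("""
    SELECT d.i, d.j, ? * d.cts / r.cts - 1.0 AS xi
    FROM DD AS d
    INNER JOIN RR AS r ON r.i = d.i AND r.j = d.j
    ORDER BY d.i, d.j
""", (norm,)).fetchall()
for row in xi:
    print(row)
```

Because the mock data are themselves unclustered, the estimates scatter around zero. The point of the sketch is the division of labor: the expensive pair counting produces a small table, and the estimator is a cheap join over that table.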
Distributed Data
Data at the Projects
Exponential growth
 ▪ Projects last 3-5 years, data sent upwards at the end
 ▪ Data will never be centralized

Most data at projects
 ▪ More responsibility on projects
 ▪ Bring analysis close to the data

Data Federation
Metcalfe’s Law
 ▪ Utility of computer networks grows as the number of possible connections: O(N^2)

The Virtual Observatory
 ▪ The federation of N astronomy archives has utility O(N^2), i.e. possibilities for making discoveries

The whole is more than the sum of the parts

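The arithmetic behind the O(N^2) scaling is the number of distinct pairs among N nodes, N(N-1)/2. A small sketch (the function name is hypothetical):

```python
def possible_connections(n: int) -> int:
    """Number of distinct pairs among n nodes: n(n-1)/2, which grows as O(n^2)."""
    return n * (n - 1) // 2

for n in (2, 10, 100):
    print(n, possible_connections(n))
```

Going from 10 to 100 archives multiplies the number of archives by 10 but the number of possible pairings by 110 (45 to 4950).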
Interoperability Challenges
Metadata standards
Data discovery
Data requests
Data delivery
Units
Database queries
Distributed applications
Authentication and authorization

US National Virtual Observatory
NVO Research 2002-2007
 ▪ NSF ITR Program: $10M for 5 years
 ▪ 17 organizations: Astro, CS, IT

VAO Facility 2010
 ▪ NSF $20M for 5 years
 ▪ Operational phase!

http://us-vo.org/

http://ivoa.net/
IVOA Specifications
First Standards
VOTable
 ▪ Universal container for tables (in XML)
 ▪ First VO standard (from the DTD era)

ConeSearch
 ▪ Simple catalog access based on location
 ▪ First VO standard interface (HTTP GET)

Many implemented them!

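A minimal sketch of both standards from the client side. A cone search is an HTTP GET with RA, DEC and SR (search radius) given in decimal degrees, and the answer is a VOTable. The endpoint URL is hypothetical, no network request is made, and the response is a hand-written sample with XML namespaces omitted for brevity; real responses carry more metadata.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a cone search GET URL; RA, DEC and SR are in decimal degrees."""
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    return f"{base_url}?{query}"

# Hypothetical endpoint; a real service URL would come from a VO registry.
url = cone_search_url("http://example.org/cone", 180.0, -0.5, 0.1)

# Hand-written stand-in for a service response.
SAMPLE_VOTABLE = """<?xml version="1.0"?>
<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE>
      <FIELD name="objID" datatype="long"/>
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>1001</TD><TD>180.01</TD><TD>-0.49</TD></TR>
          <TR><TD>1002</TD><TD>179.97</TD><TD>-0.52</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

def read_votable(xml_text):
    """Return TABLEDATA rows as dictionaries keyed by FIELD name."""
    root = ET.fromstring(xml_text)
    names = [field.get("name") for field in root.iter("FIELD")]
    return [
        dict(zip(names, (td.text for td in tr.iter("TD"))))
        for tr in root.iter("TR")
    ]

rows = read_votable(SAMPLE_VOTABLE)
print(url)
print(rows[0])
```

Cell values come back as strings here; converting them according to each FIELD's datatype attribute is left out of the sketch.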
Early Standards
Simple Image Access Protocol (SIAP)
 ▪ HTTP request, similar to opening a web page
 ▪ Returns links to the matching images in VOTable
 ▪ Assumes we know how to deal with FITS images

Universal Content Descriptor (UCD)
 ▪ Crystallized set of keywords from literature
 ▪ For data discovery, not queries

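A sketch of an SIAP-style request and of picking the image links out of the returned table. It assumes the SIAP 1.0 conventions: POS (RA,Dec) and SIZE in decimal degrees, FORMAT as a MIME type, and the link column tagged with the UCD VOX:Image_AccessReference. The endpoint, image URLs and sample response are made up, and no request is actually sent.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

def siap_url(base_url, ra_deg, dec_deg, size_deg, image_format="image/fits"):
    """Build an image query URL; POS is 'ra,dec' and SIZE is in degrees."""
    query = urlencode(
        {"POS": f"{ra_deg},{dec_deg}", "SIZE": size_deg, "FORMAT": image_format},
        safe=",/",  # keep the comma and slash readable in the URL
    )
    return f"{base_url}?{query}"

# Hand-written stand-in for a response (namespaces omitted for brevity).
SAMPLE_RESPONSE = """<VOTABLE version="1.1">
  <RESOURCE type="results">
    <TABLE>
      <FIELD name="title" datatype="char" arraysize="*" ucd="VOX:Image_Title"/>
      <FIELD name="url" datatype="char" arraysize="*" ucd="VOX:Image_AccessReference"/>
      <DATA><TABLEDATA>
        <TR><TD>Field A</TD><TD>http://example.org/images/a.fits</TD></TR>
        <TR><TD>Field B</TD><TD>http://example.org/images/b.fits</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

def image_links(xml_text, ucd="VOX:Image_AccessReference"):
    """Find the column by its UCD rather than by its name, then collect its cells."""
    root = ET.fromstring(xml_text)
    ucds = [field.get("ucd") for field in root.iter("FIELD")]
    column = ucds.index(ucd)
    return [list(tr.iter("TD"))[column].text for tr in root.iter("TR")]

url = siap_url("http://example.org/siap", 180.0, -0.5, 0.2)
links = image_links(SAMPLE_RESPONSE)
print(url)
print(links)
```

Looking the column up by UCD shows what the descriptors are for: the client does not need to know that this particular service happens to call the column "url".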
Components
Discovery
 ▪ Directory, Sky coverage

Access
 ▪ Tables, Catalogs
 ▪ Images, Spectra
 ▪ Events

Distributed Storage
 ▪ VOSpace
 ▪ Authentication

Distributed Computing
 ▪ Web & Grid services
 ▪ VOStat

Messaging
 ▪ SAMP, VOPipe

User Interfaces
 ▪ Aladin
 ▪ Topcat
 ▪ Mirage, etc.

VO Examples
VO Applications and Services
NVO Quick Start
Ready, Steady…
DataScope
Collect info in VO
 ▪ On a particular object
 ▪ Or a part of the sky
 ▪ GRBs, transients, etc.

VO plotting tools
 ▪ FITS images
 ▪ Catalog data

And more…

Bandpass Services
Public repository
 ▪ Search by keyword or eff
 ▪ Extract in various formats
 ▪ Register & submit yours

Web site
 ▪ On-the-fly plotting
 ▪ Easy access to all

Web services
 ▪ To code against

Spectrum Services
Public repository
 ▪ SDSS, 2dF spectra, etc.
 ▪ Spatial and SQL search
 ▪ Register & submit yours

Web site
 ▪ On-the-fly plotting
 ▪ Building composites
 ▪ De-reddening
 ▪ Line analysis

Web services

Open SkyQuery
SkyNode interface to archives
 ▪ Implements ADQL, returns VOTable
 ▪ Basic node understands “REGION”
 ▪ Full node understands “XMATCH”

SkyQuery portal
 ▪ Knows the SkyNodes from Registry
 ▪ Understands federated query

http://openskyquery.net/
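
To illustrate what a region restriction followed by a cross-match amounts to, here is a small sketch with two made-up catalogs. It is a fixed-radius, brute-force stand-in: the slides do not describe the matching criterion that XMATCH actually applies, and the function names, catalog contents and 3 arcsecond tolerance are all assumptions.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h)))

def in_circle(catalog, center_ra, center_dec, radius_deg):
    """Keep only sources inside a circular region on the sky."""
    return [s for s in catalog
            if angular_separation_deg(s["ra"], s["dec"], center_ra, center_dec) <= radius_deg]

def crossmatch(cat_a, cat_b, tolerance_arcsec):
    """All (id_a, id_b) pairs closer than the tolerance; brute force, O(N*M)."""
    tol_deg = tolerance_arcsec / 3600.0
    return [(a["id"], b["id"])
            for a in cat_a for b in cat_b
            if angular_separation_deg(a["ra"], a["dec"], b["ra"], b["dec"]) <= tol_deg]

# Two made-up catalogs standing in for two archives.
cat_a = [
    {"id": "A1", "ra": 181.3000, "dec": -0.7600},
    {"id": "A2", "ra": 181.3500, "dec": -0.7000},
    {"id": "A3", "ra": 190.0000, "dec": 5.0000},
]
cat_b = [
    {"id": "B1", "ra": 181.3002, "dec": -0.7601},
    {"id": "B2", "ra": 181.3600, "dec": -0.7000},
    {"id": "B3", "ra": 190.0003, "dec": 5.0000},
]

# Restrict both catalogs to the region first, then match within 3 arcseconds.
region = (181.3, -0.76, 0.5)  # center RA, center Dec, radius, all in degrees
matches = crossmatch(in_circle(cat_a, *region), in_circle(cat_b, *region), 3.0)
print(matches)
```

A3 and B3 lie within an arcsecond or so of each other but fall outside the region, so they never reach the matching step. In a federated setting each archive would apply the region cut locally, so that only the small filtered lists travel between nodes.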
WESIX
Web Enabled Source-Identification with Crossmatching

Higher level astronomy services built on other existing VO services:
SExtractor service and Open SkyQuery.
Result can be sent to a plotting tool for quick inspection.

http://nvogre.astro.washington.edu:8080/wesix/

VOStat
Enabling R
 ▪ For VO data

Sky Coverage
Discovery
Transients: VOEvent
Help!
VO for Developers
Automated tools for analysis
Advanced services
Web Services
Simple HTTP requests
 ▪ ConeSearch
 ▪ Simple Image Access

Standard SOAP and REST
 ▪ Interoperable across platforms
 ▪ IVOA compliant XML messages
 ▪ Programming toolkits exist

Command Line: VO-CLI
VOTool

Future
New features
Better integration
VOSpace 2.0
Storage instances soon everywhere
 ▪ Save intermediate data products
 ▪ Arrange for their transfer to other places

VOPipe
 ▪ Chain VOSpaces for data flow between services
 ▪ Async execution of custom processing steps

Summary
More and Moore data: new opportunities
 ▪ No central data store but at projects
 ▪ On-site processing: CPU + GPU

Hierarchical Services
 ▪ Standardized interfaces
 ▪ Data federation

New “VxOs”
 ▪ VaO: Virtual Astronomical Observatory
 ▪ VsO,

Sites to Explore