AstroGrid Datacenters AstroGrid Consortium Review Dec 2004 Martin Hill (AstroGrid@ROE)

Outline



Challenge
Approach
Developed:
• Storepoints
• Describing data
• Query Language
• Status
• Versioning
Software: Publisher’s AstroGrid Library
Challenge





Large datasets (to Petabytes)
So?
Distributed; Science comes from combining
Bandwidth rising slower than
No/few established suitable standards
• FITS images/‘tables’. Ambiguous headers. Ambiguous subformat, eg spectra.
• VOTable introduced. Ambiguous subformat, eg spectra vs catalogue. Verbose.


No/few established common terms
Involves Scientists…
Approach: ‘Publisher’s AstroGrid Library’

General solution to:
• Discover problems faced, accumulate solutions in software
• Experimentally publish sets and types (not host).
• Many smaller datasets owned by people without web skills (eg solar) so:
Need ‘easy’/‘unskilled’ installation
Able to proxy; 3rd parties can publish data without requiring more work from owner (eg VizieR, TRACE)
‘Free’ website, range of standard interfaces
Danger: too general (any query against any dataset producing any results).
Existing Solutions



Common task: publish RDBMS to web
Accumulated tools & skill-sets
No combined solution offering:
• Standard interface (eg query language)
• Scientific values (errors, units)
• Spatial querying (common)
• VO Metadata for query and results
Developing Standards



Resource metadata
Query language (ADQL/s, ADQL/x)
Web interfaces

Working beyond standards
 Feeding research to IVOA

Parallel development

• In the VO: eg Starlink, NVO, VizieR
• External: SRB, Taverna, GridPP monitor
• Convergence
Protocols & Interfaces


Human – web pages
SOAP
• Toolkit Incompatibilities
• Streaming awkward (via Toolkits)
• Longer term benefits?

‘Raw HTTP POST’ (eg servlets, CGI)
• Simpler
• More existing skills amongst Astronomers



Mixed (eg SIAP, SkyNode)
 Don’t Choose – Implement
Mix & Match, Plug & Play:
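To illustrate the 'raw HTTP' style of interface, the sketch below builds a cone search request as a plain URL. It assumes the RA, DEC and SR parameters (decimal degrees) used by the NVO cone search convention; the base URL and the function name are hypothetical and do not refer to a real AstroGrid endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def cone_search_url(base_url: str, ra_deg: float, dec_deg: float,
                    radius_deg: float) -> str:
    """Build a cone search GET URL (hypothetical helper, not part of PAL)."""
    if not 0.0 <= ra_deg <= 360.0:
        raise ValueError("RA must be within [0, 360] degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("DEC must be within [-90, 90] degrees")
    if radius_deg < 0.0:
        raise ValueError("search radius must not be negative")
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    # Append to an existing query string if the base URL already has one.
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{query}"


# Assumed example endpoint; any HTTP client could then fetch this URL.
url = cone_search_url("http://example.org/pal/cone", 180.0, -30.5, 0.1)
```

No SOAP toolkit is needed on either side, which is the 'simpler' point made above; the cost is that nothing beyond the parameter names is described to the client.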
Releasing



Deploy early – if temporarily
Independent & Integrated Access
Versioning:
• Servers & clients, ie new clients can still use old servers, and new servers work with old clients.
• Add and ‘deprecate’, don’t change
• Delete intelligently
(Quickly remove unused interfaces, eg CEA if CEA upgrades, JSPs)
Need hosts…
• Hosts need hardware
• Publishers need to know their data
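The 'add and deprecate, don't change' rule from the versioning points can be sketched as follows. The class and method names are hypothetical; the idea is only that the old entry point keeps working for old clients, warns, and delegates to the newer one.

```python
import warnings


class QueryService:
    """Hypothetical service interface, used only to illustrate the rule."""

    def submit_query(self, adql: str, fmt: str = "votable") -> dict:
        # Newer entry point, added alongside the old one.
        return {"query": adql, "format": fmt}

    def ask(self, sql: str) -> dict:
        # Older entry point: signature unchanged, so old clients still work.
        warnings.warn("ask() is deprecated; use submit_query()",
                      DeprecationWarning, stacklevel=2)
        return self.submit_query(sql)
```

Once monitoring shows that nothing calls `ask()` any more, it can be deleted, which is one reading of 'delete intelligently'.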
Describing Data


Registry ‘Resource’ documents
IVO Tabular Sky Service
• Units, UCDs



Solar vs Sky vs…
Images vs Catalogues
Concept extended for ‘RdmsMetadata’
• UCD1+ -> Dictionaries & Ontologies
• Relationships (simple: errors)


Queryable
Mirrors vs Copies
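A minimal sketch of the column-level description discussed on this slide: each column carries a unit and a UCD, and a simple relationship links a value column to its error column. The class, field and column names are hypothetical and are not the actual resource document schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ColumnMeta:
    name: str
    ucd: str                              # UCD1+ word(s) describing the quantity
    unit: str = ""
    error_column: Optional[str] = None    # simple relationship: value -> error


def columns_with_ucd(columns: List[ColumnMeta], ucd: str) -> List[ColumnMeta]:
    """Make the metadata itself queryable: find columns by meaning, not name."""
    return [c for c in columns if c.ucd == ucd]


# Assumed example table description.
table = [
    ColumnMeta("ra", "pos.eq.ra", "deg", error_column="ra_err"),
    ColumnMeta("ra_err", "stat.error;pos.eq.ra", "deg"),
    ColumnMeta("dec", "pos.eq.dec", "deg"),
]
```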
Query Language



SQL -> ADQL/xml
Defined common functions – CIRCLE & XMATCH (sky not solar)
Working on:
• XQL
• Units
• Investigating: UCDs instead of columns
• Cross-dataset querying
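As an illustration of what a CIRCLE-style function has to evaluate, the sketch below tests whether a sky position lies within a given angular radius of a centre. It uses the haversine formula, which is an assumption chosen for numerical stability at small separations; the slides do not say how CIRCLE is implemented, and the function names are hypothetical.

```python
import math


def angular_separation_deg(ra1: float, dec1: float,
                           ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees between two (RA, Dec) positions."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    d_ra = ra2 - ra1
    d_dec = dec2 - dec1
    a = (math.sin(d_dec / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin(d_ra / 2.0) ** 2)
    # Clamp against rounding so asin never sees a value above 1.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def in_circle(ra: float, dec: float, centre_ra: float, centre_dec: float,
              radius_deg: float) -> bool:
    """True if (ra, dec) lies within radius_deg of the centre."""
    return angular_separation_deg(ra, dec, centre_ra, centre_dec) <= radius_deg
```

Because the test works on the sphere, it handles the RA wrap at 0/360 degrees and the shrinking of RA spacing towards the poles, which a simple box test on the columns would not.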
Results


Query+Metadata+RawResults = VoResults
FITS vs VOTable vs HDF vs CSV vs HTML vs…
• All of them

Results -> queryable data -> inputs
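The 'Query+Metadata+RawResults' bundle and the 'all of them' answer on formats can be sketched as one results object with several interchangeable writers. Only CSV and HTML are shown; the class layout, the writer names and the `WRITERS` table are hypothetical, not the PAL design.

```python
import csv
import html
import io
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Sequence, TextIO, Tuple


@dataclass
class VoResults:
    query: str                            # the query that was asked
    columns: Sequence[Tuple[str, str]]    # metadata: (name, unit) per column
    rows: Iterable[Sequence[object]]      # raw results


def write_csv(results: VoResults, out: TextIO) -> None:
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([f"{name} [{unit}]" if unit else name
                     for name, unit in results.columns])
    for row in results.rows:
        writer.writerow(row)


def write_html(results: VoResults, out: TextIO) -> None:
    out.write("<table>\n<tr>")
    for name, _unit in results.columns:
        out.write(f"<th>{html.escape(name)}</th>")
    out.write("</tr>\n")
    for row in results.rows:
        cells = "".join(f"<td>{html.escape(str(v))}</td>" for v in row)
        out.write(f"<tr>{cells}</tr>\n")
    out.write("</table>\n")


# One table of writers, so adding a format does not touch the query side.
WRITERS: Dict[str, Callable[[VoResults, TextIO], None]] = {
    "csv": write_csv,
    "html": write_html,
}

example = VoResults("SELECT ra, name FROM t",
                    [("ra", "deg"), ("name", "")],
                    [(10.5, "a")])
csv_buffer = io.StringIO()
WRITERS["csv"](example, csv_buffer)
```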
Data Analysis
(Clive Page)

Faster -> feasible
• < 10^6s OK. 10^8 not…

Joins
• Polar coordinate matches (+ HTM, HealPix).
• Cross-match algorithms

Distributed queries
• Breaking down query
• Moving the right data
• Combining the results
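A small sketch of a positional cross-match, to make the join problem concrete. It sorts one catalogue by declination and only compares sources inside a declination band, which is valid because the angular separation is never smaller than the declination difference. This is a simple stand-in chosen for brevity, not the HTM or HealPix indexing mentioned above, and all names are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple

Position = Tuple[float, float]  # (RA, Dec) in degrees


def sep_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def cross_match(cat_a: Sequence[Position], cat_b: Sequence[Position],
                radius_deg: float) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) where cat_a[i] and cat_b[j] lie within radius."""
    # Sort catalogue B by declination once, remembering original indices.
    order = sorted(range(len(cat_b)), key=lambda j: cat_b[j][1])
    decs = [cat_b[j][1] for j in order]
    pairs = []
    for i, (ra, dec) in enumerate(cat_a):
        # Only sources in the declination band can possibly match.
        lo = bisect_left(decs, dec - radius_deg)
        hi = bisect_right(decs, dec + radius_deg)
        for j in order[lo:hi]:
            if sep_deg(ra, dec, cat_b[j][0], cat_b[j][1]) <= radius_deg:
                pairs.append((i, j))
    return pairs
```

The band still contains every source at that declination regardless of RA, so for very large catalogues a two-dimensional sky index would be needed; the sketch only shows why a naive all-pairs comparison can be avoided.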
Status




Readily available
Debugging; developer
Debugging; astronomer
Inform User
Storepoints

No data persistence at PALs
• Web server machines not data storage ones
• Large result sets
• No workspace, memory models, etc


 Streaming outputs
SRB, GridFTP not ready.
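The streaming idea can be sketched as follows: rows are pulled from the database in small batches and written straight to the target, so the web server never holds the full result set. An in-memory SQLite database and a string buffer stand in for the real catalogue and storepoint; the function names are hypothetical (`sling` only nods to the 'Slinger' label in the architecture diagram).

```python
import io
import sqlite3
from typing import Iterable, Iterator, TextIO


def stream_query(conn: sqlite3.Connection, sql: str,
                 batch_size: int = 1000) -> Iterator[tuple]:
    """Yield result rows lazily, fetching batch_size rows at a time."""
    cursor = conn.execute(sql)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch


def sling(rows: Iterable[tuple], target: TextIO) -> int:
    """Write rows to the target as they arrive; return the row count."""
    count = 0
    for row in rows:
        target.write(",".join(str(value) for value in row) + "\n")
        count += 1
    return count


# Stand-ins for the real database and storepoint.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])
target = io.StringIO()
written = sling(stream_query(conn, "SELECT id FROM t ORDER BY id", batch_size=2),
                target)
```

Memory use is bounded by the batch size rather than the result size, which is the property needed when results are large and the server keeps no workspace.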
Identifying Storepoints

[Diagram: storepoint concepts and protocols: MySpace, Community, HomeSpace, VoSpace (Registered); SRB, FTP, GridFTP, HTTP]
 FTP, File, MySpace + extend.
3rd iteration; 2nd in use
Datacenter Implementation
Data Service Architecture
[Diagram: example SQL-based catalogue datacenter. Labels: Internet; web service interfaces (JSP, SkyNode, CEA, Cone, AstroGrid, SIAP; Axis); Query Language; Connection & Authorisation Manager; Plugin Manager; SQL Querier Plug-in; SQL Database (Astronomical Data); SQL Results; Slinger; outputs VOTable/XML/CSV, zip/plain, to email/file/ftp/myspace]
Publishers’ AstroGrid Library
‘Easy to publish to the VO’
Web Application, includes:
• SOAP (AstroGrid, CEA, prepped for SkyNode)
• CGI (SIAP, NVO-cone search, SSA)
• HTML pages (cone search, query builder, status monitor)
Features
• Asynchronous (‘stateful’) & Synchronous Queries
• Queues
• Comprehensive Status (incl historical)
• Variety of results
• Fully ‘Streamed’ – no curation issues
Server ‘Plugins’, including:
• RDBMS (JDBC)
• FITS file collection
• eXist (XML)
Helper Tools
• Metadata Generators
• Ready-made website access
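The server 'plugin' idea can be sketched as a small registry: the web application talks to one querier interface, and each kind of back end (RDBMS, FITS collection, XML store) supplies its own implementation. The real library is a Java web application; this Python sketch and every name in it are hypothetical and only illustrate the pattern.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Type


class QuerierPlugin(ABC):
    """Interface every back-end plugin is assumed to implement."""

    @abstractmethod
    def run(self, query: str) -> Iterator[tuple]:
        """Execute the query and yield result rows."""


class PluginManager:
    def __init__(self) -> None:
        self._plugins: Dict[str, Type[QuerierPlugin]] = {}

    def register(self, name: str, plugin_class: Type[QuerierPlugin]) -> None:
        self._plugins[name] = plugin_class

    def create(self, name: str) -> QuerierPlugin:
        # Raises KeyError for an unknown plugin name.
        return self._plugins[name]()

    def names(self) -> List[str]:
        return sorted(self._plugins)


class InMemoryPlugin(QuerierPlugin):
    """Toy back end standing in for an RDBMS or FITS-collection plugin."""

    def run(self, query: str) -> Iterator[tuple]:
        # Ignores the query text; a real plugin would translate and execute it.
        yield from [(1, "a"), (2, "b")]


manager = PluginManager()
manager.register("memory", InMemoryPlugin)
rows = list(manager.create("memory").run("SELECT * FROM anything"))
```

A datacenter would then choose its plugin by configuration, which is what lets a bespoke proxy plugin (as used for VizieR) sit behind the same web interfaces.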
Situation Now

Installed:
• SuperCOSMOS Science Archive (RDBMS)



astrogrid.roe.ac.uk:8080/pal-ssa/
astrogrid.roe.ac.uk:8080/pal-twomass/
astrogrid.roe.ac.uk:8080/pal-usnob/
• 6dF – Spectra

grendel12.roe.ac.uk:8080/pal-6df/
• Wide Field Survey
• TRACE (FITS files, Solar, under test)

Proxy (bespoke special plugins)
• All NVO-cone-compatible DBs (test)
• VizieR

Evaluated/ing at:
• ESO
• RAL (solar)
• JBO (Merlin)

Reviewing Query Language, metadata documents, etc
Future








Quality…
Metadata ‘wizards’
Sell to hosts; deploy to Leicester, JBO, ESO, RAL, The World....
Explicit and Investigative Queries
Distributed queries & combining results (NVO Exec plans)
Full SIA, SSA interface
More user & admin web pages
Local authorisation