AstroGrid Datacenters
AstroGrid Consortium Review, Dec 2004
Martin Hill (AstroGrid@ROE)

Outline
  Challenge
  Approach
  Developed:
    • Storepoints
    • Describing data
    • Query Language
    • Status
    • Versioning
  Software: Publisher’s AstroGrid Library

Challenge: Problem Outline
  Large datasets (to Petabytes). So?
  Distributed; science comes from combining
  Bandwidth rising slower than
  No/few established suitable standards
    • FITS images/‘tables’. Ambiguous headers. Ambiguous subformat, eg spectra.
    • VOTable introduced. Ambiguous subformat, eg spectra vs catalogue. Verbose.
  No/few established common terms
  Involves Scientists…

Approach: ‘Publisher’s AstroGrid Library’
  General solution to:
    • Discover problems faced, accumulate solutions in software
    • Experimentally publish sets and types (not host).
    • Many smaller datasets owned by people without web skills (eg solar), so:
  Need ‘easy’/‘unskilled’ installation
  Able to proxy; 3rd parties can publish data without requiring more work from owner (eg VizieR, Trace)
  ‘Free’ website, range of standard interfaces
  Danger: too general (any query against any dataset producing any results).

Existing Solutions
  Common task: publish RDBMSs to web
  Accumulated tools & skill-sets
  No combined solution offering:
    • Standard interface (eg query language)
    • Scientific values (errors, units)
    • Spatial querying (common)
    • VO Metadata for query and results

Developing Standards
  Resource metadata
  Query language (ADQL/s, ADQL/x)
  Web interfaces
  Working beyond standards
  Feeding research to IVOA
  Parallel development
    • In the VO: eg Starlink, NVO, VizieR
    • External: SRB, Taverna, GridPP monitor
    • Convergence

Protocols & Interfaces
  Human – web pages
  SOAP
    • Toolkit incompatibilities
    • Streaming awkward (via Toolkits)
    • Longer term benefits?
  ‘Raw Http post’ (eg servlets, CGI)
    • Simpler
    • More existing skills amongst Astronomers
  Mixed (eg SIAP, SkyNode)
  Don’t Choose – Implement
  Mix & Match, Plug & Play

Releasing
  Deploy early – if temporarily
  Independent & Integrated Access
  Versioning:
    • Servers & clients, ie new clients can still use old servers, and new servers work with old clients.
    • Add and ‘deprecate’, don’t change
    • Delete intelligently (remove quickly unused i/fs, eg CEA if CEA upgrades, JSPs)
  Need hosts…
    • Hosts need hardware
    • Publishers need to know their data

Describing Data
  Registry ‘Resource’ documents
  IVO Tabular Sky Service
    • Units, UCDs
  Solar vs Sky vs…
  Images vs Catalogues
  Concept extended for ‘RdmsMetadata’
    • UCD1+ -> Dictionaries & Ontologies
    • Relationships (simple: errors)
  Queryable
  Mirrors vs Copies

Query Language
  SQL -> ADQL/xml
  Defined common functions – CIRCLE & XMATCH (sky not solar)
  Working on:
    • XQL
    • Units
    • Investigating: UCDs instead of columns
    • Cross-dataset querying

Results
  Query+Metadata+RawResults = VoResults
  FITS vs VOTable vs HDF vs CSV vs HTML vs…
    • All of them
  Results -> queryable data -> inputs

Data Analysis (Clive Page)
  Faster feasible
    • < 10^6s OK. 10^8 not…
  Joins
    • Polar coordinate matches (+ HTM, HealPix).
    • Cross-match algorithms
  Distributed queries
    • Breaking down query
    • Moving the right data
    • Combining the results

Status
  Readily available
  Debugging; developer
  Debugging; astronomer
  Inform User

Storepoints
  No data persistence at PALs
    • Web server machines not data storage ones
    • Large result sets
    • No workspace, memory models, etc
  Streaming outputs
  SRB, GridFTP not ready.

Identifying Storepoints
  [Diagram: Concepts. Labels: MySpace, SRB, FTP, GridFTP, Community, HomeSpace, VoSpace (Registered), HTTP]
  FTP, File, MySpace + extend.
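
The Storepoints slides argue that a PAL should not hold results itself but stream them out to wherever the user keeps data. A minimal sketch of that idea follows, assuming results arrive as an iterator of rows and the storepoint is exposed as any writable text stream. The function names, the `STOREPOINT_KINDS` table and the `myspace` URI scheme are hypothetical and for illustration only; they are not taken from the AstroGrid code.

```python
import csv
from typing import Iterable, Sequence, TextIO
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to storepoint kind (illustrative only).
STOREPOINT_KINDS = {
    "ftp": "FTP",
    "http": "HTTP",
    "file": "File",
    "myspace": "MySpace",  # assumed scheme name, not a registered standard
}


def identify_storepoint(uri: str) -> str:
    """Return the storepoint kind for a URI, based only on its scheme."""
    scheme = urlparse(uri).scheme.lower()
    if scheme not in STOREPOINT_KINDS:
        raise ValueError(f"unsupported storepoint scheme: {scheme!r}")
    return STOREPOINT_KINDS[scheme]


def stream_rows_as_csv(header: Sequence[str],
                       rows: Iterable[Sequence[object]],
                       target: TextIO) -> int:
    """Write rows to target one at a time; return the number of rows written.

    Rows are consumed lazily, so the full result set is never held in memory.
    """
    writer = csv.writer(target)
    writer.writerow(header)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count
```

New storepoint types could be supported by adding an entry to the table, which is one way to read the "+ extend" note; the target stream would then be whatever that storepoint type provides for writing.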
Datacenter Implementation
  3rd iteration; 2nd in use
  [Diagram: Data Service Architecture. Example SQL-based catalogue datacenter. Labels: Internet; Web Service interface (JSP, SkyNode, CEA, Cone, AstroGrid, Axis, SIAP); Query Language; Connection & Authorisation Manager; Plugin Manager; SQL Querier Plug-in; SQL Results; Slinger; VOTable/XML/CSV; zip/plain; email/file/ftp/myspace; SQL Database; Astronomical Data]

Publishers’ AstroGrid Library
  ‘Easy to publish to the VO’
  Web Application, includes:
    • SOAP (AstroGrid, CEA, prepped for SkyNode)
    • CGI (SIAP, NVO-cone search, SSA)
    • HTML pages (cone search, query builder, status monitor)
  Features
    • Asynchronous (‘stateful’) & Synchronous Queries
    • Queues
    • Comprehensive Status (incl historical)
    • Variety results
    • Fully ‘Streamed’ – no curation issues
  Server ‘Plugins’, including:
    • RDBMS (JDBC)
    • FITS file collection
    • eXist (XML)
  Helper Tools
    • Metadata Generators
    • Ready-made website access

Situation Now
  Installed:
    • SuperCOSMOS Science Archive (RDBMS)
      astrogrid.roe.ac.uk:8080/pal-ssa/
      astrogrid.roe.ac.uk:8080/pal-twomass/
      astrogrid.roe.ac.uk:8080/pal-usnob/
    • 6dF – Spectra
      grendel12.roe.ac.uk:8080/pal-6df/
    • Wide Field Survey
    • TRACE (FITS files, Solar, under test)
  Proxy (bespoke special plugins)
    • All NVO-cone-compatible DBs (test)
    • VizieR
  Evaluated/ing at:
    • ESO
    • RAL (solar)
    • JBO (Merlin)
  Reviewing Query Language, metadata documents, etc

Future
  Quality…
  Metadata ‘wizards’
  Sell to hosts; deploy to Leicester, JBO, ESO, RAL, The World....
  Explicit and Investigative Queries
  Distributed queries & combining results (NVO Exec plans)
  Full SIA, SSA interface
  More user & admin web pages
  Local authorisation
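
Several of the interfaces listed in these slides (NVO-cone search, the HTML cone search page, the CIRCLE function) come down to the same test: is a source within a given angular radius of a sky position? A minimal sketch of that test follows, using the haversine formula on right ascension and declination in degrees. The function names and the `ra`/`dec` row keys are assumptions for illustration; this is not the PAL's implementation, and a production service would push the test into the database with a spatial index such as HTM or HEALPix.

```python
import math
from typing import Dict, Iterable, List


def angular_separation_deg(ra1: float, dec1: float,
                           ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees between two (RA, Dec) positions."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def cone_search(rows: Iterable[Dict[str, float]],
                ra: float, dec: float, radius_deg: float) -> List[Dict[str, float]]:
    """Return the rows whose position lies within radius_deg of (ra, dec)."""
    return [
        row for row in rows
        if angular_separation_deg(ra, dec, row["ra"], row["dec"]) <= radius_deg
    ]
```

The haversine form is chosen here because it stays numerically stable for the small separations typical of cone searches and handles the wrap-around at RA = 0/360 without special cases.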