Vocabulary Workshop, RAL, February 25, 2009
NERC DataGrid

NERC DataGrid Vocabulary Server Description

Outline
- Vocabulary Server:
  - Data model
  - Implementation
  - Content
  - Usage
  - Development path

Vocabulary Server Data Model
The fundamental building block of the data model is a term, which is equivalent to a SKOS “concept”.
Each term has:
- Key: a semantically neutral string that forms the basis of a URN
- Label: a human-readable name for the concept
- Alternative label: used for abbreviations
- Definition: a more verbose explanation of the concept

Vocabulary Server Data Model
- The terms are aggregated into lists, equivalent to SKOS ‘collections’
- Each list is given a semantically neutral identifier (4-byte string)
- Lists may be aggregated into ‘Superlists’
- Each ‘Superlist’ is given a semantically opaque identifier (bytes 1-3 of the component list identifiers)

Vocabulary Server Data Model
- The ‘Superlist’ concept was inherited from 1980s BODC infrastructure
- It has no parallel in any knowledge representation standard
- It has the unpleasant side effect of giving terms alternative possible URNs
- Its deprecation is becoming a priority

Vocabulary Server Implementation
- Server back end is an Oracle relational database
- All terms are stored in a single table
- List and superlist aggregations are implemented as a 2-level indexing table hierarchy
- Heavily defended by constraints and triggers
- Fully automated timestamps and update ‘fingerprints’
- Fully automated audit trails
- Fully automated list and superlist versioning

Vocabulary Server Implementation
- Term URLs, list URLs and API calls invoke Java applications that submit SQL queries and wrap up the output as XML documents

Vocabulary Server Implementation
Why not XML?
- Grew out of an integral part of the BODC Oracle infrastructure
- Experiments with XML – particularly OWL – technology did not go well
- Maintenance tools seem less effective
- Navigation difficulties through very large XML documents
- Performance issues with lists containing 20000+ terms
- XML has benefits such as access to inference engines, so worth persevering
- Answer might be to have operational XML builds from a relational back end

Vocabulary Server Content
Server Contents (2009-02-10)
- 76 public superlists
- 125 public lists
- 124701 public terms
- 80987 public mappings (RDF triples)
Some of the subject areas covered
- Parameters
- Platforms
- Instruments
- Coverage terms
- Geographic keywords

Vocabulary Server Usage
Server Usage for 2008 (2009 to 2009-02-10 in brackets)
- 4793116 (607172) total hits
- 56232 (7134) vocabulary catalogue downloads
- 78708 (10233) vocabulary term/list downloads
- 1367 (433) vocabulary map downloads
- 2479 (73) term searches
- 1501 (74) term verifications
Rest of total is robots mining semantic links (getRelatedRecordByTerm method)

VS Development Path
Version 1.1 current operational version
Version 1.2 currently under development
- Transparent upgrade (no change to WSDL)
- Bug fix and activation of versioned list serving
- Additional service API providing list content upgrade functionality to authenticated, authorised external users

VS Development Path
Version 2.0 currently being designed
- Revisit back end design
- Governance labelling
- Deprecation support
- Introduce more XML technology?
- Introduce formally-registered, truly permanent URNs
- Single RESTful API giving both read and write access through appropriate HTTP methods
- Output document revision to SKOS 2008

VS Development Path
- Whatever happens with V2.0 we will not annoy a large and very active user base through change
- Both versions will therefore run in parallel until V1.2 calls are no longer logged
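
Illustrative sketch of the data model
The data model slides describe terms with four attributes, 4-byte list identifiers, and superlist identifiers taken from bytes 1-3 of the list identifier. The Python sketch below is one possible reading of that description, not the server's actual implementation. The URN layout (urn:example:...), the function names, and the sample identifiers AB01 and K0001 are all hypothetical and chosen only to show why the superlist gives a term more than one possible URN.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    key: str         # semantically neutral string, basis of the URN
    label: str       # human-readable name for the concept
    alt_label: str   # abbreviation
    definition: str  # more verbose explanation of the concept


def superlist_id(list_id: str) -> str:
    """Return bytes 1-3 of a 4-byte list identifier."""
    if len(list_id) != 4:
        raise ValueError("list identifiers are 4-byte strings")
    return list_id[:3]


def candidate_urns(list_id: str, term: Term) -> list[str]:
    """Build the two URNs a term could be addressed by.

    The 'urn:example:' layout is an assumption made for illustration;
    the slides do not give the real URN syntax.
    """
    return [
        f"urn:example:{list_id}:{term.key}",                # via the list
        f"urn:example:{superlist_id(list_id)}:{term.key}",  # via the superlist
    ]


if __name__ == "__main__":
    # Hypothetical term and list identifier.
    term = Term("K0001", "Example label", "EX", "An example definition.")
    for urn in candidate_urns("AB01", term):
        print(urn)
```

Under these assumptions the same term resolves through both its list and its superlist, which is the side effect the slides give as the reason for deprecating superlists.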