Vocabulary Workshop, RAL, February 25, 2009
NERC DataGrid Vocabulary Server Description
Outline
 Vocabulary Server:
 Data model
 Implementation
 Content
 Usage
 Development path
Vocabulary Server Data Model
 The fundamental building block of the
data model is a term, which is
equivalent to a SKOS “concept”
 Each term has:
 Key: a semantically neutral string that forms
the basis of a URN
 Label: a human-readable name for the
concept
 Alternative label: used for abbreviations
 Definition: more verbose explanation of the
concept
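A minimal sketch of the term structure described above, written in Python for illustration. The URN prefix, the `urn()` layout and the sample values are assumptions; the slides state only that the key forms the basis of a URN.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical URN prefix: the real namespace is not given in the slides.
URN_PREFIX = "urn:example:vocab"


@dataclass(frozen=True)
class Term:
    key: str                   # semantically neutral string, basis of the URN
    label: str                 # human-readable name (compare skos:prefLabel)
    alt_label: Optional[str]   # abbreviation (compare skos:altLabel)
    definition: str            # more verbose explanation (compare skos:definition)

    def urn(self, list_id: str) -> str:
        """Build an illustrative URN from a list identifier and the term key."""
        return f"{URN_PREFIX}:{list_id}:{self.key}"


# Made-up sample values, for illustration only.
term = Term(
    key="0001",
    label="Sea temperature",
    alt_label="Temp",
    definition="Temperature of the water column.",
)
print(term.urn("A011"))
```

The key stays opaque on purpose: the label or definition can be corrected later without changing the identifier that other systems have stored.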
Vocabulary Server Data Model
 The terms are aggregated into lists
equivalent to SKOS ‘collections’
 Each list is given a semantically neutral
identifier (4-byte string)
 Lists may be aggregated into ‘Superlists’
 Each ‘Superlist’ is given a semantically
opaque identifier (bytes 1-3 of the
component list identifiers)
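The identifier rule above can be sketched as follows. This assumes "bytes 1-3" means the first three characters of a four-character, single-byte list identifier; the function names and the identifiers are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def superlist_id(list_id: str) -> str:
    """Return the superlist identifier: the first three characters of a list identifier."""
    if len(list_id) != 4:
        raise ValueError(f"list identifier must be 4 characters, got {list_id!r}")
    return list_id[:3]


def group_into_superlists(list_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group list identifiers under the superlist identifier they share."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for lid in list_ids:
        groups[superlist_id(lid)].append(lid)
    return dict(groups)


# Made-up identifiers, for illustration only.
print(group_into_superlists(["A011", "A012", "B021"]))
# {'A01': ['A011', 'A012'], 'B02': ['B021']}
```

Because the superlist identifier is derived from the list identifier, no separate mapping needs to be maintained, but a list can never move to a different superlist without being renamed.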
Vocabulary Server Data Model
 The ‘Superlist’ concept was inherited
from 1980s BODC infrastructure
 It has no parallel in any knowledge
representation standard
 It has the unpleasant side effect of
giving terms alternative possible URNs
 Its deprecation is becoming a priority
Vocabulary Server Implementation
 Server back end is an Oracle relational
database
 All terms are stored in a single table
 List and superlist aggregations implemented
as a 2-level indexing table hierarchy
 Heavily defended by constraints and triggers
 Fully automated timestamps and update
‘fingerprints’
 Fully automated audit trails
 Fully automated list and superlist versioning
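A rough sketch of the kind of schema these points describe, using SQLite through Python as a stand-in for Oracle. All table, column and trigger names are hypothetical, and versioning and update fingerprints are omitted; the sketch shows only a single term table, a two-level index hierarchy, constraints, and a trigger that maintains a timestamp and an audit trail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE superlist_index (
    superlist_id TEXT PRIMARY KEY CHECK (length(superlist_id) = 3)
);
CREATE TABLE list_index (
    list_id      TEXT PRIMARY KEY CHECK (length(list_id) = 4),
    superlist_id TEXT NOT NULL REFERENCES superlist_index(superlist_id),
    -- a list's superlist must match the first three characters of its identifier
    CHECK (substr(list_id, 1, 3) = superlist_id)
);
-- all terms live in one table, keyed by list and term key (an assumption)
CREATE TABLE term (
    list_id    TEXT NOT NULL REFERENCES list_index(list_id),
    term_key   TEXT NOT NULL,
    label      TEXT NOT NULL,
    alt_label  TEXT,
    definition TEXT,
    modified   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (list_id, term_key)
);
CREATE TABLE term_audit (
    list_id   TEXT NOT NULL,
    term_key  TEXT NOT NULL,
    old_label TEXT,
    new_label TEXT,
    changed   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- every label change is recorded and the timestamp refreshed automatically
CREATE TRIGGER term_label_audit AFTER UPDATE OF label ON term
BEGIN
    INSERT INTO term_audit (list_id, term_key, old_label, new_label)
    VALUES (OLD.list_id, OLD.term_key, OLD.label, NEW.label);
    UPDATE term SET modified = CURRENT_TIMESTAMP
    WHERE list_id = NEW.list_id AND term_key = NEW.term_key;
END;
""")

# Made-up sample rows, for illustration only.
conn.execute("INSERT INTO superlist_index VALUES ('A01')")
conn.execute("INSERT INTO list_index VALUES ('A011', 'A01')")
conn.execute("INSERT INTO term (list_id, term_key, label) VALUES ('A011', '0001', 'Sea temp')")
conn.execute("UPDATE term SET label = 'Sea temperature' WHERE list_id = 'A011' AND term_key = '0001'")

audit = conn.execute("SELECT old_label, new_label FROM term_audit").fetchall()
print(audit)  # [('Sea temp', 'Sea temperature')]
```

Putting the audit logic in a trigger rather than in application code means that every client that writes to the table leaves a trail, including ad hoc SQL sessions.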
Vocabulary Server Implementation
 Term URLs, list URLs and API calls
invoke Java applications that submit
SQL queries and wrap up the output as
XML documents
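The query-and-wrap pattern described above might look roughly like this. Python and SQLite stand in for the Java applications and the Oracle back end; the table layout and the XML element names are hypothetical and do not reproduce the server's actual output format.

```python
import sqlite3
import xml.etree.ElementTree as ET


def list_as_xml(conn: sqlite3.Connection, list_id: str) -> str:
    """Run a SQL query for one list and wrap the rows in an XML document."""
    rows = conn.execute(
        "SELECT term_key, label, definition FROM term "
        "WHERE list_id = ? ORDER BY term_key",
        (list_id,),
    ).fetchall()
    root = ET.Element("list", {"id": list_id})
    for key, label, definition in rows:
        term = ET.SubElement(root, "term", {"key": key})
        ET.SubElement(term, "label").text = label
        ET.SubElement(term, "definition").text = definition
    return ET.tostring(root, encoding="unicode")


# Made-up table and row, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE term (list_id TEXT, term_key TEXT, label TEXT, definition TEXT)")
conn.execute("INSERT INTO term VALUES ('A011', '0001', 'Sea temperature', 'Illustrative entry')")

xml_doc = list_as_xml(conn, "A011")
print(xml_doc)
```

The parameterised query (`?`) keeps identifiers taken from a URL out of the SQL text, which matters when the list identifier arrives from an external request.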
Vocabulary Server Implementation
 Why not XML?
 Grew out of an integral part of the BODC
Oracle infrastructure
 Experiments with XML – particularly OWL –
technology did not go well
 Maintenance tools seem less effective
 Navigation difficulties through very large XML
documents
 Performance issues with lists containing 20000+
terms
 XML has benefits such as access to
inference engines, so worth persevering
 Answer might be to have operational XML
builds from a relational back end
Vocabulary Server Content
 Server Contents (2009-02-10)
 76 public superlists
 125 public lists
 124701 public terms
 80987 public mappings (RDF triples)
 Some of the subject areas covered
 Parameters
 Platforms
 Instruments
 Coverage terms
 Geographic keywords
Vocabulary Server Usage
 Server usage for 2008 (figures for 2009 up to
2009-02-10 in brackets)
 4793116 (607172) total hits
 56232 (7134) vocabulary catalogue downloads
 78708 (10233) vocabulary term/list downloads
 1367 (433) vocabulary map downloads
 2479 (73) term searches
 1501 (74) term verifications
 Rest of total is robots mining semantic links
(getRelatedRecordByTerm method)
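The size of the "rest of total" can be worked out from the figures above. The sketch below assumes the itemised categories do not overlap, which the slide does not state; the dictionary keys are made-up labels.

```python
# Figures copied from the usage slide; key names are illustrative labels.
usage = {
    "2008": {
        "total_hits": 4793116,
        "catalogue_downloads": 56232,
        "term_list_downloads": 78708,
        "map_downloads": 1367,
        "term_searches": 2479,
        "term_verifications": 1501,
    },
    "2009_to_02_10": {
        "total_hits": 607172,
        "catalogue_downloads": 7134,
        "term_list_downloads": 10233,
        "map_downloads": 433,
        "term_searches": 73,
        "term_verifications": 74,
    },
}


def robot_hits(period: dict) -> int:
    """Total hits minus the itemised categories, i.e. the remainder attributed to robots."""
    itemised = sum(count for name, count in period.items() if name != "total_hits")
    return period["total_hits"] - itemised


for name, period in usage.items():
    rest = robot_hits(period)
    share = 100 * rest / period["total_hits"]
    print(f"{name}: {rest} robot hits ({share:.1f}% of total)")
# 2008: 4652829 robot hits; 2009 to 2009-02-10: 589225 robot hits
```

On these figures, roughly 97% of all hits in both periods come from robots following semantic links rather than from the itemised download, search and verification calls.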
VS Development Path
 Version 1.1 current operational version
 Version 1.2 currently under development
 Transparent upgrade (no change to WSDL)
 Bug fix and activation of versioned list
serving
 Additional service API providing list content
upgrade functionality to authenticated,
authorised external users
VS Development Path
 Version 2.0 currently being designed
 Revisit back end design
 Governance labelling
 Deprecation support
 Introduce more XML technology?
 Introduce formally-registered, truly
permanent URNs
 Single RESTful API giving both read and
write access through appropriate HTTP
methods
 Output document revision to SKOS 2008
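As a purely illustrative sketch of what "read and write access through appropriate HTTP methods" could mean, the toy dispatcher below maps GET to reads and PUT to writes on a single resource path. V2.0 was still being designed when these slides were written, so the paths, status-code choices and in-memory store here are all assumptions; authentication and authorisation are omitted.

```python
from typing import Dict, Optional, Tuple

# In-memory stand-in for the vocabulary store: resource path -> term label.
store: Dict[str, str] = {}


def handle(method: str, path: str, body: Optional[str] = None) -> Tuple[int, str]:
    """Dispatch one request on a single resource path according to its HTTP method."""
    if method == "GET":                      # read access
        if path in store:
            return 200, store[path]
        return 404, ""
    if method == "PUT":                      # write access (create or replace)
        if body is None:
            return 400, ""
        created = path not in store
        store[path] = body
        return (201 if created else 200), body
    return 405, ""                           # any other method is not allowed here


# Hypothetical resource path and made-up label, for illustration only.
print(handle("PUT", "/list/A011/term/0001", "Sea temperature"))  # (201, 'Sea temperature')
print(handle("GET", "/list/A011/term/0001"))                     # (200, 'Sea temperature')
```

Using one path for both operations, with the method selecting the behaviour, is what distinguishes this style from an API with separately named read and write calls.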
VS Development Path
 Whatever happens with V2.0 we will not
annoy a large and very active user base
through change
 Both versions will therefore run in
parallel until V1.2 calls are no longer
logged