Vocabulary registries and services
Doug Tudhope
Hypermedia Research Unit, University of Glamorgan
Ecoterm, FAO, Rome, Oct 2009

Presentation (acknowledgement: Kora Golub, UKOLN, on TRSS)
1. JISC Terminology Registry Scoping Study (TRSS)
  – Architecture options
  – Use cases
  – Some major registry projects briefly reviewed
  – Issues (governance)
2. Terminology services at Glamorgan
  – SOAP-based services
  – HTTP-based services
  – List of work on services

1. TRSS

Project context
• UK JISC-funded Terminology Registry Scoping Study
  Background: JISC 2006 review on Terminology Services and Technology
  http://www.jisc.ac.uk/media/documents/programmes/capital/terminology_services_and_technology_review_sep_06.pdf
• Partners
  – UKOLN (Kora Golub, PI)
  – University of Glamorgan (Doug Tudhope)
  – Non-funded: OCLC Office of Research, USA
• TRSS project 2008, published July 2009
  http://www.ukoln.ac.uk/projects/trss
• TRSS final report
  http://www.jisc.ac.uk/media/documents/programmes/sharedservices/trss-report-final.pdf

Overall approach
• Relatively short (6-month) timescale
• Review of previous and current projects and documentation
• Consultation with key services, projects and executives across the digital library, research and learning domains
  – 28 responses collected

TRSS final report
• Many of the actual recommendations in the report are UK-specific.
However, the report also includes more general material:
• discussion of types of registries, their scope and architecture, standards and governance
• review of functionality and use cases
• review of some KOS registry initiatives and implementations
• review of KOS metadata, with some recommendations for an expanded metadata set

Definitions
• Terminologies
  – Controlled vocabularies are often referred to as terminologies in the context of registries and web services
• Terminology services
  – Web services that return or apply vocabularies and their content
• Terminology registry
  – lists, describes and points to sets of vocabularies
  – can hold vocabulary information (member terms, concepts and relationships) and provide terminology services, both for human inspection and for machine-to-machine (m2m) access

Architecture
• Option 1: Registry provides metadata for each vocabulary and links to the vocabulary owner/provider
• Option 2: Registry provides metadata on (and links to) any available terminology services
• Option 3: Registry provides access to vocabulary content (by downloading, or by providing access to the vocabulary's concepts, terms and relationships)
• The options are orthogonal (independent) facets and can be combined

Collected use cases (from the literature and respondents), under general headings of TR functionality
• Creation, modification and maintenance (Option 3)
• Acquisition and publication (Options 1, 3)
• Cataloguing: indexing/classification/annotation (Options 2, 3)
• Integration (Options 2, 3)
  – Including mapping, merging and semantic interoperability
• Access, search and discovery (Options 1, 2, 3)
  – Both at vocabulary and at concept/service level
• Use (Options 2, 3)
  – Terminology service providing support for a wider application
• Archiving and preservation (Option 3)

Basic rationale for a TR in the immediate JISC context
• The main rationale for the report's near-term recommendation (Option 1) is to provide a service that assists discovery of existing vocabularies, or of the most recent version of a given vocabulary.
• Several TRSS respondents and many use cases describe variants of one scenario: a user from a particular subject domain wants to see whether a vocabulary with certain properties already exists.
• This may be to support access to a new repository or collection (via search and browse services), or to assist the design of a new vocabulary by first checking whether anything similar already exists.

Need for metadata
• The features of a vocabulary that afford discovery vary widely according to the user's search criteria.
• The user may have a rough idea of a particular vocabulary's title. The user may require a vocabulary covering a particular subject domain (at a greater or lesser degree of specificity). It may be critical that the vocabulary is free to use, or important that it is available in a particular language. The depth or breadth of topic coverage may be an issue, and so on.
• To assist discovery, a rich set of metadata should be available for each vocabulary.
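The discovery scenario above can be illustrated with a small sketch. It assumes a hypothetical in-memory registry. Each record carries a few of the properties just mentioned (title, subject, language, free to use). Each record also marks which architecture options it supports as independent flags, in keeping with the options being orthogonal facets. The class, field and vocabulary names are illustrative assumptions only. They are not taken from the TRSS report or from any existing registry.

```python
from dataclasses import dataclass, field


@dataclass
class VocabularyRecord:
    """Hypothetical registry record: a few discovery-oriented metadata fields
    plus flags for which architecture options are supported for it."""
    name: str
    subjects: set[str]
    languages: set[str]          # e.g. ISO 639-1 codes such as "en"
    free_to_use: bool
    owner_url: str               # Option 1: link to the vocabulary owner/provider
    service_urls: list[str] = field(default_factory=list)  # Option 2: terminology services
    holds_content: bool = False  # Option 3: concepts, terms and relationships held

    def options(self) -> set[int]:
        """Architecture options supported; they combine as independent facets."""
        opts = {1}  # assumption: every record has metadata and an owner link
        if self.service_urls:
            opts.add(2)
        if self.holds_content:
            opts.add(3)
        return opts


def discover(registry, subject=None, language=None, free_only=False, title_hint=None):
    """Return the records that satisfy every criterion the user supplied."""
    results = []
    for rec in registry:
        if subject and subject.lower() not in {s.lower() for s in rec.subjects}:
            continue
        if language and language not in rec.languages:
            continue
        if free_only and not rec.free_to_use:
            continue
        if title_hint and title_hint.lower() not in rec.name.lower():
            continue
        results.append(rec)
    return results


# Hypothetical example entries (not real vocabularies).
registry = [
    VocabularyRecord("Example Agriculture Thesaurus", {"agriculture"}, {"en", "fr"}, True,
                     "http://example.org/agri", ["http://example.org/agri/services"], True),
    VocabularyRecord("Example Marine Glossary", {"oceanography"}, {"en"}, False,
                     "http://example.org/marine"),
]

for rec in discover(registry, subject="agriculture", language="fr", free_only=True):
    print(rec.name, sorted(rec.options()))  # Example Agriculture Thesaurus [1, 2, 3]
```

A real registry would need far richer metadata and fuzzier matching than exact subject and language codes. The point of the sketch is that each search criterion has to map onto some metadata element, which is why a rich metadata set matters for discovery.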
Some existing TRs (for details see the TRSS report)
• Taxonomy Warehouse
  – Option 1, interactive access
  – claims to host more than 670 taxonomies (73 subject domains) from 288 publishers in 39 languages
• CENDI Terminology Locator
  – Option 1, interactive access
  – points to terminology resources of the CENDI federal science research agencies, spanning agriculture to medicine to the environment

…Existing TRs…
• Lexaurus Bank (originated as the BECTA Vocabulary Bank)
  – Options 1, 2, 3; interactive and m2m access
  – supports creation, editing and maintenance of educational vocabularies supporting the UK National Curriculum
• BioPortal and OBO Foundry
  – Options 1, 2, 3; interactive and m2m access
  – US OBO: over 60 life-science ontologies
  – UK BioPortal: search and browsing access to its ontologies and experimental data

…Existing TRs…
• FAO KOS Registry
  – Options 1, 2, 3; interactive access (and m2m access to Agrovoc)
  – holds over 90 KOS in areas related to agriculture and administration
• NERC Data Grid's Vocabulary Server
  – Options 1, 2, 3; m2m access
  – The British Oceanographic Data Centre (BODC) hosts a TR that supports interoperability of scientific datasets across 43 international data centres, with more than 100 vocabularies

…Existing TRs
• NSDL Registry
  – Options 1 and 3, interactive access
  – SKOS-based TR with an integrated metadata registry
  – 29 vocabularies, mainly educational so far
• OCLC's Terminology Services Pilot
  – Options 1, 2, 3; interactive and m2m access
  – vocabularies currently held include FAST, GSAFD, LC AC SH, LCSH, MeSH, TGM

There are also various broadly related initiatives, including:
• eXtended MetaData Registry (XMDR)
• ISO/IEC 11179 Metadata Registries family of standards
• JISC IE Service Registry (IESR)
• JISC IE Metadata Schema Registry (IEMSR)
• Species 2000 and Catalogue of Life

Metadata
Review of KOS metadata, including from:
• NKOS Registry 1998
• NKOS Registry 2001
• CENDI
• Ecoterm (Environmental Terminology and KOS)
• Food and Agriculture Organization (FAO) of the UN
• Hodge et al. 2007 (10th OFMR)
• National Science Digital Library Registry
• ISO 11179 (Information Technology - Metadata registries (MDR))
• OCLC Terminology Services
• SPECTRUM Terminology Bank
• Taxonomy Warehouse
• Vocman (Becta Vocabulary Bank)
and taking into consideration the Ontology Metadata Vocabulary (OMV)

Metadata
• Proposed extended metadata set: details are in the TRSS report, and we are interested in feedback
1 General information – vocabulary name, author or editor, type, etc.
2 Scope and usage – subjects covered, purpose, rating, etc.
3 Characteristics – type of terms, relationships, etc.
4 Terms and conditions – availability, etc.
5 Provider – contact name, etc.

Governance
• Includes both technical and content governance
Content governance varies with the architecture option and may include:
• Validation of the correctness of content
• Maintaining the supported vocabulary representations according to appropriate versions of standards
• Versioning of the vocabulary's intellectual content
• Need for selection of vocabularies?
  – process/criteria for evaluating whether to accept offered vocabularies
  – reviewing metadata returned by vocabulary owners
• Promotion of the TR and its services
• Education and training in the resources and services
Governance emerged as a concern if content is held in the registry (Option 3). This is one of the reasons for the short-term recommendation of Option 1 for the general vocabulary situation.

Issues
• Metadata set: which elements should be core and which optional for a TR?
  – cost/benefit of how rich a metadata set to recommend: a richer set might be more useful but could deter vocabulary providers
• Metadata for terminology services?
• Relationship with ontology and language community registries?
• When is Option 3 feasible (e.g. considering governance issues)?
  – It may be easier for well-defined, coherent communities

2. Terminology Services
… can be applied at all stages of the search process.
Services include resolving search terms to controlled vocabulary, disambiguation services, offering browsing access, offering mapping between vocabularies, query expansion, query reformulation, and combined search and browsing. These can be applied as immediate elements of the end-user interface or can underpin services behind the scenes, according to context.
JISC review on Terminology Services and Technologies, 2006

Potential for SKOS-based programmatic services

SKOS Services at Glamorgan
We took as our starting point a subset of the
• SKOS API (Application Programming Interface), a deliverable of the SWAD-Europe Thesaurus Activity
  http://www.w3.org/2001/sw/Europe/reports/thes
  designed to provide programmatic access to SKOS vocabularies
• Our focus is on the functionality of the services, which could be implemented via various lower-level protocols

Issues
• How should the functionality be packaged, and what are the common patterns of use?
• How can the services be implemented in different lower-level protocols?

SKOS Web Service and Client Applications
[Diagram: SKOS Web Service; Windows-based client application; web browser based components (‘widgets’)]

SKOS Client Applications
[Screenshot: Web Service Client]
Given a string (cove), GetConcept finds matches in the controlled vocabularies of all SKOS concept schemes registered with the server. The screenshot shows a match with the ‘entry vocabulary’ of effective synonyms (e.g. bays) for different SKOS schemes.

SKOS Services: possible examples
• GetTopmostConcepts
• GetConceptSchemes
• GetConcept
• GetAllConceptRelatives
• GetAllConceptsByPath
• GetConceptsMatchingKeyword
• ExpandConcept

Display details of selected concept.
Here illustrating the semantic expansion service, which returns concepts ‘semantically close’ to cove.

SKOS Client - Widgets
[Screenshot panels: Concept Search; Concept Schemes; Concept Details; Concept Expansion]

Current work
• Semantic Tools for Archaeology Resources (STAR) research project
  – English Heritage thesauri converted to SKOS
  – SKOS-based terminology services: browsing, query expansion
  – also used by others in research projects (DELOS and ArcheoTools)
  http://hypermedia.research.glam.ac.uk/kos/terminology_services/
• Recently developed a URL-based web service call interface for the SKOS services in the ongoing JISC tag suggestion project
  – fast, scalable, platform-neutral
  – JSON data structures returned

Related KOS-based web services (non-exhaustive list)
http://hypermedia.research.glam.ac.uk/kos/terminology_services/links/

Contact Information
Doug Tudhope
School of Computing
University of Glamorgan
Pontypridd CF37 1DL
Wales, UK
dstudhope@glam.ac.uk
http://hypermedia.research.glam.ac.uk/
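The sketch below illustrates the GetConcept and ExpandConcept services listed earlier. It implements simplified local stand-ins over a tiny hypothetical SKOS-like vocabulary held in Python dictionaries.

- `get_concept` matches a string against both preferred labels and the entry vocabulary (alternative labels). A search for cove can therefore land on a concept whose preferred label is Bays.
- `expand_concept` does a breadth-first walk over broader, narrower and related links, up to a hop limit.

The data, function names and signatures are assumptions for illustration, not the Glamorgan service API. Counting hops is only one simple notion of semantic closeness, and the actual expansion service may weight relationship types differently. Results are plain dictionaries, so they serialise directly to JSON, as the URL-based interface is described as returning.

```python
import json
from collections import deque

# Hypothetical toy vocabulary, loosely modelled on SKOS: each concept has a
# preferred label, alternative labels (the entry vocabulary) and
# broader/narrower/related links to other concept identifiers.
SCHEMES = {
    "exampleScheme": {
        "c1": {"prefLabel": "Coastal landforms", "altLabels": [],
               "broader": [], "narrower": ["c2", "c3"], "related": []},
        "c2": {"prefLabel": "Bays", "altLabels": ["Coves", "Inlets"],
               "broader": ["c1"], "narrower": [], "related": ["c4"]},
        "c3": {"prefLabel": "Headlands", "altLabels": [],
               "broader": ["c1"], "narrower": [], "related": []},
        "c4": {"prefLabel": "Harbours", "altLabels": [],
               "broader": [], "narrower": [], "related": ["c2"]},
    }
}


def get_concept(text):
    """Simplified stand-in for GetConcept: case-insensitive substring match
    against preferred and alternative labels in every registered scheme."""
    needle = text.lower()
    matches = []
    for scheme_id, concepts in SCHEMES.items():
        for concept_id, concept in concepts.items():
            for label in [concept["prefLabel"], *concept["altLabels"]]:
                if needle in label.lower():
                    matches.append({"scheme": scheme_id, "id": concept_id,
                                    "prefLabel": concept["prefLabel"],
                                    "matchedLabel": label})
                    break  # report each concept at most once
    return matches


def expand_concept(scheme_id, concept_id, max_steps=1):
    """Simplified stand-in for ExpandConcept: breadth-first traversal of
    broader, narrower and related links, returning every concept within
    max_steps hops, ordered by hop count and then by identifier."""
    concepts = SCHEMES[scheme_id]
    steps = {concept_id: 0}
    queue = deque([concept_id])
    while queue:
        current = queue.popleft()
        if steps[current] == max_steps:
            continue  # do not expand beyond the hop limit
        c = concepts[current]
        for neighbour in c["broader"] + c["narrower"] + c["related"]:
            if neighbour not in steps:
                steps[neighbour] = steps[current] + 1
                queue.append(neighbour)
    return [{"id": cid, "prefLabel": concepts[cid]["prefLabel"], "steps": n}
            for cid, n in sorted(steps.items(), key=lambda kv: (kv[1], kv[0]))]


# Example: look up "cove", then expand the first match by one hop.
hits = get_concept("cove")
print(json.dumps(hits))
if hits:
    print(json.dumps(expand_concept(hits[0]["scheme"], hits[0]["id"], max_steps=1)))
```

Keeping lookup and expansion as separate calls mirrors the separation in the service list. A client can first resolve a user's term to a concept and then decide how far to expand it, for example for query expansion behind a search interface.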