Thesaurus-based access to multimedia collections

Vocabulary registries and services
Doug Tudhope
Hypermedia Research Unit
University of Glamorgan
Ecoterm, FAO, Rome, Oct 2009
(acknowledge Kora Golub, UKOLN on TRSS)
1. JISC Terminology Registry Scoping Study (TRSS)
Architecture options
Use cases
Some major registry projects briefly reviewed
Issues (governance)
2. Terminology services at Glamorgan
– SOAP based services
– HTTP based services
– List of work on services
1. TRSS Project context
UK JISC funded - Terminology Registry Scoping Study
Background JISC 2006 review on Terminology Services and Technology
– UKOLN (Kora Golub, PI)
– University of Glamorgan (Doug Tudhope)
– Non-funded: OCLC Office of Research, USA
TRSS project 2008, published July 2009
TRSS final report
Overall approach
Relatively short 6 month timescale
Review previous and current projects and documentation
Consultation with key services, projects and executives across digital
library, research and learning domains
– 28 responses collected
TRSS final report
Many of the actual recommendations in the report are UK specific.
However, the report also includes more general material:
• discussion on types of registries, their scope and architecture,
standards, governance,
• review of functionality and use cases
• review of some KOS registry initiatives and implementations
• review of KOS metadata
with some recommendations on an expanded metadata set
• Terminologies
– Controlled vocabularies often referred to as terminologies
with regard to registries and web services
• Terminology services
– Web services: return/apply vocabularies and their content
• Terminology registry
– lists, describes, and points to sets of vocabularies
– can hold vocabulary information: member terms, concepts
and relationships, provide terminology services, for both
human inspection and m2m access
Option 1: Registry provides metadata for each vocabulary and links
to vocabulary owner/provider
Option 2: Registry provides metadata on (and links to) any available
terminology services
Option 3: Registry provides access to vocabulary content (by
downloading or providing access to vocabulary’s concepts, terms
and relationships)
orthogonal (independent) facets which can be combined
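A minimal sketch of how the three options might combine in a single registry entry. The class name, field names and example URLs are illustrative assumptions, not taken from the TRSS report; the only point carried over is that the options are independent facets, so an entry can exercise any combination of them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RegistryEntry:
    """One vocabulary as seen by a terminology registry (hypothetical model)."""
    name: str
    # Option 1: descriptive metadata plus a link to the owner/provider.
    metadata: dict = field(default_factory=dict)
    provider_url: Optional[str] = None
    # Option 2: links to any terminology services available for the vocabulary.
    service_urls: list = field(default_factory=list)
    # Option 3: the vocabulary content itself (concepts, terms, relationships).
    concepts: Optional[dict] = None

    def options_supported(self) -> set:
        """Return which of the three architecture options this entry exercises."""
        supported = set()
        if self.metadata and self.provider_url:
            supported.add(1)
        if self.service_urls:
            supported.add(2)
        if self.concepts is not None:
            supported.add(3)
        return supported


# Made-up entry: metadata and a service link, but no hosted content.
entry = RegistryEntry(
    name="Example Thesaurus",
    metadata={"subject": "agriculture"},
    provider_url="http://example.org/owner",
    service_urls=["http://example.org/service"],
)
print(sorted(entry.options_supported()))
```

Because each facet is checked separately, a registry could start with Option 1 only and add services or content later without changing the entry structure.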
Collected use cases (from literature and respondents)
under general headings of TR functionality
Creation, modification and maintenance (Option 3)
Acquisition and publication (Options 1, 3)
Cataloguing: Indexing/classification/annotation (Options 2, 3)
Integration (Options 2, 3)
Access, search and discovery (Options 1, 2, 3)
Both at vocabulary and concept/service level
Use (Options 2, 3)
Including mapping, merging and semantic interoperability
terminology service providing support for a wider application
Archiving and preservation (Option 3)
Basic rationale for a TR
in immediate JISC context
Main rationale for the near term recommendation of report (Option 1) is
in providing a service to assist discovery of existing vocabularies, or the
most recent version of a given vocabulary.
Several TRSS respondents and many use cases describe variants of a
scenario, involving a user from a particular subject domain looking to
see if a vocabulary with certain properties already exists.
This may be for purposes of supporting access to a new repository or
collection (via search and browse services). It may be to assist the
design of a new vocabulary by first looking to see if anything similar
already exists.
Need for metadata
The features of a vocabulary that afford discovery vary (widely)
according to the user’s search criteria.
The user may have a rough idea of a particular vocabulary's title. The
user may require a vocabulary covering a particular subject domain (to
greater or lesser degree of specificity). It may be critical that the
vocabulary is free to use. It may be important that the vocabulary be
available in a particular language. The depth or breadth of topic
coverage may be an issue.
To assist discovery a rich set of metadata should be available for the
Some existing TRs
For details see TRSS report
Taxonomy Warehouse
– Option 1, interactive access
– claims to host more than 670 taxonomies (73 subject domains)
from 288 publishers in 39 languages
CENDI Terminology Locator
– Option 1, interactive access
– Points to terminology resources of CENDI federal science research
agencies, spanning agriculture to medicine to the environment
…Existing TRs…
Lexaurus Bank (originated as BECTA Vocabulary Bank)
– Options 1, 2, 3, interactive and m2m access
– supports creating, editing and maintenance of educational
vocabularies supporting UK National Curriculum
BioPortal and OBO Foundry
– Options 1, 2, 3, interactive and m2m access
– US OBO – over 60 life-science ontologies
– UK BioPortal – search and browsing access to its ontologies and
experimental data
…Existing TRs…
FAO KOS Registry
– Options 1, 2, 3, interactive access (and m2m access to Agrovoc)
– Holds over 90 KOS, in areas related to agriculture and
NERC Data Grid's Vocabulary Server
– Options 1, 2, 3, m2m access
– The British Oceanographic Data Centre (BODC) has a TR which
supports interoperability of scientific datasets in 43 international
data centres
– with more than 100 vocabularies
…Existing TRs
NSDL registry
– Options 1 and 3, interactive access
– SKOS-based TR, with an integrated metadata registry
– 29 vocabularies, mainly educational so far
OCLC's Terminology Services Pilot
– Options 1, 2, 3, interactive and m2m access
– Current vocabularies held include FAST, GSAFD, LC AC SH,
And also various broadly related initiatives, including
• eXtended MetaData Registry (XMDR)
• ISO/IEC 11179 Metadata Registries family of standards
• JISC IE Service Registry (IESR)
• JISC IE Metadata Schema Registry (IEMSR)
• Species 2000 and Catalogue of Life
Review of KOS metadata, including from
• NKOS Registry 1998
• NKOS Registry 2001
• Ecoterm (Environmental Terminology and KOS)
• Food and Agriculture Organization (FAO) of UN
• Hodge et al. 2007 (10th OFMR)
• National Science Digital Library Registry
• ISO 11179 (Information Technology - Metadata registries (MDR))
• OCLC Terminology Services
• SPECTRUM Terminology Bank
• Taxonomy Warehouse
• Vocman (Becta Vocabulary Bank)
and taking into consideration Ontology Metadata Vocabulary (OMV)
Metadata – proposed extended metadata set
details in TRSS report – interested in feedback
1 General information
– Vocabulary name, author or editor, type etc.
2 Scope and usage
– Subjects covered, purpose, rating etc.
3 Characteristics
– Type of terms, relationships etc.
4 Terms and conditions
– Availability etc.
5 Provider
– Contact name etc.
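A rough sketch of how a record following the five metadata groups could support the discovery scenario described earlier (subject, language, free to use). The field names, the placement of language under Characteristics, and the sample records are all assumptions made for illustration; the actual proposed element set is in the TRSS report.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VocabularyRecord:
    # 1 General information
    name: str
    kos_type: str
    # 2 Scope and usage
    subjects: List[str]
    # 3 Characteristics (placing language here is an assumption)
    languages: List[str]
    # 4 Terms and conditions
    free_to_use: bool
    # 5 Provider
    contact: str


def discover(records, subject=None, language=None, free_only=False):
    """Return the records that satisfy every criterion that was supplied."""
    hits = []
    for record in records:
        if subject and subject.lower() not in [s.lower() for s in record.subjects]:
            continue
        if language and language not in record.languages:
            continue
        if free_only and not record.free_to_use:
            continue
        hits.append(record)
    return hits


# Made-up records, for illustration only.
records = [
    VocabularyRecord("Example Soil Thesaurus", "thesaurus",
                     ["Agriculture", "Soil science"], ["en", "fr"], True,
                     "registry@example.org"),
    VocabularyRecord("Example Crop Classification", "classification",
                     ["Agriculture"], ["en"], False, "crops@example.org"),
]

found = discover(records, subject="agriculture", language="fr", free_only=True)
print([r.name for r in found])
```

The cost/benefit question raised for the metadata set shows up directly here: each extra search criterion needs a field that vocabulary providers must be willing to fill in.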
Includes both Technical and Content governance
Content governance varies with architecture Option and may include:
• Validation of correctness of content
• Maintaining vocabulary representations supported
according to appropriate versions of standards
• Versioning of the vocabulary intellectual content
• Need for selection of vocabularies?
process/criteria for evaluating whether to accept offered vocabularies
Reviewing metadata returned by vocabulary owners
Promotion of the TR and its services
Education and training in the resources and services.
Emerged as a concern if content held in the registry (Option 3)
One of the reasons for short term recommendation of Option 1
for a general vocabulary situation
Metadata set core/optional for TR?
cost/benefit in how rich a metadata set to recommend
a richer set might be more useful but deter vocabulary providers
Metadata for terminology services?
Relationship with ontology and language community registries?
When is Option 3 feasible?
eg considering governance issues
It may be easier for well defined, coherent communities
2. Terminology Services
can be applied at all stages of the search process. Services include
resolving search terms to controlled vocabulary, disambiguation
services, offering browsing access, offering mapping between
vocabularies, query expansion, query reformulation, combined search
and browsing. These can be applied as immediate elements of the end-user
interface or can underpin services behind the scenes, according to
JISC review on Terminology Services and Technology, 2006
Potential for SKOS-based programmatic services
SKOS Services
at Glamorgan
We took as starting point a subset of
SKOS API (Application Program Interface)
a deliverable of SWAD-Europe Thesaurus Activity
designed to provide programmatic access to SKOS vocabularies
Our focus is on the functionality of the services
which could be implemented via various lower level protocols
How to package the functionality, what are common patterns of use?
How to implement the services in different lower level protocols?
SKOS Web Service and Client Applications
SKOS Web Service
Windows based client
Web browser based
SKOS Client
Given a string (cove), GetConcept finds
matches in the controlled vocabularies of all
SKOS concept schemes registered with the
Shows an example of a match with the ‘entry
vocabulary’ of effective synonyms (eg bays) for
different SKOS schemes
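The lookup behaviour described above can be sketched as follows. This is not the Glamorgan implementation or the SKOS API signature: the function name get_concept, the in-memory data layout and the sample schemes are hypothetical. It only illustrates matching a string against both preferred labels and the entry vocabulary (alternative labels) across several concept schemes.

```python
def get_concept(text, schemes):
    """Find concepts whose preferred or alternative label equals the text.

    `schemes` maps a scheme name to {concept_id: {"prefLabel", "altLabels"}}.
    Returns (scheme, concept_id, matched_on) tuples in registration order.
    """
    needle = text.strip().lower()
    matches = []
    for scheme_name, concepts in schemes.items():
        for concept_id, labels in concepts.items():
            if labels["prefLabel"].lower() == needle:
                matches.append((scheme_name, concept_id, "prefLabel"))
            elif needle in [alt.lower() for alt in labels.get("altLabels", [])]:
                # Entry-vocabulary hit: the string is an effective synonym.
                matches.append((scheme_name, concept_id, "altLabel"))
    return matches


# Made-up schemes: "cove" is preferred in one and a synonym of "bays" in another.
schemes = {
    "scheme-a": {"a1": {"prefLabel": "Cove", "altLabels": []}},
    "scheme-b": {"b7": {"prefLabel": "Bays", "altLabels": ["cove", "inlet"]}},
}

results = get_concept("cove", schemes)
print(results)
```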
Web Service Client
SKOS Services:
possible examples
Display details of selected concept.
Here illustrating the semantic expansion service
returning ‘semantically close’ concepts to cove
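One way a semantic expansion service could work is a bounded traversal over SKOS relationships, where each step has a cost and only concepts within a distance budget are returned. The sketch below assumes this approach; the relationship costs, the budget and the tiny sample graph are invented for illustration and are not the values or data used by the Glamorgan services.

```python
import heapq

# Assumed traversal costs per SKOS relationship type (illustrative only).
COSTS = {"broader": 3, "narrower": 3, "related": 4}


def expand(start, graph, budget):
    """Return (concept, distance) pairs reachable from start within budget.

    `graph` maps a concept to a list of (relationship, target) pairs.
    Uses a cheapest-first search, so each concept gets its lowest distance.
    """
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        dist, concept = heapq.heappop(queue)
        if dist > best.get(concept, float("inf")):
            continue  # stale queue entry
        for relation, target in graph.get(concept, []):
            new_dist = dist + COSTS[relation]
            if new_dist <= budget and new_dist < best.get(target, float("inf")):
                best[target] = new_dist
                heapq.heappush(queue, (new_dist, target))
    del best[start]
    return sorted(best.items(), key=lambda item: (item[1], item[0]))


# Made-up fragment of a concept scheme around "cove".
graph = {
    "cove": [("broader", "bay"), ("related", "harbour")],
    "bay": [("narrower", "cove"), ("broader", "coastal feature")],
    "harbour": [("related", "cove")],
    "coastal feature": [("narrower", "bay")],
}

close_concepts = expand("cove", graph, budget=6)
print(close_concepts)
```

Lowering the budget narrows the expansion; with a budget of 4 only the directly linked concepts remain.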
SKOS Client - Widgets
Concept Search
Concept Schemes
Concept Details
Concept Expansion
Current work
Semantic Tools for Archaeology Resources (STAR) research project
English Heritage thesauri converted to SKOS
SKOS based terminology services
Query expansion
also used by others in research projects (DELOS and ArcheoTools)
Recently developed URL based web service call interface
for SKOS services in ongoing JISC tag suggestion project
Fast, scalable, platform neutral
JSON data structures returned
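A small sketch of what a URL-based call returning JSON might look like from a client's side. The base URL, operation name, parameter names and response fields are all hypothetical, since the slides do not specify the interface; the sample response is hard-coded so the example runs without a network connection.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; the real service address is not given in the slides.
BASE_URL = "http://example.org/skos"


def build_call(operation, **params):
    """Build a GET URL for an operation, with parameters in a stable order."""
    query = urlencode(sorted(params.items()))
    return f"{BASE_URL}/{operation}?{query}"


def pref_labels(payload):
    """Extract preferred labels from a JSON array of concept objects."""
    return [item["prefLabel"] for item in json.loads(payload)]


# Assumed response shape: a JSON array with one object per matching concept.
SAMPLE_RESPONSE = '[{"uri": "http://example.org/concept/1", "prefLabel": "bays"}]'

url = build_call("getConcept", term="cove", scheme="example")
labels = pref_labels(SAMPLE_RESPONSE)
print(url)
print(labels)
```

Plain HTTP GET with JSON keeps the client side to a few lines in most languages, which fits the platform-neutral aim noted above.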
Related KOS-based web services (non-exhaustive list)
Contact Information
Doug Tudhope
School of Computing
University of Glamorgan
Pontypridd CF37 1DL
Wales, UK