A web-based repository service for vocabularies and alignments in the
Cultural Heritage domain
Lourens van der Meij
Antoine Isaac
Claus Zinn
• Authors not here
• Projects
Using SW techniques for CH data
Focus on vocabularies and alignments
• Knowledge Organization Systems (KOS) like thesauri are used to describe cultural objects
• Many different KOSs are used in different institutions
• Merging them into one global vocabulary is neither realistic nor desirable
Semantic matching as a solution to tackle semantic heterogeneity
Application cases
• Semantic search and browsing
• (Re-)Indexing
Overall functions
• Uniform access to vocabularies
• Access & management of alignments
Experiment idea: test SW techniques for flexibility, ease of re-use and linking models & data
Existing RDF best practices: SKOS
animals
    NT cats
cats
    UF domestic cats
    RT wildcats
    BT animals
    SN used only for domestic cats
domestic cats
    USE cats
wildcats
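The thesaurus entries above map onto SKOS roughly as follows. This is a minimal sketch in plain Python, assuming a hypothetical namespace (http://example.org/animals#) and representing triples as tuples rather than using an RDF library; NT/BT become skos:narrower/skos:broader, RT becomes skos:related, UF/USE becomes skos:altLabel on the preferred concept, and SN becomes skos:scopeNote.

```python
# Sketch: the thesaurus entries expressed as SKOS-style triples.
# EX is a hypothetical namespace; triples are plain tuples, not an RDF library.
EX = "http://example.org/animals#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

triples = [
    (EX + "animals", SKOS + "prefLabel", "animals"),
    (EX + "animals", SKOS + "narrower", EX + "cats"),                   # NT
    (EX + "cats", SKOS + "prefLabel", "cats"),
    (EX + "cats", SKOS + "altLabel", "domestic cats"),                  # UF / USE
    (EX + "cats", SKOS + "related", EX + "wildcats"),                   # RT
    (EX + "cats", SKOS + "broader", EX + "animals"),                    # BT
    (EX + "cats", SKOS + "scopeNote", "used only for domestic cats"),   # SN
    (EX + "wildcats", SKOS + "prefLabel", "wildcats"),
]


def to_ntriples(s, p, o):
    """Serialize one triple; objects starting with http:// are IRIs, others plain literals."""
    obj = f"<{o}>" if o.startswith("http://") else '"' + o + '"'
    return f"<{s}> <{p}> {obj} ."


for t in triples:
    print(to_ntriples(*t))
```

Note that the non-preferred term "domestic cats" does not become a concept of its own: in SKOS it is only an alternative label of the concept "cats".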
Existing RDF best practices: SKOS
Crucial features for a repository
• Vocabulary membership
• Cross-vocabulary mapping properties
From Ontology Alignment Evaluation Initiative
• Mapping cells
– 2 entities being matched
– 1 relation type (any!)
– 1 measure
– Provide hook for annotations
• Alignments between ontologies as set of cells
– Can also be annotated
http://oaei.ontologymatching.org
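The cell and alignment structure described above can be sketched as two small data classes. The class and field names here are illustrative assumptions, not the actual OAEI or repository API, and the check that the measure lies in [0, 1] is an assumed convention.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """One mapping cell: two entities, one relation type, one measure."""
    entity1: str
    entity2: str
    relation: str          # any relation type is allowed at this level
    measure: float         # assumed here to be a confidence in [0, 1]
    annotations: dict = field(default_factory=dict)   # hook for annotations


@dataclass
class Alignment:
    """An alignment between two ontologies, as a set of cells."""
    onto1: str
    onto2: str
    cells: list = field(default_factory=list)
    annotations: dict = field(default_factory=dict)   # alignments can be annotated too

    def add(self, cell):
        if not 0.0 <= cell.measure <= 1.0:
            raise ValueError("measure must be in [0, 1]")
        self.cells.append(cell)


# Hypothetical usage with made-up identifiers.
alignment = Alignment("http://example.org/vocabA", "http://example.org/vocabB",
                      annotations={"method": "lexical"})
alignment.add(Cell("a:cats", "b:felines", "exactMatch", 0.9, {"creator": "tool-x"}))
print(len(alignment.cells))
```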
• Need for dedicated middleware: some requirements beyond basic data access are not met by standard SPARQL
– Full-text search on labels
– Ranking of results
– Access control/authentication
– Query complexity control
– LOD data publication strategy
– Other data exchange formats (JSON)
• APIs are also a good way to structure practices in a domain
• API is inspired by both SKOS and OAEI APIs
• But dedicated to simple vocabularies
Not fully-fledged ontologies
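Two of the middleware requirements listed above, full-text search on labels and ranking of results, can be sketched in a few lines. This is an illustrative in-memory version under stated assumptions: the function name, the scoring scheme (exact match over prefix over substring) and the sample concept identifiers are all hypothetical, and the limit parameter stands in for a simple form of query complexity control.

```python
def search_labels(labels, query, limit=10):
    """Rank concepts whose labels match the query.

    labels: dict mapping a concept id to a list of label strings.
    Scoring (an assumption): exact match = 3, prefix = 2, substring = 1.
    """
    q = query.lower()
    scored = []
    for concept_id, concept_labels in labels.items():
        best = 0
        for label in concept_labels:
            text = label.lower()
            if text == q:
                score = 3
            elif text.startswith(q):
                score = 2
            elif q in text:
                score = 1
            else:
                score = 0
            best = max(best, score)
        if best:
            scored.append((-best, concept_id))
    scored.sort()   # best score first, then concept id for stable output
    return [concept_id for _, concept_id in scored[:limit]]


labels = {
    "ex:cats": ["cats", "domestic cats"],
    "ex:wildcats": ["wildcats"],
    "ex:animals": ["animals"],
}
print(search_labels(labels, "cats"))
```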
• Dedicated to vocabularies and alignments
More than usual terminology repositories
• Alignments are for simple vocabularies
Restricting OAEI-based functions to SKOS mappings
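Restricting generic alignment cells to SKOS mappings could look like the filter below. It is a sketch, not the repository's code: cells are assumed to be (entity1, relation, entity2) tuples, and the relation is assumed to be given by the local name of a SKOS mapping property.

```python
# The five SKOS mapping properties (local names).
SKOS_MAPPING_PROPERTIES = {
    "exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch",
}


def keep_skos_mappings(cells):
    """Keep only the cells whose relation is a SKOS mapping property."""
    return [cell for cell in cells if cell[1] in SKOS_MAPPING_PROPERTIES]


# Hypothetical cells; "disjointWith" is an ontology-level relation and is dropped.
cells = [
    ("a:cats", "exactMatch", "b:felines"),
    ("a:cats", "disjointWith", "b:dogs"),
    ("a:pets", "narrowMatch", "b:felines"),
]
print(keep_skos_mappings(cells))
```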
Distributed service architecture
• A service can serve vocabularies, alignments, or both
Fitting different stakeholder missions/interests
• One service can sit on top of several others
Distribution is intended as a scalability enabler
Sends a reassuring message regarding access control
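The idea of one service sitting on top of several others can be illustrated with a small composite. All class names, method names and identifiers below are hypothetical; the sketch only shows how a front service might merge the listings of the services beneath it, with each of those serving vocabularies, alignments, or both.

```python
class RepositoryService:
    """A single service holding vocabularies, alignments, or both."""

    def __init__(self, name, vocabularies=(), alignments=()):
        self.name = name
        self.vocabularies = set(vocabularies)
        self.alignments = set(alignments)

    def list_vocabularies(self):
        return sorted(self.vocabularies)

    def list_alignments(self):
        return sorted(self.alignments)


class CompositeService:
    """A service that sits on several others and merges their listings."""

    def __init__(self, name, backends):
        self.name = name
        self.backends = list(backends)

    def list_vocabularies(self):
        return sorted({v for b in self.backends for v in b.list_vocabularies()})

    def list_alignments(self):
        return sorted({a for b in self.backends for a in b.list_alignments()})


# Hypothetical deployment: one vocabulary-only service, one alignment-only service.
vocab_service = RepositoryService("library", vocabularies=["vocabA", "vocabB"])
align_service = RepositoryService("project", alignments=["vocabA-vocabB"])
front = CompositeService("portal", [vocab_service, align_service])
print(front.list_vocabularies(), front.list_alignments())
```

Because a composite exposes the same listing methods as a single service, composites can themselves be stacked.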
Distributed service architecture
Plus: many alignments automatically created in the STITCH project
Driven by “business” interests
E.g., KB has a list of relevant KOSs in its context
[Figure: KOSs and book collections in the KB context]
Book collection datasets: KB Deposit Coll., KB Scientific Coll., Dutch Public Libraries, Dutch Booktrade, LC (US Nat. Lib), BnF (French Nat. Lib), DNB (German Nat. Lib)
Domain/discipline classifications, subject thesauri / subj. heading lists, and other classifications: Doelgroep (audience), NUR, Biblion, UNESCO class., Brinkman, NBC class., GTT, BISAC subject codes, DDC (Dewey decimal class.), LCSH subject headings, RAMEAU subject headings, SWD subject headings
Person/corporation data: KB Corporatie + Persoon, LC authority file, Autorités BNF, Personennamendatei
Legend: the thickness of the link between book collections indicates their degree of overlap; vertical alignment between a collection and KOSs denotes that those KOSs are used to describe that collection.
Johan Stapel
Vocabulary and alignment browser
Deployment (1)
Deployment (2)
RAMEAU (French NL) as linked data
• Interlinked with LCSH (Library of Congress)
• Soon to SWD (German NL)
• Using manual mappings from the MACS project
http://stitch.cs.vu.nl/repository
Deployment (3)
STITCH re-indexing prototype (ISWC 2009)
• Plugged onto KB cataloguing system
• Middleware is still useful
– To match real application requirements
– To gather communities of practice around new usages
• But SW tools really help building it
• Relevance of existing models like SKOS
– Only one part of SKOS unused (collections) and one extension required (concept scheme groups)
– Disclaimer: we were involved in SKOS
• Interest from the Cultural Heritage domain
• Some basic middleware functions like full-text search are now tackled by vendor-specific SPARQL extensions
We prefer it that way
• Working out the distributed architecture is difficult
Progress on federated RDF repositories can be useful
• Versioning/changes MUST be addressed at a fine-grained level (concepts)
Maybe the issue with the least mature solutions!
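To make "fine-grained" concrete, here is one simple way to report changes per concept rather than per vocabulary file. It is only a sketch under assumptions: each version is taken to be a dict from concept id to a dict of properties, and the function name and sample data are hypothetical. It does not address harder cases such as concept splits or merges.

```python
def diff_concepts(old, new):
    """Compare two vocabulary versions at the level of individual concepts.

    old, new: dicts mapping a concept id to a dict of its properties.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(c for c in old.keys() & new.keys() if old[c] != new[c])
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical versions of a tiny vocabulary.
v1 = {
    "ex:cats": {"prefLabel": "cats", "broader": "ex:animals"},
    "ex:animals": {"prefLabel": "animals"},
}
v2 = {
    "ex:cats": {"prefLabel": "cats", "broader": "ex:mammals"},
    "ex:mammals": {"prefLabel": "mammals"},
}
print(diff_concepts(v1, v2))
```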
Already started!
CATCHplus: continuing CATCH efforts, bringing them even closer to production
New repository and interface
• Refinement of HTTP API
E.g., Possibility to search for pairs of related concepts, with constraints
Closer to SPARQL, but still limiting complexity
• Based on Openlink Virtuoso
– Disk-based implementation can handle huge datasets
– Built-in LOD function & full-text features
• Architecture is no longer distributed, for now!
Difficult conflict between requirements
– Some clients had requirements for SPARQL
– Federated SPARQL query is (was?) not yet mature
• Named graphs are being experimented with
– For representing KOS data bundles (file upload)
– For contextualizing triples (one shortcoming of SKOS/RDF)
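Two of the points above, searching for pairs of related concepts under constraints and using named graphs to contextualize triples, can be illustrated together. The sketch below is an in-memory stand-in, not the Virtuoso-based implementation or the actual HTTP API: statements are stored as (subject, predicate, object, graph) quads, and the function name, parameters and graph names are hypothetical.

```python
# Quads: (subject, predicate, object, graph). One graph per uploaded bundle (assumed naming).
quads = [
    ("ex:cats", "skos:exactMatch", "ex2:felines", "graph:alignment-upload-1"),
    ("ex:cats", "skos:broader", "ex:animals", "graph:kos-upload-1"),
    ("ex:dogs", "skos:closeMatch", "ex2:canines", "graph:alignment-upload-2"),
]


def find_pairs(quads, predicates=None, graph=None, subject_prefix=None, limit=100):
    """Return (subject, object) pairs satisfying all given constraints.

    The fixed set of constraints and the limit keep queries simpler than
    arbitrary SPARQL, which bounds their complexity.
    """
    pairs = []
    for s, p, o, g in quads:
        if predicates is not None and p not in predicates:
            continue
        if graph is not None and g != graph:
            continue
        if subject_prefix is not None and not s.startswith(subject_prefix):
            continue
        pairs.append((s, o))
        if len(pairs) >= limit:
            break
    return pairs


mapping_predicates = {"skos:exactMatch", "skos:closeMatch"}
print(find_pairs(quads, predicates=mapping_predicates))
print(find_pairs(quads, graph="graph:kos-upload-1"))
```

The fourth element of each quad records which bundle a statement came from, which a plain SKOS/RDF triple cannot express by itself.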
http://stitch.cs.vu.nl/repository