

A web-based repository service for vocabularies and alignments in the Cultural Heritage domain

Lourens van der Meij

Antoine Isaac

Claus Zinn

• Authors not here

• Projects

Using SW techniques for CH data

Focus on vocabularies and alignments

• Knowledge Organization Systems (KOS) like thesauri are used to describe cultural objects

• Many different KOSs are used in different institutions

• Merging them into one global vocabulary is neither realistic nor desirable

Semantic matching as a solution to tackle semantic heterogeneity

Eliciting needs for a repository

Application cases

• Semantic search and browsing

• (Re-)Indexing

Overall functions

• Uniform access to vocabularies

• Access & management of alignments

Experiment idea: test SW techniques for flexibility, ease of re-use and linking models & data

Existing RDF best practices: SKOS

animals
  NT cats

cats
  UF domestic cats
  RT wildcats
  BT animals
  SN used only for domestic cats

domestic cats
  USE cats

wildcats
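The thesaurus entries in the cats example map onto SKOS in a fairly standard way: BT/NT become skos:broader/skos:narrower, RT becomes skos:related, UF (and its inverse USE) turns the non-preferred term into a skos:altLabel of the preferred concept, and SN becomes skos:scopeNote. Below is a minimal sketch of that conversion with triples as plain Python tuples. The `EX` namespace and the `concept_uri` helper are hypothetical; only the skos: and rdf: URIs are standard.

```python
# Sketch: map classic thesaurus relations onto SKOS triples.
# EX and concept_uri() are hypothetical; the skos: and rdf: URIs are the standard ones.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/animals/"  # hypothetical namespace


def concept_uri(term):
    """Mint a (hypothetical) concept URI from a preferred term."""
    return EX + term.replace(" ", "_")


# Thesaurus codes that link two concepts, with their SKOS counterparts
LINKS = {"BT": "broader", "NT": "narrower", "RT": "related"}


def to_skos(entries):
    """entries: {preferred term: [(code, value), ...]} -> set of (s, p, o) triples.
    USE entries are not listed: they are covered by the UF on the preferred term."""
    triples = set()
    for term, fields in entries.items():
        c = concept_uri(term)
        triples.add((c, RDF_TYPE, SKOS + "Concept"))
        triples.add((c, SKOS + "prefLabel", term))
        for code, value in fields:
            if code in LINKS:
                triples.add((c, SKOS + LINKS[code], concept_uri(value)))
            elif code == "UF":  # non-preferred term becomes an alternative label
                triples.add((c, SKOS + "altLabel", value))
            elif code == "SN":  # scope note
                triples.add((c, SKOS + "scopeNote", value))
    return triples


thesaurus = {
    "animals": [("NT", "cats")],
    "cats": [("UF", "domestic cats"), ("RT", "wildcats"),
             ("BT", "animals"), ("SN", "used only for domestic cats")],
    "wildcats": [],
}

for triple in sorted(to_skos(thesaurus)):
    print(triple)
```

Note that "domestic cats" does not become a concept of its own: in SKOS it survives only as a label of "cats".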

Existing RDF best practices: SKOS

Crucial features for a repository

• Vocabulary membership

• Cross-vocabulary mapping properties
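SKOS covers both features. skos:inScheme links a concept to the concept scheme (vocabulary) it belongs to, and skos:exactMatch, skos:closeMatch, skos:broadMatch, skos:narrowMatch and skos:relatedMatch link concepts across schemes. A minimal in-memory sketch of the two lookups a repository would need; the `ex:` scheme and concept identifiers are hypothetical.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
MAPPING_PROPS = {SKOS + p for p in
                 ("exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch")}

# Hypothetical data: two vocabularies and one cross-vocabulary link
triples = {
    ("ex:vocA/cats", SKOS + "inScheme", "ex:vocA"),
    ("ex:vocB/katten", SKOS + "inScheme", "ex:vocB"),
    ("ex:vocA/cats", SKOS + "exactMatch", "ex:vocB/katten"),
}


def members(scheme, data):
    """Concepts declared to belong to the given concept scheme."""
    return {s for s, p, o in data if p == SKOS + "inScheme" and o == scheme}


def mappings(concept, data):
    """(property, target) pairs linking a concept to concepts in other schemes.
    Only stored outgoing links are returned; symmetric properties such as
    exactMatch are not inferred in this sketch."""
    return {(p, o) for s, p, o in data if s == concept and p in MAPPING_PROPS}


print(members("ex:vocA", triples))
print(mappings("ex:vocA/cats", triples))
```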

Existing RDF best practices: OAEI

From Ontology Alignment Evaluation Initiative

• Mapping cells

– 2 entities being matched

– 1 relation type (any!)

– 1 measure

– Provide a hook for annotations

• Alignments between ontologies as sets of cells

– Can also be annotated

http://oaei.ontologymatching.org
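The cell-based alignment structure can be sketched as plain data classes. The field names follow the alignment format used in OAEI (`entity1`, `entity2`, `relation`, `measure`), but the `annotations` dictionaries, the `above` helper and the example identifiers are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    entity1: str        # identifier of the first matched entity
    entity2: str        # identifier of the second matched entity
    relation: str       # any relation type, e.g. "=" or "<"
    measure: float      # confidence, typically in [0, 1]
    annotations: dict = field(default_factory=dict)  # hook for extra metadata


@dataclass
class Alignment:
    onto1: str
    onto2: str
    cells: list = field(default_factory=list)
    annotations: dict = field(default_factory=dict)  # alignments can be annotated too

    def above(self, threshold):
        """Cells whose measure is at least the given threshold (hypothetical helper)."""
        return [c for c in self.cells if c.measure >= threshold]


a = Alignment("ex:vocA", "ex:vocB", annotations={"method": "lexical"})
a.cells.append(Cell("ex:vocA/cats", "ex:vocB/katten", "=", 0.9, {"creator": "manual"}))
a.cells.append(Cell("ex:vocA/cats", "ex:vocB/dieren", "<", 0.4))
print(len(a.above(0.5)))
```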

Existing RDF best practices: OAEI

Need for a service API?

• Need for dedicated middleware: some requirements beyond basic data access are not met by standard SPARQL

– Full-text search on labels

– Ranking of results

– Access control/authentication

– Query complexity control

– Linked Open Data (LOD) publication strategy

– Other data exchange formats (JSON)

• APIs are also a good way to structure practices in a domain
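To illustrate what such middleware adds on top of plain data access, here is a minimal sketch of a label search that does full-text matching, ranks results, caps the result size (a crude form of query complexity control) and returns JSON. The scoring rule (exact match, then prefix, then substring), the cap of 50 and the concept data are assumptions, not the CATCH implementation.

```python
import json

# Hypothetical concept labels: concept identifier -> labels (preferred + alternative)
LABELS = {
    "ex:vocA/cats": ["cats", "domestic cats"],
    "ex:vocA/wildcats": ["wildcats"],
    "ex:vocA/animals": ["animals"],
}


def score(label, query):
    """Simple ranking: exact match > prefix match > substring match > no match."""
    label, query = label.lower(), query.lower()
    if label == query:
        return 3
    if label.startswith(query):
        return 2
    if query in label:
        return 1
    return 0


def search(query, limit=10):
    limit = max(1, min(limit, 50))  # cap the result size to bound query cost
    hits = []
    for uri, labels in LABELS.items():
        best = max(score(lab, query) for lab in labels)
        if best > 0:
            hits.append({"uri": uri, "score": best})
    # Highest score first; identifier as a deterministic tie-breaker
    hits.sort(key=lambda h: (-h["score"], h["uri"]))
    return json.dumps(hits[:limit])


print(search("cat"))
```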

API design

• The API is inspired by both the SKOS and OAEI APIs

• But dedicated to simple vocabularies

– Not full-fledged ontologies

• Dedicated to vocabularies and alignments

– More than the usual terminology repositories

• Alignments are for simple vocabularies

– OAEI-based functions are restricted to SKOS mappings
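Restricting OAEI-style cells to SKOS mappings amounts to accepting only relations that have a SKOS mapping-property counterpart. A minimal sketch, assuming the common OAEI relation symbols "=", "<" (entity1 is more specific than entity2) and ">" (entity1 is more general). The relation vocabulary actually accepted by the CATCH API is not given here, so this correspondence table is an assumption.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Assumed correspondence between OAEI relation symbols and SKOS mapping properties.
# "a < b" (a more specific than b) means b is broader, i.e. a skos:broadMatch b.
OAEI_TO_SKOS = {
    "=": SKOS + "exactMatch",
    "<": SKOS + "broadMatch",
    ">": SKOS + "narrowMatch",
}


def cell_to_skos(entity1, entity2, relation):
    """Return a SKOS mapping triple, or raise if the relation has no SKOS counterpart."""
    try:
        return (entity1, OAEI_TO_SKOS[relation], entity2)
    except KeyError:
        raise ValueError(f"relation {relation!r} is not a SKOS mapping") from None


print(cell_to_skos("ex:vocA/cats", "ex:vocB/dieren", "<"))
```

Rejecting unknown relations, rather than silently dropping them, makes it visible to a client when an alignment uses relations the repository cannot represent.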

Distributed service architecture

• Services can serve vocabularies, alignments, or both

– Fitting the missions and interests of different stakeholders

• One service can sit on top of several others

– Distribution is intended as a scalability enabler

– It also sends a reassuring message regarding access control
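The idea that one service can sit on top of several others can be sketched as a composite: a front service exposes the same interface as its backends and merges their answers, so each institution can keep serving its own data. The class names, the `lookup` method and the example identifiers are hypothetical; the slides do not specify the actual service interface.

```python
from __future__ import annotations

from typing import Protocol


class ConceptService(Protocol):
    def lookup(self, label: str) -> list[str]: ...


class LocalService:
    """A backend serving one institution's vocabulary (hypothetical)."""

    def __init__(self, labels: dict[str, str]):
        self.labels = labels  # label -> concept identifier

    def lookup(self, label: str) -> list[str]:
        uri = self.labels.get(label)
        return [uri] if uri else []


class AggregatingService:
    """A service sitting on top of several others, merging their results."""

    def __init__(self, backends: list[ConceptService]):
        self.backends = backends

    def lookup(self, label: str) -> list[str]:
        results: list[str] = []
        for backend in self.backends:
            for uri in backend.lookup(label):
                if uri not in results:  # keep backend order, drop duplicates
                    results.append(uri)
        return results


service_a = LocalService({"cats": "ex:vocA/cats"})
service_b = LocalService({"cats": "ex:vocB/chats"})
front = AggregatingService([service_a, service_b])
print(front.lookup("cats"))
```

Because the aggregator itself satisfies the same interface, it can in turn be a backend of another aggregator.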

Distributed service architecture

CATCH service implementation


Plus: many alignments created automatically in the STITCH project

– Driven by “business” interests

– E.g., the KB has a list of the KOSs relevant in its context

[Figure (Johan Stapel): KOSs in the KB context and the book collections they describe. Collections: Dutch Public Libraries, Dutch Booktrade, KB Deposit Coll., KB Scientific Coll. KOSs: Doelgroep (audience), NUR, Biblion, UNESCO class., Brinkman, NBC class., GTT, KB Corporatie + Persoon, BISAC subject codes, DDC (Dewey decimal class.), LCSH subject headings and LC authority file (LC, US Nat. Lib.), RAMEAU subject headings and Autorités BnF (BnF, French Nat. Lib.), SWD subject headings and Personennamendatei (DNB, German Nat. Lib.). Legend: KB overlap between book collections (thickness indicates degree of overlap); vertical alignment between a coll. and KOSs denotes that those KOSs are used to describe that coll.; KOS types: subject thesauri / subj. heading lists, domain/discipline classifications, other classifications, person/corporation data; book collection datasets.]

Vocabulary and alignment browser

Deployment (1)

Deployment (2)

RAMEAU (French National Library) as linked data

• Interlinked with LCSH (Library of Congress)

• Soon also with SWD (German National Library)

• Using manual mappings from the MACS project

http://stitch.cs.vu.nl/repository

Deployment (3)

STITCH re-indexing prototype (ISWC 2009)

• Plugged into the KB cataloguing system
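Re-indexing with alignments boils down to translating a record's subject annotations from a source vocabulary into a target vocabulary. A minimal sketch, assuming alignments are available as (source concept, target concept, measure) triples and that only mappings at or above a threshold are used. The threshold default and the data are hypothetical, and the actual STITCH prototype may rank and present its suggestions differently.

```python
from collections import defaultdict


def build_index(alignment):
    """Group (source, target, measure) mappings by source concept."""
    by_source = defaultdict(list)
    for source, target, measure in alignment:
        by_source[source].append((target, measure))
    return by_source


def reindex(annotations, alignment, threshold=0.5):
    """Suggest target-vocabulary concepts for a record's source annotations,
    ranked by the best measure supporting each suggestion."""
    by_source = build_index(alignment)
    best = {}
    for concept in annotations:
        for target, measure in by_source.get(concept, []):
            if measure >= threshold and (target not in best or measure > best[target]):
                best[target] = measure
    return sorted(best, key=lambda t: (-best[t], t))


alignment = [
    ("ex:vocA/cats", "ex:vocB/katten", 0.9),
    ("ex:vocA/cats", "ex:vocB/huisdieren", 0.4),
    ("ex:vocA/wildcats", "ex:vocB/wilde_katten", 0.7),
]
record = ["ex:vocA/cats", "ex:vocA/wildcats"]
print(reindex(record, alignment))
```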

Lessons learnt

• Middleware is still useful

– To match real application requirements

– To gather communities of practice around new usages

• But SW tools really help to build it

• Relevance of existing models like SKOS

– Only one part of SKOS went unused (collections), and only one extension was required (concept scheme groups)

– Disclaimer: we were involved in SKOS 

• Interest from the Cultural Heritage domain

(Changing landscape of) Issues

• Some basic middleware functions like full-text search are now tackled by vendor-specific SPARQL extensions

We prefer it that way 

• Working out the distributed architecture is difficult

Progress on federated RDF repositories can be useful

• Versioning/changes MUST be addressed at a fine-grained level (concepts)

Maybe the issue with the least mature solutions!

Already started!
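Handling changes at the level of concepts rather than whole files could start from a per-concept diff between two versions of a vocabulary. A minimal sketch, assuming each version is a mapping from concept identifier to its set of (property, value) pairs; this illustrates concept-level change detection in general, not the approach the authors adopted.

```python
def concept_diff(old, new):
    """Classify concepts as added, removed or modified between two versions."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = {
        uri: {"added": new[uri] - old[uri], "removed": old[uri] - new[uri]}
        for uri in sorted(old.keys() & new.keys())
        if old[uri] != new[uri]
    }
    return {"added": added, "removed": removed, "modified": modified}


v1 = {"ex:cats": {("prefLabel", "cats")}, "ex:dogs": {("prefLabel", "dogs")}}
v2 = {"ex:cats": {("prefLabel", "cats"), ("altLabel", "domestic cats")},
      "ex:wildcats": {("prefLabel", "wildcats")}}
print(concept_diff(v1, v2))
```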

Future work

CATCHplus: continuing CATCH efforts, bringing them even closer to production

New repository and interface

Current work

• Refinement of HTTP API

E.g., the possibility to search for pairs of related concepts, with constraints

Closer to SPARQL, but still limiting complexity

• Based on OpenLink Virtuoso

– Disk-based implementation can handle huge datasets

– Built-in LOD publication and full-text search features
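Searching for pairs of related concepts with constraints sits between simple lookup and free SPARQL. A minimal sketch of such a query over an in-memory triple list; the function and its parameters (relation, source scheme, target scheme, limit) are hypothetical, and the hard limit stands in for the complexity control mentioned in the API refinement.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def concept_pairs(triples, relation, source_scheme=None, target_scheme=None, limit=100):
    """Return (subject, object) pairs linked by `relation`, optionally constrained
    on the concept scheme of each side; at most `limit` pairs are returned."""
    # Simplification: assumes each concept belongs to a single scheme
    scheme = {s: o for s, p, o in triples if p == SKOS + "inScheme"}
    pairs = []
    for s, p, o in triples:
        if p != relation:
            continue
        if source_scheme and scheme.get(s) != source_scheme:
            continue
        if target_scheme and scheme.get(o) != target_scheme:
            continue
        pairs.append((s, o))
        if len(pairs) >= limit:
            break
    return pairs


data = [
    ("ex:a/cats", SKOS + "inScheme", "ex:a"),
    ("ex:b/katten", SKOS + "inScheme", "ex:b"),
    ("ex:c/chats", SKOS + "inScheme", "ex:c"),
    ("ex:a/cats", SKOS + "exactMatch", "ex:b/katten"),
    ("ex:a/cats", SKOS + "exactMatch", "ex:c/chats"),
]
print(concept_pairs(data, SKOS + "exactMatch", target_scheme="ex:c"))
```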

Current work

• Architecture is no longer distributed, for now!

Difficult conflict between requirements

– Some clients had requirements for SPARQL

– Federated SPARQL query is (was?) not yet mature

• Named graphs are being experimented with

– For representing KOS data bundles (file upload)

– For contextualizing triples (one shortcoming of SKOS/RDF)
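Named graphs turn triples into quads, so each statement carries the graph it came from. A minimal sketch of both uses listed above, assuming one graph per uploaded KOS file and context metadata attached to the graph rather than to individual triples. The `QuadStore` class, the graph names and the `source` metadata are hypothetical.

```python
from collections import defaultdict


class QuadStore:
    """Tiny in-memory store of (subject, predicate, object, graph) quads."""

    def __init__(self):
        self.quads = set()
        self.graph_meta = defaultdict(dict)  # graph name -> context metadata

    def upload(self, graph, triples, **meta):
        """Load one KOS data bundle (e.g. an uploaded file) into its own graph."""
        self.quads.update((s, p, o, graph) for s, p, o in triples)
        self.graph_meta[graph].update(meta)

    def drop(self, graph):
        """Remove a bundle as a unit, e.g. before re-uploading a new version."""
        self.quads = {q for q in self.quads if q[3] != graph}
        self.graph_meta.pop(graph, None)

    def provenance(self, s, p, o):
        """Graphs (with their context metadata) that assert the given triple."""
        return {g: dict(self.graph_meta[g])
                for (qs, qp, qo, g) in self.quads if (qs, qp, qo) == (s, p, o)}


store = QuadStore()
t = ("ex:a/cats", "skos:exactMatch", "ex:b/katten")
store.upload("ex:graph/manual", [t], source="manual")
store.upload("ex:graph/auto", [t], source="automatic")
print(store.provenance(*t))
```

The same mapping asserted by a manual and an automatic alignment stays distinguishable, which plain triples cannot express.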

Thanks!

http://stitch.cs.vu.nl/repository
