Semantic Web and Linked Data for
cultural heritage materials
Approaches in Europeana
Antoine Isaac
Vrije Universiteit Amsterdam
Europeana
DANS Linked Data and RDF workshop, Den Haag, July 28th 2010
A web of cultural heritage data?
The current portal
Towards semantic search: facets
Building a search engine on top of metadata is difficult
Intrinsic quality problems: correctness, coverage
Especially when data is so heterogeneous
Hundreds of formats
From flat 5-field records to 100-node XML trees
Language issue!
We currently use a simple interoperability format
A quick win that is quickly showing its limits
Semantic ThoughtLab:
experimenting with solutions
We can make better use of institutions’ original metadata
Accommodate their different practices
Data structures and semantics
Access objects via a semantic layer of vocabularies for
subjects, persons, places…
Towards semantics-enabled search
Building a "semantic layer" to help access content
Towards semantics-enabled search
• Enhancing access to Europeana content using semantics
– Query expansion, clustering of results
• Exploiting various types of relations
– "located in", "lived in", "is more specific concept"…
• Semantics are already there, in metadata and "controlled
vocabularies" used in metadata
– Thesauri, classifications…
• Requires making them properly machine-accessible
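Query expansion over "is more specific concept" links can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Thought Lab implementation: the concept identifiers, the `NARROWER` table and the `OBJECTS` index are hypothetical, and the vocabulary is assumed to be already loaded as a mapping from each concept to its narrower concepts.

```python
from collections import deque

# Hypothetical vocabulary fragment: concept -> narrower ("more specific") concepts.
NARROWER = {
    "rma:Egypt": {"rma:Giza", "rma:Cairo"},
    "rma:Cairo": {"rma:Zamalek"},
}

# Hypothetical index: object id -> concepts found in its metadata.
OBJECTS = {
    "obj1": {"rma:Giza"},
    "obj2": {"rma:Zamalek"},
    "obj3": {"rma:Paris"},
}


def expand_query(concept, narrower):
    """Return the concept plus every concept reachable via narrower links (BFS)."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for child in narrower.get(current, set()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def search(concept):
    """Find objects indexed with the concept or any more specific one."""
    terms = expand_query(concept, NARROWER)
    return sorted(obj for obj, subjects in OBJECTS.items() if subjects & terms)


print(search("rma:Egypt"))  # ['obj1', 'obj2']
```

The expansion follows narrower links transitively, so an object indexed only with a district of Cairo is still found when searching for Egypt.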
Prototype: Europeana Thought Lab
http://europeana.eu/portal/thought-lab.html
Semantic auto-completion
Clustering of results
Baseline: matching concept labels
Metadata for the object
Controlled place name from a
vocabulary at the Rijksmuseum
A "more specific Egypte"?
Metadata for the object
A place more specific than the Egypt one
Semantic information on the Giza
place in the Rijksmuseum Vocabulary
Following other relations
Following other relations - creator
Metadata for the object
Controlled person name from a
vocabulary at the Rijksmuseum
Following other relations - match
Information on Gustave Le Gray
from the Rijksmuseum Vocabulary
Matched to a "Gustave Le Gray"
from another Vocabulary
Following other relations – death place
Information on Gustave Le Gray from
the Union List of Artist Names (Getty)
Following other relations – death place
Information on Cairo from the Thesaurus
of Geographic Names (Getty)
Matched to "Cairo" from another vocabulary…
• A hell of relations?
• Well, they were in the original data, we just had to make them
explicit!
• Cultural heritage institutions often have a wealth of metadata
to share and exploit
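The walk from an object to its creator, to a matched record in another vocabulary, to that person's death place and on to a matching place can be sketched as following a chain of properties over triples. All identifiers below are hypothetical, and both the `skos:exactMatch` links and the `ex:deathPlace` property are assumptions standing in for whatever relations the real vocabularies use.

```python
# Hypothetical triples mirroring the walk-through above; identifiers, the
# skos:exactMatch links and ex:deathPlace are assumptions for illustration.
TRIPLES = [
    ("ex:photo1", "dc:creator", "rma:GustaveLeGray"),
    ("rma:GustaveLeGray", "skos:exactMatch", "ulan:GustaveLeGray"),
    ("ulan:GustaveLeGray", "ex:deathPlace", "tgn:Cairo"),
    ("tgn:Cairo", "skos:exactMatch", "rma:Cairo"),
]


def follow(start, path, triples):
    """Follow a sequence of properties from start; return the nodes reached."""
    frontier = {start}
    for prop in path:
        frontier = {o for (s, p, o) in triples if s in frontier and p == prop}
    return frontier


path = ["dc:creator", "skos:exactMatch", "ex:deathPlace", "skos:exactMatch"]
print(follow("ex:photo1", path, TRIPLES))  # {'rma:Cairo'}
```

None of these links is new: each one only makes explicit a relation that already sat in one of the original datasets.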
Enabling bits & pieces
Exploiting semantic links in CH vocabularies
Rijksmuseum thesaurus:
Concept “Giza” narrower than concept “Egypte”
Mapping/alignment between CH vocabularies
Louvre’s “Égypte” equivalent to Rijksmuseum’s “Egypte”
Enrichment of existing metadata
The string “Egypt” in a metadata record indicates the concept of
Egypt defined in Rijksmuseum thesaurus
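Metadata enrichment in its simplest form, the label-matching baseline, can be sketched as a lookup from a record's string value to the vocabulary concept whose label it matches. The vocabulary fragment, the `coverage` field and the `coverage_concept` key are hypothetical; a value that matches several concepts is deliberately left unenriched rather than guessed.

```python
# Hypothetical thesaurus fragment: concept -> labels in several languages.
VOCAB = {
    "rma:Egypt": ["Egypte", "Egypt", "Ägypten"],
    "rma:Giza": ["Gizeh", "Giza"],
}


def build_label_index(vocab):
    """Map each case-folded label to the set of concepts carrying it."""
    index = {}
    for concept, labels in vocab.items():
        for label in labels:
            index.setdefault(label.casefold(), set()).add(concept)
    return index


def enrich(record, field, index):
    """Return a copy of record with a concept attached if exactly one label matches."""
    candidates = index.get(record.get(field, "").casefold().strip(), set())
    enriched = dict(record)
    if len(candidates) == 1:
        enriched[field + "_concept"] = next(iter(candidates))
    return enriched


index = build_label_index(VOCAB)
print(enrich({"title": "View of Cairo", "coverage": "Egypt"}, "coverage", index))
# {'title': 'View of Cairo', 'coverage': 'Egypt', 'coverage_concept': 'rma:Egypt'}
```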
SKOS, Knowledge Organization Systems
and Linked Data
SKOS allows representing (simple) KOS data as RDF
animals
  NT cats
cats
  UF domestic cats
  RT wildcats
  BT animals
  SN used only for domestic cats
domestic cats
  USE cats
wildcats
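The thesaurus fragment above can be turned into SKOS triples with the usual tag correspondences: BT to `skos:broader`, NT to `skos:narrower`, RT to `skos:related`, SN to `skos:scopeNote`, and UF/USE to `skos:altLabel` on the preferred concept. This is a sketch under assumptions: the base URI and the `@en` language tag are hypothetical, and triples are printed Turtle-style with prefix declarations omitted.

```python
# Hypothetical namespace for the converted concepts.
BASE = "http://example.org/thesaurus/"

# The thesaurus fragment above: term -> list of (tag, value) entries.
THESAURUS = {
    "animals": [("NT", "cats")],
    "cats": [("UF", "domestic cats"), ("RT", "wildcats"), ("BT", "animals"),
             ("SN", "used only for domestic cats")],
    "domestic cats": [("USE", "cats")],
    "wildcats": [],
}

RELATIONS = {"BT": "skos:broader", "NT": "skos:narrower", "RT": "skos:related"}


def uri(term):
    return "<" + BASE + term.replace(" ", "_") + ">"


def literal(text):
    return '"' + text + '"@en'


def to_skos(thesaurus):
    """Convert term records into a set of SKOS triples (duplicates collapse)."""
    triples = set()
    for term, entries in thesaurus.items():
        use = [value for tag, value in entries if tag == "USE"]
        if use:
            # Non-preferred term: an altLabel of the preferred concept, not a concept.
            triples.add((uri(use[0]), "skos:altLabel", literal(term)))
            continue
        s = uri(term)
        triples.add((s, "rdf:type", "skos:Concept"))
        triples.add((s, "skos:prefLabel", literal(term)))
        for tag, value in entries:
            if tag in RELATIONS:
                triples.add((s, RELATIONS[tag], uri(value)))
            elif tag == "UF":
                triples.add((s, "skos:altLabel", literal(value)))
            elif tag == "SN":
                triples.add((s, "skos:scopeNote", literal(value)))
    return triples


for s, p, o in sorted(to_skos(THESAURUS)):
    print(s, p, o, ".")
```

Because UF on "cats" and USE on "domestic cats" describe the same fact, collecting triples in a set keeps a single `skos:altLabel` statement.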
SKOS, KOSs and LD
SKOS allows bridging across KOSs from different contexts
http://www.w3.org/2004/02/skos/
SKOS is used!
• Many libraries – not a surprise!
– Swedish National Library’s Libris catalogue and thesaurus http://libris.kb.se/
– Library of Congress’ vocabularies, including LCSH http://id.loc.gov/
– DNB’s Gemeinsame Normdatei (incl. SWD subject headings) http://d-nb.info/gnd/
  Documentation at https://wiki.d-nb.de/display/LDS
– BnF’s RAMEAU subject headings http://stitch.cs.vu.nl/
– OCLC’s DDC classification http://dewey.info/ and VIAF http://viaf.org/
– STW economy thesaurus http://zbw.eu/stw
– National Library of Hungary’s catalogue and thesauri http://oszkdk.oszk.hu/resource/DRJ/404 (example)
• Other fields
– Wikipedia categories through DBpedia http://dbpedia.org/
– New York Times subject headings http://data.nytimes.com/
– IVOA astronomy vocabularies http://www.ivoa.net/Documents/latest/Vocabularies.html
– GEMET environmental thesaurus http://eionet.europa.eu/gemet
– UMTHES
– Agrovoc http://aims.fao.org/
– Linked Life Data http://linkedlifedata.com/
– Taxonconcept http://www.taxonconcept.org/
– UK Public sector vocabularies http://standards.esd.org.uk/ (e.g., http://id.esd.org.uk/lifeEvent/7 )
KOS Alignments?
Many of them are linked to other resources
• LCSH, SWD and RAMEAU interlinked through MACS mappings
• GND linked to DBpedia and VIAF
• Libris linked to LCSH
• Agrovoc linked to CAT, NAL, SWD, GEMET
• NYT linked to Freebase, DBpedia, GeoNames
• DBpedia links are overwhelming
– Hungary, STW, TaxonConcept, GND…
Enabling bits & pieces (cont’d)
Appropriate data model for objects
Generic constructs for creation, title, subject, etc. that are useful
for querying
Flexible data model
Semantic Web ontology-linking features make it possible to stay close to the original data
while still having the generic notions above
Formal semantics, metadata schemas
and querying
• The query:
– ?x vra:subject ?y
– ?y skos:broader rma:Egypt
• The existing description:
– rma:gezicht_in_cairo rma:depicts rma:Cairo
– rma:Cairo skos:broader rma:Egypt
• Why is there a match?
In the Europeana ontology, every rma:depicts statement implies a
vra:subject statement
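Why the match appears can be sketched with a tiny forward-chaining step: the raw description does not answer the query, but after applying the sub-property axiom (`rma:depicts` implies `vra:subject`) it does. This is a minimal illustration over plain Python tuples, not the Europeana query engine; only the triples and the axiom come from the slide.

```python
# Description from the slide, as (subject, property, object) triples.
DATA = {
    ("rma:gezicht_in_cairo", "rma:depicts", "rma:Cairo"),
    ("rma:Cairo", "skos:broader", "rma:Egypt"),
}

# Axiom from the slide: every rma:depicts statement implies a vra:subject one.
SUBPROPERTY_OF = {"rma:depicts": "vra:subject"}


def infer(triples, subprop):
    """Add (s, super, o) for each (s, sub, o) until no new triple appears."""
    closed = set(triples)
    while True:
        new = {(s, subprop[p], o) for (s, p, o) in closed if p in subprop} - closed
        if not new:
            return closed
        closed |= new


def query(triples, concept):
    """Answer: ?x vra:subject ?y . ?y skos:broader <concept>"""
    narrower = {s for (s, p, o) in triples if p == "skos:broader" and o == concept}
    return {s for (s, p, o) in triples if p == "vra:subject" and o in narrower}


print(query(DATA, "rma:Egypt"))                         # set()
print(query(infer(DATA, SUBPROPERTY_OF), "rma:Egypt"))  # {'rma:gezicht_in_cairo'}
```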
Where are the challenges?
• Semantic conversion of data
– Using appropriate data models
– Enriching legacy metadata
• Semantic alignments
– Between description ontologies
vra:depicts rdfs:subPropertyOf dc:subject
– Between concepts in controlled vocabularies
iconclass:bird skos:closeMatch ddc:bird
Alignment of semantic references
Where are the challenges?
• Semantic alignment (cont’d)
– Find correspondences between large vocabularies
– In a multilingual context
• Scalability
– Plugging the semantic features into the Europeana production
environment
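A very first step towards alignment, such as matching Louvre’s “Égypte” to Rijksmuseum’s “Egypte”, can be sketched as comparing labels after case-folding and stripping diacritics. This is only a candidate generator under assumptions: the concept identifiers are hypothetical, and real multilingual alignment also needs translations, context and human validation before candidates are asserted as, say, `skos:closeMatch`.

```python
import unicodedata


def normalize(label):
    """Case-fold and strip diacritics, so 'Égypte' and 'Egypte' compare equal."""
    decomposed = unicodedata.normalize("NFKD", label)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold().strip()


def align(vocab_a, vocab_b):
    """Propose (a, b) candidate matches when two concepts share a normalized label."""
    index = {}
    for concept, labels in vocab_b.items():
        for label in labels:
            index.setdefault(normalize(label), set()).add(concept)
    candidates = set()
    for concept, labels in vocab_a.items():
        for label in labels:
            for match in index.get(normalize(label), ()):
                candidates.add((concept, match))
    return sorted(candidates)


# Hypothetical vocabulary fragments.
LOUVRE = {"louvre:Egypte": ["Égypte"]}
RMA = {"rma:Egypt": ["Egypte", "Egypt"], "rma:Giza": ["Gizeh"]}
print(align(LOUVRE, RMA))  # [('louvre:Egypte', 'rma:Egypt')]
```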
The Europeana Data Model (EDM)
with input from Carlo Meghini, Guus Schreiber, Stefan
Gradmann, Maxx Dekkers, Steffen Hennicke, Victor de
Boer et al. from Europeana V1
Rationale of EDM
• Precursor: ESE (Europeana Semantic Elements)
– represents the lowest common denominator for object metadata
• converts datasets to a Dublin Core-like standard
– forces interoperability
– major drawback: original metadata is lost
– most values are simple strings
• EDM goals
– preserve original data while still allowing for interoperability
– Semantic Web representation
• A community-driven effort
– Core experts, validation by representatives of various CH domains
EDM requirements & principles
1. Distinction between “provided object” (painting, book,
program) and digital representation
2. Distinction between object and metadata record
describing an object
3. Allow for multiple records for same object, containing
potentially contradictory statements about an object
4. Support for objects that are composed of other objects
5. Standard metadata format that can be specialized
6. Standard vocabulary format that can be specialized
7. EDM should be based on existing standards
EDM basics
• OAI ORE for organization of metadata about an object
• Dublin Core for metadata representation
• SKOS for vocabulary representation
+ Links to CIDOC-CRM and other shared ontologies
Dublin Core
• EDM uses the latest version of DCMI Metadata Terms for
a core of semantically interoperable properties
– And for backward compatibility, cf. ESE
• Specified with an RDF model
• Specialization of the 15 original DC elements
• Can be specialized itself
– see requirement 5: this is a crucial difference from ESE
• Used in the richest way possible
– Values can be pointers to resources, not just strings
SKOS: vocabulary publication on the Web
• Already seen…
OAI ORE
• Specification:
http://www.openarchives.org/ore/1.0/toc.html
• Specified with an RDF model
• Four key notions (RDF classes)
– Object: the book/painting/program being described
– Aggregation: organizes object information from a particular
provider (museum, archive, library)
– Proxy: the object as viewed in a metadata record
– Digital representation: some digital form of the object with a Web
address
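The four notions can be sketched as plain Python data classes to show how they connect: an aggregation bundles one provider’s proxy (its metadata record) for an object together with digital representations of that object. The class and field names are hypothetical; only `ore:Aggregation` and `ore:Proxy` are actual ORE class names, and the URIs are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class ProvidedObject:
    """The book/painting/program being described."""
    uri: str


@dataclass
class DigitalRepresentation:
    """Some digital form of the object, with a Web address."""
    url: str


@dataclass
class Proxy:
    """The object as viewed in one provider's metadata record (cf. ore:Proxy)."""
    proxy_for: ProvidedObject
    metadata: dict = field(default_factory=dict)


@dataclass
class Aggregation:
    """One provider's organization of object information (cf. ore:Aggregation)."""
    provider: str
    aggregated_object: ProvidedObject
    proxy: Proxy
    representations: list = field(default_factory=list)


# Usage with placeholder identifiers.
painting = ProvidedObject("http://example.org/object/1")
louvre = Aggregation(
    provider="Louvre",
    aggregated_object=painting,
    proxy=Proxy(painting, {"dc:title": "Example title"}),
    representations=[DigitalRepresentation("http://example.org/image/1.jpg")],
)
print(louvre.proxy.proxy_for is louvre.aggregated_object)  # True
```

Keeping the record (proxy) separate from the object it describes is what later allows several providers to describe the same object.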
The Example - 1
The Example - 2
Aggregation organizes data of a provider
[Diagram labels: provenance, metadata, digital representation, aggregation, object]
Proxy: metadata record for an object
[Diagram labels: proxy, object, metadata]
Multiple aggregations = multiple providers
[Diagram labels: aggregation of DMF, aggregation of Louvre]
Multiple aggregations = multiple providers
[Diagram labels: DMF proxy, DMF title, the “real” painting, Louvre proxy, Louvre title]
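With one proxy per provider, statements about the same object can differ or even contradict each other without either being lost. A minimal sketch, assuming statements are available as hypothetical (provider, object, property, value) tuples with placeholder identifiers and titles:

```python
from collections import defaultdict

# Hypothetical proxy statements: (provider, object URI, property, value).
STATEMENTS = [
    ("DMF", "ex:painting1", "dc:title", "Title in the DMF record"),
    ("Louvre", "ex:painting1", "dc:title", "Title in the Louvre record"),
    ("Louvre", "ex:painting1", "dc:creator", "ex:painter1"),
]


def views_of(obj, statements):
    """Group statements about one object by provider, so that differing or
    contradictory values coexist instead of overwriting each other."""
    views = defaultdict(lambda: defaultdict(list))
    for provider, subject, prop, value in statements:
        if subject == obj:
            views[provider][prop].append(value)
    return {provider: dict(props) for provider, props in views.items()}


for provider, props in views_of("ex:painting1", STATEMENTS).items():
    print(provider, props.get("dc:title"))
# DMF ['Title in the DMF record']
# Louvre ['Title in the Louvre record']
```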
Europeana is “just” a special provider
with processed/enriched metadata
[Diagram labels: Europeana aggregation, Europeana landing page, enriched metadata]
A flexible model: different semantic grains
• Cf. goal: “preserve original data while still allowing for interoperability”
• Keep data expressed as close as possible to the original model
• Use mappings to a more interoperable level
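The “different semantic grains” idea can be sketched as keeping a provider’s record untouched while deriving a generic Dublin Core view through property mappings. The `rma:` property names and their mapping to DC terms are hypothetical examples, assumed to be declared as sub-properties elsewhere; unmapped properties simply stay in the original view.

```python
# Hypothetical provider-specific properties and their assumed
# Dublin Core super-properties.
MAPPING = {
    "rma:depicts": "dc:subject",
    "rma:maker": "dc:creator",
    "rma:objectTitle": "dc:title",
}


def interoperable_view(record, mapping):
    """Keep the original record and derive a generic Dublin Core view from it."""
    generic = {}
    for prop, values in record.items():
        target = mapping.get(prop)
        if target is not None:
            generic.setdefault(target, []).extend(values)
    return {"original": record, "generic": generic}


record = {
    "rma:depicts": ["rma:Cairo"],
    "rma:maker": ["rma:GustaveLeGray"],
    "rma:inventoryNote": ["kept only in the original view"],
}
view = interoperable_view(record, MAPPING)
print(view["generic"])
# {'dc:subject': ['rma:Cairo'], 'dc:creator': ['rma:GustaveLeGray']}
```

Unlike a flat conversion to a common format, nothing is discarded: the rich original and the interoperable view coexist.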
A flexible model: objects, events and the rest
• Preserving and exploiting original data also means being
compatible with descriptions beyond the simple object level
• Also crucial for semantic enrichment
A flexible model: object and events (2)
• Classes and Properties for event-, agent-, place-centric
modeling
• Instances of (local) vocabularies using skos:Concept
• Using RDF, EDM allows any kind of network to be
attached to a provided object.
A flexible model: object and events (3)
Advanced modeling in EDM
• Relations between provided objects
– Part-whole links for complex (hierarchical) objects
– Derivation and versioning relations
– Relations between provided objects, for instance artistic derivation
between works
• ens:isRepresentationOf
• ens:isNextInSequence
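Sequence links such as ens:isNextInSequence let a consumer rebuild the order of, for example, the pages of a book. A minimal sketch, assuming each link reads “subject comes right after object” and that the items form a single linear chain; the page identifiers are hypothetical.

```python
# Hypothetical pages: item -> the item it comes right after
# (in the spirit of ens:isNextInSequence).
NEXT_IN_SEQUENCE = {
    "ex:page2": "ex:page1",
    "ex:page3": "ex:page2",
    "ex:page4": "ex:page3",
}


def ordered_sequence(next_of):
    """Rebuild the full order of a single linear chain of sequence links."""
    followers = {prev: item for item, prev in next_of.items()}
    # The first item is followed by something but comes after nothing.
    starts = set(followers) - set(next_of)
    if len(starts) != 1:
        raise ValueError("expected exactly one first item")
    current = starts.pop()
    order = [current]
    while current in followers:
        current = followers[current]
        order.append(current)
    return order


print(ordered_sequence(NEXT_IN_SEQUENCE))
# ['ex:page1', 'ex:page2', 'ex:page3', 'ex:page4']
```

A cycle leaves no first item and is rejected; branching sequences would need a richer structure than this sketch.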
Linked data and cultural heritage?
The case for linked data in cultural heritage
Not just a more sophisticated way to represent data!
• Ease of getting data from external sources
– Just go to the URI and fetch the RDF there
• Ease of publishing data
– Linked data as a dissemination channel for Europeana data
• Ease of linking across datasets
• Object identification as cornerstone
– Records are just a side feature!
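“Going to the URI and fetching the RDF there” can be sketched with the standard library: request the URI with an `Accept` header asking for RDF serializations. This assumes the server supports HTTP content negotiation, as many linked data services do; the exact media types a given service offers are an assumption to check per source. `urllib` follows redirects such as 303 See Other by default.

```python
import urllib.request

# Media types asking for RDF (Turtle preferred, RDF/XML as fallback).
RDF_TYPES = "text/turtle, application/rdf+xml;q=0.9"


def rdf_request(uri):
    """Build an HTTP GET that asks the server for RDF rather than HTML."""
    return urllib.request.Request(uri, headers={"Accept": RDF_TYPES})


def fetch_rdf(uri, timeout=10.0):
    """Dereference a URI and return (content type, raw body)."""
    with urllib.request.urlopen(rdf_request(uri), timeout=timeout) as response:
        return response.headers.get_content_type(), response.read()
```

For instance, `fetch_rdf` could be pointed at one of the vocabulary URIs listed earlier; the returned content type tells which serialization to parse.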
Encouraging open linked data adoption
From a movement supported by researchers
To much wider awareness
Open government initiatives, libraries…
Continuing effort: show the benefits of collaborating on a
cultural heritage data web
Library Linked Data W3C incubator
http://www.w3.org/2005/Incubator/lld
Linked Library Cloud, early 2008
[Ross Singer, Code4Lib2010]
http://code4lib.org/conference/2010/singer
Linked Library Cloud mid-2010
Plus:
• Germany NL
• Hungary NL
• STW
• GEMET
• NYT
• Agrovoc
[Ross Singer, Code4Lib2010] http://code4lib.org/conference/2010/singer
Is that a surprise?
Not really, let’s have a look at a real-world case…
KOS & collection environment @KB
Johan Stapel, Koninklijke Bibliotheek
A broad range of datasets
• That describe the same objects
• Or related objects
• Which are about similar subjects
• Which were made by the same persons
• Or related persons
• In the same places
• Etc…
Thanks!
aisaac@few.vu.nl
Europeana.eu team
Web and Media lab @ Vrije Universiteit Amsterdam
http://wiki.cs.vu.nl/web-media
EuropeanaConnect project
http://www.europeanaconnect.eu/