www.europeanaconnect.eu

Multilingual Access to Online Content - the Europeana Experience
Vivien Petras (Humboldt-Universität zu Berlin)
With the help of many people involved in Europeana (referenced in the slides)
Eurovoc Conference, 18-19 November 2010

Outline
• Europeana – a brief introduction
• Multilingual access to Europeana – approaches
• Europeana Semantic Data Layer
• Multilingual Alignments of Vocabularies
• Semantic Search Engine Prototype

Vivien Petras, Humboldt-Universität zu Berlin Eurovoc Conference, 18-19 November 2010

Europeana
“A digital library that is a single, direct and multilingual access point to the European cultural heritage.”
European Parliament, 27 September 2007

Europeana Today
• 13 million objects
• 28 data aggregators
• 1500 participating institutions
• 200 partners
• 35 FTEs
• 21 projects
• 1 million visits in 2010
• 30,000 My Europeana signees
• 2008: Prototype
• 2010: Operational Service
• Stable portal
• Open Source Code
• EuropeanaLabs
• Public Domain Charter
From: Cousins, Jill (2010). Europeana Overview. Europeana Open Cultures Conference, 14-15 October Amsterdam

Europeana Contributions by Country
Different languages!(?)
From: Cousins, Jill (2010). Europeana Overview.
Europeana Open Cultures Conference, 14-15 October Amsterdam

Europeana Content Types
• Images: 60%
• Texts: 38%
• Videos: 1%
• Sounds: 1%
Examples: books, articles, postcards, folklore objects, photography, art

Sample record:
Goethe, Johann Wolfgang von
Title: Goethe, Johann Wolfgang von
Date: unknown
Creator: Goethe, Johann Wolfgang von
Description: Goethe, Johann Wolfgang von
Language: de-DE
Format: image/jpeg
Source: SLUB/Deutsche Fotothek
Rights: Deutsche Fotothek
Provider: Deutsche Fotothek ; Germany
Identifier: http://www.deutschefotothek.de/obj70226592.html
Subject: Bildnis; Bildniskatalog; Foto; Fotos; Portrait
Type: image

Multilingual Access to Europeana
• Interface
  • static pages
• Search
  • query translation
  • (document translation)
• Subject Browse (& Search)
  • Controlled vocabularies
  • Semantic Data Layer
Languages: French, English, Spanish, Dutch, Portuguese, German, Italian, Polish, Hungarian, Swedish

Europeana Semantic Data Layer
Doerr, M.; Gradmann, S.; Hennicke, S.; Isaac, A.; Van de Sompel, H. (2010). The Europeana Data Model (EDM).

Europeana Semantic Data Layer
Bridging “isles of information” by connecting objects from different domains via cross-vocabulary links.
[Figure labels: museum, archive, library]
Doerr, M.; Gradmann, S.; Hennicke, S.; Isaac, A.; Van de Sompel, H. (2010). The Europeana Data Model (EDM).

Semantic Data Layer Alignment Example
[Figure: a Norwegian vocabulary linked to an Irish vocabulary by a SKOS mapping (skos:exactMatch)]
From: Cousins, Jill (2010). Europeana Overview.
Europeana Open Cultures Conference, 14-15 October Amsterdam

Multilingual Alignment: Approach
• Identify and convert relevant semantic resources
• Pivot vocabularies for relevant categories (subject, persons, places…) = multilingual and with wide coverage
• E.g. UDC, DDC, VIAF, TGN, Geonames, Wordnets, DBpedia
From: Isaac, Antoine; Schreiber, Guus (2010). Vrije Universiteit Amsterdam Approach to Multilingual Mapping of Vocabularies.

Multilingual Alignment: Approach
• Align more specific vocabularies to the pivots = anchoring mappings
• Finding instances of skos:exactMatch mappings
• Vocabulary characteristics important for matching:
  • Lexical variance of labels (e.g. plural/singular, diacritics, multilinguality)
  • Preferred / alternative labels
  • Nature of hierarchy
From: EuropeanaConnect Milestone 1.2.1 (2010). Specification of preferred terms identification methodology.

Multilingual Alignment: Approach
• Methodology:
  • Conversion to SKOS/RDF
  • Application of different alignment methods:
    • Lexical matching
    • Structure-based matching
    • Instance-based matching
  • Filtering / disambiguation of matching candidates:
    • Analyzing children / parent matches
    • Combining alignments
From: EuropeanaConnect Milestone 1.2.1 (2010). Specification of preferred terms identification methodology.

VUA Vocabulary Alignment Tool Amalgame
• AMsterdam ALignment GenerAtion MEtatool
• Uses EDOAL (Expressive and Declarative Ontology Alignment Language) or SKOS
• Also provides pre- / post-mapping statistics and an evaluation tool
From: EuropeanaConnect Milestone 1.2.2 (2010). Semantics of descriptions aligned (intermediary).
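
The lexical matching step named in the methodology above can be illustrated with a minimal sketch. It is not Amalgame's implementation. It assumes each vocabulary is a plain mapping from a concept URI to its preferred and alternative labels, and it handles label variance with a deliberately naive normalization (case folding, diacritic stripping, English-only plural stripping). The URIs, labels, and function names are hypothetical; the resulting triples are only candidates that would still need the filtering and disambiguation described in the slides.

```python
import unicodedata


def normalize(label: str) -> str:
    """Fold case, strip diacritics, collapse whitespace, drop a trailing plural 's'."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    text = " ".join(no_marks.lower().split())
    # Assumption: naive English-only singularisation; a real system would need
    # per-language stemming or lemmatisation.
    if len(text) > 3 and text.endswith("s") and not text.endswith("ss"):
        text = text[:-1]
    return text


def build_index(vocab: dict) -> dict:
    """Map each normalized label to the set of concept URIs that carry it."""
    index = {}
    for uri, labels in vocab.items():
        for label in labels:
            index.setdefault(normalize(label), set()).add(uri)
    return index


def lexical_matches(source: dict, pivot: dict) -> list:
    """Return candidate (source URI, 'skos:exactMatch', pivot URI) triples."""
    pivot_index = build_index(pivot)
    candidates = set()
    for s_uri, labels in source.items():
        for label in labels:
            for p_uri in pivot_index.get(normalize(label), ()):
                candidates.add((s_uri, "skos:exactMatch", p_uri))
    return sorted(candidates)


# Hypothetical sample data: a provider vocabulary and a pivot vocabulary.
SOURCE = {
    "ex:local/portrait": ["Portraits", "Bildnis"],
    "ex:local/cafe": ["Café"],
}
PIVOT = {
    "ex:pivot/101": ["portrait", "portret"],
    "ex:pivot/202": ["cafe", "coffee house"],
    "ex:pivot/303": ["Photograph"],
}

for triple in lexical_matches(SOURCE, PIVOT):
    print(*triple)
```

With this sample data, "Portraits" and "Café" each reach a pivot concept through plural and diacritic normalization, while "Bildnis" finds no counterpart because the pivot carries no German label for that concept. That gap is the kind of case where multilingual pivot labels or structure-based and instance-based matching would have to take over.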

VUA Vocabulary Alignment Tool Amalgame
http://semanticweb.cs.vu.nl/beta/amalgame/list_alignments
• Skosified: en, fr, de, nl, hu
• Mappings (>500,000): en, fr, nl
• Mostly label matches

Europeana Semantic Search Engine
http://eculture.cs.vu.nl/europeana/session/search

Europeana Semantic Search Engine
Disambiguation of search terms

Europeana Semantic Search Engine
Multilingual query expansion

Europeana Semantic Search Engine
Clustering of search results
• Works created by matching person
• Works related to matching person
• Works created by a teacher of matching person
• Works related to an artefact created by matching person
• Works created by an artist professionally related to matching person
• Works titled
• Works showing concept
• Works with matching Location
• ….

Next Steps
• Adding more vocabularies from the content providers:
  • VIAF
  • Spanish and Polish subject heading lists
• Switching metadata delivery to Europeana Data Model (EDM) format (2011)
• And: linking with the cloud…

Europeana & Linked Open Data
Information Spaces
• DBpedia
• PND and SWD (prototype)
• Geonames
• LCSH
• …
Doerr, M.; Gradmann, S.; Hennicke, S.; Isaac, A.; Van de Sompel, H. (2010). The Europeana Data Model (EDM).

Thank you.
www.europeana.eu