Enrichment of Library Authority Files by Linked Open Data Sources
Gerd Zechmeister
Semantic Web Company – http://www.semantic-web.at

Presentation agenda
1. About us
2. LOD2 Project
3. Demonstration Scenario
4. Process & Results
5. Summary & Outlook

© Semantic Web Company – http://www.semantic-web.at/

About us
• Based in Vienna (privately held)
• 20 specialists from several fields
• Focus: Semantic (web) technologies & search applications
  – 1st project based on semantic technologies in 2001
  – Foundation of Semantic Web School in 2004, Semantic Web Company GmbH since 2008
  – PoolParty development started in 2007, on the market since 2009

PoolParty (PP) Thesaurus Manager
1. Each concept in one or many concept schemes
2. Each concept has one URI
3. Each concept has one or more labels
4. (Poly-)hierarchical and non-hierarchical relations
5. Matching between concepts from various sources

SKOSsy
• Select DBPedia categories
• Choose extraction depth, data to extract and format (TTL, TriG etc.)
• Extract them and import them into PoolParty as a seed thesaurus

LOD2 Project
• FP7 project (2010-2014)
• 15 partners (technology researchers, companies and service providers) from 11 European countries plus 1 associated partner from Korea
• Coordinated by the AKSW research group at the University of Leipzig

LOD Life-Cycle Management
• Extraction of RDF from text, XML and SQL
• Querying and Exploration using SPARQL
• Authoring of Linked Data using a Semantic Wiki
• Semi-automatic link discovery between Linked Data sources
• Knowledge-base Enrichment and Repair

Demonstration Scenario
• Alignment
  – Example Data vs LOD resources in SKOS
  – Identification of matching concepts
• Enrichment
  – Addition of matches to Example Data dump

Demonstration Scenario
• Applied tools and frameworks, by function
  – Using SKOS thesauri as graph/SPARQL endpoint
  – Creating example data as graph/SPARQL endpoint
  – Comparing data to detect matching concepts
  – Extracting categories from DBPedia to import them as a thesaurus into PoolParty

Demonstration Scenario
• Example Data
  – Schlagwortnormdatei (SWD = keyword authority file) from DNB data dump
  – 166,414 concepts in German with alignments to LCSH, RAMEAU etc.
  – Expressed in SKOS (hierarchical and associative relations)

Demonstration Scenario
• SKOS vocabularies for alignment
  – Standard Thesaurus for Economics (STW)
    • 6,520 concepts with English/German prefLabel
  – European Union Thesaurus (EUROVOC)
    • 6,797 concepts with multilingual prefLabel
  – Concepts extracted from DBPedia via SKOSsy: "Economy"
    • 13,294 concepts in German

Process & Results: preparatory steps
1. Download – SWD data dump from DNB server
2. Evaluation – SKOS compatibility
3. Transformation – SWD data as SPARQL endpoint
4. Vocabulary selection – Focus on Economy vocabularies

Process & Results: Alignment
• Specification in SILK workbench
  – Define data sources: SWD & EUROVOC
  – Define tasks: compare all skos:prefLabels and deliver all matching links
  – Initiate process and create output file

SILK Workbench
Alignment SWD vs EUROVOC

Process & Results: Alignment
[Diagram: SWD (166,414 concepts) aligned with STW (6,520 concepts), EUROVOC (6,797 concepts) and DBPedia Wirtschaft (13,294 concepts); matching-link counts shown in the diagram: 3,440, 2,169, 1,318]

Process & Results: Enrichment
Upload of exact matches to the SWD graph in Virtuoso

Process & Results: Enrichment
Subject                            Predicate         Object
<http://d-nb.info/gnd/4000107-6>   skos:exactMatch   <http://de.dbpedia.org/resource/Abfallwirtschaft>
<http://d-nb.info/gnd/4000107-6>   skos:exactMatch   <http://eurovoc.europa.eu/1158>
<http://d-nb.info/gnd/4000107-6>   skos:exactMatch   <http://zbw.eu/stw/descriptor/13325-0>

[Figure: SWD, DBPedia, EUROVOC, STW]

Process & Results: Enrichment
<skos:Concept rdf:about="http://d-nb.info/gnd/4000107-6">
  <skos:definition xml:lang="de">Weiter als im Gabler definiert, auch für öffentliche Abfallwirtschaft</skos:definition>
  <dnb:hasCoordinatedConcept-of>
    <dnb:CoordinatedConcept>
      <dnb:coordination-of rdf:resource="http://d-nb.info/ddc-sg/360"/>
      <dnb:coordination-of rdf:resource="http://d-nb.info/gnd/4000107-6"/>
      <dnb:det2 rdf:resource="http://d-nb.info/ddc/class/363.728"/>
    </dnb:CoordinatedConcept>
  </dnb:hasCoordinatedConcept-of>
  <skos:related rdf:resource="http://d-nb.info/gnd/4000100-3"/>
  <skos:related rdf:resource="http://d-nb.info/gnd/4076573-8"/>
  <dcterms:identifier>(DE-588)040001075</dcterms:identifier>
  <dcterms:identifier>(DE-588c)4000107-6</dcterms:identifier>
  <skos:broader rdf:resource="http://d-nb.info/gnd/4220414-8"/>
  <skos:prefLabel xml:lang="de">Abfallwirtschaft</skos:prefLabel>
  <skos:exactMatch rdf:resource="http://de.dbpedia.org/resource/Abfallwirtschaft"/>
  <skos:exactMatch rdf:resource="http://eurovoc.europa.eu/1158"/>
  <skos:exactMatch rdf:resource="http://zbw.eu/stw/descriptor/13325-0"/>
</skos:Concept>

Summary & Outlook
• Playground for future scenarios
  – Linked Open Library Data
  – LOD2 technology stack components
• Further applications
  – Executing tasks for regular updates
  – Link exchange with LOD providers
  – Integration of data and cross-media (e.g. geo-references, images, AV files)
  – Expansion of authority files for cataloguing (e.g. multilingual searches)

Get in contact!
Gerd Zechmeister
Research & Development Manager
g.zechmeister@semantic-web.at

Semantic Web Company GmbH
Mariahilfer Strasse 70/8
1070 Vienna - Austria
http://www.semantic-web.at/
http://poolparty.biz/
http://twitter.com/semwebcompany
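
Appendix: alignment and enrichment sketch
The alignment task in this deck compares skos:prefLabel values across vocabularies, and the enrichment step records the hits as skos:exactMatch links. The Python sketch below illustrates that idea only; it is not the SILK workbench configuration used in the demonstration. The function names, the label normalization, the EUROVOC label and the example.org URI are assumptions made for illustration.

```python
from collections import defaultdict

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"


def normalize(label):
    """Collapse whitespace and case-fold a label (an assumed, simple comparison rule)."""
    return " ".join(label.split()).casefold()


def align_by_pref_label(source, target):
    """Return (source_uri, target_uri) pairs whose normalized prefLabels are equal.

    Both arguments map a concept URI to a single prefLabel in the same language.
    """
    # Index the target vocabulary by normalized label for constant-time lookup.
    index = defaultdict(list)
    for uri, label in target.items():
        index[normalize(label)].append(uri)

    links = []
    for uri, label in source.items():
        for match in index.get(normalize(label), []):
            links.append((uri, match))
    return links


def to_ntriples(links):
    """Serialize the links as skos:exactMatch statements in N-Triples syntax."""
    return "\n".join(f"<{s}> <{SKOS_EXACT_MATCH}> <{o}> ." for s, o in links)


# Tiny sample: the first SWD entry is taken from the slides,
# the second entry and the EUROVOC label are hypothetical.
swd = {
    "http://d-nb.info/gnd/4000107-6": "Abfallwirtschaft",
    "http://example.org/swd/unmatched": "Beispielbegriff",
}
eurovoc = {
    "http://eurovoc.europa.eu/1158": "Abfallwirtschaft",
}

if __name__ == "__main__":
    print(to_ntriples(align_by_pref_label(swd, eurovoc)))
```

The resulting N-Triples lines have the same subject, predicate and object shape as the enrichment table above and could be loaded into the SWD graph of a triple store. Exact label equality is a deliberately strict rule; a real linkage specification would typically add string-similarity measures and thresholds.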