A new Union Catalogue generation in Linked Open Data

34th ADLUG ANNUAL MEETING 2015
American University of Rome
21-23 October 2015
Tiziana Possemato
@Cult
A new model to create a Union catalogue
The project goals: conversion to linked open data, publication of the
dataset in RDF, and production of a single portal with a user-friendly
interface for the bibliographic and authority data of several
universities in southern Italy:
• Università degli Studi di Napoli Federico II (Napoli)
• Università degli Studi di Napoli L’Orientale (Napoli)
• Università degli Studi di Napoli Parthenope (Napoli)
• Università degli Studi di Salerno (Salerno)
• Università degli Studi del Sannio (Benevento)
• Università degli Studi della Basilicata (Potenza)
Copyright 2008 @CULT. All rights reserved
The original data
The original data used in the project come from different ILSs (Aleph and
Sebina) and are in MARC format (UNIMARC):
• Università degli Studi di Napoli Federico II => Aleph
• Università degli Studi di Napoli L’Orientale => Sebina
• Università degli Studi di Napoli Parthenope => Aleph
• Università degli Studi di Salerno => Aleph
• Università degli Studi del Sannio => Sebina
• Università degli Studi della Basilicata => Aleph
The project takes into consideration two different types of data:
• Bibliographic records
• Authority records
[Diagram: the libraries (Bib1 to Bib6) go through identification, collection, selection and processing (data processing in RDF); the labelled outputs are a Search Engine, an RDF Store, the LOD Cloud and a Linked Open Services Platform. FRBR = Functional Requirements for Bibliographic Records.]
The portal: a three-layer architecture
[Diagram: the three layers, Person/Works (example Works: Contra academicos, De Beatâ Vitâ, De civitate Dei contra paganos), Manifestations, Item.]
1° - Person/Work: the set of data related to Persons and Works, in RDF
(linked open data), stored and exposed through a SPARQL endpoint and made
available by specific search and presentation functions. The variant name
forms for Persons and the Work titles come from local authority files and VIAF.
2° - Manifestations: bibliographic data indexed in Solr, which can
produce new data aggregations as facets (such as publication
date, language, publisher, edition, etc.); this layer gives users a wide
range of search and navigation functions.
3° - Item: holdings data, related to copy information, coming from the local
OPAC or local system of each specific library.
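As a hedged illustration of the Manifestations layer, the sketch below builds a faceted Solr query URL. The base URL, core name and field names (`publication_date`, `language`, `publisher`, `edition`) are assumptions made for illustration, not the project's actual schema; `facet` and `facet.field` are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr core URL; the real endpoint is not given in the slides.
SOLR_SELECT_URL = "http://localhost:8983/solr/manifestations/select"


def build_facet_query(text, facet_fields):
    """Return a Solr select URL asking for results plus facet counts."""
    params = [("q", text), ("wt", "json"), ("facet", "true")]
    # One facet.field parameter per field to aggregate on.
    params += [("facet.field", field) for field in facet_fields]
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Field names are assumed for illustration only.
url = build_facet_query(
    "De re rustica",
    ["publication_date", "language", "publisher", "edition"],
)
print(url)
```

The URL is only built, not sent, so the sketch runs without a Solr server.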
The controlled name access point
Production of a unique name access point for Person names, containing all
preferred and variant forms of name, used both in authority files and in
different bibliographic catalogues, finally enriched with VIAF forms.
A) Different name forms for sant’Agostino in the authority file of
Università degli Studi di Napoli Federico II:
Augustinus, Aurelius <354-430>
< Augustinus Hipponensis <354-430>
< Aurelius Augustinus Hipponensis <354-430>
< Agostino d’Ippona <santo ; 354-430>
< Agostino, Aurelio <santo ; 354-430>
B) Different name forms for sant’Agostino in the bibliographic data of the
catalogue of Università degli Studi di Napoli Federico II:
Augustinus, Aurelius <354-430>
< Augustinus, Aurelius santo
< Augustinus Hipponensis 354-430
< Aurelius Augustinus Hipponensis 354-430
< Agostino d’Ippona santo ; 354-430
< Agostino, Aurelio santo ; 354-430
C) Different name forms for sant’Agostino in the bibliographic data of the
catalogue of Università degli Studi del Sannio:
Augustinus, Aurelius <santo>
< Augustinus, Aurelius santo ; ; pseudo
< Augustinus santo
< Augustinus, Aurelius
< Augustinus : von Hippo santo
< Agostino : d'Ippona santo
< Agostino santo
D) Different name forms for sant’Agostino in the bibliographic data of the
catalogue of Università degli Studi della Basilicata:
Augustinus, Aurelius santo
Augustinus, Aurelius santo ; 354-430
<Agostino, Aurelio santo>
<Augustinus Hipponensis>
<Aurelius Augustinus Hipponensis>
<Agostino d’Ippona santo>
The controlled name access point
The name cluster for sant’Agostino:
• Augustinus, Aurelius <354-430>
• Augustinus, Aurelius santo
• Augustinus Hipponensis 354-430
• Aurelius Augustinus Hipponensis 354-430
• Agostino d’Ippona santo ; 354-430
• Agostino, Aurelio santo ; 354-430
• Augustinus, Aurelius santo
• Augustinus, Aurelius santo ; ; pseudo
• Augustinus santo
• Augustinus, Aurelius
• Augustinus : von Hippo santo
• Agostino : d'Ippona santo
• Agostino santo
• … (etc.)
[Diagram label: VIAF API]
The controlled name access point
The Person cluster for sant’Agostino (Name cluster ID 245):
• Augustinus, Aurelius <354-430> (AUF) (VIAF)
• Augustinus, s., vesc. d’Ippona, 354-430 (VIAF)
• Augustine, Saint, Bishop of Hippo (VIAF)
• Augustinus, Aurelius santo (FED) (BAS)
• Augustinus Hipponensis 354-430 (FED)
• Aurelius Augustinus Hipponensis 354-430 (AUF) (FED) (BAS)
• Agostino d’Ippona santo ; 354-430 (FED)
• Agostino, Aurelio santo ; 354-430 (FED) (BAS)
• Augustinus, Aurelius santo (BAS)
• Augustinus, Aurelius <santo> (ORI) (SAL)
• Augustinus santo (SAN)
• Augustinus, Aurelius (SAN)
• Augustinus : von Hippo santo (SAN)
• Agostino : d'Ippona santo (SAN)
• Agostino santo (SAN)
• Augustinus, Aurelius santo ; 354-430 (BAS)
UNION CATALOGUE (search for Manifestations with Name cluster ID 245)
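The clustering idea can be sketched as a lookup table from every name form to one cluster ID. This is a minimal illustration, not the project's implementation: the data structure and the function names `build_cluster` and `build_index` are hypothetical, and only a few of the forms from the slide are included.

```python
from collections import defaultdict

# (name form, source code) pairs copied from the slide; 245 is the cluster ID shown there.
FORMS = [
    ("Augustinus, Aurelius <354-430>", "AUF"),
    ("Augustinus, Aurelius <354-430>", "VIAF"),
    ("Augustine, Saint, Bishop of Hippo", "VIAF"),
    ("Augustinus Hipponensis 354-430", "FED"),
    ("Agostino santo", "SAN"),
]


def build_cluster(cluster_id, forms):
    """Group the sources attesting each name form under one cluster ID."""
    sources_by_form = defaultdict(set)
    for form, source in forms:
        sources_by_form[form].add(source)
    return {"id": cluster_id, "forms": dict(sources_by_form)}


def build_index(clusters):
    """Map every name form to its cluster ID for lookup at search time."""
    return {form: c["id"] for c in clusters for form in c["forms"]}


cluster = build_cluster(245, FORMS)
index = build_index([cluster])
print(index["Agostino santo"])
```

With such an index, a search on any variant form resolves to the same cluster ID, which is then used to search the Manifestations.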
Retrieve Works associated with a Person:
the controlled title access point
The Title cluster for Varro, Marcus Terentius (ID 69518):
• De lingua latina (ID 986)
• Saturae menippeae (ID 1314)
• De vita populi romani ad Atticum (ID 2135)
• De gente populi romani (ID 2136)
• Antiquitates (ID 2137)
• Logistorika (ID 2138)
• De re rustica (ID 855)
Title cluster ID 855:
• Res rusticae (FED)
• Économie rurale (SAL) (VIAF) (FED)
• Del Camp (SAL)
• Gespräche über die Landwirtschaft (FED)
• Varro the farmer a selection from the Res rusticae (FED)
• Rerum rusticarum libri tres (VIAF) (BAS)
UNION CATALOGUE (search for Manifestations with Title cluster ID 855)
Layer 1 (Person/Works) & Layer 2 (Manifestations)
Association of Person and Title cluster ID to Manifestation record
=LDR  01350nam 2200289 450
=001  000005093
=005  20150112111102.0
=010  \\$a2-251-00329-6$bvol. 2
=010  \\$a2-251-01400-4$bvol. 3
=100  \\$a20010205d--------km-y0itay0103----ba
=101  2\$alat$afre
=102  \\$aFR
=200  1\$aÉconomie rurale$fVarron
=210  \\$aParis$c<<Les>> belles lettres
=215  \\$a3 volumi$d20 cm
=225  2\$aCollection des universités de France
=300  \\$aTraduzione francese con testo latino a fronte
=300  \\$aSulla copertina del terzo tomo è erroneamente riportata la dicitura "livre IV"
=410  \0$12001$aCollection des universités de France
=500  11$aDe re rustica / Varro, Marcus Terentius$9855  <= Title ID in $9
=676  \\$a630.93$v(22. ed.)$9Agricoltura. Mondo antico
=700  \1$aVarro,$bMarcus Terentius$f<116-27 a.C.>$069518  <= Person ID in $0
=702  \1$aHeurgon,$bCharles
=702  \1$aGuiraud,$bCharles
=801  \0$aIT$bUniversità della Basilicata - B.I.A.$gREICAT$2unimarc
=997  \\$aUNIBAS
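The Title ID in $9 of tag 500 and the Person ID in $0 of tag 700 can be read back from the field values shown in this record. The sketch below is a minimal parser for that textual notation only; the function names `parse_subfields` and `first_subfield` are hypothetical, and a real pipeline would more likely use a MARC library.

```python
def parse_subfields(field_value):
    """Split a field value into its indicators and a list of (code, value) subfields."""
    # Everything before the first "$" is the indicator pair.
    indicators, _, rest = field_value.partition("$")
    subfields = []
    for chunk in rest.split("$"):
        if chunk:
            # The first character is the subfield code, the remainder its value.
            subfields.append((chunk[0], chunk[1:]))
    return indicators, subfields


def first_subfield(field_value, code):
    """Return the value of the first subfield with the given code, or None."""
    _, subfields = parse_subfields(field_value)
    for sub_code, value in subfields:
        if sub_code == code:
            return value
    return None


# Field values copied from the record on the slide.
field_500 = r"11$aDe re rustica / Varro, Marcus Terentius$9855"
field_700 = r"\1$aVarro,$bMarcus Terentius$f<116-27 a.C.>$069518"

title_id = first_subfield(field_500, "9")
person_id = first_subfield(field_700, "0")
print(title_id, person_id)
```

The sketch assumes that "$" never occurs inside a subfield value, which holds for the records shown here.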
How to enrich the Manifestations with the Title of the Work
Servomechanism practice in Università Parthenope => Uniform title in tag 500
=LDR  00673nam0 2200217 450
=001  000021998
=005  20090123105405.0
=100  \\$a20090123d1954----km-y0itay50------ba
=101  0\$aeng
=102  \\$aGB
=105  \\$aa-------001yy
=200  1\$aServomechanism practice$fWilliam R. Ahrendt
=210  \\$aLondon [etc.]$cMcGraw-Hill$dc1954
=215  \\$aVII, 349 p.$cill.$d24 cm
=500  10$aServomechanism practice$912013
=610  1\$aServomeccanismi
=676  \\$a629.8$v21$9Ingegneria dei controlli automatici
=700  \1$aAhrendt,$bWilliam Robert$030
=801  \0$aIT$bUNIPARTHENOPE$c20090123$gRICA$2UNIMARC
=951  \\$aS 629.8/6$bS A, 252$cDSA$d2009
=997  \\$aUNIPARTHENOPE
Servomechanism practice in Università Federico II => Uniform title added in tag 995
=LDR  00652nam0 22002291i 450
=001  000000022
=005  20030908103336.0
=100  \\$a20020821d--------km-y0itay50------ba
=101  0\$aita
=105  \\$ay-------001yy
=200  1\$aServomechanism practice$fW.R. Ahrendt, C.J. Savant
=205  \\$a2nd ed.
=210  \\$aNew York$cMcGraw-Hill Book Company$d1960
=215  \\$aXV, 566 p.$cill.$d24 cm
=610  0\$aServomeccanismi
=676  \\$a629.8
=700  \1$aAhrendt,$bW. R.$030
=702  \1$aSavant,$bClement J.$c<jr.>
=801  \0$aIT$bUNINA$gRICA$2UNIMARC
=995  \\$aServomechanism practice$912013
=997  \\$aUNINA
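The two Servomechanism practice records suggest a simple enrichment rule, sketched below under a stated assumption: a record that already carries a uniform title in tag 500 is left alone, and otherwise the Work title and its cluster ID are added in tag 995. The slides show only the two outcomes, not the rule itself, and the function name `add_uniform_title` and the list-of-tuples record shape are hypothetical.

```python
def add_uniform_title(record, title, title_id):
    """Return a copy of the record with a 995 field added when no 500 is present.

    `record` is a list of (tag, value) pairs in tag order.
    """
    if any(tag == "500" for tag, _ in record):
        return list(record)  # uniform title already present, nothing to add
    # Blank indicators are written as two backslashes, as in the records above.
    new_field = ("995", rf"\\$a{title}$9{title_id}")
    enriched = list(record)
    for position, (tag, _) in enumerate(enriched):
        # Keep numeric tag order; the leader ("LDR") is skipped.
        if tag.isdigit() and tag > "995":
            enriched.insert(position, new_field)
            return enriched
    enriched.append(new_field)
    return enriched


# Abridged version of the Federico II record.
record = [
    ("LDR", "00652nam0 22002291i 450"),
    ("200", r"1\$aServomechanism practice$fW.R. Ahrendt, C.J. Savant"),
    ("997", r"\\$aUNINA"),
]
enriched = add_uniform_title(record, "Servomechanism practice", "12013")
print(enriched[-2])
```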
VIAF APIs to retrieve data
An important step of the project is retrieving data from VIAF: using its
APIs, we retrieve the information for the Person/Family/Corporate body and
Work/Expression layers.
For each VIAF ID we retrieve the following information:
a) Variant forms of name for each ID: they allow end users and the
search engine to extend searches to all possible forms of a name, in
different text strings and alphabets.
b) Works associated with the VIAF ID of a Person: used to combine a
Person with his or her Works, keeping (filtering) only the Work titles for
which at least one Manifestation exists in the original catalogues.
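The filtering in point b) can be sketched as a membership test of each Work against the title cluster IDs actually found in the catalogues. The Work titles and IDs below are those shown for Varro; which of them occur in Manifestations is invented for illustration, and no real VIAF call is made.

```python
def works_with_manifestations(works, catalogue_title_ids):
    """Keep only the Works whose title cluster ID occurs in at least one Manifestation."""
    return [work for work in works if work["cluster_id"] in catalogue_title_ids]


# Works associated with the Person, as they might be assembled from VIAF data.
varro_works = [
    {"title": "De lingua latina", "cluster_id": 986},
    {"title": "Saturae menippeae", "cluster_id": 1314},
    {"title": "Antiquitates", "cluster_id": 2137},
    {"title": "De re rustica", "cluster_id": 855},
]

# Assumption for illustration: title IDs found in $9 of the catalogue records.
ids_in_catalogues = {855, 986}

kept = works_with_manifestations(varro_works, ids_in_catalogues)
print([work["title"] for work in kept])
```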
The conversion process from MARC to RDF
[Diagram (© 2015 Aliada Consortium), labels: resources; metadata creators (librarians, curators); Library Management System (ILS); Museum Collection Management System (MMS); Content Management System (CMS); ALIADA; Linked Data Cloud (http://lod-cloud.net/); browsers (Google); IT companies; other public and cultural institutions.]
Data conversion with ALIADA
[Diagram, labels: user interface; validation of input data; conversion through RDF-izers (MARCXML2RDF, DublinCore2RDF translation, LIDO2RDF, other RDF-izer) using the ALIADA ontology, with RDF output; RDF Triple Store endpoint; linking (links discovery); publication (Linked Data Server, creation of CKAN DataHub page); validation; Linked Dataset.]
Ontologies used in the project
Other ontologies added to the ALIADA ontologies:
• DCMI Metadata Terms
• RDF Schema
• RDA elements
• BIBFRAME
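To make the target of the conversion concrete, the sketch below emits a few N-Triples for one Manifestation using DCMI Metadata Terms. It is not the ALIADA mapping: the base URI `http://example.org/`, the URI patterns and the choice of `dcterms:isVersionOf` to link a Manifestation to its Work are all assumptions made for illustration.

```python
BASE = "http://example.org/"  # hypothetical base URI
DCTERMS = "http://purl.org/dc/terms/"


def literal(text):
    """Quote a string as an N-Triples literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def manifestation_triples(record_id, title, person_id, work_id):
    """Build N-Triples lines linking a Manifestation to its title, Person and Work."""
    subject = f"<{BASE}manifestation/{record_id}>"
    return [
        f"{subject} <{DCTERMS}title> {literal(title)} .",
        f"{subject} <{DCTERMS}creator> <{BASE}person/{person_id}> .",
        f"{subject} <{DCTERMS}isVersionOf> <{BASE}work/{work_id}> .",
    ]


# Values taken from the Économie rurale record: 001, 200 $a, 700 $0, 500 $9.
triples = manifestation_triples("000005093", "Économie rurale", "69518", "855")
print("\n".join(triples))
```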
The final result of conversion:
- RDF dataset for Persons and Works
- RDF dataset for Manifestations
[Diagram: data processing in RDF, with the Search Engine, RDF Store, LOD Cloud and Linked Open Services Platform.]
Where we are (with the project)…
Task                                               Status and Date
Definition of project requirements                 [closed]
Analysis of bibliographic and authority data       [closed]
Data modelling (definition of ontologies
and mapping)                                       [closed]
Conversion of data to RDF                          [in progress] - by 30.11.2015
Definition of a portal to use as prototype         [in progress] - by 31.10.2015
Meeting with librarians to better define
portal search and discovery functions              by 30.11.2015
Implementation activities                          by 31.12.2015
Go live                                            January 2016
The project as a Union catalogue model
The Union catalogue of Campania and Basilicata Universities
in linked open data
The core for a nationwide Union Catalogue project
Union catalogue - Hypothesis of a project
1° Physical union catalogue: one LMS, one catalogue, one OPAC => Trento
Union Catalogue model
2° Central database (hub) with download of physical data (shared data, in
different original formats)
3° A network of libraries grouped in local hubs (each local hub served by
different LMSs), linked to a Central Index system, the core of the network
(local hubs are connected to the Central Index via API)
4° Union catalogue in RDF (linked open data) plus a complex API layer to
connect the different ILSs
[Diagram: RDF Union Catalogue, with RDF Store, API layer and LOD Cloud.]
www.atcult.it
Thank you
Tiziana Possemato – @CULT
tiziana.possemato@atcult.it
Mobile: 3485810489