Digital Library Management

Semantic Web and Digital Library
Management
using Fedora-Commons
Ludovic Deravet
Software Architect @ I.R.I.S. S&E
PART 1:
INTRODUCTION
Evolution of the Web
[Figure: timeline of web technologies by era]
• Eras: PC era (1980-1990), Web 1.0 (1990-2000), Web 2.0 (2000-2010), Web 3.0 (2010-2020)
• Technologies shown: File Systems, File Servers, FTP, Email, Databases, SQL, SGML, Groupware, Windows, MacOS, Desktop, WWW, HTTP, HTML, Websites, Keyword Search, Java, Flash, XML, RSS, Wikis, Weblogs, Social Networking, Lightweight collaboration, SaaS, Semantic Web, RDF, OWL, SPARQL, Semantic Search, Semantic databases, Distributed Search, Intelligent personal agents, WebOS
Digital Library Management
• Bulk load
• Cataloguing
• Editing
• Storing
• Searching
Managing and Searching Information
• Synonyms (e.g. Intelligent, Brainy, Smart, Clever)
• Homonyms (e.g. Type, Miss, Lie, Blue)
• Languages
• Parametric Search (too many or too few search results)
• Time Spent
• Reusability
• Complexity
What is the Semantic Web?
Based on simple ideas
• Information is Unambiguous
• Data will become Findable
• Data will be Reusable
• Data will be Interoperable
• Systems will be Flexible
• Real time Information
Semantic Web Foundations
• URIs: http://www.irislink.com/#company
• Triples: I.R.I.S. – experts – D.M.
• Model and Technologies: RDF triples
• Data Exchange Formats:
<rdf:RDF … xmlns:contact="http://.../contact#">
  <contact:Person rdf:about="http://.../contact#me">
    <contact:fullName>…</contact:fullName>
    <contact:mailBox rdf:resource="mailto:xxx@yyy"/>
  </contact:Person>
</rdf:RDF>
• Notations: RDFS, OWL
• SPARQL:
SELECT ?subject ?label
WHERE {
  ?subject rdfs:subClassOf ?object .
  OPTIONAL {
    ?subject rdfs:label ?label
  }
}
What does it look like? (example)
Fedora-Commons Features
• Manage API (REST, SOAP)
• Access API (REST, SOAP)
• Default Search (REST)
• RDF Search (REST)
• OAI Provider
Fedora Repository Modules
• Dissemination
• Validation
• Security
• Resource Index
• Storage
• Management
• Registry
• CMA
• Back ends: Files, RDBMS, RDF
What can you do with Fedora-Commons?
Full control of your content
• Store whatever you want
• Provide easy access to your content
• Express relationships
• Enable permanence of your content
• Incorporate extensible components
• Scale your project up and down
How can we help you?
I.R.I.S. S&E – International Organisations
• Ready for the future
• On top of Technologies
• Expertise & Consulting
• Experience
• Strong Partnerships
Questions?
PART 2:
ADVANCED
What topics?
• Digital Library Management
• Semantic Web
• Fedora-Commons
DIGITAL LIBRARY
What is Digital Library Management?
A solution to meet the needs for:
– Bulk load of digital assets
– Cataloguing
– Editing
– Storing
– Searching
Evolution of the Web
[Figure: timeline of web technologies by era]
• Eras: PC era (1980-1990), Web 1.0 (1990-2000), Web 2.0 (2000-2010), Web 3.0 (2010-2020)
• Technologies shown: File Systems, File Servers, FTP, Email, Databases, SQL, SGML, Groupware, Windows, MacOS, Desktop, WWW, HTTP, HTML, Websites, Keyword Search, Java, Flash, XML, RSS, Wikis, Weblogs, Social Networking, Lightweight collaboration, SaaS, Semantic Web, RDF, OWL, SPARQL, Semantic Search, Semantic databases, Distributed Search, Intelligent personal agents, WebOS
Problem – Searching and Managing Information
• Synonyms
– have a different spelling but the same (or nearly the same) meaning
• Homonyms
– sound alike but have different meanings
– most of the time, they have a different spelling
• Languages
– might require a lot of maintenance
– not always the same level of quality in each language
• Parametric Search
– It’s difficult to find things, especially something specific
– Too few parameters = too many search results
– Too many parameters = no search results
Problem – Searching and Managing Information (cont’d)
• Time spent
– users spend too much time searching for what they need
• Data reusability
– Limited ability to reuse data
• Managing the information is complex
– Within the same company, each department often manages its own information
– Each department might have its own way of solving the problem
– Departments try to use technologies (e.g. MDM) to solve the original problem
– A high volume of information requires human management of the information
– Using hierarchical solutions by classifying information
– Using horizontal solutions with tags
SEMANTIC WEB
What is the Semantic Web?
The idea behind it is “quite” simple:
– electronic information will become unambiguous
– data will become findable
– data will be reusable
– data will be interoperable
– systems will be flexible
– information will be available in real time
Foundations of Semantic Web
• URIs for everything
• Triples: <subject> <predicate> <object>
• Models and technologies (e.g. RDF)
• Data exchange formats (e.g. RDF/XML, N-Triples)
• Notations (e.g. RDFS, OWL)
• SPARQL
Foundations of Semantic Web (example)
Statement: Albert is the father of Philippe
• SUBJECT (Albert): http://www.belgium.be/person/albert/profile.html
• PREDICATE (is the father of): http://www.belgium.be/rdf/relationship#fatherof
• OBJECT (Philippe): http://www.belgium.be/person/philippe/profile.html
In RDF notation:
<rdf:RDF xmlns:be="http://www.belgium.be/rdf/relationship#">
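The subject/predicate/object statement above can be sketched in a few lines of Python. This is only an illustration: the `Triple` type and `to_ntriples` helper are hypothetical names, and the person URIs are reassembled from the line-wrapped slide text (the slash after `person` is an assumption).

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement made of three URIs."""
    subject: str
    predicate: str
    obj: str


# URIs reassembled from the slide; the "/" after "person" is an assumption.
ALBERT = "http://www.belgium.be/person/albert/profile.html"
PHILIPPE = "http://www.belgium.be/person/philippe/profile.html"
FATHER_OF = "http://www.belgium.be/rdf/relationship#fatherof"


def to_ntriples(t: Triple) -> str:
    """Serialize a URI-only triple as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


statement = Triple(ALBERT, FATHER_OF, PHILIPPE)
print(to_ntriples(statement))
```

N-Triples is used here because it maps one-to-one onto the subject/predicate/object reading; RDF/XML carries the same statement in a nested form.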
Foundations of Semantic Web (example)
RDFS
be:King rdfs:subClassOf be:Person
be:Prince rdfs:subClassOf be:Person
dc:subject rdf:type rdf:Property
SPARQL
PREFIX be: <http://www.belgium.be/ontology#>
SELECT ?firstname ?lastname
WHERE {
  ?person a be:Person .
  ?person be:firstname ?firstname .
  ?person be:lastname ?lastname .
}
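To show how the RDFS statements and the SELECT query fit together, here is a minimal in-memory sketch. The instance data (`be:albert`, `be:philippe`, the last name "Example") is hypothetical, and the helper functions are illustrative. The sketch applies one level of subclass inference itself, which a plain SPARQL endpoint would only do if RDFS inference is enabled.

```python
# Hypothetical data: class hierarchy from the slide plus made-up instances.
TRIPLES = {
    ("be:King", "rdfs:subClassOf", "be:Person"),
    ("be:Prince", "rdfs:subClassOf", "be:Person"),
    ("be:albert", "rdf:type", "be:King"),
    ("be:albert", "be:firstname", "Albert"),
    ("be:albert", "be:lastname", "Example"),
    ("be:philippe", "rdf:type", "be:Prince"),
    ("be:philippe", "be:firstname", "Philippe"),
    ("be:philippe", "be:lastname", "Example"),
}


def instances_of(cls, triples):
    """Resources typed as cls or as one of its direct subclasses."""
    classes = {cls} | {
        s for s, p, o in triples if p == "rdfs:subClassOf" and o == cls
    }
    return {s for s, p, o in triples if p == "rdf:type" and o in classes}


def value(subject, predicate, triples):
    """First object found for (subject, predicate), or None."""
    return next(
        (o for s, p, o in triples if s == subject and p == predicate), None
    )


def select_names(triples):
    """Rough equivalent of: SELECT ?firstname ?lastname for every be:Person."""
    rows = []
    for person in sorted(instances_of("be:Person", triples)):
        first = value(person, "be:firstname", triples)
        last = value(person, "be:lastname", triples)
        if first is not None and last is not None:
            rows.append((first, last))
    return rows


print(select_names(TRIPLES))
```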
What does it look like? (example)
FEDORA-COMMONS
OVERVIEW
What is Fedora-Commons?
Open Source Framework to Manage Digital Content
• Documents
• Images
• Video
• Audio
Long-term preservation
• Of Files
• Of Metadata
Based on Standards
• Dublin Core
• Metadata Encoding and Transmission Standard (METS)
• Resource Description Framework (RDF)
•…
What is Fedora-Commons?
Service-Oriented
• Not a monolithic application
• Modularity and Extensibility
• Simple Integration (web interface)
Very Large Repository
• Scalable to Millions of objects
• Performance
FEDORA-COMMONS
IN DETAIL
Fedora-Commons Features
• Manage API (REST, SOAP)
• Access API (REST, SOAP)
• Default Search (REST)
• RDF Search (REST)
• OAI Provider
Fedora Repository Modules
• Dissemination
• Validation
• Security
• Resource Index
• Storage
• Management
• Registry
• CMA
• Back ends: Files, RDBMS, RDF
CMA – Content Model Architecture
• Data (Digital Object) – fedora-model:hasModel → Content Model
• Content Model – fedora-model:hasService → Service Definition
• Service Deployment – fedora-model:isDeploymentOf → Service Definition
• Service Deployment – fedora-model:isContractorOf → Content Model
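The four CMA relationships can be followed mechanically to find which service deployments apply to a data object. The sketch below assumes the relationship directions commonly documented for the Content Model Architecture; the `demo:*` identifiers and helper names are hypothetical.

```python
# Hypothetical relationship triples; only the four predicates come from the slide.
RELATIONS = [
    ("demo:object1", "fedora-model:hasModel", "demo:cmodel"),
    ("demo:cmodel", "fedora-model:hasService", "demo:sdef"),
    ("demo:sdep", "fedora-model:isDeploymentOf", "demo:sdef"),
    ("demo:sdep", "fedora-model:isContractorOf", "demo:cmodel"),
]


def objects_of(subject, predicate):
    """All objects o such that (subject, predicate, o) is asserted."""
    return [o for s, p, o in RELATIONS if s == subject and p == predicate]


def subjects_of(predicate, obj):
    """All subjects s such that (s, predicate, obj) is asserted."""
    return [s for s, p, o in RELATIONS if p == predicate and o == obj]


def deployments_for(pid):
    """Return (service definition, service deployment) pairs usable by pid."""
    found = []
    for cmodel in objects_of(pid, "fedora-model:hasModel"):
        for sdef in objects_of(cmodel, "fedora-model:hasService"):
            for sdep in subjects_of("fedora-model:isDeploymentOf", sdef):
                # A deployment only applies if it is a contractor of this model.
                if cmodel in objects_of(sdep, "fedora-model:isContractorOf"):
                    found.append((sdef, sdep))
    return found


print(deployments_for("demo:object1"))
```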
Digital Object
• PID: unique identifier
• Object Properties: State (Active, Inactive, Deleted), Label, Owner, Creation Date, Modification Date
• Reserved Datastreams: DC (Dublin Core), RELS-EXT, RELS-INT
• Custom Datastreams: Datastream 1, …, Datastream n
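As a rough mental model of the structure above, a digital object can be pictured as a record with a PID, a few properties, and a set of datastreams. The classes below are an illustrative sketch, not Fedora's actual object model; all class and field names are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Dict, List


class State(Enum):
    ACTIVE = "Active"
    INACTIVE = "Inactive"
    DELETED = "Deleted"


# Datastream IDs the slide lists as reserved.
RESERVED_IDS = {"DC", "RELS-EXT", "RELS-INT"}


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Datastream:
    ds_id: str
    mime_type: str
    content: bytes = b""


@dataclass
class DigitalObject:
    pid: str
    label: str
    owner: str
    state: State = State.ACTIVE
    created: datetime = field(default_factory=_now)
    modified: datetime = field(default_factory=_now)
    datastreams: Dict[str, Datastream] = field(default_factory=dict)

    def add_datastream(self, ds: Datastream) -> None:
        """Attach or replace a datastream and bump the modification date."""
        self.datastreams[ds.ds_id] = ds
        self.modified = _now()

    def custom_datastreams(self) -> List[Datastream]:
        """Datastreams that are not DC, RELS-EXT or RELS-INT."""
        return [d for k, d in self.datastreams.items() if k not in RESERVED_IDS]


obj = DigitalObject(pid="demo:1", label="Demo object", owner="demo-owner")
obj.add_datastream(Datastream("DC", "text/xml"))
obj.add_datastream(Datastream("THUMBNAIL", "image/jpeg"))
print([d.ds_id for d in obj.custom_datastreams()])
```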
Digital Objects Relationships - Example
[Figure: graph of related digital objects]
• Nodes: Document Server, I.R.I.S. Group, IRIS Corporate, iHQC, Operating System, Windows, Address, Rights, Compression, Documents
• Relationships: dc:title, ns:hasName, ns:hasLogo, ns:hasAddress, ns:hasPhotoLocation, ns:hasLicense, ns:hasText, ns:isRunningOn, ns:supportFormats, ns:hasCompression
Prefix   Namespace URI        Description
dps      http://www.dps.org   Document Processing System terms
Dissemination (Example)
Digital object (archive notice, XML)
• Title: The ‘Great Migrations’
• Owner: NGC
• Date: 06/11/2010
• Datastreams: THUMBNAIL, VIDEO (high-speed video streaming platform)
Steps
1) Request: http://website/pid/pdf
2) Calls the transformation service (WSDL) with PID and format
3) Returns the PDF representation (dissemination) of the requested resource
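The three dissemination steps can be sketched as a small dispatcher: parse the PID and format from the request path, look up a transformation service for that format, and return its output. Everything here is a stand-in: the in-memory `NOTICES` store, the `xml_to_pdf` stub and the `/pid/format` path layout are assumptions made for illustration, not Fedora's implementation.

```python
from typing import Callable, Dict, Tuple

# Hypothetical repository content: PID -> archived XML notice.
NOTICES: Dict[str, str] = {
    "demo:1": "<notice><title>The 'Great Migrations'</title></notice>",
}


def xml_to_pdf(xml: str) -> bytes:
    """Stand-in for the transformation service; returns placeholder bytes."""
    return b"%PDF-stub " + xml.encode("utf-8")


# Registry of available dissemination formats.
SERVICES: Dict[str, Callable[[str], bytes]] = {"pdf": xml_to_pdf}


def parse_request(path: str) -> Tuple[str, str]:
    """Step 1: split '/<pid>/<format>' into (pid, format)."""
    pid, fmt = path.strip("/").split("/")
    return pid, fmt


def disseminate(path: str) -> bytes:
    """Steps 2 and 3: call the service for the format and return its result."""
    pid, fmt = parse_request(path)
    service = SERVICES[fmt]
    return service(NOTICES[pid])


print(disseminate("/demo:1/pdf")[:9])
```

Keeping the services in a registry keyed by format mirrors the idea that new representations can be added without touching the stored object.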
Stores
Fedora Repository Modules: Storage
Low Level Storage (LLS) Interface
• Akubra
• Default Store (File-System)
• S3 LLS (Amazon)
• iRODS LLS
• SRB LLS
• Sun Honeycomb LLS (StorageTek 5800)
Amazon S3
• Scalable (no limitation of files)
• Reliable (SLA 99.99%)
• Cost Management (pay for what you use)
iRODS
• iRODS is handling the digital objects; Fedora-Commons is handling the metadata / management
• Distributed Management System
• Manages datasets stored in a wide range of data stores (file-system, network, databases…)
• Large datasets
Sun StorageTek 5800
• Distributed Management Storage System
• No file-system limitation
• Stores can be located at different places (geographically)
Resource Index
Fedora Repository Modules: RI
• Query languages: iTQL, SPARQL
• Triple store: Mulgara
Resource Index (RI) - Example
[Figure: graph of a video library]
• L1 (Library)
• C1 (Category): ns:type Science
• V1 (Video): dc:title Stephen Hawking’s Universe, dc:language English, dc:description Explores the greatest mysteries of the cosmos., dc:author Stephen Hawking
• E1 (Episode): dc:title Aliens, ns:format Blue-Ray
• E2 (Episode): dc:title Time Travel, ns:format Blue-Ray
• E3 (Episode): dc:title The Story of Everything, ns:format Blue-Ray
• Relationships between nodes: ns:isMemberOf, ns:isCategoryOf, ns:isCollectionOf (one per episode)
Resource Index (cont’d) - Triples
Subject              Predicate                      Object
<S.H.’s Universe>    <is a member of collection>    <Science Library Videos>
<Episode 1>          <is a member of collection>    <S.H.’s Universe>
<Episode 2>          <is a member of collection>    <S.H.’s Universe>
<Episode 3>          <is a member of collection>    <S.H.’s Universe>
<Episode 1>          <has a title>                  <Aliens>
<Episode 2>          <has a title>                  <Time Travel>
<Episode 3>          <has a title>                  <The Story of Everything>
<Episode 1>          <has format>                   <Blue-Ray>
<Episode 2>          <has format>                   <Blue-Ray>
<Episode 3>          <has format>                   <Blue-Ray>
….                   ….                             ….
Resource Index (cont’d) - Queries
Query 1:
select $video
from <#ri>
where $video <fedora-model:hasModel> <info:fedora:Video>
Result(s):
"video"
info:fedora/o:V1
Query 2:
select $video, $episode, $title
from <#ri>
where $video <dc:title> $title
and $video <dc:creator> 'Stephen Hawking'
and $episode <ns:format> 'Blue-Ray'
Result(s):
"video", "episode", "title"
info:fedora/o:V1, info:fedora/o:E1, Aliens
info:fedora/o:V1, info:fedora/o:E2, Time Travel
info:fedora/o:V1, info:fedora/o:E3, The Story of Everything
iTQL queries: http://docs.mulgara.org/itqlcommands/index.html
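A conjunctive query over the Resource Index boils down to matching several triple patterns that share variables. The sketch below illustrates that idea on the triples from the table; it is not how Mulgara evaluates iTQL, and the short predicate names (`isMemberOf`, `title`, `format`) and the `match` helper are simplifications chosen for the example. Unlike the second query on the slide, the episodes are joined to the video explicitly through the membership triple.

```python
# Triples from the table, with shortened (hypothetical) identifiers.
TRIPLES = [
    ("V1", "isMemberOf", "L1"),
    ("E1", "isMemberOf", "V1"),
    ("E2", "isMemberOf", "V1"),
    ("E3", "isMemberOf", "V1"),
    ("E1", "title", "Aliens"),
    ("E2", "title", "Time Travel"),
    ("E3", "title", "The Story of Everything"),
    ("E1", "format", "Blue-Ray"),
    ("E2", "format", "Blue-Ray"),
    ("E3", "format", "Blue-Ray"),
]


def match(patterns, triples):
    """Return all variable bindings that satisfy every pattern.

    Terms starting with "$" are variables; other terms must match exactly.
    """
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in triples:
                candidate = dict(binding)
                ok = True
                for term, val in zip(pattern, triple):
                    if term.startswith("$"):
                        # Bind the variable, or check it against an earlier binding.
                        if candidate.setdefault(term, val) != val:
                            ok = False
                            break
                    elif term != val:
                        ok = False
                        break
                if ok:
                    extended.append(candidate)
        solutions = extended
    return solutions


query = [
    ("$episode", "isMemberOf", "V1"),
    ("$episode", "title", "$title"),
    ("$episode", "format", "Blue-Ray"),
]
rows = [(b["$episode"], b["$title"]) for b in match(query, TRIPLES)]
print(rows)
```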
Validation
• Applied when managing digital objects in the following formats:
– foxml 1.0
– foxml 1.1
– mets 1.0
– mets 1.1
– atom
• Uses Schematron
– rule-based validation language
– structural language expressed in XML
<sch:pattern name="Preliminary Object Checks" id="preliminary">
  <sch:rule context="foxml:datastream[@ID='AUDIT']">
    <sch:assert test="count(foxml:datastreamVersion) = 1">The AUDIT Datastream can only have ONE version since it is a non-versionable datastream. (foxml:datastreamVersion)</sch:assert>
  </sch:rule>
</sch:pattern>
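The Schematron rule above asserts that the AUDIT datastream has exactly one version. The same check can be approximated with Python's standard XML library, as sketched below. This does not run Schematron; it only re-implements that single assertion. The sample document is hypothetical, and the FOXML namespace URI is assumed to be `info:fedora/fedora-system:def/foxml#`.

```python
import xml.etree.ElementTree as ET

# Assumed FOXML namespace URI.
FOXML_NS = "info:fedora/fedora-system:def/foxml#"
NS = {"foxml": FOXML_NS}


def audit_errors(foxml_text: str) -> list:
    """Report AUDIT datastreams that do not have exactly one version."""
    root = ET.fromstring(foxml_text)
    errors = []
    for ds in root.findall(".//foxml:datastream[@ID='AUDIT']", NS):
        versions = ds.findall("foxml:datastreamVersion", NS)
        if len(versions) != 1:
            errors.append(
                f"AUDIT datastream has {len(versions)} versions, expected 1"
            )
    return errors


# Hypothetical object with two AUDIT versions, which the rule rejects.
INVALID = f"""<foxml:digitalObject xmlns:foxml="{FOXML_NS}" PID="demo:1">
  <foxml:datastream ID="AUDIT">
    <foxml:datastreamVersion ID="AUDIT.0"/>
    <foxml:datastreamVersion ID="AUDIT.1"/>
  </foxml:datastream>
</foxml:digitalObject>"""

VALID = f"""<foxml:digitalObject xmlns:foxml="{FOXML_NS}" PID="demo:2">
  <foxml:datastream ID="AUDIT">
    <foxml:datastreamVersion ID="AUDIT.0"/>
  </foxml:datastream>
</foxml:digitalObject>"""

print(audit_errors(INVALID))
print(audit_errors(VALID))
```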
Security
• Legacy Authentication and Authorization
– Authorization: XACML policies (using Sun's XACML implementation)
– Authentication: using server filters
• FeSL (Fedora Enhanced Security Layer)
– will replace the legacy authentication and authorization in a future release of Fedora-Commons
– based on JAAS (Java Authentication and Authorization Service)
Management
• Primary APIs
– REST API (HTTP)
– API-A and API-M (SOAP)
• Secondary APIs
– Resource Index with iTQL and SPARQL (HTTP)
– OAI-PMH for metadata harvesting across repositories (HTTP)
• Third-Party APIs
– MediaShelf Java client API
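Because the REST API and the Resource Index search are both plain HTTP, client code mostly consists of building URLs. The sketch below only constructs the URLs and sends nothing. The base URL is hypothetical, and the path and parameter names (`/objects/{pid}/datastreams/{dsID}/content`, `/risearch` with `type`, `lang`, `format`, `query`) follow the Fedora 3.x conventions as an assumption that should be checked against the documentation of the installed release.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL of a local repository.
BASE = "http://localhost:8080/fedora"


def datastream_content_url(pid: str, ds_id: str, base: str = BASE) -> str:
    """URL for fetching the content of one datastream (assumed REST layout)."""
    return f"{base}/objects/{quote(pid, safe=':')}/datastreams/{quote(ds_id)}/content"


def risearch_url(query: str, lang: str = "itql", base: str = BASE) -> str:
    """URL for a tuple query against the Resource Index (assumed parameters)."""
    params = {"type": "tuples", "lang": lang, "format": "CSV", "query": query}
    return f"{base}/risearch?{urlencode(params)}"


print(datastream_content_url("demo:1", "DC"))
print(risearch_url("select $video from <#ri> where $video <dc:title> $title"))
```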
WHO’S GONE FEDORA-COMMONS AND USER COMMUNITY
User Community
ActiveFedora
Built on RubyFedora, this Ruby gem provides an Active Record-oriented way of working with objects in Fedora
django-fedora
A Python Django web UI for Fedora.
Djatoka Integration
A sample content model, service definition, and service deployment object demonstrating
how to integrate Fedora with the Djatoka JPEG2000 service.
DSpace2 Storage-Fedora
A Google Summer of Code 2009 project to persist DSpace 2 entities in Fedora
Enhanced Content Models
Extends Fedora's basic content models to add xml schema restrictions for datastreams and
ontology information, allowing restrictions to be expressed on relationships, in addition to
other features
eSciDoc
An eResearch environment developed specifically for use by scientific and scholarly
communities.
EZService
A utility to simplify the creation of Fedora Service Definition and Deployment objects.
Fedora-Planets integration
Provides a simple way to add Planets (http://planets-project.eu) preservation services as disseminators on Fedora objects
fedora_rest
A Drupal module for building custom interfaces to Fedora Commons repositories.
FESL
Fedora Enhanced Security Layer is a community-driven project to refactor Fedora's
Authentication and Authorization functionality.
funAPI
A Java web application that provides an unAPI implementation for Fedora
Hydra
Will provide a "Lego Set" of web services and templates that can be used for a wide range of
content management workflows.
Honeycomb Storage Plugin
Allows for the use of the Sun StorageTek 5800 as the underlying storage for Fedora.
iRODS Storage Plugin
Allows for the use of iRODS as the underlying storage for Fedora.
Islandora
A Drupal module that allows users to view and manage objects stored in Fedora
JCR Connect Adapter
A JCR adapter for Fedora, implemented as a Jackrabbit persistence manager, that translates all
node/property storing and loading requests to Fedora API calls
JyFedoREST
A Jython package for creating and managing objects in a Fedora Repository via the REST API
Muradora
A web front-end for Fedora focusing on flexible access control
oreprovider
An OAI-ORE provider that provides Resource Maps for Fedora objects, using the Resource
Index.
PyFedora
A Python library for interfacing with Fedora's REST API
PyFedoREST
A Python package for creating and managing objects in a Fedora Repository via the REST API
pypi-fcrepo
A Python module for working with Fedora repositories through the REST API.
python-fedoracommons
Python libraries for interfacing with Fedora's API-A, API-M, and RISearch interfaces.
python-fedoracommons-webarchive
A web interface providing search and browse for a FedoraCommons and Solr powered
archive
RODA
An OAIS-compliant, service-oriented digital repository system designed to preserve
government authentic digital objects
RubyFedora
A Ruby gem for creating and managing objects in Fedora.
SWORD-Fedora
Provides a SWORD 1.3 deposit interface for Fedora.
The Fascinator
A front end to a Fedora Commons repository that uses Solr to handle all browsing, searching, and security.
Who’s gone Fedora-Commons?
• Encyclopedia of Chicago
• National Science Digital Library (NSDL)
• New York Public Library (NYPL)
• Bibliothèque nationale de France (BnF)
• The Public Library of Science (PLoS One)
• University of Prince Edward Island
Who’s gone Fedora-Commons?
Broadcasting and Media (1)
• WGBH
Consortia (8)
• ARROW Project
• ASSESS Project, Australian National University Supercomputer Facility
• Colorado Alliance of Research Libraries
• DANS
• National Institute for Technology and Liberal Education (NITLE)
• OhioLINK Digital Resource Commons
• Open Access Repositories in New Zealand
• Phaidra, University of Vienna
Corporations (16)
• 4TIC S.L.
• Acuity Unlimited
• Aptivate
• Atos Origin
• Curalia, AB
• Docuteam
• FIZ Karlsruhe
• Func. Internet Integration
• Harris Corporation
• Inter-Fermadof (Nigeria), LTD.
• MediaShelf, LLC
• Octagon Data Systems
• Sun Microsystems - Honeycomb Group
• Swiss Education and Research Network (SWITCH)
• Trifork A/S
• VTLS, Inc.
• WebOPAC Application Pvt. Ltd.
Government Agencies (8)
• U.S. Centers for Disease Control and Prevention
• Danish National IT and Telecom Agency
• Entidad Publica Empresarial Red.es
• The Food and Agriculture Organization of the United Nations
• Idaho National Laboratory
• Kennisnet Ict op school
• NASA Goddard Space Flight Center Library
• National Library of Medicine (USA)
Medical Centers and Libraries (4)
• Cornell University - College of Veterinary Medicine
• Duke University - Medical Center Archives*
• Memorial Sloan-Kettering Cancer Center - Department of Surgery & Department of Public Affairs
• Virginia College of Osteopathic Medicine
IT-Related Institutions (10)
• Catholic University of Louvain
• Centro Tecnológico INTECCA (Innovación y Desarrollo Tecnológico de los Centros Asociados), UNED
• Cornell University - Cornell Information Technologies
• Macquarie University, E-Learning Center of Excellence
• Northwestern University - Academic Technologies
• Purdue University - Information Technology
• University of North Florida
• University of Queensland - Information Technology Services
• University of San Diego
• University of Sydney
• University of Virginia - Information Technology and Communications
National/Public Libraries and Archives (16)
• Alberta Library (TAL)
• Boston Public Library
• e-SpacioUNED
• Library of Congress
• National Library of Australia*
• National Library of Estonia
• National Library of France / Bibliothèque nationale de France (BnF)
• National Library of Portugal
• National Library of Scotland
• National Library of Singapore*
• National Library of Slovakia*
• National Library of Sweden
• National Library of Wales / Llyfrgell Genedlaethol Cymru*
• New York Public Library
• Royal Netherlands Academy of Arts and Sciences
• The State and University Library of Denmark
Professional Societies (2)
• American Geophysical Union
• Athens Archaeological Society*
Publishing (5)
• CiteSeer
• Digital Peer Publishing (DiPP)
• Digital Publishing System (DPubS)
• DiVA, Uppsala University Library
• Public Library of Science (PLoS)
Semantic and Virtual Library Projects (6)
• Biodiversity Heritage Library
• Carnegie Foundation for the Advancement of Teaching Knowledge Media Laboratory
• Encyclopedia of Chicago
• Encyclopedia of Life
• National Science Digital Library (NSDL)
• Open Learning Exchange Nepal
Research Groups and Projects (20)
• Alfred Wegener Institute for Polar and Marine Research
• Berlin Brandenburg Academy of Sciences and Humanities
• California State University Los Angeles - CoolStateLA Enterprise System
• Centre de Calcul de L'Institut National de Physique Nucleaire et de Physique des Particules
• Columbia University - Center for International Earth Science Information Network*
• DART Project
• Electronic Text Center, University of New Brunswick
• eSciDoc Project - Max Planck Society and FIZ Karlsruhe
• Interuniversity Consortium for Political and Social Research (ICPSR)
• Kings College London - Center for e-Research
• Kuwait Institute for Scientific Research*
• Oxford University - Refugee Studies Center
• RAMP Project
• Royal Irish Academy, Digital Humanities Observatory
• Semantic Technologies for the Enhancement of Case Based Learning Project
• TGE-Adonis, Centre national de la recherche scientifique
• UK Archaeological Data Service
• UK Data Archive
• University of Illinois Urbana Champaign, Grainger Library
• USQ/ARROW The Fascinator
University Libraries and Archives (71)
• …