Semantic Web and Digital Library Management using Fedora-Commons
Ludovic Deravet
Software Architect @ I.R.I.S. S&E

Semantic Web and Digital Library Management
PART 1: INTRODUCTION

Evolution of the Web
[Figure: timeline from the PC Era (1980-1990) through Web 1.0 (1990-2000) and Web 2.0 (2000-2010) to Web 3.0 (2010-2020), with stages labelled Desktop, Web, Semantic Web and WebOS. Technologies plotted: File Systems, File Servers, FTP, Email, Databases, SQL, SGML, Windows, MacOS, Groupware, HTTP, HTML, XML, Java, Flash, Websites, Keyword Search, RSS, Wikis, Weblogs, Social Networking, Lightweight collaboration, SaaS, RDF, OWL, SPARQL, Semantic databases, Semantic Search, Distributed Search, Intelligent personal agents.]

Digital Library Management
• Bulk load
• Cataloguing
• Editing
• Storing
• Searching

Managing and Searching Information
Data
• Synonyms (e.g. Intelligent, Brainy, Smart, Clever)
• Homonyms (e.g. Type, Miss, Lie, Blue)
• Languages
Search
• Parametric Search (Too Many or Too Few Search Results)
• Time Spent
Editing
• Reusability
• Complexity

What is Semantic Web?
Based on simple ideas
• Information is Unambiguous
• Data will become Findable
• Data will be Reusable
• Data will be Interoperable
• Systems will be Flexible
• Real time Information

Semantic Web Foundations
• URIs: http://www.irislink.com/#company
• Triples: <I.R.I.S.> <experts> <D.M.>
• Model and Technologies: RDF triples
• Notations: RDFS, OWL
• Data Exchange Formats:
  <rdf:RDF … xmlns:contact="http://.../contact#">
    <contact:Person rdf:about="http://.../contact#me">
      <contact:fullName>…</contact:fullName>
      <contact:mailBox rdf:resource="mailto:xxx@yyy"/>
    </contact:Person>
  </rdf:RDF>
• SPARQL:
  SELECT ?subject ?label
  WHERE {
    ?subject rdfs:subClassOf ?object .
    OPTIONAL { ?subject rdfs:label ?label }
  }

What does it look like? (example)

Fedora-Commons Features
[Figure: architecture diagram. Interfaces: Manage API (REST, SOAP), Access API (REST, SOAP), Default Search (REST), RDF Search (REST), OAI Provider. Fedora Repository Modules: Dissemination, Validation, Security, Resource Index, Storage, Management, Registry, CMA. Storage: Files, RDBMS, RDF.]

What can you do with Fedora-Commons?
Full control of your content
• Store whatever you want
• Provide easy access to your content
• Express relationships
• Enable permanence of your content
• Incorporate extensible components
• Scale your project up and down

How can we help you?
I.R.I.S. S&E – International Organisations
• Ready for the future
• On top of Technologies
• Expertise & Consulting
• Experience
• Strong Partnerships

Questions?

Semantic Web and Digital Library Management
PART 2: ADVANCED

What topics?
• Digital Library Management
• Semantic Web
• Fedora-Commons

DIGITAL LIBRARY

What is Digital Library Management?
A solution to meet the needs for:
– Bulk load of digital assets
– Cataloguing
– Editing
– Storing
– Searching

Evolution of the Web
[Figure: timeline of web technologies from the PC Era (1980-1990) through Web 1.0 (1990-2000) and Web 2.0 (2000-2010) to Web 3.0 (2010-2020), with stages labelled Desktop, WWW, Semantic Web and WebOS.]

Problem – Searching and Managing Information
• Synonyms: spelled differently but have the same (or nearly the same) meaning
• Homonyms: sound alike but have different meanings; most of the time they are also spelled differently
• Languages: might require a lot of maintenance, and the quality is not always at the same level in each language
• Parametric Search: it is difficult to find things, especially something specific
  – Too few parameters = too many search results
  – Too many parameters = no search results

Problem – Searching and Managing Information (cont’d)
• Time spent: users spend too much time searching for what they are looking for
• Data reusability: limited ability to reuse data
• Managing the information is complex
  – Within the same company, each department often manages its own information
  – Each department might have its own way of solving the problem
  – Try to use technologies to solve the original
problem (e.g. MDM)
  – High volume of information requires human management of the information
  – Using hierarchical solutions by classifying information
  – Using horizontal solutions with tags

SEMANTIC WEB

What is Semantic Web?
The idea behind it is “quite” simple:
– electronic information will become unambiguous
– data will become findable
– data will be reusable
– data will be interoperable
– systems will be flexible
– real-time information

Foundations of Semantic Web
• URIs for everything
• Triples: <subject> <predicate> <object>
• Models and technologies (e.g. RDF)
• Data exchange formats (e.g. RDF/XML, N-Triples)
• Notations (e.g. RDFS, OWL)
• SPARQL

Foundations of Semantic Web (example)
Albert is the father of Philippe

SUBJECT    Albert            http://www.belgium.be/person albert/profile.html
PREDICATE  is the father of  http://www.belgium.be/rdf/relationship#fatherof
OBJECT     Philippe          http://www.belgium.be/person philippe/profile.html

in RDF notation
<rdf:RDF xmlns:be="http://www.belgium.be/rdf/relationship#">

Foundations of Semantic Web (example)
RDFS
be:King rdfs:subClassOf be:Person
be:Prince rdfs:subClassOf be:Person
dc:subject rdf:type rdf:Property
SPARQL
PREFIX be: <http://www.belgium.be/ontology>
SELECT ?firstname ?lastname
WHERE {
  ?person a be:Person .
  ?person be:firstname ?firstname .
  ?person be:lastname ?lastname .
}

What does it look like? (example)

FEDORA-COMMONS OVERVIEW

What is Fedora-Commons?
Open Source Framework to Manage Digital Content
• Documents
• Images
• Video
• Audio
Long-term preservation
• Of Files
• Of Metadata
Based on Standards
• Dublin Core
• Metadata Encoding and Transmission Standard (METS)
• Resource Description Framework (RDF)
• …

What is Fedora-Commons? (cont’d)
Services Oriented
• No Monolithic Application
• Modularity and Extensibility
• Simple Integration (web interface)
Very Large Repository
• Scalable to Millions of objects
• Performance

FEDORA-COMMONS IN DETAIL

Fedora-Commons Features
[Figure: architecture diagram. Interfaces: Manage API (REST, SOAP), Access API (REST, SOAP), Default Search (REST), RDF Search (REST), OAI Provider. Fedora Repository Modules: Dissemination, Validation, Security, Resource Index, Storage, Management, Registry, CMA. Storage: Files, RDBMS, RDF.]

Fedora Repository Modules
Dissemination, Validation, Security, RI, Store, Management, Registry, CMA

CMA – Content Model Architecture
• <Data (Digital Object)> <fedora-model:hasModel> <Content Model>
• <Content Model> <fedora-model:hasService> <Service Definition>
• <Service Deployment> <fedora-model:isDeploymentOf> <Service Definition>
• <Service Deployment> <fedora-model:isContractorOf> <Content Model>

Digital Object
PID
• Unique identifier
Object Properties
• State (Active, Inactive, Deleted)
• Label
• Owner
• Creation Date
• Modification Date
Reserved Datastreams
• DC (Dublin Core)
• RELS-EXT
• RELS-INT
Custom Datastreams
• Datastream 1
• …
• Datastream n

Digital Objects Relationships - Example
[Figure: graph of related digital objects. Labels: Operating System, Address, Windows, Rights, IRIS Corporate, Document Server, Compression, Documents, I.R.I.S. Group, iHQC. Relationships: ns:hasPhotoLocation, dc:title, ns:isRunningOn, ns:hasAddress, ns:hasName, ns:hasLicense, ns:hasText, ns:hasLogo, ns:supportFormats, ns:hasCompression.]

Prefix  Namespace URI       Description
dps     http://www.dps.org  Document Processing System terms

Dissemination (Example)
Object shown: Title: The ‘Great Migrations’; Owner: NGC; Date: 06/11/2010
1) http://website/pid/pdf
2) Calls service with PID and format
3) Returns PDF representation (dissemination) of the requested resource
[Figure labels: THUMBNAIL, VIDEO, WSDL, Transformation Service, XML, High Speed Videos Streaming platform, Archive notice]

Stores
Storage module: Low Level Storage (LLS) Interface
• Akubra
• Default Store (File-System)
• S3 LLS (Amazon)
  – Scalable (no limitation of files)
  – Reliable (SLA 99.99%)
  – Stores can be located at different places (geographically)
  – Cost Management (pay for what you use)
• iRODS LLS
  – iRODS is handling the digital objects
  – Fedora-Commons is handling the metadata / management
  – Distributed Management System
  – Manages datasets stored in a wide range of data stores (file-system, network, databases…)
  – Large datasets
• SRB LLS
• Sun Honeycomb LLS (StorageTek 5800)
  – Distributed Management Storage System
  – No file-system limitation

Resource Index
• Query languages: iTQL, SPARQL
• Triples Store: Mulgara

Resource Index (RI) - Example
[Figure: RDF graph of a video library. Nodes: Library L1, Video V1, Category C1, Episodes E1, E2, E3. Predicates: dc:title, dc:language, dc:description, dc:author, ns:isMemberOf, ns:isCategoryOf, ns:isCollectionOf, ns:type, ns:format. Values: Stephen Hawking’s Universe; English; Explores the greatest mysteries of the cosmos.; Stephen Hawking; Science; Aliens; Time Travel; The Story of Everything; Blue-Ray.]

Resource Index (cont’d) - Triples
Subject             Predicate                    Object
<S.H.’s Universe>   <is a member of collection>  <Science Library Videos>
<Episode 1>         <is a member of collection>  <S.H.’s Universe>
<Episode 2>         <is a member of collection>  <S.H.’s Universe>
<Episode 3>         <is a member of collection>  <S.H.’s Universe>
<Episode 1>         <has a title>                <Aliens>
<Episode 2>         <has a title>                <Time Travel>
<Episode 3>         <has a title>                <The Story of Everything>
<Episode 1>         <has format>                 <Blue-Ray>
<Episode 2>         <has format>                 <Blue-Ray>
<Episode 3>         <has format>                 <Blue-Ray>
….                  ….                           ….

Resource Index (cont’d)
Query:
select $video from <#ri>
where $video <fedora-model:hasModel> <info:fedora:Video>
Result(s):
"video"
info:fedora/o:V1

Query:
select $video, $episode, $title from <#ri>
where $episode <dc:title> $title
and $video <dc:creator> 'Stephen Hawking'
and $episode <ns:format> 'Blue-Ray'
Result(s):
"video", "episode", "title"
info:fedora/o:V1, info:fedora/o:E1, Aliens
info:fedora/o:V1, info:fedora/o:E2, Time Travel
info:fedora/o:V1, info:fedora/o:E3, The Story of Everything

iTQL Queries (http://docs.mulgara.org/itqlcommands/index.html)

Validation
• Applied when managing digital objects:
  – foxml 1.0
  – foxml 1.1
  – mets 1.0
  – mets 1.1
  – atom
• Use schematron
  – rule-based validation language
  – structural language expressed in XML

<sch:pattern name="Preliminary Object Checks" id="preliminary">
  <sch:rule context="foxml:datastream[@ID='AUDIT']">
    <sch:assert test="count(foxml:datastreamVersion) = 1">The AUDIT Datastream can only have ONE version since it is a non-versionable datastream. (foxml:datastreamVersion)</sch:assert>
  </sch:rule>
</sch:pattern>

Security
• Legacy Authentication and Authorization
  – Authorization: XACML (using Sun's XACML implementation)
  – Authentication: using server filters
• FeSL
  – will replace XACML in a future release of Fedora-Commons
  – based on JAAS (Java Authentication and Authorization Service)

Management
• Primary APIs
  – REST API (HTTP)
  – API-A and API-M (SOAP)
• Secondary APIs
  – Resource Index with iTQL and SPARQL (HTTP)
  – OAI-PMH for metadata harvesting across repositories (HTTP)
• Third-Party APIs
  – MediaShelf, with a Java client API

WHO’S GONE FEDORA-COMMONS and USER COMMUNITY

User Community
• ActiveFedora: Built on RubyFedora, this Ruby gem provides an Active Record-oriented way of working with objects in Fedora.
• django-fedora: A Python Django web UI for Fedora.
• Djatoka Integration: A sample content model, service definition, and service deployment object demonstrating how to integrate Fedora with the Djatoka JPEG2000 service.
• DSpace2 Storage-Fedora: A Google Summer of Code 2009 project to persist DSpace 2 entities in Fedora.
• Enhanced Content Models: Extends Fedora's basic content models to add XML schema restrictions for datastreams and ontology information, allowing restrictions to be expressed on relationships, in addition to other features.
• eSciDoc: An eResearch environment developed specifically for use by scientific and scholarly communities.
• EZService: A utility to simplify the creation of Fedora Service Definition and Deployment objects.
• Fedora-Planets integration: Provides a simple way to add Planets (http://planets-project.eu) preservation services as disseminators on Fedora objects.
• fedora_rest: A Drupal module for building custom interfaces to Fedora Commons repositories.
• FESL: Fedora Enhanced Security Layer is a community-driven project to refactor Fedora's Authentication and Authorization functionality.
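The HTTP-based APIs listed on the Management slide can be exercised with plain URLs. The sketch below is only an illustration: it assumes a Fedora 3.x-style REST layout (/objects/{pid}/datastreams/{dsID}/content for datastream content, /risearch for Resource Index queries), and the base URL, the helper names datastream_url and risearch_url, and the use of PID o:V1 with its DC datastream are all chosen for the example. Check the paths and parameters against your own installation before relying on them.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL; replace it with the address of your own repository.
BASE_URL = "http://localhost:8080/fedora"


def datastream_url(pid: str, ds_id: str) -> str:
    """Build the URL of a datastream's content (assumed Fedora 3.x-style path)."""
    return f"{BASE_URL}/objects/{quote(pid, safe=':')}/datastreams/{quote(ds_id)}/content"


def risearch_url(query: str, lang: str = "itql") -> str:
    """Build the URL of a tuple query against the Resource Index (assumed /risearch endpoint)."""
    params = {"type": "tuples", "lang": lang, "format": "CSV", "query": query}
    return f"{BASE_URL}/risearch?{urlencode(params)}"


if __name__ == "__main__":
    # Dublin Core datastream of the video object used in the Resource Index example.
    print(datastream_url("o:V1", "DC"))
    # First iTQL query from the Resource Index slides, URL-encoded.
    print(risearch_url(
        "select $video from <#ri> "
        "where $video <fedora-model:hasModel> <info:fedora:Video>"
    ))
```

The helpers only build URLs and never contact a server, so they can be tried without a running repository; fetching the resulting URLs is left to whichever HTTP client you prefer.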
User Community (cont’d)
• funAPI: A Java web application that provides an unAPI implementation for Fedora.
• Hydra: Will provide a "Lego Set" of web services and templates that can be used for a wide range of content management workflows.
• Honeycomb Storage Plugin: Allows for the use of the Sun StorageTek 5800 as the underlying storage for Fedora.
• iRODS Storage Plugin: Allows for the use of iRODS as the underlying storage for Fedora.
• Islandora: A Drupal module that allows users to view and manage objects stored in Fedora.
• JCR Connect Adapter: A JCR adapter for Fedora, implemented as a Jackrabbit persistence manager, that translates all node/property storing and loading requests to Fedora API calls.
• JyFedoREST: A Jython package for creating and managing objects in a Fedora Repository via the REST API.
• Muradora: A web front-end for Fedora focusing on flexible access control.
• oreprovider: An OAI-ORE provider that provides Resource Maps for Fedora objects, using the Resource Index.
• PyFedora: A Python library for interfacing with Fedora's REST API.

User Community (cont’d)
• PyFedoREST: A Python package for creating and managing objects in a Fedora Repository via the REST API.
• pypi-fcrepo: A Python module for working with Fedora repositories through the REST API.
• python-fedoracommons: Python libraries for interfacing with Fedora's API-A, API-M, and RISearch interfaces.
• python-fedoracommons-webarchive: A web interface providing search and browse for a FedoraCommons and Solr powered archive.
• RODA: An OAIS-compliant, service-oriented digital repository system designed to preserve government authentic digital objects.
• RubyFedora: A Ruby gem for creating and managing objects in Fedora.
• SWORD-Fedora: Provides a SWORD 1.3 deposit interface for Fedora.
• The Fascinator: A front end to a Fedora Commons repository that uses Solr to handle all browsing, searching, and security.

Who’s gone Fedora-Commons?
• Encyclopedia of Chicago
• National Science Digital Library (NSDL)
• New York Public Library (NYPL)
• Bibliothèque nationale de France (BnF)
• The Public Library of Science (PLoS One)
• University of Prince Edward Island

Who’s gone Fedora-Commons? (cont’d)
Broadcasting and Media (1)
• WGBH
Consortia (8)
• ARROW Project
• ASSESS Project, Australian National University Supercomputer Facility
• Colorado Alliance of Research Libraries
• DANS
• National Institute for Technology and Liberal Education (NITLE)
• OhioLINK Digital Resource Commons
• Open Access Repositories in New Zealand
• Phaidra, University of Vienna
Corporations (16)
• 4TIC S.L.
• Acuity Unlimited
• Aptivate
• Atos Origin
• Curalia, AB
• Docuteam
• FIZ Karlsruhe
• Func. Internet Integration
• Harris Corporation
• Inter-Fermadof (Nigeria), LTD.
• MediaShelf, LLC
• Octagon Data Systems
• Sun Microsystems - Honeycomb Group
• Swiss Education and Research Network (SWITCH)
• Trifork A/S
• VTLS, Inc.
• WebOPAC Application Pvt. Ltd.
Government Agencies (8)
• U.S. Centers for Disease Control and Prevention
• Danish National IT and Telecom Agency
• Entidad Publica Empresarial Red.es
• The Food and Agriculture Organization of the United Nations
• Idaho National Laboratory
• Kennisnet Ict op school
• NASA Goddard Space Flight Center Library
• National Library of Medicine (USA)
Medical Centers and Libraries (4)
• Cornell University - College of Veterinary Medicine
• Duke University - Medical Center Archives*
• Memorial Sloan-Kettering Cancer Center - Department of Surgery & Department of Public Affairs
• Virginia College of Osteopathic Medicine

Who’s gone Fedora-Commons? (cont’d)
IT-Related Institutions (10)
• Catholic University of Louvain
• Centro Tecnológico INTECCA (Innovación y Desarrollo Tecnológico de los Centros Asociados), UNED
• Cornell University - Cornell Information Technologies
• Macquarie University, E-Learning Center of Excellence
• Northwestern University - Academic Technologies
• Purdue University - Information Technology
• University of North Florida
• University of Queensland - Information Technology Services
• University of San Diego
• University of Sydney
• University of Virginia - Information Technology and Communications
National/Public Libraries and Archives (16)
• Alberta Library (TAL)
• Boston Public Library
• e-SpacioUNED
• Library of Congress
• National Library of Australia*
• National Library of Estonia
• National Library of France / Bibliothèque nationale de France (BnF)
• National Library of Portugal
• National Library of Scotland
• National Library of Singapore*
• National Library of Slovakia*
• National Library of Sweden
• National Library of Wales / Llyfrgell Genedlaethol Cymru*
• New York Public Library
• Royal Netherlands Academy of Arts and Sciences
• The State and University Library of Denmark
Professional Societies (2)
• American Geophysical Union
• Athens Archaeological Society*

Who’s gone Fedora-Commons? (cont’d)
Publishing (5)
• CiteSeer
• Digital Peer Publishing (DiPP)
• Digital Publishing System (DPubS)
• DiVA, Uppsala University Library
• Public Library of Science (PLoS)
Research Groups and Projects (20)
• Alfred Wegener Institute for Polar and Marine Research
• Berlin Brandenburg Academy of Sciences and Humanities
• California State University Los Angeles - CoolStateLA Enterprise System
• Centre de Calcul de L'Institut National de Physique Nucleaire et de Physique des Particules
• Columbia University - Center for International Earth Science Information Network*
• DART Project
• Electronic Text Center, University of New Brunswick
• eSciDoc Project - Max Planck Society and FIZ Karlsruhe
• Interuniversity Consortium for Political and Social Research (ICPSR)
• King's College London - Center for e-Research
• Kuwait Institute for Scientific Research*
• Oxford University - Refugee Studies Center
• RAMP Project
• Royal Irish Academy, Digital Humanities Observatory
• Semantic Technologies for the Enhancement of Case Based Learning Project
• TGE-Adonis, Centre national de la recherche scientifique
• UK Archaeological Data Service
• UK Data Archive
• University of Illinois Urbana Champaign, Grainger Library
• USQ/ARROW The Fascinator
Semantic and Virtual Library Projects (6)
• Biodiversity Heritage Library
• Carnegie Foundation for the Advancement of Teaching Knowledge Media Laboratory
• Encyclopedia of Chicago
• Encyclopedia of Life
• National Science Digital Library (NSDL)
• Open Learning Exchange Nepal
University Libraries and Archives (71)
• …
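
To make the Resource Index slides concrete, the sketch below loads the episode triples from the "Resource Index (cont’d) - Triples" table into a Python list and answers the second example query by naive pattern matching. It is not how Mulgara evaluates iTQL. The short identifiers (V1, E1, E2, E3), the choice of ns:isCollectionOf for the "is a member of collection" predicate, and the helper names match and query are assumptions made for illustration.

```python
# Triples taken from the Resource Index example, as (subject, predicate, object).
# Short identifiers and predicate names are illustrative assumptions.
TRIPLES = [
    ("V1", "dc:creator", "Stephen Hawking"),
    ("E1", "ns:isCollectionOf", "V1"),
    ("E2", "ns:isCollectionOf", "V1"),
    ("E3", "ns:isCollectionOf", "V1"),
    ("E1", "dc:title", "Aliens"),
    ("E2", "dc:title", "Time Travel"),
    ("E3", "dc:title", "The Story of Everything"),
    ("E1", "ns:format", "Blue-Ray"),
    ("E2", "ns:format", "Blue-Ray"),
    ("E3", "ns:format", "Blue-Ray"),
]


def match(pattern, bindings):
    """Yield bindings extended by every triple that fits one pattern.

    Terms starting with '$' are variables; anything else must match exactly.
    """
    for triple in TRIPLES:
        candidate = dict(bindings)
        for term, value in zip(pattern, triple):
            if term.startswith("$"):
                # Bind the variable, or reject if it is already bound differently.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield candidate


def query(patterns):
    """Join the patterns left to right, like the 'and' clauses of the query."""
    results = [{}]
    for pattern in patterns:
        results = [b for partial in results for b in match(pattern, partial)]
    return results


rows = query([
    ("$episode", "ns:isCollectionOf", "$video"),
    ("$video", "dc:creator", "Stephen Hawking"),
    ("$episode", "ns:format", "Blue-Ray"),
    ("$episode", "dc:title", "$title"),
])
for row in rows:
    print(row["$video"], row["$episode"], row["$title"])
```

Each pattern narrows the set of variable bindings produced by the previous one, which is the same idea as chaining constraints with "and" in the iTQL examples; a real triple store adds indexing and query planning on top.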