PIDs – a service for users
Natasa Bulatovic
Research and Development, Max Planck Digital Library
April 2012
This work is licensed under a Creative Commons Attribution 2.0 Germany License
http://creativecommons.org/licenses/by/2.0/de/

PIDs at the Max Planck Digital Library
MPG landscape:
• Several million objects to identify (maintained centrally by specialized repositories)
• Many more to identify (maintained elsewhere)
Name 19.07.2012

A PID or a Persistently Identified Dilemma?
We mostly agree about
• What to identify?
• When to identify?
But we discuss a lot about
• Why to identify (why not just use Cool URIs)?
• Where to identify? To merge in my data?
• How? (a lot of work)
• What to do if it is already identified?
• Does a specialized PID system offer us benefits or just further headaches?
• Which system is better?
• Does the PID syntax allow me to ..?

So what do many of us users do?
1. Leave the status quo and hope to come back to it at a later stage
2. Just assign URLs and serve them from our domains
3. Sometimes we even use PURLs (e.g. for metadata schema profiles)
4. Publishers mostly use DOIs; we keep these in our metadata
5. We start building a new system: make a thorough analysis of the current PID systems in place
6. We agree on Handle and use it, but only for some data
7. We talk to researchers
8. Go to step 1 again

What is our problem?
Do we have a PID API for users that is
• Simple
• Well documented and understandable
• Leveraging the power of HTTP (REST style)
• Providing additional value: client libraries and other services
Note: not a global resolver for various PID systems

A practical example
• APSR SWORD 1.2-compatible OJS Plugin http://pkp.sfu.ca/support/forum/viewtopic.php?f=28&t=3877 (screencast)
• arXiv 1.3-compliant endpoint: http://arxiv.org/help/submit_sword
• Feedforward – personal information environment with a SWORD interface, among other features
• SWORD Widget – for Netvibes, iGoogle and embedding in web pages
• The Depot – SWORD-compliant
• Foresite – using SWORD to deposit ORE resource maps describing journals within JSTOR into a DSpace repository
• BioMed Central’s Open Repository – implementing a SWORD interface
• Intrallect – desktop drag and drop tool based on SWORD
• Microsoft Article Authoring Add-in for Word 2007 – allows repository deposit direct from Word
• Microsoft Zentity Research Output Repository platform supports SWORD deposit
• Microsoft client code – Microsoft Office SWORD deposit plugin http://www.codeplex.com/OfficeSWORD
• Microsoft eJournal Service (Alpha) and Research Output Repository Platform (Beta): http://www.microsoft.com/mscorp/tc/scholarly_communication.mspx
• SOURCE project
• BibApp – SWORD Ruby client http://code.google.com/p/bibapp/
• Facebook client http://fb.swordapp.org/
• ICE-TheOREM – has demonstrated ‘ORE-over-SWORD’
• TARDIS, the Australian Repository for Diffraction Images, is implementing a SWORD interface: http://tardis.edu.au/wiki/index.php/TARDIS2
• PublicationsList.org is using SWORD for deposit into EPrints http://publicationslist.org/
• EM-Loader project has used SWORD for batch deposit http://publicationslist.org/emloader/emloader-report-sword-experiences.html
• Max Planck Digital Library’s eSciDoc solution ‘PubMan’ has implemented SWORD http://colab.mpdl.mpg.de/mediawiki/PubMan_Sword
• Windows SWORD desktop client created by Hrvoje Jerković http://dspace-depositapp.blogspot.com/
• CUNY (The City University of New York) Libraries are using SWORD for deposit into DSpace
• The National Strategies are using SWORD to deposit from their K-Int tagging tool into Drupal
• The Collections Trust’s Culture Grid is using SWORD for ‘RESTful deposit’.
  Technical specification (PDF)
• SWORD interfaces in various installations of DSpace, EPrints, Fedora and IntraLibrary
• The YODL-ING project at York is developing a SWORD-based ‘one-stop’ deposit client
• EU PEER Project will be implementing SWORD for deposit http://www.peerproject.eu/
• The CLASM project will be developing a SWORD plugin for Moodle http://dablog.ulcc.ac.uk/category/projects/clasm/
• The National STEM Centre is implementing SWORD for client and bulk deposit
• …

• Different technologies
• Different repositories
• Code libraries for easy adoption by repositories
• Easy to use
• Easy to understand

An Idea
• Agree on a common API
• Plenty of experience already exists among PID providers
• Implement an additional service interface (no need to modify already established ones)
• The interface must cover basic PID system functionality
• The interface may cover particular PID system (or community) specifics
• RESTful: a PID is a resource as well, and that makes it different from a URL or just any string used to identify something
• Clear definition, e.g. a PID services ontology
• Think about value-added services

Actors
• Service provider: the system that manages the PID
• Resource provider: the user who acquires a PID for their own resources
HTTP requests
• GET: retrieve information
• POST: create a new resource
• PUT: update an existing resource
• DELETE – not used for now

Ontology
A user perspective
• Service provider
• Context – a specific dimension offered by the service provider (e.g. naming authority, DOI profile etc.). At least one must exist
• PID – the persistent identifier
• Type – the type of the persistent identifier (e.g. Handle, URN, URL…)
• Resolution – a resolution to a particular representation of a resource
• Metadata profiles – metadata profiles maintained within the context. Different contexts may have different metadata profiles. Metadata may be associated with PIDs depending on context.
• Extensions (optional) – specific extensions offered within a context.
  Within a service, different contexts may have different extensions

On Resolution
A resolution could be associated with the following attributes:
• The URL to resolve to
• Default (true, false) – whether a particular resolution is the default or not
• Category (e.g. metadata, content, fragment, …)
• Format (user-defined URL that links to the representation format related to the resolution)
And perhaps some Linked Data support in addition?
• Outgoing Accept header definition (to facilitate content negotiation on the resource provider side)
• Incoming Accept header definition (to facilitate content negotiation on the service provider side)

Service level operations
<text> - placeholder for a concrete value
italic text - example name of the operation
Usually implemented or not
GET <service>
  Resolves to the home page of the service presentation
  Example: GET http://handle.gwdg.de
GET <service>/explain
  Resolves to a service description, i.e. a “service document”
  The service document is a document (e.g. formatted as RDF/XML) that informs the user agent about the type of the PID system, supported formats, system operations, available contexts, metadata profiles, accept headers, and content negotiation support
Note
• The second form of the interface could optionally be avoided by content negotiation, or simply by using a Linked Data (LD) principle such as: GET <service>/data.
  Same for all further usage of “explain”

PID retrieval/resolution operations
GET <service>/<PID>
  Resolves to the default resolution and default representation format of the resource associated with this <PID>
  Example: GET http://pid.gwdg.de/11858%2F00-001Z-0000-0001-41F3-C
  Note: basic PID retrieval does not have to be structured by the context (unless the context is a mandatory part of the PID value itself), as this has to be handled inherently by the PID service provider
GET <service>/<PID>/explain
  Returns an XML/RDF document containing all-in-one service provider information about this particular <PID>
  PID properties, e.g. type, context, links to extensions, links to the metadata profiles, links to the resolutions (and information about the delivered format, potentially a fragment pattern, a description by the resource provider, accept headers)
GET <service>/<PID>/policy
  See also http://tools.ietf.org/html/draft-kunze-ark-15#section-5.1.1

PID retrieval/resolution operations
GET <service>/<PID>/<formatId>
  Resolves to the representation of a resource related to a particular format ID (e.g. “xml”, “rdf”, “html” …)
  <resolutions>
    <format>
      <id>xml</id>
      <operation>
        http://pid.gwdg.de/11858%2F00-001Z-0000-0001-41F3-C/xml
      </operation>
      <format-description>
        The XML representation of the identified resource
      </format-description>
    </format>
    <url>http://pubman.mpdl.mpg.de/item/escidoc:12345</url>
    <accept-header>text/xml</accept-header>
  </resolutions>
• Useful to facilitate content negotiation
• Useful to resolve to a particular format of the resource offered by the resource provider
• Perhaps fragment patterns can be treated as a specific format

Create/update a PID
POST <service>/pid
  Input: takes input in a defined format (e.g. XML, RDF/XML or JSON). The input contains the necessary properties and metadata that are to be associated with the PID.
  Creates a new PID for the provided input and returns the same result the user would get when invoking the GET <service>/<PID>/explain operation for the newly created PID
PUT <service>/<PID>
  Input: takes input in a defined format (e.g. XML, RDF/XML or JSON). The input contains the complete properties and metadata, with modified values (or old values), that are to be associated with the PID.
  Updates the existing PID with the provided input and returns the same result the user would get when invoking the GET <service>/<PID>/explain operation for the updated PID

User retrieval/maintenance operations
GET <service>/me
GET <service>/user/<id>
  Returns information about the user agent, e.g. id, name, basic information, allowed contexts; could also include derived metrics such as the number of PIDs created
PUT <service>/me
PUT <service>/user/<id>
  Input: agreed input format containing the data the user can modify (e.g. RDF/XML, XML or JSON)

Metadata information retrieval operations
GET <service>/<PID>/metadata
  Returns links to the available metadata records that the service provider system maintains locally for the provided <PID>
  <metadata>
    <profile>
      http://pid.gwdg.de/11858%2F00-001Z-0000-0001-41F3-C/metadata/profile-1
    </profile>
  </metadata>
GET <service>/<PID>/metadata/<profileId>
  Returns the available metadata record with <profileId> for the provided <PID>

Searching
GET <service>/search?query=<query>
  Allows searching for PIDs according to search criteria supported by querying standards, e.g. SRW/CQL
GET <service>/search/explain
  Provides a list of search fields that can be used as criteria in the query

Extensions (summary)
The following could be considered extensions:
• all PID system specific operations (i.e. available on context or service level)
• user community extensions (i.e. available on context or PID level)
Depending on the purpose and aim of the extension, operations may differ; however, operations could be
GET/PUT/POST <service>/extensions/<extension-id>
  For service-level extension operations
GET/PUT/POST <service>/<PID>/extensions/<extension-id>
  For PID-level extension operations
GET/PUT/POST <service>/<context-id>/extensions/<extension-id>
  For context-level extension operations

Value added service for users
• Statistics and metrics
• http://de.wikipedia.org/wiki/QR-Code
• Where is my resource identified, linked, cited, mentioned?
• Communication with other PID systems
• Google search
• Which versions are out there
• My PID is also a resource
• User friendly interface for researchers
• Google search
• Client libraries to use the API
Sustainability
• Focus on service and support, i.e. best practices, how-tos
• Developing long-term relationships with user communities

A user view only, however
We need an API and an easy to use system
An API shall be
• Feasible for adoption
• Complete for basic operations (from the end-user aspect)
• Clearly defined in its messages (XML, RDF or JSON)
• Used

Thank you for your attention!
Natasa Bulatovic, Max Planck Digital Library
bulatovic@mpdl.mpg.de
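
Backup: as an illustration of the operations proposed on the preceding slides, the following is a minimal Python sketch of the URL layout only. It builds (method, URL) pairs and sends nothing over the network. The class name PidApi, the Resolution field names and the helper default_resolution are assumptions made for this sketch; they are not part of any existing PID service or client library.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple
from urllib.parse import quote, urlencode

Request = Tuple[str, str]  # (HTTP method, URL)


@dataclass(frozen=True)
class Resolution:
    """One resolution of a PID; field names are hypothetical."""
    url: str                         # the URL to resolve to
    default: bool = False            # is this the default resolution?
    category: str = "content"        # e.g. metadata, content, fragment
    format_id: Optional[str] = None  # e.g. "xml", "rdf", "html"


def default_resolution(resolutions: Iterable[Resolution]) -> Optional[Resolution]:
    """Return the first resolution flagged as default, or None if there is none."""
    return next((r for r in resolutions if r.default), None)


class PidApi:
    """Builds requests for the proposed operations (illustration only)."""

    def __init__(self, service: str) -> None:
        self.service = service.rstrip("/")

    def _pid_url(self, pid: str, *suffix: str) -> str:
        # Percent-encode the whole PID so "/" becomes %2F, as in the slide example.
        return "/".join([self.service, quote(pid, safe=""), *suffix])

    def explain_service(self) -> Request:
        return ("GET", f"{self.service}/explain")

    def resolve(self, pid: str, format_id: Optional[str] = None) -> Request:
        suffix = (format_id,) if format_id else ()
        return ("GET", self._pid_url(pid, *suffix))

    def explain(self, pid: str) -> Request:
        return ("GET", self._pid_url(pid, "explain"))

    def metadata(self, pid: str, profile_id: Optional[str] = None) -> Request:
        suffix = ("metadata", profile_id) if profile_id else ("metadata",)
        return ("GET", self._pid_url(pid, *suffix))

    def create(self) -> Request:
        # The request body (XML, RDF/XML or JSON) is out of scope for this sketch.
        return ("POST", f"{self.service}/pid")

    def update(self, pid: str) -> Request:
        return ("PUT", self._pid_url(pid))

    def search(self, query: str) -> Request:
        return ("GET", f"{self.service}/search?" + urlencode({"query": query}))


if __name__ == "__main__":
    api = PidApi("http://pid.gwdg.de")
    print(api.resolve("11858/00-001Z-0000-0001-41F3-C", "xml"))
    print(api.explain("11858/00-001Z-0000-0001-41F3-C"))
```

Returning plain (method, URL) pairs keeps the sketch independent of any HTTP library; a real client library would add request bodies, authentication and error handling on top.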