CERIF Data Surgery
University of Bath
9 February 2012

CERIF RDF metadata for Oxford DMPonline data management plans

David Shotton, Richard Jones and Silvio Peroni
Image BioInformatics Research Group
Department of Zoology
University of Oxford, UK
http://ibrg.zoo.ox.ac.uk
e-mail: david.shotton@zoo.ox.ac.uk

© David Shotton, 2012
Published under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Licence

Data management plans – the DCC DMPonline tool
https://dmponline.dcc.ac.uk/
Funders increasingly require applicants to submit data management plans with grant applications
The DCC has created a data management planning tool, DMPonline, that enables researchers to create data management plans tailored to meet the requirements of different funders

The Oxford DMPonline Project
We have been funded by JISC to customize the DMPonline tool for use at Oxford, and to integrate it with the university’s research grant administrative system
As an exercise in ‘drinking my own champagne’, I created a DMPonline data management plan to submit to the JISC with my grant application
Here is the first paragraph of that eight-page plan:

Some background - the JISC UMF DataFlow Project
[Diagram labels: Researchers; DataStage file system; SWORD deposit; DataBank repository; Researchers, other users]
A DataStage data package consists of a number of data files accompanied by an RDF metadata manifest, with a SWORD v2 wrapper
DataBank is a generic repository, and can be used to store things other than research datasets, for example data management plans (DMPs)

More background - RDF metadata and linked data
The principles are quite simple
All entities (classes) and their relationships (properties) are identified and defined by unique URIs
URIs reference publicly available and commonly accepted structured vocabularies (ontologies)
Each relationship is expressed as a subject – predicate – object ‘triple’
  the syntax is defined by W3C’s Resource Description Framework (RDF)
Examples:
  cerif:Project dcterms:title "The Open Citation Project" .
  cerif:Project foaf:homepage <http://opencitations.net> .
Such statements can be combined into interconnected information networks (RDF graphs) – forming ‘linked data’
  the truth content of each original statement is maintained
  thereby creating a web of knowledge, the Semantic Web

The Oxford DMPonline Project - schematic

The role of RDF metadata for DMPs
Our plan is thus to treat DMPs as ‘data’, to wrap them in a SWORD data package along with some RDF metadata, and store them in a searchable instance of DataBank, the Oxford DMPBank, just as we would for research data packages from DataStage
Initially, I thought we would need to develop our own CERIF-compliant ontology, DaMO, the Data Management Ontology, to permit the creation of appropriate metadata to accompany DMPs
However, I was delighted to learn that the CERIF Linked Data Task Group was in the process of mapping CERIF to RDF, and so we now wish to align with that effort, rather than re-inventing the wheel
As part of that, Silvio Peroni and I last month developed CERRO, the CERIF Roles and Relationships Ontology, and suggested some additions to the draft CERIF and SEMCERIF ontologies, as contributions to the LD effort
I hope to have the opportunity to present CERRO in tomorrow’s workshop

What machine-readable metadata do we need for DMPs?
N.B.
These relate to the DMP itself, and the grant application and research project to which it relates, not to the datasets that the funded research project will create
Data Management Plan
  Author, DMP title, creation date, version number, identifier (DOI or URI)
  Status after funding – DMP open, open if anonymized, or confidential
Potential funding source
  Funding agency, URL, funding scheme / call, submission deadline
Grant application to which DMP relates
  Lead applicant, lead applicant’s department
  Co-applicant, co-applicant’s department
  Title, university reference number, date submitted, decision date, decision
Funded research project to which DMP relates
  Principal investigator, principal investigator’s department
  Co-applicant, co-applicant’s department
  Project title, funding agency, grant number, ref number, start date, end date

DMP metadata mapping to ontologies
Data management plan: fabio:DataManagementPlan
  DMP author: dcterms:author, cerro:author
  DMP title: dcterms:title
  DMP creation date: fabio:hasCreationDate
  DMP version number: prism:versionIdentifier
  DMP identifier (DOI or URL): prism:doi, fabio:hasURL
  DMP status after funding: pso:confidential, pso:open-access, pso:anonymized
Potential funding source: cerif:Funding
  Funding agency: cerif:FundingAgency
  Funding agency’s URL: foaf:homepage
  Name of funding scheme / call: cerif:hasCallIdentifier, cerif:hasCallName, cerif:hasFundingProgrammeName
  Submission deadline: cerif:hasDeadline

DMP metadata mapping to ontologies (continued)
Funding application: cerif:FundingApplication
  Lead applicant: cerro:applicant
  Lead applicant’s department: semcerif:Department (sub-classes of cerif:OrganizationalUnit)
  Co-applicant: cerro:co-applicant
  Co-applicant’s department: semcerif:Department
  Grant application title: dcterms:title
  University reference number: dcterms:identifier
  Date submitted: cerif:hasSubmissionDate
  Decision date: cerif:hasDecisonDate
  Application status: cerif:hasStatus (“accepted” / “rejected”)

DMP metadata mapping to ontologies (continued)
Funded research project: cerif:Project
  Principal investigator: cerro:principal-investigator
  PI’s department: semcerif:Department
  Co-investigator: cerro:co-investigator
  Co-investigator’s department: semcerif:Department
  Research project title: cerif:hasProjectTitle
  Funding agency: cerif:hasFundingAgency cerif:FundingAgency
  Funding agency’s grant number: cerif:hasGrantNumber
  University reference number: dcterms:identifier
  Project start date: cerif:startDate
  Project end date: cerif:endDate
  Project status: cerif:hasStatus (“funded” / “unfunded”)

Conclusion
The creation of RDF metadata to accompany data management plans is straightforward
[Note: the mappings on the preceding slides have been updated since 10th Feb]
It uses CERRO, the new CERIF Roles and Relationships Ontology
It requires two new temporal data properties in FaBiO, the FRBR-aligned bibliographic ontology
It requires two new statuses in PSO, the Publication Status Ontology
It requires a few new classes in CERIF and SEMCERIF ontologies, that Silvio Peroni and I have suggested in the following document:

SPAR – Semantic Publishing and Referencing Ontologies
http://purl.org/spar/
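
Illustrative sketch: DMP metadata as triples
As a rough illustration of the subject – predicate – object pattern applied to the DMP mappings, the following Python sketch writes three statements about a plan in N-Triples form. It is only a sketch under stated assumptions: the plan URI http://example.org/dmp/1, the title string and the helper names expand, literal and to_ntriples are all hypothetical, and the property names are taken from the mapping slides rather than from any finalized CERIF RDF release.

```python
# Prefix table for the vocabularies used below (published namespace URIs).
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "fabio": "http://purl.org/spar/fabio/",
    "prism": "http://prismstandard.org/namespaces/basic/2.0/",
}


def expand(curie):
    """Turn a prefixed name such as 'dcterms:title' into a full <URI>."""
    prefix, local = curie.split(":", 1)
    return "<" + PREFIXES[prefix] + local + ">"


def literal(text):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical identifier for one data management plan (an assumption).
DMP = "<http://example.org/dmp/1>"

triples = [
    (DMP, expand("rdf:type"), expand("fabio:DataManagementPlan")),
    (DMP, expand("dcterms:title"), literal("Example data management plan")),
    (DMP, expand("prism:versionIdentifier"), literal("1")),
]

if __name__ == "__main__":
    print(to_ntriples(triples))
```

In practice an RDF library would handle serialization and datatypes; plain strings are used here only to keep the triple structure visible.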