CERIF Data Surgery
University of Bath
9 February 2012
CERIF RDF metadata for Oxford
DMPonline data management plans
David Shotton, Richard Jones and Silvio Peroni
Image BioInformatics Research Group
Department of Zoology
University of Oxford, UK
http://ibrg.zoo.ox.ac.uk
e-mail: david.shotton@zoo.ox.ac.uk
© David Shotton, 2012
Published under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Licence
Data management plans – the DCC DMPonline tool
https://dmponline.dcc.ac.uk/

Funders increasingly require applicants to submit data management plans with
grant applications

The DCC has created a data management planning tool, DMPonline, that
enables researchers to create data management plans tailored to meet the
requirements of different funders
The Oxford DMPonline Project



We have been funded by JISC

to customize the DMPonline tool for use at Oxford, and

to integrate it with the university’s research grant administrative system
As an exercise in ‘drinking my own champagne’, I created a DMPonline data
management plan to submit to the JISC with my grant application
Here is the first paragraph of that eight-page plan:
Some background - the JISC UMF DataFlow Project
A DataStage data package consists
of a number of data files
accompanied by an RDF metadata
manifest, with a SWORD v2 wrapper

DataBank is a generic repository, and
can be used to store things other than
research datasets, for example data
management plans (DMPs)

[Diagram labels: Researchers; DataStage file system; SWORD deposit; DataBank repository; Researchers, other users]
More background - RDF metadata and linked data


The principles are quite simple

All entities (classes) and their relationships (properties) are identified and
defined by unique URIs

URIs reference publicly available and commonly accepted structured
vocabularies (ontologies)

each relationship is expressed as a subject – predicate – object ‘triple’

the syntax defined by W3C’s Resource Description Framework (RDF)
Examples:
cerif:Project dcterms:title “The Open Citation Project” .
cerif:Project foaf:homepage <http://opencitations.net> .

Such statements can be combined into interconnected information networks
(RDF graphs) – forming ‘linked data’

the truth content of each original statement is maintained

thereby creating a web of knowledge, the Semantic Web
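
The triple model and the merging of statements into a graph can be sketched in plain Python, without any RDF library. This is only an illustration: the `ex:` identifiers are hypothetical placeholders, prefixed names are kept as plain strings rather than expanded to full URIs, and a real application would normally use a dedicated RDF toolkit.

```python
# Each statement is a (subject, predicate, object) tuple; a graph is a set of them.
# The ex: names are hypothetical placeholders, not terms from any real ontology.
graph_a = {
    ("ex:project1", "rdf:type", "cerif:Project"),
    ("ex:project1", "dcterms:title", '"The Open Citation Project"'),
}
graph_b = {
    ("ex:project1", "foaf:homepage", "<http://opencitations.net>"),
    ("ex:project1", "rdf:type", "cerif:Project"),  # same statement as in graph_a
}

# Merging two graphs is a set union: every original statement survives
# unchanged, and a statement present in both graphs is stored only once.
merged = graph_a | graph_b


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# List everything known about one subject after the merge.
for triple in match(merged, s="ex:project1"):
    print(" ".join(triple), ".")
```

Because the merged graph is a superset of both inputs, nothing asserted in either source is lost, which is the sense in which the truth content of each original statement is maintained.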
The Oxford DMPonline Project - schematic
The role of RDF metadata for DMPs

Our plan is thus to treat DMPs as ‘data’, to wrap them in a SWORD data
package along with some RDF metadata, and store them in a searchable
instance of DataBank, the Oxford DMPBank, just as we would for research
data packages from DataStage

Initially, I thought we would need to develop our own CERIF-compliant
ontology, DaMO, the Data Management Ontology, to permit the creation of
appropriate metadata to accompany DMPs

However, I was delighted to learn that the CERIF Linked Data Task Group
was in the process of mapping CERIF to RDF, and so we now wish to align
with that effort, rather than re-inventing the wheel

As part of that, Silvio Peroni and I last month developed CERRO, the CERIF
Roles and Relationships Ontology, and suggested some additions to the
draft CERIF and SEMCERIF ontologies, as contributions to the LD effort

I hope to have the opportunity to present CERRO in tomorrow’s workshop
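
The idea of treating a DMP as ‘data’ and bundling it with an RDF manifest can be sketched as follows. This is a rough illustration under stated assumptions: the file names `dmp.pdf` and `manifest.ttl` are hypothetical, a plain zip archive stands in for the package, and no attempt is made to reproduce the actual SWORD v2 deposit protocol or the DataBank package layout.

```python
import io
import zipfile


def build_package(dmp_bytes, manifest_turtle):
    """Bundle a DMP document and its RDF manifest into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("dmp.pdf", dmp_bytes)             # hypothetical file name
        archive.writestr("manifest.ttl", manifest_turtle)  # hypothetical file name
    return buffer.getvalue()


# Placeholder content only; a real package would carry the actual plan and metadata.
package = build_package(b"placeholder DMP bytes", "# RDF metadata would go here\n")

# Re-open the archive to confirm that both members are present.
with zipfile.ZipFile(io.BytesIO(package)) as archive:
    print(archive.namelist())
```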
What machine-readable metadata do we need for DMPs?
N.B. These relate to the DMP itself, and the grant application and research project
to which it relates, not to the datasets that the funded research project will create


Data Management Plan

Author, DMP title, creation date, version number, identifier (DOI or URI)

Status after funding – DMP open, open if anonymized, or confidential
Potential funding source



Funding agency, URL, funding scheme / call, submission deadline
Grant application to which DMP relates

Lead applicant, lead applicant’s department

Co-applicant, co-applicant’s department

Title, university reference number, date submitted, decision date, decision
Funded research project to which DMP relates

Principal investigator, principal investigator’s department

Co-investigator, co-investigator’s department

Project title, funding agency, grant number, ref number, start date, end date
DMP metadata mapping to ontologies


Data management plan
fabio:DataManagementPlan

DMP author
dcterms:author, cerro:author

DMP title
dcterms:title

DMP creation date
fabio:hasCreationDate

DMP version number
prism:versionIdentifier

DMP identifier (DOI or URL)
prism:doi, fabio:hasURL

DMP status after funding:
pso:confidential, pso:open-access,
pso:anonymized
Potential funding source
cerif:Funding

Funding agency
cerif:FundingAgency

Funding agency’s URL
foaf:homepage

Name of funding scheme / call
cerif:hasCallIdentifier,
cerif:hasCallName,
cerif:hasFundingProgrammeName

Submission deadline
cerif:hasDeadline
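
To show how the mapping for the DMP itself might be turned into RDF, here is a minimal sketch that writes a Turtle description using a few of the terms listed above. Several things are assumptions: the namespace URIs are my best understanding of the published ones and should be checked against each vocabulary, the `ex:` namespace and the identifier `ex:dmp-001` are hypothetical, the sample values are invented for illustration, and dates are written as plain literals although a typed literal may be expected.

```python
# Namespace URIs are assumptions to be verified; ex: is a hypothetical namespace.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "fabio": "http://purl.org/spar/fabio/",
    "prism": "http://prismstandard.org/namespaces/basic/2.0/",
    "ex": "http://example.org/dmp/",
}


def literal(text):
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_turtle(subject, rdf_type, properties):
    """Serialise one typed resource and its (predicate, object) pairs as Turtle."""
    statements = [f"a {rdf_type}"] + [f"{pred} {obj}" for pred, obj in properties]
    lines = [f"@prefix {prefix}: <{uri}> ." for prefix, uri in PREFIXES.items()]
    lines.append("")
    lines.append(f"{subject} " + " ;\n    ".join(statements) + " .")
    return "\n".join(lines)


# Invented sample values, used only to exercise the mapping.
dmp = to_turtle(
    "ex:dmp-001",
    "fabio:DataManagementPlan",
    [
        ("dcterms:title", literal("Example data management plan")),
        ("fabio:hasCreationDate", literal("2012-02-09")),
        ("prism:versionIdentifier", literal("1")),
    ],
)
print(dmp)
```

The status terms (pso:open-access and so on) are left out because the slides do not name the property that would link a DMP to its status.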
DMP metadata mapping to ontologies (continued)

Funding application
cerif:FundingApplication

Lead applicant
cerro:applicant

Lead applicant’s department
semcerif:Department (a sub-class of
cerif:OrganizationalUnit)

Co-applicant
cerro:co-applicant

Co-applicant’s department
semcerif:Department

Grant application title
dcterms:title

University reference number
dcterms:identifier

Date submitted
cerif:hasSubmissionDate

Decision date
cerif:hasDecisionDate

Application status
cerif:hasStatus
(“accepted” / “rejected”)
DMP metadata mapping to ontologies (continued)

Funded research project
cerif:Project

Principal investigator
cerro:principal-investigator

PI’s department
semcerif:Department

Co-investigator
cerro:co-investigator

Co-investigator’s department
semcerif:Department

Research project title
cerif:hasProjectTitle

Funding agency
cerif:hasFundingAgency
cerif:FundingAgency

Funding agency’s grant number
cerif:hasGrantNumber

University reference number
dcterms:identifier

Project start date
cerif:startDate

Project end date
cerif:endDate

Project status
cerif:hasStatus
(“funded” / “unfunded”)
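
Since cerif:hasStatus takes different values for a funding application and for a project, a small check before the triple is written can catch mix-ups. The sketch below assumes the status is stored as a plain string literal restricted to the values quoted on the slides; the function name and the `ex:` subject are hypothetical.

```python
# Allowed status values per class, as quoted on the mapping slides.
ALLOWED_STATUS = {
    "cerif:FundingApplication": {"accepted", "rejected"},
    "cerif:Project": {"funded", "unfunded"},
}


def status_triple(subject, rdf_type, status):
    """Return a cerif:hasStatus triple, rejecting values not listed for the class."""
    allowed = ALLOWED_STATUS.get(rdf_type)
    if allowed is None:
        raise ValueError(f"no status values recorded for {rdf_type}")
    if status not in allowed:
        raise ValueError(f"{status!r} is not one of {sorted(allowed)} for {rdf_type}")
    return (subject, "cerif:hasStatus", f'"{status}"')


print(status_triple("ex:application1", "cerif:FundingApplication", "accepted"))
```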
Conclusion

The creation of RDF metadata to accompany data management plans is
straightforward

[Note: the mappings on the preceding slides have been updated since 10th Feb]

It uses CERRO, the new CERIF Roles and Relationships Ontology

It requires two new temporal data properties in FaBiO, the FRBR-aligned
bibliographic ontology

It requires two new statuses in PSO, the Publication Status Ontology

It requires a few new classes in the CERIF and SEMCERIF ontologies, which Silvio
Peroni and I have suggested in the following document:
SPAR – Semantic Publishing and Referencing
Ontologies
http://purl.org/spar/