“Onderzoeksprogramma Biodiversiteit werkt” Introduction Data Management Plan The Dutch government recognises the importance of open access to research data from public funding and is committed to the terms and conditions set by the OECD on this matter1&2. NWO is subject to the international commitments of the Dutch government but also recognises, as an independent science funding organisation, the importance of open access to research data for science and society. The Internet, and established and new digital information infrastructures, create an environment in which the value of generated research data can be significantly enhanced by allowing multiple users to find, see and use research data. NWO therefore signed the “Berlin Declaration”3 and actively supports access to research data by means of the Internet and digital information infrastructures that are relevant for the activities and programmes of NWO. With regards to specifically biodiversity data; the Netherlands have signed the Memorandum of Understanding of the Global Biodiversity Information Facility (GBIF)4 and are committed to the “free and open access to biodiversity data” objectives of this international network. In order to evaluate the output of research proposals in terms of the (electronic) accessibility of data, applicants are requested to include a Data Management Plan with their research proposal. The Data Management Plan is expected to describe the data that will be authored as well as how the data will be managed and made accessible throughout its lifetime. The data management plan should cover several general elements: - the types of data to be authored; - the standards that would be applied for primary data, metadata, etc.; - provisions for archiving and preservation; - access policies and provisions; - plans for eventual transition or termination of the data collection in the long-term future. This set of elements provides a basic view of what NWO is likely to expect in a data management plan. The specific sections, remarks and questions below act as a more detailed guideline. Information that is not mentioned in this document but is considered to be relevant for the Data Management Plan should obviously be mentioned. Please bear in mind that the Data Management Plan acts as an information source for NWO, not as an internal action plan to operationalise the stages of data management in your research. The guidelines provided in this document are partly based on documentation generated by the UK’s Digital Curation Centre5. OECD Declaration on Access to research Data from Public Funding, 30 March 2004 – C(2004)31/REV1 OECD Recommendation of the Council concerning Access to Research Data form Public Funding, 14 December 2006 – C(2006)184 3 Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities: http://oa.mpg.de/openaccess-berlin/berlin_declaration.pdf 4 GBIF Memorandum of Understanding (2007–2011): http://www2.gbif.org/mou07-11.pdf 5 The Digital Curation Centre, UK: http://www.dcc.ac.uk/ 1 2 -1- Requirements Data Management Plan 1) Introduction and context 1.1 1.2 1.3 Basic research information (title, summary, duration, partners etc.). What are the aims and purpose of the research? List of related data policies. E.g. is the research co-funded by another funding body with a specific data policy? Is the research subject to your institute’s data policy? Is the research subject to certain data restrictions because you work in protected area’s and / or with protected species? 2) Legal, rights and ethical issues 2.1 2.2 2.3 2.4 Who owns the Intellectual Property and Copyright? Is this set in your employers contract? How will the data be licensed? E.g. Creative Commons, attribution? What are the ethical and privacy issues? How will these be resolved? E.g. names of data collectors, anonymisation of data, consent agreements. What is the dispute resolution process and / or mechanism for mediation? 3) Access, data sharing and reuse 3.1 3.2 3.3 How will data be made available? What is the process for gaining access? Are any permissions & restrictions placed on data? - Mention the use of data networks such as GBIF for species occurrence data, GenBank for genetic sequences, CBOL for DNA bar coding data, ILTER for ecological data, etc. - Specify the resolution of data access, e.g. metadata only, data with blurred geographical info, data in full detail, etc. - Mention the technical accessibility and related process for access. Examples: listed on website and available upon request, downloadable from website, available through webservices, etc. Under what conditions are the data available? Mention any kind of legal procedures or agreements that are in place. How can / will the data be shared and re-use? Which bodies / groups are likely to be interested? What are the foreseeable uses? - Related to 3.1; mention through which networks and for which community the data will be specifically available. - Are there specific disciplines and/or research communities that will benefit from the publication of your (raw) data? - Are there specific none-scientific communities that will benefit from your published data? E.g. nature conservation, eco-tourism, gardeners, etc. Are there embargo periods? How is the release timeframe justified? Include note on right-of-first-use for original data collector / creator / investigator. E.g. data available after two year, data available after all primary publications, data release subject to institutional policy, etc. 4) Data collection / development methods 4.1 What does 'data' comprise for the research? Data description incl. volume, type, content to be created etc. - Examples: natural species occurrence data for indigenous trees, genetic data for mouse -2- 4.2 4.3 4.4 4.5 4.6 4.7 4.8 populations, plant community data in Dutch wetlands, capture re-capture data for fish populations in lake IJsselmeer, radio-tracking data for wild geese, historic natural history collection data, etc. Do mention the less obvious data that come with the biological data such as images, shape files, etc. - Try to quantify volumes of data, for example: data collection concerns 5 years of monthly population sampling at 25 sites with an estimated number of xx recordings throughout the research period - Taxonomic, geographical and temporal range of the data. (These are important features if you only release the metadata for your research.) For example: Family Mustelidae, The Netherlands and Belgium, 2010-2015 Have you surveyed existing data? (incl. 3rd party data) What can be used / extended? Are there any access issues? What is the ‘added value’ to reuse? Why does new data need to be created? What is the relationship between new dataset(s) and existing data? How will you manage interoperability i.e. what methods will be used to integrate the data being gathered in the project with pre- existing data sources? How will you capture / create the data? incl. content selection, instrumentation, technologies and approaches chosen, methods for naming, versioning etc. How will metadata and documentation be captured? What form will it take? What standards will be used? What contextual details are needed to make data meaningful? Mention the taxonomic hierarchy you have used for taxonomic ordering. Why have you chosen particular standards and approaches? E.g. recourse to staff expertise, Open Source, accepted domain-local standards, widespread usage, institute’s policy, etc. What criteria will be used for Quality Assurance / Management? E.g. ISO standards, documentation, calibration, taxonomic validation, general data validation, monitoring, transcription metadata, etc. Are you using Digital Object Identifiers (DOI’s), Life Science Identifiers (LSID’s), or any other kind of persistent identifiers for your data or datasets? 5) Data standards 5.1 5.2 5.3 5.4 5.5 Describe data types e.g. experimental measures, qualitative, raw, processed, etc. List all data standards you are specifically using, e.g. Ecological Metadata Language (EML) for ecological data, DarwinCore for species occurrence data, ISO 19136:2007 for geographic information, etc. Which file formats and platforms will be used and why? E.g. recourse to staff expertise, Open Source, accepted standards, widespread usage, etc. How do data creation decisions take account of end user needs? Have you thought about the data interoperability on forehand? Does the research takes account of specific data directives, such as the EU’s Inspire Directive6 for spatial information? 6) Short-term storage and data management 6.1 6.2 6 Anticipated data volumes? (Mega – Gigabytes?) Where will the data be stored? On what media? Who will be responsible? How will it be transmitted? (encryption if appropriate) - Mention the use of specific data storage packages, e.g. TurboVeg for plant community data. http://inspire.jrc.ec.europa.eu/ -3- 6.3 6.4 6.5 - Mention more general platform information, e.g. customised access database, oracle database, etc. How will access arrangements and data security be managed? How permissions, restrictions and embargoes are enforced? Note on sensitive data, storage on off- network mobile devices etc. How regularly, by whom, and how will data be backed up? Appraisal and retention timeframes, ideally with definite figures. (N.B. this may simply point to relevant institutional or funding body requirements / policies: political, temporal, commercial, legal.) 7) Deposit and long-term preservation 7.1 7.2 7.3 7.4 7.5 7.6 7.7 What is the long-term strategy for maintaining, curating and archiving the data? On what basis will data be selected for preservation? How long will data be kept? (ideally with definite figures) (N.B. this may simply point to relevant institutional or funding body requirements / policies: political, temporal, commercial, legal.) How will you dispose / transfer sensitive data? Justification of decisions. How will data be prepared for preservation / data sharing? (incl. anonymisation if appropriate) Where and how data will be archived e.g. deposit in public repository or existing community database? Transmission of data (encryption if appropriate) What related information will be deposited? - references, reports, research papers, fonts, original bid proposal, etc What metadata / documentation will be created at each stage of ingest / transformation? – descriptive, structural, administrative, preservation etc. How will this be created and by whom? What procedures are in place for preservation and backup? How regular, by whom, methods used? E.g. format normalisation, migration, etc. 8) Resourcing 8.1 8.2 Staff / organisational roles and responsibilities for implementing this data management plan, incl. time allocations, project management of technical aspects, contributions of non-project staff etc. Financial issues (e.g. payments to service providers within institutions, payments to external data centres for hosting data, income derived from licensing data, etc.) 9) Compliance and review 9.1 9.2 How adherence to this data management plan will be checked or demonstrated? How and when this data management plan will be reviewed? 10) Agreement / ratification by stakeholders (if appropriate) 10.1 Statement of agreement (with signatures if required) Cees H.J. Hof, Coordinator Netherlands Biodiversity Information Facility (NLBIF) Amsterdam, 29 August 2011 -4-