Data Management Planning
ESRC funding applicants
Version 1.0 July 2015
University of Bristol Research Data Service
Image: London 360 from St Paul's Cathedral, Wikimedia, Public Domain

SUMMARY

- After funding is awarded, grant holders are required to seek further advice and guidance from the UK Data Service.
- It is expected that ESRC-funded research data will be deposited with the UK Data Service or, in the case of small datasets, with either an ESRC data service provider or a responsible repository within three months of the end of a grant.
- A Data Management Plan (DMP) is typically required at the funding application stage.
- Within the DMP, barriers to data sharing, along with any measures you plan to take to overcome them, should be identified.
- For sensitive data, explicit mention of consent, anonymisation and potential access restrictions (see ESRC’s Framework for Research Ethics) should be made in the DMP. If the data could be highly sensitive, consider deposit with UK Data Service Secure Lab.
- ESRC expects researchers to investigate copyright issues and to attempt to gain copyright clearance so that data can be shared at the end of the project. Research Enterprise and Development[1] can assist with copyright issues.
- Documentation should be provided alongside the data so others can understand it. A metadata record should be created upon deposit of the data and a persistent identifier obtained so data can be formally cited.

INTRODUCTION

It is a widely held view that publicly-funded research data is a public good, produced in the public interest and should, whenever possible, be openly available for secondary scientific research. This guide is intended for Economic and Social Research Council (ESRC) applicants who are required to submit a Data Management Plan (DMP) along with their application.

The ESRC research data policy[2] consists of nine underlying principles which align with the RCUK Common Principles on Data Sharing.[3] Like many other funding bodies, the ESRC expects grant holders, whether partially or wholly funded, to generate robust data, ready for re-use and long-term preservation. Academic publishers also increasingly require that data which underpins a published research output (a journal article for instance) should be made available for validation purposes.

A Data Management Plan (DMP), along with any associated data management costs, is an integral part of all grant applications made to the ESRC (except for applicants applying for studentships) and should be submitted alongside your main Je-S application. Your DMP should explain how you’ll manage any research data that you plan to use or create. An assessment of the DMP will be made as part of the general assessment of your application. A poorly prepared DMP may have a detrimental effect on an otherwise strong application. Your DMP should describe:

- any intentions you have for re-using existing data, or justification of why new data needs to be generated
- how data will be shared and potential barriers to data sharing, along with any measures you plan to take to overcome these difficulties
- consent, confidentiality, anonymisation and any other appropriate ethical considerations
- the data formats you intend to use, along with a brief explanation of why you’ve chosen them
- the volume of data you expect to create
- methodologies for data collection and/or processing
- data quality assurance procedures and storage/security arrangements
- plans to address copyright and intellectual property ownership of the data
- documentation and metadata, including relevant standards
- the individuals with responsibility for implementing your DMP, and how it will be monitored and developed

Re-use of existing data

When assessing your grant application, ESRC reviewers will be looking for evidence that you have considered and evaluated secondary sources of data before considering primary research. The ESRC evaluate equally all applications for funding on the basis of scientific quality, regardless of whether the research intends to re-use existing data or to create new data. If you are planning to create new data, then you should include in your DMP an analysis of the gaps identified between available and required data to show why new data is necessary.

Data sharing

At the close of a funded project the ESRC data policy stipulates that your research data must be formally deposited with the UK Data Service within three months of the end of the grant. The ESRC will withhold final grant payments as a penalty for not doing so.

Smaller datasets (for example a subset of data which supports claims made in a journal article) may be lodged with an ESRC data service provider, or an appropriate responsible digital repository such as the University of Bristol Research Data Repository.[4] If using a non-ESRC data service provider, it is the grant holder’s responsibility to ensure a persistent identifier (such as a DOI) is provided for the data and to inform the UK Data Service of the published location. A project metadata record should also be created in the UK Data Service’s ReShare repository to maximise the discoverability of the data. There are many benefits to depositing research data with one of the ESRC data services, including the active promotion of your research and services available for dealing with sensitive data.

[1] Research Enterprise and Development, http://www.bristol.ac.uk/red/researchcommercial/copyright.html
[2] ESRC Research Data Policy. 2015, http://www.esrc.ac.uk/files/about-us/policies-and-standards/esrcresearch-data-policy/
[3] RCUK Common Principles on Data Sharing, http://www.rcuk.ac.uk/research/datapolicy/
[4] University of Bristol Research Data Repository, http://data.bris.ac.uk/data/

You must provide a statement on data sharing in the relevant section of the Je-S application form. Your DMP should indicate exactly how this sharing will be achieved. Describe your plans to deposit your data with an ESRC data service or any other repository or give reasons why this is not possible. The ESRC will allow an embargo period on data (generally no longer than 12 months from the end of a grant) in order to allow grant holders to publish their research findings. If you plan to use an embargo period, state this in your DMP.

While the re-use of data is very much encouraged, it is recognised that some research data will be sensitive and unsuitable for sharing. It is the responsibility of the researcher to consider confidentiality, ethics, security and copyright before beginning any ESRC-funded research. It may be that parts of the data that are sensitive cannot be shared, but the remainder can. You should read the ESRC’s Framework for Research Ethics,[5] and anticipate and address any likely barriers to data sharing. More guidance is available from our document Sharing Research Data Concerning Human Participants.[6]

If you believe that your research data cannot be shared at all, you must provide justification for this. Waivers of deposit to ESRC data services are exceptional, and the ESRC reserves the right to refuse waivers if there is insufficient evidence that the applicant has fully explored all strategies to enable data sharing. In particular, the ESRC requires researchers to demonstrate due diligence in three areas before it will consider a waiver:

- including consent for data sharing when gaining informed consent (see below)
- protecting participants’ identities, where needed, by anonymising data
- considering data access restrictions in the DMP

In addition to the UK Data Service,[7] the Secure Lab[8] has been established to promote excellence in research by enabling controlled access to data deemed too sensitive or confidential to be made openly available. If you think your data will be suited for deposit with the UK Data Service Secure Lab, you should contact them directly for confirmation before finalising your DMP.

Consent

Obtaining permission to publish data from human research participants is essential even if data is to be anonymised before publication. This is because some risk of re-identification may remain, even after anonymisation, and participants should be made aware that others outside of the research project may be able to view this data. Also, even if a participant has the right to withdraw from a study, it may not be possible to remove their data: the ESRC Framework for Research Ethics states that all research should indicate the point at which data will have been anonymised and amalgamated and in certain circumstances cannot then be excluded. If you will be gaining consent from participants for your research you should read the ESRC Framework for Research Ethics, which contains guidelines on consent for publishing data. The Research Data Service has also produced a guide to sharing data involving human participants, which includes sample statements for consent forms.

Data formats

As part of your DMP you should state in which format(s) your data will be collected, analysed and stored (for example, Open Document Format, CSV file or Excel spreadsheet). Your own research needs must come first in selecting a data format. If you find that you do need to use a non-standard format, you should consider converting your data to a more widely re-usable format once your own data analysis is complete. For example, if you intend to use analysis software such as NVivo, you should mention in your DMP that your data will be exported at the end of the project in the widely accepted forms of text files, spreadsheets and XML. If you’re unsure which file formats to use, the UK Data Archive publishes a list of recommended deposit formats.[9] These formats may also be appropriate for use throughout your research.

Open and proprietary technologies

A major barrier to data sharing is the widespread use of non-standard, highly specialised file formats. In order to use any digital file, a number of digital technologies must be available, which are known as technological ‘dependencies’. These may be fairly common technologies such as a desktop PC, the Windows 7 operating system and Adobe Reader 9 software. Or the technology required to access data might be rare and hard to acquire or even unique. You should address this problem by minimising the number of technological dependencies involved in using your data.

Where dependencies are inevitable you should favour ‘open’ technologies rather than proprietary ones. Proprietary technologies are owned by a vendor or group of vendors. Commercial pressures may lead to the withdrawal of a particular piece of commercial hardware or software, in favour of a new and possibly incompatible replacement. In contrast, ‘open’ technologies are supported by a community of users and do not have the same commercial vulnerabilities.

Your Case for Support should describe the actions you plan to take to ensure the quality of your proposed research activities as a whole. The DMP is only concerned with the quality of your research data. Quality should be considered whenever data is created or altered, for instance at the time of data collection, data entry or digitisation. You should provide information about the procedures you will carry out to ensure that data quality is maintained, such as allocating time to validate data or entering values into prepared databases or transcription templates. Interview software can also help by verifying consistency and detecting inadmissible responses.

Copyright and Intellectual Property

If you are planning to use existing data as part of your research, the data may be subject to copyright or other restrictions which could prevent you from sharing any new data you derive from it. The ESRC will expect applicants to investigate these issues and to attempt to gain copyright clearance so that your data can be shared at the end of your project. You should give full and appropriate acknowledgement, via citation, for any existing data that you use.

Unless stated otherwise, the ownership of intellectual property lies with the organisation carrying out the research. If you plan to work collaboratively with an external partner, copyright and IPR issues should be clarified in a Consortium Agreement. This isn’t required as part of your application, but it should be mentioned that if the application is successful such an agreement will be created. All partners should be aware before applying for funding that a Consortium Agreement will be forthcoming. Research Enterprise and Development[10] prepare Consortium Agreements and can advise on other IPR issues.

Backup and data security

It is recommended that, as you make data, you store it in the University’s Research Data Storage Facility (RDSF) managed by the Advanced Computing Research Centre.[11] Each research staff member is entitled to 5TB of storage without charge. If your storage quota is used up, or your project requires more storage space, there will be a cost and ACRC should be contacted for guidance before your application is finalised. The back-up procedures, policies and controlled access arrangements used by the RDSF are of a very high standard. If you do not intend to make use of RDSF, your storage provider’s back-up procedures should be described instead. If you will be working collaboratively with other institutions, make sure that the security and back-up procedures of each data holding partner are described within the DMP.

Your DMP should also describe how you’ll keep your data safe before it’s deposited with a storage facility such as the RDSF. This is particularly important if you’re conducting field research. As a minimum requirement, try to ensure at all times at least two copies of the data exist and that every copy can easily be accounted for and located if required.

ESRC grant holders must adhere to the requirements of the Data Protection Act 1998. If you plan to handle sensitive, personal data, extra security measures must be considered. The Office of the University Secretary[12] can provide more advice on observing Data Protection legislation.

Organising and describing data

Metadata is ‘data about data’ and is information (or cataloguing information) that enables data users to find and/or use a dataset. In your DMP you should outline plans for documenting your research data, to meet both your own needs and those of later users. The ESRC expects documentation to include information such as data origin, fieldwork and collection methods, and any processing of the data. Descriptions of your data could be kept in a separate, dedicated database or in a spreadsheet. If you’re planning to use data analysis software, such as a qualitative analysis package, you will also have the option of adding documentation within the software itself in the form of notes, memos, nodes or classifications.

When your data is deposited with an ESRC data service provider or responsible repository, you will be expected to complete a standardised metadata record. In some cases you will be expected to use metadata standards, such as the Data Documentation Initiative (DDI) specifically developed for the social sciences. Whilst the ESRC allow a period of privileged use for collected data, they still expect a metadata record to be published at the earliest opportunity, including details of how and when the data can be accessed.

You should also outline within your DMP how you’ll name files and folders to make sure that you and others have appropriate access. You should describe how you will keep track of different versions of documents (for instance, by adding version information to the first page of each Word document and by setting a folder aside for definitive, ‘milestone’ versions of documents).

In attempting to organise and document your data, it may help to imagine a secondary data user trying to make sense of your data in your absence, after your project has concluded. If presented with only the data itself, this secondary user may be faced with the difficult task of ‘unpicking’ it. How will they make sense of your file and folder naming conventions? What extra information would they need to make maximum use of your data?

DMP development

Once funding has been awarded, grant holders are expected to implement their DMP from the first planning stages of the project, as well as seeking advice and guidance from the UK Data Service to clarify how plans to deal with confidentiality and data sharing are to be implemented in practice. In addition to this, where relevant the grant holder is expected to report on the ongoing implementation of the DMP to ESRC. Any issues arising during ESRC-funded research that could impact on data sharing must be raised with your assigned ESRC case officer as soon as possible.

Roles and responsibilities

Data management responsibilities should be clearly assigned to named individuals in your DMP. In collaborative research projects, several individuals from different institutions can be named if appropriate. Plans described here should tally with the ‘Staff Duties’ and ‘Justification of Resources’ sections in the main Je-S application form. Several supporting services are in place at Bristol to help you manage your research data, and any of these which you plan to use should be mentioned in your DMP.

These services include: ACRC (data storage), your Zonal IT team (everyday IT support), the data.bris service (research data management training and general data management guidance), RED (Consortium Agreements for collaborative research and IPR) and the Office of the Secretary (for Data Protection and FOI).

[5] ESRC Framework for Research Ethics. 2015, http://www.esrc.ac.uk/files/funding/guidance-for-applicants/esrc-framework-for-research-ethics-2015/
[6] https://data.bris.ac.uk/files/2015/02/Publicationsensitive.pdf
[7] UK Data Service, http://ukdataservice.ac.uk/
[8] UK Data Service Secure Lab, http://ukdataservice.ac.uk/get-data/how-to-access/accesssecurelab.aspx
[9] UK Data Archive File Formats Table, http://www.data-archive.ac.uk/create-manage/format/formats-table
[10] Research Enterprise and Development, http://www.bristol.ac.uk/red/contracts/
[11] Advanced Computing Research Centre, https://www.acrc.bris.ac.uk/
[12] Office of the University Secretary, http://www.bris.ac.uk/secretary/dataprotection/

_______________________________________

CITING RESEARCH DATA IN RESEARCH OUTPUTS

From 1st April 2013 all the UK’s Research Funding Councils, as part of RCUK, require research outputs (e.g. journal articles) to provide a means by which third parties can access any underpinning research datasets.

The ESRC expects all grant holders to deposit data at the same time as outputs (e.g. journal articles) are published, and to use repositories which provide persistent identifiers for datasets (such as a DOI) which can be formally cited. A Digital Object Identifier or DOI printed in a paper will lead an enquirer to a specific webpage where either the data is directly available, or that contains details of how the data can be accessed.

Given the extended timescales involved in the publication process, it is strongly recommended that the authors of published academic outputs do not provide their current contact details as a means by which underpinning research data may be accessed, as these will change over time.
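As an illustration of how a DOI leads an enquirer to a landing page, the short Python sketch below turns a DOI into a web address using the public https://doi.org/ resolver. The function name and the DOI shown are hypothetical and chosen only for illustration; a real DOI would be issued by your repository when the data is deposited.

```python
# Minimal sketch: build a resolver URL from a DOI.
# The helper name and the example DOI are assumptions, not an official tool.

KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org/ address for a DOI given in a common form."""
    cleaned = doi.strip()
    # Accept "doi:10.x/y" and full resolver URLs as well as a bare DOI.
    for prefix in KNOWN_PREFIXES:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):].strip()
            break
    # Every DOI starts with "10." and has a "/" between prefix and suffix.
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + cleaned


if __name__ == "__main__":
    # Hypothetical DOI, for illustration only.
    print(doi_to_url("doi:10.1234/example.5678"))
```

Citing the address in this form, rather than a personal email address or project website, means the link should keep working even if the authors move institution.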
SAMPLE ESRC DATA MANAGEMENT PLAN

INTRODUCTION

The following is intended as an illustration of an ESRC Data Management Plan. It is drawn from a real world ESRC proposal prepared by the University of Bristol Law School. The plan is made public with the kind permission of the applicant, Dr Margherita Pieraccini.

Further costing and ethical issues relating to the proposal were covered in the wider ‘Case for Support’. This document is not available; however the following statement from the Case for Support explains the nature of the planned digital outputs and how they relate to the wider research questions:

“Bringing forth the different values of nature will be done primarily through a series of workshops in three different case studies areas where biodiversity offsetting has been considered. The case-studies - deploying workshops as fora for experiential learning, environmental democratisation and reflexivity - will attempt to bring together the various actors involved and/or affected in the planning processes. All the workshops will be sites not just for the collection but for the co-creation/co-production of knowledge, seeking to locate, using different means, diverse perceptions of nature and values and exploring ways in which these can be integrated to produce a more legitimate biodiversity offsetting strategy. By hosting many of these co-production elements of the research at the sites of the development, values will not only be articulated by people’s conceptualisation of the issues but also by the places themselves, making biodiversity itself integral to the co-production of new biodiversity offsetting strategies, in line with a ‘more-than-human’ approach. The workshops will be supplemented by semi-structured interviews and by extensive documentary analysis.”

SAMPLE DATA MANAGEMENT PLAN

The data management and data sharing plan for the project will adhere to the RCUK Common Principles on Data Policy and the ESRC Research Data Policy.
Specifically, we will aim to maximise transparency and accountability, enable scrutiny of any data generated, increase the impact and visibility of the research and address any barriers to access to data compatible with full ethics compliance.

1. Roles and Responsibilities

The PI has previous experience in managing data similar to those that will be generated in this project due to her participation in the AHRC-funded Contested Common Land project, running from 2007 to 2010, and her current ESRC Future Research Leaders Ecologies and Identities project, running from 2012 to 2015. She has also completed the University of Bristol data security tutorial online. The PI therefore has the capabilities to oversee all data management activities. The RA will assist with the collection of data and the data analysis will be a task shared by the PI, CO-I and RA.

Prior to any data collection taking place, the PI will seek the School of Law ethical approval for all aspects of the project including data management. This aspect of the project will therefore also be scrutinised by the Research Ethics Committee of the Faculty of Social Sciences and Law, on which independent members serve (in accordance with ESRC requirements). Should any data management difficulties of an ethical nature arise during the course of the research, the PI will seek the Research Ethics Committee’s advice.

2. Assessment of existing Data

Considering the novelty of the subject to be studied, the qualitative questions the proposed research asks and the transformative methodology proposed, there are no existing resources that can be re-used to explore the subject of the proposed research. This has been confirmed by searching the online catalogue of the Economic and Social Data Service (http://www.esds.ac.uk/Lucene/Search.aspx), which has not identified any existing dataset containing such material.
The proposed research is therefore innovative and will contribute to the development of a socio-legal data set on biodiversity offsetting in the UK.

3. Information on New Data and Quality Assurance

3.1 Typology of Data

Data collected during the empirical stage of the project will be of a qualitative nature and will include:

1) Workshops digital recording and transcripts
2) Interviews digital recording and transcripts
3) Written documents and material objects (e.g. maps, pictures) generated for and during the Workshops
4) Participant observation notes created by the PI and RA during the empirical work

3.2 Format

For the formatting of data the formats recommended by the UK Data Archive for long-term preservation of qualitative data, digital audio data and documentation will be used (see table p. 12 at http://www.dataarchive.ac.uk/media/2894/managingsharing.pdf).

3.3 Quality Assurance

A check of recording equipment and battery life will be carried out before interviews and workshops. The accuracy of transcription will be checked by the RA by reading the completed transcript whilst listening to the recording. Quality will be ensured also through a peer observation of the workshops. The PI and RA will conduct the workshops together. The RA will conduct all the interviews but will be accompanied by the PI in the first 3 interviews.

3.4 Ethical considerations

Any confidential data where consent for its use has not been given by informants will only be used as background research and will not be made public or placed in the ESRC data repository. All anonymised and semi-anonymised data and data for which we have consent for public attribution will be made public and placed in the ESRC data repository. For ethical issues regarding data collection, i.e. gaining informed consent and anonymising data, please refer to the Ethics section of the Je-S application.

4. Storage and Sharing of Data

4.1 Back-up Storage Facility

All electronic data, textual or audio, created will be stored on the University of Bristol’s dedicated Research Data Storage Facility (RDSF) (https://www.acrc.bris.ac.uk/storage.htm), which provides an integrated resilient petascale facility in which 5TB of disk storage per Data Steward is free of charge for staff. This two million pound investment provides nightly backup of all data, with further resilience provided by three geographically distinct storage locations. A tape library is used for backup purposes and also for long-term, offline data storage. Only authorised users can access data stored within the RDSF. The RDSF is managed by Bristol's Advanced Computing Research Centre (ACRC) which has a dedicated steering group and a rigorous data storage policy (https://www.acrc.bris.ac.uk/acrc/RDSF_policy.pdf).

All electronic files will be password protected and encrypted using university-supplied encryption software. Any hardcopy documents will be kept under lock and key in the University offices of the project team. During field work, any data created will be encrypted and stored on University-approved hardware until it can be uploaded to the RDSF. Procedures are also in place to allow authenticated, external collaborators to view, add and/or edit data in the RDSF, which will be utilised by the project.

4.2 Access and Data sharing

The policies developed for use of the RDSF address the holding of sensitive data and Freedom of Information Act 2000 requests. We will be able to limit the number of people who can access our data stored in the RDSF, by telling the RDSF who can access the data and providing relevant IP addresses. In line with the Data Protection Act 1998, data from the interviews and workshops will be anonymised to remove personal information and the consent of the interviewees gained before making the data available for re-use by other researchers.
Sensitive data will be classed as strictly confidential and I will make sure that only the core research team (PI, CO-I and RA) are able to access them (see also the Ethics section of the Je-S application).

4.3 Economic and Social Data Service

We will make sure that the data will be offered to the Economic and Social Data Service for archiving within three months of the project ending, and the data documentation will be prepared according to the UK Data Archive best practice guidance.

5. Costing data management

The costs of data collection (including equipment), analysis, and sharing have all been accounted for in the Justification of Resources attachment. As for the costs of storage, RDSF provides up to 5TB of disk storage per Data Steward free of charge (as explained above). Data collected during the course of our project will not exceed the 5TB limit so I will not be charged for using the facility.
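A statement that data will stay within a storage allowance, such as the 5TB limit above, can be backed up with a rough estimate. The Python sketch below is not part of the original plan: the number of recording hours and the uncompressed audio settings are hypothetical assumptions, used only to show the arithmetic, and the 5TB allowance is taken as decimal terabytes.

```python
# Rough storage estimate for audio recordings. All inputs are hypothetical.

SAMPLE_RATE_HZ = 44_100    # assumed CD-quality sampling rate
BYTES_PER_SAMPLE = 2       # assumed 16-bit samples
CHANNELS = 2               # assumed stereo recording
QUOTA_BYTES = 5 * 10**12   # 5TB free allowance, read as decimal terabytes


def wav_bytes(hours):
    """Approximate size in bytes of uncompressed PCM audio of a given length."""
    bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
    return int(hours * 3600 * bytes_per_second)


def estimate(workshop_hours, interview_hours):
    """Total recording volume and whether it fits within the allowance."""
    total = wav_bytes(workshop_hours + interview_hours)
    return {
        "total_bytes": total,
        "total_gb": total / 10**9,
        "quota_fraction": total / QUOTA_BYTES,
        "within_quota": total <= QUOTA_BYTES,
    }


if __name__ == "__main__":
    # Hypothetical fieldwork: 30 hours of workshops and 60 hours of interviews.
    result = estimate(workshop_hours=30, interview_hours=60)
    print(f"{result['total_gb']:.1f} GB, within quota: {result['within_quota']}")
```

Under these assumed figures the recordings come to roughly 57 GB, a small fraction of the allowance. Transcripts and documents would add to the total, so a real estimate should include every type of data listed in the plan.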