
Data Management Planning
ESRC funding applicants
Version 1.0 July 2015
University of Bristol
Research Data Service
Image: London 360 from St Paul's Cathedral, Wikimedia, Public Domain

SUMMARY

After funding is awarded, grant holders are required to seek further advice and guidance from the UK Data Service.

It is expected that ESRC-funded research data will be deposited with the UK Data Service or, in the case of small datasets, with either an ESRC data service provider or a responsible repository within three months of the end of a grant.

A Data Management Plan (DMP) is typically required at the funding application stage.

Within the DMP, barriers to data sharing, along with any measures you plan to take to overcome them, should be identified.

For sensitive data, explicit mention of consent, anonymisation and potential access restrictions (see ESRC’s Framework for Research Ethics) should be made in the DMP. If the data could be highly sensitive, consider deposit with UK Data Service Secure Lab.

ESRC expects researchers to investigate copyright issues and to attempt to gain copyright clearance so that data can be shared at the end of the project. Research Enterprise and Development1 can assist with copyright issues.

Documentation should be provided alongside the data so others can understand it. A metadata record should be created upon deposit of the data and a persistent identifier obtained so data can be formally cited.

INTRODUCTION

It is a widely held view that publicly-funded research data is a public good, produced in the public interest and should, whenever possible, be openly available for secondary scientific research. This guide is intended for Economic and Social Research Council (ESRC) applicants who are required to submit a Data Management Plan (DMP) along with their application.

The ESRC research data policy2 consists of nine underlying principles which align with the RCUK Common Principles on Data Sharing.3 Like many other funding bodies, the ESRC expects grant holders, whether partially or wholly funded, to generate robust data, ready for re-use and long-term preservation. Academic publishers also increasingly require that data which underpins a published research output (a journal article for instance) should be made available for validation purposes.

A Data Management Plan (DMP), along with any associated data management costs, is an integral part of all grant applications made to the ESRC (except for applicants applying for studentships) and should be submitted alongside your main Je-S application. Your DMP should explain how you’ll manage any research data that you plan to use or create. An assessment of the DMP will be made as part of the general assessment of your application. A poorly prepared DMP may have a detrimental effect on an otherwise strong application. Your DMP should describe:

• any intentions you have for re-using existing data, or justification of why new data needs to be generated
• how data will be shared and potential barriers to data sharing, along with any measures you plan to take to overcome these difficulties
• consent, confidentiality, anonymisation and any other appropriate ethical considerations
• the data formats you intend to use, along with a brief explanation of why you’ve chosen them
• the volume of data you expect to create
• methodologies for data collection and/or processing
• data quality assurance procedures and storage/security arrangements
• plans to address copyright and intellectual property ownership of the data
• documentation and metadata, including relevant standards
• individuals with responsibility for implementing your DMP, and how it will be monitored and developed

Re-use of existing data

When assessing your grant application, ESRC reviewers will be looking for evidence that you have considered and evaluated secondary sources of data before considering primary research. The ESRC evaluate equally all applications for funding on the basis of scientific quality, regardless of whether the research intends to re-use existing data or to create new data. If you are planning to create new data, then you should include in your DMP an analysis of the gaps identified between available and required data to show why new data is necessary.

Data sharing

At the close of a funded project the ESRC data policy stipulates that your research data must be formally deposited with the UK Data Service within three months of the end of the grant. The ESRC will withhold final grant payments as a penalty for not doing so.

Smaller datasets (for example a subset of data which supports claims made in a journal article) may be lodged with an ESRC data service provider, or an appropriate responsible digital repository such as the University of Bristol Research Data Repository.4 If using a non-ESRC data service provider, it is the grant holder’s responsibility to ensure a persistent identifier (such as a DOI) is provided for the data and to inform the UK Data Service of the published location. A project metadata record should also be created in the UK Data Service’s ReShare repository to maximise the discoverability of the data. There are many benefits to depositing research data with one of the ESRC data services, including the active promotion of your research and services available for dealing with sensitive data.

1 Research Enterprise and Development, http://www.bristol.ac.uk/red/researchcommercial/copyright.html
2 ESRC Research Data Policy. 2015, http://www.esrc.ac.uk/files/about-us/policies-and-standards/esrcresearch-data-policy/
3 RCUK Common Principles on Data Sharing, http://www.rcuk.ac.uk/research/datapolicy/
4 University of Bristol Research Data Repository, http://data.bris.ac.uk/data/
You must provide a statement on data sharing in the relevant section of the Je-S application form. Your DMP should indicate exactly how this sharing will be achieved. Describe your plans to deposit your data with an ESRC data service or any other repository or give reasons why this is not possible. The ESRC will allow an embargo period on data (generally no longer than 12 months from the end of a grant) in order to allow grant holders to publish their research findings. If you plan to use an embargo period state this in your DMP.

While the re-use of data is very much encouraged, it is recognised that some research data will be sensitive and unsuitable for sharing. It is the responsibility of the researcher to consider confidentiality, ethics, security and copyright before beginning any ESRC-funded research. It may be that parts of the data that are sensitive cannot be shared, but the remainder can. You should read the ESRC’s Framework for Research Ethics,5 and anticipate and address any likely barriers to data sharing. More guidance is available from our document Sharing Research Data Concerning Human Participants.6

If you believe that your research data cannot be shared at all, you must provide justification for this. Waivers of deposit to ESRC data services are exceptional, and the ESRC reserves the right to refuse waivers if there is insufficient evidence that the applicant has fully explored all strategies to enable data sharing. In particular, the ESRC requires researchers to demonstrate due diligence in three areas before it will consider a waiver:

• when gaining informed consent, include consent for data sharing (see below)
• where needed, protecting participants’ identities by anonymising data
• considering data access restrictions in the DMP

In addition to the UK Data Service,7 the Secure Lab8 has been established to promote excellence in research by enabling controlled access to data deemed too sensitive or confidential to be made openly available. If you think your data will be suited for deposit with the UK Data Service Secure Lab, you should contact them directly for confirmation before finalising your DMP.

Consent

Obtaining permission to publish data from human research participants is essential even if data is to be anonymised before publication. This is because some risk of re-identification may remain, even after anonymisation, and participants should be made aware that others outside of the research project may be able to view this data. Also, even if a participant has the right to withdraw from a study, it may not be possible to remove their data: the ESRC Framework for Research Ethics states that all research should indicate the point at which data will have been anonymised and amalgamated and in certain circumstances cannot then be excluded. If you will be gaining consent from participants for your research you should read the ESRC Framework for Research Ethics, which contains guidelines on consent for publishing data. The Research Data Service has also produced a guide to sharing data involving human participants, which includes sample statements for consent forms.

5 ESRC Framework for Research Ethics. 2015, http://www.esrc.ac.uk/files/funding/guidance-for-applicants/esrc-framework-for-research-ethics-2015/
6 https://data.bris.ac.uk/files/2015/02/Publicationsensitive.pdf
7 UK Data Service, http://ukdataservice.ac.uk/
8 UK Data Service Secure Lab, http://ukdataservice.ac.uk/get-data/how-to-access/accesssecurelab.aspx

Data formats

As part of your DMP you should state in which format(s) your data will be collected, analysed and stored (for example, Open Document Format, CSV file or Excel spreadsheet). Your own research needs must come first in selecting a data format. If you find that you do need to use a non-standard format, you should consider converting your data to a more widely re-usable format once your own data analysis is complete. For example, if you intend to use analysis software such as NVivo, you should mention in your DMP that your data will be exported at the end of the project in the widely accepted forms of text files, spreadsheets and XML. If you’re unsure which file formats to use, the UK Data Archive publishes a list of recommended deposit formats.9 These formats may also be appropriate for use throughout your research.

Open and proprietary technologies

A major barrier to data sharing is the widespread use of non-standard, highly specialised file formats. In order to use any digital file, a number of digital technologies must be available, which are known as technological ‘dependencies’. These may be fairly common technologies such as a desktop PC, the Windows 7 operating system and Adobe Reader 9 software. Or the technology required to access data might be rare and hard to acquire or even unique. You should address this problem by minimising the number of technological dependencies involved in using your data.

Where dependencies are inevitable you should favour ‘open’ technologies rather than proprietary ones. Proprietary technologies are owned by a vendor or group of vendors. Commercial pressures may lead to the withdrawal of a particular piece of commercial hardware or software, in favour of a new and possibly incompatible replacement. In contrast, ‘open’ technologies are supported by a community of users and do not have the same commercial vulnerabilities.

Your Case for Support should describe the actions you plan to take to ensure the quality of your proposed research activities as a whole. The DMP is only concerned with the quality of your research data. Quality should be considered whenever data is created or altered, for instance at the time of data collection, data entry or digitisation. You should provide information about the procedures you will carry out to ensure that data quality is maintained, such as allocating time to validate data or entering values into prepared databases or transcription templates. Interview software can also help by verifying consistency and detecting inadmissible responses.
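
As an illustration of the data quality checks described above, the following is a minimal sketch of how inadmissible responses might be detected in a CSV export. The column names, the codebook and the sample rows are all hypothetical and are not taken from any ESRC or UK Data Service requirement; your own checks should reflect your own instruments.

```python
import csv
import io

# Hypothetical codebook: column name -> set of admissible values.
CODEBOOK = {
    "consent_given": {"yes", "no"},
    "age_band": {"18-24", "25-44", "45-64", "65+"},
}


def find_inadmissible(rows, codebook):
    """Return (row_number, column, value) for every value not in the codebook."""
    problems = []
    for row_number, row in enumerate(rows, start=1):
        for column, allowed in codebook.items():
            value = (row.get(column) or "").strip()
            if value not in allowed:
                problems.append((row_number, column, value))
    return problems


# Hypothetical sample data standing in for a real export.
SAMPLE = """participant_id,consent_given,age_band
P001,yes,25-44
P002,maybe,45-64
P003,no,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
problems = find_inadmissible(rows, CODEBOOK)
for row_number, column, value in problems:
    print(f"row {row_number}: {column!r} has inadmissible value {value!r}")
```

A check like this could be run each time data is entered or altered, and the procedure mentioned briefly in the DMP.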
9 UK Data Archive File Formats Table, http://www.data-archive.ac.uk/create-manage/format/formats-table
Copyright and Intellectual Property

If you are planning to use existing data as part of your research, the data may be subject to copyright or other restrictions which could prevent you from sharing any new data you derive from it. The ESRC will expect applicants to investigate these issues and to attempt to gain copyright clearance so that your data can be shared at the end of your project. You should give full and appropriate acknowledgement, via citation, for any existing data that you use.

Unless stated otherwise, the ownership of intellectual property lies with the organisation carrying out the research. If you plan to work collaboratively with an external partner, copyright and IPR issues should be clarified in a Consortium Agreement. This isn’t required as part of your application, but it should be mentioned that if the application is successful such an agreement will be created. All partners should be aware before applying for funding that a Consortium Agreement will be forthcoming. Research Enterprise and Development10 prepare Consortium Agreements and can advise on other IPR issues.

Backup and data security

It is recommended that, as you make data, you store it in the University’s Research Data Storage Facility (RDSF) managed by the Advanced Computing Research Centre.11 Each research staff member is entitled to 5TB of storage without charge. If your storage quota is used up, or your project requires more storage space, there will be a cost and ACRC should be contacted for guidance before your application is finalised. The back-up procedures, policies and controlled access arrangements used by the RDSF are of a very high standard. If you do not intend to make use of RDSF, your storage provider’s back-up procedures should be described instead. If you will be working collaboratively with other institutions, make sure that the security and back-up procedures of each data holding partner are described within the DMP.

Your DMP should also describe how you’ll keep your data safe before it’s deposited with a storage facility such as the RDSF. This is particularly important if you’re conducting field research. As a minimum requirement, try to ensure at all times at least two copies of the data exist and that every copy can easily be accounted for and located if required.

ESRC grant holders must adhere to the requirements of the Data Protection Act 1998. If you plan to handle sensitive, personal data, extra security measures must be considered. The Office of the University Secretary12 can provide more advice on observing Data Protection legislation.

Organising and describing data

Metadata is ‘data about data’ and is information (or cataloguing information) that enables data users to find and/or use a dataset. In your DMP you should outline plans for documenting your research data, to meet both your own needs and those of later users.
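
To make the idea of ‘data about data’ concrete, the following is a minimal sketch of a descriptive record kept alongside a dataset. The field names and values are hypothetical illustrations only; they are not the DDI standard or any repository’s deposit schema, which you should consult directly.

```python
import json

# Hypothetical minimum set of fields for a project's own record keeping.
REQUIRED_FIELDS = ("title", "creator", "description", "date_collected", "access")


def build_record(**fields):
    """Return a metadata record, refusing to create one with gaps."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"missing metadata fields: {', '.join(missing)}")
    return dict(fields)


# All values below are placeholders.
record = build_record(
    title="Example interview transcripts",
    creator="Example Researcher",
    description="Anonymised transcripts of semi-structured interviews.",
    date_collected="2015-07",
    access="Available on request after embargo",
)

# Serialise to JSON, an open text format, so the record can be stored
# next to the data files it describes.
record_json = json.dumps(record, indent=2, sort_keys=True)
print(record_json)
```

Keeping such a record in an open text format means it stays readable without specialist software.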
10 Research Enterprise and Development, http://www.bristol.ac.uk/red/contracts/
11 Advanced Computing Research Centre, https://www.acrc.bris.ac.uk/
12 Office of the University Secretary, http://www.bris.ac.uk/secretary/dataprotection/.
The ESRC expects documentation to include information such as data origin, fieldwork and collection methods, and any processing of the data. Descriptions of your data could be kept in a separate, dedicated database or in a spreadsheet. If you’re planning to use data analysis software, such as a qualitative analysis package, you will also have the option of adding documentation within the software itself in the form of notes, memos, nodes or classifications.

When your data is deposited with an ESRC data service provider or responsible repository, you will be expected to complete a standardised metadata record. In some cases you will be expected to use metadata standards, such as the Data Documentation Initiative (DDI) specifically developed for the social sciences. Whilst the ESRC allow a period of privileged use for collected data, they still expect a metadata record to be published at the earliest opportunity, including details of how and when the data can be accessed.

You should also outline within your DMP how you’ll name files and folders to make sure that you and others have appropriate access. You should describe how you will keep track of different versions of documents (for instance, by adding version information to the first page of each Word document and by setting a folder aside for definitive, ‘milestone’ versions of documents).

In attempting to organise and document your data, it may help to imagine a secondary data user trying to make sense of your data in your absence, after your project has concluded. If presented with only the data itself, this secondary user may be faced with the difficult task of ‘unpicking’ it. How will they make sense of your file and folder naming conventions? What extra information would they need to make maximum use of your data?

DMP development

Once funding has been awarded, grant holders are expected to implement their DMP from the first planning stages of the project, as well as seeking advice and guidance from the UK Data Service to clarify how plans to deal with confidentiality and data sharing are to be implemented in practice. In addition to this, where relevant the grant holder is expected to report on the ongoing implementation of the DMP to ESRC. Any issues arising during ESRC-funded research that could impact on data sharing must be raised with your assigned ESRC case officer as soon as possible.

Roles and responsibilities

Data management responsibilities should be clearly assigned to named individuals in your DMP. In collaborative research projects, several individuals from different institutions can be named if appropriate. Plans described here should tally with the ‘Staff Duties’ and ‘Justification of Resources’ sections in the main Je-S application form. Several supporting services are in place at Bristol to help you manage your research data, and any of these which you plan to use should be mentioned in your DMP.

These services include: ACRC (data storage), your Zonal IT team (everyday IT support), the data.bris service (research data management training and general data management guidance), RED (Consortium Agreements for collaborative research and IPR) and the Office of the Secretary (for Data Protection and FOI).
_______________________________________
CITING RESEARCH DATA IN RESEARCH OUTPUTS
From 1st April 2013 all the UK’s Research Funding
Councils, as part of RCUK, require research outputs
(e.g. journal articles) to provide a means by which third
parties can access any underpinning research datasets.
The ESRC expects all grant holders to deposit data at
the same time as outputs (e.g. journal articles) are
published, and to use repositories which provide
persistent identifiers for datasets (such as a DOI) which
can be formally cited. A Digital Object Identifier or DOI
printed in a paper will lead an enquirer to a specific
webpage where either the data is directly available, or
that contains details of how the data can be accessed.
Given the extended timescales involved in the
publication process, it is strongly recommended that
the authors of published academic outputs do not
provide their current contact details as a means by
which underpinning research data may be accessed, as
these will change over time.
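
As a sketch of how a persistent identifier can replace contact details in a data access statement, the example below turns a DOI into a resolver link and builds a simple citation string. The citation layout is an assumption for illustration, not an ESRC-mandated format, and the DOI `10.1234/example` is a placeholder that does not point to a real dataset.

```python
def doi_url(doi):
    """Turn a DOI such as '10.1234/abcd' into a link via the doi.org resolver."""
    doi = doi.strip()
    # Accept a few common ways of writing a DOI and reduce them to the bare form.
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + doi


def format_citation(creator, year, title, publisher, doi):
    """Build a data citation; this layout is illustrative only."""
    return f"{creator} ({year}): {title}. {publisher}. {doi_url(doi)}"


# Placeholder values throughout.
citation = format_citation(
    "Researcher, A.",
    2015,
    "Example workshop transcripts",
    "Example Repository",
    "doi:10.1234/example",
)
print(citation)
```

Because the link is built from the DOI, it continues to work even if the repository later reorganises its web pages.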
SAMPLE ESRC DATA MANAGEMENT PLAN
INTRODUCTION
The following is intended as an illustration of an ESRC Data Management Plan. It is drawn from a real world ESRC
proposal prepared by the University of Bristol Law School. The plan is made public with the kind permission of the
applicant, Dr Margherita Pieraccini.
Further costing and ethical issues relating to the proposal were covered in the wider ‘Case for Support’. This
document is not available; however the following statement from the Case for Support explains the nature of the
planned digital outputs and how they relate to the wider research questions:
“Bringing forth the different values of nature will be done primarily through a series of workshops in three different
case studies areas where biodiversity offsetting has been considered. The case-studies - deploying workshops as fora
for experiential learning, environmental democratisation and reflexivity - will attempt to bring together the various
actors involved and/or affected in the planning processes. All the workshops will be sites not just for the collection but
for the co-creation/co-production of knowledge, seeking to locate, using different means, diverse perceptions of
nature and values and exploring ways in which these can be integrated to produce a more legitimate biodiversity
offsetting strategy. By hosting many of these co-production elements of the research at the sites of the development,
values will not only be articulated by people’s conceptualisation of the issues but also by the places themselves,
making biodiversity itself integral to the co-production of new biodiversity offsetting strategies, in line with a ‘more-than-human’ approach. The workshops will be supplemented by semi-structured interviews and by extensive
documentary analysis.”
SAMPLE DATA MANAGEMENT PLAN
The data management and data sharing plan for the project will adhere to the RCUK Common Principles on Data
Policy and the ESRC Research Data Policy. Specifically, we will aim to maximise transparency and accountability,
enable scrutiny of any data generated, increase the impact and visibility of the research and address any barriers to
access to data compatible with full ethics compliance.
1. Roles and Responsibilities
The PI has previous experience in managing data similar to those that will be generated in this project due to her
participation in the AHRC-funded Contested Common Land project, running from 2007 to 2010 and her current ESRC
Future Research Leaders Ecologies and Identities project, running from 2012 to 2015. She has also completed the
University of Bristol data security tutorial online. The PI has therefore the capabilities to oversee all data management
activities. The RA will assist with the collection of data and the data analysis will be a task shared by the PI, CO-I and
RA. Prior to any data collection taking place, the PI will seek the School of Law ethical approval for all aspects of the
project including data management. This aspect of the project will therefore also be scrutinised by the Research Ethics
Committee of the Faculty of Social Sciences and Law, on which independent members serve (in accordance with ESRC
requirements). Should any data management difficulties of an ethical nature arise during the course of the research,
the PI will seek the Research Ethics Committee’s advice.
2. Assessment of existing Data
Considering the novelty of the subject to be studied, the qualitative questions the proposed research asks and the
transformative methodology proposed, there are no existing resources that can be re-used to explore the subject of
the proposed research. This has been confirmed by searching the online catalogue of the Economic and Social Data
Service (http://www.esds.ac.uk/Lucene/Search.aspx), which did not identify any existing dataset containing such material.
The proposed research is therefore innovative and will contribute to the development of a socio-legal data set on
biodiversity offsetting in the UK.
3. Information on New Data and Quality Assurance
3.1 Typology of Data
Data collected during the empirical stage of the project will be of a qualitative nature and will include:
1) Workshops digital recording and transcripts
2) Interviews digital recording and transcripts
3) Written documents and material objects (e.g. maps, pictures) generated for and during the Workshops
4) Participant observation notes created by the PI and RA during the empirical work
3.2 Format
For the formatting of data the formats recommended by the UK Data Archive for long-term preservation of qualitative
data, digital audio data and documentation will be used (see table p. 12 at http://www.dataarchive.ac.uk/media/2894/managingsharing.pdf).
3.3 Quality Assurance
A check of recording equipment and battery life will be carried out before interviews and workshops. The accuracy of
transcription will be checked by the RA by reading the completed transcript whilst listening to the recording. Quality
will be ensured also through a peer observation of the workshops. The PI and RA will conduct the workshops together.
The RA will conduct all the interviews but will be accompanied by the PI in the first 3 interviews.
3.4 Ethical considerations
Any confidential data where consent for its use has not been given by informants will only be used as background
research and will not be made public or placed in the ESRC data repository. All anonymised and semi-anonymised data
and data for which we have consent for public attribution will be made public and placed in the ESRC data repository.
For ethical issues regarding data collection, i.e. gaining informed consent and anonymising data, please refer to the
Ethics section of the Je-S application.
4. Storage and Sharing of Data
4.1 Back-up Storage Facility
All electronic data, textual or audio, created will be stored on the University of Bristol’s dedicated Research Data
Storage Facility (RDSF) (https://www.acrc.bris.ac.uk/storage.htm), which provides an integrated resilient petascale
facility in which 5TB of disk storage per Data Steward is free of charge for staff. This two million pound investment
provides nightly backup of all data, with further resilience provided by three geographically distinct storage locations.
A tape library is used for backup purposes and also for long-term, offline data storage. Only authorised users can
access data stored within the RDSF. The RDSF is managed by Bristol's Advanced Computing Research Centre (ACRC)
which has a dedicated steering group and a rigorous data storage policy
(https://www.acrc.bris.ac.uk/acrc/RDSF_policy.pdf).
All electronic files will be password protected and encrypted using university-supplied encryption software. Any
hardcopy documents will be kept under lock and key in the University offices of the project team. During field work,
any data created will be encrypted and stored on University-approved hardware until it can be uploaded to the RDSF.
Procedures are also in place to allow authenticated, external collaborators to view, add and/or edit data in the RDSF,
which will be utilised by the project.
4.2 Access and Data sharing
The policies developed for use of the RDSF address the holding of sensitive data and Freedom of Information Act 2000
requests. We will be able to limit the number of people who can access our data stored in the RDSF, by telling the
RDSF who can access the data and providing relevant IP addresses.
In line with the Data Protection Act 1998, data from the interviews and workshops will be anonymised to remove
personal information and the consent of the interviewees gained before making the data available for re-use by other
researchers. Sensitive data will be classed as strictly confidential and I will make sure that only the core research team
(PI, CO-I and RA) are able to access them (see also the Ethics section of the Je-S application).
4.3 Economic and Social Data Service
We will make sure that the data will be offered to the Economic and Social Data Service for archiving within three
months of the project ending, and the data documentation will be prepared according to the UK Data Archive best
practice guidance.
5. Costing data management
The costs of data collection (including equipment), analysis, and sharing have all been accounted for in the
Justification of Resources attachment. As for the costs of storage, RDSF provides up to 5TB of disk storage per Data
Steward free of charge (as explained above). Data collected during the course of our project will not exceed the 5TB
limit so I will not be charged for using the facility.