a centre of expertise in data curation and preservation

CONCEPTUALISE
Mark Thorley
Natural Environment Research Council

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/; or, (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

Digital Curation 101, October 6th-10th, 2008, NeSC, Edinburgh

Overview
• Background – what & why.
• Policy drivers – funder rules.
• Roles – whose job is it anyway?
• Practical steps – thinking about data issues.

Background: what do we mean by data?
• Data as a by-product of research.
• Data as a part of the scientific record – must be maintained to allow reproduction and validation.
• Data as a ‘published’ output in its own right.

Background: why is data management important?
• Good research practice: doing good science requires good data management.
• Reproducibility: maintenance of the scientific record.
• Long-term value for re-use and re-purposing.
• Good research = good digital curation.

Background: drivers for sharing
• Scientific need: especially for large-scale or long-term studies.
• Increased value: where data form part of a larger collection (e.g. oceans or atmosphere).
• Value for money: data collection can be very expensive.
• Publicly funded: public right of access.

Key learning 1
• Do not do data management / curation for its own sake. Do it to ensure that:
  • Research outcomes are good;
  • Data of long-term value are available for re-use and re-purposing;
  • The scientific record is protected.

Policy drivers
• Professional standards: GLP etc.
• ESF: Good scientific practice in research and scholarship (2000).
• RCUK: Governance of good research conduct (2008).
• Research Council data policies.

ESF: Data accumulation, handling and storage
36. Data are produced at all stages in experimental research and in scholarship. Data sets are an important resource, which enable later verification of scientific interpretation and conclusions. They may also be the starting point for further studies. It is vital, therefore, that all primary and secondary data are stored in a secure and accessible form.
37. Institutions must pay particular attention to documenting and archiving original research and scholarship data. Several codes of good practice recommend a minimum period of 10 years, longer in the case of especially significant or sensitive data. National or regional discipline-based archives should be considered where there are practical or other problems in storing data at the institution where the research was conducted.

RCUK code of conduct
• Management and preservation of data and primary materials:
• … ensure that relevant primary data and research evidence are preserved and accessible to others for reasonable periods after the completion of the research. This is a shared responsibility between researcher and the research organisation, but individual researchers should always ensure that primary material is available to be checked. Such conditions should also be applied where ownership of data may rest with third parties, for example where there is commercial sponsorship of research.
Data should normally be preserved and accessible for not less than 10 years for any projects, and for projects of clinical or major social, environmental or heritage importance, the data should be retained for up to 20 years, and preferably permanently within a national collection, or as required by the funder’s data policy.

Generic policy principles
• Research Councils recognise data as a valuable long-term, public-good resource.
• Data sharing improves opportunities for exploitation.
• Investigator teams have a right of first use and a right to be acknowledged.
• Effective exploitation requires effective data management.

• Formal data policy – currently being updated.
• Joint JISC & ESRC supported UK Data Archive, including the Economic and Social Data Service.
• Applicants must carry out a data review to ensure funds are not requested for data that are already available. Data must be offered to the archive within 3 months of the end of the award.
• Partner in National Data Strategy for Social Science Research.

• Data policy handbook and guidance. New version under development.
• All data must be offered to a NERC data centre to enable long-term management and re-use.
• Recognition of rights of investigator teams.
• NERC supports 6 data centres for long-term management of environmental data.

• Data sharing policy and implementation guidelines. Endorsed by Council in 2006; apply from April 2007.
• Applicants must produce a data sharing plan. Data sharing is encouraged in all research areas where there is a strong scientific need and it is cost effective to do so.
• Funds can be requested to support data management and sharing activities.

• Data sharing and preservation policy – applies to new grants awarded from January 2006.
• Applicants must produce a plan for data sharing and preservation and include costings in grant applications.
• Implementing data management facilities at MRC owned centres (as part of corporate responsibility for data).

• De facto policy – detailed in funding guidance.
• Any significant electronic resources or datasets created as a result of research funded by the AHRC must be made available in an accessible depository for at least three years after the end of the grant.
• Applicants can request resources to support management and sharing.
• Archaeology – special case. Must use the AHRC supported Archaeology Data Service.

• No formal policy as yet; however, policy development is under strong consideration.
• Encourages PIs to manage the primary data underlying publications securely, for an appropriate time, and in a durable form, under the control of the institution of their origin.

• Policies under development following merger of PPARC and CCLRC.
• Facilities (i.e. CCLRC) – well developed policies and facilities on a per-project basis.
• Grant holders (i.e. PPARC) – data curation policy agreed in principle.

Roles: whose job is it?
Data sharing / curation has placed additional requirements and expectations on research teams.
[Diagram: Research, Curation, Re-use]
Researchers have to ‘do stuff’ to their data to enable re-use. ‘Stuff’ is not always getting done!

‘Doing stuff’ to data takes time and skills which research teams do not always have.
[Diagram: Research, Curation, Re-use]
Researchers are being asked to ‘do stuff’ with data that falls outside their area of expertise or interest.

Research teams need access to data management skills and incentives.
[Diagram: Research, Curation, Re-use]
Informaticians bridge the gap.

Who are the key players?
[Diagram: Research, Curation, Re-use; key players: Researchers, Informaticians, Data managers]

Key learning 2
• Data management is too important to be left to data managers – must involve researchers!
• Researchers and data managers must work together to identify data management activities appropriate to the research.
• Roles will change over time:
  • Within project – responsibility of research team;
  • Post project – responsibility of ‘data centre’.

Practical steps: the grant application
• What is needed depends on the research area and the requirements of the funder, but avoid nugatory effort.
• The research funder wants to know that:
  • Relevant policies are being met;
  • The PI has given serious consideration to what is needed with respect to data and has demonstrated the ability to deliver it;
  • The necessary resources are included in the proposal.

Grant application: example
• What data are planned for collection, and which of these data are perceived as having long-term value?
• What, if any, existing data will be required? Who will supply these data and will there be a cost?
• Have the necessary legal & ethical issues been considered (e.g. consent and confidentiality)?
• What specialist data and informatics skills will be required by the programme and where will these be obtained from?
• Where will data be held and how will access be provided for long-term re-use and re-purposing?
• ££££££££

Practical steps: the detailed plan
• Ask an expert!
• Researchers must work with data managers to ensure that data of long-term value are appropriately managed.
• Be pragmatic.
  • Build on what others have done;
  • Don’t aim to change the world – recognise that for many researchers data management is a necessary but ‘unproductive’ overhead.
• Early intervention – better long-term outcome.

Key learning 3
• Early intervention leads to better long-term outcomes.
• Data management professionals should be involved from the research planning phase onwards;
• Develop domain specialist ‘informaticians’.

Further information
• Mark Thorley
• NERC
• mrt@nerc.ac.uk