Managing Data
Julia Barrett
UCD James Joyce Library
An Leabharlann UCD

Outline
• Why manage data?
• Checklist:
• The nature of your data
• Ethics and IP
• Data sharing
• Data repositories
• Data storage, backup & security
• UCD Policies & Contacts
• UCD Library services for researchers

Data management and curation practices among university researchers
• None of the researchers interviewed for the study had received formal training in data management practices
  – Levels of expertise are a problem, as they are learning on the job
• Few researchers, especially early-career ones, think about the long-term preservation of their data
• The demands of publication output overwhelm long-term considerations of data curation
• There is a great need for more effective collaboration tools, as well as online spaces that support the volume of data generated and provide appropriate privacy and access controls
– Council on Library & Information Resources: “The problem of data” (August 2012)

Why manage data?
• There are significant risks in not managing research data effectively
  – Confused data
    • Arises from a lack of documentation, so experiments may have to be repeated to make sense of earlier results
  – Loss of data
    • May not be possible to repeat
    • Loss of potential, opportunity and impact

Why manage data?
• Managing your data means:
  – Easier to find your data
  – Easier for your results to be verified
  – Easier to combine your data with other data
  – Easier for others to find, understand and use your data for new research to develop new discoveries
  – Easier for others to cite your data

Why manage data?
• Managing data isn’t something new…it’s part of good research practice and shouldn’t be something you do only because a funding body mandates it
• Funding bodies are moving towards advocating this, e.g.
The Natural Environment Research Council:
  – www.nerc.ac.uk/research/sites/data/policy.asp
• “…a formal requirement for all applications for NERC funding to include outline data management plans, which will be evaluated as part of the standard NERC grant assessment process. This brings NERC into line with similar requirements from MRC and BBSRC, and the requirement for a data management plan announced by the National Science Foundation. All successful applicants for funding will be required to produce a full data management plan, in conjunction with the relevant NERC data centre.”

A valuable asset
• Only a small proportion of all the data collected will be made visible
• A process of selection, reduction and distillation
  – Publishable document
• Research data as a valuable asset
• Image source: http://commons.wikimedia.org/wiki/File:Old_Wikisource_logo_used_until_2006.jpg

Data management checklist
• Basic details (name of supervisor, project title etc.)
• The nature of your data
• Describing and documenting data (metadata)
• Managing ethics and intellectual property
• Data sharing
• Data storage, backup & security
• Data archiving
• UCD policies
• UCD contacts
• Your feedback

The nature of your data
• Key questions:
  – What data?
  – How much data?
  – Growth rate?
  – In what format will data be stored in the short term?
  – Why particular formats?

Managing ethics and IP
• Key questions:
  – Are there any ethical or privacy issues that may prohibit the sharing of some or all of the dataset/s?
  – If so, how might these be resolved? (E.g. referral to UCD’s Ethics Committee; anonymisation of data; formal consent agreements; different levels of access to data, e.g.
research purposes only, no commercial use)

Data sharing – Funders
• NERC (Natural Environment Research Council) expects everyone that it funds to manage the data they produce in an effective manner for the lifetime of their project, and for these data to be made available for others to use with as few restrictions as possible, and in a timely manner.
• To protect the research process, NERC will allow those who undertake NERC-funded work a period of time to work exclusively on, and publish the results of, the data they have collected. This period will normally be a maximum of two years from the end of data collection.

Data sharing – Journal publishers
• An increasing number of journal publishers require the sharing of associated data
• Coverage of journal policies varies:
  – Data sharing requirement
  – Data sharing suggestion
  – Specification of the use of a particular data repository
  – Evidence of data sharing as a precondition to publication
• Register of journal open data policies: http://oad.simmons.edu/oadwiki/Journal_opendata_policies
• DRYAD (www.datadryad.org/): an international repository that manages the research data underpinning peer-reviewed articles in the biosciences.

Barriers to sharing
• A huge amount of data ends up unpublished, unshared and essentially wasted. This is another form of data loss, particularly for datasets that have clear scope for wider research use, decision-making and policy making, and that hold significant long-term value
• There is tension between the pressure to make data open earlier and researchers’ real fear that, if they do, others will reap the benefits of their hard work

Why might public access to your research data be restricted?
• “We intend to make a patent application, and must avoid prior disclosure.”
• “Don’t want to make locations of members of endangered species available to poachers.”
• “The research data are confidential because of the arrangement my research group has made with the commercial partner sponsoring our research.”
• “My data form part of a long-term study upon which my research group is entirely reliant for its on-going research publications and academic reputation. We only share this with trusted colleagues.”

Advantages of sharing – to you
• Increased citation rates
  – Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308
    www.plosone.org/article/info:doi%2F10.1371%2Fjournal.pone.0000308

Data repositories – different types
• EarthChem www.earthchem.org/
  – EarthChem is a community-driven effort to facilitate the preservation, discovery, access and visualization of data generated in the geosciences, with particular emphasis on geochemical, geochronological, and petrological data
• DRYAD Digital Repository http://datadryad.org/
  – Data files associated with published articles in the biosciences
  – www.youtube.com/watch?v=RP33cl8tL28
• Figshare http://figshare.com/
  – Figshare allows researchers to publish their research outputs in an easily citable, sharable and discoverable manner
  – www.youtube.com/watch?v=WlJlPmoJcJk
  – http://figshare.com/faqs

Institutional Repository or Digital Library

Advantages of a repository
• Provides a metadata structure for you to fill in
• Publishes the data for you by giving your dataset a unique identifier, e.g.
DOI
• Serves as a backup vehicle for your data
• May preserve your data for the future
• Makes sharing your data easy
• Others may cite your research more
• May provide some computational tools for people to use with your data

Locating relevant datasets
• Databib
  – http://databib.org/
• Open Access Directory’s list of repositories and databases for open data
  – http://oad.simmons.edu/oadwiki/Data_repositories
• CalPoly’s LibGuide
  – http://libguides.calpoly.edu/content.php?pid=277668&sid=2294712

Data storage, backup & security
• Key questions:
  – Where will you store your data in the short term, after acquisition?
  – How is your data backed up?
  – How often is your data backed up?
  – How will you ensure the security of your data?
• https://docs.google.com/a/ucd.ie/spreadsheet/viewform?formkey=dGV3QUF4UGxTaTkweDFJWlhiU1g2VVE6MA#gid=0

UCD Policies & Contacts
• Code of Good Practice in Research
• Data Storage and Retention Guidelines (Research Ethics Committee)
• Intellectual Property Policy
• www.ucd.ie/researchethics/
• www.ucd.ie/innovation/researchers/
• www.ucd.ie/itservices/researchit/
• www.ucd.ie/library/supporting_you/research_support/

UCD Library services for researchers
• Advice on:
  – using resources such as Web of Science, Science Direct and discipline-specific databases
  – using academic search engines such as Google Scholar/Publish or Perish etc.
  – using patent databases
  – using mapping resources
    • www.youtube.com/watch?v=OKtEcU95_K4&list=PLR7vlXp1FBn3NLd6dqO_6hWmT8w6-FPRq&index=2&feature=plpp_video
  – managing your references using EndNote
  – where to publish your research
  – impact factors of journals
  – measuring the impact of your published research
    • www.youtube.com/watch?v=fepRJaccUqI&feature=autoplay&list=PLR7vlXp1FBn3NLd6dqO_6hWmT8w6-FPRq&playnext=1
  – showcasing your research by depositing to Research Repository UCD

UCD Library’s Research Services Unit
• Julia Barrett
  – Research Services Manager
  – Julia.barrett@ucd.ie
• www.ucd.ie/library/supporting_you/research_support/
• www.ucd.ie/library/supporting_you/research_support/data_management/ (data management checklist available here)