Managing Data

Julia Barrett
UCD James Joyce Library
An Leabharlann UCD
Outline
•Why manage data?
•Checklist:
•The nature of your data
•Ethics and IP
•Data sharing
•Data repositories
•Data storage, backup &
security
•UCD Policies & Contacts
•UCD Library services for
researchers
Data management and curation practices
among university researchers
• None of the researchers interviewed for the study had
received formal training in data management practices;
levels of expertise are a problem because they learn on the job
• Few researchers, especially early career, think about the
long-term preservation of their data
• The demands of publication output overwhelm long-term
considerations of data curation
• A great need for more effective collaboration tools, as
well as online spaces that support the volume of data
generated and provide appropriate privacy and access
controls.
– Council on Library & Information Resources: “The problem of data”.
(August 2012)
Why manage data?
• There are significant risks with not managing
research data effectively
– Confused data
• Arises from a lack of documentation:
experiments may have to be repeated
to make sense of earlier results
– Loss of data
• May not be possible to repeat
• Loss of potential, opportunity and impact
Why manage data?
• Managing your data
means:
– Easier to find your data
– Easier for your results to
be verified
– Easier to combine your
data with other data
– Easier for others to find,
understand and use your
data for new research to
develop new discoveries
– Easier for others to cite
your data
Why manage data?
• Managing data isn’t
something new…it’s part
of good research practice
and shouldn’t be
something you do
because a funding body
mandates it
• Funding bodies are
moving towards
advocating this, e.g. The
Natural Environment
Research Council:
– www.nerc.ac.uk/research/sites/data/policy.asp
• “…a formal requirement for all
applications for NERC funding to
include outline data management
plans, which will be evaluated as
part of the standard NERC grant
assessment process. This brings
NERC into line with similar
requirements from MRC and
BBSRC, and the requirement for a
data management plan announced
by the National Science Foundation.
All successful applicants for funding
will be required to produce a full
data management plan, in
conjunction with the relevant NERC
data centre.”
A valuable asset
• Only a small
proportion of all the
data collected will be
made visible
• A process of
selection, reduction
and distillation
– Publishable document
• Research data as a
valuable asset
• http://commons.wikimedia.org/wiki/File:Old_Wikisource_logo_used_until_2006.jpg
Data management checklist
• Basic details (name of
supervisor, project title etc)
• The nature of your data
• Describing and documenting
data (metadata)
• Managing ethics and
intellectual property
• Data sharing
• Data storage, backup &
security
• Data archiving
• UCD policies
• UCD contacts
• Your feedback
The nature of your data
• Key questions:
– What data?
– How much data?
– Growth rate?
– In what format will data be stored in the short
term?
– Why particular formats?
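The questions about what data you hold, how much, and how fast it grows can be answered roughly with a short script. The following is a minimal sketch only, not a UCD tool: the function names `summarise_by_format` and `project_size` are hypothetical, and the projection assumes a constant amount of new data each month, which may not match your project.

```python
from collections import defaultdict
from pathlib import Path


def summarise_by_format(root):
    """Return {extension: (file_count, total_bytes)} for all files under root."""
    totals = defaultdict(lambda: [0, 0])
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Group by lower-cased extension; files without one go under "(none)".
            ext = path.suffix.lower() or "(none)"
            totals[ext][0] += 1
            totals[ext][1] += path.stat().st_size
    return {ext: (count, size) for ext, (count, size) in totals.items()}


def project_size(current_bytes, growth_per_month, months):
    """Project total size, assuming the same amount of new data every month."""
    return current_bytes + growth_per_month * months


# Example: 50 GB today, growing by 5 GB a month, over a 36-month project.
print(project_size(50 * 10**9, 5 * 10**9, 36))  # 230000000000 bytes (230 GB)
```

Running `summarise_by_format` on your project folder gives a quick inventory of formats in use, which is a starting point for the "why particular formats?" question.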
Managing ethics and IP
• Key questions:
– Are there any ethical or privacy issues that may
prohibit the sharing of some or all of the dataset/s?
– If so what possible ways might there be to resolve
these? (E.g. referral to UCD’s Ethics Committee;
anonymisation of data; formal consent agreements;
different levels of access to data, e.g. research
purposes only, no commercial)
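One common first step towards anonymisation is to replace direct identifiers with keyed pseudonyms before data is shared. The sketch below is illustrative only: the function names and the record layout are assumptions, and pseudonymising one field does not by itself make a dataset anonymous, since other fields may still identify a person. Whether it is sufficient is a question for the Ethics Committee.

```python
import hashlib
import hmac


def pseudonymise(identifier, secret_key):
    """Replace an identifier with a keyed SHA-256 digest (first 16 hex characters)."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def pseudonymise_records(records, field, secret_key):
    """Return copies of the records with one field replaced by its pseudonym."""
    return [{**r, field: pseudonymise(r[field], secret_key)} for r in records]


# Hypothetical example data; the key must be kept secret and stored
# separately from the shared dataset.
key = b"replace-with-a-long-random-secret"
participants = [
    {"name": "Participant A", "score": 12},
    {"name": "Participant B", "score": 17},
]
for row in pseudonymise_records(participants, "name", key):
    print(row)
```

The same identifier always maps to the same pseudonym under a given key, so records can still be linked across files without exposing the original name.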
Data sharing - Funders
• NERC (Natural Environment Research Council)
expects everyone that it funds to manage the data
they produce in an effective manner for the lifetime
of their project, and for these data to be made
available for others to use with as few restrictions as
possible, and in a timely manner.
• To protect the research process NERC will allow those
who undertake NERC-funded work a period of time to
work exclusively on, and publish the results of, the
data they have collected. This period will normally be
a maximum of two years from the end of data
collection.
Data sharing – Journal publishers
• Increasing number of journal publishers require the
sharing of associated data
• Coverage of journal policies varies
– Data sharing requirement
– Data sharing suggestion
– Specification of the use of a particular data repository
– Evidence of data sharing as a precondition to
publication
• Register of journal open data policies:
http://oad.simmons.edu/oadwiki/Journal_opendata_policies
• DRYAD (www.datadryad.org/) – an international
repository that manages the research data
underpinning peer-reviewed articles in the
biosciences.
Barriers to sharing
• A huge amount of data ends up unpublished,
unshared and essentially wasted – another form
of data loss, particularly for datasets that have
clear scope for wider research use, decision-making
and policy making, and hold significant long-term value
• Tension between the pressure to make data
more open earlier on and the real fear that
researchers have that if they do that others will
reap the benefits from the hard work they’ve
done
Why might public access to your
research data be restricted?
• “We intend to make a patent application, and
must avoid prior disclosure.”
• “Don’t want to make locations of members of
endangered species available to poachers.”
• “The research data are confidential because of
the arrangement my research group has made
with the commercial partner sponsoring our
research.”
• “My data form part of a long-term study upon
which my research group is entirely reliant for its
on-going research publications and academic
reputation. We only share this with trusted
colleagues.”
Advantages of sharing – to you
• Increased citation rates
– Piwowar HA, Day RS, Fridsma DB (2007) Sharing
Detailed Research Data Is Associated with
Increased Citation Rate. PLoS ONE 2(3): e308.
doi:10.1371/journal.pone.0000308
www.plosone.org/article/info:doi%2F10.1371%2Fjournal.pone.0000308
Data repositories – different types
• Earthchem www.earthchem.org/
– EarthChem is a community-driven effort to facilitate the
preservation, discovery, access and visualization of data
generated in the geosciences, with particular emphasis on
geochemical, geochronological, and petrological data
• DRYAD Digital Repository http://datadryad.org/
– Datafiles associated with published articles in the biosciences
– www.youtube.com/watch?v=RP33cl8tL28
• Figshare http://figshare.com/
– Figshare allows researchers to publish their research outputs
in an easily citable, sharable and discoverable manner
– www.youtube.com/watch?v=WlJlPmoJcJk
– http://figshare.com/faqs
Institutional Repository or Digital Library
Advantages of a repository
• Provides a metadata structure for you to fill in
• Publishes the data for you by giving your dataset
a unique identifier, e.g. DOI
• Serves as a backup vehicle for your data
• May preserve your data for the future
• Makes sharing your data easy
• Others may cite your research more
• May provide some computational tools for people
to use with your data
Locating relevant datasets
• Databib
– http://databib.org/
• Open Access Directory’s list of
repositories and databases for open
data
– http://oad.simmons.edu/oadwiki/Data_repositories
• CalPoly’s LibGuide
– http://libguides.calpoly.edu/content.php?pid=277668&sid=2294712
Data storage, backup & security
• Key questions:
– Where will you store your data in the short term,
after acquisition?
– How is your data backed up?
– How often is your data backed up?
– How will you ensure the security of your data?
• https://docs.google.com/a/ucd.ie/spreadsheet/viewform?formkey=dGV3QUF4UGxTaTkweDFJWlhiU1g2VVE6MA#gid=0
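A backup is only useful if the copies match the originals. One way to check this is a checksum manifest: record a SHA-256 digest for every file, then compare the manifest of the working copy with that of the backup. The sketch below is a minimal illustration under that assumption; the function names are hypothetical and the folder paths in the usage note are placeholders.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Return {relative_path: sha256_hex} for every file under root."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(original, backup):
    """Return relative paths that are missing from, or differ in, the backup."""
    return sorted(
        name for name, digest in original.items() if backup.get(name) != digest
    )
```

For example, `compare(build_manifest("project_data"), build_manifest("/mnt/backup/project_data"))` would return an empty list when every file in the working folder has an identical copy in the backup.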
UCD Policies & Contacts
• Code of Good Practice in Research
• Data Storage and Retention Guidelines (Research Ethics Committee)
• Intellectual Property Policy
• www.ucd.ie/researchethics/
• www.ucd.ie/innovation/researchers/
• www.ucd.ie/itservices/researchit/
• www.ucd.ie/library/supporting_you/research_support/
UCD Library services for researchers
• Advice on:
– using resources such as Web of Science, Science
Direct and discipline-specific databases
– using academic search engines such as Google
Scholar/Publish or Perish etc.
– using patent databases
– using mapping resources
• www.youtube.com/watch?v=OKtEcU95_K4&list=PLR7vlXp1FBn3NLd6dqO_6hWmT8w6-FPRq&index=2&feature=plpp_video
– managing your references using Endnote
– where to publish your research – impact factors of
journals
– measuring the impact of your published research
• www.youtube.com/watch?v=fepRJaccUqI&feature=autoplay&list=PLR7vlXp1FBn3NLd6dqO_6hWmT8w6-FPRq&playnext=1
– showcasing your research by depositing to Research
Repository UCD
UCD Library’s Research Services Unit
• Julia Barrett,
– Research Services
Manager
– Julia.barrett@ucd.ie
• www.ucd.ie/library/supporting_you/research_support/
• www.ucd.ie/library/supporting_you/research_support/data_management/ (data management checklist available here)