Why do I need to know about data management?

Dr Richard R. Plant & Dr Andrew Thompson
Data Management Planning &
Storage for Psychology project
Mission statement
The DMSPpsych project will establish a culture of
data management planning, archiving, and ongoing
reuse of data acquired as a result of psychological
research within the Department of Psychology at The
University of Sheffield.
We recognise it is often difficult and time-consuming
for the individual, research group or even department
to follow a coordinated approach, especially where
there are no local or discipline-specific exemplars to
follow.
By tackling these issues at a grass-roots level, on a
one-to-one basis, we hope to provide support and
foster an atmosphere of collaboration with regard to
data management.
17/07/2016 © The University of Sheffield
Why?
• You will no longer be able to apply for funding unless
you have a data management plan, take better care
of your data and ultimately share it
• Increase your citations by around 69%
• Increase your chances of further funding and
collaboration
• “Backstop” your research papers – journals are likely
to request datasets (as in the Stapel fraud case)
• You might want to reuse your own data!
• Universities need a better organizational memory
• Good for science, good for UK PLC
Cognitive dissonance?
Ahhh, that’s better
That looks bad!
Fail to plan,
plan to fail
http://www.ecs.soton.ac.uk/regenesis/pictures/
These pictures were taken by Harvey Rutt
Any gamblers in the house?
• Anyone in the audience willing to let me smash up their
main work laptop, PC or Mac with a hammer for:
£5,000
• Cash in used notes (no waiting around)
• A brand new identical laptop, PC or Mac instantly for free –
you know what, I’ll even upgrade you!
• No comeback – you can even have a go on the hammer
(think how good that’d feel)
• Limited time offer. Right here, right now…
Anti-patterns
• In software engineering, an anti-pattern is a pattern that may be commonly
used but is ineffective and/or counterproductive in practice.
• Coined in 1995 by Andrew Koenig. Popularized three years later by the
book AntiPatterns, which extended the use into general social interaction.
• At least two key elements must be present to formally distinguish an actual
anti-pattern from a simple bad habit, bad practice, or bad idea:
• Some repeated pattern of action, process or structure that initially
appears to be beneficial, but ultimately produces more bad
consequences than beneficial results, and
• A refactored solution exists that is clearly documented,
proven in actual practice and repeatable
By formally describing repeated mistakes, one can recognize
the forces that lead to their repetition and learn how others
have refactored themselves out of these broken patterns.
Data loss will happen to you
• As surely as death and taxes – the only questions are when & how
• Not just catastrophic events you should worry about:
• Dropping your laptop
• Overwriting data/versioning
• Hard drive failures
• File formats
• Software updates
• Media degradation
(CD-Rs, memory sticks,
SSDs)
• Obsolescence/upgrades
• Poorly described data
(metadata)
• Theft of equipment
• People move on
• Research trends (follow
the money
consequences)
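Several of the failure modes above (overwriting, media degradation, failing drives) are silent: the file is still there but its contents have changed. One common way to notice this is to record a checksum for every data file and compare later. The following is a minimal sketch, not a complete backup strategy; the function names `sha256_of`, `build_manifest` and `changed_files` are hypothetical and chosen for illustration only.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file under data_dir (by relative path) to its checksum."""
    data_dir = Path(data_dir)
    return {
        path.relative_to(data_dir).as_posix(): sha256_of(path)
        for path in sorted(data_dir.rglob("*"))
        if path.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """Return files from old_manifest that are now missing or have different contents."""
    return sorted(
        name
        for name, checksum in old_manifest.items()
        if new_manifest.get(name) != checksum
    )
```

Building a manifest when the data is collected, storing it with the data, and re-running the comparison after every copy or media migration would flag both deleted and altered files. It does not restore anything, so it complements backups rather than replacing them.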
Show me the money
• All UK research councils now require a data management
plan to be submitted with all new funding bids. Odds are you
already need to do this if you have a grant!
• NSF annual budget of about $6.9 billion (2010)
• Funding source for 20% of federally supported research by America's colleges and
universities
• Since January 18, 2011, plans for data management and sharing of the products of
research have been a requirement. Proposals must include a supplementary document of no more than
two pages labelled “Data Management Plan”. This supplement should describe how the
proposal will conform to NSF policy on the dissemination and sharing of research results
(see AAG Chapter VI.D.4), and may include:
1. the types of data, samples, physical collections, software, curriculum materials, and
other materials to be produced in the course of the project;
2. the standards to be used for data and metadata format and content (where existing
standards are absent or deemed inadequate, this should be documented along with any
proposed solutions or remedies);
3. policies for access and sharing including provisions for appropriate protection of
privacy, confidentiality, security, intellectual property, or other rights or requirements;
4. policies and provisions for re-use, re-distribution, and the production of derivatives; and
5. plans for archiving data, samples, and other research products, and for preservation of
access to them.
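The five items above can double as a drafting checklist. The sketch below is only an illustration of that idea: the class name `DataManagementPlan` and its field names are hypothetical labels for the five items, not NSF terminology, and the NSF wording says a plan "may include" these items rather than requiring all of them.

```python
from dataclasses import dataclass, fields


@dataclass
class DataManagementPlan:
    """One free-text section per item in the list above (field names are illustrative)."""
    data_types: str = ""                  # item 1: what will be produced
    standards: str = ""                   # item 2: data and metadata formats
    access_and_sharing: str = ""          # item 3: access, privacy, IP
    reuse_and_redistribution: str = ""    # item 4: re-use and derivatives
    archiving_and_preservation: str = ""  # item 5: archiving and long-term access

    def missing_sections(self):
        """Return the names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


draft = DataManagementPlan(
    data_types="Reaction-time data from behavioural experiments, stored as CSV",
    standards="Plain-text CSV with a data dictionary describing each variable",
)
print(draft.missing_sections())
```

Running this on a half-written draft lists the sections that still need text, which is a quick check before a bid goes in.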
Increased citations
• Increase your citations by 69% through sharing
data (Piwowar, Day & Fridsma 2007)
Data papers
• Growing interest in publishing data papers, which can be cited in a
similar way to normal papers via DOIs
• Get academic credit for sharing data
• Such papers describe what the data is, how it was collected, methodology,
variables, suggested reuse and a link to the actual data
• DataCite (www.datacite.org) is a classic example. New
Psychology journal from Ubiquity Press
Better chance of further funding
and collaboration
• Suggestion that funders might bar or
otherwise penalize you if you don’t share
• “3 strikes and you’re out” proposal
• ESRC funding of £10.8m for reusing
existing data sets (call open now)
• More academic collaboration if you
could see exactly what people are
doing
Psychology repositories that work
already out there
Backstop your research papers
Our turn in the spotlight (again)
Cyril Burt
• “The Burt Affair”
• Heritability of intelligence (as measured in IQ tests)
• Twin studies
• Published numerous articles and books on a host of topics
• Two of Burt's supposed collaborators, Margaret Howard and J. Conway,
were invented by Burt himself
• First British psychologist to be knighted
• Earlier work is often accepted as valid
• All of his notes and records had been burnt
Reuse your own data
• In short, keep data in more than one format, and the more basic and common the
better, e.g. CSV raw text is better than E-Prime E-DataAid files
• Keep the original data along with any versions translated into new emerging
formats, e.g. MS Word 2 -> Word 4 -> Word 6 going forward...
• Update to new storage media as it becomes available, e.g. floppy disk -> CD-R ->
DVD-R… plus keep the originals
• Bear in mind companies go bust and take their software and file formats with
them, e.g. WordStar, WordPerfect, Lotus 123… plus companies are taken over
and change direction, e.g. IBM SPSS!
• Be sure to describe your data properly using metadata (data about data) so you
or someone else can understand it!
• You can fall under the proverbial bus and so can all your data, so describe it fully
• Printed copies aren’t all bad, but remember these can go in the bin if space is
short
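As an illustration of the advice above on basic formats and metadata, here is a minimal Python sketch that writes a dataset as plain CSV together with a small JSON "data dictionary" describing each variable. The function name `save_with_data_dictionary`, the `.metadata.json` file suffix and the metadata keys are assumptions made for this example, not an established standard; any original proprietary files (such as E-Prime output) should be kept alongside the export.

```python
import csv
import json
from pathlib import Path


def save_with_data_dictionary(rows, variables, out_dir, stem):
    """Write rows to a plain CSV file plus a JSON sidecar describing each column.

    rows      -- list of dicts, one per observation
    variables -- dict mapping column name -> human-readable description
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    columns = list(variables)

    # Plain-text CSV: readable without any particular software package.
    csv_path = out_dir / f"{stem}.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)

    # Sidecar metadata: what each variable means, stored next to the data.
    meta_path = out_dir / f"{stem}.metadata.json"
    metadata = {
        "data_file": csv_path.name,
        "n_rows": len(rows),
        "variables": variables,
    }
    meta_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return csv_path, meta_path
```

A call such as `save_with_data_dictionary(rows, {"participant": "Anonymised participant code", "rt_ms": "Reaction time in milliseconds"}, "export", "session1")` would produce `session1.csv` and `session1.metadata.json` in the `export` folder. Keeping the description in a separate plain-text file means it survives even if the analysis software does not.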
Metadata: describe your data and
methods used to create it
• Pages 40-1 of Alexander Graham Bell's unpublished laboratory notebook
(1875-76), describing first successful experiment with the telephone
Copernicus stored his data with his thesis &
explained how it was coded, c. 500 years ago!
For an English translation:
http://www.webexhibits.org/calendars/year-textCopernicus.html
Organizational memory
• Institutions now realizing this needs to be done (research & data)
• Exploitable
• Funding implications, e.g. EPSRC
• Prestige
• Impact / REF (back to 1993)
• Institutional policies and repositories, e.g. eprints
Too much? I’m swimming
against the tide!
Help!
I’m here as a pair of boots on the ground to give
discipline-specific help:
• I can help you write Data Management Plans for grants
to increase your chances of getting funded
• Put plans in place to help existing projects
• Help you manage/describe/share your data more
effectively
• Aid DClinPsys with site files and data
• Currently working on a localised one-stop-shop website
Come talk to me: r.r.plant@sheffield.ac.uk
Practical help!
Come talk to me: r.r.plant@sheffield.ac.uk