Why do I need to know about data management?
Dr Richard R. Plant & Dr Andrew Thompson
Data Management Planning & Storage for Psychology project

Mission statement
The DMSPpsych project will establish a culture of data management planning, archiving, and ongoing reuse of data acquired as a result of psychological research within the Department of Psychology at The University of Sheffield. We recognise it is often difficult and time-consuming for the individual, research group or even department to follow a coordinated approach, especially where there are no local or discipline-specific exemplars to follow. By tackling these issues at a grass-roots level, on a one-to-one basis, we hope to provide support and foster an atmosphere of collaboration with regard to data management.
17/07/2016 © The University of Sheffield

Why?
• You will no longer be able to apply for funding unless you have a data management plan, take better care of your data and ultimately share it
• Increase your citations: sharing data has been associated with a 69% increase in citations
• Increase your chances of further funding and collaboration
• “Backstop” your research papers: journals are likely to request datasets (e.g. the Stapel fraud)
• You might want to reuse your own data!
• Universities need a better organizational memory
• Good for science, good for UK PLC

Cognitive dissonance?

Ahhh, that’s better

That looks bad!

Fail to plan, plan to fail
http://www.ecs.soton.ac.uk/regenesis/pictures/
These pictures were taken by Harvey Rutt

Any gamblers in the house?
• Is anyone in the audience willing to let me smash up their main work laptop, PC or Mac with a hammer for £5,000?
• Cash in used notes (no waiting around)
• A brand new identical laptop, PC or Mac instantly for free. You know what, I’ll even upgrade you!
• No comeback: you can even have a go on the hammer (think how good that’d feel)
• Limited time offer. Right here, right now…

Anti-patterns
• In software engineering, an anti-pattern is a pattern that may be commonly used but is ineffective and/or counterproductive in practice.
• The term was coined in 1995 by Andrew Koenig and popularized three years later by the book AntiPatterns, which extended its use into general social interaction.
• At least two key elements must be present to formally distinguish an actual anti-pattern from a simple bad habit, bad practice, or bad idea:
  • Some repeated pattern of action, process or structure that initially appears to be beneficial, but ultimately produces more bad consequences than beneficial results, and
  • A refactored solution exists that is clearly documented, proven in actual practice and repeatable
By formally describing repeated mistakes, one can recognize the forces that lead to their repetition and learn how others have refactored themselves out of these broken patterns.

Data loss will happen to you
• It is as certain as death and taxes; the only questions are when and how
• It is not just catastrophic events you should worry about:
  • Dropping your laptop
  • Overwriting data/versioning
  • Hard drive failures
  • File formats
  • Software updates
  • Media degradation (CD-Rs, memory sticks, SSDs)
  • Obsolescence/upgrades
  • Poorly described data (metadata)
  • Theft of equipment
  • People moving on
  • Shifting research trends (the consequences of following the money)

Show me the money
• All UK research councils now require a data management plan to be submitted with all new funding bids. Odds are you already need to do this if you have a grant!
• NSF annual budget of about $6.9 billion (2010)
• Funding source for 20% of federally supported research by America's colleges and universities
• Beginning January 18, 2011, plans for data management and sharing of the products of research became a requirement. Proposals must include a supplementary document of no more than two pages labelled “Data Management Plan”. This supplement should describe how the proposal will conform to NSF policy on the dissemination and sharing of research results (see AAG Chapter VI.D.4), and may include:
  1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project;
  2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies);
  3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements;
  4. policies and provisions for re-use, re-distribution, and the production of derivatives; and
  5. plans for archiving data, samples, and other research products, and for preservation of access to them.

Increased citations
• Sharing data has been associated with a 69% increase in citations (Piwowar, Day & Fridsma 2007)

Data papers
• There is growing interest in publishing data papers, which can be cited in the same way as conventional papers via DOIs
• Get academic credit for sharing data
• Such papers describe what the data is, how it was collected, the methodology, the variables and suggested reuse, and include a link to the actual data
• DataCite (www.datacite.org), which registers DOIs for research data, is a classic example.
New Psychology journal from Ubiquity Press

Better chance of further funding and collaboration
• There are suggestions that funders might bar or otherwise penalize researchers who don’t share
• A “3 strikes and you’re out” proposal
• ESRC funding of £10.8m for reusing existing data sets (call open now)
• More academic collaboration if you could see exactly what people are doing
Psychology repositories that work are already out there

Backstop your research papers

Our turn in the spotlight (again)
Cyril Burt
• “The Burt Affair”
• Heritability of intelligence (as measured in IQ tests)
• Twin studies
• Published numerous articles and books on a host of topics
• Two of Burt's supposed collaborators, Margaret Howard and J. Conway, were invented by Burt himself
• First British psychologist to be knighted
• Earlier work is often accepted as valid
• All of his notes and records had been burnt

Reuse your own data
• Keep your data in more than one format: in short, the more basic and common the format, the better, e.g. raw CSV text is better than E-Prime E-DataAid files
• Keep the original data along with any versions translated into new emerging formats, e.g. MS Word 2 -> Word 4 -> Word 6 going forward...
• Update to new storage media as it becomes available, e.g. floppy disk -> CD-R -> DVD-R…, and keep the originals
• Bear in mind that companies go bust and take their software and file formats with them, e.g. WordStar, WordPerfect, Lotus 1-2-3…; companies are also taken over and change direction, e.g. IBM SPSS!
• Be sure to describe your data properly using metadata (data about data) so you or someone else can understand it!
• You could fall under the proverbial bus, and so could all your data, so describe it fully
• Printed copies aren’t all bad, but remember these can end up in the bin if space is short

Metadata: describe your data and the methods used to create it
• Pages 40-1 of Alexander Graham Bell's unpublished laboratory notebook (1875-76), describing the first successful experiment with the telephone

Copernicus stored his data with his thesis and explained how it was coded: c. 500 years old!
For an English translation: http://www.webexhibits.org/calendars/year-textCopernicus.html

Organizational memory
• Institutions are now realizing this needs to be done (for both research and data)
• Exploitable
• Funding implications, e.g. EPSRC
• Prestige
• Impact / REF (back to 1993)
• Institutional policies and repositories, e.g. eprints

Too much? I’m swimming against the tide! Help!
I’m here as a pair of boots on the ground to give discipline-specific help:
• I can help you write data management plans for grants to increase your chances of getting funded
• Put plans in place to help existing projects
• Help you manage, describe and share your data more effectively
• Aid DClinPsys with site files and data
• Currently working on a localised one-stop-shop website
Come talk to me: r.r.plant@sheffield.ac.uk

Practical help!
Come talk to me: r.r.plant@sheffield.ac.uk
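The “Reuse your own data” and “Metadata” slides recommend keeping data in basic, common formats such as CSV and describing every variable fully. Below is a minimal Python sketch of that advice, assuming trial-level data held as a list of dictionaries. The function `save_with_metadata`, the file names `data.csv` and `data.metadata.json`, and the example variables are all hypothetical illustrations, not part of any DMSPpsych or University of Sheffield tool.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_with_metadata(rows, variables, study_info, out_dir):
    """Save trial-level data as plain CSV plus a JSON metadata sidecar.

    Hypothetical helper for illustration only.
    rows       -- list of dicts, one per trial (assumed structure)
    variables  -- dict mapping each column name to a plain-English description
    study_info -- dict of study-level notes (title, method, collection date, ...)
    out_dir    -- directory to write into (created if missing)
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Column order follows the data dictionary. DictWriter raises ValueError
    # if a row contains a column that is not described in `variables`,
    # so undocumented columns cannot slip into the saved file.
    csv_path = out / "data.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(variables))
        writer.writeheader()
        writer.writerows(rows)

    # The sidecar records what the data is and how each variable is coded,
    # in a plain-text format readable without the original experiment software.
    meta = {"study": study_info, "data_file": csv_path.name, "variables": variables}
    meta_path = out / "data.metadata.json"
    meta_path.write_text(json.dumps(meta, indent=2, ensure_ascii=False), encoding="utf-8")
    return csv_path, meta_path


if __name__ == "__main__":
    # Hypothetical example data from a two-choice reaction-time task.
    trials = [
        {"participant": "P01", "trial": 1, "rt_ms": 532, "correct": 1},
        {"participant": "P01", "trial": 2, "rt_ms": 611, "correct": 0},
    ]
    variables = {
        "participant": "Anonymised participant code",
        "trial": "Trial number within the session, starting at 1",
        "rt_ms": "Response time in milliseconds from stimulus onset",
        "correct": "1 = correct response, 0 = error",
    }
    study = {"title": "Example reaction-time study", "method": "Two-choice task"}
    with tempfile.TemporaryDirectory() as tmp:
        csv_file, meta_file = save_with_metadata(trials, variables, study, tmp)
        print(csv_file.read_text(encoding="utf-8"))
        print(meta_file.read_text(encoding="utf-8"))
```

Keeping the data dictionary next to the CSV means the file can still be interpreted if E-Prime, SPSS or any other package is no longer available, which is the scenario the “companies go bust” point warns about.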