THE NEED AND DRIVE FOR HIGH QUALITY DATA PUBLICATION
COASP, Paris, September 2014

Iain Hrynaszkiewicz
Head of Data and HSS Publishing, Open Research
Nature Publishing Group & Palgrave Macmillan
iain.hrynaszkiewicz@nature.com
@iainh_z

Publishers and data/reproducibility
• Policies on access (to data, code, reagents etc)
• Supporting funder & community needs
• Format and amount of content
• Methodological details, supp info, data integration and links to repositories
• Licensing for reuse
• Incentives to share
• Data citations
• Data journals and articles
• Quality assurance through peer review

Data/reproducibility and NPG
Some important events
1996: Bermuda Principles – prepublication sharing
1998: Structural data accession codes (Nature & Science)
2002: MIAME-compliant microarray data deposition
2007: Removal of limitations on Methods sections online
2009: Ioannidis et al. Nat Gen 41, 2, 149 (2009)

Data/reproducibility and NPG
Some important recent events
2013: Reproducibility checklist, source data from figures
2014: Endorsing the Joint Declaration of Data Citation Principles
2014: Launch of Scientific Data

Role of data journals/articles
• Credit
• Unpublished data
• Peer review focus
• Value of data vs. analysis
• Discoverability
• Reusability
• Narrative/context
• “Intelligently open data”

Data, data (journals) everywhere?

Scientific Data 2011 Market Research
Scope of survey
• How much data researchers produce, in what format and what they do with it
• Perceived availability of public repositories
• Perceptions of the Scientific Data concept
• Level/nature of data journal peer review
Respondent characteristics
• 387 respondents (329 active researchers)
• Physics (24%), Earth and environmental science (21%), Biology (20%), Chemistry (19%), Others (16%)

Scientific Data 2011 Market Research
Key survey data
• 60% share their data with their colleagues
• 50% look at other researchers’ datasets at least once a month
• 45% unaware of a repository for some of their data
• 90% reacted positively to the concept of Scientific Data
• 80% believed Scientific Data would increase data deposition rates

Scientific Data 2011 Market Research
Key survey data – what do researchers want from a data publication?
• 96% - increased visibility and discovery
• 95% - increased usability of their research data
• 93% - credit mechanism for deposit of data
• 80% - peer review of content/datasets

Get Credit for Sharing Your Data
Publications will be indexed and citeable. Open-access Creative Commons licenses (CC-BY/CC-BY-NC) for the main Data Descriptor. Each publication supported by CC0 metadata.

Focused on Data Reuse
All the information others need to reuse the data; no interpretative analysis or hypothesis testing.

Peer-reviewed
Rigorous peer review focused on technical data quality and reuse value.

Promoting Community Data Repositories
Not a new data repository; data stored in community data repositories.

Scientific Data
Scope
An open access, peer-reviewed publication for descriptions of scientifically valuable datasets. Our primary article-type, the Data Descriptor, is designed to make your data more discoverable, interpretable and reusable.

Editorial team
Managing Editor (Andrew Hufton)
Editorial Curator (Victoria Newman)
Honorary Academic Editor (Susanna Sansone, Oxford)
Advisory Panel and Editorial Board

Open access article processing charge
$1,000 USD / £650 GBP / €750 for each accepted article

The ‘Data Descriptor’ article
Detailed descriptions of the methods and technical analyses supporting the quality of the measurements. Does not contain tests of new scientific hypotheses.
Sections:
• Title
• Abstract
• Background & Summary
• Methods
• Technical Validation
• Data Records
• Usage Notes
• Figures & Tables
• References
• Data Citations

Peer review at Scientific Data
Focuses on:
• Completeness (can others reproduce?)
• Consistency (were community standards followed?)
• Integrity (are data in the best repository?)
• Experimental rigour and technical quality (were the methods sound?)
Does not focus on:
• Perceived impact/importance
• Size/complexity of data

The ‘Data Descriptor’ article
Article or narrative component (PDF and HTML)
Experimental metadata or structured component (in-house curated, machine-readable formats)
Zehr et al. Scientific Data 1, Article number: 140019 doi:10.1038/sdata.2014.19

Stem Cells
• Associated Nature Article
• Data at figshare & NCBI GEO
• Integrated figshare data viewer

Neuroscience
• New Dataset
• Data in OpenfMRI
• Source code in GitHub

Big Data
• Code in GitHub

The right licence
Data Descriptor article: licensed under one of two Creative Commons licenses, by author choice.
Metadata: released under the CC0 waiver to maximize reuse and aid data miners.
Data: depends on public repositories. Partner repositories figshare and Dryad both use the CC0 waiver.

Thank you
For more information please contact
IAIN HRYNASZKIEWICZ
Head of Data and HSS Publishing, Open Research
M: +44 (0)7814 290576
T: +44 (0)207 0146753
E: iain.hrynaszkiewicz@nature.com