SCIENTIFIC DATA
Presentation to the California Digital Library, 20th June 2014
Ruth Wilson – Head of Publishing Services
Andrew Hufton – Managing Editor
Iain Hrynaszkiewicz – Head of Data and HSS

Introduction
• Open Access at NPG
• Drivers for data publication
• Scientific Data
• Next steps

Development of Open Access

General landscape
• 2000: PubMed Central launched
• 2001: Nature, Science and the Third World Academy of Sciences launch SciDev, a free online source of science news and research
• 2003: The Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities signed
• 2005: The Wellcome Trust introduced its open access mandate for Wellcome-funded research
• 2005: National Institutes of Health adopted the NIH Public Access Policy
• 2006: RCUK open access mandates come into effect
• 2009: First international Open Access Week
• 2013: The Obama administration (US) and HEFCE (UK) both introduce open access mandates for taxpayer-funded research
• 2014: Chinese science research funding agencies mandate open access

Open Access at NPG
• 2002: NPG ceases to require copyright transfer on research articles
• 2005: First full OA title launched, Molecular Systems Biology
• 2009-2011: All non-Nature journals offer an OA option
• 2011: Scientific Reports launched
• 2013: Nature Publishing Group partners with open access publisher Frontiers
• 2014: Launch of Scientific Data
• 2014: Launch of Nature Partner Journals
• 2014: 51% of NPG and Frontiers content is published open access

Open Access at Nature Publishing Group
• Nature Communications: Launched in 2010, NatComms now has an impact factor of 10.015 and receives more submissions than Nature.
• Frontiers: A community-oriented open-access academic publisher and research network.
• Scientific Data: An open-access publication for Data Descriptors, peer-reviewed scientific publications that provide detailed descriptions of datasets.
• Scientific Reports: Fully open access, Scientific Reports is a primary research publication covering all areas of the natural sciences.
• Society open access journals: We publish 18 fully open access titles with society partners.
• Nature Partner Journals: A new series of online open access journals, published in collaboration with world-renowned international partners.
• Subscription journals offering an open access option: Over 40 journals in the NPG family offer an open access option.

Concept development

Drivers for data publication
Two important factors are driving efforts to make research data more available and reusable:
• To ensure the scientific process is transparent and can be scrutinised, and that research results can be reproduced
• To speed up the scientific process, lead to new insights and reduce duplicated work
To achieve this, research data needs to be: Available, Discoverable, Interpretable, Re-usable, Citable.

Stakeholders
Funders / researchers / research institutes / data repositories / libraries / learned societies / publishers / standards groups / curators

Researchers and data
What do researchers do with their data?
• ~75% of researchers store their data locally and do not publish it
• ~17% publish data in supplementary information
• ~14% delete research data
• ~10% deposit data in a public repository

A strong collaborative culture exists among researchers:
• They share 60% of their data with their colleagues
• 50% look at other researchers' datasets at least once a month

Researchers are supportive of Scientific Data:
• Over 90% reacted positively to the concept of Scientific Data
• 80% believed that Scientific Data would increase repository deposition rates

What was important to them?
• Increased visibility and discovery of their research data (96%)
• Increased usability of their research data (95%)
• A credit mechanism for those who take the time to deposit and explain their data (93%)
• Peer review of content/datasets (80%)

Scientific Data
A new open-access publication for descriptions of scientifically valuable datasets. Now live!
• Get credit for sharing your data: publications will be indexed and citable.
• Open access: authors select from three Creative Commons licenses for the main Data Descriptor. Each publication is supported by CC0 metadata.
• Focused on data reuse: all the information others need to reuse the data; no interpretative analysis or hypothesis testing.
• Peer-reviewed: rigorous peer review focused on technical data quality and reuse value.
• Promoting community data repositories: not a new data repository; data are stored in community data repositories.

Data Descriptor: focus on data reuse
Detailed descriptions of the methods and technical analyses supporting the quality of the measurements. A Data Descriptor does not contain tests of new scientific hypotheses.
Sections:
• Title
• Abstract
• Background & Summary
• Methods
• Technical Validation
• Data Records
• Usage Notes
• Figures & Tables
• References
• Data Citations

Each Data Descriptor has two components:
• Article or narrative component (PDF and HTML)
• Experimental metadata or structured component (in-house curated, machine-readable formats)

Data Citations
Data citations formally link the Data Descriptor to external data records.
Joint Declaration of Data Citation Principles by the Data Citation Synthesis Group, including:
- CODATA
- Research Data Alliance
- Force11

Data Descriptor structured metadata (CC0)
In-house curation team:
• assists users in submitting the structured content via simple templates and an internal authoring tool
• performs value-added semantic annotation of the experimental metadata
For advanced users and service providers willing to export ISA-Tab for direct submission, we have released a technical specification.
[Figure labels: analysis method; data file or record in a database; script]

Our data policies
Clear data sharing policies:
• Data must be deposited in an approved data repository before manuscript submission, prior to peer review.
• If datasets are private, they must be made accessible to editors and referees in a secure and confidential manner.
• Authors must agree to release data to the public, without undue restrictions, at the time of publication.
• Reasonable controls are allowed for datasets with human privacy restrictions.

Data repository criteria
1. Broad support and recognition within their scientific community
2. Ensure long-term persistence and preservation of datasets in their published form
3. Provide expert curation
4. Implement relevant, community-endorsed reporting requirements
5. Provide for confidential review of submitted datasets
6. Provide stable identifiers for submitted datasets
7. Allow public access to data without unnecessary restrictions

Our recommended repositories
• We currently recognize over 60 public data repositories.
• We have integrated systems with both figshare and Dryad.
• No institutional repositories yet, but we are open to adding them.

The right licence for the right content
• Data Descriptor article: licensed under one of three Creative Commons licenses, by author choice.
• Metadata: released under the CC0 waiver to maximize reuse and aid data miners.
• Data: the primary datasets will reside in public repositories. We are partnering with figshare and Dryad, which both use the CC0 waiver.

Diverse content from across the natural sciences
• Ecology: associated Nature articles; data in figshare; integrated figshare data viewer; citizen science project
• Neuroscience: new dataset; data in OpenfMRI; source code in GitHub
• Big Data: code in GitHub

Data Descriptor relation with traditional articles
Data Descriptors contain the methods and technical analyses supporting the quality of the measurements; they do not contain tests of new scientific hypotheses.
• Data Descriptor: What did I do to generate the data? How was the data processed? Where is the data? Who did what, when?
• Traditional article: Synthesis, Analysis, Conclusions

Community Advisory Panel
• Guides the development, policies, standards and editorial scope of Scientific Data.
• Senior scientists from academia and industry, along with representatives from the data repository, librarian, biocurator and funder communities.

Editorial Board
Active scientists oversee peer review.

Peer review assesses:
• The completeness of the description
• Alignment with community standards
• Data deposition in an appropriate repository
• Technical quality of the measurements
• Reuse value

Scientific Data & the University of California
Advisory Panel:
• Patricia Cruse, CDL
• Joseph Ecker, Salk & UCSD
Editorial Board:
• Michelle Arkin, UCSF
• Trey Ideker, UCSD
• Maryann Martone, UCSD
• Adam Renslo, UCSF
• Amir AghaKouchak, UCI

Thanks for your time!
Helping you publish, discover and reuse research data. Now launched!
Honorary Academic Editor: Susanna-Assunta Sansone
Advisory Panel and Editorial Board including senior researchers, funders, librarians and curators
Visit nature.com/scientificdata
Email scientificdata@nature.com
Tweet @ScientificData