A Perspective
Paul Price
Dow Chemical Company
pprice@dow.com

Publications are changing
• Leather-bound journals and dedicated libraries, the format of the scientific paper, weird abbreviations (Tox. & App. Pharm.)
• Recent email on the need for packing materials
• Dump the filing cabinets - PDF/HTML replaces paper (free color!)
• Paper journals are evolving into curated web sites
• Upsetting the status quo
  – No technical reason for not sharing detailed technical findings

Sharing data
• Ethical issues for not sharing
  – Privacy of individuals
• Economic reasons for not sharing
  – Intellectual property rights
  – Charging for access: the economics of journals and data owners
  – Academics: My career depends on mining my data on my schedule
• Internet-based expectations
  – I expect to see everything from home using my web browser

Social contracts
• Permission to sell is contingent on demonstrating safety
• Credence for findings is less contingent on peer review and more contingent on sharing relevant data
• Science that supports regulatory decisions needs to be in the sunlight

Parting thought
• When I share data I am asking the world “can someone do a better job than me in understanding the data?”
• When I withhold data I am saying “no one can do a better job than me in understanding the data”
Therefore journals should require the sharing of raw data as a condition of publication

Data Access: Issues and Opportunities
Alan F. Karr
National Institute of Statistical Sciences
karr@niss.org
February 13, 2012

Points for Discussion
• The problem is hard
  – Players are responding rationally to incentives
  – Not “one size fits all”
• “The data” is ill-defined
• “Availability” is vague: what about
  – Cost
  – Liability
  – Tech support
  – Co-authorship
  – Data subjects
• Reproducibility (data + code) vs. replicability (data only?)
• There are effective mechanisms for access, based on statistical disclosure limitation

The Analysis Matters

Data Dissemination: High-Level View

Should Journals Require the Release of Supporting Data as a Condition of Publication?
Jane C. Schroeder, DVM PhD
Science Editor, Environmental Health Perspectives
schroederjc@niehs.nih.gov

No.

Why is access to raw data desirable?
• To advance scientific knowledge
Is it a given that access to raw data will advance knowledge?

How would access advance knowledge?
1. Identify unintentional errors
• Data entry errors, transcribing, labeling
• Errors in coding, misconstrued variables
• Copy editing errors
  – Some can be identified by a careful review of reported results
  – Avoid via documentation, data management, internal review
  – Some would require truly raw data

How would access advance knowledge?
2. Identify scientific misconduct
• If the perpetrator is competent, unlikely to be evident
• If not competent, likely to be multiple cues
  – Plagiarism, inconsistent logic, incredible findings
• If access to raw data is the only way to prevent fraud, we are in trouble

How would access advance knowledge?
3. Identify “errors” in decision-making
• Such “errors” may represent legitimate differences
  – There is no single “best way” to analyze data
• However, decision-making should be completely transparent

How would access advance knowledge?
4. Reduce the time from data collection to full dissemination
• Investigators must be able to recoup their investment of time and effort
  – Lose jobs, no data for anyone
• Confidentiality, informed consent agreements

What should journals do?
• Careful & detailed reviews, including requests for code, data when appropriate
• Require complete methods
  – Rationale/criteria for decisions
  – Information on data management, QA/QC
• Require information to assess study quality
  – Missing data, participation, drop-out, numbers of observations

What should journals do?
• Require full reporting of all results used to support key analytic decisions and conclusions
  – Essential when interpretation is subjective or criteria are not widely accepted
  – Null findings as well as positive ones
  – Sensitivity analyses of assumptions, alternate approaches
  – Supplemental material, external archiving
• Review and update policies when it is in the best interest of science communication to do so

What should the community do?
• Discipline-appropriate standards for data management, QA/QC, and reporting
• Bona fide internal reviews before publication
• Support for costs of data sharing
• Encourage and reward analyses of combined data from multiple studies
• Avoid regulations that may ultimately impede scientific advancement by serving some members of the community at the expense of others

Introducing the Dryad Digital Repository
Society of Toxicology webinar
February 2013
Peggy Schaeffer

Many journals require data sharing upon request
• Psychology
  – Requested data from 141 articles
  – “6 months later, after … 400 emails, [sending] detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes…” data was obtained from 27% of articles.
  – Wicherts et al. (2006). Am. Psych. 61:726-728
• Genetics
  – 47% of respondents had a request for data or materials denied within 3 yrs
  – 28% were unable to confirm others’ published research as a result.
  – #1 reason for data withholding (80%): effort required to share it.
  – Campbell et al. (2002) JAMA (4):473-80.
datadryad.org

Data archiving has many benefits
Direct
• Verification of published research
• Preserving accessibility to data
• Allowing reuse and repurposing of data
• Discoverability of data
Indirect (costs avoided)
• Redundant data collection
• Inefficient legacy data curation
• Burden of sharing-upon-request
• Opportunity cost of science not done
Near term
• Protection against personnel turnover
• Availability for review and validation
Long term
• Secure long-term stewardship
• Increased impact per publication
Private
• Increased citations
• New collaborations
• New research opportunities
• Fulfilling funding mandates
Public
• More efficient use of research dollars
• Public trust in science
• Educational opportunities
• Improved methodologies
• More informed policy
Modified from Beagrie et al. (2009) Keeping Research Data Safe 2

Joint Data Archiving Policy
[Journal] requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive, such as [list of approved archives here]. Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. Authors may elect to have the data publicly available at time of publication, or, if the technology of the archive allows, may opt to embargo access to the data for a period up to a year after publication. Exceptions may be granted at the discretion of the editor, especially for sensitive information such as human subject data or the location of endangered species.

Why use Dryad rather than Supplementary Online Materials?
Criterion | Dryad | SOM
Discoverable: indexed and exposed to both web and bibliographic search engines | ✔ | ✗
Identifiable: DataCite DOIs within articles serve as permanent, resolvable identifiers | ✔ | ✗*
Permanent: processes in place to promote preservation (incl. format migration) | ✔ | ✔/✗**
Curated: quality control by both automated processes and human inspection | ✔ | ✗*
Ease of deposit: streamlined deposit, allowance for large and complex datasets | ✔ | ✔/✗**
Formatted for reuse: support for non-PDF file formats | ✔ | ✔/✗**
Updatable: new versions of data files can be added, metadata can be enhanced | ✔ | ✗
Support for embargoes: can delay release of data in accordance with journal policy | ✔ | ✗
Free reuse: no paywall, clear terms of reuse (all data released under CC Zero) | ✔ | ✔/✗**
Economy of scale: cost efficiency from shared infrastructure | ✔ | ✔/✗**
Alignment to organizational mission: focus on archiving and reuse of scientific data | ✔ | ✗
* A few publisher SOM sites are exceptions to the general rule.
** Practices differ among publishers, see Smit (2011), doi:10.1045/january2011-smit

Researchers are using Dryad for data archiving…
As of 7 Feb 2013, Dryad contains 7306 data files associated with 2662 publications from 191 different journals

and using the data for research…

Over 25 integrated journals…
and 20 more on the way

Trustworthy repository infrastructure
• Making data available is the primary mission of the organization
  – No pay-walls or restrictive licenses (all released under CC Zero)
  – The same data may be hosted by other services (non-exclusivity)
• Built on the DSpace repository platform
  – An open source framework used by hundreds of institutional repositories
• Multiple machine and human interfaces for discovery and access
  – Dublin Core metadata harvestable through OAI-PMH
  – DOIs registered through DataCite
  – Curation-enhanced metadata to enhance keyword searching
  – Indexed by Web of Science and other bibliographic services
• Assurance of data integrity and permanent availability
  – Service mirroring and backup
  – File migration and bit-level integrity assurance
  – Organizational failover through DataONE and (soon) CLOCKSS

Governance
• Not-for-profit organization
  – Incorporated in North Carolina (USA)
• Membership is open to a diversity of stakeholder organizations
  – Scientific societies, publishers, funding agencies, universities, libraries, etc.
  – Members need not publish a partner journal
• Governed by a rotating 12-member Board of Directors, nominated and elected by the membership

Sustainability
• Long-term preservation requires an organization with a viable business model
  – Not dependent on the vagaries of grant funding
  – Or the largesse of an institution that may have other priorities
• Revenue will be primarily from deposit fees
  – This enables Dryad to make access to the data free in perpetuity
  – The time of deposit is when the majority of costs are incurred
  – Revenue scales with costs (i.e. volume of deposits)
  – The costs are distributed both fairly and widely
• Additional revenue
  – Membership fees ($1000/yr) will cover costs of annual Membership meetings
  – Project grants will supplement the operational budget for R&D activities
  – With research and development activities funded by grants at various institutions (e.g. Duke University, Univ. of North Carolina at Chapel Hill)

Payment plans
Plan | Contract? | Paid by | Non-member cost (1)
Subscription | yes | Journal, society, or publisher, in advance | Based on total annual volume of research articles @ $30/article
Deferred payment | yes | Journal or other sponsoring organization, invoiced periodically for prior deposits | $75/data package (2)
Voucher | yes | Journal or other sponsoring organization, paid in advance | $70/data package
Pay on deposit | no | Author, at time of deposit | $80/data package, with a process for granting waivers for authors from less-developed countries
(1) Up to a fixed deposit size (currently 10GB). Additional charges for larger deposits.
(2) Data package = all the data associated with an article.

The value proposition
For researchers, Dryad…
• increases the impact of, and citations to, published research
• preserves and makes available others’ data
• frees researchers from the burden of data preservation and access
For societies, journals, and publishers Dryad…
• offers more visibility for research outputs
• promotes prestige for the discipline
• supports a wide range of journal policies on data sharing
• frees journals from the burden of maintaining supplemental data
For libraries and institutions, Dryad…
• makes data available at no cost, under clear terms of use
• helps fulfill their research data management mandates
For funders, Dryad…
• provides a cost-effective mechanism to make research more accessible

To learn more
• Repository home: http://datadryad.org
• News: http://blog.datadryad.org
• Project documentation: http://wiki.datadryad.org
• Twitter: @datadryad
• Facebook: www.facebook.com/DataDryad
Contact us:
• Todd Vision, Project Director, tjv@bio.unc.edu
• Laura Wendell, Executive Director, lwendell@datadryad.org
• Peggy Schaeffer, Communications Coordinator, pschaeffer@datadryad.org
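
A worked comparison of the payment plans
The per-unit prices on the payment plans slide can be compared for a given journal with a few lines of arithmetic. The sketch below is only an illustration: the function names and the example journal (200 articles a year, 60 data packages) are hypothetical, the prices are the non-member figures quoted on the slide, and surcharges for deposits over 10GB are ignored. Note also that the pay-on-deposit amount falls on authors rather than on the journal.

```python
# Non-member prices quoted on the payment plans slide (USD).
SUBSCRIPTION_PER_ARTICLE = 30    # charged on every research article published
DEFERRED_PER_PACKAGE = 75        # invoiced periodically for prior deposits
VOUCHER_PER_PACKAGE = 70         # paid in advance
PAY_ON_DEPOSIT_PER_PACKAGE = 80  # paid by the author at time of deposit


def annual_costs(articles_published: int, packages_deposited: int) -> dict:
    """Yearly total under each plan (hypothetical helper).

    Assumes every deposit is within the fixed size limit, so no
    large-deposit surcharges apply.
    """
    return {
        "subscription": SUBSCRIPTION_PER_ARTICLE * articles_published,
        "deferred": DEFERRED_PER_PACKAGE * packages_deposited,
        "voucher": VOUCHER_PER_PACKAGE * packages_deposited,
        "pay_on_deposit": PAY_ON_DEPOSIT_PER_PACKAGE * packages_deposited,
    }


def cheapest_plan(articles_published: int, packages_deposited: int) -> str:
    """Name of the plan with the lowest yearly total (hypothetical helper)."""
    costs = annual_costs(articles_published, packages_deposited)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    # Hypothetical journal: 200 articles a year, 60 with a data package.
    for plan, cost in annual_costs(200, 60).items():
        print(f"{plan}: ${cost}")
    print("cheapest:", cheapest_plan(200, 60))
```

Under these assumptions the subscription plan becomes cheaper than vouchers once more than about 43% of a journal's articles deposit a data package (30/70 of articles), which is one way to read the slide's point that revenue scales with the volume of deposits.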