A Perspective Paul Price Dow Chemical Company

Publications are changing
• Leather-bound journals and dedicated libraries,
the format of the scientific paper, weird
abbreviations (Tox. & App. Pharm.)
• Recent email on the need for packing materials
• Dump the filing cabinets - PDF/HTML replaces
paper (free color!)
• Paper journals are evolving into curated web sites
• Upsetting the status quo –
– No technical reason for not sharing detailed technical
Sharing data
• Ethical issues for not sharing
– Privacy of individuals
• Economic reasons for not sharing
– Intellectual property rights
– Charging for access: the economics of journals and
data owners
– Academics: My career depends on mining my data on
my schedule
• Internet-based expectations
– I expect to see everything from home using my web
Social contracts
• Permission to sell is contingent on
demonstrating safety
• Credence for findings is less contingent on
peer review and more contingent on sharing
relevant data
• Science that supports regulatory decisions
needs to be in the sunlight
Parting thought
• When I share data I am asking the world “can
someone do a better job than me in
understanding the data?”
• When I withhold data I am saying “no one can
do a better job than me in understanding the
data.”
Therefore journals should require the
sharing of raw data as a condition or
Data Access:
Issues and Opportunities
Alan F. Karr
National Institute of Statistical Sciences
February 13, 2012
Points for Discussion
• The problem is hard
– Players are responding rationally to incentives
– Not “one size fits all”
• “The data” is ill-defined
• “Availability” is vague: what about
– Tech support
– Data subjects
• Reproducibility (data + code) vs. replicability (data only?)
• There are effective mechanisms for access, based on
statistical disclosure limitation
The Analysis Matters
Data Dissemination: High-Level View
Should Journals Require the Release
of Supporting Data as a Condition of
Jane C. Schroeder, DVM PhD
Science Editor, Environmental Health Perspectives
Why is access to raw data desirable?
• To advance scientific knowledge
Is it a given that access to raw data
will advance knowledge?
How would access advance knowledge?
1. Identify unintentional errors
• Data entry errors, transcribing, labeling
• Errors in coding, misconstrued variables
• Copy editing errors
– Some can be identified by a careful review of
reported results
– Avoid via documentation, data management,
internal review
– Some would require truly raw data
How would access advance knowledge?
2. Identify scientific misconduct
• If the perpetrator is competent, unlikely to
be evident
• If not competent, likely to be multiple cues
– Plagiarism, inconsistent logic, incredible
If access to raw data is the only way to
prevent fraud, we are in trouble
How would access advance knowledge?
3. Identify “errors” in decision-making
• Such “errors” may represent legitimate
– There is no single “best way” to analyze data
decision-making should be
completely transparent
How would access advance knowledge?
4. Reduce the time from data collection
to full dissemination
• Investigators must be able to recoup their
investment of time and effort
– Lose jobs → no data for anyone
• Confidentiality, informed consent
What should journals do?
Careful & detailed reviews, including requests
for code, data when appropriate
• Require complete methods
– Rationale/criteria for decisions
– Information on data management, QA/QC
• Require information to assess study quality
– Missing data, participation, drop-out,
numbers of observations
What should journals do?
Require full reporting of all results used to
support key analytic decisions and conclusions
– Essential when interpretation is subjective or criteria
are not widely accepted
– Null findings as well as positive ones
– Sensitivity analyses of assumptions, alternate
– Supplemental material, external archiving
Review and update policies when it is in the
best interest of science communication to do so
What should the community do?
Discipline-appropriate standards for data
management, QA/QC, and reporting
Bona fide internal reviews before publication
Support for costs of data sharing
Encourage and reward analyses of combined
data from multiple studies
Avoid regulations that may ultimately
impede scientific advancement by serving
some members of the community at the
expense of others
Introducing the Dryad Digital
Society of Toxicology webinar
February 2013
Peggy Schaeffer
Many journals require data sharing upon request
• Psychology
– Requested data from 141 articles
– “6 months later, after … 400 emails, [sending] detailed descriptions
of our study aims, approvals of our ethical committee, signed
assurances not to share data with others, and even our full
resumes…” data was obtained from 27% of articles.
– Wicherts et al. (2006). Am. Psych. 61:726-728
• Genetics
– 47% of respondents had a request for data or materials denied within 3 yrs
– 28% were unable to confirm others’ published research as a result
– #1 reason for data withholding (80%): effort required to share it
– Campbell et al. (2002). JAMA 287(4):473-80
Data archiving has many benefits
Verification of published research
Preserving accessibility to data
Allowing reuse and repurposing of
Discoverability of data
Indirect (costs avoided)
Redundant data collection
Inefficient legacy data curation
Burden of sharing-upon-request
Opportunity cost of science not
Near term
Protection against personnel
Availability for review and validation
Long term
Secure long-term stewardship
Increased impact per publication
Increased citations
New collaborations
New research opportunities
Fulfilling funding mandates
More efficient use of research
Public trust in science
Educational opportunities
Improved methodologies
More informed policy
Modified from Beagrie et al. (2009), Keeping Research Data Safe 2
Joint Data Archiving Policy
[Journal] requires, as a condition for publication, that data
supporting the results in the paper should be archived in an
appropriate public archive, such as [list of approved
archives here].
Data are important products of the scientific enterprise,
and they should be preserved and usable for decades in the
future.
Authors may elect to have the data publicly available at
time of publication, or, if the technology of the archive
allows, may opt to embargo access to the data for a period
up to a year after publication.
Exceptions may be granted at the discretion of the editor,
especially for sensitive information such as human subject
data or the location of endangered species.
Why use Dryad rather than Supplementary Online Materials?
Discoverable: indexed and exposed to both web and bibliographic search engines
Identifiable: DataCite DOIs within articles serve as permanent, resolvable identifiers
Permanent: processes in place to promote preservation (incl. format migration)
Curated: quality control by both automated processes and human inspection
Ease of deposit: streamlined deposit, allowance for large and complex datasets
Formatted for reuse: support for non-PDF file formats
Updatable: new versions of data files can be added, metadata can be enhanced
Support for embargoes: can delay release of data in accordance with journal policy
Free reuse: no paywall, clear terms of reuse (all data released under CC Zero)
Economy of scale: cost efficiency from shared infrastructure
Alignment to organizational mission: focus on archiving and reuse of scientific data
* A few publisher SOM sites are exceptions to the general rule.
** Practices differ among publishers, see Smit (2011), doi:10.1045/january2011-smit
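As a rough illustration of what a "permanent, resolvable identifier" means in practice, the sketch below turns a DOI into a URL by prefixing the public doi.org resolver. The helper name `doi_to_url` is hypothetical and not part of any Dryad or DataCite tooling; the DOI used is the Smit (2011) one cited above.

```python
from urllib.parse import quote

# Public DOI resolver; a DOI appended to this prefix redirects to the
# current location of the identified object.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn 'doi:10.1045/january2011-smit' into a resolvable URL (hypothetical helper)."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep the '/' between prefix and suffix; percent-encode anything URL-unsafe.
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_url("doi:10.1045/january2011-smit"))
```

Because the identifier, not the hosting URL, is what gets cited, the link in an article keeps working even if the archive moves the underlying files.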
Researchers are using Dryad for data archiving…
As of 7 Feb 2013, Dryad contains 7306 data files associated with
2662 publications from 191 different journals
and using the data for research…
Over 25 integrated journals
… and 20 more on the way
Trustworthy repository infrastructure
Making data available is the primary mission of the organization
• No pay-walls or restrictive licenses (all released under CCZero)
• The same data may be hosted by other services (non-exclusivity)
Built on the DSpace repository platform
• An open source framework used by hundreds of institutional repositories
Multiple machine and human interfaces for discovery and access
• Dublin Core metadata harvestable through OAI-PMH
• DOIs registered through DataCite
• Curation-enhanced metadata to enhance keyword searching
• Indexed by Web of Science and other bibliographic services
Assurance of data integrity and permanent availability
• Service mirroring and backup
• File migration and bit-level integrity assurance
• Organizational failover through DataONE and (soon) CLOCKSS
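To make "Dublin Core metadata harvestable through OAI-PMH" concrete, here is a minimal sketch of how a harvester might build a `ListRecords` request and read Dublin Core fields from the response. The endpoint URL, the sample response, and the identifiers in it are all made up for illustration; only the `verb` and `metadataPrefix` parameters and the XML namespaces come from the OAI-PMH and Dublin Core specifications. No network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint: substitute the repository's real OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai"

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response (illustrative only, not real repository data).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:12345</identifier>
        <datestamp>2013-02-07T00:00:00Z</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example data package</dc:title>
          <dc:identifier>doi:10.0000/example.12345</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request asking for unqualified Dublin Core."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text):
    """Extract the OAI identifier, titles, and DC identifiers of each record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS)],
            "dc_identifiers": [e.text for e in rec.iterfind(".//dc:identifier", NS)],
        })
    return records


print(list_records_url(BASE_URL))
print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would also follow the `resumptionToken` element that OAI-PMH uses to page through large result sets; that step is omitted here to keep the sketch short.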
• Not-for-profit organization
• Incorporated in North Carolina (USA)
• Membership is open to a diversity of stakeholders
– Scientific societies, publishers, funding agencies,
universities, libraries, etc.
• Members need not publish a partner journal
• Governed by a rotating 12-member Board of Directors,
nominated and elected by the membership
• Long-term preservation requires an organization with a viable
business model
• Not dependent on the vagaries of grant funding
• Or the largesse of an institution that may have other priorities
• Revenue will be primarily from deposit fees
This enables Dryad to make access to the data free in perpetuity
The time of deposit is when the majority of costs are incurred
Revenue scales with costs (i.e. volume of deposits)
The costs are distributed both fairly and widely
• Additional revenue
• Membership fees ($1000/yr) will cover costs of annual Membership
• Project grants will supplement the operational budget for R&D activities
• With research and development activities funded by grants at various
institutions (e.g. Duke University, Univ. of North Carolina at Chapel Hill)
Payment plans
Paid by, with non-member cost (1):
• Journal, society, or publisher, in advance
– Based on total annual volume of research articles @ $30/article
• Journal or other sponsoring organization, invoiced periodically for prior
– $75/data package (2)
• Journal or other sponsoring organization, paid in
– $70/data package
• Pay on: Author, at time of deposit
– $80/data package, with a process for granting waivers for authors
from less-developed countries
(1) Up to a fixed deposit size (currently 10GB). Additional charges for larger deposits.
(2) Data package = all the data associated with an article.
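The trade-off between the plans is simple arithmetic, sketched below under stated assumptions. The rates are the non-member figures listed above; the plan labels (`per_article_volume`, `invoiced_afterwards`, and so on) are placeholder names chosen for this example, since the plan names themselves are not given, and surcharges for deposits over the size limit are ignored.

```python
# Non-member rates from the slide, in USD. Plan labels are placeholders.
RATE_PER_ARTICLE = 30  # charged on total annual volume of research articles
RATE_PER_PACKAGE = {
    "invoiced_afterwards": 75,     # sponsor invoiced periodically
    "paid_in_advance": 70,         # sponsor pays ahead of time
    "author_pays_on_deposit": 80,  # author pays at time of deposit
}


def annual_costs(articles_per_year, packages_per_year):
    """Estimate the yearly cost of each plan, ignoring large-deposit surcharges."""
    if articles_per_year < 0 or packages_per_year < 0:
        raise ValueError("counts must be non-negative")
    costs = {"per_article_volume": RATE_PER_ARTICLE * articles_per_year}
    for plan, rate in RATE_PER_PACKAGE.items():
        costs[plan] = rate * packages_per_year
    return costs


def break_even_share(rate_per_package):
    """Share of articles that must deposit data before the volume plan is cheaper."""
    return RATE_PER_ARTICLE / rate_per_package


# Example: a journal publishing 200 articles a year, 60 of which deposit data.
print(annual_costs(200, 60))
print(break_even_share(75))
```

With these inputs the volume-based plan costs $6,000 against $4,500 for the invoiced per-package plan. The volume plan only becomes the cheaper option once more than 40% of articles deposit a data package (30 / 75 = 0.4).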
The value proposition
• For researchers, Dryad…
– increases the impact of, and citations to, published research
– preserves and makes available others’ data
– frees researchers from the burden of data preservation and access
• For societies, journals, and publishers, Dryad…
– offers more visibility for research outputs
– promotes prestige for the discipline
– supports a wide range of journal policies on data sharing
– frees journals from the burden of maintaining supplemental data
• For libraries and institutions, Dryad…
– makes data available at no cost, under clear terms of use
– helps fulfill their research data management mandates
• For funders, Dryad…
– provides a cost-effective mechanism to make research more accessible
To learn more
Repository home: http://datadryad.org
News: http://blog.datadryad.org
Project documentation: http://wiki.datadryad.org
Twitter: @datadryad
Facebook: www.facebook.com/DataDryad
Contact us:
Todd Vision, Project Director, tjv@bio.unc.edu
Laura Wendell, Executive Director, lwendell@datadryad.org
Peggy Schaeffer, Communications Coordinator, pschaeffer@datadryad.org