A Perspective Paul Price Dow Chemical Company

A Perspective
Paul Price
Dow Chemical Company
pprice@dow.com
Publications are changing
• Leather-bound journals and dedicated libraries,
the format of the scientific paper, weird
abbreviations (Tox. & App. Pharm.)
• Recent email on the need for packing materials
• Dump the filing cabinets - PDF/HTML replaces
paper (free color!)
• Paper journals are evolving into curated web sites
• Upsetting the status quo –
– No technical reason for not sharing detailed technical
findings
Sharing data
• Ethical issues for not sharing
– Privacy of individuals
• Economic reasons for not sharing
– Intellectual property rights
– Charging for access: the economics of journals and
data owners
– Academics: My career depends on mining my data on
my schedule
• Internet-based expectations
– I expect to see everything from home using my web
browser
Social contracts
• Permission to sell is contingent on
demonstrating safety
• Credence for findings is less contingent on
peer review and more contingent on sharing
relevant data
• Science that supports regulatory decisions
needs to be in the sunlight
Parting thought
• When I share data I am asking the world “can
someone do a better job than me in
understanding the data?”
• When I withhold data I am saying “no one can
do a better job than me in understanding the
data”
Therefore journals should require the
sharing of raw data as a condition of
publication
Data Access:
Issues and Opportunities
Alan F. Karr
National Institute of Statistical Sciences
karr@niss.org
February 13, 2012
Points for Discussion
• The problem is hard
– Players are responding rationally to incentives
– Not “one size fits all”
• “The data” is ill-defined
• “Availability” is vague: what about
– Cost
– Liability
– Tech support
– Co-authorship
– Data subjects
• Reproducibility (data + code) vs. replicability (data only?)
• There are effective mechanisms for access, based on
statistical disclosure limitation
The Analysis Matters
Data Dissemination: High-Level View
Should Journals Require the Release
of Supporting Data as a Condition of
Publication?
Jane C. Schroeder, DVM PhD
Science Editor, Environmental Health Perspectives
schroederjc@niehs.nih.gov
No.
Why is access to raw data desirable?
• To advance scientific knowledge
Is it a given that access to raw data
will advance knowledge?
How would access advance knowledge?
1. Identify unintentional errors
• Data entry errors, transcribing, labeling
• Errors in coding, misconstrued variables
• Copy editing errors
– Some can be identified by a careful review of
reported results
– Avoid via documentation, data management,
internal review
– Some would require truly raw data
How would access advance knowledge?
2. Identify scientific misconduct
• If the perpetrator is competent, unlikely to
be evident
• If not competent, likely to be multiple cues
– Plagiarism, inconsistent logic, incredible
findings
• If access to raw data is the only way to
prevent fraud, we are in trouble
How would access advance knowledge?
3. Identify “errors” in decision-making
• Such “errors” may represent legitimate
differences
– There is no single “best way” to analyze data
• However, decision-making should be
completely transparent
How would access advance knowledge?
4. Reduce the time from data collection
to full dissemination
• Investigators must be able to recoup their
investment of time and effort
– Lose jobs → no data for anyone
• Confidentiality, informed consent
agreements
What should journals do?
• Careful & detailed reviews, including requests
for code and data when appropriate
• Require complete methods
– Rationale/criteria for decisions
– Information on data management, QA/QC
• Require information to assess study quality
– Missing data, participation, drop-out,
numbers of observations
What should journals do?
• Require full reporting of all results used to
support key analytic decisions and conclusions
– Essential when interpretation is subjective or criteria
are not widely accepted
– Null findings as well as positive ones
– Sensitivity analyses of assumptions, alternate
approaches
– Supplemental material, external archiving
• Review and update policies when it is in the
best interest of science communication to do so
What should the community do?
• Discipline-appropriate standards for data
management, QA/QC, and reporting
• Bona fide internal reviews before publication
• Support for costs of data sharing
• Encourage and reward analyses of combined
data from multiple studies
• Avoid regulations that may ultimately
impede scientific advancement by serving
some members of the community at the
expense of others
Introducing the Dryad Digital
Repository
Society of Toxicology webinar
February 2013
Peggy Schaeffer
Many journals require data sharing upon request
• Psychology
– Requested data from 141 articles
– “6 months later, after … 400 emails, [sending] detailed descriptions
of our study aims, approvals of our ethical committee, signed
assurances not to share data with others, and even our full
resumes…” data was obtained from 27% of articles.
– Wicherts et al. (2006). Am. Psych. 61:726-728
• Genetics
– 47% of respondents had a request for data or materials denied w/in 3 yrs
– 28% unable to confirm others’ published research as a result.
– #1 reason for data withholding (80%): effort required to share it.
– Campbell et al. (2002) JAMA (4):473-80.
datadryad.org
Data archiving has many benefits
Direct
• Verification of published research
• Preserving accessibility to data
• Allowing reuse and repurposing of data
• Discoverability of data
Indirect (costs avoided)
• Redundant data collection
• Inefficient legacy data curation
• Burden of sharing-upon-request
• Opportunity cost of science not done
Near term
• Protection against personnel turnover
• Availability for review and validation
Long term
• Secure long-term stewardship
• Increased impact per publication
Private
• Increased citations
• New collaborations
• New research opportunities
• Fulfilling funding mandates
Public
• More efficient use of research dollars
• Public trust in science
• Educational opportunities
• Improved methodologies
• More informed policy
Modified from Beagrie et al. (2009) Keeping Research Data Safe 2
Joint Data Archiving Policy
[Journal] requires, as a condition for publication, that data
supporting the results in the paper should be archived in an
appropriate public archive, such as [list of approved
archives here].
Data are important products of the scientific enterprise,
and they should be preserved and usable for decades in the
future.
Authors may elect to have the data publicly available at
time of publication, or, if the technology of the archive
allows, may opt to embargo access to the data for a period
up to a year after publication.
Exceptions may be granted at the discretion of the editor,
especially for sensitive information such as human subject
data or the location of endangered species.
Why use Dryad rather than Supplementary Online Materials (SOM)?
Feature | Dryad | SOM
Discoverable: indexed and exposed to both web and bibliographic search engines | ✔ | ✗
Identifiable: DataCite DOIs within articles serve as permanent, resolvable identifiers | ✔ | ✗*
Permanent: processes in place to promote preservation (incl. format migration) | ✔ | ✔/✗**
Curated: quality control by both automated processes and human inspection | ✔ | ✗*
Ease of deposit: streamlined deposit, allowance for large and complex datasets | ✔ | ✔/✗**
Formatted for reuse: support for non-PDF file formats | ✔ | ✔/✗**
Updatable: new versions of data files can be added, metadata can be enhanced | ✔ | ✗
Support for embargoes: can delay release of data in accordance with journal policy | ✔ | ✗
Free reuse: no paywall, clear terms of reuse (all data released under CC Zero) | ✔ | ✔/✗**
Economy of scale: cost efficiency from shared infrastructure | ✔ | ✔/✗**
Alignment to organizational mission: focus on archiving and reuse of scientific data | ✔ | ✗
* A few publisher SOM sites are exceptions to the general rule.
** Practices differ among publishers, see Smit (2011), doi:10.1045/january2011-smit
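The “Identifiable” row depends on a DOI being resolvable to a landing page. As a minimal sketch (the helper name `doi_to_url` is hypothetical; the example DOI is the Smit (2011) reference cited in the footnote), a DOI string can be turned into a resolver URL through the public doi.org proxy:

```python
from urllib.parse import quote

# Prefixes that commonly precede a bare DOI in citations and links.
_KNOWN_PREFIXES = (
    "doi:",
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org/ resolver URL for a DOI string.

    Hypothetical helper for illustration; it only normalizes the string
    and does not check that the DOI is actually registered.
    """
    doi = doi.strip()
    for prefix in _KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the prefix/suffix slash.
    return "https://doi.org/" + quote(doi, safe="/")


print(doi_to_url("doi:10.1045/january2011-smit"))
```

Because the identifier rather than a publisher URL is what gets cited, the link can keep working even if the hosting location changes.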
Researchers are using Dryad for data archiving…
As of 7 Feb 2013, Dryad contains 7306 data files associated with
2662 publications from 191 different journals
and using the data for research…
Over 25 integrated journals
… and 20 more on the way
Trustworthy repository infrastructure
• Making data available is the primary mission of the organization
– No pay-walls or restrictive licenses (all released under CCZero)
– The same data may be hosted by other services (non-exclusivity)
• Built on the DSpace repository platform
– An open source framework used by hundreds of institutional repositories
• Multiple machine and human interfaces for discovery and access
– Dublin Core metadata harvestable through OAI-PMH
– DOIs registered through DataCite
– Curation-enhanced metadata to enhance keyword searching
– Indexed by Web of Science and other bibliographic services
• Assurance of data integrity and permanent availability
– Service mirroring and backup
– File migration and bit-level integrity assurance
– Organizational failover through DataONE and (soon) CLOCKSS
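To illustrate what “Dublin Core metadata harvestable through OAI-PMH” means in practice, here is a rough sketch of a harvester. The endpoint URL, the sample response, and the identifiers inside it are all made up for illustration, and the slides do not give the repository's actual OAI-PMH address. Only the `ListRecords` verb, the `oai_dc` metadata prefix, and the XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint; substitute the address a repository publishes.
BASE_URL = "https://repository.example.org/oai/request"

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, from_date=None):
    """Build a ListRecords request for unqualified Dublin Core."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Extract title and identifiers from each record in a response."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for record in root.iterfind(".//oai:record", NS):
        dc = record.find(".//oai_dc:dc", NS)
        if dc is None:  # e.g. a deleted record with a header only
            continue
        records.append({
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            "identifiers": [e.text for e in dc.findall("dc:identifier", NS)],
        })
    return records


# Made-up response, shaped like an OAI-PMH ListRecords reply.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example data package</dc:title>
          <dc:identifier>doi:10.9999/example.1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

print(list_records_url(BASE_URL, from_date="2013-01-01"))
print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would also fetch the URL and follow `resumptionToken` elements to page through large result sets; both are omitted here to keep the sketch offline and short.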
Governance
• Not-for-profit organization
• Incorporated in North Carolina (USA)
• Membership is open to a diversity of stakeholder
organizations
– Scientific societies, publishers, funding agencies,
universities, libraries, etc.
• Members need not publish a partner journal
• Governed by a rotating 12-member Board of Directors,
nominated and elected by the membership
Sustainability
• Long-term preservation requires an organization with a viable
business model
– Not dependent on the vagaries of grant funding
– Or the largesse of an institution that may have other priorities
• Revenue will be primarily from deposit fees
– This enables Dryad to make access to the data free in perpetuity
– The time of deposit is when the majority of costs are incurred
– Revenue scales with costs (i.e. volume of deposits)
– The costs are distributed both fairly and widely
• Additional revenue
– Membership fees ($1000/yr) will cover costs of annual Membership
meetings
– Project grants will supplement the operational budget for R&D activities
– With research and development activities funded by grants at various
institutions (e.g. Duke University, Univ. of North Carolina at Chapel Hill)
Payment plans
Plan | Contract? | Paid by | Non-member cost [1]
Subscription | yes | Journal, society, or publisher, in advance | Based on total annual volume of research articles @ $30/article
Deferred payment | yes | Journal or other sponsoring organization, invoiced periodically for prior deposits | $75/data package [2]
Voucher | yes | Journal or other sponsoring organization, paid in advance | $70/data package
Pay on deposit | no | Author, at time of deposit | $80/data package, with a process for granting waivers for authors from less-developed countries
[1] Up to a fixed deposit size (currently 10GB). Additional charges for larger deposits.
[2] Data package = all the data associated with an article.
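The plans can be compared with a little arithmetic. The sketch below uses only the non-member prices listed above. The function names and the example journal (200 articles a year, 60 of them with a data package) are assumptions for illustration, and it ignores surcharges for deposits over 10GB, fee waivers, and member pricing.

```python
# Non-member prices from the payment plan table (USD).
SUBSCRIPTION_PER_ARTICLE = 30   # applied to ALL research articles published
DEFERRED_PER_PACKAGE = 75       # invoiced after the deposits are made
VOUCHER_PER_PACKAGE = 70        # paid in advance
PAY_ON_DEPOSIT_PER_PACKAGE = 80 # paid by the author at deposit time


def annual_costs(articles_published: int, packages_deposited: int) -> dict:
    """Return the yearly cost in USD under each plan.

    Hypothetical helper: the subscription is priced on total article
    volume, the other plans on the number of data packages deposited.
    """
    if articles_published < 0 or packages_deposited < 0:
        raise ValueError("counts must be non-negative")
    return {
        "subscription": SUBSCRIPTION_PER_ARTICLE * articles_published,
        "deferred payment": DEFERRED_PER_PACKAGE * packages_deposited,
        "voucher": VOUCHER_PER_PACKAGE * packages_deposited,
        "pay on deposit": PAY_ON_DEPOSIT_PER_PACKAGE * packages_deposited,
    }


def cheapest_plan(articles_published: int, packages_deposited: int) -> str:
    """Name the plan with the lowest total cost for the given volumes."""
    costs = annual_costs(articles_published, packages_deposited)
    return min(costs, key=costs.get)


# Assumed example journal: 200 articles per year, 60 with data packages.
print(annual_costs(200, 60))
print(cheapest_plan(200, 60))
```

Under these prices the subscription beats vouchers only when 30 × articles < 70 × packages, that is, when more than about 43% of a journal's articles come with a data package.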
The value proposition
• For researchers, Dryad…
– increases the impact of, and citations to, published research
– preserves and makes available others’ data
– frees researchers from the burden of data preservation and access
• For societies, journals, and publishers, Dryad…
– offers more visibility for research outputs
– promotes prestige for the discipline
– supports a wide range of journal policies on data sharing
– frees journals from the burden of maintaining supplemental data
• For libraries and institutions, Dryad…
– makes data available at no cost, under clear terms of use
– helps fulfill their research data management mandates
• For funders, Dryad…
– provides a cost-effective mechanism to make research more accessible
To learn more
• Repository home: http://datadryad.org
• News: http://blog.datadryad.org
• Project documentation: http://wiki.datadryad.org
• Twitter: @datadryad
• Facebook: www.facebook.com/DataDryad
Contact us:
• Todd Vision, Project Director, tjv@bio.unc.edu
• Laura Wendell, Executive Director, lwendell@datadryad.org
• Peggy Schaeffer, Communications Coordinator, pschaeffer@datadryad.org