
Data Citation Pilot
Tim Clark, Ph.D.
Harvard Medical School & Massachusetts General Hospital
Maryann Martone, Ph.D.
University of California at San Diego
October 13, 2015
© 2015 FORCE11.org
BD2K Aim #1: “To facilitate broad use of biomedical digital assets by
making them discoverable, accessible and citable.” (NIH 2015) [1]
Data that are robustly archived and directly cited in journal articles can
provide powerful input to BioCADDIE content and operations.
Significant prior work on data citation provides a
foundation on which to build.
Several top-tier publishers plan to implement direct data citation
but need assistance.
This pilot will coordinate the effort and provide
communication across participating groups.
Background Documents
CODATA & National Academies Reports (2012-2013) [2, 3]
Joint Declaration of Data Citation Principles (JDDCP) (2014) [4]
Collins & Tabak (2014) on NIH reproducibility initiatives [5]
JDDCP Implementation Guidelines (Starr et al. 2015) [6]
ELIXIR/BD2K & BioCADDIE/FORCE11 Workshops (January 2015)
BioCADDIE supplement for the Data Citation Implementation Pilot,
subcontracted to FORCE11 through UCSD (October 2015)
Provide coordination & guidance for early adopters of data citation:
publishers, repositories and ID / metadata services.
Help establish one or more benchmark implementations by
important early adopters across key use cases.
Focus on archiving and citing primary research data.
Coordinate with CODATA’s international workshops on data
citation, which complement the focused early-adopter pilot.
Publish several peer-reviewed articles and a final report.
Provide a report on lessons learned to the community.
DCIP Executive Committee
Tim Clark, Harvard Medical School.
Carole Goble, U of Manchester, ELIXIR Deputy Director for the UK.
Jeff Grethe, UC San Diego, BioCADDIE Executive Committee.
Simon Hodson, Executive Director, CODATA.
Maryann Martone, UC San Diego & Hypothes.is.
Jo McEntyre, EMBL/EBI, European PubMed Central.
Joan Starr, California Digital Library.
Agreed Participants to Date
Repositories
Dryad, Figshare, PDB, European PMC (EMBL/EBI)
Columbia University Library, Harvard Dataverse
Metadata & ID
California Digital Library, DataCite, CrossRef, ORCID
Publishers
Elsevier, PLOS, BioMed Central, eLife, F1000, GigaScience
Standards, Academic & Scholarly Organizations
JATS Standing Committee, CODATA, ELIXIR
use the JATS 1.1d2/1.1d3 schemas for documents
common data citation workflows & core metadata
use JDDCP implementation guidelines
provide authors a common FAQ web page
assist publisher operations group in supporting authors
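The JATS and core-metadata items above can be illustrated with a minimal Python sketch that emits one data citation as a JATS `<element-citation publication-type="data">`. The element names used here (`data-title`, `version`, `pub-id`) reflect our understanding of the data-citation tags introduced in the JATS 1.1 drafts and should be checked against the official tag library. The `DatasetRef` fields, the `jats_data_citation` helper, and all example values (including the DOI) are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class DatasetRef:
    """Hypothetical core metadata for one cited dataset."""
    authors: list[tuple[str, str]]  # (surname, given names)
    title: str
    repository: str
    year: str
    doi: str
    version: str | None = None


def jats_data_citation(ref: DatasetRef, ref_id: str) -> ET.Element:
    """Build a JATS <ref> wrapping an <element-citation publication-type="data">.

    Element names follow our reading of the JATS 1.1 draft data-citation
    tags; verify them against the official tag library before use.
    """
    ref_el = ET.Element("ref", {"id": ref_id})
    cit = ET.SubElement(ref_el, "element-citation", {"publication-type": "data"})

    # Dataset creators, tagged as authors.
    group = ET.SubElement(cit, "person-group", {"person-group-type": "author"})
    for surname, given in ref.authors:
        name = ET.SubElement(group, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given

    # Dataset title, holding repository, optional version, and year.
    ET.SubElement(cit, "data-title").text = ref.title
    ET.SubElement(cit, "source").text = ref.repository
    if ref.version is not None:
        ET.SubElement(cit, "version").text = ref.version
    ET.SubElement(cit, "year").text = ref.year

    # Persistent identifier for the archived data.
    ET.SubElement(cit, "pub-id", {"pub-id-type": "doi"}).text = ref.doi
    return ref_el


if __name__ == "__main__":
    example = DatasetRef(
        authors=[("Doe", "J")],
        title="Example gene expression measurements",  # hypothetical title
        repository="Dryad Digital Repository",
        year="2015",
        doi="10.1234/example.5678",  # hypothetical DOI
        version="1",
    )
    print(ET.tostring(jats_data_citation(example, "d1"), encoding="unicode"))
```

Running the script prints the serialized `<ref>` element; in a publishing workflow it would presumably be placed inside the article's `<ref-list>` alongside conventional literature citations.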
Proposed Deliverables
1. Principles and Entailments for Direct Scientific Data Citation on the Web - peer-
reviewed archival version of the JDDCP with background material
2. Five Steps to Citing Research Data - summary JDDCP implementation guidelines
3. Citing Data with the Journal Article Tag Suite - detailed guidance for using the NISO JATS
1.1d2 and 1.1d3 revisions for data citation in publishing
4. Data Citation FAQ - dynamic feature on the FORCE11 site
5. Data Citation Ask the Experts - social media feature on the FORCE11 website
6. Ongoing input to CODATA Global Workshops on Data Citation
7. Ongoing implementation guidance and coordination across Pilot stakeholders
8. Citing Data in Action: Experiences and lessons learned from the BioCADDIE Data
Citation Implementation Pilot - one-year report on the pilot
BioCADDIE Integration
Make all metadata and identifiers available for harvesting
Ensure that archived content is indexable
Coordinate with BioCADDIE use cases & activities
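The slides do not name a harvesting mechanism. Assuming a repository exposes its metadata over OAI-PMH (a widely used harvesting protocol, chosen here only as an example), an indexer could page through `ListRecords` responses and collect record and dataset identifiers. In this sketch the base URL, the sample response, and the helper names `list_records_url` and `parse_page` are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: str | None = None) -> str:
    """Build an OAI-PMH ListRecords request URL for one page of results."""
    if resumption_token:
        # Follow-up pages send only the verb and the resumption token.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(response_xml: str) -> tuple[list[tuple[str, list[str]]], str | None]:
    """Return (header id, dc:identifier values) per record, plus the next token."""
    root = ET.fromstring(response_xml)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header_id = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        dc_ids = [e.text.strip() for e in rec.iterfind(".//dc:identifier", NS) if e.text]
        records.append((header_id, dc_ids))
    # An empty or missing resumptionToken marks the last page.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    next_token = (token or "").strip() or None
    return records, next_token


# Illustrative single-page response (not real repository data).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:identifier>https://doi.org/10.1234/example.5678</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken/>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://repo.example.org/oai"))
    print(parse_page(SAMPLE))
```

A real harvester would loop, requesting `list_records_url(base, resumption_token=token)` until `parse_page` returns `None` for the token.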
To Sum Up
One-year Data Citation Implementation Pilot
Organize “early adopter” stakeholders
Provide expert feedback, authoritative guidance
Enable stakeholders to implement successfully
Publish guidance and lessons learned
Strong BioCADDIE support and integration
1. NIH: About BD2K. National Institutes of Health, 2015. Accessed October 12, 2015.
2. Uhlir P: For Attribution - Developing Data Attribution and Citation Practices and
Standards: Summary of an International Workshop. The National Academies
Press; 2012: 220 [http://www.nap.edu/catalog.php?record_id=13564].
3. CODATA-ICSTI Task Force on Data Citation: Out of Cite, Out of Mind: The Current State
of Practice, Policy and Technology for Data Citation. Data Science Journal 2013, 12:1-75.
4. Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. Edited by
Martone M. San Diego CA: Future of Research Communication and e-Scholarship
(FORCE11); 2014 [https://www.force11.org/datacitation].
5. Collins FS, Tabak LA: Policy: NIH plans to enhance reproducibility. Nature 2014.
6. Starr J, Castro E, Crosas M, Dumontier M, Downs RR, Duerr R, Haak LL, Haendel M,
Herman I, Hodson S, Hourclé J, Kratz JE, Lin J, Nielsen LH, Nurnberger A, Proell S, Rauber A,
Sacchi S, Smith A, Taylor M, Clark T: Achieving human and machine accessibility of
cited data in scholarly publications. PeerJ Computer Science 2015, 1:e1.