The BioCADDIE / FORCE11 Data Citation Pilot

Tim Clark, Ph.D., Harvard Medical School & Massachusetts General Hospital
Maryann Martone, Ph.D., University of California at San Diego
October 13, 2015
© 2015 FORCE11.org

Background
• BD2K Aim #1: “To facilitate broad use of biomedical digital assets by making them discoverable, accessible and citable.” (NIH 2015) [1]
• Data that are robustly archived and directly cited in journal articles can provide powerful input to BioCADDIE content and operations.
• Significant work has already been done on data citation and provides a foundation on which to proceed.
• Several top-tier publishers are planning to implement this approach but need assistance.
• This pilot will organize a coordination activity and provide communication across participating groups.

Background Documents
• CODATA & National Academies Reports (2012-2013) [2, 3]
• Joint Declaration of Data Citation Principles / JDDCP (2014) [4]
• Collins & Tabak (2014) on NIH reproducibility initiatives [5]
• JDDCP Implementation Guidelines (Starr et al. 2015) [6]
• ELIXIR/BD2K & BioCADDIE/FORCE11 Workshops (Jan 2015) [7]
• BioCADDIE supplement for Data Citation Implementation Pilot, subcontract to FORCE11 through UCSD (Oct 2015)

Objectives
• Provide coordination & guidance for early adopters of data citation: publishers, repositories and ID / metadata services.
• Help establish one or more benchmark implementations by important early adopters across key use cases.
• Focus on archiving and citing primary research data.
• Coordinate with CODATA’s international workshops on data citation, which are complementary to the focused early adopter pilot.
• Publish several peer-reviewed articles and a final report.
• Provide a report on lessons learned to the community.

DCIP Executive Committee
• Tim Clark, Harvard Medical School.
• Carole Goble, U of Manchester, ELIXIR Deputy Director for the UK.
• Jeff Grethe, UC San Diego, BioCADDIE Executive Committee.
• Simon Hodson, Executive Director, CODATA.
• Maryann Martone, UC San Diego & Hypothes.is.
• Jo McEntyre, EMBL/EBI, European PubMed Central.
• Joan Starr, California Digital Library.

Agreed Participants to Date
• Publishers
  • Elsevier, PLoS, BioMed Central, eLife, F1000, GigaScience
• Repositories
  • Dryad, Figshare, PDB, European PMC (EMBL/EBI)
  • Columbia University Library, Harvard Dataverse
• Metadata & ID
  • California Digital Library, DataCite, CrossRef, ORCID
• Standards, Academic & Scholarly Organizations
  • JATS Standing Committee, CODATA, ELIXIR, BioCADDIE

Approach
• Publishers
  • use JATS 1.1d2/3 schema for documents
  • common data citation workflows & core metadata
• Repositories
  • use JDDCP implementation guidelines
• Authors
  • provide authors a common FAQ web page
  • assist publisher operations group in supporting authors

Proposed Deliverables
1. Principles and Entailments for Direct Scientific Data Citation on the Web - peer-reviewed archival version of the JDDCP with background material
2. Five Steps to Citing Research Data - summary of the JDDCP implementation guidelines
3. Citing Data with the Journal Article Tag Suite - detailed guidance for using the 1.1d2 and 1.1d3 NISO JATS revisions for data citation in publishing
4. Data Citation FAQ - dynamic feature on the FORCE11 site
5. Data Citation Ask the Experts - social media feature on the FORCE11 website
6. Ongoing input to CODATA Global Workshops on Data Citation
7. Ongoing implementation guidance and coordination across Pilot stakeholders
8. Citing Data in Action: Experiences and lessons learned from the BioCADDIE Data Citation Implementation Pilot - 1-year report on the pilot

BioCADDIE Integration
• All metadata and identifiers available for harvesting
• Ensure that archived content is indexable
• Coordination with BioCADDIE use cases & activities

To Sum Up
• Data Citation Implementation Pilot over 1 year
• Organize “early adopter” stakeholders
• Provide expert feedback, authoritative guidance
• Enable stakeholders to implement successfully
• Publish guidance and lessons learned
• Strong BioCADDIE support and integration

References
1. NIH: About BD2K. National Institutes of Health, 2015. Accessed October 12, 2015. [https://datascience.nih.gov/bd2k/about].
2. Uhlir P: For Attribution - Developing Data Attribution and Citation Practices and Standards: Summary of an International Workshop. The National Academies Press; 2012: 220 [http://www.nap.edu/catalog.php?record_id=13564].
3. CODATA/ICSTI Task Force on Data Citation: Out of cite, out of mind: The Current State of Practice, Policy and Technology for Data Citation. Data Science Journal 2013, 12:1-75.
4. Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. Edited by Martone M. San Diego CA: Future of Research Communication and e-Scholarship (FORCE11); 2014 [https://www.force11.org/datacitation].
5. Collins FS, Tabak LA: Policy: NIH plans to enhance reproducibility. Nature 2014, 505(7485):612.
6. Starr J, Castro E, Crosas M, Dumontier M, Downs RR, Duerr R, Haak LL, Haendel M, Herman I, Hodson S, Hourclé J, Kratz JE, Lin J, Nielsen LH, Nurnberger A, Proell S, Rauber A, Sacchi S, Smith A, Taylor M, Clark T: Achieving human and machine accessibility of cited data in scholarly publications. PeerJ Computer Science 2015, 1:e1.