Prepared for the Board on Research Data and Information
“Developing Data Attribution and Citation Practices and Standards: An International Symposium and Workshop”
August 22-23, 2011

Data Citation in The Dataverse Network ®

Micah Altman, Institute for Quantitative Social Science, Harvard University

Collaborators*
Leonid Andreev, Ed Bachman, Adam Buchbinder, Ken Bollen, Bryan Beecher, Steve Burling, Kevin Condon, Jonathan Crabtree, Merce Crosas, Gary King, Patrick King, Tom Lipkis, Freeman Lo, Jared Lyle, Marc Maynard, Nancy McGovern, Lois Timms-Ferrarra, Akio Sone, Bob Treacy

Research Support
Thanks to the Library of Congress (PA#NDP03-1), the National Science Foundation (DMS-0835500, SES 0112072), IMLS (LG-05-090041-09), the Harvard University Library, the Institute for Quantitative Social Science, the Harvard-MIT Data Center, and the Murray Research Archive.

* And co-conspirators

Related Work
• M. Crosas. 2011. “The Dataverse Network: An Open-Source Application for Sharing, Discovering and Preserving Data”, D-Lib Magazine, 17(1/2).
• M. Altman. 2008. “A Fingerprint Method for Verification of Scientific Data”, in Advances in Systems, Computing Sciences and Software Engineering (Proceedings of the International Conference on Systems, Computing Sciences and Software Engineering 2007), Springer Verlag.
• M. Altman and G. King. 2007. “A Proposed Standard for the Scholarly Citation of Quantitative Data”, D-Lib Magazine, 13(3/4) (March/April).
• G. King. 2007. “An Introduction to the Dataverse Network as an Infrastructure for Data Sharing”, Sociological Methods and Research, Vol. 36, No. 2, pp. 173-199.

Some Terminology
An Open-Source Application for Publishing, Citing and Discovering Research Data
• “dataverse” = a virtual archive
• “Dataverse Network” = a server
• “Study” = a work

Examples
Josh Angrist’s Dataverse

“Data” Citation = Study Citation
Two-for-one
Sorta-Kinda-Meta

Required
  Author: Joshua D. Angrist; Eric Bettinger; Erik Bloom; Elizabeth King; Michael Kremer
  Date: 2008
  Title: “Replication data for: Vouchers for Private Schooling in Colombia: Evidence from a Randomized Natural Experiment”
  Persistent ID: http://hdl.handle.net/1902.1/11298
Recommended
  UNF: UNF:3:4v7GYq3uSEeCpk8M567ITw==
DDI 2 Extensions
  Murray Research Archive [Distributor]
  V1 [Version]

What’s a UNF?
UNF = “Universal Numeric Fingerprint” =~ Semantic Fingerprint

Variations
• Proxy Handle
• Dataset specific – same ID, part specified, UNF is for the part
• Citation for a subset of variables/columns/measures (NOT observations!):
  state,year,data_access_who [VarGrp/@var(DDI)]; UNF:5:X4QdWp04aCZntvxZKSHLzQ==

Use Cases
Attribution
• Cite data as first class work
• Identify contributors to data
Discovery
• Locate data via identifier
• Locate data integral to article
• Locate works related to data – articles, derivatives, sources
Provenance
• Associate work with version of evidence used
• Verify fixity of information
Access
• Access to surrogate
• On-line access to object
• Machine understandability
• Long-term human understandability
Persistence
• Evidence persists as long as assertions based on evidence?
• Durability of data transparent?
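
Sketch: assembling a study citation
The required and recommended elements from the Angrist example can be put together mechanically. The Python below is only an illustrative sketch, not the Dataverse Network's own code: the names `StudyCitation` and `parse_unf` are hypothetical, the punctuation and ordering of the rendered string are an assumption, and so is the reading of the UNF as `UNF:<version>:<Base64 digest>`.

```python
import base64
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StudyCitation:
    """Hypothetical container for the citation elements shown on the slide."""
    authors: List[str]                 # required
    year: int                          # required
    title: str                         # required
    persistent_id: str                 # required (e.g. a handle URL)
    unf: Optional[str] = None          # recommended
    distributor: Optional[str] = None  # extension
    version: Optional[str] = None      # extension

    def render(self) -> str:
        # Assumed layout: required elements first, then optional ones.
        text = ", ".join([
            "; ".join(self.authors),
            str(self.year),
            f'"{self.title}"',
            self.persistent_id,
        ])
        if self.unf:
            text += f" {self.unf}"
        if self.distributor:
            text += f" {self.distributor} [Distributor]"
        if self.version:
            text += f" {self.version} [Version]"
        return text


def parse_unf(unf: str) -> Tuple[int, bytes]:
    """Split 'UNF:<version>:<digest>' and decode the digest.

    Assumes the third field is standard Base64, as in the slide examples.
    """
    prefix, version, digest = unf.split(":", 2)
    if prefix != "UNF":
        raise ValueError(f"not a UNF string: {unf!r}")
    return int(version), base64.b64decode(digest, validate=True)


citation = StudyCitation(
    authors=["Joshua D. Angrist", "Eric Bettinger", "Erik Bloom",
             "Elizabeth King", "Michael Kremer"],
    year=2008,
    title=("Replication data for: Vouchers for Private Schooling in "
           "Colombia: Evidence from a Randomized Natural Experiment"),
    persistent_id="http://hdl.handle.net/1902.1/11298",
    unf="UNF:3:4v7GYq3uSEeCpk8M567ITw==",
    distributor="Murray Research Archive",
    version="V1",
)
print(citation.render())
print(parse_unf(citation.unf)[0])
```

Keeping the UNF as a separate optional field reflects the slide's split between required and recommended elements: the persistent ID locates the study, while the fingerprint supports checking that the data retrieved are the data cited.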

Contact Us
Micah Altman: maltman.hmdc.harvard.edu
The Dataverse Network ®: thedata.org