Dataverse - The National Academies

Prepared for the Board on Research Data and Information
“Developing Data Attribution and Citation Practices and Standards: An International Symposium and Workshop”
August 22-23, 2011
Data Citation in
The Dataverse Network ®
Micah Altman, Institute for Quantitative Social Science, Harvard University
Collaborators*

Leonid Andreev, Ed Bachman, Adam Buchbinder, Ken Bollen, Bryan Beecher, Steve Burling, Kevin Condon, Jonathan Crabtree, Merce Crosas, Gary King, Patrick King, Tom Lipkis, Freeman Lo, Jared Lyle, Marc Maynard, Nancy McGovern, Lois Timms-Ferrara, Akio Sone, Bob Treacy

Research Support
Thanks to the Library of Congress (PA#NDP03-1), the National Science Foundation (DMS-0835500, SES 0112072), IMLS (LG-05-090041-09), the Harvard University Library, the Institute for Quantitative Social Science, the Harvard-MIT Data Center, and the Murray Research Archive.
* And co-conspirators
Related Work

M. Crosas, 2011, “The Dataverse Network: An Open-Source Application for Sharing, Discovering and Preserving Data”, D-Lib Magazine 17(1/2).

M. Altman, 2008, “A Fingerprint Method for Verification of Scientific Data”, in Advances in Systems, Computing Sciences and Software Engineering (Proceedings of the International Conference on Systems, Computing Sciences and Software Engineering 2007), Springer Verlag.

M. Altman and G. King, 2007, “A Proposed Standard for the Scholarly Citation of Quantitative Data”, D-Lib Magazine 13(3/4) (March/April).

G. King, 2007, “An Introduction to the Dataverse Network as an Infrastructure for Data Sharing”, Sociological Methods and Research 36(2), pp. 173-199.

Some Terminology
An Open-Source Application for Publishing, Citing and Discovering Research Data
“dataverse” = a virtual archive
“Dataverse Network” = a server
“Study” = a work
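The three terms nest: a server hosts many virtual archives, and each archive holds citable works. The sketch below models that containment with hypothetical Python classes (`Study`, `Dataverse`, `DataverseNetwork`, `find_study`, and the host `dvn.example.org` are all illustrative names, not the Dataverse Network's real schema or API). It shows one point from the terminology: a study is found by its persistent ID, whichever dataverse happens to hold it.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical classes that only illustrate the terminology; they are not
# the Dataverse Network's real schema or API.


@dataclass
class Study:
    """A citable work: descriptive metadata plus the data it bundles."""
    title: str
    persistent_id: str  # e.g. a Handle proxy URL


@dataclass
class Dataverse:
    """A virtual archive: a named collection of studies."""
    name: str
    studies: list[Study] = field(default_factory=list)


@dataclass
class DataverseNetwork:
    """A server hosting many dataverses."""
    host: str
    dataverses: dict[str, Dataverse] = field(default_factory=dict)

    def add(self, dataverse: Dataverse) -> None:
        self.dataverses[dataverse.name] = dataverse

    def find_study(self, persistent_id: str) -> Optional[Study]:
        # The persistent ID, not the hosting dataverse, identifies a study.
        for dataverse in self.dataverses.values():
            for study in dataverse.studies:
                if study.persistent_id == persistent_id:
                    return study
        return None


# Usage: one hypothetical server holding one dataverse with one study.
network = DataverseNetwork(host="dvn.example.org")  # hypothetical host name
angrist = Dataverse(name="Josh Angrist's Dataverse")
angrist.studies.append(
    Study(
        title="Replication data for: Vouchers for Private Schooling in Colombia: "
        "Evidence from a Randomized Natural Experiment",
        persistent_id="http://hdl.handle.net/1902.1/11298",
    )
)
network.add(angrist)
```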
Examples
Josh Angrist’s Dataverse
“Data” Citation = Study Citation
Two-for-one
Sorta-Kinda-Meta
Required
• Author: Joshua D. Angrist; Eric Bettinger; Erik Bloom; Elizabeth King; Michael Kremer
• Date: 2008
• Title: “Replication data for: Vouchers for Private Schooling in Colombia: Evidence from a Randomized Natural Experiment”
• Persistent ID: http://hdl.handle.net/1902.1/11298
Recommended
• UNF: UNF:3:4v7GYq3uSEeCpk8M567ITw==
DDI 2 Extensions
• [Distributor]: Murray Research Archive
• [Version]: V1
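To show how these elements combine into a single citation string, here is a minimal Python sketch. The class and method names (`StudyCitation`, `render`) are hypothetical. The element order and punctuation are an assumption modeled on the example above: required elements first, then the UNF, then the bracket-labeled DDI 2 extensions. Optional elements are simply left out when absent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudyCitation:
    # Required elements
    authors: list[str]
    date: str
    title: str
    persistent_id: str
    # Recommended element
    unf: Optional[str] = None
    # DDI 2 extensions
    distributor: Optional[str] = None
    version: Optional[str] = None

    def render(self) -> str:
        """Join the elements in the example's order, skipping optional ones that are absent."""
        parts = [
            "; ".join(self.authors) + ",",
            self.date + ",",
            f'"{self.title}",',
            self.persistent_id,
        ]
        if self.unf:
            parts.append(self.unf)
        if self.distributor:
            parts.append(f"{self.distributor} [Distributor]")
        if self.version:
            parts.append(f"{self.version} [Version]")
        return " ".join(parts)


citation = StudyCitation(
    authors=["Joshua D. Angrist", "Eric Bettinger", "Erik Bloom", "Elizabeth King", "Michael Kremer"],
    date="2008",
    title="Replication data for: Vouchers for Private Schooling in Colombia: "
    "Evidence from a Randomized Natural Experiment",
    persistent_id="http://hdl.handle.net/1902.1/11298",
    unf="UNF:3:4v7GYq3uSEeCpk8M567ITw==",
    distributor="Murray Research Archive",
    version="V1",
)
print(citation.render())
```

This prints the whole citation on one line, ending in `Murray Research Archive [Distributor] V1 [Version]`. Because the extensions are optional fields, a citation that carries only the four required elements still renders cleanly.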
What’s a UNF?
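UNF = “Universal Numeric Fingerprint” ≈ a semantic fingerprint: it is computed from a normalized form of the data's content, not from the bytes of a particular file format.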
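The idea behind a semantic fingerprint can be sketched in a few lines: normalize every value to a canonical text form, then hash that form. The code below is a simplified stand-in, NOT the published UNF algorithm (Altman 2008 defines its own normalization rules, missing-value handling, and versioned hash choices). The 7-significant-digit rounding, SHA-256 truncated to 128 bits, and the `UNF-like:` prefix are all assumptions made for illustration.

```python
import base64
import hashlib
from collections.abc import Iterable

SIGNIFICANT_DIGITS = 7  # assumption for this sketch, not the UNF standard's setting


def normalize(value: float) -> bytes:
    """Render a number canonically: exponential notation, fixed significant digits."""
    # With 7 significant digits, 1, 1.0 and 1.00000001 all become "1.000000e+00".
    text = format(float(value), f".{SIGNIFICANT_DIGITS - 1}e")
    return text.encode("ascii") + b"\n"  # terminator keeps values from running together


def fingerprint(values: Iterable[float]) -> str:
    """Hash the normalized values in order; truncate to 128 bits and base64-encode."""
    digest = hashlib.sha256()
    for value in values:
        digest.update(normalize(value))
    token = base64.b64encode(digest.digest()[:16]).decode("ascii")
    return "UNF-like:" + token


# The same numbers stored as int vs. float, or with noise beyond 7 digits, agree;
# a substantive change does not.
print(fingerprint([1, 2.5]) == fingerprint([1.0, 2.50000000001]))  # True
print(fingerprint([1, 2.5]) == fingerprint([1, 2.6]))  # False
```

Normalizing before hashing is what makes the fingerprint "semantic." The same numbers exported from two different statistical packages should yield the same token, while a plain file checksum would differ.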
Variations
• Proxy handle
• Dataset-specific: same ID, with the part specified; the UNF is for that part
• Citation for a subset of variables/columns/measures (NOT observations!):
  – state,year,data_access_who [VarGrp/@var(DDI)]; UNF:5:X4QdWp04aCZntvxZKSHLzQ==
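A subset citation can be sketched by fingerprinting only the chosen columns and appending the result to the list of variable names. In this Python sketch, the helper names (`column_fingerprint`, `subset_citation`), the `UNF-like:` prefix, and the toy table values are hypothetical; only the three column names and the `[VarGrp/@var(DDI)]` tag come from the example above. Sorting the per-column fingerprints before combining them is a design choice made here so that listing the same columns in a different order gives the same fingerprint. It is not a claim about how UNF v5 combines variables.

```python
import base64
import hashlib


def _token(chunks: list[bytes]) -> str:
    """SHA-256 over newline-terminated chunks, truncated to 128 bits, base64-encoded."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk + b"\n")
    return base64.b64encode(digest.digest()[:16]).decode("ascii")


def column_fingerprint(values: list) -> str:
    # Numbers are normalized to 7 significant digits; anything else is hashed as UTF-8 text.
    chunks = [
        format(float(v), ".6e").encode("ascii") if isinstance(v, (int, float)) else str(v).encode("utf-8")
        for v in values
    ]
    return _token(chunks)


def subset_citation(table: dict[str, list], variables: list[str]) -> str:
    """Cite a subset of columns: names, a DDI-style tag, and a fingerprint of only those columns."""
    # Sorting the per-column fingerprints makes the result independent of column order.
    column_fps = sorted(column_fingerprint(table[name]) for name in variables)
    combined = _token([fp.encode("ascii") for fp in column_fps])
    return f"{','.join(variables)} [VarGrp/@var(DDI)]; UNF-like:{combined}"


# Toy table: column names from the example above, values invented for illustration.
table = {
    "state": ["AL", "AK", "AZ"],
    "year": [2009, 2009, 2010],
    "data_access_who": ["public", "restricted", "public"],
    "notes": ["a", "b", "c"],
}
print(subset_citation(table, ["state", "year", "data_access_who"]))
```

The subset is defined by columns only, matching the "NOT observations!" note. Dropping or adding a variable changes the fingerprint, but the rows are always fingerprinted in full.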
Use Cases
Attribution
• Cite data as a first-class work
• Identify contributors to data
Discovery
• Locate data via identifier
• Locate data integral to an article
• Locate works related to data
  – articles, derivatives, sources
Provenance
• Associate a work with the version of evidence used
• Verify fixity of information
Access
• Access to a surrogate
• On-line access to the object
• Machine understandability
• Long-term human understandability
Persistence
• Does evidence persist as long as assertions based on that evidence?
• Is the durability of data transparent?
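Two of these use cases, locating data via its identifier and verifying fixity, can be served mechanically from the citation string itself. The Python sketch below pulls the Handle out of a `hdl.handle.net` proxy URL and checks a cited fingerprint against the data in hand. It relies on a simplified, hypothetical `UNF-like:` fingerprint rather than the real UNF algorithm. The regular expressions and function names (`cited_handle`, `verify_fixity`) are likewise assumptions; a real UNF would need the published algorithm in order to be recomputed.

```python
import base64
import hashlib
import re
from typing import Optional

# Hypothetical token format produced by fingerprint() below: 16 bytes base64-encoded.
FP_PATTERN = re.compile(r"UNF-like:[A-Za-z0-9+/]{22}==")
HANDLE_PATTERN = re.compile(r"https?://hdl\.handle\.net/(\S+)")


def fingerprint(values) -> str:
    """Simplified semantic fingerprint: 7-significant-digit normalization, SHA-256, 128 bits."""
    digest = hashlib.sha256()
    for v in values:
        digest.update(format(float(v), ".6e").encode("ascii") + b"\n")
    return "UNF-like:" + base64.b64encode(digest.digest()[:16]).decode("ascii")


def cited_handle(citation: str) -> Optional[str]:
    """Extract the Handle (prefix/suffix) from a hdl.handle.net proxy URL, if present."""
    match = HANDLE_PATTERN.search(citation)
    return match.group(1) if match else None


def verify_fixity(citation: str, values) -> bool:
    """True if the fingerprint cited in `citation` matches the data in hand."""
    match = FP_PATTERN.search(citation)
    if match is None:
        raise ValueError("citation carries no fingerprint to check")
    return match.group(0) == fingerprint(values)


# Usage with toy data: cite, then check the same and altered data against the citation.
data = [1, 2.5, 3]
citation = f'A. Author, 2008, "Toy study", http://hdl.handle.net/1902.1/11298 {fingerprint(data)}'
print(cited_handle(citation))               # 1902.1/11298
print(verify_fixity(citation, data))        # True
print(verify_fixity(citation, [1, 2.5, 3.1]))  # False
```

Because the fingerprint travels inside the citation, a reader can confirm that they hold the version of the evidence the author used, with no query to the archive.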
Contact Us
Micah Altman
maltman.hmdc.harvard.edu
The Dataverse Network ®
thedata.org