ICPSR’s Approach to Data Citation and Persistent Identifiers
Mary Vardigan
Assistant Director, ICPSR
Workshop on Persistent Identifiers in the Social Sciences, Bonn, Germany
February 1, 2011

Today’s Presentation
• ICPSR’s use of data citations and persistent identifiers
• Ways that ICPSR encourages good practice
• Issues to be resolved
• Future directions

ICPSR’s Use of Citations
• ICPSR has been providing citations to its data since 1990
• Citations are based on “Cataloging Machine-Readable Data Files” by Sue Dodd (American Library Association, 1982)

What Makes Up an ICPSR Citation?
• Content creator/principal investigator
• Title
• Distributor [ICPSR]
• Distribution place and date
• ICPSR study number
• Version number
• Materials designation [Computer file]
• DOI

Example
Schneider, Barbara, and Linda J Waite. The 500 Family Study [1998-2000: United States] [Computer file]. ICPSR04549-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2008-05-30.
doi:10.3886/ICPSR04549

ICPSR’s Use of DOIs
• ICPSR started assigning DOIs in 2008
• DOIs apply at the study or collection level (a study can have multiple datasets)
• DOIs are of the form doi:10.3886/ICPSR04549
• DOIs resolve to the study homepage (metadata record)

How ICPSR Obtains DOIs
• ICPSR uses the CrossRef service, “the official DOI® link registration agency for scholarly and professional publications”
• ICPSR pays a modest annual Publisher Fee (based on publishing revenues) plus 6 cents per DOI
• To begin assigning DOIs, ICPSR sent CrossRef an XML file in 2008 containing metadata on all 7,000+ ICPSR studies
• ICPSR now obtains new DOIs weekly

Weekly Process
• ICPSR runs a script to create XML metadata in CrossRef format:
  – Contributors and their roles
  – Title
  – Publication date
  – Update date
  – Study number
  – DOI
  – URL

Weekly Process, continued
• ICPSR submits the XML file to register new DOIs
• CrossRef sends an email confirming that the file is correct
• At that point, the DOI has an associated URL on the ICPSR Web site

Alternative Process
• Registration could happen in a script-driven manner through an API
• This would happen without human intervention
• The ICPSR database could communicate directly with the CrossRef database, so that DOIs are registered automatically

Requests for DOIs
• Journals are requiring authors to provide PIDs for the data analyzed in their articles
• Authors are coming to ICPSR for DOIs before publication, generally depositing their data into the Publication-Related Archive

Encouraging Good Practice
• The Bibliography of Data-Related Literature includes 60,000 citations to publications based on ICPSR data
• Two-way linking: studies link to publications, and the Bibliography links back to studies
• Widespread use of DOIs for data would make searching for and harvesting related publications much easier

Making Citations and DOIs More Prominent
• ICPSR provides RIS export of data citations for use in bibliographic citation software
• ICPSR highlights the data citation and DOI in several places for each study
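The citation elements, the DOI form, and the weekly registration step described above can be sketched in a few lines of Python. This is a hypothetical illustration, not ICPSR's actual script: the `Study` fields, the function names, the placeholder URL, and the XML element names are all assumptions, and the XML shown is not the CrossRef deposit schema; it only lists the fields the weekly file carries. The example record reproduces the Schneider and Waite citation shown earlier.

```python
"""Hypothetical sketch: build an ICPSR-style data citation and one study's
entry for the weekly registration file. Field, function, and XML element
names are illustrative assumptions; the XML is NOT the CrossRef schema."""
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass

DOI_PREFIX = "10.3886"  # ICPSR's prefix, as in doi:10.3886/ICPSR04549


@dataclass
class Study:
    creators: list[str]  # first name inverted ("Schneider, Barbara"), others not
    title: str
    study_number: int
    version: int
    place: str
    distributor: str
    release_date: str  # ISO 8601 date, e.g. "2008-05-30"
    update_date: str
    url: str  # study homepage that the DOI resolves to

    @property
    def study_id(self) -> str:
        return f"ICPSR{self.study_number:05d}"  # zero-padded, e.g. ICPSR04549

    @property
    def doi(self) -> str:
        return f"{DOI_PREFIX}/{self.study_id}"  # study-level DOI, no version


def join_creators(names: list[str]) -> str:
    """Join names as 'A', 'A, and B', or 'A, B, and C'."""
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + ", and " + names[-1]


def format_citation(s: Study) -> str:
    """Creator, title, materials designation, study number and version,
    place, distributor, date, and DOI, in the order of the example."""
    return (
        f"{join_creators(s.creators)}. {s.title} [Computer file]. "
        f"{s.study_id}-v{s.version}. {s.place}: {s.distributor} [distributor], "
        f"{s.release_date}. doi:{s.doi}"
    )


def doi_url(s: Study) -> str:
    """HTTPS form of the DOI, resolved through the doi.org proxy."""
    return f"https://doi.org/{s.doi}"


def registration_record(s: Study) -> str:
    """One study's entry in the weekly file (placeholder element names)."""
    rec = ET.Element("study")
    contributors = ET.SubElement(rec, "contributors")
    for name in s.creators:
        ET.SubElement(contributors, "person", role="principal_investigator").text = name
    for tag, value in [
        ("title", s.title),
        ("publication_date", s.release_date),
        ("update_date", s.update_date),
        ("study_number", str(s.study_number)),
        ("doi", s.doi),
        ("url", s.url),
    ]:
        ET.SubElement(rec, tag).text = value
    return ET.tostring(rec, encoding="unicode")


# The example citation from the slides; update_date and url are placeholders.
example = Study(
    creators=["Schneider, Barbara", "Linda J Waite"],
    title="The 500 Family Study [1998-2000: United States]",
    study_number=4549,
    version=1,
    place="Ann Arbor, MI",
    distributor="Inter-university Consortium for Political and Social Research",
    release_date="2008-05-30",
    update_date="2008-05-30",
    url="https://example.org/placeholder/studies/4549",
)

if __name__ == "__main__":
    print(format_citation(example))
    print(doi_url(example))
    print(registration_record(example))
```

Keeping the DOI at the study level, with no version in it, mirrors the current practice described above; in this sketch the version appears only in the `ICPSR04549-v1` part of the citation.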
Working with Vendors to Promote Links to Data
• ICPSR has a project with Thomson Reuters to display data linkages in Web of Knowledge
• Full and summary records in Web of Knowledge will link to related data when appropriate
• ICPSR is providing TR with a periodic data feed of datasets and related publications
• TR is integrating data feeds from others, including the UK Data Archive

Influencing Journals
• On behalf of the Data-PASS partners, ICPSR wrote to professional associations in sociology, political science, and economics
• The letters urged them to raise the standards for data citations in their journals
• Professional associations are in a position to set standards for their members and for journal editors (including copy editors)

More on Influencing Journals
• The approach was to point to the variety of ways that data were cited in specific journal issues
• The letter stressed the importance of citing data the same way that publications are cited, and the value of persistent identifiers
• The organizations discussed the letters at recent national meetings
• American Sociological Review just revised its Notice to Contributors to reflect the importance of data citations and DOIs

Updating Citation Software
• ICPSR worked with EndNote (owned by Thomson Reuters) to ensure that data citations display correctly
• As a result, “Dataset” is now a Reference Type in EndNote
• Zotero also needs adjustment for datasets

Working with the Community
• ICPSR has joined DataCite as an associate member
• ICPSR has joined ORCID (Open Researcher and Contributor ID).
  ORCID aims to create a central registry of unique identifiers for individual researchers
• ICPSR is heading up an IASSIST special interest group on data citation (SIGDC)

IASSIST Session
• IASSIST SIGDC has proposed a session as part of a data citation track that includes DataCite:
  – Tracking Data Reuse: Motivations, Methods, and Obstacles (Heather Piwowar, NESCent, University of British Columbia)
  – Building Data Citations for Discovery (Hailey Mooney, Michigan State University, and Mark Newton, Purdue University)
  – ICPSR’s Efforts to Encourage Data Citation (Elizabeth Moss, Inter-university Consortium for Political and Social Research)
  – Reactor panel from SIGDC

Issues to Resolve
• With the community, address situations in which data resources have multiple distributors (and multiple DOIs)
• Implement versioning in DOIs
• Address the level of granularity for DOIs
• Move to DataCite

Multiple DOIs for “Same” Data
• Eurobarometer 72.2 (Nuclear Energy, Corruption, Gender Equality, Healthcare, and Civil Protection)
  DOI: doi:10.4232/1.10009
  Principal Investigator: Antonis Papacostas
  Publication Agent: GESIS - Leibniz-Institut für Sozialwissenschaften
• Papacostas, Antonis. Eurobarometer 72.2: Nuclear Energy, Corruption, Gender Equality, Healthcare, and Civil Protection, September-October 2009 [Computer file]. ICPSR28186-v1. Cologne, Germany: GESIS/Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributors], 2010-07-19. doi:10.3886/ICPSR28186

From CrossRef’s Publisher Rules:
“CrossRef only registers DOIs for Definitive Works… but not for Duplicative Works, as defined in the CrossRef Glossary.
…CrossRef does not permit multiple DOIs to be assigned to certain closely related versions of a work… Where a CrossRef member has content which is substantially Duplicative of Definitive Works, the member must … retrieve the DOIs of Definitive Works for display in such substantially Duplicative Works and must link from the substantially Duplicative Works to the Definitive Works.”

More on Multiple DOIs
• CrossRef policy is oriented toward publications, not data
• The arrangement between ICPSR and GESIS is clear, but there are other co-distributor relationships
• How much of a problem is this, and can we develop a community solution?
• Can we use the DataCite metadata kernel (relationType) to specify relationships?
• Would it be useful to provide explanatory text and cross-reference DOIs in archives’ metadata records?

Versioning and DOIs
• ICPSR has decided to add version numbers to its DOIs
• ICPSR may not have previous versions online
• Users will have to contact ICPSR for access to them
• So far, the number of users requesting older versions has been very small

Level of Granularity for DOIs
• ICPSR’s current practice is to assign the DOI at the study level
• The DOI resolves to the study homepage, which includes a Version History detailing changes to all files in the collection
• Assigning dataset-level DOIs is a challenge because ICPSR has over 65,000 datasets
• ICPSR is undertaking a large project to revamp archival management, and dataset-level DOIs will be integrated into the new infrastructure

Moving to DataCite for DOIs
• DataCite offers several advantages because of its focus on data
• Its metadata kernel is more robust and is intended to describe data
• A community of trusted data centers is a shared goal

Future Directions
• Address situations in which data resources have multiple distributors and multiple DOIs
• Approach other vendors, including Google Scholar, after the TR service is deployed
• Contact other professional associations and journals
• Work with other data producers on providing visible citations
  and DOIs and encouraging their use
• Continue spreading the word about data citation and persistent identifiers!

Thank you…
Questions?
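Two of the issues raised above, adding version numbers to DOIs and using the DataCite relationType to relate multiple DOIs for the “same” data, can be made concrete with a short Python sketch. It is hypothetical: the `.v<N>` suffix convention, the function names, and the choice of relation value are assumptions. Only the element and attribute names `relatedIdentifier`, `relatedIdentifierType`, and `relationType` come from the DataCite metadata kernel, and the DataCite XML namespace is omitted for brevity. The DOIs are the two Eurobarometer 72.2 identifiers cited earlier.

```python
"""Hypothetical sketch: versioned DOIs, and DataCite-style related
identifiers linking two distributors' DOIs for the same data. The ".v<N>"
suffix and the relationType chosen here are illustrative assumptions."""
from __future__ import annotations

import xml.etree.ElementTree as ET


def versioned_doi(study_doi: str, version: int) -> str:
    """Append a version suffix to a study-level DOI (assumed convention)."""
    if version < 1:
        raise ValueError("versions start at 1")
    return f"{study_doi}.v{version}"


def related_identifiers(links: list[tuple[str, str]]) -> ET.Element:
    """Build a <relatedIdentifiers> element from (doi, relationType) pairs."""
    parent = ET.Element("relatedIdentifiers")
    for doi, relation in links:
        child = ET.SubElement(
            parent,
            "relatedIdentifier",
            relatedIdentifierType="DOI",
            relationType=relation,
        )
        child.text = doi
    return parent


# Eurobarometer 72.2 as distributed by ICPSR, pointing to the GESIS DOI.
# "IsIdenticalTo" is one plausible value; "IsVariantFormOf" may fit better
# if the two distributions differ in file formats or documentation.
icpsr_doi = versioned_doi("10.3886/ICPSR28186", 1)
links = related_identifiers([("10.4232/1.10009", "IsIdenticalTo")])

if __name__ == "__main__":
    print(icpsr_doi)
    print(ET.tostring(links, encoding="unicode"))
```

Recording the relationship in each archive's own metadata, rather than retiring one of the DOIs, is one way to answer the community question posed above without requiring either distributor to give up its identifier.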