ICPSR’s Approach to Data Citation and Persistent Identifiers

Mary Vardigan
Assistant Director, ICPSR
Workshop on Persistent Identifiers in the
Social Sciences -- Bonn, Germany
February 1, 2011
Today’s Presentation
• ICPSR’s use of data citations and
persistent identifiers
• Ways that ICPSR encourages good
practice
• Issues to be resolved
• Future directions
ICPSR’s Use of Citations
• ICPSR has been providing citations
to its data since 1990
• Citations are based on “Cataloging
Machine-Readable Data Files” by
Sue Dodd, American Library
Association, 1982
What Makes Up an ICPSR
Citation?
• Content Creator/Principal Investigator
• Title
• Distributor [ICPSR]
• Distribution place and date
• ICPSR study number
• Version number
• Materials designation [Computer file]
• DOI
Example
Schneider, Barbara, and Linda J. Waite.
The 500 Family Study [1998-2000: United
States] [Computer file]. ICPSR04549-v1.
Ann Arbor, MI: Inter-university
Consortium for Political and Social
Research [distributor], 2008-05-30.
doi:10.3886/ICPSR04549
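As an illustration of how the components listed earlier combine into the example citation, here is a minimal Python sketch. The class name `StudyCitation`, its field names, and the zero-padding of the study number to five digits are assumptions made for this sketch; they are not a description of ICPSR's own systems.

```python
from dataclasses import dataclass


@dataclass
class StudyCitation:
    """Components of a study-level citation (hypothetical field names)."""
    creators: str       # principal investigators, already in citation order
    title: str
    study_number: int   # ICPSR study number
    version: int
    place: str          # distribution place
    distributor: str
    date: str           # distribution date, YYYY-MM-DD
    material: str = "Computer file"
    doi_prefix: str = "10.3886"

    @property
    def study_id(self) -> str:
        # Assumption: study numbers are zero-padded to five digits.
        return f"ICPSR{self.study_number:05d}"

    def render(self) -> str:
        """Join the components in the order used by the example citation."""
        return (
            f"{self.creators}. {self.title} [{self.material}]. "
            f"{self.study_id}-v{self.version}. "
            f"{self.place}: {self.distributor} [distributor], {self.date}. "
            f"doi:{self.doi_prefix}/{self.study_id}"
        )


citation = StudyCitation(
    creators="Schneider, Barbara, and Linda J. Waite",
    title="The 500 Family Study [1998-2000: United States]",
    study_number=4549,
    version=1,
    place="Ann Arbor, MI",
    distributor="Inter-university Consortium for Political and Social Research",
    date="2008-05-30",
)
print(citation.render())
```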
ICPSR’s Use of DOIs
• ICPSR started assigning DOIs in 2008
• DOIs apply at the study or collection
level (a study can have multiple
datasets)
• DOIs are of the form:
doi:10.3886/ICPSR04549
• DOIs resolve to the study homepage
(metadata record)
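The DOI form shown above can be checked and turned into a resolvable link with a few lines of Python. This is a sketch only: the five-digit pattern is inferred from the single example on the slide, and the helper names `parse_icpsr_doi` and `resolver_url` are hypothetical.

```python
import re
from typing import Tuple

# Assumption: "ICPSR" plus a five-digit study number, as in the example DOI.
ICPSR_DOI = re.compile(r"^(?:doi:)?(10\.3886/ICPSR(\d{5}))$")


def parse_icpsr_doi(text: str) -> Tuple[str, int]:
    """Return (bare DOI, study number), or raise ValueError."""
    match = ICPSR_DOI.match(text.strip())
    if match is None:
        raise ValueError(f"not a study-level ICPSR DOI: {text!r}")
    return match.group(1), int(match.group(2))


def resolver_url(doi: str) -> str:
    """URL at the public DOI resolver, which redirects to the landing page."""
    return f"https://doi.org/{doi}"


doi, study_number = parse_icpsr_doi("doi:10.3886/ICPSR04549")
print(study_number, resolver_url(doi))
```

Following the resolver URL should lead to the study homepage, since that is the URL registered with the DOI.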
How ICPSR Obtains DOIs
• ICPSR uses the CrossRef service, “the official
DOI® link registration agency for scholarly and
professional publications”
• ICPSR pays a modest annual Publisher Fee
(based on publishing revenues) and pays 6
cents per DOI
• To begin assigning DOIs, ICPSR in 2008 sent
CrossRef an XML file containing metadata on
all 7,000+ ICPSR studies
• ICPSR now obtains DOIs weekly
Weekly Process
• ICPSR runs script to create XML metadata in
CrossRef format:
– Contributors and their roles
– Title
– Publication date
– Update date
– Study number
– DOI
– URL
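A script of this kind might look roughly like the following Python sketch, which writes the fields listed above into an XML batch. The element names (`weekly_batch`, `study_record`, and so on), the sample URL, and the update date are placeholders invented for illustration; a real submission would have to follow CrossRef's own deposit schema, which is not reproduced here.

```python
import xml.etree.ElementTree as ET

FIELDS = ("title", "publication_date", "update_date", "study_number", "doi", "url")


def study_to_xml(study: dict) -> ET.Element:
    """Build one record; all element names are hypothetical."""
    record = ET.Element("study_record")
    contributors = ET.SubElement(record, "contributors")
    for name, role in study["contributors"]:
        person = ET.SubElement(contributors, "contributor", role=role)
        person.text = name
    for field in FIELDS:
        ET.SubElement(record, field).text = str(study[field])
    return record


def build_batch(studies: list) -> str:
    """Serialize all new or updated studies into one XML document."""
    batch = ET.Element("weekly_batch")
    for study in studies:
        batch.append(study_to_xml(study))
    return ET.tostring(batch, encoding="unicode")


sample = {
    "contributors": [
        ("Schneider, Barbara", "Principal Investigator"),
        ("Waite, Linda J.", "Principal Investigator"),
    ],
    "title": "The 500 Family Study [1998-2000: United States]",
    "publication_date": "2008-05-30",
    "update_date": "2008-05-30",                 # placeholder value
    "study_number": 4549,
    "doi": "10.3886/ICPSR04549",
    "url": "https://example.org/studies/4549",   # placeholder, not the real page
}
xml_text = build_batch([sample])
print(xml_text)
```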
Weekly Process, continued
• ICPSR submits XML file to register new
DOIs
• CrossRef sends email confirming the file
is correct
• At that point, the DOI has an associated
URL on the ICPSR Web site
Alternative Process
• Registration could happen in a script-driven manner through an API
• This would happen without human
intervention
• ICPSR database could communicate with
the CrossRef database with DOIs
registered automatically
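The unattended process described above could be structured along these lines. The sketch deliberately makes no real network calls: `submit` is a stand-in for whatever registration API would be used, and the `doi_registered` flag and the record layout are assumptions made for illustration.

```python
from typing import Callable, Iterable, List


def register_pending(
    studies: Iterable[dict],
    submit: Callable[[dict], bool],
) -> List[str]:
    """Submit every study without a registered DOI; return the DOIs registered."""
    registered = []
    for study in studies:
        if study.get("doi_registered"):
            continue                      # already registered, nothing to do
        if submit(study):                 # stand-in for the real API call
            study["doi_registered"] = True
            registered.append(study["doi"])
    return registered


def fake_submit(study: dict) -> bool:
    """Pretend registration endpoint that accepts any record with a URL."""
    return bool(study.get("url"))


catalog = [
    {"doi": "10.3886/ICPSR04549", "url": "https://example.org/4549", "doi_registered": True},
    {"doi": "10.3886/ICPSR28186", "url": "https://example.org/28186"},
]
newly_registered = register_pending(catalog, fake_submit)
print(newly_registered)
```

Passing the submission function in as a parameter keeps the database loop testable without contacting any external service.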
Requests for DOIs
• Journals are requiring that authors
provide persistent identifiers (PIDs) for the
data they analyzed for their articles
• Authors are coming to ICPSR for DOIs
pre-publication, generally depositing
data into the Publication-Related Archive
Encouraging Good Practice
• Bibliography of Data-Related Literature
includes 60,000 citations to publications
based on ICPSR data
• Two-way linking: Studies link to
publications, Bibliography links back to
studies
• Widely used DOIs for data would make
searching for and harvesting related
publications much easier
Making Citations and DOIs
More Prominent
• ICPSR provides RIS export for data
citations into bibliographic citation
software
• ICPSR highlights the data citation and
DOI in several places for each study
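To make the RIS export concrete, the sketch below writes a data citation as an RIS record using standard RIS tags (`TY`, `AU`, `TI`, `PY`, `PB`, `CY`, `DO`, `ER`) and the `DATA` reference type. Which tags ICPSR's own export actually emits is not stated in the slides, so the selection here is an assumption, as is the function name `to_ris`.

```python
def to_ris(record: dict) -> str:
    """Render a data citation as an RIS record (tag selection is illustrative)."""
    lines = ["TY  - DATA"]                       # RIS reference type for data files
    lines += [f"AU  - {author}" for author in record["authors"]]
    lines += [
        f"TI  - {record['title']}",
        f"PY  - {record['year']}",
        f"PB  - {record['publisher']}",
        f"CY  - {record['place']}",
        f"DO  - {record['doi']}",
        "ER  - ",                                # end-of-record marker
    ]
    return "\n".join(lines) + "\n"


ris_text = to_ris({
    "authors": ["Schneider, Barbara", "Waite, Linda J."],
    "title": "The 500 Family Study [1998-2000: United States]",
    "year": 2008,
    "publisher": "Inter-university Consortium for Political and Social Research",
    "place": "Ann Arbor, MI",
    "doi": "10.3886/ICPSR04549",
})
print(ris_text)
```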
Working with Vendors to
Promote Links to Data
• ICPSR has a project with Thomson Reuters to
display data linkages in Web of Knowledge
• Full and summary records in Web of
Knowledge will link to related data when
appropriate
• ICPSR is providing a periodic data feed of
datasets and related publications to TR
• TR is integrating data feeds from others,
including the UK Data Archive
Influencing Journals
• On behalf of the Data-PASS partners,
ICPSR wrote to professional associations in
sociology, political science, and economics
• Letters urged them to raise the standards
for data citations in their journals
• Professional associations are in a position
to set standards for their members and for
journal editors (including copy editors)
More on Influencing Journals
• Approach was to point to the variety of ways
that data were cited in specific journal issues
• The letter stressed the importance of citing
data the same way that publications are cited
and the value of persistent identifiers
• Organizations discussed the letters at recent
national meetings
• American Sociological Review just revised its
Notice to Contributors to reflect the importance
of data citations and DOIs
Updating Citation Software
• ICPSR worked with EndNote (owned by
Thomson Reuters) to ensure that data citations
display correctly
• The result is that “Dataset” is now a Reference
Type in EndNote.
• Zotero also needs adjustment for datasets
Working with the Community
• ICPSR has joined DataCite as an
associate member
• ICPSR has joined ORCID – Open
Researcher and Contributor ID. ORCID
aims to create a central registry of
unique identifiers for individual
researchers
• ICPSR is heading up an IASSIST special
interest group on data citation (SIGDC)
IASSIST Session
• IASSIST SIGDC has proposed a session as part of a
data citation track including DataCite:
• Tracking Data Reuse: Motivations, Methods, and
Obstacles -- Heather Piwowar, NESCent, University of
British Columbia
• Building Data Citations for Discovery -- Hailey
Mooney, Michigan State University, and Mark Newton,
Purdue University
• ICPSR’s Efforts to Encourage Data Citation -- Elizabeth
Moss, Inter-university Consortium for Political
and Social Research (ICPSR)
• Reactor Panel from SIGDC
Issues to Resolve
• With the community, address situations
when data resources have multiple
distributors (and multiple DOIs)
• Implement versioning in DOIs
• Address level of granularity for DOIs
• Move to DataCite
Multiple DOIs for “Same” Data
• Eurobarometer 72.2 (Nuclear Energy, Corruption,
Gender Equality, Healthcare, and Civil Protection)
DOI: doi:10.4232/1.10009
Principal Investigator: Antonis Papacostas
Publication Agent: GESIS - Leibniz-Institut für
Sozialwissenschaften
• Papacostas, Antonis. Eurobarometer 72.2: Nuclear
Energy, Corruption, Gender Equality, Healthcare, and
Civil Protection, September-October 2009 [Computer
file]. ICPSR28186-v1. Cologne, Germany: GESIS/Ann
Arbor, MI: Inter-university Consortium for Political and
Social Research [distributors], 2010-07-19.
doi:10.3886/ICPSR28186
From CrossRef’s Publisher
Rules:
“CrossRef only registers DOIs for Definitive Works…
but not for Duplicative Works, as defined in the
CrossRef Glossary. …CrossRef does not permit
multiple DOIs to be assigned to certain closely related
versions of a work… Where a CrossRef member has
content which is substantially Duplicative of Definitive
Works, the member must … retrieve the DOIs of
Definitive Works for display in such substantially
Duplicative Works and must link from the
substantially Duplicative Works to the Definitive
Works.”
More on Multiple DOIs
• CrossRef policy is oriented toward publications, not data
• Arrangement between ICPSR and GESIS is clear, but
there are other co-distributor relationships
• How much of a problem is this and can we develop a
community solution?
• Can we use the DataCite metadata kernel
(relationType) to specify relationships?
• Would providing explanatory text and cross-referencing
DOIs in archives’ metadata records be useful?
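One way the relationType idea could work for the Eurobarometer 72.2 case is sketched below: each archive's record carries a related-identifier entry pointing at the other archive's DOI. The property names `relatedIdentifier`, `relatedIdentifierType`, and `relationType` follow the DataCite metadata schema, and `IsIdenticalTo` is a relationType value in later schema versions; whether that is the right relation for co-distributed copies is exactly the open question raised above. The dictionary layout and the function name are assumptions for illustration, not DataCite's XML format.

```python
def link_as_identical(record_a: dict, record_b: dict) -> None:
    """Add reciprocal related-identifier entries to two metadata records."""
    for source, target in ((record_a, record_b), (record_b, record_a)):
        source.setdefault("relatedIdentifiers", []).append({
            "relatedIdentifier": target["doi"],
            "relatedIdentifierType": "DOI",
            "relationType": "IsIdenticalTo",   # candidate relation, open to debate
        })


gesis = {"doi": "10.4232/1.10009", "publisher": "GESIS"}
icpsr = {"doi": "10.3886/ICPSR28186", "publisher": "ICPSR"}
link_as_identical(gesis, icpsr)
print(icpsr["relatedIdentifiers"])
```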
Versioning and DOIs
• ICPSR has decided to add version numbers to
its DOIs
• ICPSR may not have previous versions online
• User will have to contact ICPSR for access
• So far the number of users requesting older
versions has been very small
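The slides do not say what form a versioned DOI would take. Purely as an illustration, the sketch below appends a `.v<N>` suffix to the study-level DOI; this suffix pattern and both function names are hypothetical.

```python
def versioned_doi(study_doi: str, version: int) -> str:
    """Append a version suffix to a study-level DOI (hypothetical pattern)."""
    if version < 1:
        raise ValueError("version numbers start at 1")
    return f"{study_doi}.v{version}"


def split_versioned_doi(doi: str) -> tuple:
    """Return (study-level DOI, version), or (doi, None) when no suffix is present."""
    base, sep, suffix = doi.rpartition(".v")
    if sep and suffix.isdigit():
        return base, int(suffix)
    return doi, None


print(versioned_doi("10.3886/ICPSR04549", 1))
```

Keeping the study-level DOI recoverable from the versioned one would let a citation to an older version still lead to the study homepage and its Version History.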
Level of Granularity for DOIs
• ICPSR’s current practice is to assign the DOI at
the study level
• DOI resolves to the study homepage, which
includes Version History detailing changes to
all files in the collection
• Assigning dataset-level DOIs is a challenge
because ICPSR has over 65,000 datasets
• ICPSR is undertaking a large project to revamp
archival management and dataset-level DOIs
will be integrated in the new infrastructure
Moving to DataCite for DOIs
• DataCite offers several advantages
because of its focus on data
• Metadata kernel more robust and
intended to describe data
• Community of trusted data centers is a
shared goal
Future Directions
• Address situations when data resources have
multiple distributors and multiple DOIs
• Approach other vendors, including Google Scholar,
after the TR service is deployed
• Contact other professional associations and journals
• Work with other data producers on providing visible
citations and DOIs and encouraging their use
• Continue spreading the word about data citation
and persistent identifiers!
Thank you…
Questions?