Introduction to DataCite
Adam Farquhar, PhD
Head of Digital Library Technology, The British Library
President, DataCite
June 2010

The British Library
- Exists for everyone who wants to do research – for academic, personal, and commercial purposes.
- Covers all subject areas – sciences, technology, medicine, arts, humanities, social sciences…
- Receives a copy of every item published in the UK.
- Holds over 150 million items, with 3 million items added each year.
- Used by over 16,000 people each day (on site and online).

Data and the Digital Landscape
- Seismic measurements taken by a geologist.
- Genetic data collected by a medical researcher.
- A survey of public opinions collected by a sociologist.

Data: The Foundation of Research
- Data is a crucial component of the scholarly record.
- Re-acquisition may be impossible.
- Datasets are essential to the British Library’s mission to advance the world’s knowledge.

Widening Gap between Articles and Underlying Data
- No effective way to link between datasets and articles
- No widely used method to identify datasets
- No widely used method to cite datasets

As a result…
Datasets are:
- Difficult to discover
- Difficult to access
- Being lost

Datasets – First-Class Citizens?
- Data is difficult to manage after project funding ceases.
- Informal networks provide the primary means of sharing.
- Only 21% use a national or international facility.
- Datasets are not included in impact analysis.
- Good luck finding it or getting permission to use it (your discipline may vary).
Source: UKRDS Study

DataCite – An Award-Winning Global Consortium
DataCite aims to:
- Establish easier access to scientific research data
- Increase acceptance of research data
- Support archiving of data for verification and re-use

DataCite – Supporting the Research Community
DataCite:
- Supports researchers by enabling them to locate, identify, and cite research datasets with confidence
- Supports data centres by providing persistent identifiers for datasets, as well as workflows and standards for data publication
- Supports publishers by enabling research articles to be linked to the underlying data

DataCite Uses DOIs for Data
DataCite : Data Centres :: CrossRef : Publishers
- URLs are not persistent (e.g. Wren JD: URL decay in MEDLINE – a 4-year follow-up study. Bioinformatics. 2008 Jun 1;24(11):1381-5).
- Digital Object Identifiers (DOIs) offer a solution:
  - Most widely used identifier for scientific articles
  - Researchers, authors, and publishers know how to use them
  - Put datasets on the same playing field as articles
Dataset: Yancheva et al. (2007). Analyses on sediment of Lake Maar. PANGAEA.
doi:10.1594/PANGAEA.587840

Membership
- AUS: Australian National Data Service (ANDS)
- CAN: Canada Institute for Scientific and Technical Information
- CH: Library of the ETH Zurich
- DK: Technical Information Center of Denmark
- FR: Institute for Scientific and Technical Information
- GER: German National Library of Science and Technology (TIB); German National Library of Medicine (ZB MED); GESIS – Leibniz Institute for Social Science
- NL: TU Delft Library
- UK: The British Library
- USA: California Digital Library (CDL); Purdue University Libraries
From Canada to Australia:
- Currently twelve members across nine countries
- Over 800,000 records registered with DOI names so far
Associated members:
- Digital Curation Centre
- Microsoft Research

Rapid Progress Builds on Foundational Work
- 2005: TIB begins to issue DOIs for datasets
- March 2009: Paris DataCite Memorandum
- December 2009: Association founded in London; 7 members
- June 2010: 12 members; all members assigned DOIs; over 800,000 items registered; pilot projects with data centres
- December 2010: Production services with data centres; shared technical infrastructure; integrated services with key partners

DataCite – Roles and Responsibilities
The DataCite registration agency:
- Maintains the resolution infrastructure
- Maintains a searchable database of metadata
- Manages identifiers over the long term
- Establishes and shares best practice
Publishing agents (data centres, research institutes, publishers) are responsible for:
- Quality assurance
- Content storage and access
- Creating the identifier
- Creating and updating metadata

DataCite Structure
[Diagram: the International DOI Foundation and the Global Handle System; DataCite as an IDF member; DataCite member institutions and associate stakeholders, each working with data centres.]

Strengths and Weaknesses of DOI
DOIs have some strong advantages:
- Accepted by researchers and scientists
- Mature infrastructure
- Put datasets on the same playing field as articles
But they are perceived as:
- Expensive: the current IDF business model favours larger registration agencies
- Publisher-oriented: the largest registration agency is the publisher-oriented CrossRef

The Cost of Visibility
- DOI assignment: €0.01 – €1
- Management, storage, quality assurance, metadata collection: €50 – €500 (approx. 1% of data creation cost)
- Production: €5,000 – €5,000,000

BL – Search Our Catalogue

DE Service – Elsevier ScienceDirect

Research Data in Articles

Publishing Primary Data

Rapidly Growing Ecosystem
- Microsoft works with CDL to embed DataCite into an Excel plug-in
- UK National Sound Archive assigns DataCite DOIs to archival recordings
- Dryad integrates DataCite DOIs into publisher workflows for supplementary material and datasets in the US
- ANDS integrates DataCite DOIs into dataset services
- Thieme Publishing Group uses DataCite DOIs to link articles and primary research data (at FIZ)
- Active discussions with key research information service providers and data centres

What Next?
- Require clear, unambiguous citations for datasets
- Integrate links to datasets into delivery platforms
- Integrate into workflows for researchers, data centres, and publishers
- Collaborate to understand roles and responsibilities among publishers, data centres, and libraries
- Improve attribution and credit for data producers
- Roll out services

DataCite supports researchers by enabling them to locate, identify, and cite research datasets with confidence.

We welcome your comments, questions, and ideas!
Contact: www.datacite.org
adam.farquhar {@} bl.uk
jan.brase {@} tib.uni-hannover.de
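The DOI slide argues that a dataset citation should carry a persistent name rather than a URL that may decay. As an illustration only, the minimal Python sketch below splits a DOI name such as the PANGAEA example into its prefix and suffix, turns it into a link via the public doi.org proxy, and assembles a citation string in the same shape as the Yancheva et al. example. The function names (`split_doi`, `resolver_url`, `cite_dataset`) are hypothetical, and the citation layout simply mirrors the example in this deck; it is not a DataCite-mandated format.

```python
from urllib.parse import quote


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI name such as '10.1594/PANGAEA.587840' into (prefix, suffix)."""
    name = doi.strip()
    # Accept the 'doi:' form used in printed citations.
    if name.lower().startswith("doi:"):
        name = name[4:]
    prefix, sep, suffix = name.partition("/")
    # DOI prefixes start with the directory indicator '10.'.
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build an HTTP link through the public DOI proxy (assumed: https://doi.org/)."""
    prefix, suffix = split_doi(doi)
    # Escape characters that are unsafe in a URL path; keep '/' as-is.
    return f"https://doi.org/{prefix}/{quote(suffix, safe='/')}"


def cite_dataset(creators: str, year: int, title: str, publisher: str, doi: str) -> str:
    """Hypothetical helper: format a citation like the one on the DOI slide."""
    prefix, suffix = split_doi(doi)
    return f"{creators} ({year}). {title}. {publisher}. doi:{prefix}/{suffix}"


if __name__ == "__main__":
    doi = "doi:10.1594/PANGAEA.587840"
    print(cite_dataset("Yancheva et al.", 2007,
                       "Analyses on sediment of Lake Maar", "PANGAEA", doi))
    print(resolver_url(doi))
```

The citation keeps the DOI name itself, while the resolver URL is derived from it on demand. If the data centre later moves the dataset, only the registration agency's record has to change, not the published citation.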