
Introduction to DataCite
Adam Farquhar, PhD
Head of Digital Library Technology, The British Library
President, DataCite
June, 2010
The British Library

Exists for everyone who wants to do
research – for academic, personal, and
commercial purposes.

Covers all subject areas – sciences,
technology, medicine, arts, humanities,
social sciences…

Receives a copy of every item
published in the UK.

Holds over 150 million items, with 3
million items added each year.

Used by over 16,000 people each day
(on site and online).
Data and the Digital Landscape

Seismic measurements taken by a
geologist.

Genetic data collected by a medical
researcher.

A survey of public opinions collected
by a sociologist.
Data: The Foundation of Research

Data is a crucial component of the scholarly record

Re-acquisition may be impossible

Datasets are essential to the British Library’s mission
to advance the world’s knowledge
Widening Gap
[Figure: widening gap between articles and their underlying data]

No effective way to link
between datasets and
articles

No widely used method to
identify datasets

No widely used method to
cite datasets
As a result…
Datasets are
• Difficult to discover
• Difficult to access
• Being lost
Datasets – First Class Citizens?





Data is difficult to manage after project funding ceases

Informal networks provide the primary means of sharing

Only 21% use a national or international facility

Datasets are not included in impact analysis

Good luck finding it or getting permission to use it (your discipline may vary)

Source: UKRDS Study
DataCite – An Award Winning Global Consortium
DataCite aims to:

Establish easier access to scientific research data

Increase acceptance of research data

Support archiving of data for verification and re-use
DataCite – Supporting the Research Community
DataCite:

Supports researchers by enabling them to locate,
identify, and cite research datasets with
confidence

Supports data centres by providing persistent
identifiers for datasets, workflows and standards
for data publication

Supports publishers by enabling research articles
to be linked to the underlying data
DataCite uses DOIs for Data:
DataCite : Data Centres :: CrossRef : Publishers
URLs are not persistent
• e.g. Wren JD. URL decay in MEDLINE - a 4-year follow-up study. Bioinformatics. 2008 Jun 1;24(11):1381-5.
Digital Object Identifiers (DOIs) offer a solution
• Most widely used identifier for scientific articles
• Researchers, authors, publishers know how to use them
• Put datasets on the same playing field as articles


Dataset
Yancheva et al (2007). Analyses
on sediment of Lake Maar.
PANGAEA.
doi:10.1594/PANGAEA.587840
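
As a rough illustration of how a dataset DOI like the one above can be turned into a clickable link and a citation string, here is a minimal Python sketch. It is not part of the original slides. The function names, the loose pattern check, and the choice of https://doi.org/ as the resolver address are assumptions made for this example, and the citation layout simply mirrors the Yancheva et al. example.

```python
import re
from urllib.parse import quote

# Loose shape check for a DOI name: "10.", a numeric prefix, "/", then a suffix.
# Illustrative assumption only, not an official validation rule.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_doi(raw: str) -> str:
    """Strip surrounding whitespace and an optional leading 'doi:' label."""
    doi = raw.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI name: {raw!r}")
    return doi


def doi_to_url(raw: str, resolver: str = "https://doi.org/") -> str:
    """Build a resolver URL for a DOI name (resolver address is an assumption)."""
    return resolver + quote(normalize_doi(raw), safe="/")


def format_citation(creators: str, year: int, title: str,
                    publisher: str, raw_doi: str) -> str:
    """Format a dataset citation in the style of the slide's example."""
    return f"{creators} ({year}). {title}. {publisher}. doi:{normalize_doi(raw_doi)}"


if __name__ == "__main__":
    print(doi_to_url("doi:10.1594/PANGAEA.587840"))
    print(format_citation("Yancheva et al", 2007,
                          "Analyses on sediment of Lake Maar",
                          "PANGAEA", "10.1594/PANGAEA.587840"))
```

The point of the sketch is that the DOI name stays fixed while the address behind it can change, so a citation built this way does not depend on where the data centre currently hosts the dataset.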
Membership
AUS Australian National Data Service (ANDS)
CAN Canada Institute for Scientific and Technical Information
CH  Library of the ETH Zurich
DK  Technical Information Center of Denmark
FR  Institute for Scientific and Technical Information
GER German National Library of Science and Technology (TIB)
    German National Library of Medicine (ZB MED)
    GESIS - Leibniz Institute for Social Science
NL  TU Delft Library
UK  The British Library
USA California Digital Library (CDL)
    Purdue University Libraries

From Canada to Australia
Currently twelve members across nine countries
Over 800,000 records registered with DOI names so far

Associated Members
Digital Curation Centre
Microsoft Research
Rapid Progress Builds on Foundational Work
05
• TIB begins to issue DOIs for datasets
03.09
• Paris Memorandum
12.09
• DataCite Association founded in London
• 7 members
06.10
• 12 members
• All members assigned DOIs
• Over 800,000 items registered
• Pilot projects with Data Centres
12.10
• Production services with Data Centres
• Shared technical infrastructure
• Integrated services with key partners
DataCite – Roles and Responsibilities
The DataCite registration agency
• Maintains the resolution infrastructure
• Maintains a searchable database of metadata
• Manages identifiers over the long term
• Establishes and shares best practice
Publishing agents (data centres, research institutes, publishers) are responsible for
• Quality assurance
• Content storage and access
• Creating the identifier
• Creating and updating metadata
DataCite Structure
[Diagram. Labels: International DOI Foundation; Global Handle System; DataCite (member); member institutions, each working with data centres; associate stakeholder]
Strengths and Weaknesses of DOI
DOIs have some strong advantages
• Accepted by researchers and scientists
• Mature infrastructure
• Put datasets on the same playing field as articles
But perceived as
• Expensive
    • The current IDF business model favours larger registration agencies
• Publisher oriented
    • The largest registration agency is the publisher-oriented CrossRef
The Cost of Visibility

DOI Assignment: €0.01 – €1
Management, Storage, Quality Assurance, Metadata: €50 – €500
Collection, Production: €5,000 – €5,000,000
(approx 1% of data creation cost)
BL – Search Our Catalogue
DE Service – Elsevier Science Direct
Research Data in Articles
Publishing Primary Data
Rapidly Growing Ecosystem






Microsoft works with CDL to embed DataCite into an Excel plug-in

UK National Sound Archive assigns DataCite DOIs to archival recordings

Dryad integrates DataCite DOIs into publisher workflows for supplementary material and datasets in the US

ANDS integrates DataCite DOIs into dataset services

Thieme Publishing Group uses DataCite DOIs to link articles and primary research data (at FIZ)

Active discussions with key research information service providers and data centres
What Next?






Require clear, unambiguous citations for datasets

Integrate links to datasets into delivery platforms

Integrate into workflows for researchers, data centres, and publishers

Collaborate to understand roles and responsibilities among publishers, data centres, and libraries

Improve attribution and credit for data producers

Roll out services

DataCite supports researchers by enabling them to locate, identify, and cite research datasets with confidence
We welcome your comments,
questions, and ideas!
Contact:
www.datacite.org
adam.farquhar {@} bl.uk
jan.brase {@} tib.unihannover.de