data management plans & the data citation index

DATA MANAGEMENT PLANS & THE
DATA CITATION INDEX
NIGEL ROBINSON
19 MAY 2014
THE INCREASING VISIBILITY OF DATA
• Grant funding agencies
• Journal publishers
• Publisher website
• Electronic articles
• Data repositories &
registration agencies
©2010 Thomson Reuters
“Data is the new gold”
– Neelie Kroes, EU Digital Agenda Commissioner
DIGITAL SCHOLARSHIP
Digital Scholarship
• Very visible within the literature as a concept
• Articles, projects, university labs all devoted to digital scholarship in various ways
Interested Parties
• Authors/researchers
• Research administrators
• Librarians, data archivists
• Publishers
• Grant funding organizations
Content
• Discipline-specific and multidisciplinary content
• Needs and requirements vary by discipline
• Diverse content formats, with few standards
OVERVIEW
• Emerging landscape
• Opportunities
• Citation and attribution
THE EMERGENCE OF FUNDING
MANDATES
NIH (2003) Data Sharing Policy: all
funding applications of $500,000 or more per
year are expected to address data sharing
in their application.
NSF (2011) All funding proposals submitted
on or after January 18, 2011, must include a
“Data Management Plan” describing how the
proposal will conform to NSF policy on the
dissemination and sharing of research results.
DATA MANAGEMENT REQUIREMENTS
EXTEND ACROSS THE GLOBE
Aug 2011… “expectation that all our
funded researchers should maximise
access to their research data with as
few restrictions as possible. …. submit a
data management and sharing plan as
part of the application process.”
2007… “Researchers are to retain
research data and primary materials,
manage storage of research data and
primary materials, maintain confidentiality
of research data and primary materials.”
DATA MANAGEMENT REQUIREMENTS
EXTEND ACROSS THE GLOBE
• “A further new element in Horizon 2020
is the use of Data Management Plans
(DMPs) detailing what data the project
will generate, whether and how it will be
exploited or made accessible for
verification and re-use, and how it will
be curated and preserved. The use of a
Data Management Plan is required for
projects participating in the Open
Research Data Pilot. Other projects are
invited to submit a Data Management
Plan if relevant for their planned
research.”
IMPACT ON RESEARCH LIBRARIES
FUNDING MANDATES BECOMING
STRONGER
January 14, 2013… “failure to provide the
requisite Data Management Plan will result in
the application being rejected or terminated.”
WHY SHARE DATA?
• Verification - Findings can be verified
• Extend original findings, address new
questions
• Reduce costs
• Training – data reuse
• Increased primary publications
A. Pienta, G. Alter, J. Lyle (2010). The Enduring Value of Social
Science Research: The Use and Reuse of Primary Research Data.
http://hdl.handle.net/2027.42/78307
Data sharing leads to more science &
more knowledge
DATA SHARING
• The level and timing of data sharing varies
– 28% only share prior to publication
– 35% only share after publication
– 25% share before and after publication
INCREASED CITATION WITH SHARED
DATA
Bibliometrics
35% to 69% more
citations
courtesy of Jon Sears (AGU)
Piwowar HA, Day RS, Fridsma DB (2007) Sharing
Detailed Research Data Is Associated with
Increased Citation Rate. PLoS ONE 2(3): e308.
doi:10.1371/journal.pone.0000308
DEPOSITION OF DATA BY RESEARCHERS
• Publisher website: 24%
• Repository managed by a third party (e.g., domain-…): 36%
• Department or institutional repository: 47%
• Personal website: 51%
• Other: 17%
Q16. Where do you place your non-traditional scholarly output to
make it available to others? (n=471)
RESEARCHERS NOT RECEIVING CREDIT
Barriers to creating and sharing data:
• Researchers are hesitant to spend time and effort to create and share data because they don’t feel the work is adequately exposed or credited
• Researchers find it difficult to expose data they have produced because data repositories do not have clear standards or mechanisms in place for doing so
RESEARCHER PROBLEMS
• Access & discovery
• Citation standards
• Lack of willingness to deposit and cite
• Lack of recognition / credit
DATA MANAGEMENT PLAN BENEFITS
Data deposit
Active
Persistence
Data reuse
• Repository must hold data
• Repository must provide access to data
• Material added/updated
• Provide statistics on deposited data
• Actively curate data in the archive
• Persistent IDs, DOIs or other permanent ID
• Contacts available for confirmation of interpretation
• Indication of intention to preserve data or provide
access over the long term
• Contingency if repository was to cease to operate
• Make data accessible (or state licensing terms)
• Sustainable
• Funding information available for repository and
deposited data
• Links to literature
• Citation in literature databases
CHALLENGES
• Metadata
– Resources
– Expertise
• Citable data source
• Metadata quality
– Unique & persistent identifiers
– Consistency
• Data repositories are not static
– How is version control handled?
• Partnerships
DATA CITATION
Current citation style
(in full text of article as informal citations)
Desired/future citation style
(as formally cited references)
U.S. Dept. of Justice, Bureau of Justice Statistics
(1996): MURDER CASES IN 33 LARGE URBAN
COUNTIES IN THE UNITED STATES, 1988.
Version 1. Inter-university Consortium for Political
and Social Research.
http://dx.doi.org/10.3886/ICPSR09907.v1
Lee, Seung-Jae; Lee, He-Jin; Cho, Ji-Hoon; Rho,
Sangchul; Hwang, Daehee (2008): GSE11574: The
responses of astrocytes stimulated by extracellular
α-synuclein. Gene Expression Omnibus.
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE11574
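The formal citation style shown above follows a recognisable pattern: creators, year, title, optional version, repository, and a persistent locator. A minimal Python sketch of that pattern follows. The `DataCitation` class and `format_citation` function are hypothetical names introduced only for illustration, and the field order is inferred from the example citations on this slide rather than taken from any published standard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    """Hypothetical container; the fields mirror the example citations on the slide."""
    creators: List[str]
    year: int
    title: str
    repository: str
    locator: str                    # DOI URL or other persistent link
    version: Optional[str] = None   # not every data set states a version


def format_citation(c: DataCitation) -> str:
    # Assumed pattern: Creators (Year): Title. [Version.] Repository. Locator
    parts = [f"{'; '.join(c.creators)} ({c.year}): {c.title}."]
    if c.version:
        parts.append(f"{c.version}.")
    parts.append(f"{c.repository}.")
    parts.append(c.locator)
    return " ".join(parts)


# Values copied from the ICPSR example citation above.
icpsr = DataCitation(
    creators=["U.S. Dept. of Justice, Bureau of Justice Statistics"],
    year=1996,
    title="MURDER CASES IN 33 LARGE URBAN COUNTIES IN THE UNITED STATES, 1988",
    repository="Inter-university Consortium for Political and Social Research",
    locator="http://dx.doi.org/10.3886/ICPSR09907.v1",
    version="Version 1",
)

print(format_citation(icpsr))
```

Keeping the version and the persistent locator as separate fields reflects the point made under Challenges: repositories are not static, so a citation needs to say which version was used.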
DATA CITATION
Lee, Seung-Jae; Lee, He-Jin; Cho, Ji-Hoon; Rho,
Sangchul; Hwang, Daehee (2008): GSE11574: The
responses of astrocytes stimulated by extracellular
α-synuclein. Gene Expression Omnibus.
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE11574
Published data sets
Data Citation Index
Scientific literature
New data metrics
DATA CITATION INDEX AIMS
• Enable the discovery of data
repositories, data studies and data
sets in the context of traditional
literature
• Link data to research publications
• Help researchers find data sets and
studies and track the full impact of
their research output
• Provide expanded measurement of
researcher and institutional research
output and assessment
• Facilitate more accurate and
comprehensive bibliometric analyses
Launched October 2012
4M data records
INDEXING A DATA REPOSITORY
ON WEB OF SCIENCE
Record Types
• Repository/Source: Comprises data studies, data sets and/or microcitations. Stores and provides access to the raw data.
• Data Study: Descriptions of studies or experiments with associated data which have been used in the data study. Includes serial or longitudinal studies over time.
• Data Set: A single or coherent set of data or a data file provided by the repository, as part of a collection, data study or experiment.
• Microcitation: (nanopublication) An assertion about concepts that have been found to be linked by scientific enquiry, and can be uniquely identified and attributed to its author. Made up of three separate parts: a subject, a predicate and an object.
[Diagram: descriptive metadata feed from repository; repository raw metadata is analysed; metadata added. Record levels: Repository, Data study, Data set, Microcitation]
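The microcitation definition above (a uniquely identified, attributed assertion made of a subject, a predicate and an object) can be sketched as a small record type. This is only an illustration under stated assumptions: the `Microcitation` class, its field names, and the placeholder values are hypothetical and do not describe how the Data Citation Index actually stores such records.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Microcitation:
    """Hypothetical record for a microcitation (nanopublication)."""
    identifier: str   # unique identifier for the assertion (made-up format)
    author: str       # who the assertion is attributed to
    subject: str      # first part of the assertion
    predicate: str    # relationship linking subject and object
    obj: str          # third part; named obj to avoid shadowing Python's object

    def as_triple(self) -> Tuple[str, str, str]:
        # The three separate parts named in the definition.
        return (self.subject, self.predicate, self.obj)


# Placeholder values for illustration only; not taken from any repository.
example = Microcitation(
    identifier="example:assertion-1",
    author="Example Author",
    subject="gene X",
    predicate="is associated with",
    obj="disease Y",
)

print(example.as_triple())
```

Making the record immutable (`frozen=True`) fits the idea that an assertion, once identified and attributed, should not change under the same identifier.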
Search Results within the
Data Citation Index
present the powerful Web
of Knowledge options for
exploring a body of
information. Data
becomes discoverable
alongside literature
Data deposition makes it
possible to show related data
from the repository
Because data are
accessible and able to be
cited, they can be linked
to publications describing
research which uses them
Link out directly to the
original item, in this case
a Data Study.
Start to build citation
maps associated with
data through the
association of data and
literature
Provide assistance in how
to associate data and
literature through citation
DATA CITATION INDEX & DATA
MANAGEMENT PLANS
• Discovery of data most important to scholarly
research
• Data linked to published research literature
• Measures of data citation, use and reuse with
attribution assisted by identifiers
• New metrics for digital scholarship
Thank you
Nigel Robinson
nigel.robinson@thomsonreuters.com
REPOSITORY SELECTION & EVALUATION
As we evaluate repositories for
inclusion, some of the things we
consider are:
• Editorial Content - ensuring that
material is desirable to the
research community.
• Persistence and stability of the
repository, with a steady flow of
new information.
• Thoroughness and detail of
descriptive information.
• Links from data to research
literature.
DATA REPOSITORIES
• Over 1000 repositories identified
TYPES OF DATA BY DISCIPLINE
ART & HUMANITIES
SOCIAL SCIENCES
SCIENCE & TECHNOLOGY
• CULTURAL HERITAGE
• POLL DATA
• MAPS
• LANGUAGE CORPUS
• ECONOMIC STATISTICS
• IMAGE COLLECTIONS
• LONGITUDINAL DATA
• NATIONAL CENSUS
• RECORDINGS
• PUBLIC OPINION SURVEYS
• ALGORITHMS
• GENOMICS
• SKY SURVEYS
• ASTROPHYSICS
• REMOTE SENSING
• MUSEUM SPECIMENS