DATA MANAGEMENT PLANS & THE DATA CITATION INDEX
NIGEL ROBINSON
19 MAY 2014

THE INCREASING VISIBILITY OF DATA
• Grant funding agencies
• Journal publishers
• Publisher website
• Electronic articles
• Data repositories & registration agencies
“Data is the new gold” – Neelie Kroes, EU Digital Agenda Commissioner

©2010 Thomson Reuters

DIGITAL SCHOLARSHIP
Digital Scholarship
• Very visible within the literature as a concept
• Articles, projects, university labs all devoted to digital scholarship in various ways
Interested Parties
• Authors/researchers
• Research administrators
• Librarians, data archivists
• Publishers
• Grant funding organizations
Content
• Discipline-specific and multidisciplinary content
• Needs and requirements vary by discipline
• Diverse content formats, with few standards

OVERVIEW
• Emerging landscape
• Opportunities
• Citation and attribution

THE EMERGENCE OF FUNDING MANDATES
NIH (2003) Data Sharing Policy: all funding applications of $500,000 or more per year are expected to address data sharing in their application.
NSF (2011) All funding proposals submitted on or after January 18, 2011, must include a “Data Management Plan” describing how the proposal will conform to NSF policy on the dissemination and sharing of research results.

DATA MANAGEMENT REQUIREMENTS EXTEND ACROSS THE GLOBE
Aug 2011… “expectation that all our funded researchers should maximise access to their research data with as few restrictions as possible. ….
submit a data management and sharing plan as part of the application process.”
2007… “Researchers are to retain research data and primary materials, manage storage of research data and primary materials, maintain confidentiality of research data and primary materials.”

DATA MANAGEMENT REQUIREMENTS EXTEND ACROSS THE GLOBE
• “A further new element in Horizon 2020 is the use of Data Management Plans (DMPs) detailing what data the project will generate, whether and how it will be exploited or made accessible for verification and re-use, and how it will be curated and preserved. The use of a Data Management Plan is required for projects participating in the Open Research Data Pilot. Other projects are invited to submit a Data Management Plan if relevant for their planned research.”

IMPACT ON RESEARCH LIBRARIES

FUNDING MANDATES BECOMING STRONGER
January 14, 2013… “failure to provide the requisite Data Management Plan will result in the application being rejected or terminated.”

WHY SHARE DATA?
• Verification – findings can be verified
• Extend original findings, address new questions
• Reduce costs
• Training – data reuse
• Increased primary publications
A. Pienta, G. Alter, J. Lyle (2010). The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307
Data sharing leads to more science & more knowledge

DATA SHARING
• The level and timing of data sharing varies
  – 28% only share prior to publication
  – 35% only share after publication
  – 25% share before and after publication

INCREASED CITATION WITH SHARED DATA
Bibliometrics
35% to 69% more citations
courtesy of Jon Sears (AGU)
Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308.
doi:10.1371/journal.pone.0000308

DEPOSITION OF DATA BY RESEARCHERS
Q16. Where do you place your non-traditional scholarly output to make it available to others? (n=471)
• Publisher website: 24%
• Repository managed by a third party (e.g., domain-…): 36%
• Department or institutional repository: 47%
• Personal website: 51%
• Other: 17%

RESEARCHERS NOT RECEIVING CREDIT
Barriers to creating and sharing data:
• Researchers are hesitant to spend time and effort to create and share data because they don’t feel the work is adequately exposed or accredited
• Researchers find it difficult to expose data they have produced because data repositories do not have clear standards or mechanisms in place for doing so

RESEARCHER PROBLEMS
• Access & discovery
• Citation standards
• Lack of willingness to deposit and cite
• Lack of recognition / credit

DATA MANAGEMENT PLAN BENEFITS
Data deposit / Active / Persistence / Data reuse
• Repository must hold data
• Repository must provide access to data
• Material added/updated
• Provide statistics on deposited data
• Actively curate data in the archive
• Persistent IDs, DOIs or other permanent ID
• Contacts available for confirmation of interpretation
• Indication of intention to preserve data or provide access over the long term
• Contingency if repository were to cease to operate
• Make data accessible (or state licensing terms)
• Sustainable
• Funding information available for repository and deposited data
• Links to literature
• Citation in literature databases

CHALLENGES
• Metadata
  – Resources
  – Expertise
• Citable data source
• Metadata quality
  – Unique & persistent identifiers
  – Consistency
• Data repositories are not static
  – How is version control handled?
• Partnerships

DATA CITATION
Current citation style (in full text of article as informal citations)
Desired/future citation style (as formally cited references)
U.S.
Dept. of Justice, Bureau of Justice Statistics (1996): MURDER CASES IN 33 LARGE URBAN COUNTIES IN THE UNITED STATES, 1988. Version 1. Inter-university Consortium for Political and Social Research. http://dx.doi.org/10.3886/ICPSR09907.v1
Lee, Seung-Jae; Lee, He-Jin; Cho, Ji-Hoon; Rho, Sangchul; Hwang, Daehee (2008): GSE11574: The responses of astrocytes stimulated by extracellular α-synuclein. Gene Expression Omnibus. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE11574

DATA CITATION
Lee, Seung-Jae; Lee, He-Jin; Cho, Ji-Hoon; Rho, Sangchul; Hwang, Daehee (2008): GSE11574: The responses of astrocytes stimulated by extracellular α-synuclein. Gene Expression Omnibus. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE11574
[Diagram: Published data sets; Data Citation Index; Scientific literature; New data metrics]

DATA CITATION INDEX AIMS
• Enable the discovery of data repositories, data studies and data sets in the context of traditional literature
• Link data to research publications
• Help researchers find data sets and studies and track the full impact of their research output
• Provide expanded measurement of researcher and institutional research output and assessment
• Facilitate more accurate and comprehensive bibliometric analyses
Launched October 2012
4M data records

INDEXING A DATA REPOSITORY ON WEB OF SCIENCE
[Diagram: Descriptive metadata feed from repository; Repository raw metadata is analysed; Metadata added]
Record Types
• Repository/Source: Comprises data studies, data sets and/or microcitations. Stores and provides access to the raw data.
• Data Study: Descriptions of studies or experiments with associated data which have been used in the data study. Includes serial or longitudinal studies over time.
• Data Set: A single or coherent set of data or a data file provided by the repository, as part of a collection, data study or experiment.
• Microcitation: (nanopublication) An assertion about concepts that have been found to be linked by scientific enquiry, and can be uniquely identified and attributed to its author. Made up of three separate parts: a subject, a predicate and an object.
[Diagram: Repository; Data study; Data set; Microcitation]

Search Results within the Data Citation Index present the powerful Web of Knowledge options for exploring a body of information.
Data becomes discoverable alongside literature.
Data deposition makes it possible to show related data from the repository.
Because data are accessible and able to be cited, they can be linked to publications describing research which uses them.
Link out directly to the original item, in this case a Data Study.
Start to build citation maps associated with data through the association of data and literature.
Provide assistance in how to associate data and literature through citation.

DATA CITATION INDEX & DATA MANAGEMENT PLANS
• Discovery of data most important to scholarly research
• Data linked to published research literature
• Measures of data citation, use and reuse with attribution assisted by identifiers
• New metrics for digital scholarship

Thank you
Nigel Robinson
nigel.robinson@thomsonreuters.com

REPOSITORY SELECTION & EVALUATION
As we evaluate repositories for inclusion, some of the things we consider are:
• Editorial Content – ensuring that material is desirable to the research community.
• Persistence and stability of the repository, with a steady flow of new information.
• Thoroughness and detail of descriptive information.
• Links from data to research literature.

DATA REPOSITORIES
• Over 1000 repositories identified

TYPES OF DATA BY DISCIPLINE
ART & HUMANITIES / SOCIAL SCIENCES / SCIENCE & TECHNOLOGY
• CULTURAL HERITAGE
• POLL DATA
• MAPS
• LANGUAGE CORPUS
• ECONOMIC STATISTICS
• IMAGE COLLECTIONS
• LONGITUDINAL DATA
• NATIONAL CENSUS
• RECORDINGS
• PUBLIC OPINION SURVEYS
• ALGORITHMS
• GENOMICS
• SKY SURVEYS
• ASTROPHYSICS
• REMOTE SENSING
• MUSEUM SPECIMENS
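
ILLUSTRATION: ASSEMBLING A FORMAL DATA CITATION
The formally cited references on the DATA CITATION slide share a visible pattern: creators, year, title, an optional version, the repository name, and a persistent link. The Python sketch below only illustrates that pattern. The DataCitation class, its field names, and format_citation are hypothetical names chosen for this example; they are not part of the Data Citation Index or any repository's API, and real citation guidelines may require more elements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Hypothetical record for the elements seen in the slide's citations."""
    creators: str                  # author or agency string, as it should be printed
    year: int                      # publication year of the data
    title: str                     # title of the data study or data set
    repository: str                # repository that holds the data
    locator: str                   # DOI link or other persistent URL
    version: Optional[str] = None  # included only when the repository assigns one


def format_citation(c: DataCitation) -> str:
    """Join the elements in the order used by the slide's examples."""
    parts = [f"{c.creators} ({c.year}): {c.title}."]
    if c.version:
        parts.append(f"Version {c.version}.")
    parts.append(f"{c.repository}.")
    parts.append(c.locator)
    return " ".join(parts)


# Values copied from the ICPSR example on the DATA CITATION slide.
icpsr = DataCitation(
    creators="U.S. Dept. of Justice, Bureau of Justice Statistics",
    year=1996,
    title="MURDER CASES IN 33 LARGE URBAN COUNTIES IN THE UNITED STATES, 1988",
    repository="Inter-university Consortium for Political and Social Research",
    locator="http://dx.doi.org/10.3886/ICPSR09907.v1",
    version="1",
)

print(format_citation(icpsr))
```

With the ICPSR values, the printed string matches the citation shown on the slide. Leaving version unset drops the "Version" element, as in the Gene Expression Omnibus example.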