Using assessment of NSF data management plans to enable evidence-based evolution of research data services

Amanda Whitmire, Jake Carlson, Patricia Hswe, Susan Wells Parham, Lizzy Rolando & Brian Westra
@DMPResearch
23 April 2015

Acknowledgements

Jake Carlson, University of Michigan Library
Patricia M. Hswe, Pennsylvania State University Libraries
Susan Wells Parham, Georgia Institute of Technology Library
Lizzy Rolando, Georgia Institute of Technology Library
Brian Westra, University of Oregon Libraries

This project was made possible in part by the Institute of Museum and Library Services grant number LG-07-13-0328.

Levels of data services

Levels: high level; mid-level; the basics
Services: infrastructure; metadata support; DMP review; data curation; facilitate deposit in DRs; consults; website; dedicated “research services”; workshops

From: Reznik-Zellen, Rebecca C.; Adamick, Jessica; and McGinty, Stephen. (2012). "Tiers of Research Data Support Services." Journal of eScience Librarianship 1(1): Article 5. http://dx.doi.org/10.7191/jeslib.2012.1002

Informed data services development

surveys
data curation profiles
DMPs (data management plans)

DART Premise

[Diagram: DMP; researcher; Research Data Management (knowledge, capabilities, practices, needs); Research Data Services]

“Of the 181 NSF DMPs that were analyzed, 39 (22%) identified Georgia Tech’s institutional repository, SMARTech.”

“We have a clear road ahead of us: we will target specific schools for outreach; develop consistent language about repository services for research data; and focus on the widespread dissemination of information about our new digital preservation strategy.”

We need a tool

Solution: An analytic rubric

Performance Criteria | Performance Level: High | Medium | Low
Thing 1 | | |
Thing 2 | | |
Thing 3 | | |

Literature review on creating & using analytic rubrics

NSF-tangent & 3rd-party DMP guidance

NSF DMP guidance

NSF Directorate or Division

[On the slide, asterisks mark units with division-specific guidance.]

BIO    Biological Sciences
  DBI    Biological Infrastructure
  DEB    Environmental Biology
  EF     Emerging Frontiers Office
  IOS    Integrative Organismal Systems
  MCB    Molecular & Cellular Biosciences
CISE   Computer & Information Science & Engineering
  ACI    Advanced Cyberinfrastructure
  CCF    Computing & Communication Foundations
  CNS    Computer & Network Systems
  IIS    Information & Intelligent Systems
EHR    Education & Human Resources
  DGE    Division of Graduate Education
  DRL    Research on Learning in Formal & Informal Settings
  DUE    Undergraduate Education
  HRD    Human Resources Development
ENG    Engineering
  CBET   Chemical, Bioengineering, Environmental, & Transport Systems
  CMMI   Civil, Mechanical & Manufacturing Innovation
  ECCS   Electrical, Communications & Cyber Systems
  EEC    Engineering Education & Centers
  EFRI   Emerging Frontiers in Research & Innovation
  IIP    Industrial Innovation & Partnerships
GEO    Geosciences
  AGS    Atmospheric & Geospace Sciences
  EAR    Earth Sciences
  OCE    Ocean Sciences
  PLR    Polar Programs
MPS    Mathematical & Physical Sciences
  AST    Astronomical Sciences
  CHE    Chemistry
  DMR    Materials Research
  DMS    Mathematical Sciences
  PHY    Physics
SBE    Social, Behavioral & Economic Sciences
  BCS    Behavioral & Cognitive Sciences
  SES    Social & Economic Sciences

Consolidated guidance

Source | Guidance text
NSF guidelines | The standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)
BIO | Describe the data that will be collected, and the data and metadata formats and standards used.
CISE | The DMP should cover the following, as appropriate for the project: ...other types of information that would be maintained and shared regarding data, e.g. the means by which it was generated, detailed analytical and procedural information required to reproduce experimental results, and other metadata
ENG | Data formats and dissemination. The DMP should describe the specific data formats, media, and dissemination approaches that will be used to make data available to others, including any metadata
GEO AGS | Data Format: Describe the format in which the data or products are stored (e.g. hardcopy logs and/or instrument outputs, ASCII, XML files, HDF5, CDF, etc.).

[Diagram: Project team testing & revisions; Feedback & iteration; Rubric; Advisory Board]

General Assessment Criteria, with directorate- or division-specific assessment criteria

Performance Criteria: Describes what types of data will be captured, created or collected
  Complete / detailed: Clearly defines data type(s). E.g. text, spreadsheets, images, 3D models, software, audio files, video files, reports, surveys, patient records, samples, final or intermediate numerical results from theoretical calculations, etc. Also defines data as: observational, experimental, simulation, model output or assimilation
  Addressed issue, but incomplete: Some details about data types are included, but DMP is missing details or wouldn’t be well understood by someone outside of the project
  Did not address issue: No details included, fails to adequately describe data types.
  Directorates: All

Performance Criteria: Describes how data will be collected, captured, or created (whether new observations, results from models, reuse of other data, etc.)
  Complete / detailed: Clearly defines how data will be captured or created, including methods, instruments, software, or infrastructure where relevant.
  Addressed issue, but incomplete: Missing some details regarding how some of the data will be produced, makes assumptions about reviewer knowledge of methods or practices.
  Did not address issue: Does not clearly address how data will be captured or created.
  Directorates: GEO AGS, GEO EAR SGP, MPS AST

Performance Criteria: Identifies how much data (volume) will be produced
  Complete / detailed: Amount of expected data (MB, GB, TB, etc.) is clearly specified.
  Addressed issue, but incomplete: Amount of expected data (GB, TB, etc.) is vaguely specified.
  Did not address issue: Amount of expected data (GB, TB, etc.) is NOT specified.
  Directorates: GEO EAR SGP, GEO AGS
“Mini-reviews 1 & 2”

Performance criteria | Complete / detailed | Addressed issue, but incomplete | Did not address the issue
Describes what types of data will be captured, created or collected | 18 | 3 | 4
Identifies metadata standards or formats that will be used for the proposed project | 4 | 4 | 17
Describes data formats created or used during project | 14 | 4 | 7
Provides details on when the data will be made publicly available | 8 | 6 | 11
Describes how the data will be made publicly available | 22 | 2 | 1
Describes security measures that will be in place to protect the data from unauthorized access | 8 | 1 | 16
Describes the policies or provisions in place governing the use and reuse of the data | 5 | 10 | 10
Describes the policies or provisions for redistribution of the data | 4 | 7 | 14
Describes policies or provisions for building off of the data, such as through the creation of derivatives | 3 | 7 | 15
Indicates whether or not the data will be archived | 17 | |
Describes plans for archiving and preserving digital data* | 12 | |
Plan discusses the types or formats of data the investigator expects to retain in their possession* | | |

Remaining values for the last three rows (column assignment unclear): 1, 5, 6, 2, 3, 4, 7

Data sharing methods

Not planning to share data: 0
Conference / proceedings: 3
ETD: 1
On request: 9
Personal website: 8
Book: 1
Other data repository or method: 7
National data center: 3
Journal / supplement: 10
Institutional repository: 4
Did not specify: 0

To sum up…

Developing a rubric to empower academic librarians in providing research data support
http://bit.ly/dmpresearch
@DMPResearch