USING DATA MANAGEMENT PLANS as a RESEARCH TOOL for IMPROVING DATA SERVICES in ACADEMIC LIBRARIES
Amanda Whitmire, Patricia Hswe & Brian Westra
In absentia: Jake Carlson, Susan Wells Parham
DLF Forum 2015, Vancouver, BC, Canada, 26-28 October 2015

DART Team
DART Project | @DMPResearch
Amanda Whitmire | @AWhitTwit
Jake Carlson | @jrcarlso
Patricia M. Hswe | @pmhswe
Susan Wells Parham
Lizzy Rolando
Brian Westra | @bdwestra
http://dmpresearch.library.oregonstate.edu
27 Oct 2015

Acknowledgements
DART Team
Amanda Whitmire | Oregon State University Libraries
Jake Carlson | University of Michigan Library
Patricia M. Hswe | Pennsylvania State University Libraries
Susan Wells Parham | Georgia Institute of Technology Library
Brian Westra | University of Oregon Libraries
This project was made possible in part by the Institute of Museum and Library Services grant number LG-07-13-0328.

Levels of data services
Levels: high level | mid-level | the basics
Services shown:
• infrastructure
• metadata support
• DMP review
• data curation
• facilitate deposit in DRs
• consults
• website
• dedicated “research services”
• workshops
From: Reznik-Zellen, Rebecca C.; Adamick, Jessica; and McGinty, Stephen. (2012). "Tiers of Research Data Support Services." Journal of eScience Librarianship 1(1): Article 5.
http://dx.doi.org/10.7191/jeslib.2012.1002

Informed data services development
Survey | DCPs | DMPs

DART Premise
[Diagram: DMP; researcher; Research Data Management (knowledge, capabilities, practices, needs); Research Data Services]

We need a tool

Solution: an analytic rubric
Performance Levels (columns): Winning | Okay | No
Performance Criteria (rows): Thing 1 | Thing 2 | Thing 3

NSF Directorate or Division
BIO | Biological Sciences
  DBI | Biological Infrastructure
  DEB | Environmental Biology
  EF | Emerging Frontiers Office
  IOS | Integrative Organismal Systems
  MCB | Molecular & Cellular Biosciences
CISE | Computer & Information Science & Engineering
  ACI | Advanced Cyberinfrastructure
  CCF | Computing & Communication Foundations
  CNS | Computer & Network Systems
  IIS | Information & Intelligent Systems
EHR | Education & Human Resources
  DGE | Division of Graduate Education
  DRL | Research on Learning in Formal & Informal Settings
  DUE | Undergraduate Education
  HRD | Human Resources Development
SBE | Social, Behavioral & Economic Sciences
  BCS | Behavioral & Cognitive Sciences
  SES | Social & Economic Sciences
  SMA | SBE Office of Multidisciplinary Activities
ENG | Engineering
  CBET | Chemical, Bioengineering, Environmental, & Transport Systems
  CMMI | Civil, Mechanical & Manufacturing Innovation
  ECCS | Electrical, Communications & Cyber Systems
  EEC | Engineering Education & Centers
  EFRI | Emerging Frontiers in Research & Innovation
  IIP | Industrial Innovation & Partnerships
GEO | Geosciences
  AGS | Atmospheric & Geospace Sciences
  EAR | Earth Sciences
  OCE | Ocean Sciences
  PLR | Polar Programs
MPS | Mathematical & Physical Sciences
  AST | Astronomical Sciences
  CHE | Chemistry
  DMR | Materials Research
  DMS | Mathematical Sciences
  PHY | Physics

Source | Guidance text
NSF guidelines | The standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)
BIO | Describe the data that will be collected, and the data and metadata formats and standards used.
CSE | The DMP should cover the following, as appropriate for the project: ...other types of information that would be maintained and shared regarding data, e.g. the means by which it was generated, detailed analytical and procedural information required to reproduce experimental results, and other metadata
ENG | Data formats and dissemination. The DMP should describe the specific data formats, media, and dissemination approaches that will be used to make data available to others, including any metadata
GEO AGS | Data Format: Describe the format in which the data or products are stored (e.g. hardcopy

[Diagram: Rubric; Project team testing & revisions; Feedback & iteration; Advisory Board]

Inter-rater reliability

Columns: Performance Criteria | Performance Level (Complete / detailed; Addressed issue, but incomplete; Did not address issue) | Directorates

General Assessment Criteria

Performance Criterion: Describes what types of data will be captured, created or collected
  Complete / detailed: Clearly defines data type(s). E.g. text, spreadsheets, images, 3D models, software, audio files, video files, reports, surveys, patient records, samples, final or intermediate numerical results from theoretical calculations, etc. Also defines data as: observational, experimental, simulation, model output or assimilation
  Addressed issue, but incomplete: Some details about data types are included, but DMP is missing details or wouldn’t be well understood by someone outside of the project
  Did not address issue: No details included, fails to adequately describe data types.
  Directorates: All

Directorate- or division-specific assessment criteria

Performance Criterion: Describes how data will be collected, captured, or created (whether new observations, results from models, reuse of other data, etc.)
  Complete / detailed: Clearly defines how data will be captured or created, including methods, instruments, software, or infrastructure where relevant.
  Addressed issue, but incomplete: Missing some details regarding how some of the data will be produced, makes assumptions about reviewer knowledge of methods or practices.
  Did not address issue: Does not clearly address how data will be captured or created.
  Directorates: GEO AGS, GEO EAR SGP, MPS AST

Performance Criterion: Identifies how much data (volume) will be produced
  Complete / detailed: Amount of expected data (MB, GB, TB, etc.) is clearly specified.
  Addressed issue, but incomplete: Amount of expected data (GB, TB, etc.) is vaguely specified.
  Did not address issue: Amount of expected data (GB, TB, etc.) is NOT specified.
  Directorates: GEO EAR SGP, GEO AGS

DMP text
“The results of this research will be presented at major biological science conferences, including the Ecological Society of America meeting and the annual Soil Ecology Society meeting, and published in peer-reviewed journals. All data and sample access will be handled according to NSF data-sharing policies. Samples of soil strata will be stored appropriately for future use or for lending to other institutions.”

Performance Criterion: Provides details on when the data will be made publicly available
  Complete / detailed: Clearly specifies when the data will be made available to people outside of the project.
  Addressed issue, but incomplete: Vaguely specifies that the data will be made available outside of the project but does not include a date or specific time frame.
  Did not address issue: Does not specify when the data will be made available outside of the project.
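
As an illustration only, the rubric structure shown in these slides (a criterion scored at one of three performance levels, then summarized across many DMPs) can be sketched in Python. The class and function names and the example scores below are hypothetical and are not part of the DART project's materials; only the level labels and the criterion text come from the slides.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """The three performance levels named in the rubric slides."""
    COMPLETE = "Complete / detailed"
    INCOMPLETE = "Addressed issue, but incomplete"
    NOT_ADDRESSED = "Did not address issue"


@dataclass(frozen=True)
class Criterion:
    """One rubric row: the criterion text and the directorates it applies to."""
    text: str
    directorates: tuple


def tally(scores):
    """Return the percentage of DMPs scored at each level for one criterion."""
    if not scores:
        raise ValueError("at least one score is required")
    counts = Counter(scores)
    return {level: 100 * counts[level] / len(scores) for level in Level}


# Criterion text and "All" are taken from the rubric slide.
DATA_TYPES = Criterion(
    text="Describes what types of data will be captured, created or collected",
    directorates=("All",),
)

# Hypothetical scores for four DMPs (not real project data).
example_scores = [
    Level.NOT_ADDRESSED,
    Level.INCOMPLETE,
    Level.COMPLETE,
    Level.NOT_ADDRESSED,
]
shares = tally(example_scores)
for level, pct in shares.items():
    print(f"{level.value}: {pct:.0f}%")
```

Keeping the levels in an enumeration means a typo in a score label fails loudly instead of silently creating a fourth category in the tally.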
DMP text
“The results of this research will be presented at major biological science conferences, including the Ecological Society of America meeting and the annual Soil Ecology Society meeting, and published in peer-reviewed journals. All data and sample access will be handled according to NSF data-sharing policies. Samples of soil strata will be stored appropriately for future use or for lending to other institutions.”

Performance Criterion: Describes how the data will be made publicly available
  Complete / detailed: Includes specific details on the means by which the data will be made available.
  Addressed issue, but incomplete: Includes vague/limited details on the means by which the data will be made available, or sharing details can be inferred because the plan indicates that data will be deposited with a repository or archive.
  Did not address issue: Includes no details on the means by which the data will be made available.

General results

Distribution of DMPs across directorates
[Figure not recoverable from the extracted text]

[Figure: DMP section (1-5); labels as extracted: n=436, n=105, n=111]

Identifies a metadata standard
[Figure not recoverable from the extracted text]

Sharing method
[Figure not recoverable from the extracted text]

BRIEF ANALYSIS OF BIO DMPS
Data types, metadata, data formats, reuse/redistribution/derivatives

DMPs from the BIO Directorate
45 BIO DMPs (< 10% of all reviewed DMPs)
– University of Oregon: 17
– Penn State: 10
– University of Michigan: 7
– Oregon State: 6
– Georgia Tech: 5
However, one DMP stated it would not be collecting any data. As a result, analysis was done with 44 DMPs.

Distribution across BIO directorates
[Figure not recoverable from the extracted text]

BIO: Description of data types
[Figure not recoverable from the extracted text]

BIO: Metadata standards? Which ones?
Rubric requirement: Does the plan mention a specific metadata standard? If yes, please describe.

BIO: Policies for Sharing and Public Access
[Figure not recoverable from the extracted text]

BIO: Policies for reuse and redistribution
[Figure not recoverable from the extracted text]

SHARING AND ARCHIVING
Looking across the data

BIO: How will they share the data?
[Figure not recoverable from the extracted text]

BIO: How will they archive the data?
[Figure not recoverable from the extracted text]

BIO: Thoughts/Looking ahead
• Connects & disconnects
• Cross-directorate & cross-institutional comparisons
• Implications for library services / librarianship
• Implications for NSF and the requirement
• Implications for curation infrastructure
• Compare with reviews of post-2013 BIO DMPs

A brief look into SBE DMPs

Social, Behavioral and Economic Sciences (SBE) Directorate (n=50)
[Bar chart: DMP counts by division (SBE Office of Multidisciplinary Activities, Social & Economic Sciences, Behavioral & Cognitive Sciences) and institution (GT, OSU, PSU, UMICH, UO); data labels as extracted: 1, 3, 4, 7, 17, 2, 1, 1, 1, 4, 5, 4]

Data description – SBE Guidance
“Expected data. The DMP should describe the types of data, samples, physical collections, software, curriculum materials, or other materials to be produced in the course of the project.”
(SBE guidance: http://www.nsf.gov/sbe/SBE_DataMgmtPlanPolicy.pdf)

Describes types of data to be produced
[Bar chart: categories Did not address, Addressed but incomplete, Complete/detailed; axis 0 to 50; values not recoverable from the extracted text]

Example: Data to be produced
• raw electroencephalogram (EEG)
• autonomic nervous system (ANS) physiology recordings
• behavioral data
• data from questionnaires and other assessments
Data will be stored in native formats, in secure spreadsheets (e.g., Excel), statistical data files (e.g., SPSS), and matrices (MATLAB).

Restrictions on sharing
“Factors that might impinge on their ability to manage data, e.g.
legal and ethical restrictions on access to non-aggregated data.”

Describes protections for sensitive data
• Complete (48%)
• N/A (32%)
• Incomplete (18%)
• Did not address (2%)

Describes protections for sensitive data
“No research material that includes personally identifiable information will be re-used or re-distributed publicly without specific written consent from participants.”

SBE: Data Sharing
[Bar chart: category labels not recoverable; values as extracted: 42%, 26%, 20%, 22%, 18%, 12%, 10%, 8%, 2%, 2%]

Named data centers and repositories
• Archbishopric Archive of Lima (AAL), Peruvian National Archive (AGN).
• Chandel Endangered Languages Committee archive
• CUAHSI-Consortium of Universities for the Advancement of Hydrologic Science, Inc
• ICPSR (4)
• Intensively Monitored Watershed server
• International Tree-Ring Data Bank
• Laboratory of Linguistic and Anthropological Documentation and Research in Argentina.
• National Science Digital Library web site
• NBER's data website
• Paleomagnetic Database Portal (MAGICPMAG)
• PANGAEA, NOAA Paleoclimatology archive, CUAHSI
• Online Repository of African American Language Corpora (ORAAL)

Named data centers and repositories
What can this tell us? Similar to other elements of the DMP, it may provide insight into:
• Intent
• Knowledge
• Previous or current practice

A take-home lesson
funder guidance + requirements + personal practices + domain practices + intentions = DMP

Summing up

DMP text
“The data that we generate will be digital (video and audio recordings, plus transcriptions). We have made arrangements to have the 40 hours of audio and video recordings, corresponding transcriptions and educational materials archived at the Endangered Languages Archive.
Archived materials will be accessible to the public in accordance to access restrictions specified by each speaker.”

Performance Criterion: Indicates whether or not the data will be archived
  Complete / detailed: Clearly indicates whether or not data will be archived, including digital data and physical samples where applicable.
  Addressed issue, but incomplete: Generally describes intent to preserve some aspects of data, but lacks clarity on portions of the dataset. E.g., indicates that digital or physical data will be archived but isn't explicit about both.
  Did not address issue: Makes no mention of intent to archive or preserve digital or physical data.
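
The deck lists inter-rater reliability as a step in developing the rubric but does not say how agreement was measured. As a sketch only, assuming two reviewers score the same set of DMPs on a single criterion such as the archiving criterion above, Cohen's kappa is one common agreement statistic. The choice of kappa, the function name, and the ratings below are assumptions for illustration, not the project's method or data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed share of
    items on which the raters agree and p_e is the agreement expected by
    chance from each rater's own label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical scores from two reviewers for six DMPs on one criterion.
reviewer_1 = ["complete", "complete", "incomplete", "none", "incomplete", "complete"]
reviewer_2 = ["complete", "incomplete", "incomplete", "none", "incomplete", "complete"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")  # 17/23, about 0.74
```

Kappa treats the three performance levels as unordered categories. Because the levels are ordered, a weighted variant that penalizes a "complete" versus "did not address" disagreement more heavily than an adjacent-level disagreement may be a better fit in practice.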