Using Data Management Plans as a Research Tool for Improving Data Services in Academic Libraries
Amanda Whitmire, Lizzy Rolando & Brian Westra
Jake Carlson, Patricia Hswe & Susan Wells Parham
DART Team | IASSIST 2015, Minneapolis, MN, 2-6 June 2015

DART Team
DART Project | @DMPResearch
Amanda Whitmire | @AWhitTwit
Jake Carlson | @jrcarlso
Patricia M. Hswe | @pmhswe
Susan Wells Parham |
Lizzy Rolando |
Brian Westra | @bdwestra
http://bit.ly/dmpresearch

Acknowledgements
Amanda Whitmire | Oregon State University Libraries
Jake Carlson | University of Michigan Library
Patricia M. Hswe | Pennsylvania State University Libraries
Susan Wells Parham | Georgia Institute of Technology Library
Lizzy Rolando | Georgia Institute of Technology Library
Brian Westra | University of Oregon Libraries
This project was made possible in part by the Institute of Museum and Library Services, grant number LG-07-13-0328.

Levels of data services
Research data services range from the basics through mid-level to high-level offerings, e.g. consults, a website, workshops, DMP review, metadata support, facilitating deposit in data repositories, data curation, dedicated "research services," and infrastructure.
From: Reznik-Zellen, Rebecca C.; Adamick, Jessica; and McGinty, Stephen. (2012). "Tiers of Research Data Support Services." Journal of eScience Librarianship 1(1): Article 5. http://dx.doi.org/10.7191/jeslib.2012.1002

Informed data services development
Data services development can be informed by surveys, DCPs (data curation profiles), and DMPs.

DART premise
A researcher's DMP reflects their research data management (RDM) knowledge, capabilities, practices, and needs.
That RDM knowledge, those capabilities, practices, and needs can in turn inform research data services.
We need a tool.

Solution: an analytic rubric
Rows are performance criteria ("Thing 1," "Thing 2," "Thing 3," ...); columns are performance levels (high, medium, low).

NSF directorates and divisions
BIO Biological Sciences: DBI Biological Infrastructure; DEB Environmental Biology; EF Emerging Frontiers Office; IOS Integrative Organismal Systems; MCB Molecular & Cellular Biosciences
CISE Computer & Information Science & Engineering: ACI Advanced Cyberinfrastructure; CCF Computing & Communication Foundations; CNS Computer & Network Systems; IIS Information & Intelligent Systems
EHR Education & Human Resources: DGE Division of Graduate Education; DRL Research on Learning in Formal & Informal Settings; DUE Undergraduate Education; HRD Human Resources Development
ENG Engineering: CBET Chemical, Bioengineering, Environmental, & Transport Systems; CMMI Civil, Mechanical & Manufacturing Innovation; ECCS Electrical, Communications & Cyber Systems; EEC Engineering Education & Centers; EFRI Emerging Frontiers in Research & Innovation; IIP Industrial Innovation & Partnerships
GEO Geosciences: AGS Atmospheric & Geospace Sciences; EAR Earth Sciences; OCE Ocean Sciences; PLR Polar Programs
MPS Mathematical & Physical Sciences: AST Astronomical Sciences; CHE Chemistry; DMR Materials Research; DMS Mathematical Sciences; PHY Physics
Many directorates and divisions issue their own, division-specific DMP guidance.
Examples of DMP guidance text, by source:
NSF guidelines: "The standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)"
BIO: "Describe the data that will be collected, and the data and metadata formats and standards used."
CISE: "The DMP should cover the following, as appropriate for the project: ...other types of information that would be maintained and shared regarding data, e.g. the means by which it was generated, detailed analytical and procedural information required to reproduce experimental results, and other metadata"
ENG: "Data formats and dissemination. The DMP should describe the specific data formats, media, and dissemination approaches that will be used to make data available to others, including any metadata"
GEO AGS: "Data Format: Describe the format in which the data or products are stored (e.g. hardcopy logs and/or instrument outputs, ASCII, XML files, HDF5, CDF, etc)."

Rubric development
Project team testing & revisions, with feedback & iteration from the Advisory Board.

Rubric excerpt
The rubric mixes general assessment criteria with directorate- or division-specific assessment criteria. Each performance criterion is scored at one of three performance levels: complete/detailed, addressed issue but incomplete, or did not address issue.

Criterion: Describes what types of data will be captured, created or collected (applies to: All)
- Complete/detailed: Clearly defines data type(s), e.g. text, spreadsheets, images, 3D models, software, audio files, video files, reports, surveys, patient records, samples, final or intermediate numerical results from theoretical calculations, etc. Also defines data as observational, experimental, simulation, model output or assimilation.
- Addressed issue, but incomplete: Some details about data types are included, but the DMP is missing details or wouldn't be well understood by someone outside of the project.
- Did not address issue: No details included; fails to adequately describe data types.

Criterion: Describes how data will be collected, captured, or created (whether new observations, results from models, reuse of other data, etc.) (applies to: GEO AGS, GEO EAR SGP, MPS AST)
- Complete/detailed: Clearly defines how data will be captured or created, including methods, instruments, software, or infrastructure where relevant.
- Addressed issue, but incomplete: Missing some details regarding how some of the data will be produced; makes assumptions about reviewer knowledge of methods or practices.
- Did not address issue: Does not clearly address how data will be captured or created.

Criterion: Identifies how much data (volume) will be produced (applies to: GEO EAR SGP, GEO AGS)
- Complete/detailed: Amount of expected data (MB, GB, TB, etc.) is clearly specified.
- Addressed issue, but incomplete: Amount of expected data is vaguely specified.
- Did not address issue: Amount of expected data is not specified.

Mini-reviews 1 & 2
Two rounds of test scoring, in which team members independently applied the rubric to the same set of DMPs.
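To analyze mini-review scores, each rater's rubric ratings need to end up in a subjects-by-raters layout. A minimal sketch in R of what that might look like; the numeric coding (3 = complete/detailed, 2 = addressed but incomplete, 1 = did not address), the two raters, the five DMPs, and the object name ratingsData are all illustrative assumptions, not the project's actual data or coding scheme:

    # Hypothetical scores from two raters applying one rubric criterion to five DMPs.
    # Assumed coding: 3 = complete/detailed, 2 = addressed but incomplete, 1 = did not address.
    rater1 <- c(3, 2, 3, 1, 2)
    rater2 <- c(3, 2, 2, 1, 3)

    # DMPs (the rated subjects) as rows, raters as columns -- the layout used for the
    # inter-rater reliability analysis that follows.
    ratingsData <- data.frame(rater1, rater2, row.names = paste0("DMP_", 1:5))
    ratingsData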
Inter-rater reliability
(Wherein I try not to put you to sleep.)

A primer on scoring: X = T + E
Very helpful excerpts from: Hallgren, Kevin A. "Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial." Tutorials in Quantitative Methods for Psychology 8, no. 1 (2012): 23-34. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3402032/

X = T + E
X = observed score; T = true score (the score if there were no error); E = measurement error (noise).
The error term can reflect issues of internal consistency, test-retest reliability, or inter-rater reliability.

Var(X) = Var(T) + Var(E)
Variance in observed scores = variance in true scores + variance in errors.

Inter-rater reliability
"IRR analysis aims to determine how much of the variance in the observed scores is due to variance in the true scores after the variance due to measurement error between coders has been removed." (Hallgren 2012)
If IRR = 0.80: 80% of Var(X) is due to Var(T), and 20% of Var(X) is due to Var(E).

Measures of IRR
1. Percentage agreement | not for ordinal data; overestimates agreement
2. Cronbach's alpha | works for 2 raters only
3. Cohen's kappa | used for nominal data; works for 2 raters only
4. Fleiss's kappa | for nominal variables
5. Intra-class correlation (ICC) | perfect!

Intra-class correlation (ICC)
ICC = variance due to rated subjects (DMPs) / (variance due to DMPs + variance due to raters + residual variance)
There are six variations of ICC; the right one must be chosen carefully based on study design.
Shrout PE, Fleiss JL. Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin. 1979; 86(2): 420-428.
McGraw KO, Wong SP. Forming inferences about some intraclass correlation coefficients. Psychological Methods. 1996; 1(1): 30-46.

    ICC_results <- icc(ratingsData, model="twoway", type="agreement", unit="single")

"twoway" | vs. one-way; both raters and DMPs are treated as random
"agreement" | vs. consistency; looking for absolute agreement between raters
"single" | vs. average; single ratings are used, not averages of ratings

ICC: consistency vs. agreement
(Figure: three example plots — Rater 2 = Rater 1; Rater 2 always rates 4 points higher than Rater 1; Rater 2 = 1.5 x Rater 1.) Raters who track each other without agreeing in absolute terms (e.g., one always rating 4 points higher) can still score well on consistency; the agreement form of ICC asks whether the raters assign the same values.
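As a concrete, runnable version of the call above: the icc() on the slide matches the signature of icc() in the R irr package, which is assumed here. The sketch below fabricates data in the spirit of the scoring primer — "true" DMP scores plus independent rater noise — and then estimates ICC with the same arguments as the slide. The simulated numbers are invented for illustration only; with this setup the agreement ICC should land near Var(T) / (Var(T) + Var(E)).

    # install.packages("irr")   # one-time, if the package is not installed
    library(irr)

    set.seed(42)
    n_dmps <- 30
    true_scores <- rnorm(n_dmps, mean = 2, sd = 0.7)   # Var(T): real differences between DMPs

    # Each rater observes the true score plus independent measurement error: X = T + E.
    rater1 <- true_scores + rnorm(n_dmps, sd = 0.4)    # Var(E) per rater = 0.4^2
    rater2 <- true_scores + rnorm(n_dmps, sd = 0.4)
    ratingsData <- cbind(rater1, rater2)               # DMPs as rows, raters as columns

    # Same call as on the slide: two-way model, absolute agreement, single ratings.
    ICC_results <- icc(ratingsData, model = "twoway", type = "agreement", unit = "single")
    ICC_results$value                                  # estimated ICC

    # Theoretical share of observed-score variance due to true scores, for comparison:
    0.7^2 / (0.7^2 + 0.4^2)                            # = 0.754

Continuous simulated scores keep the variance arithmetic transparent; the actual rubric produces ordinal ratings, for which ICC remains an appropriate choice (see the list of IRR measures above).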
Inter-rater reliability results
(Figures: histograms of ICC values across rubric criteria; two sets of summary statistics are reported.)
Mean = 0.487 | Median = 0.464 | Standard deviation = 0.112
Mean = 0.731 | Median = 0.759 | Standard deviation = 0.146
Interpretation scale: 0-0.39 = poor | 0.40-0.59 = fair | 0.60-0.74 = good | 0.75-1 = excellent
(Figure: ICC value for each rubric criterion, grouped as poor, fair, good, or excellent.)
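For anyone wanting to reproduce this kind of summary for their own rubric, a small sketch of how a set of per-criterion ICC estimates could be summarized and binned into the ranges above; the ICC values here are placeholders, not the project's results:

    # Placeholder per-criterion ICC estimates (invented values, for illustration only).
    icc_values <- c(0.31, 0.45, 0.52, 0.63, 0.71, 0.78, 0.82)

    mean(icc_values); median(icc_values); sd(icc_values)

    # Bin using the cutoffs from the slide: poor / fair / good / excellent.
    bins <- cut(icc_values,
                breaks = c(0, 0.40, 0.60, 0.75, 1),
                labels = c("poor", "fair", "good", "excellent"),
                right = FALSE, include.lowest = TRUE)  # 0.40 counts as "fair", 0.75 as "excellent"
    table(bins)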