NSSE's Psychometric Portfolio: Evidence for Reliability, Validity, and Other Quality Indicators

Thank you for joining us. The webinar will begin at 3:00 pm (EST). Sound tests will begin around 2:55 pm (EST).

Some general advice before we begin:

Additional materials: You may want to open an additional tab or window for easy access to the NSSE Psychometric Portfolio at nsse.iub.edu/links/psychometric_portfolio

Sound: Turn up your computer speakers or plug in your headphones to listen to the webinar. For best results, close all other applications; they may interfere with the audio feed.

What to do if you can't hear anything:
• Click on "Meeting" on the left side of the dark grey toolbar at the top of the screen and select "Audio Setup Wizard." Complete the first part of the wizard, which ends with a speaker test, to ensure you are properly connected for webinar audio. If you still can't hear anything, consult your technology support person.
• The webinar will be recorded. If you can't fix the problem, you will be able to access the recorded session on the NSSE Web site; it will be posted a few days after the live session.

Using the Chat feature: The Chat window will be available throughout the presentation for participants to interact with the presenters and each other. Please use chat to pose questions, suggest a resource, etc.

NSSE's Psychometric Portfolio: Evidence for Reliability, Validity, and Other Quality Indicators
NSSE Webinar, Tuesday, August 31, 2010
Bob Gonyea, NSSE Associate Director
Angie Miller, Research Analyst

Overview
1. Introduction
2. NSSE Psychometric Portfolio
3. The Framework
   A. Reliability
   B. Validity
   C. Other Quality Indicators
4. Summary of Results
5. Discussion with you

Introduction
• Psychometric testing?
• What is a psychometric portfolio?
• Who is this for?

NSSE's Psychometric Framework
A framework for organizing and presenting studies about the quality of NSSE. It consists of three areas of analysis, each containing multiple approaches:
1. Reliability
2. Validity
3. Other Quality Indicators

Report Template
Each brief report in the portfolio contains:
• Purpose of the analysis and research question(s)
• Data description
• Methods of analysis
• Results
• Summary
• References and additional resources

Report Template
View the portfolio at: nsse.iub.edu/links/psychometric_portfolio
- Take a brief tour of the portfolio -

Reliability
Reliability refers to the consistency of results.
• Are results similar across different forms of the instrument or across time periods of data collection?
• Reliable instruments and scales imply that data and results are reproducible.
• Reliability is strongly related to error: large amounts of error can lead to unreliable measurements.
• Reliability measurements can be calculated with data from single or multiple survey administrations.

Reliability
Brief working definitions of reliability (a calculation sketch follows below):
• Internal Consistency
   – "Item homogeneity, or the degree to which the items on a test jointly measure the same construct" (Henson, 2001)
• Temporal Stability
   – Also called test-retest reliability
   – "How constant scores remain from one occasion to another" (DeVellis, 2003)
• Equivalence
   – Also called inter-method or parallel forms reliability
   – "The reproducibility between different versions of an instrument" (Cook & Beckman, 2006)
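As a concrete illustration of the first two definitions above, the sketch below shows how internal consistency (Cronbach's alpha) and temporal stability (a test-retest correlation) are typically computed. This is not NSSE's own analysis code; the data, item counts, and variable names are hypothetical.

```python
# Minimal sketch of two common reliability statistics, using hypothetical data.
# Nothing here comes from NSSE's actual analyses; names and values are illustrative.
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Internal consistency: alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def test_retest(time1: pd.Series, time2: pd.Series) -> float:
    """Temporal stability: Pearson correlation between the same scale on two occasions."""
    return time1.corr(time2)

# Hypothetical example: 200 students answering 5 benchmark-style items on a 1-4 frequency scale.
rng = np.random.default_rng(2010)
items = pd.DataFrame(rng.integers(1, 5, size=(200, 5)),
                     columns=[f"item_{i}" for i in range(1, 6)])
scale_t1 = items.sum(axis=1).astype(float)
scale_t2 = scale_t1 + rng.normal(0, 1.5, size=len(scale_t1))  # same scale a few weeks later

print(f"Cronbach's alpha:        {cronbach_alpha(items):.3f}")
print(f"Test-retest correlation: {test_retest(scale_t1, scale_t2):.3f}")
```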
Reliability
Example questions of NSSE reliability:
• How well do the items within the NSSE benchmarks intercorrelate?
• How stable are institutional benchmark scores over time?
• Does the NSSE survey produce similar results when administered to the same person at different times?
• Are students able to reliably estimate the number of papers they write?
• Do different versions of NSSE questions produce similar results? Specifically, how often is "often"?

Reliability
Results are presented:
• At the student level and/or institution level
   – Student-level Temporal Stability
   – Institution-level Temporal Stability
• For various subgroups
   – Internal Consistency
• From original studies or previously written papers
   – Equivalence: How often is "often"?
• From multiple years of testing

Reliability Framework

Summary of Reliability Results
• Internal Consistency: 3 of 5 NSSE benchmarks are generally reliable across various subpopulations.
• Internal Consistency: Enriching Educational Experiences and Active & Collaborative Learning are less reliable than the other benchmarks.
• Temporal Stability: Benchmarks are highly reliable from year to year at the institution level.
• Temporal Stability: Students' responses are relatively stable over periods of a few weeks.
• Equivalence: Students respond reliably to vague quantifiers such as "sometimes" or "often."

Validity
Validity refers to "the degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses of tests" (Messick, 1989).
• Validity is a property of the inference, not the instrument.
• It involves empirical and theoretical support for the interpretation of the construct.
• It is hypothesis driven (evidence is collected to test each hypothesis).

Validity
We relied on the literature:
• There is no consensus on a single framework, but the literature guided us to include different dimensions: American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1999); Messick (1989); Cook & Beckman (2006); Borden & Young (2008).
• Validity is an evolving concept.
• It is not a fixed characteristic (it depends on use, population, and sample).

Validity
Brief working definitions of validity:
• Response process (cognitive interviews, focus groups)
   – Reviewing the actions and thought processes of test takers or observers
• Content validity (theory, expert reviews)
   – The extent to which a measure represents all facets of a given construct
• Construct validity (factor analyses)
   – The extent to which a construct actually measures what the theory says it does
• Concurrent validity (relations to other variables)
   – The extent to which a construct correlates with other measures of the same construct that are measured at the same time (see the sketch after the example questions below)

Validity
Brief working definitions of validity:
• Known groups validity
   – The extent to which the construct detects differences and similarities among groups that match the differences or similarities expected from the literature
• Predictive validity
   – The extent to which a score on a scale or test predicts scores on some criterion measure
• Consequential validity
   – Investigating both positive/negative and intended/unintended consequences of inferences in order to properly evaluate the validity of a construct or assessment

Validity
Example questions of NSSE validity:
• Do BCSSE scales predict NSSE benchmarks?
• Is there a relationship between student engagement and selected measures of student success?
• Do students interpret the survey questions in the same way the authors intended?
• Do students' responses differ according to group membership in a predictable way?
• Do institutions appropriately use the survey data and results?
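To show what two of these validity checks can look like in practice, here is a minimal sketch of a concurrent validity correlation and a known-groups comparison. The data, group labels, and variable names are hypothetical; this is not the portfolio's actual analysis code.

```python
# Minimal sketch of two validity checks described above, on hypothetical data.
# Variable names (bcsse_scale, nsse_benchmark, first_year, senior) are illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2010)
n = 300

# Concurrent validity: does a BCSSE-style scale correlate with a NSSE-style benchmark
# intended to measure the same construct at roughly the same time?
bcsse_scale = rng.normal(50, 10, n)
nsse_benchmark = 0.6 * bcsse_scale + rng.normal(0, 8, n)
r, p = stats.pearsonr(bcsse_scale, nsse_benchmark)
print(f"Concurrent validity:   r = {r:.2f}, p = {p:.4f}")

# Known-groups validity: do two groups that theory says should differ
# (e.g., first-year vs. senior students) actually differ on the benchmark?
first_year = rng.normal(48, 10, 150)
senior = rng.normal(53, 10, 150)
t, p = stats.ttest_ind(first_year, senior)
print(f"Known-groups validity: t = {t:.2f}, p = {p:.4f}")
```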
Summary of Validity Results
• Response Process: Overall, cognitive interviews and focus groups found the survey questions to be clearly worded and understandable.
• Content Validity: The NSSE conceptual framework summarizes the history of student engagement and the purpose, philosophy, and development of the survey.
• Construct Validity: The Deep Learning scale shows a good factor solution; the benchmarks do not show a clean construct structure. They were developed as additive measures, but improving their factor structure is an important consideration for future survey revisions.
• Concurrent Validity: BCSSE scales are highly related to NSSE benchmarks.

Summary of Validity Results
• Predictive Validity: Student engagement has a positive effect on first-year persistence and cumulative credits taken. Small effects are also seen for GPA.
• Known Groups Validity: The NSSE benchmarks are able to detect differences between groups in a predictable way.
• Consequential Validity: Institutional uses of NSSE data coincide with the intended purposes of the NSSE instrument.

Other Quality Indicators
"Other Quality Indicators" includes procedures, standards, and other evaluations implemented by NSSE to reduce error and bias and to increase the precision and rigor of the data.
– Assesses NSSE's adherence to best practices in survey design, including sampling, survey administration, and reporting
– Related to both reliability and validity

Other Quality Indicators
Brief working definitions for Other Quality Indicators:
• Institution participation
   – Need to look for self-selection bias among the institutions that participate
• Item bias
   – Need to explore whether items function differently for different groups of people (DeVellis, 2003)
• Measurement error
   – Verify that procedures, policies, and administrative processes are appropriate and do not introduce error into the data (NCES, 2002)
• Data quality issues
   – Need to look for the impact of item non-response and missing data

Other Quality Indicators
Brief working definitions for Other Quality Indicators:
• Mode analysis
   – Need to examine whether mode of completion (paper vs. web) affects responses (Carini et al., 2003)
• Non-response error
   – Need to examine whether responders differ from non-responders (Groves et al., 1992); a comparison sketch follows the results summary below
• Sampling error
   – Need to determine acceptable rates of sampling error for participating institutions (Dillman, 2007)
• Social desirability bias
   – Need to examine whether respondents answer untruthfully in order to provide socially appropriate responses (DeVellis, 2003)

Other Quality Indicators
Example questions:
• Do NSSE policies and practices adhere to NCES recommended standards and guidelines?
• Are institutions that participate in NSSE different from other baccalaureate-granting colleges and universities?
• Are responses to the NSSE questionnaire influenced by a tendency to respond in a socially desirable manner?

Summary of Other Quality Indicators Results
• Institution participation: Institutions participating in NSSE are generally similar to other institutions.
• Measurement error: NSSE has stringent policies and procedures for data collection and reporting.
• Mode analysis: No strong evidence for mode bias.
• Non-response error: Some evidence for non-response issues, such as a high rate of drop-off for certain items. However, there is also evidence that neither high school engagement nor attitudes toward engagement influence response to NSSE.
• Sampling error: Institutions participating in NSSE generally have an adequate number of respondents.
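As an illustration of how a non-response representativeness check might be run, the sketch below compares the mix of respondents to the known population mix using a chi-square goodness-of-fit test. The category labels and counts are hypothetical; this is not NSSE's actual non-response analysis.

```python
# Minimal sketch of a non-response representativeness check on hypothetical counts.
# Category labels and numbers are illustrative, not NSSE data.
import numpy as np
from scipy import stats

# Known population shares for some characteristic (e.g., enrollment status).
population_share = np.array([0.70, 0.30])   # full-time, part-time
respondent_counts = np.array([820, 280])    # observed respondents by category

expected_counts = population_share * respondent_counts.sum()
chi2, p = stats.chisquare(f_obs=respondent_counts, f_exp=expected_counts)

print(f"Chi-square = {chi2:.2f}, p = {p:.4f}")
print("A small, non-significant statistic suggests respondents resemble the population "
      "on this characteristic; a large one flags possible non-response bias.")
```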
Conclusion
1. This is responsible survey research.
2. This portfolio is about transparency, so we value feedback.
3. NSSE strives for continuous improvement.
4. We are very excited about the possibilities this presents (e.g., NSSE 2.0).

Discussion
• Questions?

Bob Gonyea, rgonyea@indiana.edu
Angie Miller, anglmill@indiana.edu

Register Now! NSSE Users Workshop, Oct. 7-8, 2010. Registration deadline: Oct. 1, 2010.

Fall 2010 NSSE Users Workshop - Dillard University, New Orleans
Want to learn more about working with your 2010 NSSE results, or your NSSE data from past years, as part of your assessment efforts? Join us in New Orleans for the Fall 2010 NSSE Users Workshop, co-hosted by NSSE and Dillard University. The workshop will take place on Thursday and Friday, October 7-8, 2010, on the Dillard campus. Online registration and workshop details are available now at: nsse.iub.edu/links/fall_workshop