Chapter 6: Validity Concepts

Definition
• Accuracy: the degree to which evidence supports (validates) the interpretation of test performance

Face Validity
• Degree to which a test superficially appears to measure its domain
– e.g., a math test made up of numerical problems looks, on its face, like a measure of math ability

Establishing Construct Validity
• Content validity
• Criterion-related validity
• Comprehensive evaluation of the theoretical framework for a test

Content-related Validity
• Systematic examination of whether test content representatively samples the domain
• Scores should be free from the influence of irrelevant variables

Content-related Validity: Process
• Complete an examination of the literature
• Generate an adequate sampling of the “item universe”
• The domain must be proportionately represented in the test

Content-related Validity: Procedure
• The domain under consideration must be fully described
• Describe the procedures used to judge item appropriateness and representativeness
• Cover the subject matter and the objectives of testing

Content Validity Ratio (CVR)
• Content validity can be quantifiably measured:
  CVR = (ne − N/2) / (N/2)
• ne = number of panelists who agree an item is essential
• N = total number of panelists

CVR Example
• The Gonzalez Anxiety Scale has 50 items
• 20 experts rate each item as not essential, somewhat essential, or essential
• What is the CVR if 9 panelists rate item 1 essential? (worked below)
• Table provided on p. 179
• Should we keep item 1?
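The CVR question above can be answered directly from the formula. Below is a minimal Python sketch of the calculation; the panel counts come from the slide, while the 0.42 cutoff is an assumption based on the commonly cited Lawshe (1975) minimum CVR for 20 panelists (compare it with the table on p. 179).

```python
# Minimal sketch of the CVR calculation for the example above.
# Assumption: 0.42 is Lawshe's (1975) tabled minimum CVR for 20 panelists;
# verify against the table cited on p. 179 of the text.

def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """CVR = (ne - N/2) / (N/2)."""
    half = n_panelists / 2
    return (n_essential - half) / half

cvr_item1 = content_validity_ratio(n_essential=9, n_panelists=20)
print(cvr_item1)           # -0.1: fewer than half of the panel rated the item essential
print(cvr_item1 >= 0.42)   # False -> item 1 would not be retained
```

Because the CVR is negative, item 1 falls below any reasonable cutoff and would be dropped.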
Content Validity: Limitations
• Biases
• Cultural relativism
• Level of expertise of the panelists

Criterion-related Validity
• Index of the relationship between a test and a criterion
• A criterion should be similar to the test, reliable, and valid
– e.g., the SAT predicts college performance (GPA)

Two Kinds of Criterion Validity
• Concurrent
• Predictive
– Distinguished by the temporal (time) relationship between test and criterion

Concurrent Criterion-related Validity
• Test and criterion are measured at roughly the same time
• Used when it is impractical to wait for a later evaluation
– e.g., a diagnostic measure used to generate a diagnostic impression

Predictive Criterion-related Validity
• Test and criterion are compared over a period of time
• Used in decision theory
– e.g., a job-abilities test is used to predict later job performance

CRV: Limitations
• Possible problems from "criterion contamination"
• The coefficient is affected by the range of the sample
• Homogeneous vs. heterogeneous samples

Construct-related Validity
• Extent to which a test measures a theoretical construct
• Construct: a psychological trait

Construct-related Validity: Process
• Theoretical relationships are specified
• Empirical relationships are examined
• Empirical evidence is interpreted

Construct Validity: Techniques
• Convergent validation
• Discriminant validation
• Factor analysis
• Multitrait-multimethod matrix
• Reliability

Convergent Validation
• A test should correlate highly with another test that is theoretically related
– e.g., a math test and a numerical reasoning test

Discriminant Validation
• A test ought not to correlate with a theoretically unrelated test
– e.g., a self-esteem test and a comprehension test

Factor Analysis
• Descriptive statistical technique
• Analyzes the factors/dimensions underlying the test
• Provides evidence of factorial validity

Internal Consistency
• Considers the homogeneity of a test
• Subtests (or items) should correlate with the test total score
• Provides evidence that the test measures a single concept

Predicted Change Over Time
• Examining pre- and post-test scores
• Assessing predicted change after an experimental intervention
– e.g., a depression intervention should improve (change) scores on a depression scale

Predicted Differences Between Distinct Groups
• Analyzing the scores of contrasted groups
– e.g., a depressed sample's scores should differ from a non-depressed sample's

Multitrait-Multimethod Matrix (MTMM)
• Campbell & Fiske (1959)
• Correlations of two or more traits measured by two or more methods
• Methods: self-report vs. spousal ratings vs. peer observations
• Traits: job satisfaction vs. marital satisfaction vs. self-satisfaction

Reliability Coefficients
• Monotrait-monomethod
– Same trait, same method

Validity Coefficients
• Squaring the validity coefficient gives the proportion of criterion variance that can be accounted for by the test (predictor)

Monotrait-Heteromethod
• Same trait, different method

Heterotrait-Monomethod
• Different trait, same method

Heterotrait-Heteromethod
• Different trait, different method

Validity Coefficient Magnitude
• Nature of the group
• Variability in gender, age, education, race
• The validity coefficient tends to decrease when generalized across groups

Sample Range
• Homogeneity vs. heterogeneity of the sample
• The wider the range of scores (the more variability), the higher the correlation
• Comparing extremely different (contrasted) groups can inflate the coefficient

Test Reliability
• A validity coefficient is limited by the reliability of the test and the reliability of the criterion
• An unreliable test is an invalid test
• rxy ≤ √(rxx · ryy)
– rxy = validity coefficient
– rxx = test reliability
– ryy = criterion reliability

Test-Criterion Relationship
• Test and criterion are assumed to be linearly related with equal variances
• Homoscedasticity means equal variances
• Curvilinear relationships or unequal variances violate these assumptions

Test Bias
• Constant or systematic error in a test
• A consideration when examining cross-cultural issues
• Is the test fair to all groups?

Differential Validity
• Evaluate differences between validity coefficients using cross-validation
• Analysis could reveal shrinkage

Predictive Validity Coefficient Error
• The margin of error to be expected in an individual's predicted criterion score
• Is there error in the test's validity?
• Compute the standard error of estimate (SEE)

Standard Error of Estimate (SEE)
• SEE = sy · √(1 − rxy²)
– sy = standard deviation of the criterion scores
– rxy² = squared validity coefficient of the test with the criterion
• Example: Y = 70, sy = 10, rxy = .80
• What is SEE? (worked below)
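A worked version of the SEE example above, as a minimal Python sketch. It assumes rxy = .80 is the validity coefficient itself (so rxy² = .64); the predicted criterion score Y = 70 is not needed to compute the SEE, only to build an interval around the prediction.

```python
import math

# Minimal sketch of the SEE example above.
# Assumption: rxy = .80 is the validity coefficient, not its square.
s_y = 10      # standard deviation of the criterion scores
r_xy = 0.80   # validity coefficient
y_pred = 70   # predicted criterion score (Y) from the slide

see = s_y * math.sqrt(1 - r_xy ** 2)
print(round(see, 2))   # 6.0

# Roughly a 68% band around the prediction, assuming normal errors: 70 ± 6
print(y_pred - see, y_pred + see)   # about 64 to 76
```

So with these numbers, SEE = 10 × √(1 − .64) = 10 × .6 = 6.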
Decision Theory: Cronbach & Gleser (1965)
• Criterion-related (predictive) validity
• Expectancy data used in job-selection testing
• How well does a test predict job performance?

Possible Outcomes
• 1) Valid acceptance: true positive
• 2) Valid rejection: true negative
• 3) False negative: a person who would have succeeded is rejected
• 4) False positive: a person who goes on to fail is accepted

Incremental Validity
• Base rate
• Cut-off score
• Incremental validity is the increase in predictive validity, over the base rate, attributable to the test (see the sketch after the summary below)

Validity Summary
• The examiner is interested in obtaining information about:
– The examinee's knowledge of a particular domain
– The amount of the construct the examinee possesses on a specified domain
– The examinee's likely performance
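To make the decision-theory outcomes and the incremental-validity idea concrete, here is a minimal Python sketch using made-up selection data. The applicant scores, the cut-off of 70, and the operationalization of incremental validity as the improvement in the proportion of successful hires over the base rate are illustrative assumptions, not values from the chapter.

```python
# Minimal sketch of the four decision-theory outcomes and incremental validity.
# All data are made up; "succeeded" means satisfactory job performance, and an
# applicant is selected when the test score is at or above the cut-off.

applicants = [
    # (test_score, succeeded_on_job)
    (82, True), (75, True), (68, False), (90, True), (55, False),
    (71, False), (88, True), (60, True), (45, False), (79, True),
]
cutoff = 70  # hypothetical cut-off score

tp = sum(1 for s, ok in applicants if s >= cutoff and ok)        # valid acceptances
tn = sum(1 for s, ok in applicants if s < cutoff and not ok)     # valid rejections
fn = sum(1 for s, ok in applicants if s < cutoff and ok)         # false negatives
fp = sum(1 for s, ok in applicants if s >= cutoff and not ok)    # false positives

base_rate = sum(ok for _, ok in applicants) / len(applicants)    # success rate with no test
hit_rate = tp / (tp + fp)                                        # success rate among those selected
incremental = hit_rate - base_rate                               # improvement over the base rate

print(tp, tn, fn, fp)                    # 5 3 1 1
print(base_rate, hit_rate, incremental)  # 0.6, ~0.83, ~0.23
```

With this toy data the test selects six applicants, five of whom succeed, so the hit rate (about .83) exceeds the base rate (.60) and the test shows incremental validity of roughly .23.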