Chapter 6: Validity

Chapter 6: Validity Concepts
Accuracy: validate the interpretation of test performance
Face Validity
Degree to which a test superficially appears to measure the intended domain
Math test
Establishing Construct Validity
Content Validity
Criterion-related Validity
Comprehensive evaluation of the theoretical framework for a test
Content-related Validity
Systematic examination
Free from irrelevant variable influence
Content-related validity: Process
Complete an examination of the literature
Generate an adequate sampling of the “item universe”
Domain must be proportionately represented in test
Content-related Validity: Procedure
Domain in consideration must be fully described
Description of procedures for item appropriateness & representativeness
Cover subject matter and objectives of testing
Content Validity Ratio (CVR)
Content validity can be quantified
CVR = (ne - N/2) / (N/2)
ne = number of panelists who rate an item as essential
N = total number of panelists
CVR Example
Gonzalez Anxiety Scale has 50 items
20 experts rate each item
not essential, somewhat essential, and essential
What is the CVR if 9 panelists rate item 1 essential?
Table provided on p. 179
Should we keep item 1?
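The CVR example above can be worked through with a short sketch. It assumes Lawshe's formula, CVR = (ne - N/2) / (N/2); the function name content_validity_ratio is illustrative and does not come from the chapter.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Content validity ratio, assuming Lawshe's formula (ne - N/2) / (N/2)."""
    half = n_panelists / 2
    return (n_essential - half) / half


# Item 1 of the example: 9 of 20 panelists rate the item "essential".
cvr_item_1 = content_validity_ratio(9, 20)
print(round(cvr_item_1, 2))
```

A negative CVR means fewer than half of the panelists rated the item essential. Whether item 1 is kept depends on the minimum value listed in the table on p. 179.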
Content Validity: Limitations
Cultural relativism
Level of expertise of the panelists
Criterion-related validity
Index of relationship between test and criterion
A criterion should be similar to the test, reliable, and valid
SAT predicts college performance (GPA)
Two kinds of Criterion Validity
– Based on temporal (time) estimates
Concurrent Criterion-related Validity
Test and criterion are measured at roughly the same time
Impractical to wait for a secondary evaluation
– e.g., a diagnostic measure to generate diagnostic impression
Predictive Criterion-related Validity
Test and criterion are compared over a period of time
Used in Decision Theory
– e.g., A job abilities test is used to predict job performance
CRV: Limitations
Possible problems from "criterion contamination"
Coefficient affected by range of the sample
Homogeneous vs. heterogeneous sample
Construct-related validity
Extent to which a test measures a theoretical construct
Construct: psychological trait
Construct-related validity: Process
Theoretical relationships specified
Empirical relationships examined
Empirical evidence interpreted
Construct Validity: Techniques
Convergent validation
Discriminant validation
Factor Analysis
Multitrait-Multimethod Matrix
Convergent Validation
A test should correlate highly with another test that is theoretically related
– e.g., a math test and numerical reasoning test
Discriminant Validation
A test ought not to correlate with a theoretically unrelated test
– e.g., a self-esteem test and a comprehension test
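As a rough illustration of convergent and discriminant validation, the sketch below compares two Pearson correlations. All scores are made-up values for six hypothetical examinees, and pearson_r is a helper written for this example only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores (invented for illustration, not real data).
math_test = [10, 12, 14, 16, 18, 20]
numerical_reasoning = [11, 13, 13, 17, 18, 21]
comprehension = [10, 12, 14, 16, 18, 20]
self_esteem = [4, 6, 3, 4, 6, 4]

# Convergent evidence: theoretically related tests should correlate highly.
r_convergent = pearson_r(math_test, numerical_reasoning)
# Discriminant evidence: theoretically unrelated tests should correlate near zero.
r_discriminant = pearson_r(self_esteem, comprehension)

print(round(r_convergent, 2), round(r_discriminant, 2))
```

With these invented scores the related pair correlates strongly and the unrelated pair hardly at all, which is the pattern the two techniques look for.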
Factor Analysis
Descriptive statistical technique
Analyzing the factors/dimensions of the test
Factorial validity
Internal Consistency
Consider homogeneity of a test
Subtests (or items) correlate with test total score
Provides evidence that the test measures a single concept
Predicted Change Over Time
Examining pretest and posttest scores
Assessing predicted change after an experimental intervention
– e.g., a depression intervention should improve (change) scores on a depression scale
Predicted Differences Between Distinct Groups
Analyzing scores of contrasted groups
Depressed sample scores should differ from the non-depressed sample
Multitrait Multimethod Matrix (MTMM Matrix)
Campbell & Fiske (1959)
Correlation of 2 or more traits by 2 or more methods
Methods: self-report vs. spousal ratings vs. peer observations
Traits: job satisfaction vs. marital satisfaction vs. self-satisfaction
Reliability Coefficients
Monotrait monomethod
– Same trait, same method
Validity Coefficients
Squaring the validity coefficient gives the proportion of criterion variance accounted for by the test (predictor)
Monotrait Heteromethod
• Same trait, different method
Heterotrait monomethod
• Different trait, same method
Heterotrait heteromethod
• Different trait, different method
Validity Coefficient Magnitude
Nature of the group
Variability in gender, age, education, race
Validity coefficient tends to decrease across groups
Sample range
Homogeneity vs. heterogeneity of the sample
The wider the range of scores (variability), the higher the correlation
Comparison of extremely different (contrasted) groups
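The effect of sample range on the validity coefficient can be sketched with a small demonstration. The scores are invented for illustration, and pearson_r is a helper written for this example only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical test and criterion scores for ten examinees.
test_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
criterion_scores = [3, 0, 5, 2, 7, 4, 9, 6, 11, 8]

# Heterogeneous sample: the full range of test scores.
r_full = pearson_r(test_scores, criterion_scores)

# Homogeneous sample: only examinees scoring 6 or higher on the test.
restricted = [(t, c) for t, c in zip(test_scores, criterion_scores) if t >= 6]
r_restricted = pearson_r([t for t, _ in restricted], [c for _, c in restricted])

print(round(r_full, 2), round(r_restricted, 2))
```

In this made-up sample, restricting the range of test scores lowers the correlation even though the underlying relationship is unchanged.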
Test Reliability
A validity coefficient is limited by the reliability of the test and the reliability of the criterion
An unreliable test is an invalid test
rxy ≤ √(rxx × ryy)
rxy = validity coefficient
rxx = test reliability
ryy = criterion reliability
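A minimal sketch of this ceiling, assuming the standard result that rxy cannot exceed the square root of rxx times ryy. The reliability values and the function name max_validity are hypothetical.

```python
import math


def max_validity(rxx, ryy):
    """Upper bound on the validity coefficient given test and criterion reliability."""
    return math.sqrt(rxx * ryy)


# Hypothetical reliabilities: test = .81, criterion = .64.
ceiling = max_validity(0.81, 0.64)
print(round(ceiling, 2))
```

With these illustrative values, the validity coefficient could not exceed about .72, however well the test reflects the criterion.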
Test-criterion Relationship
Relationship assumed to be linear, with equal variances
Homoscedasticity means equal variances
Curvilinear or unequal variances
Test Bias
Constant or systematic error in a test
A consideration when looking at cross-cultural issues
Is the test fair to all groups?
Differential Validity
Evaluate differences between the validity coefficients using cross-validation
Analysis could reveal shrinkage
Predictive Validity Coefficient Error
Margin of error to be expected in an individual's predicted criterion score
Is there error in test validity?
Compute the Standard Error of Estimate (SEE)
Standard Error of Estimate (SEE)
SEE = sy × √(1 - rxy²)
sy = standard deviation of criterion scores
rxy² = square of the validity coefficient
Y = 70, sy = 10, rxy = .80
What is SEE?
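The example above can be checked with a short sketch, assuming the usual formula SEE = sy × √(1 - rxy²). The function name standard_error_of_estimate is illustrative.

```python
import math


def standard_error_of_estimate(sd_criterion, validity):
    """SEE = sy * sqrt(1 - rxy^2)."""
    return sd_criterion * math.sqrt(1 - validity ** 2)


predicted_y = 70  # predicted criterion score from the example
see = standard_error_of_estimate(10, 0.80)

# Assuming normally distributed prediction errors, about 68% of actual
# criterion scores fall within one SEE of the predicted score.
lower, upper = predicted_y - see, predicted_y + see
print(round(see, 2), round(lower, 2), round(upper, 2))
```

Here SEE works out to about 6, so a predicted score of 70 carries a band of roughly 64 to 76 under the normality assumption stated in the comment.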
Decision Theory: Cronbach & Gleser (1965)
Criterion-related predictive validity
Expectancy data used in job selection testing
How well does a test predict job performance?
Possible Outcomes
1) Valid acceptance: True positive
2) Valid rejection: True negative
3) Invalid rejection: False negative
4) Invalid acceptance: False positive
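The four outcomes can be sketched as a simple classification of each applicant by test score and later job performance. The cutoff scores, applicant data, and the function name classify_outcome are all hypothetical.

```python
from collections import Counter


def classify_outcome(test_score, criterion_score, test_cutoff, criterion_cutoff):
    """Label one selection decision as one of the four possible outcomes."""
    selected = test_score >= test_cutoff
    successful = criterion_score >= criterion_cutoff
    if selected and successful:
        return "true positive"   # valid acceptance
    if not selected and not successful:
        return "true negative"   # valid rejection
    if selected:
        return "false positive"  # accepted, but did not succeed
    return "false negative"      # rejected, but would have succeeded


# Hypothetical (test score, job performance rating) pairs.
applicants = [(62, 80), (45, 40), (70, 45), (40, 75), (35, 30), (66, 90)]
outcomes = Counter(
    classify_outcome(test, job, test_cutoff=50, criterion_cutoff=60)
    for test, job in applicants
)
print(dict(outcomes))
```

Raising or lowering the test cutoff in this sketch shifts applicants between the false positive and false negative cells, which is the trade-off decision theory is concerned with.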
Incremental Validity
• Base rate
• Cut-off score
• Incremental validity is the increase in predictive validity, over the base rate, because of a
Validity Summary
• Examiner is interested in obtaining information about:
– Examinee's knowledge of a particular domain
– Amount of a construct possessed by the examinee in a specified domain
– Examinee's likely performance