Test Validity

“… the development of a valid test requires multiple procedures,
which are employed at different stages of test construction …
The validation process begins with the formulation of detailed trait or
construct definitions … Test items are then prepared to fit the
construct definitions. Empirical item analyses follow … Other
appropriate internal analyses may then be carried out including
factor analyses of item clusters or subtests … The final stage
includes validation and cross-validation of various scores and
interpretive combinations of scores through statistical analyses
against external, real-life criteria.” (Anastasi, 1986, p. 3)
“Almost any information gathered in the process of developing or
using a test is relevant to its validity … If we think of test validity in
terms of understanding what a particular test measures, it
should be apparent that virtually any empirical data obtained with
the test represent a potential source of validity information.”
(Anastasi, 1986, p. 3)
Types of Validity
Content Validity
[the extent to which test items represent a domain]
a) Subject matter expert (SME) opinions (e.g., the CVR statistic; a computational sketch appears below)
b) Internal consistency reliability
c) Correlation with other, similar tests
Related concepts: content relevance, domain specification, content coverage, domain representativeness
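A minimal sketch of the CVR computation (Lawshe's content validity ratio), using a hypothetical SME panel:

```python
# Lawshe's content validity ratio: CVR = (n_e - N/2) / (N/2), where n_e is
# the number of SMEs rating an item "essential" and N is the panel size.
# Values run from -1 to +1; higher values indicate stronger SME agreement
# that the item belongs in the domain. (Panel figures are hypothetical.)

def cvr(n_essential: int, n_experts: int) -> float:
    """Content validity ratio for a single test item."""
    half = n_experts / 2
    return (n_essential - half) / half

print(cvr(8, 10))   # 8 of 10 SMEs say "essential" -> CVR = 0.6
print(cvr(5, 10))   # an even split -> CVR = 0.0
```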
Steps in a Content Validation Effort
1) Perform a job analysis
• Description of job tasks
• Rating of job tasks on various criteria
• Specification of KSAs
• Rating of KSAs on various criteria
• Link/connect tasks to KSAs
From SIOP Principles: “The characterization of the work domain should be based on accurate
and thorough information about the work including analysis of work behaviors and
activities, responsibilities of the job incumbents, and/or the KSAOs prerequisite to effective
performance on the job. The researcher should indicate what important work
behaviors, activities, and worker KSAOs are included in the domain, describe how the
content of the domain is linked to the selection procedure, and explain why
certain parts of the domain were or were not included in the selection procedure.” (p. 22)
2) Selection of SMEs
From SIOP Principles: “The success of the content-based study is closely related to the
qualifications of the subject matter experts (SMEs) … The experts should have
thorough knowledge of the work behaviors and activities, responsibilities of job
incumbents, and the KSAOs prerequisite to effective performance on the job” (p. 22)
3) Writing (or choosing) and evaluating selection test items
TASK -- KSA MATRIX
To what extent is each KSA needed when performing each job task?
5 = Extremely necessary, the job task cannot be performed without the KSA
4 = Very necessary, the KSA is very helpful when performing the job task
3 = Moderately necessary, the KSA is moderately helpful when performing the job task
2 = Slightly necessary, the KSA is slightly helpful when performing the job task
1 = Not necessary, the KSA is not used when performing the job task
[Blank rating grid pairing job tasks with KSAs (one axis numbered 1–13, the other lettered A–R); each cell records the 1–5 necessity rating defined above.]
Sample Item Rating Form
• Connect each item to one or two KSAs
• Rate the difficulty of each item (5-point scale) relative to the level of KSA needed on the job
Content Validity Issues
• Are the job activities and requirements stable across time?
• Does successful performance on the test require the same
KSAs as successful performance on the job?
• Is the type (or mode) of testing procedure the same as
that required on the job?
• Do some KSAs exist on the test that are not required on the job? (e.g., Stutts v.
Freeman (1983): GATB test scores, an applicant with dyslexia, a manual job)
• Limited usefulness when abstract constructs are being measured (content
validation is appropriate only when a small inferential leap is required between
the test content and job requirements)
From Anastasi (1986): “When tests are designed for use within special contexts, the relevant constructs are usually derived from
content analysis of particular behavior domains” (p. 7).
From SIOP Principles: “When selection procedure content is linked to job content, content-oriented strategies are useful. When
selection procedure content is less clearly linked to job content, other sources of validity evidence take precedence” (p. 23).
Section 1607.C(1) of the Uniform Guidelines
“Appropriateness of content validity studies”
“A selection procedure based on inferences about mental processes
cannot be supported solely or primarily on the basis of content validity. Thus, a
content strategy is not appropriate for demonstrating the validity of selection
procedures which purport to measure traits or constructs such as intelligence,
aptitude, personality, common sense, judgment, leadership, and spatial ability.”
SIOP Principles stress a “unitarian” perspective where ANY EVIDENCE of validity
supports inferences of job relatedness regardless of method used
Guardians v. Civil Service (1980) – rank ordering of scores based on content evidence
1. Suitable job analysis
2. Reasonable competence in test construction
3. Test content related to job content
4. Test content representative of job content
5. Scoring systems selecting applicants who are better job performers
Types of Validity (cont.)
Criterion-related Validity

Concurrent: correlation between test scores and performance scores collected at
the same time (e.g., correlating test scores with existing performance scores of
employees). Issues:
• Motivation level
• Guessing, faking
• Job experience factor
• Range restriction on performance scores

Predictive: correlation between test scores and performance scores after some
time interval has passed (e.g., correlating applicants' test scores with
performance scores collected 6 months to a year later). Issues:
• Range restriction on performance scores
• Time, cost, & pragmatic concerns
Criterion-related Validity Issues
A) Job Stability
B) Reliable and relevant measure of job performance
From SIOP Principles: “A relevant, reliable, and uncontaminated criterion(s) must
be obtained or developed. Of these characteristics, the most important is
relevance. A relevant criterion is one that reflects the relative standing of
employees with respect to important work behavior(s) or outcome measure(s).
If such a criterion measure does not exist or cannot be developed, use of a
criterion-related validation strategy is not feasible.” (p. 14)
C) Use of a representative sample of people and jobs
D) Large sample (on predictor and criterion)
From SIOP Principles: “A competent criterion-related validity study should be
based on a sample that is reasonably representative of the work and
candidate pool … A number of factors related to statistical power can influence
the feasibility of a criterion-related study. Among these factors are the degree
(and type) of range restriction in the predictor or the criterion, reliability of the
criterion, and statistical power.” (p. 14)
Legal Issues and Criterion-related Validity
• Courts focus on the content of measures as opposed to criterion-related
validity evidence (the relationship between test scores and job performance)
• Emphasis on the legal history of tests
• Emphasis on predictive versus concurrent validity designs
• Statistically significant relationships are not always acceptable (courts may
consider other factors, such as test utility)
Factors Affecting the Validity Coefficient
[correlation between a test and job performance]
• Reliability of both the criterion (job performance) and the predictor (test)
• Restriction of range (on both the test and job performance measure)
• Contamination of the criterion (e.g., the measure of job performance is
affected by variables other than one's ability or knowledge)
Standard error of estimate (as a function of the validity coefficient):

σy′ = σy √(1 − r²xy)

where σy is the standard deviation of the criterion (y) and r²xy is the squared
correlation between the predictor (x) and the criterion (y).
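A minimal sketch of this computation (the inputs below are hypothetical):

```python
# Standard error of estimate: sigma_y' = sigma_y * sqrt(1 - r_xy**2).
import math

def standard_error_of_estimate(sd_criterion: float, r_xy: float) -> float:
    """Average size of prediction errors, in criterion-score units."""
    return sd_criterion * math.sqrt(1 - r_xy ** 2)

# With a criterion SD of 10 and a validity of .50, predictions of job
# performance still carry a standard error of about 8.66 units.
print(round(standard_error_of_estimate(10.0, 0.50), 2))  # 8.66
```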
Correction for Attenuation

rxT = rxy₀ / √ryy

where rxy₀ is the observed validity coefficient, ryy is the criterion reliability,
and rxT is the validity coefficient corrected for unreliability in the criterion.
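A minimal sketch (figures hypothetical):

```python
# Correction for attenuation: r_xT = r_xy / sqrt(r_yy), estimating what the
# validity coefficient would be if the criterion were measured without error.
import math

def correct_for_attenuation(r_xy_observed: float, r_yy: float) -> float:
    return r_xy_observed / math.sqrt(r_yy)

# An observed validity of .35 against a criterion with reliability .64
# implies a corrected validity of about .44.
print(round(correct_for_attenuation(0.35, 0.64), 2))  # 0.44
```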
Correction for Range Restriction

When range restriction has occurred on the predictor:

R = r(S₁/s₁) / √(1 − r² + r²(S₁²/s₁²))   (Predictor)

where r is the validity coefficient in the restricted sample, S₁ is the predictor
standard deviation of the unrestricted sample, and s₁ is the predictor standard
deviation of the restricted sample. When the available standard deviations are
for the criterion instead, the analogous correction is:

R² = 1 − (s²y/S²y)(1 − r²)   (Criterion)
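A minimal sketch of both corrections (the standard deviations are hypothetical):

```python
# Range-restriction corrections. The predictor version is Thorndike's
# Case II formula; the criterion version uses the criterion SDs instead.
import math

def correct_predictor_restriction(r: float, sd_unrestricted: float,
                                  sd_restricted: float) -> float:
    k = sd_unrestricted / sd_restricted  # S1 / s1
    return (r * k) / math.sqrt(1 - r**2 + (r**2) * (k**2))

def correct_criterion_restriction(r: float, sd_y_unrestricted: float,
                                  sd_y_restricted: float) -> float:
    ratio_sq = (sd_y_restricted / sd_y_unrestricted) ** 2  # s_y^2 / S_y^2
    return math.sqrt(1 - ratio_sq * (1 - r**2))

# A restricted validity of .30, with the applicant pool's predictor SD twice
# that of the hired (restricted) group, corrects to about .53.
print(round(correct_predictor_restriction(0.30, 2.0, 1.0), 2))  # 0.53
```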
Test Utility Key Points
Selection ratio: SR = n/N, where n = the number of job openings and N = the
number of applicants.

Test Validity [Criterion-related]: the extent to which test scores correlate
with job performance scores [range is from 0 to 1.0].
Proportion of “Successes” Expected Through the Use of Test of Given Validity
and Given Selection Ratio, for Base Rate .60.
(From Taylor & Russell, 1939, p. 576)
            Selection Ratio (SR)
Validity   .05   .10   .20   .30   .40   .50   .60   .70   .80   .90   .95
 .00       .60   .60   .60   .60   .60   .60   .60   .60   .60   .60   .60
 .05       .64   .63   .63   .62   .62   .62   .61   .61   .61   .60   .60
 .10       .68   .67   .65   .64   .64   .63   .63   .62   .61   .61   .60
 .15       .71   .70   .68   .67   .66   .65   .64   .63   .62   .61   .60
 .20       .75   .73   .71   .69   .67   .66   .65   .64   .63   .62   .61
 .25       .78   .76   .73   .71   .69   .68   .66   .65   .63   .62   .61
 .30       .82   .79   .76   .73   .71   .69   .68   .66   .64   .62   .61
 .35       .85   .82   .78   .75   .73   .71   .69   .67   .65   .63   .62
 .40       .88   .85   .81   .78   .75   .73   .70   .68   .66   .63   .62
 .45       .90   .87   .83   .80   .77   .74   .72   .69   .66   .64   .62
 .50       .93   .90   .86   .82   .79   .76   .73   .70   .67   .64   .62
 .55       .95   .92   .88   .84   .81   .78   .75   .71   .68   .64   .62
 .60       .96   .94   .90   .87   .83   .80   .76   .73   .69   .65   .63
 .65       .98   .96   .92   .89   .85   .82   .78   .74   .70   .65   .63
 .70       .99   .97   .94   .91   .87   .84   .80   .75   .71   .66   .63
 .75       .99   .99   .96   .93   .90   .86   .81   .77   .71   .66   .63
 .80      1.00   .99   .98   .95   .92   .88   .83   .78   .72   .66   .63
 .85      1.00  1.00   .99   .97   .95   .91   .86   .80   .73   .66   .63
 .90      1.00  1.00  1.00   .99   .97   .94   .88   .82   .74   .67   .63
 .95      1.00  1.00  1.00  1.00   .99   .97   .92   .84   .75   .67   .63
1.00      1.00  1.00  1.00  1.00  1.00  1.00  1.00   .86   .75   .67   .63
Note: A full set of tables can be found in Taylor and Russell (1939) and in McCormick and Ilgen (1980, Appendix B).
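The table's entries can be approximated directly from a bivariate normal model of test scores and job performance. A minimal sketch, assuming scipy is available:

```python
# Expected success rate among those selected:
# P(Y > y_cut and X > x_cut) / P(X > x_cut), where the cutoffs are set by
# the base rate (BR) and the selection ratio (SR) and X, Y are standard
# bivariate normal with correlation equal to the test's validity.
from scipy.stats import norm, multivariate_normal

def taylor_russell(validity: float, base_rate: float, sr: float) -> float:
    x_cut = norm.ppf(1 - sr)         # predictor cutoff (z-score)
    y_cut = norm.ppf(1 - base_rate)  # criterion ("success") cutoff (z-score)
    joint = multivariate_normal(mean=[0.0, 0.0],
                                cov=[[1.0, validity], [validity, 1.0]])
    # P(X > x_cut, Y > y_cut) by inclusion-exclusion on the joint CDF
    p_both = 1 - norm.cdf(x_cut) - norm.cdf(y_cut) + joint.cdf([x_cut, y_cut])
    return p_both / sr

# Validity .50, base rate .60, SR .20 -> roughly .86, as in the table above.
print(round(taylor_russell(0.50, 0.60, 0.20), 2))
```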
Selection Ratio Example
Mean Standard Criterion Score of Accepted Cases in Relation to Test Validity and Selection Ratio
(From Brown & Ghiselli, 1953, p. 342)
Selection          Validity Coefficient
Ratio   .00  .05  .10  .15  .20  .25  .30  .35  .40  .45  .50  .55  .60  .65  .70  .75  .80  .85  .90  .95 1.00
 .05    .00  .10  .21  .31  .42  .52  .62  .73  .83  .94 1.04 1.14 1.25 1.35 1.46 1.56 1.66 1.77 1.87 1.98 2.08
 .10    .00  .09  .18  .26  .35  .44  .53  .62  .70  .79  .88  .97 1.05 1.14 1.23 1.32 1.41 1.49 1.58 1.67 1.76
 .15    .00  .08  .15  .23  .31  .39  .46  .54  .62  .70  .77  .85  .93 1.01 1.08 1.16 1.24 1.32 1.39 1.47 1.55
 .20    .00  .07  .14  .21  .28  .35  .42  .49  .56  .63  .70  .77  .84  .91  .98 1.05 1.12 1.19 1.26 1.33 1.40
 .25    .00  .06  .13  .19  .25  .32  .38  .44  .51  .57  .63  .70  .76  .82  .89  .95 1.01 1.08 1.14 1.20 1.27
 .30    .00  .06  .12  .17  .23  .29  .35  .40  .46  .52  .58  .64  .69  .75  .81  .87  .92  .98 1.04 1.10 1.16
 .35    .00  .05  .11  .16  .21  .26  .32  .37  .42  .48  .53  .58  .63  .69  .74  .79  .84  .90  .95 1.00 1.06
 .40    .00  .05  .10  .15  .19  .24  .29  .34  .39  .44  .48  .53  .58  .63  .68  .73  .77  .82  .87  .92  .97
 .45    .00  .04  .09  .13  .18  .22  .26  .31  .35  .40  .44  .48  .53  .57  .62  .66  .70  .75  .79  .84  .88
 .50    .00  .04  .08  .12  .16  .20  .24  .28  .32  .36  .40  .44  .48  .52  .56  .60  .64  .68  .72  .76  .80
 .55    .00  .04  .07  .11  .14  .18  .22  .25  .29  .32  .36  .40  .43  .47  .50  .54  .58  .61  .65  .68  .72
 .60    .00  .03  .06  .10  .13  .16  .19  .23  .26  .29  .32  .35  .39  .42  .45  .48  .52  .55  .58  .61  .64
 .65    .00  .03  .06  .09  .11  .14  .17  .20  .23  .26  .28  .31  .34  .37  .40  .43  .46  .48  .51  .54  .57
 .70    .00  .02  .05  .07  .10  .12  .15  .17  .20  .22  .25  .27  .30  .32  .35  .37  .40  .42  .45  .47  .50
 .75    .00  .02  .04  .06  .08  .11  .13  .15  .17  .19  .21  .23  .25  .27  .30  .32  .33  .36  .38  .40  .42
 .80    .00  .02  .04  .05  .07  .09  .11  .12  .14  .16  .18  .19  .21  .22  .25  .26  .28  .30  .32  .33  .35
 .85    .00  .01  .03  .04  .05  .07  .08  .10  .11  .12  .14  .15  .16  .18  .19  .20  .22  .23  .25  .26  .27
 .90    .00  .01  .02  .03  .04  .05  .06  .07  .08  .09  .10  .11  .12  .13  .14  .15  .16  .17  .18  .19  .20
 .95    .00  .01  .01  .02  .02  .03  .03  .04  .04  .05  .05  .06  .07  .07  .07  .08  .09  .09  .10  .10  .11
Example of Brogden and Cronbach & Gleser Models

ΔU = Ns · rxy · SDy · Z̄x − NT · C

where:
• Ns = number of applicants selected
• rxy = validity coefficient
• SDy = standard deviation of job performance in dollar terms
• Z̄x = average score on the selection procedure of those selected (standard score)
• NT = number of applicants assessed
• C = cost of assessing each applicant
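A minimal sketch of the model, assuming top-down selection from a normally distributed applicant pool; the dollar figures are hypothetical. Note that Z̄x here is the same quantity tabled by Brown & Ghiselli, so rxy × Z̄x reproduces their entries:

```python
# Brogden / Cronbach & Gleser utility: U = Ns * r_xy * SDy * Zx - NT * C.
from scipy.stats import norm

def mean_z_of_selected(sr: float) -> float:
    """Zx: mean standard predictor score of those selected (top-down)."""
    cutoff = norm.ppf(1 - sr)
    return norm.pdf(cutoff) / sr

def utility(n_selected: int, r_xy: float, sd_y_dollars: float,
            n_assessed: int, cost_per_applicant: float) -> float:
    z_x = mean_z_of_selected(n_selected / n_assessed)
    return (n_selected * r_xy * sd_y_dollars * z_x
            - n_assessed * cost_per_applicant)

# 10 hires from 100 applicants (SR = .10), validity .50, SDy = $20,000,
# $50 per assessment. Here r_xy * Zx = .50 * 1.75 ~ .88, matching the
# Brown & Ghiselli entry for validity .50 and SR .10.
print(round(utility(10, 0.50, 20_000, 100, 50.0)))  # total dollar gain
```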
Types of Validity (cont.)
Construct Validity
[extent to which a test assesses the construct it intends to
measure]
• Correlation between scores on a construct (e.g., anxiety) measured with one
method (e.g., paper & pencil) and scores on the same construct measured with a
different method (e.g., interview) [Convergent validation]
• Correlation between scores on a construct (e.g., anxiety) measured with one
method (e.g., paper & pencil) and scores on a different construct (e.g.,
leadership) assessed with a different method (e.g., interview) [Discriminant
validation]
“Construct validation is indeed a never-ending process. However, that should
not preclude using the test operationally to help solve practical problems and
reach real-life decisions as soon as the available validity information has
reached an acceptable level for a particular application. This level varies with
the type of test and the way it will be used. Establishing this level requires
informed professional judgment within the appropriate specialty of professional
practice.” (Anastasi, 1986, p. 4)
[Multitrait-multimethod (MTMM) matrix example: three traits — A (Boredom), B (Depression), C (Anxiety) — each measured by three methods: Method 1 (Paper & Pencil), Method 2 (Clinical Interview), and Method 3 (Peer Observation). The diagonal holds the reliability figures for each trait-method combination; the mono-trait, hetero-method correlations (validity diagonals) provide evidence of convergent validity, while the hetero-trait, mono-method and hetero-trait, hetero-method correlations should be lower, providing evidence of discriminant validity.]
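A minimal sketch of building and reading an MTMM matrix from data; the trait and method scores below are simulated and entirely hypothetical:

```python
# Each column is one trait measured by one method. In the resulting
# correlation matrix, same-trait/different-method cells (convergent
# validities) should be high, and different-trait cells should be lower.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
trait_scores = rng.normal(size=(n, 3))    # latent traits A, B, C
method_effects = rng.normal(size=(n, 3))  # paper, interview, peer

columns = {}
for t, trait in enumerate(["boredom", "depression", "anxiety"]):
    for m, method in enumerate(["paper", "interview", "peer"]):
        columns[f"{trait}_{method}"] = (0.7 * trait_scores[:, t]
                                        + 0.4 * method_effects[:, m]
                                        + 0.5 * rng.normal(size=n))
mtmm = pd.DataFrame(columns).corr()

# Convergent validity: same trait, different methods (mono-trait, hetero-method)
print(mtmm.loc["anxiety_paper", "anxiety_interview"])
# Discriminant check: different traits, same method (hetero-trait, mono-method)
print(mtmm.loc["anxiety_paper", "boredom_paper"])
```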
SIOP Position on Uniform Guidelines
We suggest the Guidelines as a high priority for revision because we believe
the regulatory standards should consider contemporary scientific research and
practice. Professional associations like SIOP, APA, AERA, and NCME have
documented these advances in scholarly literature and in technical authorities like
the Principles and Standards. Unfortunately, there are inconsistencies
between the Guidelines and some scholarly literature related to
validation research and the use of employee selection procedures,
and between the Guidelines and other technical authorities. These inconsistencies
create substantial ambiguity for employers that use employee selection procedures,
as well as for federal agencies and the courts when determining whether a
selection procedure is job-related. Consideration of contemporary research and
scientifically supported recommendations will help clarify the standards for valid
selection procedures.
Outtz Summary of Uniform Guidelines (UG)
• PR nightmare (e.g., the stated UG goal is to prohibit discrimination in employment)
• Much of the UG content is not related to scientific practice/research
Why redo the entire guidelines?
• What would replace the UG? Who would be involved in doing this (e.g., which
stakeholders)? What about the competing interests of various groups?
• Case law based on the UG would still exist (SIOP ought to focus on influencing
court decisions through position papers, lobbying efforts, and litigation)
• The “search for alternatives” requirement is part of the UG and has led to many
advancements in selection research