Fundamentals of IRT

EPSY 546: LECTURE 3
GENERALIZABILITY THEORY
AND
VALIDITY
George Karabatsos
GENERALIZABILITY THEORY
TRUE SCORE MODEL
• Recall the true score model:
$X_{+n} = T_n + e_n$
$X_{+n}$: observed test score of person n
$T_n$: true test score (unknown)
$e_n$: random error (unknown)
• One may argue that the true score model defines
error narrowly.
It corresponds to a single-factor (one-way) ANOVA:
between-person (true score) variance + within-person (random error) variance.
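The one-way ANOVA view can be sketched numerically. This is a minimal illustration, assuming each person is measured several times under interchangeable conditions; the function name and the data are hypothetical, not part of the lecture.

```python
import numpy as np

def one_way_variance_components(scores):
    """scores: 2-D array, rows = persons, columns = replicate measurements."""
    scores = np.asarray(scores, dtype=float)
    n_persons, n_reps = scores.shape
    person_means = scores.mean(axis=1)
    grand_mean = scores.mean()

    # Mean squares from a one-way ANOVA with persons as the only factor.
    ms_between = n_reps * np.sum((person_means - grand_mean) ** 2) / (n_persons - 1)
    ms_within = np.sum((scores - person_means[:, None]) ** 2) / (n_persons * (n_reps - 1))

    # Expected mean squares: E[MS_between] = var_error + n_reps * var_true.
    var_error = ms_within
    var_true = (ms_between - ms_within) / n_reps
    return var_true, var_error

# Hypothetical data: 4 persons, 3 replicate measurements each.
scores = [[10, 12, 11],
          [14, 15, 16],
          [8, 9, 10],
          [12, 12, 12]]
var_true, var_error = one_way_variance_components(scores)
print(var_true, var_error)  # 6.0 0.75
```

All variation not attributable to persons ends up in a single error term, which is the narrowness the slide refers to.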
GENERALIZABILITY THEORY
• Generalizability Theory extends the true score
model by acknowledging that multiple factors
affect the measurement variance.
– Multivariable ANOVA:
The observed test response is a function of 2 or more
variables, their interactions, and random measurement
error.
G-THEORY MODEL (example)
$X_{njt} = \mu$  (grand mean)
$+ (\mu_n - \mu)$  (person n's effect)
$+ (\mu_j - \mu)$  (item j's effect)
$+ (\mu_t - \mu)$  (time t's effect)
$+ (\mu_{nt} - \mu_n - \mu_t + \mu)$  (person × time effect)
$+ (\mu_{nj} - \mu_n - \mu_j + \mu)$  (person × item effect)
$+ (\mu_{tj} - \mu_t - \mu_j + \mu)$  (time × item effect)
$+ \text{residual}$  (three-way interaction and error)
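The decomposition in the model above can be checked numerically. The sketch below assumes a fully crossed persons × items × times array with one score per cell; the function name `decompose` and the simulated data are illustrative assumptions.

```python
import numpy as np

def decompose(x):
    """Split a persons x items x times array into the effects of the model."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    # Marginal means, kept broadcastable against x.
    mu_n = x.mean(axis=(1, 2), keepdims=True)   # person means
    mu_j = x.mean(axis=(0, 2), keepdims=True)   # item means
    mu_t = x.mean(axis=(0, 1), keepdims=True)   # time means
    mu_nj = x.mean(axis=2, keepdims=True)       # person-by-item means
    mu_nt = x.mean(axis=1, keepdims=True)       # person-by-time means
    mu_tj = x.mean(axis=0, keepdims=True)       # item-by-time means

    effects = {
        "grand_mean": mu,
        "person": mu_n - mu,
        "item": mu_j - mu,
        "time": mu_t - mu,
        "person_x_time": mu_nt - mu_n - mu_t + mu,
        "person_x_item": mu_nj - mu_n - mu_j + mu,
        "time_x_item": mu_tj - mu_t - mu_j + mu,
    }
    # Residual: what is left once every term above is subtracted
    # (three-way interaction confounded with error).
    effects["residual"] = x - sum(effects.values())
    return effects

# Hypothetical data: 5 persons, 4 items, 3 occasions.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4, 3))
effects = decompose(x)
reconstructed = sum(effects.values())
```

Adding the terms back together reproduces every observed score, and each main effect and two-way interaction sums to zero over its own indices.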
G-THEORY
VARIANCE PARTITION
Systematic:
Persons  $\sigma^2_P$
Measurement error (facet contributions):
Items  $\sigma^2_I$
Time  $\sigma^2_T$
Person × Time  $\sigma^2_{PT}$
Person × Item  $\sigma^2_{PI}$
Time × Item  $\sigma^2_{TI}$
3-way interaction + error  $\sigma^2_{PIT,error}$
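One common way to estimate these components is to solve the ANOVA expected-mean-square equations. The sketch below assumes a fully crossed design with one observation per cell and all facets treated as random; the function name, dictionary keys, and simulated data are illustrative assumptions rather than the lecture's procedure.

```python
import numpy as np

def variance_components(x):
    """ANOVA estimates for a crossed persons x items x times design."""
    x = np.asarray(x, dtype=float)
    n_p, n_i, n_t = x.shape
    m = x.mean()
    m_p = x.mean(axis=(1, 2), keepdims=True)
    m_i = x.mean(axis=(0, 2), keepdims=True)
    m_t = x.mean(axis=(0, 1), keepdims=True)
    m_pi = x.mean(axis=2, keepdims=True)
    m_pt = x.mean(axis=1, keepdims=True)
    m_it = x.mean(axis=0, keepdims=True)

    # Mean squares = sum of squares / degrees of freedom.
    ms_p = n_i * n_t * np.sum((m_p - m) ** 2) / (n_p - 1)
    ms_i = n_p * n_t * np.sum((m_i - m) ** 2) / (n_i - 1)
    ms_t = n_p * n_i * np.sum((m_t - m) ** 2) / (n_t - 1)
    ms_pi = n_t * np.sum((m_pi - m_p - m_i + m) ** 2) / ((n_p - 1) * (n_i - 1))
    ms_pt = n_i * np.sum((m_pt - m_p - m_t + m) ** 2) / ((n_p - 1) * (n_t - 1))
    ms_it = n_p * np.sum((m_it - m_i - m_t + m) ** 2) / ((n_i - 1) * (n_t - 1))
    resid = x - m_pi - m_pt - m_it + m_p + m_i + m_t - m
    ms_pit = np.sum(resid ** 2) / ((n_p - 1) * (n_i - 1) * (n_t - 1))

    # Solve the expected-mean-square equations for each component.
    return {
        "P": (ms_p - ms_pi - ms_pt + ms_pit) / (n_i * n_t),
        "I": (ms_i - ms_pi - ms_it + ms_pit) / (n_p * n_t),
        "T": (ms_t - ms_pt - ms_it + ms_pit) / (n_p * n_i),
        "PI": (ms_pi - ms_pit) / n_t,
        "PT": (ms_pt - ms_pit) / n_i,
        "TI": (ms_it - ms_pit) / n_p,
        "PIT_error": ms_pit,
    }

# Hypothetical simulated scores: 20 persons, 6 items, 3 occasions.
rng = np.random.default_rng(1)
person = rng.normal(0.0, 1.0, size=(20, 1, 1))
item = rng.normal(0.0, 0.5, size=(1, 6, 1))
noise = rng.normal(0.0, 0.7, size=(20, 6, 3))
estimates = variance_components(50.0 + person + item + noise)
```

Estimates obtained this way can come out slightly negative in small samples; a common convention is to report such values as zero.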
G-THEORY OF DECISIONS
• Relative decisions: Decisions based on the rank
ordering of persons (e.g., college admission,
pass-fail testing).
• Variance contributing to measurement error for
relative decisions:
$\sigma^2_{Relat} = \sigma^2_{PI} + \sigma^2_{PT} + \sigma^2_{PIT,error}$
(all variance components that involve an
interaction with persons)
G-THEORY OF DECISIONS
• Absolute decisions: Decisions based on the level
of the observed score, without regard to the
performance of others (e.g., driver’s license).
• Variance contributing to measurement error for
absolute decisions:
$\sigma^2_{Abs} = \sigma^2_T + \sigma^2_I + \sigma^2_{PI} + \sigma^2_{PT} + \sigma^2_{TI} + \sigma^2_{PIT,error}$
(all variance components associated with the
facets, which introduce “constant” effects to
absolute decisions)
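The two error definitions differ only in which components they add up. A small sketch with made-up variance components (the values and dictionary keys are hypothetical, chosen only for illustration):

```python
# Hypothetical variance components (illustrative values, not from any study).
components = {"P": 0.50, "I": 0.10, "T": 0.05,
              "PI": 0.15, "PT": 0.10, "TI": 0.02, "PIT_error": 0.30}

def relative_error_variance(c):
    # Only the components that involve an interaction with persons.
    return c["PI"] + c["PT"] + c["PIT_error"]

def absolute_error_variance(c):
    # Every component except the person component itself.
    return sum(value for key, value in c.items() if key != "P")

rel_error = relative_error_variance(components)
abs_error = absolute_error_variance(components)
print(round(rel_error, 2), round(abs_error, 2))  # 0.55 0.72
```

Absolute error can never be smaller than relative error, because it adds the facet main effects and the time × item interaction on top of the same person-interaction terms.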
GENERALIZABILITY
COEFFICIENT

$E\rho^2 = \dfrac{\sigma^2_P}{\sigma^2_P + \sigma^2_{Decision}}$,
with $\sigma^2_{Decision} = \sigma^2_{Relat}$ or $\sigma^2_{Abs}$
• Indicates how accurately the observed test scores
allow us to generalize about persons’ behavior in
a defined universe of situations (Cronbach,
1972).
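As a worked illustration of the coefficient, the sketch below plugs in hypothetical values: a person variance of 0.50, a relative error variance of 0.55, and an absolute error variance of 0.72. The function name and the numbers are assumptions made for the example.

```python
def generalizability_coefficient(var_person, var_decision_error):
    """Person variance divided by person variance plus decision error variance."""
    return var_person / (var_person + var_decision_error)

var_person = 0.50  # hypothetical person variance
coef_relative = generalizability_coefficient(var_person, 0.55)
coef_absolute = generalizability_coefficient(var_person, 0.72)
print(round(coef_relative, 3), round(coef_absolute, 3))  # 0.476 0.41
```

With the same person variance, the coefficient for absolute decisions is lower because more components count as error.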
STUDIES
• G-Study (Generalizability Study):
Aims to estimate the variance components
underlying a measurement process by defining
the universe of admissible observations as
broadly as possible.
STUDIES
• D-Study (Design Study):
Uses G-study results to address “what if”
questions about variations in the measurement design
(Thompson & Melancon, 1987).
This helps pinpoint sources of error and specify
protocol modifications that achieve the desired level
of generalizability.
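A typical "what if" question is how the coefficient changes when scores are averaged over more items or occasions. The sketch below uses the standard textbook rule, which is not stated on the slides, that each error component is divided by the number of conditions it is averaged over. The variance components are hypothetical.

```python
# Hypothetical G-study variance components (illustrative values only).
components = {"P": 0.50, "PI": 0.15, "PT": 0.10, "PIT_error": 0.30}

def d_study_coefficient(c, n_items, n_times):
    """Coefficient for relative decisions when scores are averaged
    over n_items items and n_times occasions (assumed design)."""
    rel_error = (c["PI"] / n_items
                 + c["PT"] / n_times
                 + c["PIT_error"] / (n_items * n_times))
    return c["P"] / (c["P"] + rel_error)

# Tabulate the coefficient for several candidate designs.
table = {}
for n_items in (5, 10, 20):
    for n_times in (1, 2):
        table[(n_items, n_times)] = d_study_coefficient(components, n_items, n_times)
        print(n_items, n_times, round(table[(n_items, n_times)], 3))
```

Reading across the table shows which facet is worth sampling more heavily. With these values, adding a second occasion helps more than doubling the number of items, because the person × time component dominates the error.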
EXAMPLES OF G-THEORY
• Nice illustrations are offered in:
Webb, Rowley, & Shavelson (1988)
and
Crowley, Thompson, & Worchel (1994)
VALIDITY
TEST VALIDITY
• VALIDITY: A test is valid if it
measures what it claims to measure.
• Types: Face, Content, Concurrent,
Predictive, Construct.
TEST VALIDITY
• Face validity: When the test items appear to
measure what the test claims to measure.
• Content validity: When the content of the
test items, according to domain experts,
adequately represents the latent trait that the
test intends to measure.
TEST VALIDITY
• Concurrent validity: When the test, which
intends to measure a particular latent trait,
correlates highly with another test that
measures that trait.
• Predictive validity: When the scores of the
test predict some meaningful criterion.
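Concurrent validity is usually summarized with a correlation coefficient between the two sets of scores. A minimal sketch with hypothetical scores for six examinees (the data are invented for illustration):

```python
import numpy as np

# Hypothetical scores of the same six examinees on two tests of the same trait.
new_test = np.array([12, 15, 9, 20, 17, 11])
established_test = np.array([30, 34, 25, 41, 38, 27])

# Pearson correlation between the two sets of scores.
r = np.corrcoef(new_test, established_test)[0, 1]
print(round(r, 2))  # 0.99
```

The same computation applies to predictive validity when the second variable is a criterion measured later rather than another test given at the same time.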
TEST VALIDITY
• Construct validity: A test has construct
validity when the results of using the test fit
hypotheses concerning the theoretical
nature of the latent trait. The higher the fit,
the higher the construct validity.
MESSICK’S UNIFIED
CONSTRUCT VALIDITY
– Content: Item content relevance, representativeness,
and technical quality (includes face).
– Substantive: Theoretical rationales for the observed
consistencies in the test responses.
– Structural: Fidelity of scoring structure to the
structure of the content domain.
– Generalizability: The extent to which the score
properties and interpretations generalize
over population groups, settings, and tasks.
– External: Concurrent/convergent, discriminant, and
predictive evidence.
– Consequential: The (potential and actual)
consequences of test use.