INTRO 2 IRT

Tim Croudace
Descriptions of IRT
• “IRT refers to a set of
mathematical models that
describe, in probabilistic terms,
the relationship between a
person’s response to a survey
question/test item and his or her
level of the ‘latent variable’
being measured by the scale”
• Fayers and Hays, p. 55
– Assessing Quality of Life in Clinical Trials. Oxford University Press.
– Chapter on applying IRT for evaluating questionnaire item and scale properties.
• This latent variable is usually
a hypothetical construct
[trait/domain or ability]
which is postulated to exist
but cannot be measured by a
single observable
variable/item.
• Instead it is indirectly
measured by using multiple
items or questions in a multi-item test/scale.
2
$\mathrm{logit}\{\pi_{hi}\} = \alpha_{h0} + \alpha_{h1} z_i$
where $\pi_{hi}$ is the probability that individual $i$ endorses item $h$, $\alpha_{h0}$ is the item intercept, $\alpha_{h1}$ is the item slope (discrimination), and $z_i$ is the individual's value on the latent dimension.
The data: the 16 response patterns (0000 to 1111) and their frequencies n, for n = 1729 respondents in total; the full table is shown on the 'Data' slide below.
Sources of knowledge: q1 radio, q2 newspapers, q3 reading, q4 lectures
3
A single latent dimension: Z ~ Normal (mean 0, SD 1), so Var = 1 too!
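To make the model above concrete, here is a minimal sketch (in Python) of the two-parameter logistic item response function in the intercept/slope parameterisation used here; the parameter values are illustrative placeholders, not the fitted estimates for these data.

```python
import math

def p_endorse(z, alpha0, alpha1):
    """Two-parameter logistic model: P(item h endorsed | latent trait z)."""
    return 1.0 / (1.0 + math.exp(-(alpha0 + alpha1 * z)))

# Illustrative (not fitted) parameters: intercept alpha0, slope alpha1.
for z in (-2, -1, 0, 1, 2):
    print(z, round(p_endorse(z, alpha0=-1.0, alpha1=1.5), 3))
```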
Simple sum scores
(n = 1729 new individual values)
Each response pattern is converted into a total (sum) score from 0 to 4; the full pattern-by-pattern table appears on the 'Simple sum scores' slide later in the deck.
477 zeros added to data set (new column)
4
Binary Factor / Latent Trait Analysis
Results: logit-probit model
Warming up to this sort of thing … soon ….
[Path diagram: a single latent factor F with arrows to the binary items U1, U2, U3, ..., Up]
2 items with similar thresholds and similar slopes
3 items with different thresholds but similar slopes
5
The key concept … latent factor models
for constructs underpinning multiple
binary (0/1) responses
• … based on innovations in educational testing
and psychometric statistics > 50 years old
• The same models used in educational testing with
correct/incorrect answers can be applied to
symptom present/absent data (both binary)
• Extensions to ordinal outcomes (Likert scales)
• Flexibility in parametric form available
• Semi- and non-parametric approaches too…
6
Binary IRT : The A B C D of it
7
Linear vs non-linear regression of response
probability on latent variable
[Figure, adapted without permission from a slide by Prof H Goldstein: y-axis = probability of a "Yes" response on a simple binary (Yes/No) scale item; x-axis = score on the latent construct being measured]
8
Ordinal IRT : The A B C D of GRM
9
IRT models
• Simplest case of a latent trait analysis…
– Manifest variables are binary: only 2 distinctions are made
• these take 0/1 values
– Yes / No
– Right / Wrong
– Symptom present / absent
• Agree / disagree distinctions for attitudes are more likely to be ordinal [>2
response categories]; see the next lecture, IRT 2, on Friday
• For scoring of individuals
– (not parameter estimation for items)
• it is frequently assumed that the UNOBSERVED (latent) variable
<the latent factor / trait>
• is not only continuous but normally distributed
– [or rather: the prior dist'n is normal, but the posterior dist'n may not be]
10
IRT for binary data
The most commonly used model is the Lord-Birnbaum model (Lord, 1952; Birnbaum),
the 2-parameter logistic
[a.k.a. the logit-probit model; Bartholomew (1987)]
• The model is essentially a non-linear single factor model
– When applied to binary data, the traditional linear factor model is only an
approximation to the appropriate item response model
• sometimes satisfactory, but sometimes very poor (we can guess when)
• Some accounts of Item Response Theory make it sound like a
revolutionary & very modern development
• this is not true!
– It should not replace or displace classical concepts, and has suffered from
being presented and taught as disconnected from these
– A unified treatment can be given that builds one from the other (McDonald,
1999) but this would be a one term course on its own
11
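One way to see the "non-linear single factor model" point is the standard link between the normal-ogive (probit) item response model and an underlying-variable factor model. As a sketch of that standard relationship (not a result reported in these slides), with factor loading $\lambda_h$ and threshold $\tau_h$ for a standardised underlying response variable:

$$
P(u_h = 1 \mid z) \;=\; \Phi\!\left(\frac{\lambda_h z - \tau_h}{\sqrt{1-\lambda_h^{2}}}\right),
\qquad
a_h = \frac{\lambda_h}{\sqrt{1-\lambda_h^{2}}},
\qquad
b_h = \frac{\tau_h}{\lambda_h},
$$

so the item discrimination $a_h$ and difficulty $b_h$ are simple functions of the factor-analysis parameters.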
What IRT does
IRT models provide a clear statement [picture!]
of the performance of each item in the scale/test
and
how the scale/test functions, overall,
for measuring the construct of interest
in the study population
The objective is to model each item by estimating the
properties describing item performance characteristics
hence Item Characteristic Curve
or Symptom Response Function.
12
Very bland (but simple) example
• Lombard and Doering (1947) data
• Questions on cancer knowledge with four addressing
the source of the information
• Fitting a latent variable model might be proposed as
a way of constructing a measure of how well
informed an individual is about cancer
• A second stage might relate knowledge about cancer
to knowledge about other diseases or general
knowledge
13
Very bland (but simple) example
• Lombard and Doering (1947) data
• Questions on cancer knowledge with four
addressing the source of the information
– radio
– newspapers
– (solid) reading (books?)
– lectures
• 2 to the power 4 i.e. 16 possible response
patterns from 0000 to 1111
14
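As a trivial illustration (a sketch, not part of the original analysis), the 16 possible patterns can be enumerated directly:

```python
from itertools import product

# 2**4 = 16 possible response patterns for the four binary items (q1-q4).
patterns = ["".join(map(str, p)) for p in product((0, 1), repeat=4)]
print(len(patterns))              # 16
print(patterns[0], patterns[-1])  # 0000 1111
```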
Data
• Lombard and Doering (1947) data
• 2 to the power 4
– i.e. 16 possible response patterns (all occur)
– with more items this is neither likely nor necessary
– frequency shown for 0000 to 1111
• frequency is the number with each item response pattern

Pattern    n
0000     477
1000      63
0001      12
0010     150
1001       7
1010      32
0011      11
1011       4
0100     231
1100      94
0101      13
0110     378
1101      12
1110     169
0111      45
1111      31
15
$\mathrm{logit}\{\pi_{hi}\} = \alpha_{h0} + \alpha_{h1} z_i$  (item intercepts $\alpha_{h0}$ and slopes $\alpha_{h1}$, $h = 1, \dots, 4$)
The data: the 16 response patterns and frequencies shown in the table on the previous slide.
Sources of knowledge: q1 radio, q2 newspapers, q3 reading, q4 lectures
16
A single latent dimension: Z ~ Normal (mean 0, SD 1), so Var = 1 too!
Basic objectives of modelling
• When multiple items are applied in a test / survey, we can use
latent variable modelling to
– explore inter-relationships among observed responses
– determine whether the inter-relationships can be explained by a small number
of factors
– THEN, to assign a SCORE to each individual on the basis
of their responses
– Basically to rank order (arrange) or quantify (score) survey participants, test
takers, individuals who have been studied
» CAN BE THOUGHT OF AS ADDING A NEW SCORE TO YOUR DATASET FOR
EACH INDIVIDUAL
• this analysis will also help you to understand the properties of each
item, as a measure of the target construct (what properties?)
» GRAPHICAL REPRESENTATION IS BEST
17
Item properties that we are interested in are
captured graphically by so-called Item Characteristic
Curves (ICCs)
18
Item/Symptom & Test/Scale INFORMATION
– is useful and necessary for examining score precision (the
accuracy of estimated scores)
– we are interested in this for different individuals
(individuals with different score values)
– by inspecting the amount of information at each
score level, across the score range (range of estimated
scores), we identify variations in measurement
precision (the reliability of individuals' estimated scores)
– this enables us to make statements about the effective
measurement range of an instrument in a population
19
e.g. Item Characteristic Curves
20
Item information functions
- add them together to get the Test Information Function (TIF)
beware y-axis scaling: not all the same
21
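A minimal sketch of how item information functions combine into the TIF under the two-parameter logistic model: item information is slope² × P × (1 − P), and the TIF is the sum over items. The slopes below are the four α_h1 estimates reported later in these slides; the intercepts are illustrative placeholders, since they are not listed here.

```python
import numpy as np

def icc(z, a0, a1):
    """2PL item characteristic curve: P(endorse | z) with intercept a0 and slope a1."""
    return 1.0 / (1.0 + np.exp(-(a0 + a1 * z)))

def item_info(z, a0, a1):
    """2PL item information: a1**2 * P * (1 - P)."""
    p = icc(z, a0, a1)
    return a1 ** 2 * p * (1.0 - p)

z = np.linspace(-3, 3, 121)
slopes = [0.72, 3.40, 1.34, 0.77]    # alpha_h1 estimates reported in these slides
intercepts = [0.0, 0.0, 0.0, 0.0]    # placeholders: the intercepts are not given here
tif = sum(item_info(z, a0, a1) for a0, a1 in zip(intercepts, slopes))
print(round(tif.max(), 2))           # peak of the Test Information Function
```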
Test Information Function
22
Item information functions
- shown alongside their ICCs
[Figure: four item panels; the y-axis maxima differ (roughly 0.14, 3.0, 0.40, 0.14); beware y-axis scaling: not all the same]
23
$\text{s.e.m.} = 1 / \sqrt{\text{Information}}$

Info   sqrt(Info)   1/sqrt(Info) = s.e.m.
 1        1.0          1.0
 2        1.4          0.7
 3        1.7          0.6
 4        2.0          0.5
 5        2.2          0.4
 6        2.4          0.4
 7        2.6          0.4
 8        2.8          0.4
 9        3.0          0.3
10        3.2          0.3
11        3.3          0.3
12        3.5          0.3
24
Standard error of measurement is not constant (U-shaped, not symmetrical)
Approximate reliability
• Reliability $\approx 1 - 1/\text{Info} = 1 - \text{s.e.m.}^2$ (since $\text{Info} = 1/\text{s.e.m.}^2$)
• s.e.m. = standard error of measurement
25
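A small sketch reproducing the two relationships above for a few information values:

```python
import math

# s.e.m. = 1 / sqrt(Information); approximate reliability = 1 - 1/Information = 1 - s.e.m.**2
for info in (1, 2, 4, 9, 12):
    sem = 1.0 / math.sqrt(info)
    reliability = 1.0 - 1.0 / info
    print(info, round(sem, 2), round(reliability, 2))
```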
Back to the Data
• Lombard and Doering (1947) data
• 2 to the power 4
– i.e. 16 possible response patterns (all occur)
– frequencies as shown in the Data table above
What would be the easiest thing to do with these numbers to score the patterns..?
26
Answer ..
• Simply add them up: each pattern's total (sum) score is the number of items endorsed, from 0 (pattern 0000) to 4 (pattern 1111).
27
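A minimal sketch of this simple sum scoring, applied to the 16 observed patterns and their frequencies:

```python
# Observed response patterns (q1 q2 q3 q4) and their frequencies.
counts = {
    "0000": 477, "1000": 63,  "0001": 12, "0010": 150,
    "1001": 7,   "1010": 32,  "0011": 11, "1011": 4,
    "0100": 231, "1100": 94,  "0101": 13, "0110": 378,
    "1101": 12,  "1110": 169, "0111": 45, "1111": 31,
}
# Simple sum score: count the 1s in each pattern.
for pattern, n in counts.items():
    print(pattern, n, sum(int(x) for x in pattern))
print(sum(counts.values()))  # 1729 respondents in total
```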
Simple sum scores
(n = 1729 new individual values)

Pattern    n    Total score
0000     477    0
1000      63    1
0001      12    1
0010     150    1
1001       7    2
1010      32    2
0011      11    2
1011       4    3
0100     231    1
1100      94    2
0101      13    2
0110     378    2
1101      12    3
1110     169    3
0111      45    3
1111      31    4

477 zeros added to data set (new column)
28
Weighted [by discriminating power] scores
For each of the 16 response patterns: frequency n, total score, model-based factor score, and weighted (component) score. The full table is shown on the 'Estimated component scores' slide below.
29
$\mathrm{logit}\{\pi_{hi}\} = \alpha_{h0} + \alpha_{h1} z_i$
The data: the 16 response patterns and frequencies (see the Data table above).
Sources of knowledge: q1 radio, q2 newspapers, q3 reading, q4 lectures
30
A single latent dimension: Z ~ Normal (mean 0, SD 1), so Var = 1 too!
Weighted [by discriminating power] scores
(table repeated; shown in full on the 'Estimated component scores' slide below)
31
Something a little more subtle
• Simple sum scores assume all item responses are
equally useful at defining the construct
– this may not be the case
• If items are differentially important
– i.e. have different discriminating power with respect to what we are
measuring, we might want to take that into account
• How? Weighted sum scores [Component scores]
– weighted by what?
» weighted by the estimates (factor-loading-type parameters) from
a latent variable model
» [a latent trait model with a single latent factor]
32
Weighted scores
Weights: the alpha_h1 (slope) parameters

Item   Weight (alpha_h1)
Q1     0.72
Q2     3.40
Q3     1.34
Q4     0.77

These numbers relate to the slopes of the S-shaped item characteristic curves (see the scoring sketch below).
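A minimal sketch of the weighted (component) scoring just described, using the α_h1 weights above; tiny differences from the slide's totals (e.g. 6.23 vs 6.22) are rounding.

```python
weights = {"q1": 0.72, "q2": 3.40, "q3": 1.34, "q4": 0.77}  # alpha_h1 estimates

def component_score(pattern):
    """Weighted sum score: add the alpha_h1 weight of every item endorsed ('1')."""
    return sum(w for resp, w in zip(pattern, weights.values()) if resp == "1")

# e.g. 0110 (newspapers + reading) -> 3.40 + 1.34 = 4.74
for pattern in ("0000", "1000", "0110", "1111"):
    print(pattern, round(component_score(pattern), 2))
```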
Estimated component scores
(weighted values)
(Total = simple sum score; Factor = estimated factor score from the fitted latent trait model)

Pattern    n   Total   Factor   Component score [weighted by alpha_h1]
0000     477     0     -0.98    0
1000      63     1     -0.68    0.72
0001      12     1     -0.67    0.77
0010     150     1     -0.46    1.34
1001       7     2     -0.41    1.48  (0.72 + 0.77)
1010      32     2     -0.23    2.06  (0.72 + 1.34)
0011      11     2     -0.22    2.10  (1.34 + 0.77)
1011       4     3      0.00    2.82  (0.72 + 1.34 + 0.77)
0100     231     1      0.16    3.40
1100      94     2      0.42    4.12  (0.72 + 3.40)
0101      13     2      0.43    4.16  (3.40 + 0.77)
0110     378     2      0.66    4.74  (3.40 + 1.34)
1101      12     3      0.72    4.88  (0.72 + 3.40 + 0.77)
1110     169     3      0.99    5.46  (0.72 + 3.40 + 1.34)
0111      45     3      1.02    5.50  (3.40 + 1.34 + 0.77)
1111      31     4      1.41    6.22  (0.72 + 3.40 + 1.34 + 0.77)
34
But the bee's knees are..
• The estimated factor scores from the model
• Not just some simple sum of unweighted or
weighted item responses
• They take into account the proposed score distribution
(Gaussian normal) and the estimated model
parameters (but not the fact that they are estimates
rather than known values), and more besides (when
missing data are present)
… the estimated factor scores
35
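One common way such factor scores are computed (a sketch, not necessarily the exact method behind these slides) is the expected a posteriori (EAP) score: the posterior mean of the latent trait under a standard normal prior and the 2PL likelihood of the observed pattern. The slopes below are the α_h1 values reported above; the intercepts are illustrative placeholders, since they are not reported here.

```python
import numpy as np

slopes = [0.72, 3.40, 1.34, 0.77]      # alpha_h1 estimates reported in these slides
intercepts = [-1.5, 0.3, -1.0, -2.0]   # illustrative placeholders (not the fitted values)

def eap_score(pattern, n_nodes=201):
    """Expected a posteriori (posterior mean) factor score under an N(0, 1) prior."""
    z = np.linspace(-4, 4, n_nodes)
    prior = np.exp(-0.5 * z ** 2)                     # N(0, 1) density up to a constant
    like = np.ones_like(z)
    for resp, a0, a1 in zip(pattern, intercepts, slopes):
        p = 1.0 / (1.0 + np.exp(-(a0 + a1 * z)))      # 2PL probability of a '1'
        like *= p if resp == "1" else 1.0 - p
    post = prior * like
    return float(np.sum(z * post) / np.sum(post))

for pattern in ("0000", "0100", "1111"):
    print(pattern, round(eap_score(pattern), 2))
```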
A graphical and interactive
introduction to IRT
• Play with the key features of IRT models
• www2.uni-jena.de/svw/metheval/irt/VisualIRT.pdf
36
a b (see) [2-parameter IRT model]
• VisualIRT (pdf)
– Page
• VisualIRT (pdf)
– Page
Individual's score = new ruler value
Any hypothetical latent variable [factor/trait] continuum
expressed in a z-score metric (Gaussian normal (0,1))
Item properties:
slope = item discrimination
location = item commonality [difficulty/prevalence/severity]
37
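For reference, the "a, b" (discrimination, location) form and the intercept/slope form used earlier in these slides are the same model, related by a simple re-parameterisation:

$$
\mathrm{logit}\,P(u_h = 1 \mid z) \;=\; a_h\,(z - b_h) \;=\; \alpha_{h1} z + \alpha_{h0},
\qquad a_h = \alpha_{h1}, \quad b_h = -\,\alpha_{h0}/\alpha_{h1}.
$$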
IRT Resources
• A visual guide to Item Response Theory
– I. Partchev
• Introduction to IRT
– F. Baker
• http://ericae.net/irt/baker/toc.htm
• An introduction to modern measurement theory
– B Reeve
• Chapter in Fayers and Machin QoL book
– P Fayers
• ABC of Item Response Theory
– H Goldstein
• Moustaki papers, and online slides (FA at 100)
• LSE books (Bartholomew, Knott, Moustaki, Steele)
38
Item Response Theory Books
Applications of Item Response Theory to Practical Testing Problems. Frederick M. Lord. 274 pages. 1980.
Applying the Rasch Model. Trevor G. Bond and Christine M. Fox. 255 pages. 2001.
Constructing Measures: An Item Response Modeling Approach. Mark Wilson. 248 pages. 2005.
The EM Algorithm and Related Statistical Models. Michiko Watanabe and Kazunori Yamaguchi. 250 pages. 2004.
Essays on Item Response Theory. Edited by Anne Boomsma, Marijtje A.J. van Duijn, and Tom A.A. Snijders. 438 pages. 2001.
Explanatory Item Response Models: A Generalized Linear and Nonlinear Approach. Edited by Paul De Boeck and Mark Wilson. 382 pages. 2004.
Fundamentals of Item Response Theory. Ronald K. Hambleton, H. Swaminathan, and H. Jane Rogers. 184 pages. 1991.
Handbook of Modern Item Response Theory. Edited by Wim J. van der Linden and Ronald K. Hambleton. 510 pages. 1997.
Introduction to Nonparametric Item Response Theory. Klaas Sijtsma and Ivo W. Molenaar. 168 pages. 2002.
Item Response Theory. Mathilda du Toit. 906 pages. 2003.
Item Response Theory for Psychologists. Susan E. Embretson and Steven P. Reise. 376 pages. 2000.
Item Response Theory: Parameter Estimation Techniques (Second Edition, Revised and Expanded, w/CD). Frank Baker and Seock-Ho Kim. 495 pages. 2004.
Item Response Theory: Principles and Applications. Ronald K. Hambleton and Hariharan Swaminathan. 332 pages. 1984.
Logit and Probit: Ordered and Multinomial Models. Vani K. Borooah. 96 pages. 2002.
Markov Chain Monte Carlo in Practice. W.R. Gilks, Sylvia Richardson, and D.J. Spiegelhalter. 512 pages. 1995.
Monte Carlo Statistical Methods. Christian P. Robert and George Casella. 645 pages. 2004.
Polytomous Item Response Theory Models. Remo Ostini and Michael L. Nering. 120 pages. 2005.
Rasch Models for Measurement. David Andrich. 96 pages. 1988.
Rasch Models: Foundations, Recent Developments, and Applications. Edited by Gerhard H. Fischer and Ivo W. Molenaar. 436 pages. 1995.
The Sage Handbook of Quantitative Methodology for the Social Sciences. Edited by David Kaplan. 511 pages. 2004.
Test Equating, Scaling, and Linking: Methods and Practices (Second Edition). Michael J. Kolen and Robert L. Brennan. 548 pages. 2004.
39