Rescuing the 2PL model

advertisement
Cognitive psychology meets psychometric theory: On the relation between process models for decision making and latent variable models for individual differences
θ ≥ 0
Han van der Maas
Gunter Maris, Denny Borsboom
University of Amsterdam, Cito
Twente, 2009

2PL model in item response theory
$$P_{jk}(+ \mid \theta_k) = \frac{e^{\alpha_j(\theta_k - \beta_j)}}{1 + e^{\alpha_j(\theta_k - \beta_j)}}$$
• Main model for educational and psychological measurement in psychometrics
• Justifications
– Logistic equation does the job (Toolbox)
– Derivation from ‘desirable’ statistical properties (Fischer)
– Derivation from ‘desirable’ measurement properties (Roskam)
– Derivation from psychological process model (Tuerlinckx & De Boeck)
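A minimal Python sketch of the 2PL response function, for readers who want to evaluate it numerically. The function name `p_2pl` and the example values are illustrative assumptions, not part of the talk.

```python
import math

def p_2pl(theta, alpha, beta):
    """2PL probability of a correct response for ability theta,
    item discrimination alpha and item difficulty beta."""
    z = alpha * (theta - beta)
    return 1.0 / (1.0 + math.exp(-z))

# When ability equals difficulty the 2PL gives P(+) = 1/2,
# regardless of the discrimination.
p_at_difficulty = p_2pl(theta=0.0, alpha=1.5, beta=0.0)
p_above_difficulty = p_2pl(theta=1.0, alpha=1.5, beta=0.0)
print(p_at_difficulty, p_above_difficulty)
```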
Diffusion model
• Stochastic accumulation of evidence stops when decision threshold is reached
• Implements SPRT (optimal accuracy given RT)
• Very influential and very well studied model
• More complex, biologically realistic models reduce to this model
• Explains the speed accuracy trade-off
Relation to IRT
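• Two main parameters:
– Boundary separation (decision criterion): a
– Drift rate (rate of evidence accumulation): v
[Figure: diffusion process over time t, starting at z between the boundaries 0 (X=0) and a (X=1), with drift v; RT marked]
If z = a/2:
$$P(X=1) = \frac{e^{-av} - 1}{e^{-2av} - 1} = \frac{e^{av}}{1 + e^{av}}$$
$$E(DT) = \frac{a}{2v}\,\frac{1 - e^{-av}}{1 + e^{-av}} \approx \frac{a}{2v} \quad \text{if } P(X=1) \text{ is large}$$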
Tuerlinckx & De Boeck (2005):
Drift = ability – difficulty (v = θ – β)
Discrimination = boundary separation (a = α)
$$P(X=1) = \frac{e^{\alpha(\theta - \beta)}}{1 + e^{\alpha(\theta - \beta)}}$$
$$E(DT) \approx \frac{\alpha}{2(\theta - \beta)}$$
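The closed-form expressions on this slide can be checked numerically. The sketch below assumes an unbiased starting point (z = a/2) and a unit diffusion coefficient; the function names and parameter values are hypothetical and chosen only for illustration.

```python
import math

def p_upper(a, v):
    """P(X=1): probability of reaching the upper boundary when z = a/2."""
    return 1.0 / (1.0 + math.exp(-a * v))

def mean_dt(a, v):
    """Mean decision time (a / 2v) * (1 - e^-av) / (1 + e^-av)."""
    if v == 0.0:
        return a * a / 4.0  # limit of the expression as v -> 0
    return (a / (2.0 * v)) * math.tanh(a * v / 2.0)

# Tuerlinckx & De Boeck mapping: v = theta - beta, a = alpha (values assumed).
theta, beta, alpha = 1.0, 0.25, 2.0
p = p_upper(alpha, theta - beta)         # identical to the 2PL probability
dt = mean_dt(alpha, theta - beta)        # exact mean decision time
approx = alpha / (2.0 * (theta - beta))  # approximation for large P(X=1)
print(p, dt, approx)
```

The approximation a/(2v) is an upper bound on the exact mean and becomes tight only when av is large, which is the "P(X=1) is large" condition.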
Three qualitative predictions
1. Subjects are slowest when v=0, that is when θ=β
– When θ<<β, responding is very fast
2. For negative v, i.e. θ<β, allowing more time to think reduces the probability correct.
3. By reducing time to think P(+) → .5, irrespective of θ
[Figure: curves for a long time limit and a short time limit]
Special psychological meaning of θ = β
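The three predictions can be illustrated with a short numerical sketch. It assumes the unbiased diffusion formulas (z = a/2, unit diffusion coefficient) and treats a longer time limit as a larger boundary separation a; all names and values are hypothetical.

```python
import math

def p_plus(a, v):
    """P(+) for boundary separation a and drift v, starting at z = a/2."""
    return 1.0 / (1.0 + math.exp(-a * v))

def mean_dt(a, v):
    """Mean decision time; equals a**2 / 4 in the limit v -> 0."""
    if v == 0.0:
        return a * a / 4.0
    return (a / (2.0 * v)) * math.tanh(a * v / 2.0)

# Prediction 1: decision time peaks at v = 0 (theta = beta).
drifts = [-2.0, -1.0, 0.0, 1.0, 2.0]
times = [mean_dt(2.0, v) for v in drifts]
slowest_drift = drifts[times.index(max(times))]

# Prediction 2: with v < 0, a wider boundary (more time) lowers P(+).
p_short_limit = p_plus(0.5, -1.0)
p_long_limit = p_plus(4.0, -1.0)

# Prediction 3: as a -> 0, P(+) -> .5 whatever the drift.
p_rushed = p_plus(1e-6, 3.0)

print(slowest_drift, p_short_limit, p_long_limit, p_rushed)
```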
Attitudes: “I stick to my decisions” (agree/not agree)
1. RT decreases with |θ−β|
– OK: extreme answers are fast
2. If θ<β and the time limit increases, P(agree) → 0
– OK: the longer I think about it: I don’t stick to my decisions
3. If time to think (a) → 0, P(agree) → 1/2
– OK: I just say something
Ability: “24+79” (93/114/103/130)
1. RT decreases with |θ−β|
– NO, best subjects are fastest
2. If θ<β and the time limit increases, P(+) → 0
– NO, they guess because they know that they don’t know
3. If time to think (a) → 0, P(+) → 1/2
– NO: P(+) → 0 for open questions and P(+) → 1/M for multiple choice questions with M options
Thus
• For two choice attitude/personality tests the diffusion IRT model works well!
• But not for multiple choice ability tests:
1. Problem of multiple choice
• The diffusion model is a two choice model; IRT (2PL) is dichotomously scored, not dichotomous choice! Guessing does normally not give P(+)=1/2
2. Problem of guessing
• In ability testing subjects often guess and don’t score below chance level
Solving the first problem: Multiple choice
Correction for MC in 2PL
• Assuming equal attractiveness of incorrect alternatives in Bock’s nominal response model
$$P_m(\theta) = \frac{e^{b_m^* + a_m^*\theta}}{\sum_{k=1}^{M} e^{b_k^* + a_k^*\theta}}$$
• gives:
$$P(+ \mid \theta) = \frac{e^{\alpha(\theta - \beta) - \ln(M-1)}}{1 + e^{\alpha(\theta - \beta) - \ln(M-1)}}$$
$$P(+ \mid \theta = \beta) = \frac{e^{-\ln(M-1)}}{1 + e^{-\ln(M-1)}} = \frac{1}{M}$$
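A small sketch of the multiple-choice correction, assuming the corrected 2PL form with the extra term −ln(M−1). The function name `p_mc` and the argument name `n_options` (for M) are illustrative choices.

```python
import math

def p_mc(theta, alpha, beta, n_options):
    """2PL with the multiple-choice correction -ln(M - 1)."""
    z = alpha * (theta - beta) - math.log(n_options - 1)
    return 1.0 / (1.0 + math.exp(-z))

# At theta = beta the probability correct equals 1/M.
p_four_options = p_mc(theta=0.0, alpha=1.0, beta=0.0, n_options=4)
# With M = 2 the correction vanishes and the ordinary 2PL is recovered.
p_two_options = p_mc(theta=0.0, alpha=1.0, beta=0.0, n_options=2)
print(p_four_options, p_two_options)
```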
Solving the second problem: guessing
• Subjects don’t score below chance level
– Typical solution: 3PL
• We propose a restricted 2PL model
• Main idea: Ability can be zero but cannot be negative, so v≥0
– subjects do not score below chance level
– increasing time limit will not decrease P+
Ability ≥ 0
• In IRT bipolar traits such as attitudes and unipolar
traits such as abilities are not distinguished
• But ability is very special (think of walking)
– No negative ability (no negative difficulty)
– Real zero point (no ability)
– With sufficient time and ability>0 any item can be
solved
• We need a diffusion IRT model with positive v
Idea: a and v both have an item and a person part
drift v
– ability (person): $v^p$ (>0)
– difficulty (item): $v^i$ (>0)
– requirement: $v = f(v^p, v^i) > 0$
– one sensible solution: $v = v^p / v^i$
boundary separation a
– response caution (person): $a^p$ (>0)
– time pressure (item): $a^i$ (>0)
– requirement: $a = g(a^p, a^i) > 0$
– one sensible solution: $a = a^p / a^i$
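Quotient diffusion 2PLM (Q-diffusion model)
$$P_{jk}(+ \mid \theta_k) = \frac{\exp\!\left(\frac{a_k^p v_k^p}{a_j^i v_j^i} - \ln(M_j - 1)\right)}{1 + \exp\!\left(\frac{a_k^p v_k^p}{a_j^i v_j^i} - \ln(M_j - 1)\right)}$$
with response caution $a_k^p$, ability $v_k^p$, time pressure $a_j^i$ and difficulty $v_j^i$
All a and v are positive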
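A hedged sketch of the Q-diffusion response probability, assuming a = a_p / a_i and v = v_p / v_i as proposed in the talk. The function name, argument names and example values are hypothetical; ability v_p is allowed to be exactly zero to show the chance-level floor.

```python
import math

def p_q_diffusion(a_p, v_p, a_i, v_i, n_options=2):
    """P(+) under the Q-diffusion model.
    a_p: response caution, v_p: ability (person side);
    a_i: time pressure, v_i: difficulty (item side)."""
    if min(a_p, a_i, v_i) <= 0 or v_p < 0 or n_options < 2:
        raise ValueError("a_p, a_i, v_i must be > 0, v_p >= 0, n_options >= 2")
    z = (a_p * v_p) / (a_i * v_i) - math.log(n_options - 1)
    return 1.0 / (1.0 + math.exp(-z))

# Zero ability gives exactly the guessing level 1/M.
p_chance = p_q_diffusion(1.0, 0.0, 1.0, 1.0, n_options=4)
# Lower time pressure (smaller a_i) raises P(+) for any positive ability.
p_relaxed = p_q_diffusion(1.0, 1.0, 0.5, 1.0, n_options=4)
p_rushed = p_q_diffusion(1.0, 1.0, 4.0, 1.0, n_options=4)
print(p_chance, p_rushed, p_relaxed)
```

Because the exponent can never fall below −ln(M−1), the predicted probability stays at or above chance, which is the property the restricted model is meant to deliver.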
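so if
$$\theta_k = a_k^p v_k^p, \quad \alpha_j = \frac{1}{a_j^i v_j^i}, \quad \beta_j = \ln(M_j - 1)$$
$$\Rightarrow \; P_{jk}(+ \mid \theta_k) = \frac{e^{\alpha_j\theta_k - \beta_j}}{1 + e^{\alpha_j\theta_k - \beta_j}}$$
or, equivalently, if
$$\theta_k = a_k^p v_k^p, \quad \alpha_j = \frac{1}{a_j^i v_j^i}, \quad \beta_j = \ln(M_j - 1)\, a_j^i v_j^i$$
$$\Rightarrow \; P_{jk}(+ \mid \theta_k) = \frac{e^{\alpha_j(\theta_k - \beta_j)}}{1 + e^{\alpha_j(\theta_k - \beta_j)}}$$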
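The two parameterizations on the preceding slide describe the same curve. A minimal check, assuming hypothetical parameter values and the helper names `to_2pl` and `logistic` (both introduced here only for illustration):

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def to_2pl(a_p, v_p, a_i, v_i, n_options):
    """Map Q-diffusion parameters to 2PL parameters."""
    theta = a_p * v_p                     # person parameter
    alpha = 1.0 / (a_i * v_i)             # item slope (easiness)
    intercept = math.log(n_options - 1)   # beta_j in the form alpha*theta - beta
    location = intercept * a_i * v_i      # beta_j in the form alpha*(theta - beta)
    return theta, alpha, intercept, location

a_p, v_p, a_i, v_i, m = 1.2, 0.8, 0.5, 1.5, 4
theta, alpha, intercept, location = to_2pl(a_p, v_p, a_i, v_i, m)

p_intercept_form = logistic(alpha * theta - intercept)
p_location_form = logistic(alpha * (theta - location))
p_direct = logistic((a_p * v_p) / (a_i * v_i) - math.log(m - 1))
print(p_intercept_form, p_location_form, p_direct)
```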
Simple case
$\theta_k = a_k^p v_k^p$ ⇒ Ignore a,v distinction (RTs required)
$\alpha_j = 1 / (a^i v_j^i)$ ⇒ equal time limits for all items in test
$$P_{jk}(+ \mid \theta_k) = \frac{e^{\alpha_j\theta_k - \ln(M_j - 1)}}{1 + e^{\alpha_j\theta_k - \ln(M_j - 1)}}$$
with ability $\theta_k$ > 0, easiness $\alpha_j$ > 0, and $\ln(M_j - 1)$ the correction for multiple choice
ICC for dichotomous items
ICC for MC (4 choices)
ICC for MC (10 choices)
ICC ‘open’ questions (many choices)
Advantages
• Process model for attitude and ability traits
• ‘Mechanistic’ interpretations of IRT parameters
– θ is product of drift rate and boundary separation
– α is easiness parameter, reciprocal of product of item drift rate and time pressure
– β is intercept parameter, guessing probability
• Meaningful zero point (ratio properties)
• Speed accuracy trade-off incorporated
• Model of accuracy, but also model of RT
• Guessing explained by restricting the 2PL
• Simple extension to MC
Relation to other IRT models
• Ramsay’s Q model
• Person fit
• van der Linden’s IRT RT model
• 2PL and multidimensional IRT
• Guessing models

Ramsay’s quotient model
• Ramsay (1989) investigates simple models for $S_{kj}$ in
$$P_{jk}(+ \mid \theta_k) = \frac{e^{S_{kj}}}{1 + e^{S_{kj}}}$$
– Difference model (θ−β): Rasch model
– Quotient model (θ/β):
$$P_{jk}(+ \mid \theta_k) = \frac{e^{\theta_k/\beta_j}}{K + e^{\theta_k/\beta_j}}, \quad \theta_k \ge 0, \; \beta_j > 0$$
• Fits better than Rasch model and Rasch model with guessing in 4 examples
• If θ = $a^p v^p$, β = $a^i v^i$ and K = M−1, the Q-model and the Q-diffusion model are equivalent
• Note: model in Cressie & Holland (’83) also equivalent:
$$P_{jk}(+ \mid \theta_k) = \frac{c\, e^{e^{\theta_k^* - \beta_j^*}}}{(1 - c) + c\, e^{e^{\theta_k^* - \beta_j^*}}}$$
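The stated equivalence between Ramsay’s quotient model and the Q-diffusion model can be verified numerically. This is a sketch under the slide’s identification θ = a_p·v_p, β = a_i·v_i, K = M − 1; function names and values are assumptions made for the example.

```python
import math

def p_quotient(theta, beta, k):
    """Ramsay's quotient model: e^(theta/beta) / (K + e^(theta/beta))."""
    e = math.exp(theta / beta)
    return e / (k + e)

def p_q_diffusion(a_p, v_p, a_i, v_i, n_options):
    """Q-diffusion model with the multiple-choice correction."""
    z = (a_p * v_p) / (a_i * v_i) - math.log(n_options - 1)
    return 1.0 / (1.0 + math.exp(-z))

a_p, v_p, a_i, v_i, m = 1.5, 0.6, 0.5, 1.2, 4
p_ramsay = p_quotient(theta=a_p * v_p, beta=a_i * v_i, k=m - 1)
p_diffusion = p_q_diffusion(a_p, v_p, a_i, v_i, m)
print(p_ramsay, p_diffusion)
```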
Person fit
Q-diffusion for ability
$$P_{jk}(+ \mid \theta_k) = \frac{\exp\!\left(\frac{a_k^p v_k^p}{a_j^i v_j^i} - \ln(M_j - 1)\right)}{1 + \exp\!\left(\frac{a_k^p v_k^p}{a_j^i v_j^i} - \ln(M_j - 1)\right)}$$
D-diffusion for attitude
$$P_{jk}(+ \mid \theta_k) = \frac{\exp\!\left(\frac{a_k^p}{a_j^i}(v_k^p - v_j^i) - \ln(M_j - 1)\right)}{1 + \exp\!\left(\frac{a_k^p}{a_j^i}(v_k^p - v_j^i) - \ln(M_j - 1)\right)}$$
$a_j^i$ is item discrimination (time pressure)
$a_k^p$ is person discrimination (response caution)
Person discrimination can be estimated by varying time pressure over items and by using RT data
IRT model for RT
• General model: van der Linden’s hierarchical model
• Fundamental equation for RT modeling:
$$E(DT) = \frac{\beta_j^*}{\tau_k^*} \;\Rightarrow\; E(\ln DT) = \beta_j - \tau_k$$
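Translation of the item and person parameters of van der Linden’s model to diffusion model parameters
$$E(DT) \approx \frac{a}{2v} = \frac{1}{2}\,\frac{a_k^p / a_j^i}{v_k^p / v_j^i} = \frac{1}{2}\,\frac{v_j^i / a_j^i}{v_k^p / a_k^p} = \frac{1}{2}\,\frac{\beta_j^*}{\tau_k^*}$$
$$E(\ln DT) \approx -\ln 2 + \ln\frac{v_j^i}{a_j^i} - \ln\frac{v_k^p}{a_k^p} = -\ln 2 + \beta_j - \tau_k$$
so the two models agree up to the constant factor 1/2 (−ln 2 on the log scale).
RT model: $\beta_j = \ln(v_j^i / a_j^i)$, $\tau_k = \ln(v_k^p / a_k^p)$
Accuracy model: $\theta_k = a_k^p v_k^p$, $\alpha_j = 1 / (a_j^i v_j^i)$, $\beta_j = \ln(M_j - 1)$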
• If speed and ability parameters at the second level in
van der Linden’s model are positively correlated,
then individual differences are primarily due to
differences in drift rate (for an example, see van der
Linden, 2007).
• If these parameters correlate negatively, drift rates are probably similar across subjects, and individual differences are mainly due to differences in response caution (see example 2 of Klein Entink, Fox, & van der Linden, 2009).
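The translation on this slide, and the sign of the speed–ability relation it implies, can be sketched as follows. The code assumes the approximation E(DT) ≈ a/(2v) with a = a_p / a_i and v = v_p / v_i; the function name `rt_parameters` and all values are hypothetical.

```python
import math

def rt_parameters(a_p, v_p, a_i, v_i):
    """Item and person parameters of the log-RT model,
    expressed in diffusion parameters."""
    beta_j = math.log(v_i / a_i)  # item side of the log decision time
    tau_k = math.log(v_p / a_p)   # person speed
    return beta_j, tau_k

a_p, v_p, a_i, v_i = 1.5, 0.8, 0.5, 1.2
beta_j, tau_k = rt_parameters(a_p, v_p, a_i, v_i)

# ln(a / 2v) computed directly and via the translated parameters.
log_dt_direct = math.log((a_p / a_i) / (2.0 * v_p / v_i))
log_dt_translated = -math.log(2.0) + beta_j - tau_k

# Higher drift raises both ability (theta = a_p * v_p) and speed tau_k;
# higher response caution raises theta but lowers tau_k.
_, tau_more_drift = rt_parameters(a_p, 2.0 * v_p, a_i, v_i)
_, tau_more_caution = rt_parameters(2.0 * a_p, v_p, a_i, v_i)
print(log_dt_direct, log_dt_translated, tau_more_drift, tau_more_caution)
```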
2PL
• The Q-model and Q-diffusion model are restricted 2PL models
• In the Q-diffusion model:
– In the end all items will be passed: for all items, P(+ | θ>0, a→∞) = 1, e.g. one Guttman item
• If not (because the item also requires a jump) it measures another, additional ability. The test is not unidimensional!
– So any ability test with some items that I can solve and others that I will never solve (P+ does not increase with longer time limit) tests for more than one ability
– This requires a conjunctive multidimensional Q-diffusion model
Guessing
• Two process models
– 3PL (p and g process)
– 1PL-AG (p and ability dependent g process)
• San Martin, del Pino, and De Boeck
– DINA
• One state models
– Difficulty dependent guessing
• Hessen 2005, d model
– Q-diffusion model: one state ability based guessing
Fitting
• Ramsay fit the QM, DM and DM-G (Rasch model
with guessing) to four datasets and found that
QM fitted best in all cases
• For the full diffusion model several programs are
available
– Fast-DM (Voss & Voss, 2007)
– DMAT (Vandekerckhove & Tuerlinckx, 2007; Voss & Voss, 2007)
• Hierarchical IRT RT model of van der Linden
– CIRT: Fox, J. P., Klein Entink, R. H., & van der Linden, W. J. (2007).
discussion
• The diffusion model is a simplistic model of simple perceptual discriminations. It is too simple for the complex processes involved in typical ability items in IRT
– But 2PL is simplistic too
– Better a simple model than no model at all
• We don’t need a process model
– Then IRT is curve fitting without explanatory value
• 3PL will fit better
– 3PL is difficult to fit and uses many more parameters than the Q-diffusion IRT model
– The point of the Q-diffusion IRT is not goodness of fit
but theoretical clarity
thanks
Note: Capacities versus abilities
• For Lumsden’s sticks for measuring
length the Guttman model is OK
• This also applies to STM
– Digit span items
QM versus DM-G
ICC’s of new guessing model