Posterior predictive model checking
(PPMC) using the Lord and Wingersky
recursive algorithm
Dr Chris Wheadon, UK Rasch Day,
January 2011, Durham.
Version 2.0
Copyright © AQA and its licensors. All rights reserved.
The Purpose of Model Fit
Perfection is never obtained in empirical data. What we really
want to test is the hypothesis "Do the data fit the model
usefully?" (Linacre, 2008, p. 402)
• Fit can be analysed at the item level or at the test level
• Why does Winsteps under-report fit at the test level?
Global fit?
“Why am I repulsed by global fit tests - the neutron bombs of
statistical practice? Because misfit is never global and never a
statistical event, it is always local and idiosyncratic.”
de Jong J, Linacre JM. (1993)
Fit at the test level
• However, predictions are often made at the test level from
distributions of true scores
• Typically we equate true scores not observed scores
• Classification accuracy and consistency indices compare true
scores with cumulative probability distributions derived from
those true scores
• So, there is a need to understand how well the observed
summed score distribution is modelled by the Rasch model
The R0 test
The statistic is based on evaluating the difference between the
observed and expected score distribution given estimates of
item and population parameters.
Outcome of R0-statistic:
1306.6440, df = 364, prob(R0) = .0000
Posterior Predictive Model Checking
• A popular Bayesian diagnostic tool
• Compares observed data with replicated data
• Lends itself to graphical display
• BUT it yields large simulated datasets which can be computationally
intensive to process (1,000 simulations of 1,000 candidates on 21 items
yields 21 million responses)
PPMC checking
• We are interested in evaluating the difference between the
observed and expected score distributions, given estimates of
item and population parameters.
• The Lord and Wingersky recursive algorithm determines the
conditional distribution of number-correct scores from the
probabilities of correct and incorrect responses to every item
at any given level of ability.
• Posterior draws of the model parameters allow the uncertainty
in those conditional distributions to be modelled.
Estimating the number correct frequency distribution
Using theta = -2, with p1 = .26 and p2 = .27

r   x   fr(x)
1   0   f1(0) = (1 - p1) = (1 - .26) = .74
1   1   f1(1) = p1 = .26
2   0   f2(0) = f1(0)(1 - p2) = .74(1 - .27) = .5402
2   1   f2(1) = f1(1)(1 - p2) + f1(0)p2 = .26(1 - .27) + .74(.27) = .3896
2   2   f2(2) = f1(1)p2 = .26(.27) = .0702
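The two-item calculation can be checked with a few lines of code. The following is a minimal sketch of the recursion for right/wrong items at a single ability level; the function name `lord_wingersky` is illustrative and is not taken from the author's scripts.

```python
def lord_wingersky(p_correct):
    """Return [f(0), ..., f(n)], the number-correct score distribution for
    dichotomous items, given each item's probability of a correct response
    at one fixed ability level."""
    f = [1.0]  # before any item is added, a score of 0 is certain
    for p in p_correct:
        new_f = [0.0] * (len(f) + 1)
        for x, prob in enumerate(f):
            new_f[x] += prob * (1.0 - p)  # item answered incorrectly: score stays at x
            new_f[x + 1] += prob * p      # item answered correctly: score moves to x + 1
        f = new_f
    return f

# Probabilities from the slide: p1 = .26 and p2 = .27 at theta = -2
f2 = lord_wingersky([0.26, 0.27])
print([round(v, 4) for v in f2])  # [0.5402, 0.3896, 0.0702]
```

Each pass of the outer loop adds one item, so the cost grows with the square of the number of items rather than with the number of possible response patterns.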
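The Lord and Wingersky recursive algorithm
Define $f_1(x = W_{11}) = p_{11}$ as the probability of earning a score in the first category of item 1, $f_1(x = W_{12}) = p_{12}$ as the probability of earning a score in the second category of item 1, and so forth up to the last category of item 1. Then for $r > 1$, the recursion formula for finding the probability of earning score $x$ after the $r$-th item is added is

$$f_r(x) = \sum_{k=1}^{m_r} f_{r-1}(x - W_{rk})\, p_{rk}$$

for $x$ between $\min_r$ and $\max_r$, where $\min_r$ and $\max_r$ are the minimum and maximum scores after adding the $r$-th item. Note that when $x - W_{rk} < \min_{r-1}$ or $x - W_{rk} > \max_{r-1}$, then $f_{r-1}(x - W_{rk}) = 0$, by definition.
Here $W_{rk}$ is the score assigned to category $k$ of item $r$, $p_{rk}$ is the probability of responding in that category at the given ability, and $m_r$ is the number of categories of item $r$.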
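The general recursion for items with more than two score categories can be sketched in the same way. This is an illustrative sketch rather than the author's C++ code, and it assumes that the categories of every item are scored 0, 1, 2, and so on; the function name `lord_wingersky_poly` and the example probabilities are hypothetical.

```python
def lord_wingersky_poly(category_probs):
    """Summed score distribution at one fixed ability level.

    category_probs[r][k] is the probability of scoring k on item r
    (assumption: category scores are the integers 0, 1, ..., m_r - 1).
    Returns a list indexed by summed score.
    """
    f = [1.0]  # no items yet: a summed score of 0 is certain
    for probs in category_probs:
        # the maximum score grows by (number of categories - 1)
        new_f = [0.0] * (len(f) + len(probs) - 1)
        for x, prob in enumerate(f):
            for k, p_k in enumerate(probs):
                new_f[x + k] += prob * p_k  # earn k marks on the new item
        f = new_f
    return f

# Hypothetical example: one right/wrong item and one three-category item
dist = lord_wingersky_poly([[0.74, 0.26], [0.5, 0.3, 0.2]])
print([round(v, 3) for v in dist])  # [0.37, 0.352, 0.226, 0.052]
```

Scores outside the range reachable after the previous item never appear in the inner loop, which plays the role of the "zero by definition" rule in the formula.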
A solution using WinBUGS, R and C++
• WinBUGS: http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml
• IRT models: Curtis, S.M. (2010), BUGS Code for Item Response Theory, Journal of Statistical Software, Code Snippets, 36(1), 1-34
• R, with the R2WinBUGS and inline packages
• Rtools (building packages for R under Microsoft Windows): http://www.murdoch-sutherland.com/Rtools/
• R scripts: https://github.com/cbwheadon/predicted_scores
Run the model…
Expected summed score distribution within 5 and 95
per cent of posterior draws of parameters
[Figure: frequency (0.00 to 0.06) against score (0 to 50); series: model, l.limit, observed, u.limit]
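One plausible way to build such a band is sketched below; it is not taken from the author's scripts. For each posterior draw it computes the expected summed score distribution under a Rasch model by averaging the conditional distribution over a grid of abilities, then takes the 5th and 95th percentiles across draws at every score. The Rasch form, the normal ability distribution, the simple grid quadrature, the nearest-rank percentile, and the randomly generated "posterior draws" are all assumptions made for illustration.

```python
import math
import random

def rasch_p(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def score_dist(probs):
    """Lord and Wingersky recursion for right/wrong items."""
    f = [1.0]
    for p in probs:
        new_f = [0.0] * (len(f) + 1)
        for x, prob in enumerate(f):
            new_f[x] += prob * (1.0 - p)
            new_f[x + 1] += prob * p
        f = new_f
    return f

def marginal_score_dist(difficulties, mu, sigma, n_points=41):
    """Average the conditional score distribution over a normal
    ability distribution using an equally spaced grid (mu +/- 4 sigma)."""
    thetas = [mu + sigma * (-4 + 8 * i / (n_points - 1)) for i in range(n_points)]
    weights = [math.exp(-0.5 * ((t - mu) / sigma) ** 2) for t in thetas]
    total = sum(weights)
    out = [0.0] * (len(difficulties) + 1)
    for t, w in zip(thetas, weights):
        cond = score_dist([rasch_p(t, b) for b in difficulties])
        for x, v in enumerate(cond):
            out[x] += (w / total) * v
    return out

def percentile(sorted_vals, q):
    """Nearest-rank percentile of an already sorted list."""
    idx = min(len(sorted_vals) - 1, max(0, math.ceil(q * len(sorted_vals)) - 1))
    return sorted_vals[idx]

def posterior_band(draws, lower=0.05, upper=0.95):
    """Return (lower, upper) limits of expected frequency at every score."""
    dists = [marginal_score_dist(d["b"], d["mu"], d["sigma"]) for d in draws]
    band = []
    for x in range(len(dists[0])):
        vals = sorted(dist[x] for dist in dists)
        band.append((percentile(vals, lower), percentile(vals, upper)))
    return band

# Hypothetical stand-in for MCMC output: 200 noisy draws for three items
random.seed(1)
draws = [{"b": [random.gauss(b0, 0.1) for b0 in (-1.0, 0.0, 1.0)],
          "mu": random.gauss(0.0, 0.05),
          "sigma": 1.0} for _ in range(200)]
band = posterior_band(draws)
for score, (lo, hi) in enumerate(band):
    print(score, round(lo, 3), round(hi, 3))
```

The observed proportion of candidates at each score can then be compared with these limits; no simulated response data are needed, which avoids the large replicated datasets of a conventional PPMC.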
Item x test correlations
[Figure: item-test correlation (0.4 to 0.8) against item (2 to 10); series by type: simulated, observed]
Inter-item correlations
[Figure: grid of item pairs (item against item); legend ppp: <0.05, 0.05<ppp<0.95, >0.95]
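The ppp values in the legend are posterior predictive p-values. A minimal sketch of how one might be computed and binned is given below, assuming the common definition of the ppp as the proportion of replicated statistics at least as large as the observed one; the function names and the numbers in the example are hypothetical.

```python
def ppp(observed, replicated):
    """Posterior predictive p-value: the share of replicated statistics
    (for example, an inter-item correlation computed on each simulated
    dataset) that are at least as large as the observed statistic."""
    return sum(1 for r in replicated if r >= observed) / len(replicated)

def flag(value, lower=0.05, upper=0.95):
    """Bin a ppp value into the three categories used in the legend."""
    if value < lower:
        return "<0.05"
    if value > upper:
        return ">0.95"
    return "0.05<ppp<0.95"

# Hypothetical observed correlation and replicated correlations
value = ppp(0.5, [0.1, 0.2, 0.6, 0.7])
print(value, flag(value))  # 0.5 0.05<ppp<0.95
```

Values near 0 or 1 indicate that the observed statistic sits in the tail of what the model predicts for that item pair.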
References
Curtis, S. M. (2010). BUGS Code for Item Response Theory.
Journal of Statistical Software, Code Snippets, 36(1), 1-34.
Linacre, J. M. (2008). A user's guide to WINSTEPS®
MINISTEP: Rasch-Model Computer Programs (Program
Manual 3.66.0).
de Jong, J., & Linacre, J. M. (1993). Rasch Measurement
Transactions, 7(2), 296-297.
http://www.rasch.org/rmt/rmt72n.htm
Copyright © 2010 AQA and its licensors. All rights reserved.
The Assessment and Qualifications Alliance (AQA) is a company limited by guarantee registered in England and Wales (company number 3644723)
and a registered charity (registered charity number 1073334). Registered address: AQA, Devas Street, Manchester M15 6EX.