Posterior predictive model checking (PPMC) using the Lord and Wingersky recursive algorithm
Dr Chris Wheadon, UK Rasch Day, January 2011, Durham.
Version 2.0 Copyright © AQA and its licensors. All rights reserved.

The Purpose of Model Fit
Perfection is never obtained in empirical data. What we really want to test is the hypothesis "Do the data fit the model usefully?" (Linacre, 2008, p.402)
• Fit can be analysed at the item level or at the test level
• Why does Winsteps under-report fit at the test level?

Global fit?
"Why am I repulsed by global fit tests - the neutron bombs of statistical practice? Because misfit is never global and never a statistical event, it is always local and idiosyncratic." (de Jong and Linacre, 1993)

Fit at the test level
• However, predictions are often made at the test level from distributions of true scores
• Typically we equate true scores, not observed scores
• Classification accuracy and consistency indices compare true scores with cumulative probability distributions derived from those true scores
• So there is a need to understand how well the observed summed score distribution is modelled by the Rasch model

The R0 test
The statistic is based on evaluating the difference between the observed and expected score distributions, given estimates of the item and population parameters.
Outcome of R0-statistic: 1306.6440, df = 364, prob(R0) = .0000
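The comparison this statistic rests on, observed versus expected summed score distribution, can be sketched in a few lines. The example below is a minimal sketch, assuming dichotomous Rasch items and a normal ability distribution approximated on an evenly spaced grid. The function names (`rasch_p`, `score_dist_given_theta`, `expected_score_dist`), the item difficulties and the observed proportions are all hypothetical and are not taken from the talk. It builds the conditional number-correct distribution with the Lord and Wingersky recursion named in the title, then averages over ability.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def score_dist_given_theta(probs):
    """Lord-Wingersky recursion for 0/1 items.

    probs[i] is the probability of a correct response to item i.
    Returns f where f[x] = P(number-correct score = x).
    """
    f = [1.0]  # before any item is added, score 0 has probability 1
    for p in probs:
        g = [0.0] * (len(f) + 1)
        for x, fx in enumerate(f):
            g[x] += fx * (1.0 - p)   # item answered incorrectly: score stays x
            g[x + 1] += fx * p       # item answered correctly: score becomes x + 1
        f = g
    return f

def expected_score_dist(difficulties, mu=0.0, sigma=1.0, n_points=81, width=4.0):
    """Marginal score distribution under a normal(mu, sigma) ability distribution.

    Assumption: the ability distribution is approximated on an evenly spaced
    grid spanning mu +/- width * sigma, with normalised density weights.
    """
    thetas = [mu + sigma * (-width + 2.0 * width * i / (n_points - 1))
              for i in range(n_points)]
    weights = [math.exp(-0.5 * ((t - mu) / sigma) ** 2) for t in thetas]
    total = sum(weights)
    dist = [0.0] * (len(difficulties) + 1)
    for t, w in zip(thetas, weights):
        cond = score_dist_given_theta([rasch_p(t, b) for b in difficulties])
        for x, fx in enumerate(cond):
            dist[x] += (w / total) * fx
    return dist

# Hypothetical item difficulties and observed score proportions (illustration only).
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
observed = [0.05, 0.15, 0.25, 0.28, 0.19, 0.08]
expected = expected_score_dist(difficulties)
for x, (o, e) in enumerate(zip(observed, expected)):
    print(f"score {x}: observed {o:.3f}  expected {e:.3f}  diff {o - e:+.3f}")
```

Because the recursion works on probabilities directly, the expected distribution is obtained without simulating any individual responses.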
Posterior Predictive Model Checking
• A popular Bayesian diagnostic tool
• Compares observed data with replicated data
• Lends itself to graphical display
• BUT yields large simulated datasets which can be intensive to process (1,000 simulations of 1,000 candidates on 21 items yield 21 million responses)

PPMC
• We are interested in evaluating the difference between the observed and expected score distributions, given estimates of the item and population parameters.
• The Lord and Wingersky recursive algorithm determines the conditional distribution of number-correct scores, at any given level of ability, from the probabilities of correct and incorrect responses to each item.
• Posterior draws of the model parameters allow the likely distribution of these conditional distributions to be modelled.

Estimating the number correct frequency distribution
r   x   f_r(x)                               Using theta = -2
1   0   f1(0) = (1 - p1)                     = (1 - .26) = .74
1   1   f1(1) = p1                           = .26
2   0   f2(0) = f1(0)(1 - p2)                = .74(1 - .27) = .5402
2   1   f2(1) = f1(1)(1 - p2) + f1(0)p2      = .26(1 - .27) + .74(.27) = .3896
2   2   f2(2) = f1(1)p2                      = .26(.27) = .0702

The Lord and Wingersky recursive algorithm
Define $f_1(x = W_{11}) = p_{11}$ as the probability of earning a score in the first category of item 1, $f_1(x = W_{12}) = p_{12}$ as the probability of earning a score in the second category of item 1, and so forth up to the last category of item 1, $f_1(x = W_{1 m_1}) = p_{1 m_1}$. Then for $r > 1$, the recursion formula for finding the probability of earning score $x$ after the $r$-th item is added is

$$f_r(x) = \sum_{k=1}^{m_r} f_{r-1}(x - W_{rk})\, p_{rk}$$

for $x$ between $\min_r$ and $\max_r$, where $W_{rk}$ is the score for category $k$ of item $r$, $m_r$ is the number of categories of item $r$, and $\min_r$ and $\max_r$ are the minimum and maximum scores after adding the $r$-th item. Note that when $x - W_{rk} < \min_{r-1}$ or $x - W_{rk} > \max_{r-1}$, then $f_{r-1}(x - W_{rk}) = 0$, by definition. For items scored 0/1 this reduces to $f_r(x) = f_{r-1}(x)(1 - p_r) + f_{r-1}(x - 1)\, p_r$, the form used in the worked table above.

A solution using WinBUGS, R and C++
• WinBUGS: http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml
• IRT models: Curtis, S.M.
(2010), BUGS Code for Item Response Theory, Journal of Statistical Software, Code Snippets, 36(1), 1-34
• R, R2WinBUGS, inline
• Rtools: building packages for R under Microsoft Windows http://www.murdoch-sutherland.com/Rtools/
• R scripts: https://github.com/cbwheadon/predicted_scores

Run the model…

Expected summed score distribution within 5 and 95 per cent of posterior draws of parameters
[Figure: frequency against score; series: model, l.limit, observed, u.limit]

Item x test correlations
[Figure: item-test correlation against item; type: simulated, observed]

Inter-item correlations
[Figure: item against item; cells classified by ppp: <0.05, 0.05<ppp<0.95, >0.95]

References
Curtis, S.M. (2010). BUGS Code for Item Response Theory. Journal of Statistical Software, Code Snippets, 36(1), 1-34.
Linacre, J.M. (2008). A user's guide to WINSTEPS® MINISTEP: Rasch-Model Computer Programs (Program Manual 3.66.0).
de Jong, J. and Linacre, J.M. (1993). Rasch Measurement Transactions, 7:2, p.296-7. http://www.rasch.org/rmt/rmt72n.htm

Copyright © 2010 AQA and its licensors. All rights reserved. The Assessment and Qualifications Alliance (AQA) is a company limited by guarantee registered in England and Wales (company number 3644723) and a registered charity (registered charity number 1073334). Registered address: AQA, Devas Street, Manchester M15 6EX.
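The posterior predictive p-values (ppp) used to classify cells in the inter-item correlation figure come from comparing an observed statistic with the same statistic computed in each replicated dataset. The sketch below assumes the common definition of ppp as the proportion of replicated values at or above the observed value; the talk does not state its exact definition. The function names `ppp` and `flag` and all numbers are hypothetical.

```python
def ppp(observed, replicated):
    """Proportion of replicated statistics that are >= the observed statistic."""
    if not replicated:
        raise ValueError("need at least one replicated value")
    return sum(1 for r in replicated if r >= observed) / len(replicated)

def flag(p, lower=0.05, upper=0.95):
    """Classify a ppp value using the cut-offs shown in the figure legend."""
    if p < lower:
        return "<0.05"
    if p > upper:
        return ">0.95"
    return "0.05<ppp<0.95"

# Hypothetical replicated correlations for one item pair (illustration only).
replicated = [0.30, 0.32, 0.35, 0.28, 0.31, 0.33, 0.29, 0.34, 0.36, 0.27]
for obs in (0.20, 0.31, 0.40):
    p = ppp(obs, replicated)
    print(obs, p, flag(p))
```

Values of ppp near 0 or 1 indicate that the observed statistic lies in a tail of what the model replicates, which is why both extremes are flagged.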