Jan Štochl, Ph.D.
Department of Psychiatry, University of Cambridge
Email: js883@cam.ac.uk

Comparison of maximum likelihood and Bayesian estimation of the Rasch model: What do we gain by using the Bayesian approach? A comparison of results from the General Health Questionnaire

Content of the presentation
• Brief introduction to the concept of Bayesian statistics
• Using R and WinBUGS for estimation of the Bayesian Rasch model
• Analysis and comparison of both methodologies on the General Health Questionnaire

General ideas and introduction to Bayesian statistics
A bit of theory...

What is Bayesian statistics?
• An alternative to classical statistical inference (classical statisticians are called "frequentists")
• Bayesians view probability as a statement of uncertainty. In other words, probability can be defined as the degree to which a person (or community) believes that a proposition is true.
• This uncertainty is subjective (it differs across researchers)

Bayesians versus frequentists
• A frequentist is a person whose long-run ambition is to be wrong 5% of the time
• A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule

Bayes' theorem and modeling
• Our situation: fit a model to the observed data
• Models give the probability of obtaining the data, given some parameters: P(X|θ)
• This is called the likelihood
• We want to use this to learn about the parameters

Inference
• We observe some data, X, and want to make inferences about the parameters from the data, i.e. to find out about P(θ|X)
• We have a model, which gives us the likelihood P(X|θ)
• We need to use P(X|θ) to find P(θ|X), i.e. to invert the probability

Bayes' theorem
• Published in 1763
• Allows us to go from P(X|θ) to P(θ|X):

    P(θ|X) = P(θ) P(X|θ) / P(X)

Here P(θ) is the prior distribution of the parameters, P(θ|X) is the posterior distribution, and P(X) is just a constant.
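Bayes' theorem can be made concrete with a small numerical sketch. The following Python example (hypothetical coin-flip data; the talk itself uses R and OpenBUGS, so this is illustration only) computes a posterior on a grid of parameter values, and also checks the sequential-updating property that the posterior from a first batch of data serves as the prior for the next batch:

```python
# Discrete illustration of Bayes' theorem: posterior = prior * likelihood / evidence.
# Hypothetical example: estimating a coin's heads-probability theta on a grid.

def posterior_on_grid(thetas, prior, heads, tails):
    """Normalized posterior P(theta | X) over a grid of theta values."""
    # P(theta) * P(X | theta), with a binomial likelihood (constant factor dropped)
    unnorm = [p * (t ** heads) * ((1 - t) ** tails) for p, t in zip(prior, thetas)]
    evidence = sum(unnorm)                 # P(X): just a normalizing constant
    return [u / evidence for u in unnorm]

thetas = [i / 100 for i in range(1, 100)]  # grid 0.01 .. 0.99
flat = [1 / len(thetas)] * len(thetas)     # uniform (vague) prior

# All data at once: 7 heads, 3 tails.
post_all = posterior_on_grid(thetas, flat, 7, 3)

# Sequentially: the posterior after X1 (4 heads, 1 tail) becomes the prior for X2.
post_x1 = posterior_on_grid(thetas, flat, 4, 1)
post_seq = posterior_on_grid(thetas, post_x1, 3, 2)

# Both routes give the same posterior (up to floating-point error),
# and its mode sits at the maximum-likelihood value 0.7.
```

The evidence P(X) never needs to be known in advance; it falls out of the normalization, which is why it can be treated as a constant above.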
    P(X) = ∫ P(X|θ) P(θ) dθ

Bayes' theorem and adding more data
• Suppose we observe some data, X1, and get a posterior distribution:

    P(θ|X1) ∝ P(θ) P(X1|θ)

• What if we later observe more data, X2? If this is independent of X1, then

    P(X1, X2|θ) = P(X1|θ) P(X2|θ)

so that

    P(θ|X1, X2) ∝ P(θ) P(X1|θ) P(X2|θ)

i.e. the first posterior is used as the prior to obtain the second posterior.

Features of the Bayesian approach
• Flexibility to incorporate expert opinion on the parameters
• Although the concept is easy to understand, it is not easy to compute. Fortunately, MCMC methods have been developed
• Finding a prior distribution can be difficult
• Misspecification of the priors can be dangerous
• The less data you have, the higher the influence of the priors
• The more informative the priors, the more they influence the final estimates

When to use the Bayesian approach?
• When the sample size is small
• When the researcher has knowledge about the parameter values (e.g. from previous research)
• When there are lots of missing data
• When some respondents have too few responses to estimate their ability
• Can be useful for test equating
• Item banking

OpenBUGS
• Can handle many types of data (including polytomous)
• Can handle many types of models (SEM, IRT, multilevel, ...)
• Models can be specified either in the syntax language or via a special graphical interface (doodles)
• Provides standard errors of the estimates
• Provides (Bayesian) fit statistics
• Can be driven remotely from R (packages "R2WinBUGS", "R2OpenBUGS", "BRugs", "rbugs", ...)
• Results from OpenBUGS can be exported to R and analyzed further (packages "coda", "boa")

Practical comparison of maximum likelihood and Bayesian estimation of the Rasch model
General Health Questionnaire, items 1-7

General Health Questionnaire (GHQ)
• 28 items, scored dichotomously (0 and 1), 4 unidimensional subscales (7 items each)
• Only one subscale is analyzed (items 1-7)
• The Rasch model is used; maximum likelihood estimates are obtained
in R (package "ltm"), Bayesian estimates in OpenBUGS (and analyzed in R)
• 2 runs in OpenBUGS:
  - the first with vague (uninformative) priors for the difficulty parameters (normal distribution with mean = 0 and SD = 10)
  - the second with a mix of informative and uninformative priors for the difficulty parameters (to demonstrate the influence of the priors)

Item fit of the Rasch (1PLM) model and the Mokken model

Item    Difficulty  Discrimination  Chi-square  p-value
GHQ15   1.72        3.57             30.02      <0.0001
GHQ16   1.23        3.57             47.35      <0.0001
GHQ17   1.26        3.57            104.02      <0.0001
GHQ18   1.37        3.57             50.11      <0.0001
GHQ19   1.82        3.57             13.50      0.02
GHQ20   1.51        3.57             18.47      0.00
GHQ21   1.95        3.57              6.00      0.31

Mokken scale analysis (no model violations for any item):

Item    ItemH   #vi (monotonicity)  #vi (intersection)  #z  #t
GHQ15   0.57    0                   0                   0   0
GHQ16   0.63    0                   0                   0   0
GHQ17   0.55    0                   0                   0   0
GHQ18   0.55    0                   0                   0   0
GHQ19   0.59    0                   0                   0   0
GHQ20   0.66    0                   0                   0   0
GHQ21   0.67    0                   0                   0   0

Snapshots
[Figure: MCMC trace plots and posterior density estimates for the difficulty parameters delta[1] to delta[4]; N = 7000 iterations per chain, density bandwidths 0.015 to 0.019]

Recovery of difficulty parameters

                Maximum likelihood        Bayesian (vague priors)                      Bayesian (informative priors)
Item    Difficulty  Std. error    Prior mean  Prior SD  Difficulty  Std. error    Prior mean  Prior SD  Difficulty  Std. error
GHQ01   2.367       0.1097        0           10        2.369       0.1102        2.0          3.16     2.325       0.1088
GHQ02   2.293       0.1076        0           10        2.295       0.1071        1.2          1.00     2.239       0.1070
GHQ03   2.324       0.1085        0           10        2.327       0.1087        1.8         10.00     2.283       0.1077
GHQ04   2.958       0.1304        0           10        2.962       0.1307        0.0         10.00     2.914       0.1300
GHQ05   4.108       0.1970        0           10        4.120       0.1976        0.0          0.32     3.220       0.1306
GHQ06   4.108       0.1970        0           10        4.122       0.1988        3.0          1.00     4.027       0.1889
GHQ07   3.813       0.1757        0           10        3.820       0.1766        0.0         31.62     3.770       0.1746

Further reading and software

General literature on Bayesian IRT analysis
• Congdon, P. (2006). Bayesian Statistical Modelling, 2nd edition. Wiley.
• Congdon, P. (2005). Bayesian Methods for Categorical Data. Wiley.
• Congdon, P. (2003). Applied Bayesian Modelling. Wiley.
• WinBUGS User Manual (available online): http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/manual14.pdf
• WinBUGS discussion archive: http://www.jiscmail.ac.uk/lists/bugs.html
• Lee, S. Y. (2007). Structural Equation Modelling: A Bayesian Approach. Wiley.
• Iversen, G. R. (1984). Bayesian Statistical Inference. Sage.

Available software
• WinBUGS, OpenBUGS, JAGS (freely available)
• R (freely available) - package "mokken"
• Mplus (commercial)
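The pull of an informative prior seen in the recovery of the difficulty parameters (e.g. GHQ05, where a tight N(0, 0.32) prior moves the estimate from about 4.11 to 3.22) can be sketched in miniature. The Python example below is a deliberate simplification, not the talk's actual estimation: person abilities are treated as known, the responses are made up, and the posterior mode is found on a grid rather than by MCMC.

```python
import math

def rasch_p(theta, b):
    """P(positive response | ability theta, item difficulty b) under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_posterior(b, responses, abilities, prior_mean, prior_sd):
    """Log posterior (up to a constant) of one item difficulty b with a normal prior."""
    ll = sum(math.log(rasch_p(t, b)) if x else math.log(1.0 - rasch_p(t, b))
             for x, t in zip(responses, abilities))
    log_prior = -0.5 * ((b - prior_mean) / prior_sd) ** 2
    return ll + log_prior

# Hypothetical mini data set: a hard item, answered positively only by the ablest.
abilities = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
responses = [0, 0, 0, 0, 0, 0, 1, 1]

grid = [i / 100 for i in range(-200, 501)]   # candidate difficulties -2.00 .. 5.00
map_vague = max(grid, key=lambda b: log_posterior(b, responses, abilities, 0, 10))
map_tight = max(grid, key=lambda b: log_posterior(b, responses, abilities, 0, 0.5))
# With the vague N(0, 10) prior the mode sits near the ML estimate;
# the tight N(0, 0.5) prior shrinks the difficulty toward the prior mean of 0.
```

With so few responses, the prior dominates exactly as the "less data, higher influence of priors" slide warns.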
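OpenBUGS produces the trace and density snapshots shown earlier via MCMC. As a rough illustration of what such a sampler does (OpenBUGS chooses its samplers automatically; nothing below is its actual algorithm), this sketch runs a hand-rolled random-walk Metropolis chain on a hypothetical one-parameter posterior:

```python
import math
import random

def log_post(theta):
    """Log posterior (up to a constant) of a coin's heads-probability theta
    after 7 heads and 3 tails under a uniform prior, i.e. a Beta(8, 4)."""
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return 7 * math.log(theta) + 3 * math.log(1.0 - theta)

random.seed(1)
theta, chain = 0.5, []
for _ in range(20000):
    prop = theta + random.gauss(0, 0.1)      # symmetric random-walk proposal
    # Metropolis acceptance step on the log scale
    if math.log(random.random()) < log_post(prop) - log_post(theta):
        theta = prop
    chain.append(theta)

burned = chain[2000:]                        # discard burn-in, as in the trace plots
mean = sum(burned) / len(burned)
# The chain mean approximates the Beta(8, 4) posterior mean, 8/12 ~ 0.667.
```

Plotting `chain` against iteration number gives exactly the kind of trace plot OpenBUGS exports to R via the "coda" package; a kernel density of `burned` gives the companion density plot.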