Lecture 15: Generalised Linear Mixed Models in WinBUGS

Lecture Contents
• Differences in WinBUGS methods
• Rejection sampling algorithm
• Adaptive rejection (AR) sampling
• Reunion Island dataset
• Hierarchical centering
• Probit formulation

Logistic regression model (recap)
• A standard Bayesian logistic regression model (e.g. for the rat tumour example) can be written as follows:

$y_i \sim \mathrm{Binomial}(n_i, p_i), \quad \mathrm{logit}(p_i) = \beta_0 + \beta_1 x_i,$
$\beta_0 \sim N(0, m_0), \quad \beta_1 \sim N(0, m_1).$

• Both MLwiN and WinBUGS can fit this model, but can we write out the conditional posterior distributions and use Gibbs sampling?

Conditional distribution for β0

$p(\beta_0 \mid y, \beta_1) \propto p(\beta_0)\, p(y \mid \beta_0, \beta_1)$
$\propto \frac{1}{\sqrt{2\pi m_0}} \exp\!\left(-\frac{\beta_0^2}{2 m_0}\right) \prod_i \left(\frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)}\right)^{y_i} \left(\frac{1}{1 + \exp(\beta_0 + \beta_1 x_i)}\right)^{n_i - y_i}$

This is not a standard distribution, so we cannot simply simulate from it with a standard random number generator. However, both WinBUGS and MLwiN can fit this model using MCMC. In this lecture we describe how WinBUGS deals with the problem.

Why not use Gibbs?
To use Gibbs sampling we would need to generate samples from the distribution $p(\beta_0 \mid y, \beta_1)$. Although this is not a standard distribution, and so there is no readily available function for generating directly from it, other methods exist. The method that WinBUGS uses is called adaptive rejection (AR) sampling, which can be used to sample from any log-concave density. Before going on to AR sampling we will consider rejection sampling in general.

Rejection Sampling
Assume we wish to generate from a density f(x) but it is difficult to do so directly. Suppose, however, that there is a density g(x) that is easy to sample from, and a constant M such that f(x) ≤ M g(x) for all x. (Finding such an envelope is often easier for distributions with a bounded range, but the method can also be used for unbounded distributions.) We then sample a value x from g(x) and accept it with probability f(x)/(M g(x)); if it is rejected, we sample again until a value is accepted. The algorithm works best when g(x) is close to f(x) and M is small.

Truncated Normal example
Suppose we wish to generate random variables from a standard Normal truncated at -2 and +2. The standard method would be to generate from a Normal(0,1) and reject values outside the range (-2, 2). An alternative is to generate from a (scaled) Uniform distribution on (-2, 2). The probability of acceptance is then around 0.95 × √(2π)/4 ≈ 0.595; a run of 5,000 random draws gave an acceptance rate of 0.606.

Adaptive Rejection Sampling
AR sampling (Gilks and Wild, 1992) takes rejection sampling a step further by constructing the envelope function g(x) cleverly. The algorithm works for all log-concave densities f(x). A few points are chosen and the tangent to log f(x) is constructed at each of them. These tangents are joined to form a piecewise linear function that is an envelope for log f(x) (exponentiating it gives an envelope for f(x) itself). When a point is sampled from this envelope it is accepted or rejected using rejection sampling; in addition, whenever a point is rejected the tangent there is evaluated and the envelope is updated to be a closer approximation to log f(x).

[Figure: an illustration of AR sampling]
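To make the rejection step concrete, here is a minimal sketch of the truncated Normal example above, assuming Python with numpy; the seed and the truncation constant 0.9545 (= P(-2 < Z < 2)) are illustrative choices, not part of the lecture material.

import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # density of N(0,1) truncated to (-2, 2); 0.9545 = P(-2 < Z < 2)
    return np.exp(-0.5 * x ** 2) / (np.sqrt(2.0 * np.pi) * 0.9545)

g_density = 0.25           # the Uniform(-2, 2) proposal density
M = f(0.0) / g_density     # smallest M with f(x) <= M * g(x) on (-2, 2)

accepted = 0
proposals = 0
while accepted < 5000:
    x = rng.uniform(-2.0, 2.0)                    # propose from g
    proposals += 1
    if rng.uniform() < f(x) / (M * g_density):    # accept with probability f(x) / (M g(x))
        accepted += 1

print("acceptance rate:", accepted / proposals)

With this flat envelope the theoretical acceptance probability is 1/M ≈ 0.6, which matches the empirical rate quoted on the slide.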
WinBUGS 1.4 New Method
• In fact, for the models we consider here, WinBUGS 1.4 no longer uses AR sampling for the fixed effects.
• Instead it uses a method due to Gamerman (1997). This is essentially a Metropolis-Hastings algorithm in which, at each iteration, the multivariate Normal proposal distribution is formed by performing one iteration of iterative weighted least squares (IWLS), starting at the current point. IWLS is similar to the IGLS estimation routine used in MLwiN for MQL/PQL estimates.
• To see the AR sampler in action you will need to use WinBUGS 1.3, or go to the Options/Blocking options screen and remove the tick. (You can try this in the practical if you wish.)

Reunion Island 2-level dataset
We will consider here the 2-level Reunion Island dataset, as analysed in MLwiN, and the final model fitted there.

WinBUGS code for the model

model {
  # Level 1 definition
  for (i in 1:N) {
    fscr[i] ~ dbin(p[i],denom[i])
    logit(p[i]) <- beta[1] * cons[i]
      + beta[2] * ai[i]
      + u2[herd[i]] * cons[i]
  }
  # Higher level definitions
  for (j in 1:n2) {
    u2[j] ~ dnorm(0,tau.u2)
  }
  # Priors for fixed effects
  for (k in 1:2) { beta[k] ~ dflat() }
  # Priors for random terms
  tau.u2 ~ dgamma(0.001000,0.001000)
  sigma2.u2 <- 1/tau.u2
}

Here we see the code for a 2-level logistic regression model. Note the use of dbin for the Binomial distribution and of the logit link function. The file also contains initial values, taken from the 1st-order MQL run of this model, and the data values. We are using Gamma(ε, ε) priors for the precision.

WinBUGS node info
On the Info menu you can select Node info. This will tell you (in a funny language) which method is being used to update each node. For this model we get:
• beta[1]: UpdaterGLM.LogitUpdater – the Gamerman method.
• u2[1]: UpdaterRejection.Logit – the AR method (I think!).
• tau.u2: UpdaterGamma.Updater – conjugate Gibbs sampling.

WinBUGS chains
The model was run for 5,000 iterations after a burn-in of 500, which took just over 2 minutes. The chains all look reasonable.

[Figure: trace plots of beta[1], beta[2] and sigma2.u2 against iteration (501 onwards)]

Point Estimates
Estimates are similar to those from MLwiN:

Node        Mean     SD
beta[1]      0.651   0.169
beta[2]     -1.113   0.174
sigma2.u2    0.091   0.056

[Figure: MLwiN estimates after 50,000 iterations]

Reunion Island 3-level dataset
We will now consider the 3-level Reunion Island dataset, as analysed in MLwiN, and the final model, which gave estimation difficulties there.

WinBUGS chains
The model was run for 5,000 iterations after a burn-in of 500, which took 10 minutes! Although the intercept chain looks good, and better than the MH chain, the cow-level variance has clearly not converged yet, so the WinBUGS method does not fix the problem.

[Figure: trace plots of beta[1] and sigma2.u2 against iteration (501 onwards)]
[Figure: between-cow variance chain after 50,000 iterations using WinBUGS]

Hierarchical centering formulation
This is a method for improving the mixing of an MCMC algorithm. Consider the variance components (VC) model

$y_{ij} = \beta_0 + u_j + e_{ij}, \quad u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2).$

This can be written equivalently as

$y_{ij} = u_j^* + e_{ij}, \quad u_j^* \sim N(\beta_0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2),$

and the Gibbs sampling algorithm written using this parameterisation. This may improve mixing if $u_j^*$ is less correlated with $\beta_0$ than $u_j$ is. See Browne (2004) for models where this works well.

Hierarchical centering in WinBUGS
It is easy to modify the WinBUGS code to fit a hierarchically centred model, as shown below:

model {
  # Level 1 definition
  for (i in 1:N) {
    fscr[i] ~ dbin(p[i],denom[i])
    logit(p[i]) <- beta[2] * ai[i] + u2[cow[i]]
  }
  # Higher level definitions
  for (j in 1:n2) {
    u2[j] ~ dnorm(u3[herd[j]],tau.u2)
  }
  for (j in 1:n3) {
    u3[j] ~ dnorm(beta[1],tau.u3)
  }
  # Priors for fixed effects
  for (k in 1:2) { beta[k] ~ dflat() }
  # Priors for random terms
  tau.u2 ~ dgamma(0.001000,0.001000)
  sigma2.u2 <- 1/tau.u2
  tau.u3 ~ dgamma(0.001000,0.001000)
  sigma2.u3 <- 1/tau.u3
}

Note that the main difficulty is that, for a 3-level model, the mapping vector herd must now map cows to herds rather than observations to herds. This means changing the data file.
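Since cows are nested within herds, that cow-level herd vector can be derived from the observation-level cow and herd identifiers already in the 2-level data file. Here is a minimal sketch of how it could be built, assuming Python with numpy and made-up index vectors (the real vectors would come from the WinBUGS data list):

import numpy as np

# Hypothetical observation-level identifiers (1-based, as in the WinBUGS data file)
cow  = np.array([1, 1, 2, 2, 3, 3, 4])
herd = np.array([1, 1, 1, 1, 2, 2, 2])

n2 = cow.max()                       # number of cows
herd_of_cow = np.zeros(n2, dtype=int)
for c, h in zip(cow, herd):
    herd_of_cow[c - 1] = h           # consistent because each cow belongs to one herd

print(herd_of_cow)                   # [1 1 1 2]: the herd[] vector of length n2

The resulting vector (one entry per cow) replaces the observation-level herd vector in the data list read by the centred model.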
Trace plot for the hierarchically centred formulation
[Figure: trace plot for the hierarchically centred formulation]
Unfortunately this does not improve things much! The parameter expansion method, also discussed in Browne (2004), may work better here and would be worth examining.

Estimates comparison
In the following table we compare estimates after 50,000 iterations (100,000 for MLwiN) from the three methods (values in brackets are the corresponding standard deviations) and see reasonable agreement.

Parameter   MLwiN            WinBUGS          Hierarchical centering
β0           0.563 (0.125)    0.560 (0.127)    0.570 (0.131)
β1          -1.014 (0.130)   -1.007 (0.130)   -1.021 (0.131)
σ²v          0.094 (0.043)    0.094 (0.043)    0.096 (0.044)
σ²u          0.229 (0.137)    0.202 (0.139)    0.231 (0.127)

Probit Regression
The logistic link function is only one possible link for Binomial data. Any function that maps from (0,1) to the whole real line will do, and another popular choice is the probit link (the inverse of the Normal cdf). This link, combined with the Normal distribution, can work to our advantage and allows another way of using Gibbs sampling for a binary-data model: through the use of latent variables.

Latent variable approach
One source of binary data is the thresholding of a continuous response. For example, in education students take exams and, rather than a mark, we may only observe whether each student passes or fails. The latent variable approach works like the reverse of this: we see the binary response and from it we generate the underlying continuous variable.

Simple example
Consider the random effects probit regression model:

$y_{ij} \sim \mathrm{Bernoulli}(p_{ij}), \quad \mathrm{probit}(p_{ij}) = X_{ij}\beta + u_j, \quad u_j \sim N(0, \sigma_u^2).$

This model is equivalent to the following Normal response model:

$y_{ij}^* = X_{ij}\beta + u_j + e_{ij}, \quad u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, 1),$

where $y_{ij}^*$ is unobserved but is restricted to positive values when $y_{ij} = 1$ and to negative values when $y_{ij} = 0$. (The level-1 variance is fixed at 1 because the probit link is the inverse of the standard Normal cdf.)

Gibbs Sampling algorithm
We then have four steps for our latent variable model:
1. Generate $y_{ij}^*$ from its truncated Normal conditional distribution, for all i and j.
2. Generate $\beta$ from its (multivariate) Normal conditional distribution.
3. Generate $u_j$ from its Normal conditional distribution, for each j.
4. Generate $\sigma_u^2$ from its inverse Gamma conditional distribution.

Probit model for the 2-level Reunion Island dataset (demo)
Using Gibbs sampling in MLwiN.
[Figure: trace plots from only 5,000 iterations]

Probit for the 3-level Reunion Island dataset
• Still not great, but the effective sample size (ESS) is up to 206!

Introduction to the Practical
In the next practical you will have the chance to explore the methods from this lecture on the two datasets from the last practical:
• The Bangladesh contraceptive use dataset.
• The pig pneumonia dataset.
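As a starting point for experimenting with the probit formulation, here is a minimal sketch, assuming Python with numpy and scipy, of step 1 of the latent-variable Gibbs algorithm above (the truncated Normal draws). The linear predictor values and responses are made up for illustration and are not from either dataset.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

def draw_latent(y, eta):
    # Draw y* from a Normal(eta, 1) truncated to (0, inf) where y = 1 and to
    # (-inf, 0) where y = 0, by inverting the standard Normal cdf.
    lower = np.where(y == 1, norm.cdf(-eta), 0.0)   # bounds for the uniform draw,
    upper = np.where(y == 1, 1.0, norm.cdf(-eta))   # expressed on the cdf scale
    u = rng.uniform(lower, upper)
    return eta + norm.ppf(u)

# Hypothetical linear predictors X_ij * beta + u_j and binary responses
eta = np.array([0.3, -0.8, 1.1, 0.0])
y = np.array([1, 0, 1, 0])
print(draw_latent(y, eta))   # positive where y = 1, negative where y = 0

Given the sampled y*, steps 2-4 are then just the usual Normal-response Gibbs updates applied with y* in place of y.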