Bayes, BUGs and Beyond Bayes Paradigm Chapter 16, ARM We have used lmer successfully to fit multilevel models. Issue: uses point estimate of variance (if 0, drop the random effect). A strategy: Start with lm or glm. Fit in lmer using multiple levels. Use BUGs or JAGs Each parameter has a prior distribution. They use: Gaussian for group-level models or Uniform distributions (noninformative) and caution (p348) that we should check robustness to prior specification. Combine prior with likelihood to get a posterior which answers any question. More R programming Stat 506 Gelman & Hill, Chapter 16 Regression Gelman & Hill, Chapter 16 More Complexity We’ve seen that classical regression estimates are posterior means given a U(−∞, ∞) prior on each component of β and U(0, ∞) on log(σ). Interpretation is different: versus Stat 506 b σ 2 (XT X)−1 ) β|y, σ ∼ N(β, b ∼ N(β, σ 2 (XT X)−1 ) β|σ y i = Xi β + Z i b i + i Same uniform prior on β and log(σ). bi ∼ N(0, Ψ) Varying Intercept: yi ∼ N(αj[i] + βxi , σy2 ) with prior αj ∼ N(µα , σα2 ) Stat 506 Gelman & Hill, Chapter 16 Stat 506 Gelman & Hill, Chapter 16 Fit Radon data in R lm fit lm.pooled <- lm(log.radon ~ floor, data = mn) display.xtable(lm.pooled) srrs2 <- read.table("http://www.stat.columbia.edu/~gelman/arm/examples/radon/srrs2.dat", head = TRUE, sep = ",") Estimate Std. Error t value mn <- droplevels(subset(srrs2, state == "MN")) (Intercept) 1.33 0.03 44.64 names(mn)[19] <- "radon" mn$log.radon <- y <- log(pmax(0.1, mn$radon)) floor -0.61 0.07 -8.42 n <- nrow(mn) levels(mn$county) <- gsub("[[:space:]]$", "", levels(mn$county)) Table: n = 919 rank = 2 resid sd = 0.823 R-Squared = 0.072 ybarbar <- mean(y) sample.size <- as.vector(table(mn$county)) n.county <- length(sample.size) sample.size.jittered <- sample.size * exp(runif(n.county, -0.1, 0.1)) lm.unpooled.1 <- lm(log.radon ~ 0 + floor + county, cty.mns <- tapply(y, mn$county, mean) display.xtable(lm.unpooled.1) cty.vars <- tapply(y, mn$county, var) cty.sds <- mean(sqrt(cty.vars[!is.na(cty.vars)]))/sqrt(sample.size) cty.sds.sep <- sqrt(cty.vars/sample.size) Stat 506 Gelman & Hill, Chapter 16 JAGs setup Build a model file containing: model { for (i in 1:n){ y[i] ~ dnorm (y.hat[i], tau.y) y.hat[i] <- a[county[i]] + b*x[i] } b ~ dnorm (0, .0001) tau.y <- pow(sigma.y, -2) sigma.y ~ dunif (0, 100) for (j in 1:J){ a[j] ~ dnorm (mu.a, tau.a) } mu.a ~ dnorm (0, .0001) tau.a <- pow(sigma.a, -2) sigma.a ~ dunif (0, 100) } Same code works in BUGs or JAGs Stat 506 Gelman & Hill, Chapter 16 data = Estimate Std. Error t value floor -0.72 0.07 -9.80 countyAITKIN 0.84 0.38 2.22 countyANOKA 0.87 0.10 8.33 Stat 506 Gelman & Hill, Chapter 16 countyBECKER 1.53 0.44 3.48 JAGs call countyBELTRAMI 1.55 0.29 5.37 countyBENTON 1.43 0.38 3.78 require(R2jags)countyBIG STONE 1.51 0.44 3.46 radon.data <- with(mn, list(n = n, J = n.county, y = log.radon, county = as.nume countyBLUE EARTH 2.01 0.20 9.94 x = floor)) countyBROWN 1.99 5.24 radon.JAGs <- jags(model.file = "radonModel1.jags", data 0.38 = radon.data, parameters.to.save = c("a", "b", "mu.a",1.00 "sigma.y", "sigma.a"), n.chains = 3 countyCARLTON 0.24 4.19 n.iter = 500) countyCARVER 1.56 0.31 5.03 1.40 0.34 4.14 ## module glm loaded countyCASS countyCHIPPEWA 1.73 0.38 4.57 countyCHISAGO 1.04 0.31 3.36 attach.jags(radon.JAGs) countyCLAY 1.99 0.20 9.78 ls(pos = 2) countyCLEARWATER 1.34 0.38 3.52 ## [1] "a" "b" "deviance" "mu.a" "n.sims" "sigma.a" countyCOOK 0.66 0.53 1.24 ## [7] "sigma.y" countyCOTTONWOOD 1.27 0.38 3.34 summary(gelman.diag(as.mcmc(radon.JAGs))) countyCROW WING 1.12 0.22 5.12 countyDAKOTA 1.34 0.10 14.03 ## Length Class Mode countyDODGE 1.80 0.44 4.12 ## psrf 180 -nonenumeric ## mpsrf 1 -nonenumeric countyDOUGLAS 1.73 0.25 6.87 countyFARIBAULT 0.64 0.31 2.06 Stat 506 Gelman & Hill, Chapter 16 countyFILLMORE 1.40 0.54 2.61 JAGs Plot Setup Plots of Random Effects plot(radon.JAGs) display8 <- c(36, 1, 35, 21, 14, 71, 61, 70) # counties to be displayed x.jitter <- mn$floor + runif(n, -0.05, 0.05) x.range <- range(x.jitter) y.range <- range(mn$log.radon[!is.na(match(as.numeric(mn$county), display8))]) Bugs model at "radonModel1.jags", fit using jags, 3 chains, each with 500 iterations (first 250 discarded) R−hat 80% interval for each chain −10000 1000 2000 3000 1 1.5 2+ * a[1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35] [36] [37] [38] [39] [40] [41] [42] [43] [44] [45] [46] [47] [48] [49] [50] [51] [52] [53] [54] [55] [56] [57] [58] [59] [60] [61] [62] [63] [64] [65] [66] [67] [68] [69] [70] [71] [72] [73] b deviance mu.a sigma.a −10000 1000 2000 3000 1 1.5 2+ * array truncated for lack of space medians and 80% intervals ● 3 2 1 0 ● ● ● *a ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1 2 3 4 5 6 7 8 910 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 ● ● ● # pull out parameter estimates from classical fits a.pooled <- coef(lm.pooled)[1] # complete-pooling intercept b.pooled <- coef(lm.pooled)[2] # complete-pooling slope a.nopooled <- coef(lm.unpooled.1)[2:(n.county + 1)] # no-pooling vector of inte b.nopooled <- coef(lm.unpooled.1)[1] # no-pooling slope a.multilevel <- apply(a, 2, median) b.multilevel <- median(b) cnty <- as.numeric(mn$county) −0.6 ● ● −0.8 ● ● deviance 2080 ● ● ● ● ● 2060 ● ● ● ● ● ● ● ● mu.a ● ● ● ● ● ● ● 1.6 1.5 1.4 1.3 ● ● ● ● ● ● ● 0.5 sigma.a 0.4 0.3 0.2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.8 0.78 sigma.y 0.76 0.74 0.72 ● ● ● ● ● ● ● ● Stat 506 Gelman & Hill, Chapter 16 Plots of Random Effects 1 3 2 ● ● ●● ● ● ● ● ● 0.0 0.8 0.0 0.8 floor floor floor ● ● ●● ● ● 0.0 0.8 0.0 0.8 0.0 0.8 floor floor floor 3 2 ● 1 3 2 1 3 ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● 0 ● ● ● ●● ● STEARNS log radon level 0 ● ● ●● ● 0 ● ● 2 ● ● ● 1 ●● RAMSEY log radon level STEELE ● ● intercept, αj (multilevel inference) ● ● 0 ● ● ● ● ● log radon level 3 2 1 0 log radon level 3 2 1 ● 0.0 0.8 log radon level 3 2 ● sample.size <- as.vector(table(mn$county)) sample.size.jitter <- sample.size * exp(runif(85, -0.1, 0.1)) plot(sample.size.jitter, a.multilevel, xlab = "sample size in county j", ylim = range(mn$log.radon), ylab = expression(paste("intercept, ", alpha[j], " (multilevel inference)")), cex.lab = 1.2, cex.axis = 1.1, cex.main = 1.1, mgp = c(2, 0.7, 0), pch = 20, log = "x") for (j in 1:85) { lines(rep(sample.size.jitter[j], 2), median(a[, j]) + c(-1, 1) * sd(a[, j])) } abline(a.pooled, 0, lwd = 0.5) DOUGLAS floor ● ● 1 KOOCHICHING 0.0 0.8 CLAY log radon level ● ● 0 log radon level AITKIN 0 3 1 2 ● 0 log radon level ● Gelman & Hill, Chapter 16 Figure 12.3b par(mfrow = c(2, 4)) for (j in display8) { plot(x.jitter[cnty == j], mn$log.radon[cnty == j], xlim = c(-0.05, 1.05), ylim = y.range, xlab = "floor", ylab = "log radon level", main = levels(mn$county)[j], cex.lab = 1.2, cex.axis = 1.1, cex.main = 1.1, mgp = c(2, 0.7, 0), pch = 20) curve(a.pooled + b.pooled * x, lwd = 0.5, lty = 2, col = "gray10", add = TRUE) curve(a.nopooled[j] + b.nopooled * x, lwd = 0.5, col = "gray10", add = TRUE) curve(a.multilevel[j] + b.multilevel * x, lwd = 1, col = "black", add = TRUE) } LAC QUI PARLE Stat 506 ● ● ● ● ● ● ● ● ●● ● ●● ● ●● ● ● ● ● ● ● 4 2120 2100 ● ● ● ● 3 ● ● ● 2 −0.7 ● ● ●● ● 1 b ● ● ● ● ● ● ●● ●● ●● ● ● ●● ● ● ● ● ● ●●● ●● ●● ● ●● ●● ●●●● ● ● ● ● ●● ●● ● ● ● ● ●●● ●● ● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 ● ● ● ● ● −2 ● 1 2 5 10 20 50 100 sample size in county j 0.0 0.8 floor Stat 506 Gelman & Hill, Chapter 16 Stat 506 Gelman & Hill, Chapter 16 Convergence of MCMC runs Traceplot on Slope plot(1:250, b[1:250], type = "l", col = 4) ## chain 1 in 1:250 lines(1:250, b[1:250 + 250], type = "l", col = 2) ## chain 2 251:500 lines(1:250, b[1:250 + 500], type = "l", col = 3) ## chain 3 501:750 Trace plots of multiple chains. See that they are sampling from the same general values of each parameter, not stuck in different areas. b[1:250] Check effective sample size ≈ n × (1 − |ρ̂|) where n is the number of samples saved and ρ is the autocorrelation of residuals. Correlation near 1 reduces effective sample size, near 0 has no effect. Should be over 100. −0.9 −0.8 −0.7 −0.6 −0.5 Check to see that Gelman & Rubin’s R̂ is close to 1. R̂ < 1.1 0 50 100 150 200 250 1:250 Three chains should “cover” the same space. Stat 506 Gelman & Hill, Chapter 16 Fits and Residuals Stat 506 More Posterior Effects Fitted values are based on posterior means: yHat <- a.multilevel[cnty] + b.multilevel * mn$floor ## or monitor it yResid <- mn$log.radon - yHat plot(yHat, yResid, xlab = "Posterior Mean Fits", ylab = "Residuals") ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ●● ●●● ● ●● ● ● ● ● ●●● ● ● ● ● ● ●● ● ●● ●●● ●●● ● ● ● ●● ● ● ●● ●● ● ●● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ●● ● ●● ●●● ● ●● ● ● ● ● ● ● ●● ● ●● ● ● ●●● ●● ●● ●● ● ● ● ●● ●● ●● ● ● ●●● ● ●● ● ● ●● ● ●●● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ●●● ● ● ●●● ● ● ● ●● ●● ●●● ● ●●● ● ●●● ● ● ● ● ● ● ●● ● ●● ● ● ●● ●● ● ●● ● ● ● ●● ● ● ● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ●● ●● ● ● ● ●● ● ● ● ●● ● ●● ● ● ●●● ● ● ●● ● ● ● ● ● ● ●●●● ● ●● ●● ● ● ●● ●● ● ●● ● ● ● ●●● ●● ● ● ● ● ●● ● ● ● ●● ● ● ●●●● ● ● ●● ● ● ● ●● ●● ● ●● ● ●● ● ● ● ● ●● ● ● ● ● ●● ● ●●● ● ● ●● ● ● ● ● ●● ●● ●●● ● ● ●● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ●● ● ●● ● ●●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● Compare unlogged radon for random house in Hennepin county (26) to random house in Lac Qui Parle county (36) at floor = 0. lqp.radon <- exp(rnorm(500, a[, 26] + b, sigma.y)) hen.radon <- exp(rnorm(500, a[, 26] + b, sigma.y)) hist(radon.diff) Histogram of radon.diff 1.0 100 150 200 Frequency 50 0 0 −3 −2 −1 Residuals 1 2 ● 0.5 Gelman & Hill, Chapter 16 −20 0 10 20 radon.diff 1.5 Posterior Mean Fits Stat 506 −10 xtable(summary(radon.diff <- lqp.radon - hen.radon)) Gelman & Hill, Chapter 16 Stat 506 ## Error: Gelman & Hill, Chapter 16 xtable.table is not implemented for tables 16.5 Add County Level Predictors Add Predictors Complete Pooling: one α, one β, Priors: N(0, 1002 ) model { for (i in 1:n){ y[i] ~ dnorm (a + b*x[i], tau.y) } b ~ dnorm (0, .0001) tau.y <- pow(sigma.y, -2) sigma.y ~ dunif (0, 100) a ~ dnorm (0, .0001) } Winter is associated with higher radon levels No Pooling: One β as above, 84 αj ’s, each N(0, 1002 ) y.hat[i] <- a + b_floor * floor[i] + b_winter * winter[i] + b_fw * floor[i] * winter[i] y.hat[i] <- a + b_floor * floor[i] + b_winter * winter[i] Interaction? model { for (i in 1:n){ y[i] ~ dnorm (a[cnty[i]] + b*x[i], tau.y) } b ~ dnorm (0, .0001) tau.y <- pow(sigma.y, -2) sigma.y ~ dunif (0, 100) for(j in 1:J){ a[j] ~ dnorm (mu.a, tau.a) } sigma.a ~ dunif (0, 100) tau.a <- pow(sigma.a, -2) mu.a ~ dnorm(0, .0001) } Stat 506 Each gets a dnorm(0,.0001) prior. column all ones y.hat[i] <- inprod(b[], X[i, ]) Gelman & Hill, Chapter 16 County level Uranium fit Now pass matrix X with first Stat 506 Gelman & Hill, Chapter 16 16.6 Predictions JAGs model model { for (i in 1:n){ y[i] ~ dnorm (a[cnty[i]] + b*x[i], tau.y) } b ~ dnorm (0, .0001) tau.y <- pow(sigma.y, -2) sigma.y ~ dunif (0, 100) for(j in 1:J){ a[j] ~ dnorm (a.hat[j], tau.a) a.hat[j] <- g.o + g.1*u[j] } sigma.a ~ dunif (0, 100) tau.a <- pow(sigma.a, -2) g.0 ~ dnorm(0, .0001) g.1 ~ dnorm(0, .0001) } Stat 506 Gelman & Hill, Chapter 16 Either in BUGs/JAGs add more lines for missing responses, and a y.tilde flag. or in R attach the saved MCMC samples, generate data using each row of the simulation. attach.jags(radon2.JAGs) y.hepin <- rnorm(n.sims, a[, 26] + b * 1, sigma.y) ## first floor u.tilde <- mean(mn$u) a.tilde <- rnorm(n.sims, g.0 + g.1 * u.tilde, sigma.a) y.newcty <- rnorm(n.sims, a.tilde + b * 1, sigma.y) Stat 506 Gelman & Hill, Chapter 16 16.7 Does it Work? 16.8 Principles Simulate to Check, as in Chapter 8 1 Pick reasonable parameter values, θtrue . From expert opinion From preliminary fitted results. 2 3 Create y fake using θtrue and the assumed model. Fit the model, see if estimates are close to θtrue . Coverage rates for credible intervals. Data, modeled responses or unmodeled: predictors or sizes modeled: y used with ∼ unmodeled: n, J, cnty, floor, u used on RHS of <- or ∼ Parameters, modeled or not, are on LHS of ∼ modeled: a fit to uranium unmodeled: b, g.0, g.1, sigma.y, sigma.a Derived: y.hat, tau.y, a.hat, tau.a defined with <indices: i, j Parameters and missing data can be given initial values. Stat 506 Gelman & Hill, Chapter 16 16.9 Implementation Stat 506 Gelman & Hill, Chapter 16 16.10 Open-ended Start simple, get it working, add complexity slowly. Use 3-4 chains Start with a few runs. autojags lets you update with more runs. Check R̂ < 1.1. Better to simplify or subset the data than to runn for thousands of iterations. Stat 506 Gelman & Hill, Chapter 16 Models can get very complex. Nonlinear is no harder to handle. b * exp( -g * x[i]) Ratio: y[i] <- (a + b*x1[i])/(1 + g*x2[i]) Unequal variances by group or as power of the mean. T distribution instead of normal, even estimating the df. Stat 506 Gelman & Hill, Chapter 16