Bayes, BUGs and Beyond Bayes Paradigm

Stat 506. Gelman & Hill, Chapter 16 (ARM).

Bayes Paradigm

- We have used lmer successfully to fit multilevel models. Issue: lmer uses a point estimate of each variance component (if the estimate is 0, the random effect is effectively dropped).
- A strategy:
    1. Start with lm or glm.
    2. Fit in lmer using multiple levels.
    3. Use BUGs or JAGs.
- Each parameter has a prior distribution. Gelman & Hill use Gaussian distributions for group-level models or (noninformative) Uniform distributions, and caution (p348) that we should check robustness to the prior specification.
- Combine the prior with the likelihood to get a posterior, from which any question about the parameters can be answered.
- More R programming.

Regression

We've seen that classical regression estimates are posterior means given a flat $U(-\infty, \infty)$ prior on each component of $\beta$ and a flat prior on $\log(\sigma)$ (so $\sigma$ ranges over $(0, \infty)$). The interpretation is different:

    Bayesian:   $\beta \mid y, \sigma \sim N(\hat\beta,\ \sigma^2 (X^T X)^{-1})$
    versus
    classical:  $\hat\beta \mid \beta, \sigma \sim N(\beta,\ \sigma^2 (X^T X)^{-1})$

More Complexity

    $y_i = X_i \beta + Z_i b_i + \epsilon_i, \qquad b_i \sim N(0, \Psi)$

Same uniform prior on $\beta$ and $\log(\sigma)$.

Varying intercept: $y_i \sim N(\alpha_{j[i]} + \beta x_i,\ \sigma_y^2)$ with prior $\alpha_j \sim N(\mu_\alpha, \sigma_\alpha^2)$.

Fit Radon data in R

    # stringsAsFactors = TRUE keeps county a factor under R >= 4.0
    srrs2 <- read.table("http://www.stat.columbia.edu/~gelman/arm/examples/radon/srrs2.dat",
                        header = TRUE, sep = ",", stringsAsFactors = TRUE)
    mn <- droplevels(subset(srrs2, state == "MN"))
    names(mn)[19] <- "radon"
    mn$log.radon <- y <- log(pmax(0.1, mn$radon))
    n <- nrow(mn)
    # strip trailing whitespace from the county names
    levels(mn$county) <- gsub("[[:space:]]+$", "", levels(mn$county))
    ybarbar <- mean(y)
    sample.size <- as.vector(table(mn$county))
    n.county <- length(sample.size)
    sample.size.jittered <- sample.size * exp(runif(n.county, -0.1, 0.1))
    cty.mns <- tapply(y, mn$county, mean)
    cty.vars <- tapply(y, mn$county, var)
    cty.sds <- mean(sqrt(cty.vars[!is.na(cty.vars)]))/sqrt(sample.size)
    cty.sds.sep <- sqrt(cty.vars/sample.size)

lm fit

    lm.pooled <- lm(log.radon ~ floor, data = mn)
    display.xtable(lm.pooled)

                 Estimate  Std. Error  t value
    (Intercept)      1.33        0.03    44.64
    floor           -0.61        0.07    -8.42

    Table: n = 919  rank = 2  resid sd = 0.823  R-Squared = 0.072

    lm.unpooled.1 <- lm(log.radon ~ 0 + floor + county, data = mn)
    display.xtable(lm.unpooled.1)

                       Estimate  Std. Error  t value
    floor                 -0.72        0.07    -9.80
    countyAITKIN           0.84        0.38     2.22
    countyANOKA            0.87        0.10     8.33
    countyBECKER           1.53        0.44     3.48
    countyBELTRAMI         1.55        0.29     5.37
    countyBENTON           1.43        0.38     3.78
    countyBIG STONE        1.51        0.44     3.46
    countyBLUE EARTH       2.01        0.20     9.94
    countyBROWN            1.99        0.38     5.24
    countyCARLTON          1.00        0.24     4.19
    countyCARVER           1.56        0.31     5.03
    countyCASS             1.40        0.34     4.14
    countyCHIPPEWA         1.73        0.38     4.57
    countyCHISAGO          1.04        0.31     3.36
    countyCLAY             1.99        0.20     9.78
    countyCLEARWATER       1.34        0.38     3.52
    countyCOOK             0.66        0.53     1.24
    countyCOTTONWOOD       1.27        0.38     3.34
    countyCROW WING        1.12        0.22     5.12
    countyDAKOTA           1.34        0.10    14.03
    countyDODGE            1.80        0.44     4.12
    countyDOUGLAS          1.73        0.25     6.87
    countyFARIBAULT        0.64        0.31     2.06
    countyFILLMORE         1.40        0.54     2.61
    (remaining counties cut off on the slide)

JAGs setup

Build a model file containing:

    model {
      for (i in 1:n){
        y[i] ~ dnorm (y.hat[i], tau.y)
        y.hat[i] <- a[county[i]] + b*x[i]
      }
      b ~ dnorm (0, .0001)
      tau.y <- pow(sigma.y, -2)
      sigma.y ~ dunif (0, 100)
      for (j in 1:J){
        a[j] ~ dnorm (mu.a, tau.a)
      }
      mu.a ~ dnorm (0, .0001)
      tau.a <- pow(sigma.a, -2)
      sigma.a ~ dunif (0, 100)
    }

The same code works in BUGs or JAGs. Note that dnorm takes a mean and a precision (1/variance), so dnorm(0, .0001) is a N(0, 100^2) prior.

JAGs call

    require(R2jags)
    radon.data <- with(mn, list(n = n, J = n.county, y = log.radon,
                                county = as.nume...,  # truncated on the slide; presumably as.numeric(county)
                                x = floor))
    radon.JAGs <- jags(model.file = "radonModel1.jags", data = radon.data,
                       parameters.to.save = c("a", "b", "mu.a", "sigma.y", "sigma.a"),
                       n.chains = 3, n.iter = 500)
    ## module glm loaded

    attach.jags(radon.JAGs)
    ls(pos = 2)
    ## [1] "a"        "b"        "deviance" "mu.a"     "n.sims"   "sigma.a"
    ## [7] "sigma.y"

    summary(gelman.diag(as.mcmc(radon.JAGs)))
    ##       Length Class  Mode
    ## psrf  180    -none- numeric
    ## mpsrf   1    -none- numeric

JAGs Plot

    plot(radon.JAGs)

[Figure: output of plot(radon.JAGs). Bugs model at "radonModel1.jags", fit using jags, 3 chains, each with 500 iterations (first 250 discarded). Left panel: 80% interval for each chain and R-hat for a[1] to a[73] (array truncated for lack of space), b, deviance, mu.a, sigma.a. Right panel: medians and 80% intervals for a, b, deviance, mu.a, sigma.a, sigma.y.]

Setup Plots of Random Effects

    display8 <- c(36, 1, 35, 21, 14, 71, 61, 70)  # counties to be displayed
    x.jitter <- mn$floor + runif(n, -0.05, 0.05)
    x.range <- range(x.jitter)
    y.range <- range(mn$log.radon[!is.na(match(as.numeric(mn$county), display8))])

    # pull out parameter estimates from classical fits
    a.pooled <- coef(lm.pooled)[1]                       # complete-pooling intercept
    b.pooled <- coef(lm.pooled)[2]                       # complete-pooling slope
    a.nopooled <- coef(lm.unpooled.1)[2:(n.county + 1)]  # no-pooling vector of intercepts
    b.nopooled <- coef(lm.unpooled.1)[1]                 # no-pooling slope
    # multilevel estimates: posterior medians of the JAGs draws
    a.multilevel <- apply(a, 2, median)
    b.multilevel <- median(b)
    cnty <- as.numeric(mn$county)

Plots of Random Effects

    par(mfrow = c(2, 4))
    for (j in display8) {
      plot(x.jitter[cnty == j], mn$log.radon[cnty == j], xlim = c(-0.05, 1.05),
           ylim = y.range, xlab = "floor", ylab = "log radon level",
           main = levels(mn$county)[j], cex.lab = 1.2, cex.axis = 1.1,
           cex.main = 1.1, mgp = c(2, 0.7, 0), pch = 20)
      curve(a.pooled + b.pooled * x, lwd = 0.5, lty = 2, col = "gray10", add = TRUE)       # complete pooling
      curve(a.nopooled[j] + b.nopooled * x, lwd = 0.5, col = "gray10", add = TRUE)         # no pooling
      curve(a.multilevel[j] + b.multilevel * x, lwd = 1, col = "black", add = TRUE)        # multilevel
    }

[Figure: eight panels of log radon level against floor, with the three fitted lines, for LAC QUI PARLE, AITKIN, KOOCHICHING, DOUGLAS, CLAY, STEARNS, RAMSEY and STEELE.]

Figure 12.3b

    sample.size <- as.vector(table(mn$county))
    sample.size.jitter <- sample.size * exp(runif(n.county, -0.1, 0.1))
    plot(sample.size.jitter, a.multilevel, xlab = "sample size in county j",
         ylim = range(mn$log.radon),
         ylab = expression(paste("intercept, ", alpha[j], " (multilevel inference)")),
         cex.lab = 1.2, cex.axis = 1.1, cex.main = 1.1, mgp = c(2, 0.7, 0),
         pch = 20, log = "x")
    # posterior median +/- one posterior sd for each county intercept
    for (j in 1:n.county) {
      lines(rep(sample.size.jitter[j], 2), median(a[, j]) + c(-1, 1) * sd(a[, j]))
    }
    abline(a.pooled, 0, lwd = 0.5)

[Figure: intercept alpha_j (multilevel inference) against sample size in county j, log-scaled x axis from 1 to 100, with a horizontal line at the complete-pooling intercept.]

Convergence of MCMC runs

- Trace plots of multiple chains. See that the chains are sampling from the same general values of each parameter, not stuck in different areas.
- Check the effective sample size, roughly $n \times (1 - |\hat\rho|)$, where $n$ is the number of samples saved and $\hat\rho$ is the estimated autocorrelation of successive saved draws. Correlation near 1 reduces the effective sample size; correlation near 0 has no effect. It should be over 100.
- Check that Gelman & Rubin's R̂ is close to 1: R̂ < 1.1.

Traceplot on Slope

    plot(1:250, b[1:250], type = "l", col = 4)        ## chain 1 in 1:250
    lines(1:250, b[1:250 + 250], type = "l", col = 2) ## chain 2 251:500
    lines(1:250, b[1:250 + 500], type = "l", col = 3) ## chain 3 501:750

[Figure: trace of b (roughly -0.9 to -0.5) over iterations 1 to 250 for the three chains.]

Three chains should "cover" the same space.
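The two numerical checks on the convergence slide can be sketched in base R. This is a minimal illustration, not the coda implementation: `rhat` and `ess.rough` are hypothetical helper names, the chains are simulated, and `chains` is assumed to hold one parameter with one column per chain in sampling order. With R2jags, the vectors exposed by attach.jags appear to come from the permuted sims.list, so chain-ordered draws would likely have to be taken from BUGSoutput$sims.array; treat that as an assumption to verify.

```r
# Gelman-Rubin R-hat for one parameter (basic version, no split chains).
# chains: n.keep x n.chains matrix, one column per chain, in sampling order.
rhat <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))      # mean within-chain variance
  B <- n * var(colMeans(chains))        # between-chain variance
  var.plus <- (n - 1) / n * W + B / n   # pooled estimate of the posterior variance
  sqrt(var.plus / W)
}

# Rough effective sample size using the slide's rule n * (1 - |rho.hat|),
# with rho.hat the lag-1 autocorrelation averaged over chains.
ess.rough <- function(chains) {
  rho <- mean(apply(chains, 2, function(z) acf(z, lag.max = 1, plot = FALSE)$acf[2]))
  length(chains) * (1 - abs(rho))
}

set.seed(506)
good  <- matrix(rnorm(750, mean = -0.7, sd = 0.07), ncol = 3)  # three well-mixed fake chains
stuck <- sweep(good, 2, c(-0.3, 0, 0.3), "+")                  # chains stuck in different regions

rhat(good)       # near 1
rhat(stuck)      # well above 1.1
ess.rough(good)  # compare with the "over 100" guideline
```

In practice coda's gelman.diag and effectiveSize do this job on the fitted object; the sketch only shows what the numbers measure.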
Fits and Residuals

Fitted values are based on posterior means (approximated here by the posterior medians a.multilevel and b.multilevel):

    yHat <- a.multilevel[cnty] + b.multilevel * mn$floor  ## or monitor it
    yResid <- mn$log.radon - yHat
    plot(yHat, yResid, xlab = "Posterior Mean Fits", ylab = "Residuals")

[Figure: Residuals (about -3 to 2) against Posterior Mean Fits (about 0.5 to 1.5).]

More Posterior Effects

Compare unlogged radon for a random house in Lac Qui Parle county (36) to a random house in Hennepin county (26), both at floor = 1 (the code adds the floor coefficient b).

    lqp.radon <- exp(rnorm(500, a[, 36] + b, sigma.y))  # Lac Qui Parle; the slide had a[, 26] here
    hen.radon <- exp(rnorm(500, a[, 26] + b, sigma.y))  # Hennepin
    radon.diff <- lqp.radon - hen.radon
    hist(radon.diff)
    summary(radon.diff)

On the slide, xtable(summary(radon.diff)) failed with "Error: xtable.table is not implemented for tables"; summary() alone prints the same six-number summary.

[Figure: Histogram of radon.diff, x axis from about -20 to 20, Frequency from 0 to 200.]

Add Predictors

Complete Pooling: one α, one β. Priors: N(0, 100^2).

    model {
      for (i in 1:n){
        y[i] ~ dnorm (a + b*x[i], tau.y)
      }
      b ~ dnorm (0, .0001)
      tau.y <- pow(sigma.y, -2)
      sigma.y ~ dunif (0, 100)
      a ~ dnorm (0, .0001)
    }

No Pooling: one β as above, 85 α_j's (one per county), each N(0, 100^2).

    model {
      for (i in 1:n){
        y[i] ~ dnorm (a[cnty[i]] + b*x[i], tau.y)
      }
      b ~ dnorm (0, .0001)
      tau.y <- pow(sigma.y, -2)
      sigma.y ~ dunif (0, 100)
      for(j in 1:J){
        a[j] ~ dnorm (mu.a, tau.a)
      }
      sigma.a ~ dunif (0, 100)
      tau.a <- pow(sigma.a, -2)
      mu.a ~ dnorm(0, .0001)
    }

As shown, this model gives the a[j] a common N(mu.a, sigma.a^2) prior, which is the partial-pooling model. For strict no pooling, each a[j] would instead get its own dnorm(0, .0001) prior and mu.a, sigma.a would be dropped.

Winter is associated with higher radon levels:

    y.hat[i] <- a + b_floor * floor[i] + b_winter * winter[i]

Interaction?

    y.hat[i] <- a + b_floor * floor[i] + b_winter * winter[i] + b_fw * floor[i] * winter[i]

Each coefficient gets a dnorm(0, .0001) prior. Now pass a matrix X with first column all ones:

    y.hat[i] <- inprod(b[], X[i, ])

16.5 Add County Level Predictors

County level Uranium fit. JAGs model:

    model {
      for (i in 1:n){
        y[i] ~ dnorm (a[cnty[i]] + b*x[i], tau.y)
      }
      b ~ dnorm (0, .0001)
      tau.y <- pow(sigma.y, -2)
      sigma.y ~ dunif (0, 100)
      for(j in 1:J){
        a[j] ~ dnorm (a.hat[j], tau.a)
        a.hat[j] <- g.0 + g.1*u[j]
      }
      sigma.a ~ dunif (0, 100)
      tau.a <- pow(sigma.a, -2)
      g.0 ~ dnorm(0, .0001)
      g.1 ~ dnorm(0, .0001)
    }

16.6 Predictions

Either, in BUGs/JAGs, add more lines for missing responses and a y.tilde flag; or, in R, attach the saved MCMC samples and generate data using each row of the simulation.

    attach.jags(radon2.JAGs)
    # new house in Hennepin county (26)
    y.hepin <- rnorm(n.sims, a[, 26] + b * 1, sigma.y)  ## first floor
    # new house in a new county with average uranium
    u.tilde <- mean(mn$u)
    a.tilde <- rnorm(n.sims, g.0 + g.1 * u.tilde, sigma.a)
    y.newcty <- rnorm(n.sims, a.tilde + b * 1, sigma.y)

16.7 Does it Work?

Simulate to Check, as in Chapter 8:

1. Pick reasonable parameter values, θ_true:
    - from expert opinion, or
    - from preliminary fitted results.
2. Create y_fake using θ_true and the assumed model.
3. Fit the model and see if the estimates are close to θ_true. Check coverage rates for credible intervals.

16.8 Principles

- Data: modeled responses, or unmodeled predictors or sizes.
    - modeled: y, used with ~
    - unmodeled: n, J, cnty, floor, u, used on the RHS of <- or ~
- Parameters, modeled or not, are on the LHS of ~
    - modeled: a, fit to uranium
    - unmodeled: b, g.0, g.1, sigma.y, sigma.a
- Derived: y.hat, tau.y, a.hat, tau.a, defined with <-
- Indices: i, j
- Parameters and missing data can be given initial values.

16.9 Implementation

- Start simple, get it working, add complexity slowly.
- Use 3-4 chains.
- Start with a few runs. autojags lets you update with more runs.
- Check R̂ < 1.1.
- Better to simplify or subset the data than to run for thousands of iterations.

16.10 Open-ended

- Models can get very complex.
- Nonlinear is no harder to handle, for example a term such as b * exp(-g * x[i]).
- Ratio: y[i] <- (a + b*x1[i])/(1 + g*x2[i])
- Unequal variances by group or as a power of the mean.
- T distribution instead of normal, even estimating the df.
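The simulate-to-check steps of 16.7 can be sketched in base R. This is only an illustration under stated assumptions: lm() with a separate intercept per county stands in for the JAGs fit so that the example runs without JAGS, all "true" parameter values are made up rather than taken from the radon fit, and `one.check` is a hypothetical helper name.

```r
# Fake-data check: pick theta.true, simulate y.fake, fit, compare estimates and coverage.
set.seed(506)
J <- 20       # number of fake counties
n.per <- 30   # houses per county
theta.true <- list(mu.a = 1.5, sigma.a = 0.3, b = -0.7, sigma.y = 0.75)  # made-up values

one.check <- function() {
  a <- rnorm(J, theta.true$mu.a, theta.true$sigma.a)   # county intercepts
  county <- rep(1:J, each = n.per)
  x <- rbinom(J * n.per, 1, 0.2)                       # fake floor indicator
  y.fake <- rnorm(J * n.per, a[county] + theta.true$b * x, theta.true$sigma.y)
  fit <- lm(y.fake ~ 0 + x + factor(county))           # stand-in for the JAGs fit
  ci <- confint(fit, "x", level = 0.95)
  c(est = unname(coef(fit)["x"]),
    covered = ci[1] <= theta.true$b && theta.true$b <= ci[2])
}

checks <- t(replicate(200, one.check()))
mean(checks[, "est"])      # should be close to theta.true$b
mean(checks[, "covered"])  # should be close to the nominal 0.95
```

With a JAGs fit the same loop would compare posterior medians and credible intervals to θ_true, at a much higher cost per replication, which is one reason to start with a small number of fake datasets.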
