Bayes, BUGs and Beyond Bayes Paradigm

advertisement
Bayes, BUGs and Beyond
Bayes Paradigm
Chapter 16, ARM
We have used lmer successfully to fit multilevel models.
Issue: uses point estimate of variance (if 0, drop the random
effect).
A strategy:
Start with lm or glm.
Fit in lmer using multiple levels.
Use BUGs or JAGs
Each parameter has a prior distribution. They use:
Gaussian for group-level models or
Uniform distributions (noninformative)
and caution (p348) that we should check robustness to prior
specification.
Combine prior with likelihood to get a posterior which answers any
question.
More R programming
Stat 506
Gelman & Hill, Chapter 16
Regression
Gelman & Hill, Chapter 16
More Complexity
We’ve seen that classical regression estimates are posterior means
given a U(−∞, ∞) prior on each component of β and U(0, ∞) on
log(σ).
Interpretation is different:
versus
Stat 506
b σ 2 (XT X)−1 )
β|y, σ ∼ N(β,
b ∼ N(β, σ 2 (XT X)−1 )
β|σ
y i = Xi β + Z i b i + i
Same uniform prior on β and log(σ).
bi ∼ N(0, Ψ)
Varying Intercept:
yi ∼ N(αj[i] + βxi , σy2 ) with prior αj ∼ N(µα , σα2 )
Stat 506
Gelman & Hill, Chapter 16
Stat 506
Gelman & Hill, Chapter 16
Fit Radon data in R
lm fit
lm.pooled <- lm(log.radon ~ floor, data = mn)
display.xtable(lm.pooled)
srrs2 <- read.table("http://www.stat.columbia.edu/~gelman/arm/examples/radon/srrs2.dat",
head = TRUE, sep = ",")
Estimate Std. Error t value
mn <- droplevels(subset(srrs2, state == "MN"))
(Intercept)
1.33
0.03
44.64
names(mn)[19] <- "radon"
mn$log.radon <- y <- log(pmax(0.1, mn$radon))
floor
-0.61
0.07
-8.42
n <- nrow(mn)
levels(mn$county) <- gsub("[[:space:]]$", "", levels(mn$county))
Table: n = 919 rank = 2 resid sd = 0.823 R-Squared = 0.072
ybarbar <- mean(y)
sample.size <- as.vector(table(mn$county))
n.county <- length(sample.size)
sample.size.jittered <- sample.size * exp(runif(n.county, -0.1, 0.1))
lm.unpooled.1 <- lm(log.radon ~ 0 + floor + county,
cty.mns <- tapply(y, mn$county, mean)
display.xtable(lm.unpooled.1)
cty.vars <- tapply(y, mn$county, var)
cty.sds <- mean(sqrt(cty.vars[!is.na(cty.vars)]))/sqrt(sample.size)
cty.sds.sep <- sqrt(cty.vars/sample.size)
Stat 506
Gelman & Hill, Chapter 16
JAGs setup
Build a model file containing:
model {
for (i in 1:n){
y[i] ~ dnorm (y.hat[i], tau.y)
y.hat[i] <- a[county[i]] + b*x[i]
}
b ~ dnorm (0, .0001)
tau.y <- pow(sigma.y, -2)
sigma.y ~ dunif (0, 100)
for (j in 1:J){
a[j] ~ dnorm (mu.a, tau.a)
}
mu.a ~ dnorm (0, .0001)
tau.a <- pow(sigma.a, -2)
sigma.a ~ dunif (0, 100)
}
Same code works in BUGs or JAGs
Stat 506
Gelman & Hill, Chapter 16
data =
Estimate Std. Error t value
floor
-0.72
0.07
-9.80
countyAITKIN
0.84
0.38
2.22
countyANOKA
0.87
0.10
8.33
Stat 506
Gelman & Hill, Chapter 16
countyBECKER
1.53
0.44
3.48
JAGs call
countyBELTRAMI
1.55
0.29
5.37
countyBENTON
1.43
0.38
3.78
require(R2jags)countyBIG STONE
1.51
0.44
3.46
radon.data <- with(mn, list(n = n, J = n.county, y = log.radon, county = as.nume
countyBLUE EARTH
2.01
0.20
9.94
x = floor))
countyBROWN
1.99
5.24
radon.JAGs <- jags(model.file
= "radonModel1.jags",
data 0.38
= radon.data,
parameters.to.save
= c("a", "b", "mu.a",1.00
"sigma.y", "sigma.a"),
n.chains = 3
countyCARLTON
0.24
4.19
n.iter = 500)
countyCARVER
1.56
0.31
5.03
1.40
0.34
4.14
## module glm loaded countyCASS
countyCHIPPEWA
1.73
0.38
4.57
countyCHISAGO
1.04
0.31
3.36
attach.jags(radon.JAGs)
countyCLAY
1.99
0.20
9.78
ls(pos = 2)
countyCLEARWATER
1.34
0.38
3.52
## [1] "a"
"b"
"deviance" "mu.a"
"n.sims"
"sigma.a"
countyCOOK
0.66
0.53
1.24
## [7] "sigma.y"
countyCOTTONWOOD
1.27
0.38
3.34
summary(gelman.diag(as.mcmc(radon.JAGs)))
countyCROW WING
1.12
0.22
5.12
countyDAKOTA
1.34
0.10
14.03
##
Length Class Mode
countyDODGE
1.80
0.44
4.12
## psrf 180
-nonenumeric
## mpsrf
1
-nonenumeric
countyDOUGLAS
1.73
0.25
6.87
countyFARIBAULT
0.64
0.31
2.06
Stat 506
Gelman & Hill, Chapter 16
countyFILLMORE
1.40
0.54
2.61
JAGs Plot
Setup Plots of Random Effects
plot(radon.JAGs)
display8 <- c(36, 1, 35, 21, 14, 71, 61, 70) # counties to be displayed
x.jitter <- mn$floor + runif(n, -0.05, 0.05)
x.range <- range(x.jitter)
y.range <- range(mn$log.radon[!is.na(match(as.numeric(mn$county), display8))])
Bugs model at "radonModel1.jags", fit using jags, 3 chains, each with 500 iterations (first 250 discarded)
R−hat
80% interval for each chain
−10000 1000
2000
3000 1 1.5 2+
* a[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
[24]
[25]
[26]
[27]
[28]
[29]
[30]
[31]
[32]
[33]
[34]
[35]
[36]
[37]
[38]
[39]
[40]
[41]
[42]
[43]
[44]
[45]
[46]
[47]
[48]
[49]
[50]
[51]
[52]
[53]
[54]
[55]
[56]
[57]
[58]
[59]
[60]
[61]
[62]
[63]
[64]
[65]
[66]
[67]
[68]
[69]
[70]
[71]
[72]
[73]
b
deviance
mu.a
sigma.a
−10000 1000
2000
3000 1 1.5 2+
* array truncated for lack of space
medians and 80% intervals
●
3
2
1
0
●
●
●
*a
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
1 2 3 4 5 6 7 8 910 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40
●
●
●
# pull out parameter estimates from classical fits
a.pooled <- coef(lm.pooled)[1] # complete-pooling intercept
b.pooled <- coef(lm.pooled)[2] # complete-pooling slope
a.nopooled <- coef(lm.unpooled.1)[2:(n.county + 1)] # no-pooling vector of inte
b.nopooled <- coef(lm.unpooled.1)[1] # no-pooling slope
a.multilevel <- apply(a, 2, median)
b.multilevel <- median(b)
cnty <- as.numeric(mn$county)
−0.6
●
●
−0.8
●
●
deviance
2080
●
●
●
●
●
2060
●
●
●
●
●
●
●
●
mu.a
●
●
●
●
●
●
●
1.6
1.5
1.4
1.3
●
●
●
●
●
●
●
0.5
sigma.a 0.4
0.3
0.2
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0.8
0.78
sigma.y 0.76
0.74
0.72
●
●
●
●
●
●
●
●
Stat 506
Gelman & Hill, Chapter 16
Plots of Random Effects
1
3
2
●
●
●●
●
●
●
●
●
0.0 0.8
0.0 0.8
floor
floor
floor
●
●
●●
●
●
0.0 0.8
0.0 0.8
0.0 0.8
floor
floor
floor
3
2
●
1
3
2
1
3
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
0
●
●
●
●●
●
STEARNS
log radon level
0
●
●
●●
●
0
●
●
2
●
●
●
1
●●
RAMSEY
log radon level
STEELE
●
●
intercept, αj (multilevel inference)
●
●
0
●
●
●
●
●
log radon level
3
2
1
0
log radon level
3
2
1
●
0.0 0.8
log radon level
3
2
●
sample.size <- as.vector(table(mn$county))
sample.size.jitter <- sample.size * exp(runif(85, -0.1, 0.1))
plot(sample.size.jitter, a.multilevel, xlab = "sample size in county j",
ylim = range(mn$log.radon), ylab = expression(paste("intercept, ",
alpha[j], "
(multilevel inference)")), cex.lab = 1.2, cex.axis = 1.1,
cex.main = 1.1, mgp = c(2, 0.7, 0), pch = 20, log = "x")
for (j in 1:85) {
lines(rep(sample.size.jitter[j], 2), median(a[, j]) + c(-1, 1) * sd(a[,
j]))
}
abline(a.pooled, 0, lwd = 0.5)
DOUGLAS
floor
●
●
1
KOOCHICHING
0.0 0.8
CLAY
log radon level
●
●
0
log radon level
AITKIN
0
3
1
2
●
0
log radon level
●
Gelman & Hill, Chapter 16
Figure 12.3b
par(mfrow = c(2, 4))
for (j in display8) {
plot(x.jitter[cnty == j], mn$log.radon[cnty == j], xlim = c(-0.05,
1.05), ylim = y.range, xlab = "floor", ylab = "log radon level",
main = levels(mn$county)[j], cex.lab = 1.2, cex.axis = 1.1, cex.main = 1.1,
mgp = c(2, 0.7, 0), pch = 20)
curve(a.pooled + b.pooled * x, lwd = 0.5, lty = 2, col = "gray10",
add = TRUE)
curve(a.nopooled[j] + b.nopooled * x, lwd = 0.5, col = "gray10", add = TRUE)
curve(a.multilevel[j] + b.multilevel * x, lwd = 1, col = "black", add = TRUE)
}
LAC QUI PARLE
Stat 506
●
●
●
●
●
●
●
●
●●
●
●●
●
●●
●
●
●
●
●
●
4
2120
2100
●
●
●
●
3
●
●
●
2
−0.7
●
●
●●
●
1
b
●
●
●
●
●
●
●●
●●
●●
●
●
●●
●
●
●
●
● ●●●
●●
●●
● ●●
●●
●●●●
● ●
●
● ●●
●●
● ● ● ●
●●●
●●
●
● ●
●
●● ● ●
● ●●
● ● ●
●
● ● ●
●
●
●
●
●
● ●
●
●
●
0
●
●
●
●
●
−2
●
1
2
5
10
20
50
100
sample size in county j
0.0 0.8
floor
Stat 506
Gelman & Hill, Chapter 16
Stat 506
Gelman & Hill, Chapter 16
Convergence of MCMC runs
Traceplot on Slope
plot(1:250, b[1:250], type = "l", col = 4) ## chain 1 in 1:250
lines(1:250, b[1:250 + 250], type = "l", col = 2) ## chain 2 251:500
lines(1:250, b[1:250 + 500], type = "l", col = 3) ## chain 3 501:750
Trace plots of multiple chains. See that they are sampling
from the same general values of each parameter, not stuck in
different areas.
b[1:250]
Check effective sample size ≈ n × (1 − |ρ̂|) where n is the
number of samples saved and ρ is the autocorrelation of
residuals. Correlation near 1 reduces effective sample size,
near 0 has no effect. Should be over 100.
−0.9 −0.8 −0.7 −0.6 −0.5
Check to see that Gelman & Rubin’s R̂ is close to 1. R̂ < 1.1
0
50
100
150
200
250
1:250
Three chains should “cover” the same space.
Stat 506
Gelman & Hill, Chapter 16
Fits and Residuals
Stat 506
More Posterior Effects
Fitted values are based on posterior means:
yHat <- a.multilevel[cnty] + b.multilevel * mn$floor ## or monitor it
yResid <- mn$log.radon - yHat
plot(yHat, yResid, xlab = "Posterior Mean Fits", ylab = "Residuals")
●
●
●
●
● ● ●
●
●
●
●●
●
●●
● ● ●
●
●
●
●
●
●
●
●●
● ● ● ●● ●
●●
●
● ●
●
●
●
●
●
●
●●
●●●
●
●●
●
●
● ●
●●●
●
●
●
●
●
●●
●
●●
●●●
●●● ●
●
●
●● ●
●
●●
●●
●
●● ●
●
●●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●●
●
●●
●●●
●
●●
●
●
●
●
● ● ●●
●
●●
●
●
●●●
●●
●● ●● ● ● ●
●●
●●
●●
●
●
●●●
●
●●
●
●
●●
●
●●●
●
●
●
●●
●
●
●
●
● ●● ●
●
●
●
●
●●
●●
●
●
●
● ● ●
●
●●●
●
●
●●●
●
●
●
●● ●● ●●●
●
●●●
● ●●●
● ●
●
●
●
● ●●
●
●●
●
● ●●
●●
● ●●
●
●
●
●●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
● ●● ●
● ●●
● ●
●
●●
●
●
●
● ● ●● ● ● ●
●●
●●
●
●
● ●●
●
●
●
●●
● ●●
●
●
●●● ●
●
●● ●
●
●
●
●
●
●●●●
●
●●
●● ● ●
●●
●●
●
●●
●
●
● ●●●
●●
●
●
●
●
●●
●
●
● ●●
●
●
●●●● ●
●
●●
●
●
● ●●
●●
●
●●
●
●●
●
●
●
● ●●
●
●
●
●
●●
●
●●●
●
●
●●
●
●
●
● ●● ●● ●●●
●
●
●●
●
● ●
●
●
●●
●
●
●
● ●●
●
●
● ●
● ●
●
●● ●
●●
●
●●● ●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
● ●●
● ●
●●
●
●
●
●
●
●
●
●
●●
● ●●
●
●● ●
●
●
●
●
●
● ● ●
●
●
●● ●
●
Compare unlogged radon for random house in Hennepin county
(26) to random house in Lac Qui Parle county (36) at floor = 0.
lqp.radon <- exp(rnorm(500, a[, 26] + b, sigma.y))
hen.radon <- exp(rnorm(500, a[, 26] + b, sigma.y))
hist(radon.diff)
Histogram of radon.diff
1.0
100 150 200
Frequency
50
0
0
−3 −2 −1
Residuals
1
2
●
0.5
Gelman & Hill, Chapter 16
−20
0
10
20
radon.diff
1.5
Posterior Mean Fits
Stat 506
−10
xtable(summary(radon.diff <- lqp.radon - hen.radon))
Gelman & Hill, Chapter 16
Stat 506
## Error:
Gelman & Hill, Chapter 16
xtable.table is not implemented for tables
16.5 Add County Level Predictors
Add Predictors
Complete Pooling: one α, one β, Priors: N(0, 1002 )
model {
for (i in 1:n){
y[i] ~ dnorm (a + b*x[i], tau.y)
}
b ~ dnorm (0, .0001)
tau.y <- pow(sigma.y, -2)
sigma.y ~ dunif (0, 100)
a ~ dnorm (0, .0001)
}
Winter is associated with higher radon levels
No Pooling: One β as above, 84 αj ’s, each N(0, 1002 )
y.hat[i] <- a + b_floor * floor[i] + b_winter * winter[i] + b_fw * floor[i] *
winter[i]
y.hat[i] <- a + b_floor * floor[i] + b_winter * winter[i]
Interaction?
model {
for (i in 1:n){
y[i] ~ dnorm (a[cnty[i]] + b*x[i], tau.y)
}
b ~ dnorm (0, .0001)
tau.y <- pow(sigma.y, -2)
sigma.y ~ dunif (0, 100)
for(j in 1:J){
a[j] ~ dnorm (mu.a, tau.a)
}
sigma.a ~ dunif (0, 100)
tau.a <- pow(sigma.a, -2)
mu.a ~ dnorm(0, .0001)
}
Stat 506
Each gets a dnorm(0,.0001) prior.
column all ones
y.hat[i] <- inprod(b[], X[i, ])
Gelman & Hill, Chapter 16
County level Uranium fit
Now pass matrix X with first
Stat 506
Gelman & Hill, Chapter 16
16.6 Predictions
JAGs model
model {
for (i in 1:n){
y[i] ~ dnorm (a[cnty[i]] + b*x[i], tau.y)
}
b ~ dnorm (0, .0001)
tau.y <- pow(sigma.y, -2)
sigma.y ~ dunif (0, 100)
for(j in 1:J){
a[j] ~ dnorm (a.hat[j], tau.a)
a.hat[j] <- g.o + g.1*u[j]
}
sigma.a ~ dunif (0, 100)
tau.a <- pow(sigma.a, -2)
g.0 ~ dnorm(0, .0001)
g.1 ~ dnorm(0, .0001)
}
Stat 506
Gelman & Hill, Chapter 16
Either in BUGs/JAGs
add more lines for missing responses, and a y.tilde flag.
or in R
attach the saved MCMC samples, generate data using each row of
the simulation.
attach.jags(radon2.JAGs)
y.hepin <- rnorm(n.sims, a[, 26] + b * 1, sigma.y) ## first floor
u.tilde <- mean(mn$u)
a.tilde <- rnorm(n.sims, g.0 + g.1 * u.tilde, sigma.a)
y.newcty <- rnorm(n.sims, a.tilde + b * 1, sigma.y)
Stat 506
Gelman & Hill, Chapter 16
16.7 Does it Work?
16.8 Principles
Simulate to Check, as in Chapter 8
1
Pick reasonable parameter values, θtrue .
From expert opinion
From preliminary fitted results.
2
3
Create y fake using θtrue and the assumed model.
Fit the model, see if estimates are close to θtrue . Coverage
rates for credible intervals.
Data, modeled responses or unmodeled: predictors or sizes
modeled: y used with ∼
unmodeled: n, J, cnty, floor, u used on RHS of <- or
∼
Parameters, modeled or not, are on LHS of ∼
modeled: a fit to uranium
unmodeled: b, g.0, g.1, sigma.y, sigma.a
Derived: y.hat, tau.y, a.hat, tau.a defined with <indices: i, j
Parameters and missing data can be given initial values.
Stat 506
Gelman & Hill, Chapter 16
16.9 Implementation
Stat 506
Gelman & Hill, Chapter 16
16.10 Open-ended
Start simple, get it working, add complexity slowly.
Use 3-4 chains
Start with a few runs. autojags lets you update with more
runs. Check R̂ < 1.1.
Better to simplify or subset the data than to runn for
thousands of iterations.
Stat 506
Gelman & Hill, Chapter 16
Models can get very complex.
Nonlinear is no harder to handle. b * exp( -g * x[i])
Ratio: y[i] <- (a + b*x1[i])/(1 + g*x2[i])
Unequal variances by group or as power of the mean.
T distribution instead of normal, even estimating the df.
Stat 506
Gelman & Hill, Chapter 16
Download