Chapter 2

3.1 SLR
Regress kid's test score on mom's high-school completion (binary: 1 or 0).
Interpret the intercept estimate (78) and slope estimate (12).
xtable(lm(kid.score ~ mom.hs, kids))
Read Chapters 1-2 on your own, but note Section 2.5, a problem with significance testing:
Study one: $\hat\theta = 25$ with SE $= 10$; two-sided p-value $= .012$.
Study two: $\hat\theta = 10$ with SE $= 10$; p-value $= .30$.
Compare study one to study two: $\hat\Delta = 15$ with SE $= \sqrt{10^2 + 10^2} = 14.1$, for a two-sided p-value $= .29$.
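A quick numeric check of those p-values (a sketch, treating each z statistic as standard normal):

# Two-sided p-values from z = estimate/SE
2 * pnorm(-25 / 10)                  # study one:  0.012
2 * pnorm(-10 / 10)                  # study two:  0.317
2 * pnorm(-15 / sqrt(10^2 + 10^2))   # difference: 0.289

The two studies are not significantly different from each other, even though one is "significant" and the other is not.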
Also Section 2.4: we don't worry about "multiple testing" issues.
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   77.5484      2.0586    37.67    0.0000
mom.hs        11.7713      2.3224     5.07    0.0000
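To see the interpretation directly, a minimal sketch (assuming the kids data frame used throughout): the intercept is the mean score for the mom.hs = 0 group, and intercept + slope is the mean for the mom.hs = 1 group.

# Group means of kid.score by mom.hs; implied by the table above:
# 77.55 for mom.hs = 0, and 77.55 + 11.77 = 89.32 for mom.hs = 1
with(kids, tapply(kid.score, mom.hs, mean))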
Regress kid's test score on mom's IQ (continuous). Interpret the
intercept estimate (26) and slope estimate (0.6).
xtable(lm(kid.score ~ mom.iq, kids))
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   25.7998      5.9174     4.36    0.0000
mom.iq         0.6100      0.0585    10.42    0.0000

3.2 MLR
Interpretations:
Using both predictors, a slope is the "change when all other predictors are held constant" (and is conditional on the other terms being in the model): $\hat\beta = (26,\ 6,\ 0.6)^T$.
Holding other predictors constant does not make sense in many settings (e.g. polynomials, interactions).
xtable(lm(kid.score ~ mom.hs + mom.iq, kids))
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   25.7315      5.8752     4.38    0.0000
mom.hs         5.9501      2.2118     2.69    0.0074
mom.iq         0.5639      0.0606     9.31    0.0000
Prediction: If we look at another person or group whose IQ is one point higher but whose HS status is the same, how does the predicted score change? Or: what is the change in prediction for HS vs. non-HS at the same IQ?
Counterfactuals: Imagine rewinding the clock and changing mom's HS from 0 to 1. What is the effect on the same kid's score? It's a thought experiment; it is easier to imagine when we are assigning treatments. More in Chapters 9-10.
3.3 Interactions

Effect of IQ depends on the level of HS.
Plug in 0's for HS to get estimates for non-HS moms; 1's give the adjustments to intercept and slope. What if HS were a factor with levels "noHS" and "HS"? How would R code them? (See the sketch below.)
Note the advice about finding interactions: they tend to appear when main effects are large. Really??
It helps to center predictors about their mean.

3.4 Inference

Terminology:
Units – of analysis (not of measurement); may be subjects.
Predictors.
Outcome (response) variable.
Matrix notation.
Distributions: $y \sim N(X\beta,\ \sigma^2 V)$ with, for now, $V = I$.
G & H have an R package called arm containing the data and functions like display.
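A minimal sketch answering the factor-coding question (the recoded variable mom.hs.f is hypothetical; mom.hs is 0/1 in kids):

# Recode the 0/1 dummy as a labeled factor
kids$mom.hs.f <- factor(kids$mom.hs, levels = c(0, 1),
                        labels = c("noHS", "HS"))
# With default treatment contrasts, "noHS" is the baseline and R
# builds a 0/1 column named mom.hs.fHS:
head(model.matrix(~ mom.hs.f, kids))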
OLS estimator

$\hat\beta = (X^T X)^{-1} X^T y$
$\operatorname{Var}(\hat\beta) = \sigma^2 (X^T X)^{-1} = \sigma^2 V_\beta$

Centered X

If two predictor columns are orthogonal (not merely linearly independent), what will the covariance of their estimated coefficients be?
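fit1 and fit2 are not defined on the slide; the definitions below are assumptions consistent with the output that follows, plus a check that $\sigma^2 (X^T X)^{-1}$ reproduces vcov():

# Assumed definitions, matching the tables below
kids$centeredIQ <- kids$mom.iq - mean(kids$mom.iq)
fit1 <- lm(kid.score ~ mom.iq, kids)       # raw IQ
fit2 <- lm(kid.score ~ centeredIQ, kids)   # centered IQ
X <- model.matrix(fit1)
all.equal(summary(fit1)$sigma^2 * solve(t(X) %*% X), vcov(fit1))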
xtable(summary(fit1))
xtable(cbind(summary(fit1, cor=TRUE)$cor, summary(fit2, cor=TRUE)$cor))
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   25.7998      5.9174     4.36    0.0000
mom.iq         0.6100      0.0585    10.42    0.0000

xtable(summary(fit2))

             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   86.7972      0.8768    98.99    0.0000
centeredIQ     0.6100      0.0585    10.42    0.0000

Correlations of the estimates (fit1 on the left, fit2 on the right):

            (Intercept)  mom.iq  (Intercept)  centeredIQ
(Intercept)        1.00   -0.99         1.00        0.00
mom.iq            -0.99    1.00         0.00        1.00
Centering

xtable(anova(fit1, fit2))

  Res.Df        RSS  Df  Sum of Sq  F  Pr(>F)
1    432  144137.34
2    432  144137.34  -0       0.00

print(xtable(matrix(fivenum(predict(fit1) - predict(fit2)), nrow=1)), include.colnames=FALSE)

1  -0.00  -0.00  -0.00  -0.00  0.00

The centered column is orthogonal to the intercept column:
$\mathbf{1}^T(x - \mathbf{1}\bar{x}) = \mathbf{1}^T x - \mathbf{1}^T \mathbf{1}\bar{x} = \sum x_i - n\bar{x} = 0$

Welcome to Bayesian Inference

Note in Figure 3.7, $\hat\beta$ is known and labeled on the x axis. Frequentists would consider unknown $\beta$ the center of the sampling distribution of $\hat\beta$, i.e. $\hat\beta \sim N(\beta,\ \sigma^2 V_\beta)$. Our authors use a Bayesian interpretation, $\beta \sim N(\hat\beta,\ \sigma^2 V_\beta)$: our ignorance about the location of the true $\beta$ is expressed as a posterior distribution.
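A minimal sketch of that Bayesian reading, drawing coefficient vectors from $N(\hat\beta,\ \hat\sigma^2 V_\beta)$ (this conditions on $\hat\sigma$; G & H's arm package does the full simulation):

library(MASS)  # for mvrnorm
# 1000 plausible coefficient vectors, centered at the estimates
beta.sims <- mvrnorm(1000, mu = coef(fit1), Sigma = vcov(fit1))
# The spread of the draws approximates our uncertainty about beta:
apply(beta.sims, 2, sd)   # close to the reported Std. Errors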
Variances

For any estimable linear combination of the coefficients, the variance is:
$\operatorname{Var}(\lambda^T \hat\beta) = \sigma^2 \lambda^T V_\beta \lambda$
If $\lambda$ is a single column, what is the dimension?
If $\Lambda$ has four columns (each making an estimable linear combination), what are the dimensions and what are the components of $\operatorname{Var}(\Lambda^T \hat\beta) = \sigma^2 \Lambda^T V_\beta \Lambda$?

Comparing Means

Compute pairwise comparisons of breakage by tension level.
coef(warp.fit <- lm(breaks ~ tension, warpbreaks))
## (Intercept)    tensionM    tensionH
##        36.4       -10.0       -14.7

Lambda <- matrix(c(0,1,0,  0,0,1,  0,1,-1), byrow=TRUE, 3, 3)
as.numeric(Lambda %*% coef(warp.fit))
## [1] -10.00 -14.72   4.72

sqrt(diag(Lambda %*% vcov(warp.fit) %*% t(Lambda)))
## [1] 3.96 3.96 3.96
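Putting the pieces together, a short sketch: estimates, standard errors, and t statistics for the three comparisons (the rows of Lambda give M−L, H−L, and M−H):

est <- as.numeric(Lambda %*% coef(warp.fit))
se  <- sqrt(diag(Lambda %*% vcov(warp.fit) %*% t(Lambda)))
cbind(estimate = est, SE = se, t = est / se)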
Another Generalized Inverse

coef(warp.fit2 <- lm(breaks ~ tension - 1, warpbreaks))
## tensionL tensionM tensionH
##     36.4     26.4     21.7

Lambda2 <- matrix(c(1,-1,0,  1,0,-1,  0,1,-1), byrow=TRUE, 3, 3)
as.numeric(Lambda2 %*% coef(warp.fit2))
## [1] 10.00 14.72  4.72

Lambda2 %*% vcov(warp.fit2) %*% t(Lambda2)
##       [,1]  [,2]  [,3]
## [1,] 15.68  7.84 -7.84
## [2,]  7.84 15.68  7.84
## [3,] -7.84  7.84 15.68

Two Fits

coef(warp.fit <- lm(breaks ~ tension, warpbreaks))
## (Intercept)    tensionM    tensionH
##        36.4       -10.0       -14.7

coef(warp.fit2 <- lm(breaks ~ tension - 1, warpbreaks))
## tensionL tensionM tensionH
##     36.4     26.4     21.7
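The two parameterizations describe the same model in different coordinates; a quick sketch to confirm the fits agree:

# Identical fitted values and residual sum of squares
all.equal(fitted(warp.fit), fitted(warp.fit2))
all.equal(deviance(warp.fit), deviance(warp.fit2))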
Compare the fits

See how the two var-cov matrices compare:
xtable(vcov(warp.fit))
            (Intercept)  tensionM  tensionH
(Intercept)        7.84     -7.84     -7.84
tensionM          -7.84     15.68      7.84
tensionH          -7.84      7.84     15.68

xtable(vcov(warp.fit2))

          tensionL  tensionM  tensionH
tensionL      7.84     -0.00     -0.00
tensionM     -0.00      7.84     -0.00
tensionH     -0.00     -0.00      7.84

Residuals
$r = y - X\hat\beta$, i.e. $r_i = y_i - x_i\hat\beta$
$s^2 = \hat\sigma^2 = r^T r/(n-k) = \sum r_i^2/(n-k)$, where n is the number of rows and k is the rank of X.
The sampling distribution of $(n-k)\,\hat\sigma^2/\sigma^2$ is $\chi^2_{n-k}$.
$R^2 = 1 - \hat\sigma^2/s_y^2$ is the proportion of the variance of y explained by the model. The model must contain an intercept.
p 42: Do the authors drop predictors with coefficients close to 0 (in SE's)? Stay tuned for §4.6.
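A sketch checking these formulas against R's summary output, using fit1 from the centering slides (with df-adjusted variances, this R² is R's adjusted R²):

r  <- resid(fit1)
n  <- length(r); k <- qr(fit1)$rank
s2 <- sum(r^2) / (n - k)
all.equal(sqrt(s2), summary(fit1)$sigma)   # residual SE
all.equal(1 - s2 / var(kids$kid.score), summary(fit1)$adj.r.squared)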
Drawing Lines

kidfit.2 <- lm(kid.score ~ mom.iq, kids)
plot(kid.score ~ mom.iq, data=kids, xlab="Moms IQ",
     ylab="Kids Score", col=kids$mom.hs+2, pch=20)
curve(coef(kidfit.2)[1] + coef(kidfit.2)[2]*x, add=TRUE, col="blue")
fit.3 <- lm(kid.score ~ mom.hs + mom.iq, kids)
curve(cbind(1, 1, x) %*% coef(fit.3), add=TRUE, col="darkgreen", lwd=2)
curve(cbind(1, 0, x) %*% coef(fit.3), add=TRUE, col="red", lwd=2)

Interaction Lines

kidfit.4 <- lm(kid.score ~ mom.hs * mom.iq, kids)
plot(kid.score ~ mom.iq, data=kids, xlab="Moms IQ",
     ylab="Kids Score", col=kids$mom.hs+2, pch=20)
curve(cbind(1, 1, x, 1*x) %*% coef(kidfit.4), add=TRUE,
      col="darkgreen", lwd=2)
curve(cbind(1, 0, x, 0*x) %*% coef(kidfit.4), add=TRUE,
      col="red", lwd=2)
[Figure: scatterplots of Kids Score vs. Moms IQ, points colored by mom.hs (legend: 0, 1), with the fitted lines overlaid.]

Uncertainty

How wobbly is the fit of the line?
For each row of data, $SE(\hat y_i) = s\sqrt{x_i V_\beta x_i^T}$.

[Figure: the fitted line with a pointwise confidence band.]

Uncertainty 2

How well do we predict new points?
A new point has extra variance ($\hat\sigma^2$), so now the predictive error is $s\sqrt{1 + x_i V_\beta x_i^T}$.

predictDF <- data.frame(mom.iq=rep(7:14*10,2), mom.hs=rep(0:1,each=8))
predictDF <- cbind(predictDF, predict(kidfit.4, newdata=predictDF, interval="prediction"))
names(predictDF)[3] <- "kid.score"
predictDF$mom.hs <- factor(predictDF$mom.hs)
myplot + geom_smooth(aes(ymin = lwr, ymax = upr), data = predictDF, stat="identity")

[Figure: kid.score vs. mom.iq with prediction bands for mom.hs = 0 and mom.hs = 1.]
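A sketch verifying both formulas against predict(), assuming kidfit.2 from the Drawing Lines slide:

X  <- model.matrix(kidfit.2)
s  <- summary(kidfit.2)$sigma
Vb <- vcov(kidfit.2) / s^2                     # V_beta
se.fit <- s * sqrt(diag(X %*% Vb %*% t(X)))    # wobbliness of the line
all.equal(unname(se.fit), unname(predict(kidfit.2, se.fit=TRUE)$se.fit))
# Predictive SE for a new point at each row's x:
se.pred <- s * sqrt(1 + diag(X %*% Vb %*% t(X)))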
The Usual Assumptions

Model is valid – properly specified.
Data are valid – will answer our question.
Independent errors.
Constant variance = homo–, not hetero–scedastic.
Normality of errors. Never check raw y for normality.
Check validity with new data. (Extrapolate?) Data used to create the estimated model always fit it better than data which got no "say" in estimation.
Later: "Can the model generate data like the data we see?" (A small preview follows the figure below.)

Diagnostic plots

par(mfrow=c(1,4))
plot(kidfit.4)

[Figure: the four default lm diagnostic panels for kidfit.4 – Residuals vs Fitted, Normal Q–Q, Scale–Location, and Residuals vs Leverage with Cook's distance contours; observations 111, 273, and 286 are flagged.]
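A minimal preview of that last idea (not the authors' code): simulate one replicate dataset from the fitted model and compare it with the observed data.

sigma.hat <- summary(kidfit.4)$sigma
y.rep <- fitted(kidfit.4) + rnorm(length(fitted(kidfit.4)), 0, sigma.hat)
# Compare replicate to observed, e.g. by standard deviation:
c(sd.observed = sd(kids$kid.score), sd.replicate = sd(y.rep))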