Stats 330: Lecture 23
Plan of the day
In today’s lecture we continue our discussion
of the multiple logistic regression model
Topics covered
– Models and submodels
– Residuals for Multiple Logistic Regression
– Diagnostics in Multiple Logistic Regression
– No analogue of R2
Reference: Coursebook, section 5.2.3
Comparison of models
• Suppose model 1 and model 2 are two models, with model 2 a submodel of model 1
• If model 2 is in fact correct, then the difference in the deviances has approximately a chi-squared distribution
• The df equals the difference in the df of the two separate models
• The approximation is OK for both grouped and ungrouped data
Example: kyphosis data
• Is age alone an adequate model?
> age.glm<-glm(Kyphosis~Age+I(Age^2),family=binomial,
data=kyphosis.df)
Null deviance: 83.234 on 80 degrees of freedom
Residual deviance: 72.739 on 78 degrees of freedom
AIC: 78.739
Full model (Age, Age^2, Start, Number) has deviance 54.428 on 76 df
Chi-squared statistic is 72.739 - 54.428 = 18.311 on 78 - 76 = 2 df
> 1-pchisq(18.311,2)
[1] 0.0001056372
Highly significant: we need at least one of Start and Number
Anova in R
Two-model form, comparing the two fitted models:

> anova(age.glm, kyphosis.glm, test="Chi")
Analysis of Deviance Table

Model 1: Kyphosis ~ Age + I(Age^2)
Model 2: Kyphosis ~ Age + I(Age^2) + Start + Number
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1        78     72.739
2        76     54.428  2   18.311 0.0001056 ***
Residuals
• Two kinds of residuals
– Pearson residuals
• useful for grouped data only
• similar to residuals in linear regression,
actual minus fitted value
– Deviance residuals
• useful for grouped and ungrouped data
• Measure contribution of each covariate
pattern to the deviance
Pearson residuals
The Pearson residual for covariate pattern i is
\[
\frac{r_i - n_i\hat{\pi}_i}{\sqrt{n_i\hat{\pi}_i(1-\hat{\pi}_i)}}
\]
where \(r_i\) is the observed number of successes, \(n_i\) the number of cases in the pattern, and \(\hat{\pi}_i\) the probability predicted by the model.
Standardized to have approximately unit
variance, so big if more than 2 in
absolute value
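As a check, the Pearson residuals can be computed directly from this formula. A minimal sketch, assuming budworm.glm is the grouped binomial fit used later in this lecture:

# Sketch only: budworm.glm assumed to be a grouped binomial fit
r       <- budworm.glm$y * budworm.glm$prior.weights   # observed successes r_i
n       <- budworm.glm$prior.weights                   # group sizes n_i
pi.hat  <- fitted(budworm.glm)                         # fitted probabilities
by.hand <- (r - n*pi.hat)/sqrt(n*pi.hat*(1 - pi.hat))
all.equal(unname(by.hand), unname(residuals(budworm.glm, type="pearson")))   # should be TRUE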
Deviance residuals (i)
• For grouped data, the deviance is
\[
\text{deviance} = \sum_{i=1}^{M}\left[\,2r_i\log\!\left(\frac{r_i}{n_i\hat{\pi}_i}\right) + 2(n_i - r_i)\log\!\left(\frac{n_i - r_i}{n_i - n_i\hat{\pi}_i}\right)\right] = \sum_{i=1}^{M} d_i^{\,2},
\]
where
\[
d_i = \pm\left[\,2r_i\log\!\left(\frac{r_i}{n_i\hat{\pi}_i}\right) + 2(n_i - r_i)\log\!\left(\frac{n_i - r_i}{n_i - n_i\hat{\pi}_i}\right)\right]^{1/2}.
\]
\(d_i\) is +ve if \(r_i > n_i\hat{\pi}_i\), and -ve otherwise.
Deviance residuals (ii)
• Thus, the deviance can be written as the sum of squares of M quantities d1, …, dM, one for each covariate pattern
• Each di is the contribution to the deviance
from the ith covariate pattern
• If deviance residual is big (more than
about 2 in magnitude), then the covariate
pattern has a big influence on the
likelihood, and hence the estimates
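A quick check of this decomposition (a minimal sketch, again assuming the grouped budworm fit budworm.glm):

d <- residuals(budworm.glm, type="deviance")
all.equal(sum(d^2), deviance(budworm.glm))   # squared deviance residuals sum to the deviance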
Calculating residuals
> pearson.residuals<-residuals(budworm.glm,
type="pearson")
> deviance.residuals<-residuals(budworm.glm,
type="deviance")
> par(mfrow=c(1,2))
> plot(pearson.residuals, ylab="residuals",
main="Pearson")
> abline(h=0,lty=2)
> plot(deviance.residuals, ylab="residuals",
main="Deviance")
> abline(h=0,lty=2)
[Figure: index plots of the Pearson residuals (left) and deviance residuals (right) for the budworm fit, each with a dashed reference line at zero.]
Diagnostics: outlier detection
• Large residuals indicate covariate patterns poorly
fitted by the model
• Large Pearson residuals indicate a poor match
between the “maximum model probabilities” and the
logistic model probabilities, for grouped data
• Large deviance residuals indicate influential points
• Example: budworm data
Diagnostics: detecting nonlinear regression functions
• For a single x, plot the logits of the
maximal model probabilities against x
• For multiple x's, plot the Pearson residuals against the fitted probabilities and against the individual x's
• If most of the ni are equal to 1, so the data can't be grouped, try a gam (cf. the kyphosis data; see the sketch below)
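A minimal sketch of the gam approach, assuming the mgcv package and the kyphosis.df data frame used earlier (the course's own gam setup may differ):

library(mgcv)
# a clearly curved smooth for Age suggests a nonlinear (e.g. quadratic) effect on the logit scale
kyph.gam <- gam(Kyphosis ~ s(Age), family=binomial, data=kyphosis.df)
plot(kyph.gam)
summary(kyph.gam)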
Example: budworms
• Plot the Pearson residuals versus dose; the plot shows a curve

[Figure: Pearson residuals vs dose for the budworm fit, with budworm.df$dose on the x-axis; the residuals follow a curved pattern.]
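One way to produce such a plot (a sketch; pearson.residuals is from the earlier residuals() call, and budworm.df is assumed to contain the dose variable):

plot(budworm.df$dose, pearson.residuals,
     xlab="dose", ylab="Pearson residuals",
     main="Pearson residuals vs dose")
abline(h=0, lty=2)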
Diagnostics: influential points
Will look at 3 diagnostics
– Hat matrix diagonals
– Cook’s distance
– Leave-one-out Deviance Change
Example: vaso-constriction data
Data from a study of reflex vaso-constriction (narrowing of the blood vessels) of the skin of the fingers
– Can be caused by a sharp intake of breath
Example: vaso-constriction data
Variables measured:
Response = 0/1
1=vaso-constriction occurs, 0 = doesn’t occur
Volume: volume of air breathed in
Rate: rate of intake of breath
Data
   Volume   Rate  Response
1    3.70  0.825         1
2    3.50  1.090         1
3    1.25  2.500         1
4    0.75  1.500         1
5    0.80  3.200         1
6    0.70  3.500         1
7    0.60  0.750         0
8    1.10  1.700         0
9    0.90  0.750         0
10   0.90  0.450         0
11   0.80  0.570         0
12   0.55  2.750         0
13   0.60  3.000         0
. . . 39 obs in all
Plot of data
> plot(Rate, Volume, type="n", cex=1.2)
> text(Rate, Volume, 1:39,
       col=ifelse(Response==1, "red", "blue"), cex=1.2)
> text(2.3, 3.5, "blue: no VS", col="blue", adj=0, cex=1.2)
> text(2.3, 3.0, "red: VS", col="red", adj=0, cex=1.2)
Plot of volume versus rate, with ID numbers shown

[Figure: scatterplot of Volume against Rate with each point labelled by its observation number; red = VS, blue = no VS. Note points 4 and 18.]
Enhanced residual plots
> vaso.glm = glm(Response ~ log(Volume) + log(Rate),
family=binomial, data=vaso.df)
> pear.r<-residuals(vaso.glm, type="pearson")
> dev.r<-residuals(vaso.glm, type="deviance")
> par(mfrow=c(1,2))
> plot(pear.r, ylab="residuals", main="Pearson",type="n")
> text(pear.r,cex=0.7)
> abline(h=0,lty=2)
> abline(h=2,lty=2,lwd=2)
> abline(h=-2,lty=2,lwd=2)
> plot(dev.r, ylab="residuals", main="Deviance",type="h")
> text(dev.r, cex=0.7)
> abline(h=0,lty=2)
> abline(h=2,lty=2,lwd=2)
> abline(h=-2,lty=2,lwd=2)
[Figure: index plots of the Pearson (left) and deviance (right) residuals for vaso.glm, points labelled by observation number, with dashed reference lines at 0 and ±2; observations 4 and 18 stand out.]
Diagnostics: Hat matrix diagonals
• Can define hat matrix diagonals (HMD's) pretty much as in linear models
• An HMD is big if HMD > 3p/M (p = number of coefficients, M = number of covariate patterns)
• Draw an index plot of the HMD's
Plotting HMD’s
> HMD <- hatvalues(vaso.glm)
> plot(HMD, ylab="HMD's", type="h")
> text(HMD, cex=0.7)
> abline(h=3*3/39, lty=2)
[Figure: index plot of the hat matrix diagonals for vaso.glm, with the cut-off 3×3/39 shown as a dashed line; observation 31 has high leverage.]
Hat matrix diagonals
• In ordinary regression, the hat matrix diagonals
measure how “outlying” the covariates for an
observation are
• In logistic regression, the HMD’s measure the
same thing, but are down-weighted according to
the estimated probability for the observation.
The weights get small if the probability is close to 0 or 1.
• In the vaso-constriction data, points 1,2,17 had
very small weights, since the probabilities are
close to 1 for these points.
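The down-weighting can be seen directly from the working weights of the fit, which for logistic regression are \(n_i\hat{\pi}_i(1-\hat{\pi}_i)\); a minimal sketch for vaso.glm (here every \(n_i = 1\)):

w <- weights(vaso.glm, type="working")   # IRLS weights: pi.hat*(1 - pi.hat) here
p <- fitted(vaso.glm)
round(cbind(fitted=p, weight=w)[c(1, 2, 17), ], 4)   # points 1, 2, 17: p near 1, small weight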
Plot of volume versus rate, with ID numbers shown

[Figure: the same scatterplot of Volume against Rate with observation numbers; red = VS, blue = no VS. Note points 1, 2 and 17.]
Diagnostics: Cook's distance
• Can define an analogue of Cook's distance for each point:
  CD = (Pearson residual)² × HMD / (p × (1 − HMD)²), where p = number of coefficients
• CD is big if it is more than about the 10% quantile of the chi-squared distribution on p df, divided by p
• Calculate this cut-off with qchisq(0.1, p)/p
• But it is not that reliable as a measure
Cook's D: calculating and plotting
p<-3
CD<-cooks.distance(vaso.glm)
plot(CD,ylab="Cook's D",type="h",
main="index plot of Cook's distances")
text(CD, cex=0.7)
bigcook<-qchisq(0.1,p)/p
abline(h=bigcook, lty=2)
[Figure: index plot of Cook's distances for vaso.glm, with the cut-off shown as a dashed line; points 4 and 18 are influential.]
Diagnostics: leave-one-out deviance change
• If the ith covariate pattern is left out, the change in the deviance is approximately
  (deviance residual)² + (Pearson residual)² × HMD / (1 − HMD)
• Big if more than about 4
Deviance change: calculating and plotting
> dev.r<-residuals(vaso.glm,type="deviance")
> Dev.change<-dev.r^2 + pear.r^2*HMD/(1-HMD)
> plot(Dev.change,ylab="Deviance change",
type="h")
> text(Dev.change, cex=0.7)
> bigdev<-4
> abline(h=bigdev, lty=2)
[Figure: index plot of the leave-one-out deviance changes for vaso.glm, with a dashed line at 4; points 4 and 18 are influential.]
All together
influenceplots(vaso.glm)

[Figure: the four panels produced by influenceplots(): an index plot of deviance residuals, a leverage plot, a Cook's distance plot and a deviance-changes plot; observations 4, 18 and 31 are flagged.]
Should we delete points?
• How influential are the 3 points?
• Can delete each in turn and examine
changes in coefficients, predicted
probabilities
• First, coefficients:
Deleting:       None      31       4      18    All 3
(Intercept)   -2.875  -3.041  -5.206  -4.758  -24.348
log(Volume)    5.179   4.966   8.468   7.671   39.142
log(Rate)      4.562   4.765   7.455   6.880   31.642
Should we delete points (2)?
• Next, fitted probabilities:
                        delete points
Fitted at     None      31       4      18   4 and 18   All 3
point 31     0.722   0.627   0.743   0.707      0.996   0.996
point 4      0.075   0.073   0.010   0.015      0.000   0.000
point 18     0.106   0.100   0.018   0.026      0.000   0.000
• Conclusion: points 4 and 18 have a big
effect.
Should we delete points (3)?
• Should we delete?
• They could be genuine – no real evidence
they are wrong
• If we delete them, we increase the regression coefficients and make the fitted probabilities more extreme
• This overstates the predictive ability of the model
Residuals for ungrouped data
• If all cases have distinct covariate
patterns, then the residuals lie along two
curves (corresponding to success and
failure) and have little or no diagnostic
value.
• Thus, there is a pattern even if everything
is OK.
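This pattern is easy to see by plotting the residuals against the fitted probabilities; a minimal sketch using the (ungrouped) kyphosis fit:

plot(fitted(kyphosis.glm), residuals(kyphosis.glm, type="deviance"),
     xlab="fitted probability", ylab="deviance residual")
abline(h=0, lty=2)
# two bands appear, one for y = 1 and one for y = 0, even when the model is fine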
Formulas
• Pearson residuals: for ungrouped data, the residual for the ith case is
\[
\sqrt{\frac{1-\hat{\pi}_i}{\hat{\pi}_i}} \;\;\text{if } y_i = 1,
\qquad
-\sqrt{\frac{\hat{\pi}_i}{1-\hat{\pi}_i}} \;\;\text{if } y_i = 0.
\]
Formulas (cont)
• Deviance residuals: for ungrouped data, the residual for the ith case is
\[
\sqrt{2\,\lvert\log\hat{\pi}_i\rvert} \;\;\text{if } y_i = 1,
\qquad
-\sqrt{2\,\lvert\log(1-\hat{\pi}_i)\rvert} \;\;\text{if } y_i = 0.
\]
Use of plot function
plot(kyphosis.glm)

[Figure: the four default plot() diagnostics for kyphosis.glm: Residuals vs Fitted, Normal Q-Q, Scale-Location and Cook's distance plots; observations 25, 43, 46 and 77 are labelled.]
Analogue of R2?
• There is no satisfactory analogue of R2 for
logistic regression.
• For the “small m, big n” situation we can use the residual deviance, since we can obtain an approximate p-value (see the sketch below).
• For other situations we can use the Hosmer-Lemeshow statistic (next slide).
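For the grouped case, the approximate p-value comes from referring the residual deviance to a chi-squared distribution on the residual df; a sketch with the grouped budworm fit assumed:

1 - pchisq(deviance(budworm.glm), df.residual(budworm.glm))
# a small p-value would indicate lack of fit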
Hosmer-Lemeshow statistic
• How can we judge goodness of fit for ungrouped
data?
• Can use the Hosmer-Lemeshow statistic, which groups the cases into classes with similar fitted probabilities (a sketch of the idea follows below):
  – Sort the cases in increasing order of fitted probability
  – Divide them into 10 (almost) equal-sized groups
  – Do a chi-squared test to see if the number of successes in each group matches the number expected under the model
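A hand-rolled sketch of this idea (the course's HLstat() function from the "330 functions" is what is actually used; this version, with 10 groups of sorted fitted probabilities, is only illustrative):

hl.sketch <- function(fit, g = 10) {
  o    <- order(fitted(fit))
  p    <- fitted(fit)[o]                        # sorted fitted probabilities
  y    <- fit$y[o]                              # 0/1 responses in the same order
  grp  <- cut(seq_along(p), g, labels = FALSE)  # ~equal-sized groups
  obs  <- tapply(y, grp, sum)                   # observed 1's per group
  expd <- tapply(p, grp, sum)                   # expected 1's per group
  n    <- tapply(y, grp, length)
  X2   <- sum((obs - expd)^2 / (expd * (1 - expd/n)))
  c(HL.statistic = X2, p.value = 1 - pchisq(X2, g - 2))
}
hl.sketch(kyphosis.glm)   # will not match HLstat() exactly if the grouping differs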
Kyphosis data
Divide the fitted probabilities into 10 classes: lowest 10%, next 10%, ...

              Class 1  Class 2  Class 3  Class 4  Class 5
Observed 0's        9        8        8        7        8
Observed 1's        0        0        0        1        0
Total obs           9        8        8        8        8
Expected 1's    0.022    0.082    0.199    0.443    0.776

              Class 6  Class 7  Class 8  Class 9  Class 10
Observed 0's        8        5        5        3        3
Observed 1's        0        3        3        5        5
Total obs           8        8        8        8        8
Expected 1's    1.023    1.639    2.496    3.991    6.328

Note: Expected 1's = Total obs × average fitted probability in the class
In R, using the kyphosis data
> HLstat(kyphosis.glm)       # the argument is the result of fitting the model
Value of HL statistic = 6.498
P-value = 0.592

A p-value of less than 0.05 indicates problems.
No problem indicated for the kyphosis data – the logistic model appears to fit OK.
The function HLstat is in the “330 functions”.