Variable selection:
Suppose for the i-th observational unit (case) you record
$$Y_i = \begin{cases} 1 & \text{success} \\ 0 & \text{failure} \end{cases}$$
and explanatory variables $Z_{1i}, Z_{2i}, \ldots, Z_{ri}$.
Variable (or model) selection:
- subject matter theory and expert opinion
- fit models to test hypotheses of interest
- stepwise methods
Consider a logistic regression model with
$$\pi_i = Pr(Y_i = 1 \mid Z_{1i}, Z_{2i}, \ldots, Z_{ri})$$
and
$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 Z_{1i} + \beta_2 Z_{2i} + \cdots + \beta_r Z_{ri}$$
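As a quick illustration (not part of the original notes), here is a minimal sketch of fitting such a model with statsmodels; the arrays `Z` and `y` are synthetic stand-ins for real data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 3))                      # stand-in explanatory variables
y = rng.binomial(1, 1/(1 + np.exp(-Z[:, 0])))      # stand-in 0/1 response

fit = sm.Logit(y, sm.add_constant(Z)).fit(disp=0)  # m.l.e. of beta_0, ..., beta_r
pi_hat = fit.predict()                             # fitted probabilities pi-hat_i
```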
Which explanatory variables should be included in the model?
- backward elimination, forward selection, or stepwise selection (the user supplies a list of potential variables)
- consider "diagnostics"
- maximize a "penalized" likelihood
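A minimal sketch of one stepwise method, backward elimination by Wald p-values, assuming a pandas data frame of candidate variables; the dropping rule and alpha level are illustrative choices, not a procedure prescribed in these notes.

```python
import statsmodels.api as sm

def backward_eliminate(X, y, alpha=0.05):
    """Refit, dropping the least significant term, until all p-values < alpha."""
    cols = list(X.columns)
    while cols:
        fit = sm.Logit(y, sm.add_constant(X[cols])).fit(disp=0)
        pvals = fit.pvalues.drop("const")   # Wald p-values, intercept excluded
        if pvals.max() < alpha:
            break
        cols.remove(pvals.idxmax())         # eliminate the weakest variable
    return cols
```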
Retrospective study of bronchopulmonary dysplasia (BPD) in newborn infants.

Observational units: 248 infants treated for respiratory distress syndrome (RDS) at Stanford Medical Center between 1962 and 1973, who received ventilatory assistance by intubation for more than 24 hours, and who survived for 3 or more days.

Binary response:
$$\mathrm{BPD} = \begin{cases} 1 & \text{stage III or IV} \\ 2 & \text{stage I or II} \end{cases}$$

Background variables:
- Sex: 0 = female, 1 = male
- YOB: year of birth
- GEST: gestational age (weeks × 10)
- BWT: birth weight (grams)
- AGSYM: age at onset of respiratory symptoms (hrs × 10)
- APGAR: one-minute APGAR score (0-10)
- RDS: severity of initial X-ray for RDS, on a 6-point scale from 0 = no RDS seen to 5 = very severe case of RDS

Suspected causes: duration and level of exposure to oxygen therapy and intubation.
Treatment variables:
- INTUB: duration of endotracheal intubation (hrs)
- VENTL: duration of assisted ventilation (hrs)
- LOWO2: hours of exposure to 22-49% level of elevated oxygen
- MEDO2: hours of exposure to 40-79% level of elevated oxygen
- HIO2: hours of exposure to 80-100% level of elevated oxygen
- AGVEN: age at onset of ventilatory assistance (hours)

Methods for assessing the adequacy of models:
1. Goodness-of-fit tests.
2. Procedures aimed at specific alternatives, such as fitting extended or modified models.
3. Residuals and other diagnostic statistics designed to detect anomalous or influential cases or unexpected patterns:
- Inspection of graphs
- Case deletion diagnostics
- Measures of fit
Logistic regression model for the BPD data:

The model selected with backward elimination is
$$\log\left(\frac{\hat\pi_i}{1-\hat\pi_i}\right) = -12.729 + 0.7686\,\mathrm{LNL} - 1.7807\,\mathrm{LNM} + 0.3543\,(\mathrm{LNM})^2 - 2.4038\,\mathrm{LNH} + 0.5954\,(\mathrm{LNH})^2 + 1.6048\,\mathrm{LNV}$$
with
$$\left[\begin{array}{l} \text{concordant} = 97.0\% \\ \text{discordant} = 2.9\% \\ \text{GAMMA} = 0.941 \end{array}\right]$$
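The concordant/discordant percentages and GAMMA are pairwise comparisons of fitted probabilities. A minimal sketch of how they can be computed, with hypothetical numpy arrays `y` (0/1 response) and `pi_hat` (fitted probabilities); this illustrates the idea and is not necessarily the exact tie handling PROC LOGISTIC uses.

```python
import numpy as np

def concordance(y, pi_hat):
    """Percent concordant/discordant pairs and Goodman-Kruskal gamma."""
    p1, p0 = pi_hat[y == 1], pi_hat[y == 0]      # fitted probs by outcome
    diff = p1[:, None] - p0[None, :]             # all (event, non-event) pairs
    nc, nd = (diff > 0).sum(), (diff < 0).sum()  # concordant / discordant counts
    return 100*nc/diff.size, 100*nd/diff.size, (nc - nd)/(nc + nd)
```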
Overall assessment of fit: an overall measure similar to $R^2$ is
$$\frac{L_M - L_0}{L_S - L_0} = \frac{G_0^2 - G_M^2}{G_0^2}$$
where $L_M$ is the log-likelihood for the fitted model, $L_0$ is the log-likelihood for the model containing only an intercept, and $L_S$ is the log-likelihood for the saturated model.

This becomes $(L_0 - L_M)/L_0$ (since $L_S = 0$) when there are $n$ observational units that provide $n$ independent Bernoulli observations. For the BPD model above,
$$\frac{L_0 - L_M}{L_0} = \frac{-2L_0 - (-2L_M)}{-2L_0} = \frac{306.520 - 97.770}{306.520} = 0.681$$
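In statsmodels this measure is one line, since `llf` and `llnull` store $L_M$ and $L_0$; a sketch building on the earlier `fit` (it coincides with the built-in `fit.prsquared`, McFadden's pseudo-$R^2$).

```python
measure = (fit.llnull - fit.llf) / fit.llnull                 # (L_0 - L_M)/L_0
# equivalently, via the -2 log-likelihoods shown above:
measure2 = (-2*fit.llnull - (-2*fit.llf)) / (-2*fit.llnull)
```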
Goodness-of-fit tests:

$$H_0: \text{the proposed model is correct} \qquad \text{vs.} \qquad H_A: \text{any other model}$$

Pearson chi-squared test:
$$X^2 = \sum_{i=1}^{n}\sum_{j=1}^{2}\frac{(Y_{ij} - n_i\hat\pi_{ij})^2}{n_i\hat\pi_{ij}} = \sum_{i=1}^{n}\frac{(Y_{i1} - n_i\hat\pi_{i1})^2}{n_i\hat\pi_{i1}(1 - \hat\pi_{i1})}$$

Deviance:
$$G^2 = 2\sum_{i=1}^{n}\sum_{j=1}^{2} Y_{ij}\log\left(\frac{Y_{ij}}{n_i\hat\pi_{ij}}\right)$$

When each response is an independent Bernoulli observation,
$$Y_i = \begin{cases} 1 & \text{success} \\ 0 & \text{failure} \end{cases} \qquad i = 1, 2, \ldots, n,$$
with $\pi_i = Pr(Y_i = 1 \mid Z_{1i}, \ldots, Z_{ri})$, and there is only one response for each pattern of covariates $(Z_{1i}, \ldots, Z_{ri})$, then for testing
$$H_0: \log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 Z_{1i} + \cdots + \beta_k Z_{ki}$$
versus
$$H_A: 0 < \pi_i < 1 \quad \text{(general alternative)},$$
neither
$$G^2 = 2\sum_{i=1}^{n} Y_i\log\left(\frac{Y_i}{\hat\pi_i}\right) + 2\sum_{i=1}^{n}(1 - Y_i)\log\left(\frac{1 - Y_i}{1 - \hat\pi_i}\right)$$
nor
$$X^2 = \sum_{i=1}^{n}\frac{(Y_i - \hat\pi_i)^2}{\hat\pi_i(1 - \hat\pi_i)}$$
is well approximated by a chi-square distribution when $H_0$ is true, even for large $n$.

In this situation, the $G^2$ and $X^2$ tests for comparing two (nested) models
$$H_0: \log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 Z_{1i} + \cdots + \beta_k Z_{ki}$$
$$H_A: \log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 Z_{1i} + \cdots + \beta_k Z_{ki} + \beta_{k+1} Z_{k+1,i} + \cdots + \beta_r Z_{ri}$$
may still be well approximated by chi-squared distributions when $\beta_{k+1} = \cdots = \beta_r = 0$.
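A minimal sketch of this nested-model $G^2$ comparison using statsmodels; `X_small` (the $k+1$ columns under $H_0$) and `X_big` (adding $Z_{k+1}, \ldots, Z_r$) are hypothetical design matrices.

```python
from scipy.stats import chi2
import statsmodels.api as sm

def lr_test(y, X_small, X_big):
    """G^2 test of H0: the extra coefficients in X_big are all zero."""
    fit0 = sm.Logit(y, X_small).fit(disp=0)   # reduced model, k+1 parameters
    fitA = sm.Logit(y, X_big).fit(disp=0)     # full model, r+1 parameters
    G2 = 2*(fitA.llf - fit0.llf)              # drop in deviance
    df = X_big.shape[1] - X_small.shape[1]    # r - k degrees of freedom
    return G2, chi2.sf(G2, df)
```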
For this comparison the deviance
$$G^2 = 2\sum_{i=1}^{n} Y_i\log\left(\frac{\hat m_{i,A}}{\hat m_{i,0}}\right)$$
and the Pearson statistic
$$X^2 = \sum_{i=1}^{n}\frac{(\hat m_{i,A} - \hat m_{i,0})^2}{\hat m_{i,0}}$$
(where $\hat m_{i,A}$ and $\hat m_{i,0}$ are the fitted counts under $H_A$ and $H_0$) have approximate chi-squared distributions with $r - k$ degrees of freedom when $(r - k)/n$ is small and $\beta_{k+1} = \cdots = \beta_r = 0$ (S. Haberman, 1977, Annals of Statistics). These are the types of comparisons you make with stepwise variable selection procedures.

Insignificant values of $G^2$ and $X^2$:
1. Only indicate that the alternative model offers little or no improvement over the null model.
2. Do not imply that either model is adequate.
Hosmer-Lemeshow test:

Collect the $n$ cases into $g$ groups and form a $2 \times g$ contingency table:

              Groups:   1       2      ...     g
(i=1)  Y = 1:          O_11    O_12    ...    O_1g
(i=2)  Y = 2:          O_21    O_22    ...    O_2g
              totals:  n'_1    n'_2    ...    n'_g

Compute a Pearson statistic
$$C = \sum_{i=1}^{2}\sum_{k=1}^{g}\frac{(O_{ik} - E_{ik})^2}{E_{ik}}$$
The "expected counts" are
$$E_{1k} = n'_k\bar\pi_k \qquad E_{2k} = n'_k(1 - \bar\pi_k)$$
where
$$\bar\pi_k = \frac{1}{n'_k}\sum_{j \in \text{group } k}\hat\pi_j$$
Hosmer and Lemeshow recommend $g = 10$ groups, formed as:
- group 1: all observational units with $0 < \hat\pi_j \le .1$
- group 2: all observational units with $.1 < \hat\pi_j \le .2$
- ...
- group 10: all observational units with $.9 < \hat\pi_j < 1$

Reject the proposed model if $C > \chi^2_{g-2,\alpha}$.

The "lackfit" option to the model statement in PROC LOGISTIC instead makes 10 groups of nearly equal size.
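A minimal sketch of the statistic with $g$ groups of nearly equal size, as the lackfit grouping does; `y` (coded 0/1 here, rather than the 1/2 coding above) and `pi_hat` are hypothetical numpy arrays.

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, pi_hat, g=10):
    order = np.argsort(pi_hat)                    # sort cases by fitted probability
    C = 0.0
    for idx in np.array_split(order, g):          # g groups of nearly equal size
        n_k = len(idx)
        O1, E1 = y[idx].sum(), pi_hat[idx].sum()  # observed and "expected" events
        C += (O1 - E1)**2/E1 + ((n_k - O1) - (n_k - E1))**2/(n_k - E1)
    return C, chi2.sf(C, g - 2)                   # p-value on g-2 d.f.
```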
For the BPD example, grouping the cases by intervals of $\hat\pi_i$:

Observed counts:
                 0-.1  .1-.2  .2-.3  .3-.4  .4-.5  .5-.6  .6-.7  .7-.8  .8-.9  .9-1.0
BPD=1 (yes)        3      3      5      3      6      2      6      2      1     46
BPD=2 (no)       128     18      6      8      1      1      5      1      1      1

"Expected" counts:
                 0-.1  .1-.2  .2-.3  .3-.4  .4-.5  .5-.6  .6-.7  .7-.8  .8-.9  .9-1.0
BPD=1 (yes)     4.95   3.14   2.67   3.56   3.11   1.67   7.18   2.38   1.72  46.59
BPD=2 (no)    126.05  17.86   8.33   7.44   3.89   1.33   3.82   0.62   0.28   0.41

C = 12.46 on 8 d.f. (p-value = 0.132)

With the 10 groups of nearly equal size from the "lackfit" option:

Observed counts:
              group:   1    2    3    4    5    6    7    8    9   10
BPD=1 (yes)            0    0    0    0    1    4    9   16   25   22
BPD=2 (no)            25   25   25   25   24   21   16    9    0    0
             totals:  25   25   25   25   25   25   25   25   25   22

"Expected" counts:
BPD=1 (yes)          .03  .09  .25  .53 1.25 2.67 7.64 18.1 24.5 22.0
BPD=2 (no)          25.0 24.9 24.7 24.5 23.7 22.3 17.4  6.9  .50  .01

C = 3.41 on 8 d.f. (p-value = .91)

* This test often has little power.
* Even when this test indicates that the model does not fit well, it says little about how to improve the model.
References:
Hosmer, D. W. and Lemeshow, S. (2000) Applied Logistic Regression, 2nd edition, Wiley.
Collett, D. (1991) Modelling Binary Data, Chapman and Hall, London.
Lloyd, C. (1999) Statistical Analysis of Categorical Data, Wiley, Section 4.2.

Diagnostics:
Cook, R. D. and Weisberg, S. (1982) Residuals and Influence in Regression, Chapman and Hall.
Belsley, D. A., Kuh, E., and Welsch, R. E. (1980) Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, Wiley.
Pregibon, D. (1981) Annals of Statistics, 9, 705-724.
Residuals and other diagnostics:

Additional references:
Kay, R. and Little, S. (1986) Applied Statistics, 35, 16-30 (case study).
Fowlkes, E. B. (1987) Biometrika, 74, 503-515.
Miller, M. E., Hui, S. L., and Tierney, W. M. (1991) Statistics in Medicine, 10, 1213-1226.
Cook, R. D. (1986) JRSS-B, 48, 133-155.

Pearson residuals:
$$r_i = \frac{Y_i - n_i\hat\pi_i}{\sqrt{n_i\hat\pi_i(1 - \hat\pi_i)}} \qquad \text{with} \qquad X^2 = \sum_{i=1}^{n} r_i^2$$

Deviance residuals:
$$d_i = \mathrm{sign}(Y_i - n_i\hat\pi_i)\sqrt{|g_i|} \qquad \text{with} \qquad G^2 = \sum_{i=1}^{n} d_i^2$$
where
$$g_i = 2\left[Y_i\log\left(\frac{Y_i}{n_i\hat\pi_i}\right) + (n_i - Y_i)\log\left(\frac{n_i - Y_i}{n_i(1 - \hat\pi_i)}\right)\right]$$
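A minimal sketch of both kinds of residuals for grouped binomial data; `y` (success counts), `n` (group sizes, all 1 for Bernoulli data), and `pi_hat` are hypothetical numpy arrays.

```python
import numpy as np
from scipy.special import xlogy   # xlogy(0, .) = 0 handles Y_i = 0 or Y_i = n_i

def residuals(y, n, pi_hat):
    r = (y - n*pi_hat) / np.sqrt(n*pi_hat*(1 - pi_hat))   # Pearson residual
    g = 2*(xlogy(y, y/(n*pi_hat))
           + xlogy(n - y, (n - y)/(n*(1 - pi_hat))))      # deviance contribution g_i
    d = np.sign(y - n*pi_hat) * np.sqrt(np.abs(g))        # deviance residual
    return r, d
```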
Adjusted residuals: (observed - predicted)/(standard error of the residual),
$$\tilde r_i = \frac{Y_i - n_i\hat\pi_i}{\sqrt{\hat V(Y_i - n_i\hat\pi_i)}} = \frac{Y_i - n_i\hat\pi_i}{\sqrt{n_i\hat\pi_i(1 - \hat\pi_i)(1 - h_i)}}$$

Adjusted Pearson residual:
$$\tilde r_i = \frac{r_i}{\sqrt{1 - h_i}}$$

Adjusted deviance residual:
$$\tilde d_i = \frac{d_i}{\sqrt{1 - h_i}}$$

where $h_i$ is the "leverage" of the i-th observation.
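A sketch of the adjustment, building on the `residuals()` helper above and the `leverages()` helper sketched after the leverage discussion below.

```python
import numpy as np

r, d = residuals(y, n, pi_hat)    # Pearson and deviance residuals
h = leverages(X, n, pi_hat)       # h_i, sketched under "What is leverage?"
r_adj = r / np.sqrt(1 - h)        # adjusted Pearson residual
d_adj = d / np.sqrt(1 - h)        # adjusted deviance residual
```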
Compare residuals to percentiles of the standard normal distribution; cases with residuals larger than 3 or smaller than -3 are suspicious. None of these "residuals", however, may be well approximated by a standard normal distribution: they are too "discrete".

Residual plots:
- versus each explanatory variable
- versus order (look for outliers or patterns across time)
- versus expected counts: $n_i\hat\pi_i$

Smoothed residual plots (a sketch follows).
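One possible smoothed residual plot, with matplotlib assumed and the arrays from the earlier sketches; the lowess smoother from statsmodels is an illustrative choice.

```python
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess

plt.scatter(n*pi_hat, r_adj, s=10)                 # residuals vs expected counts
smooth = lowess(r_adj, n*pi_hat)                   # smoothed trend
plt.plot(smooth[:, 0], smooth[:, 1])
plt.axhline(3, ls="--"); plt.axhline(-3, ls="--")  # flag suspicious cases
plt.xlabel("expected count"); plt.ylabel("adjusted residual")
plt.show()
```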
What is leverage?

Consider the model
$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki}, \qquad i = 1, \ldots, n.$$
In matrix form,
$$\begin{bmatrix} \log\left(\frac{\pi_1}{1-\pi_1}\right) \\ \log\left(\frac{\pi_2}{1-\pi_2}\right) \\ \vdots \\ \log\left(\frac{\pi_n}{1-\pi_n}\right) \end{bmatrix} = \begin{bmatrix} 1 & X_{11} & \cdots & X_{k1} \\ 1 & X_{12} & \cdots & X_{k2} \\ \vdots & \vdots & & \vdots \\ 1 & X_{1n} & \cdots & X_{kn} \end{bmatrix}\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} = X\beta$$
where $X$ is the model matrix.
In linear regression the "hat matrix" is
$$H = X(X'X)^{-1}X'$$
which is a projection operator onto the column space of $X$, and
$$\hat Y = HY, \qquad \text{residuals} = (I - H)Y, \qquad V(\text{residuals}) = (I - H)\sigma^2.$$
Pregibon (1981) uses a generalized least squares approach to logistic regression which yields a hat matrix
$$H = V^{1/2}X(X'VX)^{-1}X'V^{1/2}$$
where $V$ is an $n \times n$ diagonal matrix with i-th diagonal element
$$V_{ii} = n_i\hat\pi_i(1 - \hat\pi_i)$$
and $n_i$ is the number of cases with the i-th covariate pattern.

The i-th diagonal element of $H$ is called a leverage value; call this element $h_i$. Note that
$$\sum_{i=1}^{n} h_i = k + 1 = \text{number of coefficients}.$$
When there is one individual for each covariate pattern, the upper bound on $h_i$ is 1.
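A minimal sketch of these leverage values; `X` (model matrix with intercept column), `n` (cases per covariate pattern), and `pi_hat` are hypothetical numpy arrays.

```python
import numpy as np

def leverages(X, n, pi_hat):
    v = n*pi_hat*(1 - pi_hat)                  # diagonal of V
    W = X * np.sqrt(v)[:, None]                # rows of V^{1/2} X
    A = np.linalg.inv(X.T @ (X * v[:, None]))  # (X'VX)^{-1}
    h = np.einsum("ij,jk,ik->i", W, A, W)      # diagonal of H
    return h                                   # sums to k+1
```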
Cases with large values of $h_i$ may be cases with vectors of covariates that are far away from the mean of the covariates. However, such cases can have small $h_i$ values if $\hat\pi_i \ll .1$ or $\hat\pi_i \gg .9$.

An alternative quantity that gets larger as the vector of covariates gets farther from the mean of the covariates is
$$b_i = \frac{h_i}{n_i\hat\pi_i(1 - \hat\pi_i)}$$
(see Hosmer and Lemeshow, pages 153-155).

Look for cases with large leverage values and see what happens to estimated coefficients when the case is deleted (see the sketch below).
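The alternative quantity is one line on top of the leverage sketch above (names hypothetical):

```python
b = leverages(X, n, pi_hat) / (n*pi_hat*(1 - pi_hat))   # Hosmer-Lemeshow b_i
```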
INFLUENCE: analogous to Cook's D for linear regression.

Define:
- $b$: the m.l.e. for $\beta$ using all $n$ observations
- $b_{(i)}$: the m.l.e. for $\beta$ when the i-th case is deleted

A "standardized" distance between $b$ and $b_{(i)}$ is approximately
$$\mathrm{Influence}(i) = (b - b_{(i)})'(X'VX)(b - b_{(i)}) \doteq \frac{r_i^2 h_i}{(1 - h_i)^2} = (\tilde r_i)^2\,\frac{h_i}{1 - h_i}$$
(called $C_i$ in PROC LOGISTIC): a squared adjusted residual times a monotone function of the leverage.
Since
$$Var(Y_i - n_i\hat\pi_i) \doteq n_i\pi_i(1 - \pi_i)(1 - h_i),$$
an adjusted Pearson residual is
$$\tilde r_i = \frac{Y_i - n_i\hat\pi_i}{\sqrt{n_i\hat\pi_i(1 - \hat\pi_i)(1 - h_i)}} = \frac{r_i}{\sqrt{1 - h_i}}.$$

PROC LOGISTIC approximates the m.l.e. for $\beta$ with the i-th case deleted as
$$b_{(i)} = b - \Delta b_{(i)}$$
where
$$\Delta b_{(i)} = (X'VX)^{-1}x_i\left(\frac{Y_i - n_i\hat\pi_i}{1 - h_i}\right).$$

An approximate measure of influence is
$$C_i = r_i^2\,\frac{h_i}{(1 - h_i)^2}$$
where $r_i^2$ is the square of the Pearson residual.
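A sketch of these one-step deletion quantities, reusing the helpers above; `delta_b[i]` approximates $b - b_{(i)}$ and `C` is the $C_i$ influence measure.

```python
import numpy as np

def influence(X, y, n, pi_hat):
    v = n*pi_hat*(1 - pi_hat)
    h = leverages(X, n, pi_hat)
    r = (y - n*pi_hat) / np.sqrt(v)              # Pearson residuals
    A = np.linalg.inv(X.T @ (X * v[:, None]))    # (X'VX)^{-1}
    delta_b = (A @ X.T).T * ((y - n*pi_hat)/(1 - h))[:, None]  # one row per case
    C = r**2 * h / (1 - h)**2                    # Cook's-D analogue
    return delta_b, C
```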