Variable selection:

Suppose for the i-th observational unit (case) you record

    $Y_i = 1$ (success) or $Y_i = 0$ (failure)

and explanatory variables $Z_{1i}, Z_{2i}, \ldots, Z_{ri}$.

Variable (or model) selection can be guided by:
- subject matter theory and expert opinion
- fitting models to test hypotheses of interest
- stepwise methods

Consider a logistic regression model with

    $\pi_i = Pr(Y_i = 1 \mid Z_{1i}, Z_{2i}, \ldots, Z_{ri})$

and

    $\log\!\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 Z_{1i} + \beta_2 Z_{2i} + \cdots + \beta_r Z_{ri}.$

Which explanatory variables should be included in the model?
- the user supplies a list of potential variables
- backward elimination
- forward selection
- stepwise selection
- consider "diagnostics"
- maximize a "penalized" likelihood

Example: a retrospective study of bronchopulmonary dysplasia (BPD) in newborn infants.

Observational units: 248 infants treated for respiratory distress syndrome (RDS) at Stanford Medical Center between 1962 and 1973, who received ventilatory assistance by intubation for more than 24 hours, and who survived for 3 or more days.

Binary response:

    BPD = 1  stage III or IV
          2  stage I or II

Suspected causes: duration and level of exposure to oxygen therapy and intubation.

Background variables:
- SEX: 0 = female, 1 = male
- YOB: year of birth
- GEST: gestational age (weeks x 10)
- BWT: birth weight (grams)
- AGSYM: age at onset of respiratory symptoms (hrs x 10)
- APGAR: one-minute APGAR score (0-10)
- RDS: severity of initial X-ray for RDS, on a 6-point scale from 0 = no RDS seen to 5 = very severe case of RDS

Treatment variables:
- INTUB: duration of endotracheal intubation (hrs)
- VENTL: duration of assisted ventilation (hrs)
- LOWO2: hours of exposure to 22-49% level of elevated oxygen
- MEDO2: hours of exposure to 40-79% level of elevated oxygen
- HIO2: hours of exposure to 80-100% level of elevated oxygen
- AGVEN: age at onset of ventilatory assistance (hours)

Methods for assessing the adequacy of models:
1. Goodness-of-fit tests.
2. Procedures aimed at specific alternatives, such as fitting extended or modified models.
3. Residuals and other diagnostic statistics designed to detect anomalous or influential cases or unexpected patterns:
   - inspection of graphs
   - case deletion diagnostics
   - measures of fit

Logistic regression model for the BPD data:

Overall assessment of fit: an overall measure similar to $R^2$ is

    $\frac{L_0 - L_M}{L_0 - L_S}$

where $L_M$ is the log-likelihood for the fitted model, $L_0$ is the log-likelihood for the model containing only an intercept, and $L_S$ is the log-likelihood for the saturated model. This becomes $(L_0 - L_M)/L_0$ (since $L_S = 0$) when there are n observational units that provide n independent Bernoulli observations. (A computational sketch follows below.)
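A minimal sketch of this computation in Python with statsmodels. The file name `bpd.csv` and the particular covariates used here are assumptions for illustration, not the model actually selected for the BPD data below:

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("bpd.csv")                # hypothetical file, one row per infant
y = (df["BPD"] == 1).astype(int)           # 1 = stage III or IV, 0 = stage I or II
X = sm.add_constant(df[["GEST", "BWT", "APGAR", "VENTL"]])  # illustrative covariates

fit = sm.Logit(y, X).fit(disp=0)           # maximizes the log-likelihood L_M
r2 = (fit.llnull - fit.llf) / fit.llnull   # (L_0 - L_M)/L_0, since L_S = 0 here
print(r2)
```

Because $L_S = 0$ for ungrouped Bernoulli data, this is the same as McFadden's pseudo-$R^2$, $1 - L_M/L_0$, which statsmodels reports as `fit.prsquared`.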
For the BPD data and the model selected with backward elimination,

    $\frac{L_0 - L_M}{L_0} = \frac{-2L_0 - (-2L_M)}{-2L_0} = \frac{306.520 - 97.770}{306.520} = 0.681.$

The fitted model is

    $\log\!\left(\frac{\hat{\pi}_i}{1-\hat{\pi}_i}\right) = -12.729 + 0.7686\,\mathrm{LNL} - 1.7807\,\mathrm{LNM} + 0.3543\,(\mathrm{LNM})^2 - 2.4038\,\mathrm{LNH} + 0.5954\,(\mathrm{LNH})^2 + 1.6048\,\mathrm{LNV}$

with

    concordant = 97.0%
    discordant = 2.9%
    GAMMA = 0.941

Goodness-of-fit tests:

    $H_0$: the proposed model is correct
    $H_A$: any other model

With $n_i$ cases at the i-th covariate pattern, the Pearson chi-squared statistic is

    $X^2 = \sum_{i=1}^{n}\sum_{j=1}^{2} \frac{(Y_{ij} - n_i\hat{\pi}_{ij})^2}{n_i\hat{\pi}_{ij}} = \sum_{i=1}^{n} \frac{(Y_i - n_i\hat{\pi}_i)^2}{n_i\hat{\pi}_i(1-\hat{\pi}_i)}$

and the deviance is

    $G^2 = 2\sum_{i=1}^{n}\sum_{j=1}^{2} Y_{ij}\,\log\!\left(\frac{Y_{ij}}{n_i\hat{\pi}_{ij}}\right).$

When each response is an independent Bernoulli observation,

    $Y_i = 1$ (success) or $0$ (failure), $\quad i = 1, 2, \ldots, n, \quad \pi_i = Pr(Y_i = 1 \mid Z_{1i}, \ldots, Z_{ri}),$

and there is only one response for each pattern of covariates $(Z_{1i}, \ldots, Z_{ri})$, then for testing

    $H_0: \log(\pi_i/(1-\pi_i)) = \beta_0 + \beta_1 Z_{1i} + \cdots + \beta_k Z_{ki}$

versus

    $H_A: 0 < \pi_i < 1$ (general alternative),

neither

    $G^2 = 2\sum_{i=1}^{n} Y_i \log\!\left(\frac{Y_i}{\hat{\pi}_i}\right) + 2\sum_{i=1}^{n} (1-Y_i)\log\!\left(\frac{1-Y_i}{1-\hat{\pi}_i}\right)$

nor

    $X^2 = \sum_{i=1}^{n} \frac{(Y_i - \hat{\pi}_i)^2}{\hat{\pi}_i(1-\hat{\pi}_i)}$

is well approximated by a chi-squared distribution when $H_0$ is true, even for large n.

In this situation, $G^2$ and $X^2$ tests for comparing two (nested) models may still be well approximated by chi-squared distributions when $\beta_{k+1} = \cdots = \beta_r = 0$:

    $H_0: \log(\pi_i/(1-\pi_i)) = \beta_0 + \beta_1 Z_{1i} + \cdots + \beta_k Z_{ki}$
    $H_A: \log(\pi_i/(1-\pi_i)) = \beta_0 + \beta_1 Z_{1i} + \cdots + \beta_k Z_{ki} + \beta_{k+1} Z_{k+1,i} + \cdots + \beta_r Z_{ri}$

The deviance

    $G^2 = 2\sum_{i=1}^{n} \hat{m}_{i,A}\,\log\!\left(\frac{\hat{m}_{i,A}}{\hat{m}_{i,0}}\right)$

and the Pearson statistic

    $X^2 = \sum_{i=1}^{n} \frac{(\hat{m}_{i,A} - \hat{m}_{i,0})^2}{\hat{m}_{i,0}}$

have approximate chi-squared distributions with $r - k$ degrees of freedom when $(r-k)/n$ is small and $\beta_{k+1} = \cdots = \beta_r = 0$ (S. Haberman, 1977, Annals of Statistics). These are the types of comparisons you make with stepwise variable selection procedures.

Insignificant values of $G^2$ and $X^2$:
1. only indicate that the alternative model offers little or no improvement over the null model;
2. do not imply that either model is adequate.

Hosmer-Lemeshow test: collect the n cases into g groups and make a $2 \times g$ contingency table:

                         Groups
                    1        2      ...     g
    (i=1) Y=1    $O_{11}$  $O_{12}$  ...  $O_{1g}$
    (i=2) Y=2    $O_{21}$  $O_{22}$  ...  $O_{2g}$
                 $n'_1$    $n'_2$    ...  $n'_g$

Compute a Pearson statistic

    $C = \sum_{i=1}^{2}\sum_{k=1}^{g} \frac{(O_{ik} - E_{ik})^2}{E_{ik}}$

(a computational sketch is given after the example below). The "expected" counts are

    $E_{1k} = n'_k\,\bar{\pi}_k, \qquad E_{2k} = n'_k\,(1 - \bar{\pi}_k), \qquad \text{where } \bar{\pi}_k = \frac{1}{n'_k}\sum_{j \in \text{group } k} \hat{\pi}_j.$

Hosmer and Lemeshow recommend g = 10 groups formed as

    group 1:  all observational units with $0 < \hat{\pi}_j \le .1$
    group 2:  all observational units with $.1 < \hat{\pi}_j \le .2$
    ...
    group 10: all observational units with $.9 < \hat{\pi}_j < 1$

Reject the proposed model if $C > \chi^2_{g-2,\alpha}$. The "lackfit" option on the MODEL statement in PROC LOGISTIC instead makes 10 groups of nearly equal size.

For the BPD example, grouping by intervals of $\hat{\pi}_i$:

    $\hat{\pi}_i$ values:   0-.1   .1-.2  .2-.3  .3-.4  .4-.5  .5-.6  .6-.7  .7-.8  .8-.9  .9-1.0
    Observed counts
      BPD=1 (yes)             3      3      5      3      6      2      6      2      1     46
      BPD=2 (no)            128     18      6      8      1      1      5      1      1      1
      total                 131     21     11     11      7      3     11      3      2     47
    "Expected" counts
      BPD=1 (yes)          4.95   3.14   2.67   3.56   3.11   1.67   7.18   2.38   1.72  46.59
      BPD=2 (no)         126.05  17.86   8.33   7.44   3.89   1.33   3.82   0.62   0.28   0.41

    C = 12.46 on 8 d.f. (p-value = 0.132)

Using 10 groups of nearly equal size instead:

    Observed counts
      BPD=1 (yes)             0      0      0      0      1      4      9     16     25     22
      BPD=2 (no)             25     25     25     25     24     21     16      9      0      0
      total                  25     25     25     25     25     25     25     25     25     22
    "Expected" counts
      BPD=1 (yes)           .03    .09    .25    .53   1.25   2.67   7.64   18.1   24.5   22.0
      BPD=2 (no)           25.0   24.9   24.7   24.5   23.7   22.3   17.4    6.9   0.50   0.01

    C = 3.41 on 8 d.f. (p-value = 0.91)

* This test often has little power.
* Even when this test indicates that the model does not fit well, it says little about how to improve the model.
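A minimal sketch of the Hosmer-Lemeshow computation with nearly equal-sized groups (the grouping rule the lackfit option uses). The function name is an assumption; `y` and `p_hat` are taken to be the 0/1 responses and fitted probabilities from an already-fitted model:

```python
import numpy as np
from scipy import stats

def hosmer_lemeshow(y, p_hat, g=10):
    """Hosmer-Lemeshow statistic C and its chi-squared p-value on g-2 d.f."""
    order = np.argsort(p_hat)              # sort cases by fitted probability
    C = 0.0
    for idx in np.array_split(order, g):   # g groups of nearly equal size
        n_k = len(idx)
        e1 = p_hat[idx].sum()              # E_1k = n'_k * pi_bar_k
        o1 = y[idx].sum()                  # O_1k, observed successes in group k
        # contributions from both rows (Y = 1 and Y = 2) of column k
        C += (o1 - e1) ** 2 / e1 + ((n_k - o1) - (n_k - e1)) ** 2 / (n_k - e1)
    return C, stats.chi2.sf(C, g - 2)      # assumes no expected count is 0
```

With the statsmodels fit sketched earlier, `hosmer_lemeshow(y.to_numpy(), fit.predict(X))` gives the statistic and p-value for that model.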
References on goodness of fit and diagnostics:
- Hosmer, D. W. and Lemeshow, S. (2000) Applied Logistic Regression, 2nd edition. Wiley.
- Collett, D. (1991) Modelling Binary Data. Chapman and Hall, London.
- Lloyd, C. (1999) Statistical Analysis of Categorical Data. Wiley, Section 4.2.
- Kay, R. and Little, S. (1986) Applied Statistics 35, 16-30 (case study).
- Fowlkes, E. B. (1987) Biometrika 74, 503-515.
- Miller, M. E., Hui, S. L., and Tierney, W. M. (1991) Statistics in Medicine 10, 1213-1226.
- Cook, R. D. (1986) JRSS-B 48, 133-155.
- Cook, R. D. and Weisberg, S. (1982) Residuals and Influence in Regression. Chapman and Hall.
- Belsley, D. A., Kuh, E., and Welsch, R. E. (1980) Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley.
- Pregibon, D. (1981) Annals of Statistics 9, 705-724.

Residuals and other diagnostics:

Pearson residuals:

    $r_i = \frac{Y_i - n_i\hat{\pi}_i}{\sqrt{n_i\hat{\pi}_i(1-\hat{\pi}_i)}}, \qquad \text{so that } X^2 = \sum_{i=1}^{n} r_i^2.$

Deviance residuals:

    $d_i = \mathrm{sign}(Y_i - n_i\hat{\pi}_i)\,\sqrt{|g_i|}, \qquad \text{so that } G^2 = \sum_{i=1}^{n} d_i^2,$

where

    $g_i = 2\left[ Y_i \log\!\left(\frac{Y_i}{n_i\hat{\pi}_i}\right) + (n_i - Y_i)\log\!\left(\frac{n_i - Y_i}{n_i(1-\hat{\pi}_i)}\right) \right].$

Adjusted residuals have the form (observed $-$ predicted)/(standard error). Since

    $\widehat{Var}(Y_i - n_i\hat{\pi}_i) \doteq n_i\hat{\pi}_i(1-\hat{\pi}_i)(1 - h_i),$

where $h_i$ is the "leverage" of the i-th observation, the adjusted Pearson residual is

    $\tilde{r}_i = \frac{Y_i - n_i\hat{\pi}_i}{\sqrt{n_i\hat{\pi}_i(1-\hat{\pi}_i)(1-h_i)}} = \frac{r_i}{\sqrt{1-h_i}}$

and the adjusted deviance residual is

    $\tilde{d}_i = \frac{d_i}{\sqrt{1-h_i}}.$

Compare residuals to percentiles of the standard normal distribution: cases with residuals larger than 3 or smaller than $-3$ are suspicious. Note, however, that none of these "residuals" may be well approximated by a standard normal distribution; they are too "discrete".

Residual plots:
- versus each explanatory variable
- versus order (look for outliers or patterns across time)
- versus expected counts $n_i\hat{\pi}_i$
- smoothed residual plots

What is leverage? Write the model

    $\log\!\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki}, \qquad i = 1, \ldots, n,$

in matrix form as

    $\begin{bmatrix} \log(\pi_1/(1-\pi_1)) \\ \vdots \\ \log(\pi_n/(1-\pi_n)) \end{bmatrix} = \begin{bmatrix} 1 & X_{11} & \cdots & X_{k1} \\ \vdots & \vdots & & \vdots \\ 1 & X_{1n} & \cdots & X_{kn} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \vdots \\ \beta_k \end{bmatrix} = X\beta,$

where X is the model matrix.

In linear regression the "hat matrix" is

    $H = X(X'X)^{-1}X',$

which is a projection operator onto the column space of X, and

    $\hat{Y} = HY, \qquad \text{residuals} = (I - H)Y, \qquad V(\text{residuals}) = (I - H)\sigma^2.$

Pregibon (1981) uses a generalized least squares approach to logistic regression which yields a hat matrix

    $H = V^{1/2}X(X'VX)^{-1}X'V^{1/2},$

where V is an $n \times n$ diagonal matrix with i-th diagonal element

    $V_{ii} = n_i\hat{\pi}_i(1-\hat{\pi}_i)$

and $n_i$ is the number of cases with the i-th covariate pattern. The i-th diagonal element of H is called a leverage value; call this element $h_i$. Note that

    $\sum_{i=1}^{n} h_i = k + 1,$

the number of coefficients. When there is one individual for each covariate pattern, the upper bound on $h_i$ is 1.

Cases with large values of $h_i$ may be cases with vectors of covariates that are far away from the mean of the covariates. However, such cases can have small $h_i$ values if $\hat{\pi}_i \ll .1$ or $\hat{\pi}_i \gg .9$. An alternative quantity that gets larger as the vector of covariates gets farther from the mean of the covariates is

    $b_i = \frac{h_i}{n_i\hat{\pi}_i(1-\hat{\pi}_i)};$

see Hosmer and Lemeshow, pages 153-155. Look for cases with large leverage values and see what happens to the estimated coefficients when the case is deleted. (A sketch of the leverage computation follows below.)
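A minimal numpy sketch of these leverage and adjusted-residual computations. The argument names are assumptions: `X` is the $n \times (k+1)$ model matrix, `y` the success counts, `n` the trials per covariate pattern, and `p_hat` the fitted probabilities:

```python
import numpy as np

def logistic_leverage(X, y, n, p_hat):
    """Leverages h_i from Pregibon's hat matrix, and adjusted Pearson residuals."""
    v = n * p_hat * (1.0 - p_hat)                    # V_ii = n_i pi_i (1 - pi_i)
    XtVX_inv = np.linalg.inv(X.T @ (v[:, None] * X)) # (X'VX)^{-1}
    W = np.sqrt(v)[:, None] * X                      # rows of V^{1/2} X
    h = np.einsum("ij,jk,ik->i", W, XtVX_inv, W)     # diag of V^{1/2}X(X'VX)^{-1}X'V^{1/2}
    r = (y - n * p_hat) / np.sqrt(v)                 # Pearson residuals r_i
    return h, r / np.sqrt(1.0 - h)                   # h_i and r_i / sqrt(1 - h_i)
```

As a check on the output, `h.sum()` should equal $k + 1$, the number of coefficients.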
Influence (analogous to Cook's D for linear regression):

Define

    b    = the m.l.e. for $\beta$ using all n observations
    b(i) = the m.l.e. for $\beta$ when the i-th case is deleted.

PROC LOGISTIC approximates the m.l.e. for $\beta$ with the i-th case deleted in one step as $b(i) = b - \Delta b(i)$, where

    $\Delta b(i) = (X'VX)^{-1} x_i \left(\frac{Y_i - n_i\hat{\pi}_i}{1 - h_i}\right).$

A "standardized" distance between b and b(i) is then approximately

    $\text{Influence}(i) = (b - b(i))'(X'VX)(b - b(i)) \doteq \frac{r_i^2\, h_i}{(1-h_i)^2} = (\tilde{r}_i)^2 \cdot \frac{h_i}{1-h_i},$

the squared adjusted Pearson residual times a monotone function of the leverage. This approximate measure of influence,

    $C_i = \frac{r_i^2\, h_i}{(1-h_i)^2}$

with $r_i^2$ the square of the Pearson residual, is the statistic called $C_i$ in PROC LOGISTIC.
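A minimal sketch of these one-step deletion quantities, under the same assumed inputs as the leverage sketch above (with `h` and `r` its outputs); it mirrors the approximation written here, not PROC LOGISTIC's internal code:

```python
import numpy as np

def one_step_influence(X, y, n, p_hat, h, r):
    """Approximate coefficient changes delta_b(i) and influence measures C_i."""
    v = n * p_hat * (1.0 - p_hat)                    # V_ii = n_i pi_i (1 - pi_i)
    XtVX_inv = np.linalg.inv(X.T @ (v[:, None] * X)) # (X'VX)^{-1}
    scale = (y - n * p_hat) / (1.0 - h)              # (Y_i - n_i pi_i)/(1 - h_i)
    delta_b = XtVX_inv @ (X.T * scale)               # column i is delta_b(i), so b(i) = b - delta_b[:, i]
    C = r**2 * h / (1.0 - h) ** 2                    # C_i = r_i^2 h_i / (1 - h_i)^2
    return delta_b, C
```

Cases with large $C_i$, or with large entries in the corresponding column of `delta_b`, are the ones whose deletion most changes the estimated coefficients.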