Logistic Regression ~ Handout #1

8 - Introduction to Logistic Regression
These data are taken from the text “Applied Logistic Regression” by Hosmer and
Lemeshow. Researchers are interested in the relationship between age and presence or
absence of evidence of coronary heart disease (CHD).
The smooth is an estimate of:

$$E(CHD \mid Age) = P(CHD = 1 \mid Age)$$

Why? Expectation of a Bernoulli random variable: for $Y \sim \text{Bernoulli}(\pi)$, $E(Y) = P(Y = 1)$.
Fitting the Model in JMP
Select Analyze > Fit Y by X and place CHD (y/n) in the Y box and age in the X box.
The resulting output is shown below. Because the response is a dichotomous categorical variable, logistic regression is performed.
The curve is a plot of:

$$\hat{P}(CHD \mid Age) = \frac{\exp(\hat{\beta}_0 + \hat{\beta}_1 Age)}{1 + \exp(\hat{\beta}_0 + \hat{\beta}_1 Age)}$$

Example:

$\hat{P}(CHD \mid Age = 40) = $

$\hat{P}(CHD \mid Age = 60) = $
Interpretation of Model Parameters
$$P(CHD = 1 \mid Age) = \frac{e^{\beta_0 + \beta_1 Age}}{1 + e^{\beta_0 + \beta_1 Age}}$$

Odds for Success

$$\frac{\pi(x)}{1 - \pi(x)} = e^{\beta_0 + \beta_1 Age}$$

thus

$$\ln\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \beta_0 + \beta_1 Age$$
Suppose we contrast individuals who are Age = x to those who are Age = x + c. What
can we say about the increased risk associated with a c year increase in age? The logistic
model gives us a means to do this through the odds ratio (OR).
$$\ln(\text{OR associated with a } c \text{ year increase in age}) = \ln\left[\frac{\pi(Age = x + c)/(1 - \pi(Age = x + c))}{\pi(Age = x)/(1 - \pi(Age = x))}\right]$$

$$= \ln\left(\frac{\pi(Age = x + c)}{1 - \pi(Age = x + c)}\right) - \ln\left(\frac{\pi(Age = x)}{1 - \pi(Age = x)}\right) = \beta_0 + \beta_1(Age + c) - (\beta_0 + \beta_1 Age) = c\beta_1$$

Exponentiating both sides gives

$$OR = e^{c\beta_1}$$

Thus the multiplicative increase (or decrease if $\beta_1 < 0$) in odds associated with a $c$ year increase in age is $e^{c\beta_1}$.
Example: Interpreting a c year increase in age.
Question: Is it reasonable to assume that the effect of a c unit increase in a continuous predictor is constant regardless of starting point? For example, does the risk associated with a 5 year increase in age remain constant throughout one's life?
Statistical Inference for the Logistic Regression Model
Given estimates for the model parameters and their estimated standard errors, what types of statistical inferences can be made?
Hypothesis Testing
For testing:
$$H_0: \beta_i = 0$$
$$H_a: \beta_i \neq 0$$

Large sample test for significance of the "slope" parameter ($\beta_i$):

$$z = \frac{\hat{\beta}_i}{SE(\hat{\beta}_i)} \sim N(0, 1)$$
Confidence Intervals for Parameters and Corresponding OR’s
$100(1 - \alpha)\%$ CI for $\beta_i$:

$$\hat{\beta}_i \pm z_{1 - \alpha/2}\, SE(\hat{\beta}_i)$$

$100(1 - \alpha)\%$ CI for the OR associated with $\beta_i$:

$$\exp\left(\hat{\beta}_i \pm z_{1 - \alpha/2}\, SE(\hat{\beta}_i)\right)$$

If $\beta_i$ corresponds to a continuous predictor and we wish to examine the OR associated with a $c$ unit increase, the CI for the OR becomes

$$\exp\left(c\hat{\beta}_i \pm z_{1 - \alpha/2}\, c\, SE(\hat{\beta}_i)\right)$$
Example:
What is the OR for CHD associated with a 10 year increase in age? Give a 95%
confidence interval based on this estimate.
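A quick R sketch of this calculation, using the coefficient estimates from the R fit shown later in this handout ($\hat{\beta}_1 = 0.11092$, $SE(\hat{\beta}_1) = 0.02404$):

b1  <- 0.11092
se  <- 0.02404
inc <- 10                                    # c = 10 year increase
exp(inc*b1)                                  # estimated OR, approx 3.03
exp(inc*b1 + c(-1, 1)*1.96*inc*se)           # 95% CI: exp(c*b1 -/+ 1.96*c*SE)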
In JMP Using the Analyze > Fit Y by X Approach
OPTIONS FOR LOGISTIC REGRESSION
Range Odds Ratios – Odds ratio associated with being at the maximum of x vs. the
minimum of x.
Unit Odds Ratios – Odds ratio associated with a unit increase in x, i.e. c = 1.
ROC Curve – if we use $\hat{\pi}(x) = \hat{P}(CHD \mid x)$ to construct a rule for classifying a patient as having CHD vs. no CHD, this option gives the ROC curve coming from all possible cutpoints based on this estimated probability.
Estimated Odds Ratios
ROC Curve and Table
By changing the classification rule based on estimated probability we can obtain an ROC curve.
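A minimal R sketch of this idea (a sketch only, borrowing the chd and chd.glm objects from the R section that follows; this is not JMP output):

p    <- fitted(chd.glm)                                    # estimated P(CHD|age)
cuts <- sort(unique(p))                                    # all possible cutpoints
sens <- sapply(cuts, function(t) mean(p[chd == 1] >= t))   # sensitivity at each cut
spec <- sapply(cuts, function(t) mean(p[chd == 0] <  t))   # specificity at each cut
plot(1 - spec, sens, type = "s",
     xlab = "1 - Specificity", ylab = "Sensitivity")       # the ROC curve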
Logistic Regression for the CHD data in R
> CHD <- read.table(file.choose(),header=T)
> CHD
    agegrp age chd
1        1  20   0
2        1  23   0
3        1  24   0
4        1  25   0
5        1  25   1
.        .   .   .
.        .   .   .
96       8  63   1
97       8  64   0
98       8  64   1
99       8  65   1
100      8  69   1
> names(CHD)
[1] "agegrp" "age"    "chd"
Make sure that you specify family="binomial" or R will perform ordinary least squares.
> attach(CHD)
> chd <- factor(chd)
> chd.glm <- glm(chd~age,family="binomial")
> summary(chd.glm)
Call:
glm(formula = chd ~ age, family = "binomial")
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.9718  -0.8456  -0.4576   0.8253   2.2859

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.30945    1.13263  -4.688 2.76e-06 ***
age          0.11092    0.02404   4.614 3.95e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 136.66 on 99 degrees of freedom
Residual deviance: 107.35 on 98 degrees of freedom
AIC: 111.35
Number of Fisher Scoring iterations: 3
> probCHD <- exp(-5.30945 + .11092*age)/(1+exp(-5.30945 + .11092*age))
> plot(age,probCHD,type="b",ylab="P(CHD|Age)",xlab="Age")
The plotted curve is

$$\hat{P}(CHD \mid Age) = \frac{e^{\hat{\beta}_0 + \hat{\beta}_1 Age}}{1 + e^{\hat{\beta}_0 + \hat{\beta}_1 Age}}, \qquad \hat{\beta}_0 = -5.310, \quad \hat{\beta}_1 = .11092$$
An easier way to obtain the estimated probabilities is to extract them from the model object.

> probCHD <- fitted(chd.glm)
> plot(age,probCHD,type="b",ylab="P(CHD|Age)")   # This produces the plot above

We can obtain the estimated logit ($\hat{L}_i = \hat{\beta}_0 + \hat{\beta}_1 Age$) by using the predict command.

> chd.logit = predict(chd.glm)
> plot(age,chd.logit,type="b",ylab="L = bo + b1*Age")
> title(main="Plot of Estimated Logit vs. Age")
The Logistic Regression Model (single predictor case)
e o  1xi
yi 
 i
1  e o  1xi
1
where yi  
0
if outcome is a " success"
if outcome is a " failure"
  ( xi )   i
What can we say about the errors?
If yi  1 then
If yi  0 then
Thus
E ( ) 
and
Var ( ) 
We see that the errors are binomial NOT normal!
Estimation of Model Parameters (Method of Maximum Likelihood)
For the $i^{th}$ observed pair $(x_i, y_i)$ the contribution to the likelihood is

$$\pi(x_i)^{y_i}\,(1 - \pi(x_i))^{1 - y_i}$$

where

$$\pi(x_i) = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}} \quad \text{and} \quad y_i = \begin{cases} 1 \\ 0 \end{cases}$$
The Likelihood Function
$$L = L(\beta_0, \beta_1) = \prod_{i=1}^{n} \pi(x_i)^{y_i}(1 - \pi(x_i))^{1 - y_i}$$

Maximizing this as a function of both $\beta_0$ and $\beta_1$ yields the maximum likelihood estimates of the model parameters.

For computational purposes it is usually easier to maximize the logarithm of the likelihood function rather than the likelihood function itself. This is fine because the logarithm is a monotonic increasing function, so the maximizing parameters are the same for the likelihood and log-likelihood functions. The log-likelihood function is given by
$$\ln L(\beta_0, \beta_1) = \sum_{i=1}^{n} \left[ y_i \ln \pi(x_i) + (1 - y_i) \ln\left(1 - \pi(x_i)\right) \right]$$
To find the parameter estimates we solve simultaneously the equations given by setting
the partial derivatives with respect to each parameter equal to 0,
i.e. solve simultaneously,
$$\frac{\partial}{\partial \beta_0} \ln L(\beta_0, \beta_1) = 0$$

$$\frac{\partial}{\partial \beta_1} \ln L(\beta_0, \beta_1) = 0$$
Several different nonlinear optimization routines are used to find solutions to such
systems. Realize of course that this process gets increasingly computationally intensive
as the number of terms in the model increases.
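As a concrete illustration (a sketch, not part of the original session), here is a direct numerical maximization of this log-likelihood for the CHD data; it recovers essentially the same estimates that glm() reports:

chd01 <- as.numeric(as.character(chd))       # 0/1 response from the factor
negloglik <- function(b) {                   # minus the log-likelihood
  p <- exp(b[1] + b[2]*age)/(1 + exp(b[1] + b[2]*age))
  -sum(chd01*log(p) + (1 - chd01)*log(1 - p))
}
fit <- optim(c(0, 0), negloglik)             # general-purpose minimizer
fit$par                                      # approx (-5.309, 0.111), as from glm()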
How do we measure discrepancy between observed and fitted values?
In OLS regression with a continuous response we used
$$RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left(y_i - \hat{\beta}^T u_i\right)^2 = \sum_{i=1}^{n} \left(y_i - (\hat{\beta}_0 + \hat{\beta}_1 u_{1i} + \cdots + \hat{\beta}_k u_{ki})\right)^2$$
In logistic regression modeling we can use the deviance (typically denoted D or G2)
which is defined as
$$D = 2 \ln\left[\frac{\text{likelihood of saturated model}}{\text{likelihood of fitted model}}\right]$$

$$D = 2 \sum_{i=1}^{n} \left[ y_i \ln\left(\frac{y_i}{\hat{\pi}(x_i)}\right) + (1 - y_i) \ln\left(\frac{1 - y_i}{1 - \hat{\pi}(x_i)}\right) \right]$$
Because the likelihood function of the saturated model is equal to 1 when the response ($y_i$) is 0 or 1, the deviance reduces to:

D = -2 ln(likelihood of the fitted model)
The deviance can be used to compare two potential models where one model is nested
within the other by using the “General Chi-Square Test” for comparing rival logistic
regression models.
Nested model concept:
General Chi-Square Test
Consider comparing two rival models, where the alternative hypothesis model contains additional terms:

$$H_0: \log\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \beta_1^T x_1 \qquad \text{(reduced model OK)}$$

$$H_1: \log\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \beta_1^T x_1 + \beta_2^T x_2 \qquad \text{(full model needed)}$$

General Chi-Square Statistic

$$\chi^2 = \text{(residual deviance of reduced model)} - \text{(residual deviance of full model)}$$

$$= D(\text{model without the terms in } x_2) - D(\text{model with the terms in } x_2) \sim \chi^2_{df}$$

If the full model is needed, $\chi^2$ is BIG and the associated p-value $= P(\chi^2_{df} \geq \chi^2)$ is small.
Example: CHD and Age
Ho :
H1 :
From JMP
From R
> summary(chd.glm)
Call:
glm(formula = chd ~ Age, family = "binomial")
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.9718  -0.8456  -0.4576   0.8253   2.2859

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.30945    1.13365  -4.683 2.82e-06 ***
Age          0.11092    0.02406   4.610 4.02e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Null deviance: 136.66 on 99 degrees of freedom
Residual deviance: 107.35 on 98 degrees of freedom
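In R the general chi-square statistic can be computed directly from the two deviances printed above; a minimal sketch:

chisq <- 136.66 - 107.35                    # null deviance - residual deviance
pchisq(chisq, df = 1, lower.tail = FALSE)   # p-value, essentially 0
# equivalently: anova(chd.glm, test = "Chisq")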
Logistic Regression with a Single Dichotomous Predictor
Example: CHD and Indicator of Age Over 55
Computed using standard approach
Logistic Model
There are two different ways to code dichotomous variables: (0,1) coding or (-1,+1, i.e. contrast) coding. JMP uses contrast coding, whereas in other packages we will generally use (0,1) coding. The two coding types are shown below.

$$\text{Age 55+} = \begin{cases} 1 & Age > 55 \\ 0 & Age < 55 \end{cases} \qquad\qquad \text{Age 55+} = \begin{cases} +1 & Age > 55 \\ -1 & Age < 55 \end{cases}$$

For the purposes of discussion we will consider the (0,1) coding.
Recall

$$\pi(x) = P(CHD = 1 \mid x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$

where x = Age 55+ indicator. We have the following:

Age > 55 (x = 1)

$$CHD = 1: \quad \pi(x = 1) = \frac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}} \qquad\qquad CHD = 0: \quad 1 - \pi(x = 1) = \frac{1}{1 + e^{\beta_0 + \beta_1}}$$

Age < 55 (x = 0)

$$\pi(x = 0) = \frac{e^{\beta_0}}{1 + e^{\beta_0}} \qquad\qquad 1 - \pi(x = 0) = \frac{1}{1 + e^{\beta_0}}$$

Estimating the model parameters "by hand"

$$OR = \frac{\pi(x = 1)/(1 - \pi(x = 1))}{\pi(x = 0)/(1 - \pi(x = 0))} = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1}$$
Logistic Regression in R
> Over55
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[53] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Levels: 0 1
> chd
[1] 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 1 0 0 1 1
[53] 0 1 0 1 0 0 1 0 1 1 0 0 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0 1 1 1
Levels: 0 1
> table(chd,Over55)
     Over55
chd    0   1
  0   51   6
  1   22  21
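From this table the "by hand" OR can be checked against the model; a quick sketch (the coefficient 2.0935 appears in the summary output below):

(21/6) / (22/51)    # odds(CHD | Over55 = 1) / odds(CHD | Over55 = 0) = 8.11
exp(2.0935)         # e^(beta1-hat) from the fitted model, also approx 8.11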
> chd55 = glm(chd~Over55,family="binomial")
> summary(chd55)
Call:
glm(formula = chd ~ Over55, family = "binomial")
Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.734  -0.847  -0.847   0.709   1.549

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.8408     0.2551  -3.296  0.00098 ***
Over55        2.0935     0.5285   3.961 7.46e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 136.66 on 99 degrees of freedom
Residual deviance: 117.96 on 98 degrees of freedom
AIC: 121.96
Number of Fisher Scoring iterations: 4
In JMP
To fit a logistic regression model it is best to use the Analyze > Fit Model option. We place CHD y/n (1 = Yes, 2 = No) in the Y box and Over 55 (1 = Yes, 2 = No) in the model effects box. The key is to have "Yes" for risk and disease alpha-numerically before "No", thus the use of 1 for "Yes" and 2 for "No".
The summary of the fitted logistic model is shown below. Notice that the parameter estimates are not the same as those obtained from R. This is because JMP uses contrast coding for the Over 55 predictor (+1 = Age > 55 and -1 = Age < 55).
OR’s and Fitted Probabilities
Using JMP to Compute OR’s, CI’s, Fitted Probabilities
For dichotomous predictors the range odds ratios compare -1 to +1 in terms of the odds ratio, which is precisely what we want.
By selecting Save Probability Formula we can save the fitted probabilities to the
spreadsheet.
Example 1: Oral Contraceptive Use and Myocardial Infarctions
Set up a text file with the data in columns with variable names at the top. The case and control counts are
in separate columns. The risk factor OC use and stratification variable Age follow.
> OCMI.data = read.table(file.choose(),header=T) # read in text file
> OCMI.data
   MI NoMI Age OCuse
1   4   62   1   Yes
2   2  224   1    No
3   9   33   2   Yes
4  12  390   2    No
5   4   26   3   Yes
6  33  330   3    No
7   6    9   4   Yes
8  65  362   4    No
9   6    5   5   Yes
10 93  301   5    No
> attach(OCMI.data)
> OC.glm <- glm(cbind(MI,NoMI)~Age+OCuse,family=binomial)   # fit model
> summary(OC.glm)
Call:
glm(formula = cbind(MI, NoMI) ~ Age + OCuse, family = binomial)
Deviance Residuals:
[1]  0.456248 -0.520517 -0.130922  0.033643 -0.886710 -1.685521  0.714695  1.377693
[9] -0.045061  0.008822

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -4.3698     0.4347 -10.054  < 2e-16 ***
Age2          1.1384     0.4768   2.388   0.0170 *
Age3          1.9344     0.4582   4.221 2.43e-05 ***
Age4          2.6481     0.4496   5.889 3.88e-09 ***
Age5          3.1943     0.4474   7.140 9.36e-13 ***
OCuseYes      1.3852     0.2505   5.530 3.19e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 158.0085 on 9 degrees of freedom
Residual deviance:   6.5355 on 4 degrees of freedom
AIC: 58.825
Number of Fisher Scoring iterations: 3
Find OR associated with oral contraceptive use ADJUSTED for age.
Note: CMH procedure gave 3.97.
> exp(1.3852)
[1] 3.995625
Find a 95% CI for OR associated with OC use.
> exp(1.3852-1.96*.2505)
[1] 2.445428
> exp(1.3852+1.96*.2505)
[1] 6.528518
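The same Wald interval can also be obtained in one step; a sketch (note that confint() without .default would give a profile-likelihood interval instead):

exp(confint.default(OC.glm)["OCuseYes", ])   # Wald 95% CI for the OC use OR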
Interpreting the age effect in terms of OR’s ADJUSTING for OC use.
Note: The reference group is Age = 1 which was women 25 – 29 years of age.
> OC.glm$coefficients
(Intercept)        Age2        Age3        Age4        Age5    OCuseYes
  -4.369850    1.138363    1.934401    2.648059    3.194292    1.385176
> Age.coefs <- OC.glm$coefficients[2:5]
> exp(Age.coefs)
     Age2      Age3      Age4      Age5
 3.121653  6.919896 14.126585 24.392906
Find 95% CI for age = 5 group.
> exp(3.1943-1.96*.4474)
[1] 10.14921
> exp(3.1943+1.96*.4474)
[1] 58.62751
Example 2: Coffee Drinking and Myocardial Infarctions
CoffeeMI.data = read.table(file.choose(),header=T)
> CoffeeMI.data
      Smoking Coffee MI NoMI
1       Never    > 5  7   31
2       Never    < 5 55  269
3      Former    > 5  7   18
4      Former    < 5 20  112
5   1-14 Cigs    > 5  7   24
6   1-14 Cigs    < 5 33  114
7  15-25 Cigs    > 5 40   45
8  15-25 Cigs    < 5 88  172
9  25-34 Cigs    > 5 34   24
10 25-34 Cigs    < 5 50   55
11 35-44 Cigs    > 5 27   24
12 35-44 Cigs    < 5 55   58
13   45+ Cigs    > 5 30   17
14   45+ Cigs    < 5 34   17
> attach(CoffeeMI.data)
> Coffee.glm = glm(cbind(MI,NoMI)~Smoking+Coffee,family=binomial)
> summary(Coffee.glm)
Call:
glm(formula = cbind(MI, NoMI) ~ Smoking + Coffee, family = binomial)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.7650  -0.4510  -0.0232   0.2999   0.7917

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)        -1.2981     0.1819  -7.136 9.60e-13 ***
Smoking15-25 Cigs   0.6892     0.2119   3.253  0.00114 **
Smoking25-34 Cigs   1.2462     0.2398   5.197 2.02e-07 ***
Smoking35-44 Cigs   1.1988     0.2389   5.017 5.24e-07 ***
Smoking45+ Cigs     1.7811     0.2808   6.342 2.27e-10 ***
SmokingFormer      -0.3291     0.2778  -1.185  0.23616
SmokingNever       -0.3153     0.2279  -1.384  0.16646
Coffee> 5           0.3200     0.1377   2.324  0.02012 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 173.7899 on 13 degrees of freedom
Residual deviance:   3.7622 on  6 degrees of freedom
AIC: 84.311
Number of Fisher Scoring iterations: 3
OR for drinking 5 or more cups of coffee per day.
Note: CMH procedure gave OR = 1.375
> exp(.3200)
[1] 1.377128
95% CI for OR associated with heavy coffee drinking
> exp(.3200 - 1.96*.1377)
[1] 1.051385
> exp(.3200 + 1.96*.1377)
[1] 1.803794
Reordering a Factor
To examine the effect of smoking we might want to “reorder” the levels of smoking
status so that individuals who have never smoked are used as the reference group. To do
this in R you must do the following:
Smoking = factor(Smoking,levels=c("Never","Former","1-14 Cigs","15-25
Cigs","25-34 Cigs","35-44 Cigs","45+ Cigs"))
The first level specified in the levels subcommand will be used as the reference group,
“Never” in this case. Refitting the model with the reordered smoking status factor gives
the following:
> Coffee.glm2 <-glm(cbind(MI,NoMI)~Smoking+Coffee,family=binomial)
> summary(Coffee.glm2)
Call:
glm(formula = cbind(MI, NoMI) ~ Smoking + Coffee, family = binomial)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.7650  -0.4510  -0.0232   0.2999   0.7917

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)       -1.61344    0.14068 -11.469  < 2e-16 ***
SmokingFormer     -0.01376    0.25376  -0.054   0.9568
Smoking1-14 Cigs   0.31533    0.22789   1.384   0.1665
Smoking15-25 Cigs  1.00451    0.17976   5.588 2.30e-08 ***
Smoking25-34 Cigs  1.56150    0.21254   7.347 2.03e-13 ***
Smoking35-44 Cigs  1.51417    0.21132   7.165 7.77e-13 ***
Smoking45+ Cigs    2.09646    0.25855   8.108 5.13e-16 ***
Coffee> 5          0.31995    0.13766   2.324   0.0201 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 173.7899 on 13 degrees of freedom
Residual deviance:   3.7622 on  6 degrees of freedom
AIC: 84.311
Number of Fisher Scoring iterations: 3
Notice that “SmokingNever” is now absent from the output so we know it is being used
as the reference group. The OR’s associated with the various levels of smoking are
computed below.
> Smoke.coefs = Coffee.glm2$coefficients[2:7]
> exp(Smoke.coefs)
    SmokingFormer Smoking1-14 Cigs Smoking15-25 Cigs Smoking25-34 Cigs
         0.986338         1.370715          2.730561          4.765984
Smoking35-44 Cigs   Smoking45+ Cigs
         4.545632          8.137279
Confidence intervals for each could be computed in the standard way.
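For example, a sketch that computes Wald 95% CIs for all of the smoking ORs at once from the summary table:

est <- coef(summary(Coffee.glm2))[2:7, "Estimate"]
se  <- coef(summary(Coffee.glm2))[2:7, "Std. Error"]
exp(cbind(OR = est, lower = est - 1.96*se, upper = est + 1.96*se))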
Some Details for Categorical Predictors with More Than Two Levels
Consider the coffee drinking/MI study above. The stratification variable smoking has
seven levels. Thus it requires six dummy variables to define it. The level that is not
defined using a dichotomous dummy variable serves as the reference group. The table
below shows the values of the dummy variables:

Level                D2  D3  D4  D5  D6  D7
Never (Reference)     0   0   0   0   0   0
Former                1   0   0   0   0   0
1-14 Cigs             0   1   0   0   0   0
15-25 Cigs            0   0   1   0   0   0
25-34 Cigs            0   0   0   1   0   0
35-44 Cigs            0   0   0   0   1   0
45+ Cigs              0   0   0   0   0   1
Example: Coffee Drinking and Myocardial Infarctions
CoffeeMI.data = read.table(file.choose(),header=T)
> CoffeeMI.data
      Smoking Coffee MI NoMI
1       Never    > 5  7   31
2       Never    < 5 55  269
3      Former    > 5  7   18
4      Former    < 5 20  112
5   1-14 Cigs    > 5  7   24
6   1-14 Cigs    < 5 33  114
7  15-25 Cigs    > 5 40   45
8  15-25 Cigs    < 5 88  172
9  25-34 Cigs    > 5 34   24
10 25-34 Cigs    < 5 50   55
11 35-44 Cigs    > 5 27   24
12 35-44 Cigs    < 5 55   58
13   45+ Cigs    > 5 30   17
14   45+ Cigs    < 5 34   17
The Logistic Model

$$\ln\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \beta_0 + \beta_1 Coffee + \beta_2 D_2 + \beta_3 D_3 + \beta_4 D_4 + \beta_5 D_5 + \beta_6 D_6 + \beta_7 D_7$$

where Coffee is a dichotomous predictor equal to 1 if they drink 5 or more cups of coffee per day.

Comparing the log-odds of a heavy coffee drinker who smokes 15-25 cigarettes a day to a heavy coffee drinker who has never smoked, we have:
$$\ln\left(\frac{\pi_1(x)}{1 - \pi_1(x)}\right) = \beta_0 + \beta_1 + \beta_4$$

$$\ln\left(\frac{\pi_2(x)}{1 - \pi_2(x)}\right) = \beta_0 + \beta_1$$

Taking the difference gives

$$\ln\left(\frac{\pi_1(x)}{1 - \pi_1(x)}\right) - \ln\left(\frac{\pi_2(x)}{1 - \pi_2(x)}\right) = \beta_4$$

thus $e^{\beta_4}$ = the odds ratio associated with smoking 15-25 cigarettes per day when compared to individuals who have never smoked, amongst heavy coffee drinkers. Because $\beta_1$ is not involved in the odds ratio, the result is the same for non-heavy coffee drinkers as well!
You can also consider combinations of factors, e.g. if we compared heavy coffee drinkers who smoke 15-25 cigarettes a day to non-heavy coffee drinkers who have never smoked, the associated OR would be given by $e^{\beta_1 + \beta_4}$.

Using our fitted model, the ORs discussed above are computed below.
> summary(Coffee.glm2)

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)       -1.61344    0.14068 -11.469  < 2e-16 ***
SmokingFormer     -0.01376    0.25376  -0.054   0.9568
Smoking1-14 Cigs   0.31533    0.22789   1.384   0.1665
Smoking15-25 Cigs  1.00451    0.17976   5.588 2.30e-08 ***
Smoking25-34 Cigs  1.56150    0.21254   7.347 2.03e-13 ***
Smoking35-44 Cigs  1.51417    0.21132   7.165 7.77e-13 ***
Smoking45+ Cigs    2.09646    0.25855   8.108 5.13e-16 ***
Coffee> 5          0.31995    0.13766   2.324   0.0201 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
OR for 15-25 cigarette smokers vs. never smokers (regardless of coffee drinking status)
> exp(1.00451)
[1] 2.730569
OR for 15-25 cigarette smokers who are also heavy coffee drinkers vs. non-smokers who are not heavy coffee drinkers
> exp(.31995 + 1.00451)
[1] 3.760154
Similar calculations could be done for other combinations of coffee and cigarette use.
Using Arc when the Number of Trials is not 1
Example 1: Oral contraceptive use, myocardial infarctions, and age
To read these data in Arc it is easiest to create a text file that looks like:
Age OCuse MI NoMI Trials
1 Yes 4 62 66
1 No 2 224 226
2 Yes 9 33 42
2 No 12 390 402
3 Yes 4 26 30
3 No 33 330 363
4 Yes 6 9 15
4 No 65 362 427
5 Yes 6 5 11
5 No 93 301 394
The Trials column contains the total number of patients in each age and oral
contraceptive use category, i.e. the sum of the number of patients with MI and the
number of patients without MI (NoMI).
When read in Arc we have:
; loading D:\Data\Deppa Documents\Biostatistics (Biometry II)\Book
Data\OCMI.txt
Arc 1.06, rev July 2004, Mon Oct 16, 2006, 12:58:46. Data set name:
OCMI
Oral contraceptive use, age, and myocardial infarctions
Name     Type     n    Info
AGE      Variate  10
MI       Variate  10
NOMI     Variate  10
TRIALS   Variate  10
OCUSE    Text     10
In Arc we need to turn the Age variable into a factor, as we don't want it to be interpreted as an actual number, and we need to create a factor based on OCuse. By default Arc does things alphabetically, so No would be used as "present", which is not desirable. Thus it is best to create separate dichotomous dummy variables for each level individually. This will allow us to use those who used oral contraceptives as having "risk present". To do this in Arc we need to use the Make Factors… option in the data menu.
For oral contraceptive use we want two separate dummy variables, one for each level of
use, i.e. Yes and No.
Fitting the logistic model in Arc with MI as the response and OCUSE[YES] as the risk
factor indicator.
Results for Fitted Logistic Model
Iteration 1: deviance = 6.69914
Iteration 2: deviance = 6.53561
Data set = OCMI, Name of Fit = B1
Binomial Regression
Kernel mean function = Logistic
Response = MI
Terms    = ({F}AGE {T}OCUSE[YES])
Trials   = TRIALS

Coefficient Estimates
Label            Estimate    Std. Error    Est/SE     p-value
Constant         -4.36985    0.434642     -10.054     0.0000
{F}AGE[2]         1.13836    0.476782       2.388     0.0170
{F}AGE[3]         1.93440    0.458227       4.221     0.0000
{F}AGE[4]         2.64806    0.449627       5.889     0.0000
{F}AGE[5]         3.19429    0.447386       7.140     0.0000
{T}OCUSE[YES]     1.38518    0.250458       5.531     0.0000

Scale factor:           1.
Number of cases:        10
Degrees of freedom:     4
Pearson X2:             6.386
Deviance:               6.536
We can work with these parameter estimates as above to obtain OR’s of interest etc.
Logistic Regression Case Study 1: Risk Factors for Low Birth Weight
Response
Y = low birth weight, i.e. birth weight < 2500 grams (1 = yes, 0 = no)
Set of potential predictors
X1 = previous history of premature labor (1 = yes, 0 = no)
X2 = hypertension during pregnancy (1 = yes, 0 = no)
X3 = smoker (1 = yes, 0 = no)
X4 = uterine irritability (1 = yes, 0 = no)
X5 = minority (1 = yes, 0 = no)
X6 = mother's age in years
X7 = mother's weight at last menstrual cycle
Analysis in R
> Lowbirth = read.table(file.choose(),header=T)
> Lowbirth[1:5,]   # print first 5 rows of the data set
  Low Prev Hyper Smoke Uterine Minority Age Lwt race  bwt
1   0    0     0     0       1        1  19 182    2 2523
2   0    0     0     0       0        1  33 155    3 2551
3   0    0     0     1       0        0  20 105    1 2557
4   0    0     0     1       1        0  21 108    1 2594
5   0    0     0     1       1        0  18 107    1 2600
Make sure categorical variables are interpreted as factors by using the factor command
> Low = factor(Low)
> Prev = factor(Prev)
> Hyper = factor(Hyper)
> Smoke = factor(Smoke)
> Uterine = factor(Uterine)
> Minority = factor(Minority)
Note: This is not really necessary for dichotomous variables that are coded (0,1).
Fit a preliminary model using available covariates
> low.glm = glm(Low~Prev+Hyper+Smoke+Uterine+Minority+Age+Lwt,family=binomial)
> summary(low.glm)
Call:
glm(formula = Low ~ Prev + Hyper + Smoke + Uterine + Minority +
Age + Lwt, family = binomial)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6010  -0.8149  -0.5128   1.0188   2.1977

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.378479   1.170627   0.323  0.74646
Prev1        1.196011   0.461534   2.591  0.00956 **
Hyper1       1.452236   0.652085   2.227  0.02594 *
Smoke1       0.959406   0.405302   2.367  0.01793 *
Uterine1     0.647498   0.466468   1.388  0.16511
Minority1    0.990929   0.404969   2.447  0.01441 *
Age         -0.043221   0.037493  -1.153  0.24900
Lwt         -0.012047   0.006422  -1.876  0.06066 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Null deviance: 232.40 on 185 degrees of freedom
Residual deviance: 196.71 on 178 degrees of freedom
AIC: 212.71
Number of Fisher Scoring iterations: 3
It appears that both uterine irritability and mother's age are not significant. We can fit the reduced model eliminating both terms and test whether the model is significantly degraded by using the general chi-square test (see the General Chi-Square Test section earlier in these notes).
> low.reduced = glm(Low~Prev+Hyper+Smoke+Minority+Lwt,family=binomial)
> summary(low.reduced)
Call:
glm(formula = Low ~ Prev + Hyper + Smoke + Minority + Lwt, family =
binomial)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.7277  -0.8219  -0.5368   0.9867   2.1517

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.261274   0.885803  -0.295  0.76803
Prev1        1.181940   0.444254   2.661  0.00780 **
Hyper1       1.397219   0.656271   2.129  0.03325 *
Smoke1       0.981849   0.398300   2.465  0.01370 *
Minority1    1.044804   0.394956   2.645  0.00816 **
Lwt         -0.014127   0.006387  -2.212  0.02697 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 232.40 on 185 degrees of freedom
Residual deviance: 200.32 on 180 degrees of freedom
AIC: 212.32
Number of Fisher Scoring iterations: 3
$$H_0: \ln\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_5 X_5 + \beta_7 X_7$$

$$H_1: \ln\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \beta_6 X_6 + \beta_7 X_7$$

* Recall: $\pi(x) = P(Low = 1 \mid x)$
Residual Deviance, Null Hypothesis Model:        $D_{H_0} = 200.32$   (df = 180)
Residual Deviance, Alternative Hypothesis Model: $D_{H_1} = 196.71$   (df = 178)

General Chi-Square Test

$$\chi^2 = D_{H_0} - D_{H_1} = 200.32 - 196.71 = 3.607$$

$$p\text{-value} = P(\chi^2_2 \geq 3.607) = .1647$$

Fail to reject the null; the reduced model is adequate.
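R can carry out this comparison directly; a one-line sketch using the two fits from above (output omitted):

anova(low.reduced, low.glm, test = "Chisq")   # chi-square = 3.607 on 2 df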
Interpretation of Model Parameters
OR’s Associated with Categorical Predictors
> low.reduced
Call: glm(formula = Low ~ Prev + Hyper + Smoke + Minority + Lwt,
family = binomial)
Coefficients:
(Intercept)      Prev1     Hyper1     Smoke1  Minority1        Lwt
   -0.26127    1.18194    1.39722    0.98185    1.04480   -0.01413

Degrees of Freedom: 185 Total (i.e. Null);  180 Residual
Null Deviance:      232.4
Residual Deviance: 200.3    AIC: 212.3
Estimated OR’s
> exp(low.reduced$coefficients[2:5])
    Prev1    Hyper1    Smoke1 Minority1
 3.260693  4.043938  2.669388  2.842841
95% CI for OR Associated with History of Premature Labor
> exp(1.182 - 1.96*.444)
[1] 1.365827
> exp(1.182 + 1.96*.444)
[1] 7.78532
Holding everything else constant we estimate that the odds of having an infant with low birth weight are
between 1.366 and 7.785 times larger for mothers with a history of premature labor.
95% CI for OR Associated with Hypertension
> exp(1.397 - 1.96*.6563)
[1] 1.117006
> exp(1.397 + 1.96*.6563)
[1] 14.63401
Holding everything else constant we estimate that the odds of having an infant with low birth weight are
between 1.117 and 14.63 times larger for mothers with hypertension during pregnancy.
95% CI for OR Associated with Smoking
> exp(.981849 - 1.96*.3983)
[1] 1.222846
> exp(.981849 + 1.96*.3983)
[1] 5.827086
Holding everything else constant we estimate that the odds of having an infant with low birth weight are
between 1.223 and 5.827 times larger for mothers who smoked during pregnancy.
95% CI for OR Associated with Minority Status
> exp(1.0448 - 1.96*.3950)
[1] 1.310751
> exp(1.0448 + 1.96*.3950)
[1] 6.16569
Holding everything else constant we estimate that the odds of having an infant with low birth weight are
between 1.311 and 6.166 times larger for non-white mothers.
OR Associated with Mother’s Weight at Last Menstrual Cycle
Because this is a continuous predictor with values over 100 we should use an increment
larger than one when considering the effect of mother’s weight on birth weight. Here we
will use an increment of c = 10 lbs. although certainly there are other possibilities.
> exp(-10*.014127)
[1] 0.8682549
i.e. 13.2% decrease in the OR for each additional 10 lbs. in premenstrual weight.
A 95% CI for this OR is:
> exp(10*(-.014127) - 1.96*10*.006387)
[1] 0.7660903
> exp(10*(-.014127) + 1.96*10*.006387)
[1] 0.9840439
x = seq(min(Lwt),max(Lwt),.5)
fit = predict(low.reduced,data.frame(Prev=factor(rep(1,length(x))),
  Hyper=factor(rep(0,length(x))),Smoke=factor(rep(1,length(x))),
  Minority=factor(rep(0,length(x))),Lwt=x),type="response")
plot(x,fit,xlab="Mother's Weight",ylab="P(Low|Prev=1,Smoke=1,Lwt)")
This is a plot of the effect of
premenstrual weight for smoking
mothers with a history of premature
labor. Using the predict command
above similar plots could be
constructed by examining other
combinations of the categorical
predictors.
Diagnostics (Delta Deviance and Cook’s Distance)
As in the case of ordinary least squares (OLS) regression we need to be wary of cases that may have unduly high influence on our results and those that are poorly fit. The most common influence measure is Cook's Distance, and a good measure of poorly fit cases is the Delta Deviance.

Essentially Cook's Distance measures the change in the estimated parameters when the ith observation is deleted. This change is measured for each of the observations and can be plotted versus $\hat{\pi}(x)$ or observation number to aid in the identification of high influence cases. Several cut-offs have been proposed for Cook's Distance, the most common being to classify an observation as having large influence if its Cook's Distance exceeds 1 or, in the case of large sample size n, exceeds 4/n. (Details of Cook's distance are given in the Arc diagnostics section below.)
Delta deviance measures the change in the deviance (D) when the ith case is deleted. Values around 4 or larger indicate cases that are poorly fit. These correspond to individuals where $y_i = 1$ but $\hat{\pi}(x_i)$ is small, or cases where $y_i = 0$ but $\hat{\pi}(x_i)$ is large.
In cases of both high influence and poor fit it is good to look at the covariate values for
these individuals and we can begin to address the role they play in the analysis. In many
cases there will be several individuals with the same covariate pattern, especially if most
or all of the predictors are categorical in nature.
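Diagplot.glm() and Diagplot.log() below are course-supplied functions. If you do not have them, here is a base-R sketch of the same quantities (using one common approximation for the delta deviance):

cd <- cooks.distance(low.reduced)              # Cook's distance for each case
h  <- hatvalues(low.reduced)                   # leverages
dd <- residuals(low.reduced, type = "deviance")^2/(1 - h)   # approx. delta deviance
plot(fitted(low.reduced), cd,
     xlab = "Estimated Probability", ylab = "Cook's Distance")
which(dd > 4)                                  # flag poorly fit cases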
> Diagplot.glm(low.reduced)
> Diagplot.log(low.reduced)
Cases 11 and 13 have the highest Cook’s distances although they are not that large. It
should be noted also that they are also somewhat poorly fit. Cases 129, 144, 152, and
180 appear to be poorly fit. The information on all of these cases is shown below.
> Lowbirth[c(11,13,129,144,152,180),]
    Low Prev Hyper Smoke Uterine Minority Age Lwt race  bwt
11    0    0     1     0       0        1  19  95    3 2722
13    0    0     1     0       0        1  22  95    3 2750
129   1    0     0     0       1        0  29 130    1 1021
144   1    0     0     0       1        1  21 200    2 1928
152   1    0     0     0       0        0  24 138    1 2100
180   1    0     0     1       0        0  26 190    1 2466
Case 152 had a low birth weight infant even in the absence of the identified potential risk
factors. The fitted values for all four of the poorly fit cases are quite small.
> fitted(low.reduced)[c(11,13,129,144,152,180)]
        11         13        129        144        152        180
0.69818500 0.69818500 0.10930602 0.11486743 0.09877858 0.12307383
Cases 11 and 13 have high predicted probabilities despite the fact that they had babies with normal birth weight. Their relatively high leverage might come from the fact that there were very few hypertensive minority women in the study. These two facts combined lead to the relatively large Cook's Distances for these two cases.
Plotting Estimated Conditional Probabilities ~ $\hat{P}(Low = 1 \mid x)$
A summary of the reduced model is given below:
> low.reduced
Call: glm(formula = Low ~ Prev + Hyper + Smoke + Minority + Lwt,
family = binomial)
Coefficients:
(Intercept)      Prev1     Hyper1     Smoke1  Minority1        Lwt
   -0.26127    1.18194    1.39722    0.98185    1.04480   -0.01413

Degrees of Freedom: 185 Total (i.e. Null);  180 Residual
Null Deviance:      232.4
Residual Deviance: 200.3    AIC: 212.3
To easily plot probabilities in R we can write a function that takes covariate values and computes the desired conditional probability.
> x <- seq(min(Lwt),max(Lwt),.5)
> PrLwt <- function(x,Prev,Hyper,Smoke,Minority) {
+   L <- -.26127 + 1.18194*Prev + 1.39722*Hyper + .98185*Smoke +
+        1.0448*Minority - .01413*x
+   exp(L)/(1 + exp(L))
+ }
> plot(x,PrLwt(x,1,1,1,1),xlab="Mother's Weight",ylab="P(Low=1|x)",
+   ylim=c(0,1),type="l")
> title(main="Plot of P(Low=1|X) vs. Mother's Weight")
> lines(x,PrLwt(x,0,0,0,0),lty=2,col="red")
> lines(x,PrLwt(x,1,1,0,0),lty=3,col="blue")
> lines(x,PrLwt(x,0,0,1,1),lty=4,col="green")
Fitting Logistic Models in Arc and More Diagnostics (lowbirtharc.txt from website)

Again we consider the low birth weight case study.
Arc 1.03, rev Aug, 2000, Wed Oct 22, 2003, 12:10:14. Data set name:
Lowbw
Low birth weight study.
Name      Type     n    Info
AGE       Variate  189  Age of mother
BWT       Variate  189  Actual birthweight of child in grams
HT        Variate  189  Mother hypertensive during pregnancy (1 = yes, 0 = no)
ID        Variate  189
LOW       Variate  189  (1 = low birthweight, 0 = normal birthweight)
LWT       Variate  189  Mother's weight at last menstrual cycle
PTD       Variate  189  do not know
PTL       Variate  189  Previous history of premature labor (1 = yes, 0 = no)
RACE      Variate  189  Race of mother (1 = white, 2 = black, 3 = other)
SMOKE     Variate  189  Mother smoke (1 = yes, 0 = no)
UI        Variate  189  Uterine irritability (1 = yes, 0 = no)
FTV       Text     189  # of doctor visits during 1st trimester
{F}FTV    Factor   189  Factor--first level dropped
{F}HT     Factor   189  Factor--first level dropped
{F}PTD    Factor   189  Factor--first level dropped
{F}RACE   Factor   189  Factor--first level dropped
{F}SMOKE  Factor   189  Factor--first level dropped
{F}UI     Factor   189  Factor--first level dropped
Select Fit binomial response… from
the Graph & Fit menu
In the resulting dialog box, specify the model as shown on the following page.
Give the model a name if you want.
Always include an intercept.
Use the Make Factors… option from the
data set menu to ensure all categorical
predictors are treated as factors.
Put dichotomous response in the
Response… box. The response may also
be the number of “successes” observed.
(see below)
If mi = 1 for all cases then put the variable Ones in the Trials… box. If your response represents the number of "successes" observed in mi > 1 trials then you need to import the number of trials and put that variable in this box.
The output below shows the results of fitting this initial model.
Data set = Lowbw, Name of Fit = B1
Binomial Regression
Kernel mean function = Logistic
Response = LOW
Terms    = (AGE LWT {F}FTV {F}HT {F}PTD {F}RACE {F}SMOKE {F}UI)
Trials   = Ones

Coefficient Estimates
Label          Estimate     Std. Error    Est/SE     p-value
Constant       0.386634     1.27736        0.303     0.7621
AGE           -0.0372340    0.0386777     -0.963     0.3357
LWT           -0.0156530    0.00707594    -2.212     0.0270
{F}FTV[0]      0.436379     0.479161       0.911     0.3624
{F}FTV[2+]     0.615386     0.553104       1.113     0.2659
{F}HT[1]       1.91316      0.720434       2.656     0.0079
{F}PTD[1]      1.34376      0.480445       2.797     0.0052
{F}RACE[2]     1.19241      0.535746       2.226     0.0260
{F}RACE[3]     0.740681     0.461461       1.605     0.1085
{F}SMOKE[1]    0.755525     0.424764       1.779     0.0753
{F}UI[1]       0.680195     0.464216       1.465     0.1429

Note: For FTV, those who went to the doctor once during the first trimester are used as the reference group.

Scale factor:           1.
Number of cases:        189
Degrees of freedom:     178
Pearson X2:             179.059
Deviance:               195.476

(Note: AIC = D + 2k*(scale factor) = 195.48 + 22 = 217.48)

The results are identical to those obtained from R:

    Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 195.48 on 178 degrees of freedom
Examining Submodels – Backward Elimination and Forward Selection
Forward Selection –
Select this option and click OK. It will then show how terms are sequentially added to a model containing any base terms. By default the base contains the intercept only.

Backward Elimination –
Simply select this option and click OK. It will show how terms are sequentially eliminated from the model along with the resulting AIC for the deletion.

The other options do what they say.
The results of backward elimination for the current low birth weight model are shown
below.
Data set = Lowbw, Name of Fit = B1
Binomial Regression
Kernel mean function = Logistic
Response
= LOW
Terms
= (AGE LWT {F}FTV {F}HT {F}PTD {F}RACE {F}SMOKE {F}UI)
Trials
= Ones
Backward Elimination: Sequentially remove terms
that give the smallest change in AIC.
All fits include an intercept.
Current terms: (AGE LWT {F}FTV {F}HT {F}PTD {F}RACE {F}SMOKE {F}UI)
                    df   Deviance   Pearson X2  |   k      AIC
Delete: {F}FTV     180   196.834    180.989     |   9   214.834 *
Delete: AGE        179   196.417    181.401     |  10   216.417
Delete: {F}UI      179   197.585    180.753     |  10   217.585
Delete: {F}SMOKE   179   198.674    186.809     |  10   218.674
Delete: {F}RACE    180   201.227    183.365     |   9   219.227
Delete: LWT        179   200.949    177.855     |  10   220.949
Delete: {F}HT      179   202.934    177.447     |  10   222.934
Delete: {F}PTD     179   203.584    180.74      |  10   223.584

Current terms: (AGE LWT {F}HT {F}PTD {F}RACE {F}SMOKE {F}UI)
                    df   Deviance   Pearson X2  |   k      AIC
Delete: AGE        181   197.852    183.999     |   8   213.852 *
Delete: {F}UI      181   199.151    184.559     |   8   215.151
Delete: {F}RACE    182   203.24     182.815     |   7   217.240
Delete: {F}SMOKE   181   201.247    186.953     |   8   217.247
Delete: LWT        181   201.833    181.355     |   8   217.833
Delete: {F}PTD     181   203.948    181.536     |   8   219.948
Delete: {F}HT      181   204.013    179.069     |   8   220.013

Current terms: (LWT {F}HT {F}PTD {F}RACE {F}SMOKE {F}UI)
                    df   Deviance   Pearson X2  |   k      AIC
Delete: {F}UI      182   200.482    186.918     |   7   214.482 *
Delete: {F}SMOKE   182   202.567    189.716     |   7   216.567
Delete: {F}RACE    183   205.466    186.461     |   6   217.466
Delete: LWT        182   203.816    185.551     |   7   217.816
Delete: {F}PTD     182   204.217    182.499     |   7   218.217
Delete: {F}HT      182   205.162    182.282     |   7   219.162

Current terms: (LWT {F}HT {F}PTD {F}RACE {F}SMOKE)
                    df   Deviance   Pearson X2  |   k      AIC
Delete: {F}SMOKE   183   205.397    189.925     |   6   217.397
Delete: {F}RACE    184   207.955    192.506     |   5   217.955
Delete: {F}HT      183   207.039    184.17      |   6   219.039
Delete: LWT        183   207.165    187.234     |   6   219.165
Delete: {F}PTD     183   208.247    184.45      |   6   220.247

Current terms: (LWT {F}HT {F}PTD {F}RACE)
                    df   Deviance   Pearson X2  |   k      AIC
Delete: {F}RACE    185   210.123    194.086     |   4   218.123
Delete: {F}HT      184   212.18     188.048     |   5   222.180
Delete: LWT        184   213.226    187.544     |   5   223.226
Delete: {F}PTD     184   216.295    191.533     |   5   226.295

Current terms: (LWT {F}HT {F}PTD)
                    df   Deviance   Pearson X2  |   k      AIC
Delete: {F}HT      186   217.497    190.809     |   3   223.497
Delete: LWT        186   217.662    188.394     |   3   223.662
Delete: {F}PTD     186   221.142    193.26      |   3   227.142

Current terms: (LWT {F}PTD)
                    df   Deviance   Pearson X2  |   k      AIC
Delete: LWT        187   221.898    188.863     |   2   225.898
Delete: {F}PTD     187   228.691    189.647     |   2   232.691
* indicates a potential “final” model using the AIC criteria, Arc does not add the *’s.
Making Interactions
To make interactions in Arc…
1st - Select Make
Interactions from
the data set menu.
2nd - Placing all covariates in the right-hand box will create all possible two-way interactions.
Deciding which interactions to include however is not as easy as in R. You could
potentially include all interactions and then backward eliminate, however things will get
unstable numerically with that many terms in the model. It is better to choose any
interactions you feel might make physiological sense and then backward eliminate.
If Arc does not use the reference group you would like to use, you can create dummy
variables for each level of the factor and then leave the one for the reference group out
when you specify the model.
Selecting these options will create three dummy variables, one for each level of FTV (0, 1, 2+).
The model with the age*recoded FTV and the smoking*uterine irritability interactions
we saw in the R handout is summarized below.
Data set = Lowbw, Name of Fit = B6
Binomial Regression
Kernel mean function = Logistic
Response
= LOW
Terms
= (AGE LWT {F}HT {F}PTD {F}SMOKE {F}UI {F}SMOKE*{F}UI
{T}FTV[1] {T}FTV[2+] {T}FTV[1]*AGE {T}FTV[2+]*AGE)
Trials
= Ones
Coefficient Estimates
Label                   Estimate     Std. Error    Est/SE     p-value
Constant               -0.582374     1.42158       -0.410     0.6821
AGE                     0.0755389    0.0539665      1.400     0.1616
LWT                    -0.0203726    0.00749678    -2.718     0.0066
{F}HT[1]                2.06570      0.748727       2.759     0.0058
{F}PTD[1]               1.56032      0.496986       3.140     0.0017
{F}SMOKE[1]             0.780044     0.420371       1.856     0.0635
{F}UI[1]                1.81853      0.667517       2.724     0.0064
{F}SMOKE[1].{F}UI[1]   -1.91668      0.973066      -1.970     0.0489
{T}FTV[1]               2.92109      2.28571        1.278     0.2013
{T}FTV[2+]              9.24491      2.66099        3.474     0.0005
{T}FTV[1].AGE          -0.161824     0.0968164     -1.671     0.0946
{T}FTV[2+].AGE         -0.411033     0.119117      -3.451     0.0006

Number of cases:        189
Degrees of freedom:     177
Pearson X2:             179.282
Deviance:               183.073

Notice: The recoding of FTV means FTV = 0 is now the reference group.
Diagnostic Plots
There are several plotting options in Arc to help assess a models adequacy.
They are as follows:
• Residuals (deviance or chi-square) vs. the estimated logit ($\hat{L} = \hat{\beta}^T x$)

Deviance Residual:

$$D_i = \text{sgn}(y_i - \hat{\pi}(x_i)) \sqrt{2\left[ y_i \ln\left(\frac{y_i}{\hat{\pi}(x_i)}\right) + (1 - y_i) \ln\left(\frac{1 - y_i}{1 - \hat{\pi}(x_i)}\right) \right]}$$

Chi-residual for the $i^{th}$ covariate pattern:

$$\hat{e}_i = \frac{y_i - \hat{y}_i}{\sqrt{m_i \hat{\pi}(x_i)(1 - \hat{\pi}(x_i))}}$$

where $\hat{y}_i = m_i \hat{\pi}(x_i)$ and $y_i = 1$ for cases and 0 for controls. The sum of the squared chi-residuals = Pearson's $\chi^2$.

• Plot of Cook's distance vs. Case Number or some other quantity.
• Plot of Leverage (potential for influence) vs. Case Number
• Model checking plots

Eta'U ~ Estimated Logit ($\hat{L}_i = \hat{\beta}^T x_i$)
Obs-Fraction ~ $y_i / m_i$ (1's and 0's in the case $m_i = 1$)
Fit-Fraction ~ $\hat{\pi}(x_i) = \dfrac{e^{\hat{L}_i}}{1 + e^{\hat{L}_i}}$
Chi-Residuals ~ see above
Dev-Residuals ~ see above
T-Residuals ~ $\dfrac{\hat{e}_i}{\sqrt{1 - h_i}}$, the studentized chi-residual
Leverages ~ $h_i$ = $i^{th}$ diagonal element of the hat matrix H
Cook's Distance ~ $D_i = \dfrac{1}{k} \cdot \dfrac{\hat{e}_i^2}{1 - h_i} \cdot \dfrac{h_i}{1 - h_i}$, measures the influence of the $i^{th}$ case.
Residuals vs. Estimated Logit (or some other function of the covariates)
If the model is adequate, a lowess (smoothing parameter = .6) smooth added to the plot should be constant, i.e. flat. This plot will not work well when the numbers of replicates $m_i$ are small, i.e. close to 1. Model checking plots work better for checking model adequacy in those cases.
As an example consider the simple, but reasonable, main effects model shown below.
Data set = Lowbw, Name of Fit = B3
Binomial Regression
Kernel mean function = Logistic
Response
= LOW
Terms
= (LWT {F}HT {F}PTD {F}RACE {F}SMOKE {F}UI)
Trials
= Ones
Coefficient Estimates
Label          Estimate     Std. Error    Est/SE     p-value
Constant      -0.125327     0.967238      -0.130     0.8969
LWT           -0.0159185    0.00695085    -2.290     0.0220
{F}HT[1]       1.86689      0.707212       2.640     0.0083
{F}PTD[1]      1.12886      0.450330       2.507     0.0122
{F}RACE[2]     1.30085      0.528349       2.462     0.0138
{F}RACE[3]     0.854413     0.440761       1.938     0.0526
{F}SMOKE[1]    0.866581     0.404341       2.143     0.0321
{F}UI[1]       0.750648     0.458753       1.636     0.1018

Scale factor:           1.
Number of cases:        189
Degrees of freedom:     181
Pearson X2:             183.999
Deviance:               197.852
The plots of the chi-square residuals vs. the estimated logit ($\hat{L} = \hat{\beta}^T X$) and LWT are shown below. The lowess smooth looks fairly flat, so no model inadequacies are suggested.
Cook’s Distance vs. Case Number and Est. Probs - (no cases have high influence)
Leverages vs. Case Numbers
For leverages the average value is k/n, so values far exceeding the average have the
potential to be influential. The following is a good rule of thumb:
1/n < hi < .25 no worries
.25 < hi < .50 worry
.50 < hi < 1 worry lots
Model Checking Plots
For any linear combination $b^T x_i$ of the predictors or terms, imagine drawing two plots: one of $y_i / m_i$ vs. $b^T x_i$, and one of $\hat{\pi}(x_i)$ vs. $b^T x_i$. If the model is adequate, lowess smooths of each should match for any linear combination we choose. A model checking plot is a plot with $b^T x_i$ on the x-axis and both of the lowess smooths described above added to the plot. If they agree for a variety of choices of $b^T x_i$ then we can feel reasonably confident that our model is adequate. Large differences between these smooths can indicate model deficiencies. Common choices for $b^T x_i$ include the estimated logits ($\hat{L}$), the individual predictors, and randomly chosen combinations of the terms in the model.
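A rough R version of such a plot for the estimated logits (a sketch only, borrowing the low.reduced fit and 0/1 response from the R case study above, where all m_i = 1):

y   <- as.numeric(as.character(Low))           # observed 0/1 fractions
eta <- predict(low.reduced)                    # estimated logits b'x
plot(eta, y, xlab = "Estimated Logit", ylab = "Fraction")
lines(lowess(eta, y), col = "blue")                    # smooth of observed y
lines(lowess(eta, fitted(low.reduced)), col = "red")   # smooth of fitted probs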
232
Here we see good agreement between the two smooths for the estimated logits.

Model checking plot with the single term LWT on the x-axis.

Model checking plot for one random linear combination of the terms in the model. Again we see good agreement.
Interactions and Higher Order Terms (Note ~ uses data frame: Lowbwt)
We now work with a slightly different version of the low birth weight data, which includes an additional predictor, ftv, a factor indicating the number of first trimester doctor visits the woman made (coded as: 0, 1, or 2+). We will examine how the model below was developed in the next section where we discuss model development. In the model below we have added an interaction between age and the number of first trimester visits. The logistic model is:
$$\log\left(\frac{\pi(x)}{1 - \pi(x)}\right) = \beta_0 + \beta_1 Age + \beta_2 Lwt + \beta_3 Smoke + \beta_4 Prev + \beta_5 HT + \beta_6 UI$$

$$+ \beta_7 FTV1 + \beta_8 FTV2 + \beta_9 Age*FTV1 + \beta_{10} Age*FTV2 + \beta_{11} Smoke*UI$$
> summary(bigmodel)
Call:
glm(formula = low ~ age + lwt + smoke + ptd + ht + ui + ftv +
age:ftv + smoke:ui, family = binomial)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.8945  -0.7128  -0.4817   0.7841   2.3418

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.582389   1.420834  -0.410 0.681885
age          0.075538   0.053945   1.400 0.161428
lwt         -0.020372   0.007488  -2.721 0.006513 **
smoke1       0.780047   0.420043   1.857 0.063302 .
ptd1         1.560304   0.496626   3.142 0.001679 **
ht1          2.065680   0.748330   2.760 0.005773 **
ui1          1.818496   0.666670   2.728 0.006377 **
ftv1         2.921068   2.284093   1.279 0.200941
ftv2+        9.244460   2.650495   3.488 0.000487 ***
age:ftv1    -0.161823   0.096736  -1.673 0.094360 .
age:ftv2+   -0.411011   0.118553  -3.467 0.000527 ***
smoke1:ui1  -1.916644   0.972366  -1.971 0.048711 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 183.07 on 177 degrees of freedom
AIC: 207.07
Number of Fisher Scoring iterations: 4
> bigmodel$coefficients
(Intercept)         age         lwt      smoke1       prev1         ht1
-0.58238913  0.07553844 -0.02037234  0.78004747  1.56030401  2.06567991
        ui1        ftv1       ftv2+    age:ftv1   age:ftv2+  smoke1:ui1
 1.81849631  2.92106773  9.24445985 -0.16182328 -0.41101103 -1.91664380
Calculate P(Low|Age,FTV) for women of average pre-pregnancy weight with all other
risk factors absent. Similar calculations could be done if we wanted to add in other
factors as well.
First we calculate the logits as function of age for three levels of FTV 0, 1, and 2+
respectively.
> agex <- seq(min(age),max(age),.5)   # age grid (this definition is assumed; not shown in the original)
> L <- -.5824 + .0755*agex - .02037*mean(lwt)
> L1 <- -.5824 + .0755*agex - .02037*mean(lwt) + 2.9211 - .16182*agex
> L2 <- -.5824 + .0755*agex - .02037*mean(lwt) + 9.2445 - .4110*agex
Next we calculate the associated conditional probabilities.
> P <- exp(L)/(1+exp(L))
> P1 <- exp(L1)/(1+exp(L1))
> P2 <- exp(L2)/(1+exp(L2))
Finally we plot the probability curves as function of age and FTV.
> plot(agex,P,type="l",xlab="Age",ylab="P(Low|Age,FTV)",ylim=c(0,1))
> lines(agex,P1,lty=2,col="blue")
> lines(agex,P2,lty=3,col="red")
> title(main="Interaction Between Age and First Trimester
Visits",cex=.6)
The interaction between age and
FTV produces differences in
direction and magnitude of the age
effect.
trimester doctor visits their
probability of low birth weight
increases with age. However for
women with at least one first
trimester visit the probability of low
birth weight decreases with age.
The magnitude of that drop is
largest for women with 2 or more
first trimester visits.
We also have an interaction between smoking and uterine irritability added to the model.
This will affect how we interpret the two in terms of odds ratios. We need to consider
the OR associated with smoking for women without uterine irritability, the OR associated
with uterine irritability for nonsmokers, and finally the OR associated with smoking and
having uterine irritability during pregnancy.
These estimated odds ratios are given below:
OR for Smoking with No Uterine Irritability
> exp(.7800)
[1] 2.181472
OR for Uterine Irritability with No Smoking
> exp(1.8185)
[1] 6.162608
OR for Smoking and Uterine Irritability
> exp(.7800+1.8185-1.91664)
[1] 1.977553
This result is hard to explain physiologically and so this interaction term might be
removed from the model.
Model Selection Methods
Stepwise methods used in logistic regression are the same as those used in ordinary least squares regression; however, the measure is the AIC (Akaike Information Criterion) as opposed to Mallow's $C_k$ statistic. Like Mallow's statistic, AIC balances residual deviance against the number of parameters in the model:

$$AIC = D + 2k\hat{\phi}$$

where D = residual deviance, k = total number of estimated parameters, and $\hat{\phi}$ is an estimate of the dispersion parameter, which is taken to be 1 in models where overdispersion is not present. Overdispersion occurs when the data consist of the number of successes out of $m_i > 1$ trials and the trials are not independent (e.g. the male birth data from your last homework).
Forward, backward, both forward and backward simultaneously, and all possible subsets
regression methods can be employed to find models with small AIC values. By default R
uses both forward and backward selection simultaneously. The command to do this in R
has the basic form:
> step(current model name)
To have it select from models containing all potential two-way interactions use:
> step(current model name, scope=~.^2)
This sometimes will have problems with convergence due to overfitting (i.e. the
estimated probabilities approach 0 and 1 as in the saturated model). If this occurs you
can have R consider adding each of the potential interaction terms and then you can scan
the list and decide which you might want to add to your existing model. You can then
continue adding terms until the AIC criteria suggests additional terms do not improve
current model.
These commands are illustrated for the low birth weight data with first trimester visits
included in the output shown below.
Base Model
> low.glm <- glm(low~age+lwt+race+smoke+ht+ui+ptd+ftv,family=binomial)
> summary(low.glm)
Call:
glm(formula = low ~ age + lwt + race + smoke + ht + ui + ptd +
ftv, family = binomial)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.7038  -0.8068  -0.5009   0.8836   2.2151

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.822706   1.240174   0.663  0.50709
age         -0.037220   0.038530  -0.966  0.33404
lwt         -0.015651   0.007048  -2.221  0.02637 *
race2        1.192231   0.534428   2.231  0.02569 *
race3        0.740513   0.459769   1.611  0.10726
smoke1       0.755374   0.423246   1.785  0.07431 .
ht1          1.912974   0.718586   2.662  0.00776 **
ui1          0.680162   0.463464   1.468  0.14222
ptd1         1.343654   0.479409   2.803  0.00507 **
ftv1        -0.436331   0.477792  -0.913  0.36112
ftv2+        0.178939   0.455227   0.393  0.69426
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 195.48 on 178 degrees of freedom
AIC: 217.48
Number of Fisher Scoring iterations: 3
Find “best” model that includes all potential two-way interactions
> low.step <- step(low.glm,scope=~.^2)
Start: AIC= 217.48
low ~ age + lwt + race + smoke + ht + ui + ptd + ftv

             Df Deviance    AIC
+ age:ftv     2   183.00 209.00
- ftv         2   196.83 214.83
- age         1   196.42 216.42
<none>            195.48 217.48
- ui          1   197.59 217.59
+ smoke:ui    1   193.76 217.76
+ lwt:smoke   1   194.04 218.04
+ ui:ptd      1   194.24 218.24
+ lwt:ui      1   194.28 218.28
+ ptd:ftv     2   192.38 218.38
+ ht:ptd      1   194.55 218.55
+ age:ptd     1   194.58 218.58
+ age:ht      1   194.59 218.59
+ age:smoke   1   194.61 218.61
+ race:ui     2   192.63 218.63
- smoke       1   198.67 218.67
+ smoke:ht    1   195.03 219.03
+ smoke:ptd   1   195.16 219.16
- race        2   201.23 219.23
+ race:smoke  2   193.24 219.24
+ lwt:ptd     1   195.35 219.35
+ lwt:ht      1   195.44 219.44
+ age:lwt     1   195.46 219.46
+ age:ui      1   195.47 219.47
+ ht:ftv      2   194.00 220.00
+ lwt:ftv     2   194.19 220.19
+ smoke:ftv   2   194.47 220.47
+ age:race    2   194.58 220.58
+ lwt:race    2   194.63 220.63
+ race:ptd    2   194.83 220.83
- lwt         1   200.95 220.95
+ race:ht     2   195.19 221.19
+ ui:ftv      2   195.32 221.32
- ht          1   202.93 222.93
- ptd         1   203.58 223.58
+ race:ftv    4   193.81 223.81

Step: AIC= 209
low ~ age + lwt + race + smoke + ht + ui + ptd + ftv + age:ftv

             Df Deviance    AIC
+ smoke:ui    1   179.94 207.94
+ lwt:smoke   1   180.89 208.89
- race        2   186.99 208.99
<none>            183.00 209.00
+ ui:ptd      1   181.42 209.42
+ lwt:ui      1   181.90 209.90
+ ht:ptd      1   182.06 210.06
- smoke       1   186.11 210.11
+ age:smoke   1   182.16 210.16
+ race:ui     2   180.32 210.32
+ age:ptd     1   182.50 210.50
- ui          1   186.61 210.61
+ smoke:ht    1   182.71 210.71
+ lwt:ptd     1   182.75 210.75
+ smoke:ptd   1   182.82 210.82
+ age:ht      1   182.90 210.90
+ age:ui      1   182.96 210.96
+ age:lwt     1   183.00 211.00
+ lwt:ht      1   183.00 211.00
+ race:smoke  2   181.23 211.23
+ lwt:ftv     2   181.44 211.44
+ ptd:ftv     2   181.57 211.57
+ age:race    2   181.62 211.62
+ smoke:ftv   2   181.65 211.65
+ ht:ftv      2   181.82 211.82
+ lwt:race    2   182.55 212.55
+ race:ht     2   182.78 212.78
+ race:ptd    2   182.85 212.85
- lwt         1   188.88 212.88
+ ui:ftv      2   182.94 212.94
- ht          1   190.13 214.13
- ptd         1   191.05 215.05
+ race:ftv    4   181.69 215.69
- age:ftv     2   195.48 217.48

Step: AIC= 207.94
low ~ age + lwt + race + smoke + ht + ui + ptd + ftv + age:ftv +
    smoke:ui

             Df Deviance    AIC
- race        2   183.07 207.07
<none>            179.94 207.94
+ lwt:smoke   1   178.34 208.34
+ ht:ptd      1   178.89 208.89
- smoke:ui    1   183.00 209.00
+ ui:ptd      1   179.07 209.07
+ age:ptd     1   179.35 209.35
+ age:smoke   1   179.37 209.37
+ smoke:ptd   1   179.58 209.58
+ lwt:ptd     1   179.61 209.61
+ lwt:ui      1   179.76 209.76
+ age:ht      1   179.78 209.78
+ smoke:ht    1   179.82 209.82
+ age:lwt     1   179.84 209.84
+ age:ui      1   179.86 209.86
+ lwt:ht      1   179.94 209.94
+ lwt:ftv     2   178.25 210.25
+ ptd:ftv     2   178.53 210.53
+ smoke:ftv   2   178.64 210.64
+ race:smoke  2   178.73 210.73
+ age:race    2   178.84 210.84
+ ht:ftv      2   178.89 210.89
+ race:ui     2   179.13 211.13
+ ui:ftv      2   179.50 211.50
+ race:ht     2   179.52 211.52
+ lwt:race    2   179.68 211.68
+ race:ptd    2   179.86 211.86
- lwt         1   187.15 213.15
- ht          1   187.66 213.66
+ race:ftv    4   178.51 214.51
- ptd         1   188.83 214.83
- age:ftv     2   193.76 217.76

Step: AIC= 207.07
low ~ age + lwt + smoke + ht + ui + ptd + ftv + age:ftv + smoke:ui

             Df Deviance    AIC
<none>            183.07 207.07
+ lwt:smoke   1   181.40 207.40
+ ui:ptd      1   181.88 207.88
+ ht:ptd      1   181.93 207.93
+ race        2   179.94 207.94
+ age:smoke   1   181.97 207.97
+ age:ht      1   182.64 208.64
+ age:ptd     1   182.69 208.69
+ lwt:ptd     1   182.73 208.73
+ lwt:ui      1   182.76 208.76
+ smoke:ptd   1   182.85 208.85
+ age:lwt     1   182.92 208.92
- smoke:ui    1   186.99 208.99
+ age:ui      1   182.99 208.99
+ smoke:ht    1   183.02 209.02
+ lwt:ht      1   183.06 209.06
+ smoke:ftv   2   181.48 209.48
+ lwt:ftv     2   181.69 209.69
+ ptd:ftv     2   181.85 209.85
+ ui:ftv      2   182.28 210.28
+ ht:ftv      2   182.41 210.41
- ht          1   191.21 213.21
- lwt         1   191.56 213.56
- ptd         1   193.59 215.59
- age:ftv     2   199.00 219.00
Summarize the model returned from the stepwise search
> summary(low.step)
Call:
glm(formula = low ~ age + lwt + smoke + ht + ui + ptd + ftv +
age:ftv + smoke:ui, family = binomial)
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.582389   1.420834  -0.410 0.681885
age          0.075538   0.053945   1.400 0.161428
lwt         -0.020372   0.007488  -2.721 0.006513 **
smoke1       0.780047   0.420043   1.857 0.063302 .
ht1          2.065680   0.748330   2.760 0.005773 **
ui1          1.818496   0.666670   2.728 0.006377 **
ptd1         1.560304   0.496626   3.142 0.001679 **
ftv1         2.921068   2.284093   1.279 0.200941
ftv2+        9.244460   2.650495   3.488 0.000487 ***
age:ftv1    -0.161823   0.096736  -1.673 0.094360 .
age:ftv2+   -0.411011   0.118553  -3.467 0.000527 ***
smoke1:ui1  -1.916644   0.972366  -1.971 0.048711 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 183.07 on 177 degrees of freedom
AIC: 207.07
Number of Fisher Scoring iterations: 4
This is the model used to demonstrate model interpretation in the presence of
interactions.
An alternative to the full blown search above is to consider adding a single interaction
term to the “Base Model” from the set of all possible terms.
> add1(low.glm,scope=~.^2)
Single term additions
Model:
low ~ age + lwt + race + smoke + ht + ui + ptd + ftv

           Df Deviance    AIC
<none>          195.48 217.48
age:lwt     1   195.46 219.46
age:race    2   194.58 220.58
age:smoke   1   194.61 218.61
age:ht      1   194.59 218.59
age:ui      1   195.47 219.47
age:ptd     1   194.58 218.58
age:ftv     2   183.00 209.00 *
lwt:race    2   194.63 220.63
lwt:smoke   1   194.04 218.04
lwt:ht      1   195.44 219.44
lwt:ui      1   194.28 218.28
lwt:ptd     1   195.35 219.35
lwt:ftv     2   194.19 220.19
race:smoke  2   193.24 219.24
race:ht     2   195.19 221.19
race:ui     2   192.63 218.63
race:ptd    2   194.83 220.83
race:ftv    4   193.81 223.81
smoke:ht    1   195.03 219.03
smoke:ui    1   193.76 217.76
smoke:ptd   1   195.16 219.16
smoke:ftv   2   194.47 220.47
ht:ui       0   195.48 217.48
ht:ptd      1   194.55 218.55
ht:ftv      2   194.00 220.00
ui:ptd      1   194.24 218.24
ui:ftv      2   195.32 221.32
ptd:ftv     2   192.38 218.38
We can then “manually” add this term to our base model by using the update command in R; the formula .~.+age:ftv keeps the current response and terms and adds the age:ftv interaction.
> low.glm2 <- update(low.glm,.~.+age:ftv)
> summary(low.glm2)
Call:
glm(formula = low ~ age + lwt + race + smoke + ht + ui + ptd +
ftv + age:ftv, family = binomial)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.0338  -0.7690  -0.4510   0.8354   2.3383

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.636485   1.558677  -1.050  0.29376
age          0.085461   0.055734   1.533  0.12519
lwt         -0.017599   0.007653  -2.300  0.02147 *
race2        0.994134   0.550962   1.804  0.07118 .
race3        0.700669   0.491400   1.426  0.15391
smoke1       0.792972   0.452303   1.753  0.07957 .
ht1          1.936204   0.747576   2.590  0.00960 **
ui1          0.938620   0.492240   1.907  0.05654 .
ptd1         1.373390   0.495738   2.770  0.00560 **
ftv1         2.877889   2.253710   1.277  0.20162
ftv2+        8.264965   2.594444   3.186  0.00144 **
age:ftv1    -0.149619   0.096342  -1.553  0.12043
age:ftv2+   -0.359454   0.115429  -3.114  0.00185 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 183.00 on 176 degrees of freedom
AIC: 209
Number of Fisher Scoring iterations: 4
Next we could use add1 to consider the remaining interaction terms for addition to this
model.
> add1(low.glm2,scope=~.^2)
Single term additions
Model:
low ~ age + lwt + race + smoke + ht + ui + ptd + ftv + age:ftv

           Df Deviance    AIC
<none>          183.00 209.00
age:lwt     1   183.00 211.00
age:race    2   181.62 211.62
age:smoke   1   182.16 210.16
age:ht      1   182.90 210.90
age:ui      1   182.96 210.96
age:ptd     1   182.50 210.50
lwt:race    2   182.55 212.55
lwt:smoke   1   180.89 208.89 *
lwt:ht      1   183.00 211.00
lwt:ui      1   181.90 209.90
lwt:ptd     1   182.75 210.75
lwt:ftv     2   181.44 211.44
race:smoke  2   181.23 211.23
race:ht     2   182.78 212.78
race:ui     2   180.32 210.32
race:ptd    2   182.85 212.85
race:ftv    4   181.69 215.69
smoke:ht    1   182.71 210.71
smoke:ui    1   179.94 207.94 **
smoke:ptd   1   182.82 210.82
smoke:ftv   2   181.65 211.65
ht:ui       0   183.00 209.00
ht:ptd      1   182.06 210.06
ht:ftv      2   181.82 211.82
ui:ptd      1   181.42 209.42
ui:ftv      2   182.94 212.94
ptd:ftv     2   181.57 211.57
Motivating Example: Recumbent Cows

Clark, R. G., Henderson, H. V., Hoggard, G. K., Ellison, R. S. and Young, B. J. (1987). “The ability of biochemical and haematological tests to predict recovery in periparturient recumbent cows.” NZ Veterinary Journal, 35, 126-133.

Study Description:
For unknown reasons, many pregnant dairy cows become recumbent (they lie down) either shortly before or after calving. This condition can be serious and may lead to the death of the cow. These data are from a study of blood samples of over 500 cows studied at the Ruakura (N.Z.) Animal Health Laboratory during 1983-84. A variety of blood tests were performed, and for many of the animals the outcome (survived, died, or animal was killed) was determined. The goal is to see if survival can be predicted from the blood measurements. Case numbers 12607 and 11630 were noted as having exceptional care, and they survived.
Name      Type     n    Info
AST       Variate  429  serum aspartate amino transferase (U/l at 30C)
Calving   Variate  431  0 if measured before calving, 1 if after
CK        Variate  413  serum creatine phosphokinase (U/l at 30C)
Daysrec   Variate  432  days recumbent
Inflamat  Variate  136  inflammation: 0=no, 1=yes
Myopathy  Variate  222  muscle disorder: 1 if present, 0 if absent
Outcome   Variate  435  outcome: 1 if survived, 0 if died or killed (response)
PCV       Variate  175  Packed Cell Volume (haematocrit), %
Urea      Variate  266  serum urea (mmol/l)
CaseNo    Text     435  case number
Because calving, inflammation, and myopathy are dichotomous (Bernoulli) predictors they will not be transformed, although we might consider potential interactions involving these predictors. We will not consider inflammation and myopathy, however, as that information is missing for most of the cows.
Guidelines for Transforming Predictors in Logistic Regression

Examine univariate conditional density plots f(xi|y) for the continuous predictors (Cook & Weisberg).

Consider f(x|y), the conditional density of x given the outcome variable y, where y = 1 for a success and y = 0 for a failure.

Idea:
Univariate considerations

f(x|y)                                                     Suggested model terms
Normal, common variance, i.e. Var(x|y=0) = Var(x|y=1)      x
Normal, unequal variances, i.e. Var(x|y=0) ≠ Var(x|y=1)    x and x^2
Skewed right                                               x and log2(x)  (base 2 is easier to interpret)
x ∈ [0,1]                                                  log(x), log(1-x)
x is dichotomous, Bernoulli                                x
x ~ Poisson, i.e. x is a count                             x

Multivariate considerations

When considering multiple continuous predictors simultaneously we look at multivariate normality.

If f(x|y) ~ MVN(μ_{y=k}, Σ), a common covariance matrix, then use the x's themselves.
If f(x|y) ~ MVN(μ_{y=k}, Σ_{y=k}), a covariance matrix that depends on y, then include xi^2 and xi*xj terms.

For example, in the two predictor case (p = 2) the term x1*x2 is needed if E(x1|x2) = β0 + β_{1,y=k} x2, and if the variances of the xi differ across levels of y then we add xi^2 terms as well.
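As a quick numerical check of these guidelines, the within-group means and variances can be compared directly. A minimal sketch using the Downer variables (assuming the data frame is attached and Outcome is the 0/1 response):

> tapply(AST, Outcome, mean, na.rm=TRUE)   # compare E(x|y=0) and E(x|y=1)
> tapply(AST, Outcome, var, na.rm=TRUE)    # compare Var(x|y=0) and Var(x|y=1)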
AST
Clearly AST has a skewed distribution, and using log2(AST) in the model would be recommended.
After transformation, f(log2(AST)|Outcome) appears approximately normal for both outcome levels with constant variance, so quadratic terms on the log scale are not suggested.
CK
Clearly CK is extremely right-skewed and would benefit from a log transformation.

Again the conditional densities appear approximately normal with equal variance, so we will consider adding only log2(CK) to the model.
PCV
f(PCV|Outcome) is approximately normal for both outcome groups, but the variation in PCV levels appears to be higher for cows that survived. Thus we will consider both PCV and PCV^2 terms in the model.
Daysrec
Despite the fact that Daysrec is right-skewed, we will not log-transform it. It represents a count of the number of days the cow was recumbent, so it could be modeled as a Poisson count, and thus the only term recommended is Daysrec itself.
Urea
Consider the log transformation of urea level. f(log2(Urea)|Outcome) is approximately normal; however, the variation for cows that survived appears larger, so we will consider both log2(Urea) and log2(Urea)^2 terms.
Data set = Downer, Name of Fit = B2
372 cases are missing at least one value. (PCV has lots of missing values also)

Binomial Regression
Kernel mean function = Logistic
Response         = Outcome
Terms            = (AST log2[AST] CK log2[CK] Urea log2[Urea] log2[Urea]^2
                    PCV PCV^2 Daysrec Calving)
Trials           = Ones

Coefficient Estimates
Label         Estimate      Std. Error   Est/SE   p-value
Constant      -1.03935      6.35298      -0.164   0.8700
AST           -0.000720027  0.00242524   -0.297   0.7666
log2[AST]     -0.330179     0.554239     -0.596   0.5514
CK            -0.000109772  0.000135315  -0.811   0.4172
log2[CK]      -0.0121434    0.223648     -0.054   0.9567
Urea          -1.13453      1.05860      -1.072   0.2838
log2[Urea]    0.730468      2.89371       0.252   0.8007
log2[Urea]^2  0.660165      1.38757       0.476   0.6342
PCV           0.182480      0.224691      0.812   0.4167
PCV^2         -0.00165620   0.00325722   -0.508   0.6111
Daysrec       -0.391937     0.157490     -2.489   0.0128
Calving       1.28561       0.648089      1.984   0.0473

Scale factor:          1.
Number of cases:       435
Number of cases used:  165
Degrees of freedom:    153
Pearson X2:            127.410
Deviance:              141.988
Clearly we have some model reduction to do, as many of the current terms are not significant. Before backward elimination we will drop all of the untransformed versions of the log-scale predictors.
Coefficient Estimates
Label         Estimate     Std. Error  Est/SE   p-value
Constant      -3.82598     5.84498     -0.655   0.5127
log2[AST]     -0.554005    0.293416    -1.888   0.0590
log2[CK]      -0.118575    0.160536    -0.739   0.4601
log2[Urea]    4.09939      3.12355      1.312   0.1894
log2[Urea]^2  -0.978895    0.545929    -1.793   0.0730
PCV           0.218085     0.213730     1.020   0.3075
PCV^2         -0.00229912  0.00305947  -0.751   0.4524
Daysrec       -0.383179    0.153758    -2.492   0.0127
Calving       1.39322      0.647605     2.151   0.0314

Scale factor:          1.
Number of cases:       435
Number of cases used:  165
Degrees of freedom:    156
Pearson X2:            134.154
Deviance:              145.123

Backward Elimination: Sequentially remove terms that give the smallest change in AIC.
All fits include an intercept.
Current terms: (log2[AST] log2[CK] log2[Urea] log2[Urea]^2 PCV PCV^2 Daysrec Calving)
                       df   Deviance   Pearson X2 |  k        AIC
Delete: log2[CK]      157    145.671      134.797 |  8    161.671
Delete: PCV^2         157    145.786      134.995 |  8    161.786
Delete: PCV           157    146.392      135.415 |  8    162.392
Delete: log2[Urea]    157    148.141      140.787 |  8    164.141
Delete: log2[AST]     157    148.920      140.737 |  8    164.920
Delete: Calving       157    150.163      141.672 |  8    166.163
Delete: Daysrec       157    151.993      135.976 |  8    167.993
Delete: log2[Urea]^2  157    152.536      143.299 |  8    168.536

Current terms: (log2[AST] log2[Urea] log2[Urea]^2 PCV PCV^2 Daysrec Calving)
                       df   Deviance   Pearson X2 |  k        AIC
Delete: PCV^2         158    146.202      135.813 |  7    160.202 *
Delete: PCV           158    146.701      136.211 |  7    160.701
Delete: log2[Urea]    158    149.035      142.035 |  7    163.035
Delete: Calving       158    151.207      140.587 |  7    165.207
Delete: Daysrec       158    152.168      136.078 |  7    166.168
Delete: log2[Urea]^2  158    153.767      145.120 |  7    167.767
Delete: log2[AST]     158    161.383      144.170 |  7    175.383

Current terms: (log2[AST] log2[Urea] log2[Urea]^2 PCV Daysrec Calving)
                       df   Deviance   Pearson X2 |  k        AIC
Delete: PCV           159    148.955      137.789 |  6    160.955
Delete: log2[Urea]    159    150.035      144.626 |  6    162.035
Delete: Calving       159    152.176      141.179 |  6    164.176
Delete: Daysrec       159    152.699      136.298 |  6    164.699
Delete: log2[Urea]^2  159    155.310      149.108 |  6    167.310
Delete: log2[AST]     159    163.059      140.738 |  6    175.059
Current terms: (log2[AST] log2[Urea] log2[Urea]^2 Daysrec Calving)
                       df   Deviance   Pearson X2 |  k        AIC
Delete: log2[Urea]    160    152.373      144.523 |  5    162.373
Delete: Daysrec       160    155.744      138.388 |  5    165.744
Delete: Calving       160    155.990      142.871 |  5    165.990
Delete: log2[Urea]^2  160    157.017      148.417 |  5    167.017
Delete: log2[AST]     160    164.785      143.030 |  5    174.785

Current terms: (log2[AST] log2[Urea]^2 Daysrec Calving)
                       df   Deviance   Pearson X2 |  k        AIC
Delete: Calving       161    160.932      150.399 |  4    168.932
Delete: Daysrec       161    162.036      146.037 |  4    170.036
Delete: log2[AST]     161    169.755      148.817 |  4    177.755
Delete: log2[Urea]^2  161    176.794      157.240 |  4    184.794

Current terms: (log2[AST] log2[Urea]^2 Daysrec)
                       df   Deviance   Pearson X2 |  k        AIC
Delete: Daysrec       162    167.184      150.961 |  3    173.184
Delete: log2[AST]     162    178.021      150.618 |  3    184.021
Delete: log2[Urea]^2  162    181.641      162.028 |  3    187.641

Current terms: (log2[AST] log2[Urea]^2)
                       df   Deviance   Pearson X2 |  k        AIC
Delete: log2[Urea]^2  163    182.688      162.386 |  2    186.688
Delete: log2[AST]     163    192.479      151.943 |  2    196.479
Forward selection suggests the same model.
“Final” Model
Data set = Downer, Name of Fit = B5
372 cases are missing at least one value.

Binomial Regression
Kernel mean function = Logistic
Response         = Outcome
Terms            = (log2[AST] log2[Urea] log2[Urea]^2 PCV Daysrec Calving)
Trials           = Ones

Coefficient Estimates
Label         Estimate    Std. Error  Est/SE   p-value
Constant      -1.12404    5.01853     -0.224   0.8228
log2[AST]     -0.733670   0.196044    -3.742   0.0002
log2[Urea]    4.44950     3.17044      1.403   0.1605
log2[Urea]^2  -1.05918    0.554282    -1.911   0.0560
PCV           0.0514512   0.0335256    1.535   0.1249
Daysrec       -0.386695   0.153067    -2.526   0.0115
Calving       1.44641     0.623820     2.319   0.0204

Scale factor:          1.
Number of cases:       435
Number of cases used:  170
Degrees of freedom:    163
Pearson X2:            138.509
Deviance:              148.269
Diagnostics and Model Checking Plots

Chi-residuals vs. estimated logits ~ Looks good.

Cook's Distance and Leverage vs. Case Numbers

Model Checking Plots (Estimated Logits and Marginals)

LOGIT

AST

UREA

PCV

DAYSREC

All of these plots look OK. The largest departure observed is for urea, but the discrepancy there is primarily due to one observation that stands out from the rest.
In R

To replicate the analysis above in R you will need the following functions to look at the conditional densities f(x|y = 0) and f(x|y = 1). The first two functions are used to make pretty histograms in the conplot function. The function conplot plots kernel density estimates of a predictor X conditional on the value of the outcome variable Y, taking X and Y as arguments. If there are missing values on either the response or the predictor, those cases are automatically removed before constructing the plot.
nclass.FD = function (x)
{
    # Freedman-Diaconis choice for the number of histogram classes
    r <- quantile(x, c(0.25, 0.75))
    names(r) <- NULL
    h <- 2 * (r[2] - r[1]) * length(x)^(-1/3)
    ceiling(diff(range(x))/h)
}

bandwidth.nrd = function (x)
{
    # normal reference rule bandwidth for the kernel density estimates
    r <- quantile(x, c(0.25, 0.75))
    h <- (r[2] - r[1])/1.34
    4 * 1.06 * min(sqrt(var(x)), h) * length(x)^(-1/5)
}

conplot = function (x, xname = deparse(substitute(x)), y)
{
    xname <- deparse(substitute(x))
    # drop cases missing either the predictor or the response
    data = na.omit(cbind(x, y))
    x = data[, 1]
    y = as.numeric(data[, 2])
    # density estimates of x within each outcome group
    dens0 <- density(x[y == 0], width = bandwidth.nrd(x[y == 0]))
    dens1 <- density(x[y == 1], width = bandwidth.nrd(x[y == 1]))
    ylim <- range(c(dens0$y, dens1$y))
    xlim <- range(c(dens0$x, dens1$x))
    hist(x, nclass.FD(x), prob = T, xlab = xname, xlim = xlim,
         ylim = ylim, main = paste("Conditional X|Y Plot of ", xname))
    lines(dens0, col = "blue")   # f(x|y=0)
    lines(dens1, col = "red")    # f(x|y=1)
    invisible()
}
> conplot(x=AST,y=Outcome)
> conplot(x=log2(AST),y=Outcome)
Etc…
To obtain model checking plots in R you will need to install the package car from CRAN, which is essentially a collection of functions that replicate Arc in R. The two functions that create model checking plots in the car library are mmp and mmps; the latter creates model checking plots for each predictor as well as for the overall fit.
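A minimal setup sketch (the install is a one-time step):

> install.packages("car")   # one-time install from CRAN
> library(car)              # load the package; mmp() and mmps() are now available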
Downer Example in R
> mod1 = glm(Outcome~AST+Urea+PCV+Calving+Daysrec+CK,family="binomial")
> summary(mod1)
Call:
glm(formula = Outcome ~ AST + Urea + PCV + Calving + Daysrec +
CK, family = "binomial")
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.7678  -0.7541  -0.1928   0.7546   2.0696

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.3313771  1.1987644   0.276  0.78222
AST         -0.0022405  0.0014726  -1.521  0.12815
Urea        -0.3140380  0.0770497  -4.076 4.59e-05 ***
PCV          0.0601745  0.0339726   1.771  0.07652 .
Calving      1.3192777  0.6238318   2.115  0.03445 *
Daysrec     -0.4804961  0.1498000  -3.208  0.00134 **
CK          -0.0001435  0.0001121  -1.280  0.20068
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 212.71 on 164 degrees of freedom
Residual deviance: 146.39 on 158 degrees of freedom
(270 observations deleted due to missingness)
AIC: 160.39
Number of Fisher Scoring iterations: 7
> mmps(mod1)
Using the same approach as in the Arc analysis, we might fit a model with several terms based on predictor transformations.

> logAST = log2(AST)
> logCK = log2(CK)
> logUrea = log2(Urea)
> logUrea2 = logUrea^2
> PCV2 = PCV^2
> Downer2 = data.frame(Outcome,logAST,logCK,logUrea,logUrea2,PCV,PCV2,Daysrec,Calving)
> Downer2 = na.omit(Downer2)
> attach(Downer2)
> mod2 = glm(Outcome~logAST+logCK+logUrea+logUrea2+PCV+PCV2+
Daysrec+Calving,family="binomial",data=Downer2)
> summary(mod2)
Call:
glm(formula = Outcome ~ logAST + logCK + logUrea + logUrea2 +
PCV + PCV2 + Daysrec + Calving, family = "binomial", data =
Downer2)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.9522  -0.7094  -0.2869   0.7109   2.0585

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.826289   5.856671  -0.653   0.5135
logAST      -0.554007   0.293529  -1.887   0.0591 .
logCK       -0.118574   0.160621  -0.738   0.4604
logUrea      4.099642   3.137295   1.307   0.1913
logUrea2    -0.978940   0.548487  -1.785   0.0743 .
PCV          0.218083   0.213771   1.020   0.3076
PCV2        -0.002299   0.003060  -0.751   0.4525
Daysrec     -0.383178   0.153795  -2.491   0.0127 *
Calving      1.393222   0.647759   2.151   0.0315 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 212.71 on 164 degrees of freedom
Residual deviance: 145.12 on 156 degrees of freedom
AIC: 163.12
Number of Fisher Scoring iterations: 7
Backward elimination using the step() function
> mod3 = step(mod2)
Start: AIC=163.12
Outcome ~ logAST + logCK + logUrea + logUrea2 + PCV + PCV2 +
    Daysrec + Calving

           Df Deviance    AIC
- logCK     1   145.67 161.67
- PCV2      1   145.79 161.79
- PCV       1   146.39 162.39
<none>          145.12 163.12
- logUrea   1   148.14 164.14
- logAST    1   148.92 164.92
- Calving   1   150.16 166.16
- Daysrec   1   151.99 167.99
- logUrea2  1   152.54 168.54

Step: AIC=161.67
Outcome ~ logAST + logUrea + logUrea2 + PCV + PCV2 + Daysrec +
    Calving

           Df Deviance    AIC
- PCV2      1   146.20 160.20
- PCV       1   146.70 160.70
<none>          145.67 161.67
- logUrea   1   149.03 163.03
- Calving   1   151.21 165.21
- Daysrec   1   152.17 166.17
- logUrea2  1   153.77 167.77
- logAST    1   161.38 175.38

Step: AIC=160.2
Outcome ~ logAST + logUrea + logUrea2 + PCV + Daysrec + Calving

           Df Deviance    AIC
<none>          146.20 160.20
- PCV       1   148.96 160.96
- logUrea   1   150.03 162.03
- Calving   1   152.18 164.18
- Daysrec   1   152.70 164.70
- logUrea2  1   155.31 167.31
- logAST    1   163.06 175.06
> summary(mod3)
Call:
glm(formula = Outcome ~ logAST + logUrea + logUrea2 + PCV + Daysrec +
Calving, family = "binomial", data = Downer2)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.0329  -0.6836  -0.2644   0.7002   2.0893

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.49051    5.04944  -0.295 0.767853
logAST      -0.72986    0.19514  -3.740 0.000184 ***
logUrea      4.61037    3.19802   1.442 0.149406
logUrea2    -1.08728    0.55899  -1.945 0.051768 .
PCV          0.05489    0.03370   1.629 0.103394
Daysrec     -0.37191    0.15422  -2.411 0.015888 *
Calving      1.45572    0.62845   2.316 0.020538 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 212.71 on 164 degrees of freedom
Residual deviance: 146.20 on 158 degrees of freedom
AIC: 160.2

Number of Fisher Scoring iterations: 7
> mmps(mod3)
We could consider adding interaction terms to our “final” model. This is easily done
using the scope option.
> mod3 = step(mod2,scope=~.^2)
> summary(mod3)
Call:
glm(formula = Outcome ~ logAST + logUrea + logUrea2 + PCV + PCV2 +
    Daysrec + Calving + PCV:Calving + logAST:PCV2 + logAST:PCV,
    family = "binomial", data = Downer2)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.8712  -0.6639  -0.1012   0.6784   2.5298

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) 44.950653  39.720672   1.132   0.2578
logAST      -9.472551   5.720701  -1.656   0.0978 .
logUrea      6.107698   3.425871   1.783   0.0746 .
logUrea2    -1.315449   0.602659  -2.183   0.0291 *
PCV         -3.814468   2.386123  -1.599   0.1099
PCV2         0.069737   0.036368   1.918   0.0552 .
Daysrec     -0.388998   0.162716  -2.391   0.0168 *
Calving     12.376954   6.037254   2.050   0.0404 *
PCV:Calving -0.322466   0.168853  -1.910   0.0562 .
logAST:PCV2 -0.010067   0.005179  -1.944   0.0519 .
logAST:PCV   0.608959   0.344917   1.766   0.0775 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 212.71 on 164 degrees of freedom
Residual deviance: 134.15 on 154 degrees of freedom
AIC: 156.15

Number of Fisher Scoring iterations: 7
> mmps(mod3)
There is a slight improvement in fit. The same model fit in JMP produces the following ROC curve; the resulting classification using this model is very good.
MORE EXAMPLES OF LOGISTIC REGRESSION
Example 8.1 - Classification of Credit Card Defaults
In this example, we seek to develop classification models to predict which customers
will default on their credit card debt. The data frame is called Default and is in the
ISLR library.
The variables in the data frame are summarized below:
> summary(Default)
 default    student       balance           income
 No :9667   No :7056   Min.   :   0.0   Min.   :  772
 Yes: 333   Yes:2944   1st Qu.: 481.7   1st Qu.:21340
                       Median : 823.6   Median :34553
                       Mean   : 835.4   Mean   :33517
                       3rd Qu.:1166.3   3rd Qu.:43808
                       Max.   :2654.3   Max.   :73554
The response is default = 1 if default = Yes and 0 if default = No, and the predictors are the customer's student status (Yes or No), the average balance on the credit card ($) after making the monthly payment, and the customer's annual income ($).
> def.glm1 = glm(default~student,data=Default,family="binomial")
> summary(def.glm1)
Call:
glm(formula = default ~ student, family = "binomial", data = Default)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.2970  -0.2970  -0.2434  -0.2434   2.6585

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.50413    0.07071  -49.55  < 2e-16 ***
studentYes   0.40489    0.11502    3.52 0.000431 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2920.6 on 9999 degrees of freedom
Residual deviance: 2908.7 on 9998 degrees of freedom
AIC: 2912.7

Number of Fisher Scoring iterations: 6
> logits = predict(def.glm1,type="link")
> Pdefault = 1/(1+exp(-logits))
> table(Pdefault)
Pdefault
0.0291950113382457 0.0431385869565177
              7056               2944
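As an aside, predict() with type="response" returns these fitted probabilities in one step; a quick check:

> Pdefault2 = predict(def.glm1,type="response")
> all.equal(as.numeric(Pdefault),as.numeric(Pdefault2))   # TRUE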
> table(student,default)
       default
student   No  Yes
    No  6850  206
    Yes 2817  127
> mosaicplot(~student+default,color=3:5,main="Mosaic Plot of Defaults vs. Student Status")
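The studentYes coefficient agrees with the odds ratio computed directly from this table; a quick sketch:

> (127/2817)/(206/6850)               # observed odds of default, students vs. non-students: about 1.50
> exp(coef(def.glm1)["studentYes"])   # the model's odds ratio, exp(0.40489): about 1.50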
> def.glm2 = glm(default~.,data=Default,family="binomial")
> summary(def.glm2)
Call:
glm(formula = default ~ ., family = "binomial", data = Default)
Deviance Residuals:
   Min      1Q  Median      3Q     Max
-2.469  -0.142  -0.056  -0.020   3.738

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.09e+01   4.92e-01  -22.08   <2e-16 ***
studentYes  -6.47e-01   2.36e-01   -2.74   0.0062 **
balance      5.74e-03   2.32e-04   24.74   <2e-16 ***
income       3.03e-06   8.20e-06    0.37   0.7115
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2920.6 on 9999 degrees of freedom
Residual deviance: 1571.5 on 9996 degrees of freedom
AIC: 1580

Number of Fisher Scoring iterations: 8
> PrDefault = function(x,student){
    # estimated P(default) as a function of balance (x), with income
    # fixed at its mean ($33,517)
    L = -10.9 - .647*student + .00574*x + .00000303*33517
    1/(1+exp(-L))
  }
> plot(balance,PrDefault(balance,student=1),col=5,ylab="P(Default|X)",pch=19)
> points(balance,PrDefault(balance,student=0),col=3,pch=20)
> legend(250,.8,c("Student","Non-student"),col=c(3,5),pch=c(19,20))
> par(mfrow=c(1,2))
> boxplot(split(balance,student),col=c(3:5))
> boxplot(split(income,student),col=c(3:5))
> def.glm3 = glm(default~.^2,data=Default,family="binomial")
> summary(def.glm3)
Call:
glm(formula = default ~ .^2, family = "binomial", data = Default)
Deviance Residuals:
   Min      1Q  Median      3Q     Max
-2.485  -0.142  -0.055  -0.020   3.758

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)        -1.10e+01   1.87e+00   -5.91  3.3e-09 ***
studentYes         -5.20e-01   1.34e+00   -0.39     0.70
balance             5.88e-03   1.18e-03    4.98  6.3e-07 ***
income              4.05e-06   4.46e-05    0.09     0.93
studentYes:balance -2.55e-04   7.90e-04   -0.32     0.75
studentYes:income   1.45e-05   2.78e-05    0.52     0.60
balance:income     -1.58e-09   2.82e-08   -0.06     0.96
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2920.6 on 9999 degrees of freedom
Residual deviance: 1571.1 on 9993 degrees of freedom
AIC: 1585

Number of Fisher Scoring iterations: 8
> def.step = step(def.glm3)
Start: AIC=1585
default ~ (student + balance + income)^2
                  Df Deviance  AIC
- balance:income   1     1571 1583
- student:balance  1     1571 1583
- student:income   1     1571 1583
<none>                   1571 1585

Step: AIC=1583
default ~ student + balance + income + student:balance + student:income

                  Df Deviance  AIC
- student:balance  1     1571 1581
- student:income   1     1571 1581
<none>                   1571 1583

Step: AIC=1581
default ~ student + balance + income + student:income

                 Df Deviance  AIC
- student:income  1     1572 1580
<none>                  1571 1581
- balance         1     2907 2915

Step: AIC=1580
default ~ student + balance + income

          Df Deviance  AIC
- income   1     1572 1578
<none>           1572 1580
- student  1     1579 1585
- balance  1     2907 2913

Step: AIC=1578
default ~ student + balance

          Df Deviance  AIC
<none>           1572 1578
- student  1     1596 1600
- balance  1     2909 2913
> summary(def.step)
Call:
glm(formula = default ~ student + balance, family = "binomial",
data = Default)
Deviance Residuals:
   Min      1Q  Median      3Q     Max
-2.458  -0.142  -0.056  -0.020   3.743

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.07e+01   3.69e-01  -29.12  < 2e-16 ***
studentYes  -7.15e-01   1.48e-01   -4.85  1.3e-06 ***
balance      5.74e-03   2.32e-04   24.75  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2920.6 on 9999 degrees of freedom
Residual deviance: 1571.7 on 9997 degrees of freedom
AIC: 1578

Number of Fisher Scoring iterations: 8
The package ROCR contains functions to examine the classification performance of a model where predicted probabilities of class membership are returned by the modeling method. Given the estimated probabilities, we first run the function prediction to compare the predicted probabilities to the actual class memberships for the training data. We can then compute various performance measures and plot them using the performance function. The ROC curve is obtained by running performance with the True Positive Rate (tpr) and False Positive Rate (fpr) as arguments; plotting the result gives the ROC curve. The area under the curve requires another call to performance with "auc" as the performance measure. This process is demonstrated below for our simple model to classify credit card defaulters.
> library(ROCR)
> PrDefault = fitted(def.step)
> pred = prediction(PrDefault,default)
> perf = performance(pred,"tpr","fpr")
> plot(perf,main="ROC Curve for Credit Card Default")
> performance(pred,"auc")

AUC = .9495
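The printed performance object is verbose; the AUC value itself is stored in the object's y.values slot and can be pulled out directly, as in this sketch:

> auc = performance(pred,"auc")@y.values[[1]]
> auc   # area under the ROC curve, about .9495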
Example 8.2 - Classification of Real vs. Forged Swiss Francs
> names(Swiss)
[1] "id"     "leng"   "left"   "right"  "bottom" "top"    "diagon" "genu"
> Swiss = Swiss[,-1]
> attach(Swiss)
> pairs(Swiss[,-7],panel=function(x,y){
    points(x[genu==1],y[genu==1],pch="+",col="blue")
    points(x[genu==0],y[genu==0],pch="o",col="red")
  })
> pairs.image(Swiss[,-7],cont=T)
> pairs.persp(Swiss[,-7])
Two predictor model with only linear terms
> rb.glm = glm(genu~right+bottom,data=Swiss,family="binomial")
> right.seq = seq(min(right),max(right),length=100)
> bottom.seq = seq(min(bottom),max(bottom),length=100)
> rb.grid = expand.grid(right=right.seq,bottom=bottom.seq)
> PrGenu = predict(rb.glm,newdata=rb.grid,"response")
> plot(right,bottom,xlab="Right (mm)",ylab="Bottom (mm)",type="n")
> points(right,bottom,col=as.numeric(genu)+3,pch=17+as.numeric(genu))
> z = matrix(PrGenu,100,100)
> contour(right.seq,bottom.seq,z,add=T,levels=.5,lty=1,lwd=2)
Two predictor model with non-linear terms
> rb.glm = glm(genu~poly(right,2)+poly(bottom,2)+right:bottom,data=Swiss,
    family="binomial")
> summary(rb.glm)
Call:
glm(formula = genu ~ poly(right, 2) + poly(bottom, 2) + right:bottom,
family = "binomial", data = Swiss)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1890  -0.0486   0.0024   0.1998   2.7819

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)      -1855.43    1370.41   -1.35  0.17576
poly(right, 2)1   -106.44      59.52   -1.79  0.07373 .
poly(right, 2)2      9.81       5.06    1.94  0.05230 .
poly(bottom, 2)1 -4110.61    2973.80   -1.38  0.16689
poly(bottom, 2)2   -43.92      13.15   -3.34  0.00084 ***
right:bottom         1.51       1.12    1.35  0.17637
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 277.259 on 199 degrees of freedom
Residual deviance:  76.157 on 194 degrees of freedom
AIC: 88.16

Number of Fisher Scoring iterations: 8
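Note that poly() uses orthogonal polynomials by default, so the coefficients above are not on the raw right and right^2 scale; refitting with poly(right,2,raw=TRUE) would give coefficients for the raw powers while producing the same fitted probabilities.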
> PrGenu = fitted(rb.glm,type="response")
> table(PrGenu>.5,genu)
       genu
         0  1
  FALSE 90  8
  TRUE  10 92
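This rule misclassifies 8 + 10 = 18 of the 200 bills, an apparent (training) error rate of 18/200 = 9%.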
> PrGenu = predict(rb.glm,newdata=rb.grid,"response")
> plot(right,bottom,xlab="Right (mm)",ylab="Bottom (mm)",type="n")
> points(right,bottom,col=as.numeric(genu)+3,pch=17+as.numeric(genu))
> z = matrix(PrGenu,100,100)
> contour(right.seq,bottom.seq,z,add=T)   # adds contours of P(genu=1)
> plot(right,bottom,xlab="Right (mm)",ylab="Bottom (mm)",type="n")
> points(right,bottom,col=as.numeric(genu)+3,pch=17+as.numeric(genu))

Add the decision boundary for the rule P(genu=1) > .50 → classify as a real Swiss franc:

> contour(right.seq,bottom.seq,z,add=T,levels=0.5,lty=1,lwd=2)
Building a logistic model using all available bill dimensions
> swiss.glm = glm(genu~.,data=Swiss,family="binomial")
Warning messages:
1: glm.fit: algorithm did not converge
2: glm.fit: fitted probabilities numerically 0 or 1 occurred
Logistic regression will become unstable if the estimated probabilities are near 0 and/or 1. For these data, this is precisely what happens. Despite this instability, the model produces a nearly perfect classification of the Swiss francs in the training data, with an overall misclassification rate of 3/200 = .015, or 1.5%.
> PrGenu = fitted(swiss.glm)
> table(PrGenu>.5,genu)
       genu
         0  1
  FALSE 99  2
  TRUE   1 98
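One way to see the source of the warnings (assuming swiss.glm is the full fit above): the fitted probabilities pile up at 0 and 1, a symptom of (near) separation.

> range(fitted(swiss.glm))   # essentially 0 to 1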
In cases where this instability occurs, both ridge and Lasso logistic regression are good options. They are also good options when you have a "wide data" problem where n < p, when p is large, or when you have some highly correlated predictors. The regularized ridge and Lasso logistic models are given below.
Both fit the usual logit model

ln( θ(x) / (1 − θ(x)) ) = η0 + Σ_{j=1}^k ηj uj

but estimate the coefficients by maximizing the binomial log-likelihood minus a penalty on the size of the coefficients:

Ridge Logistic Model: penalty λ Σ_{j=1}^k ηj^2

Lasso Logistic Model: penalty λ Σ_{j=1}^k |ηj|

Larger values of λ shrink the coefficients more strongly toward 0; the Lasso penalty can shrink some coefficients exactly to 0.
We now consider fitting both a ridge and Lasso logistic regression to the Swiss Franc
data.
X = model.matrix(genu~.,data=Swiss[,-1])[,-1]
y = Swiss$genu
forg.ridge = glmnet(X,y,alpha=0,family="binomial")
forg.lasso = glmnet(X,y,alpha=1,family="binomial")
ridge.cv = cv.glmnet(X,y,alpha=0,family="binomial")
lasso.cv = cv.glmnet(X,y,alpha=1,family="binomial")
plot(ridge.cv)
ridge.lam = ridge.cv$lambda.min
ridge.lam
[1] 0.04476108
plot(lasso.cv)
lasso.lam = lasso.cv$lambda.min
lasso.lam
[1] 0.001849533
ypred.ridge = predict(forg.ridge,newx=X,s=ridge.lam,type="response")
ypred.lasso = predict(forg.lasso,newx=X,s=lasso.lam,type="response")
table(ypred.ridge>.5,y)
        y
           0    1
  FALSE  100    1
  TRUE     0   99

table(ypred.lasso>.5,y)
        y
           0    1
  FALSE  100    0
  TRUE     0  100
Cross-validation of a Classification from GLM Models
(non-regularized or regularized, i.e. ridge and Lasso)
log.cv = function (fit, B=50, p=.67, pcut=0.5) {
    # Monte Carlo cross-validation of a logistic regression classifier:
    # B times, refit on a random fraction p of the cases and compute the
    # misclassification rate on the held-out cases using cutoff pcut.
    cv <- rep(0, B)
    data = fit$data      # requires the fit to have been called with data=
    y = fit$y
    n = dim(data)[1]
    k = floor(n*p)       # training set size
    for (i in 1:B) {
        sam <- sample(1:n, k, replace=F)
        fit2 <- glm(formula(fit), data = data[sam,], family = "binomial")
        phat <- predict(fit2, newdata = data[-sam,], type="response")
        predclass <- phat > pcut
        tab <- table(predclass, y[-sam])
        mc <- (n-k) - sum(diag(tab))   # number misclassified in the holdout
        cv[i] <- mc/(n-k)
    }
    cv
}
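A sketch of its use with the credit card default model from Example 8.1 (the glm fit must be created with a data= argument so that fit$data exists):

> def.glm = glm(default~student+balance,data=Default,family="binomial")
> cv.err = log.cv(def.glm,B=100)
> mean(cv.err)   # estimated out-of-sample misclassification rate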
It should not be hard to modify this code to handle the ridge and Lasso glmnet() models as well.
glmnetlog.cv = function (X, y, s=.10, alpha=0, B=50, p=.67, pcut=0.5)
{
    # Same Monte Carlo CV scheme for a glmnet logistic model:
    # alpha=0 gives ridge, alpha=1 gives the Lasso, and s is the
    # value of lambda at which predictions are made.
    cv <- rep(0, B)
    n = length(y)
    k = floor(n*p)
    for (i in 1:B) {
        sam <- sample(1:n, k, replace=F)
        fit2 <- glmnet(X[sam,], y[sam], alpha=alpha, family = "binomial")
        phat <- predict(fit2, newx = X[-sam,], type="response", s=s)
        predclass <- phat > pcut
        tab <- table(predclass, y[-sam])
        mc <- (n-k) - sum(diag(tab))
        cv[i] <- mc/(n-k)
    }
    cv
}
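For example, a sketch applying it to the Swiss franc fits above, using the cross-validated λ values:

> ridge.err = glmnetlog.cv(X,y,s=ridge.lam,alpha=0,B=100)
> lasso.err = glmnetlog.cv(X,y,s=lasso.lam,alpha=1,B=100)
> c(mean(ridge.err),mean(lasso.err))   # holdout misclassification rates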
Recall,
alpha = 0 → RIDGE LOGISTIC REGRESSION
alpha = 1 → LASSO LOGISTIC REGRESSION
and s is the value of λ you found to be optimal from running the cv.glmnet() function.