Section 15

Handout 15 – Introduction to Logistic Regression
This handout covers material found in Section 13.8 of your text.
You may also want to review regression techniques in Chapter 11.
These data are taken from the text “Applied Logistic Regression” by Hosmer and
Lemeshow. Researchers are interested in the relationship between age and presence or
absence of evidence of coronary heart disease (CHD).
The smooth is an estimate of:
E(CHD|Age) = P(CHD=1|Age) Why?
Expectation of a Bernoulli Random Variable
Let θ(Age_i) denote the probability of having CHD for a given Age_i.
Note that CHD_i | Age_i is a Bernoulli random variable with the following probability
distribution:

CHD_i | Age_i         0                1
P(CHD_i | Age_i)      1 - θ(Age_i)     θ(Age_i)

We can find the expected value of the Bernoulli random variable as follows:
How do we develop a parametric model for a dichotomous response like CHD using Age
of the person as the covariate? We might try a linear regression model with Age as a
predictor and CHD as the response. Before we do this in SAS, consider our linear
regression model:

$$CHD_i \mid Age_i = \eta_0 + \eta_1 Age_i + e_i,$$

where

$$CHD_i \mid Age_i = \begin{cases} 0 & \text{in the absence of CHD,} \\ 1 & \text{in the presence of CHD.} \end{cases}$$

Note that the mean function is given by $E(CHD_i \mid Age_i) = \eta_0 + \eta_1 Age_i$. As we saw above,
we can find the expected value of the Bernoulli random variable as follows:

$$E(CHD_i \mid Age_i) = 0 \times [1 - \theta(Age_i)] + 1 \times \theta(Age_i) = \theta(Age_i).$$

Why is this important? This shows that

$$E(CHD_i \mid Age_i) = \eta_0 + \eta_1 Age_i = \theta(Age_i) = P(CHD_i = 1 \mid Age_i).$$
That is, the regression line gives an estimate of the probability of having CHD for a given
Age.
In SAS… (using file CHD.sas)
proc reg;
model CHD = age;
plot CHD*age; run;
Some Problems that Arise Using this Model
1. Non-normality of the error terms. Only two different error terms are
possible for each Agei: - θ(Agei) if the response is 0, and 1 - θ(Agei) if the response
is 1.
2. Non-constant variance of the error terms. Since CHD_i | Age_i is a Bernoulli
random variable, we know that Var(CHD_i | Age_i) = θ(Age_i) × [1 - θ(Age_i)]. This
then implies that

$$\mathrm{Var}(CHD_i \mid Age_i) = [\eta_0 + \eta_1 Age_i] \times [1 - (\eta_0 + \eta_1 Age_i)].$$

That is, the variance function varies with Age and is NOT constant.
3. Constraints on the response function. A linear representation permits
estimates or predictions outside the range 0 to 1, which is not correct when
modeling probabilities. For example, what is our estimate of θ(Age=20) if we use
a linear regression model?
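The first two problems can be checked numerically. The following is a minimal Python sketch (Python is not used elsewhere in this handout, and the θ values are chosen only for illustration): for a 0/1 response with P(Y = 1) = θ, it enumerates the two possible outcomes and shows that the only possible errors are -θ and 1 - θ, and that the variance θ(1 - θ) changes with θ.

```python
def bernoulli_summary(theta):
    """Return the mean, variance, and the two possible errors of a 0/1 response."""
    probs = {0: 1.0 - theta, 1: theta}             # P(Y = 0) and P(Y = 1)
    mean = sum(y * p for y, p in probs.items())     # E(Y) = theta
    var = sum((y - mean) ** 2 * p for y, p in probs.items())   # theta * (1 - theta)
    errors = (0 - mean, 1 - mean)                   # the only two possible error values
    return mean, var, errors

# Illustrative theta values (assumptions, not estimates from the CHD data)
for theta in (0.1, 0.3, 0.5):
    mean, var, errors = bernoulli_summary(theta)
    print(f"theta={theta}: mean={mean:.2f}, variance={var:.2f}, errors={errors}")
```

Because θ(Age_i) changes with age under any useful model, the variance changes with age as well, which is the non-constant variance problem in item 2.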
Comment: The constraint that the mean function fall between 0 and 1 frequently rules
out a linear response function. For our CHD example, the use of a linear response
function might require us to assume a probability of 0 for the mean response for all
individuals beneath a certain age and a probability of 1 for all individuals over a certain
age (see below). Such a model is often considered unreasonable, however.
Ideally, we’d like to find a model where the probabilities 0 and 1 are reached
asymptotically. One such model is the logistic regression model.
Recall:
The Simple Logistic Mean Function
We parameterize this model as follows:

$$E(y_i \mid x_i) = \theta(x_i) = \frac{\exp(\eta_0 + \eta_1 x_i)}{1 + \exp(\eta_0 + \eta_1 x_i)}.$$
Some examples of simple logistic mean functions are shown below:
With η0 = 0
With η1 = -1
Comments:
1. The logistic mean function is always between 0 and 1.
2. As η1 increases, the function becomes more S-shaped; therefore, the function
changes more rapidly in the center.
3. When η1 is positive, the function is monotone increasing; when η1 is negative, the
function is monotone decreasing.
4. Changing η0 shifts the function horizontally.
5. The logistic function possesses the property of symmetry. If the response
variable is recoded by changing 1s to 0s and 0s to 1s, the signs of all coefficients
will be reversed.
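The comments above can be verified with a few lines of code. This is a minimal Python sketch; the function name `logistic_mean` and the particular η values are illustrative assumptions, not part of the handout.

```python
import math

def logistic_mean(x, eta0, eta1):
    """Simple logistic mean function theta(x) = exp(z) / (1 + exp(z)), z = eta0 + eta1 * x."""
    z = eta0 + eta1 * x
    return 1.0 / (1.0 + math.exp(-z))   # algebraically equal to exp(z) / (1 + exp(z))

xs = [-6, -3, 0, 3, 6]

# Comments 1 and 3: values stay strictly between 0 and 1, and the sign of eta1 sets the direction.
increasing = [logistic_mean(x, 0.0, 1.0) for x in xs]
decreasing = [logistic_mean(x, 0.0, -1.0) for x in xs]

# Comment 5: recoding the response (1 <-> 0) gives 1 - theta(x), which is the
# logistic mean function with the signs of both coefficients reversed.
recoded = [1.0 - logistic_mean(x, 0.5, 1.0) for x in xs]
sign_flipped = [logistic_mean(x, -0.5, -1.0) for x in xs]

print(increasing)
print(decreasing)
print(recoded)
print(sign_flipped)
```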
To fit the logistic regression model in SAS, you can use the following programming
statements:
ods html;
ods graphics on;
proc logistic descending;
model CHD = age / link=logit;
graphics estprob; run;
ods graphics off;
ods html close;
This curve is a plot of:

$$\hat{\theta}(Age_i) = \hat{P}(CHD \mid Age_i) = \frac{\exp(\hat{\eta}_0 + \hat{\eta}_1 Age_i)}{1 + \exp(\hat{\eta}_0 + \hat{\eta}_1 Age_i)}$$
Questions:
1. Based on the plot, find $\hat{\theta}(40) = \hat{P}(CHD \mid Age = 40)$.
2. Based on the plot, find $\hat{\theta}(60) = \hat{P}(CHD \mid Age = 60)$.
Interpreting the Model Parameters
Mean function:

$$E(CHD \mid Age_i) = \theta(Age_i) = \frac{\exp(\eta_0 + \eta_1 Age_i)}{1 + \exp(\eta_0 + \eta_1 Age_i)}.$$

Fitted Model Equation (or Fitted Probabilities):

$$\hat{E}(CHD \mid Age_i) = \hat{\theta}(Age_i) = \frac{\exp(\hat{\eta}_0 + \hat{\eta}_1 Age_i)}{1 + \exp(\hat{\eta}_0 + \hat{\eta}_1 Age_i)}$$

Note that in the mean function, the probabilities θ(Age_i) are nonlinear functions of η0
and η1. However, a simple transformation results in a linear model. That is, we can
show the following:

$$\ln\left(\frac{\theta(Age_i)}{1 - \theta(Age_i)}\right) = \eta_0 + \eta_1 Age_i$$
Proof of the previous claim:
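One way to fill in the proof, using only the definition of the logistic mean function given above, is sketched here. Write $z_i = \eta_0 + \eta_1 Age_i$ as shorthand (the symbol $z_i$ is introduced only for this derivation).

```latex
\begin{align*}
\theta(Age_i) &= \frac{e^{z_i}}{1 + e^{z_i}},
\qquad
1 - \theta(Age_i) = \frac{1 + e^{z_i} - e^{z_i}}{1 + e^{z_i}} = \frac{1}{1 + e^{z_i}} \\[4pt]
\frac{\theta(Age_i)}{1 - \theta(Age_i)}
  &= \frac{e^{z_i}}{1 + e^{z_i}} \cdot \frac{1 + e^{z_i}}{1} = e^{z_i} \\[4pt]
\ln\left(\frac{\theta(Age_i)}{1 - \theta(Age_i)}\right)
  &= z_i = \eta_0 + \eta_1 Age_i
\end{align*}
```

The middle line is the odds of CHD, so the model is linear in the log odds (the logit).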
Fitting the Model in JMP
Select Analyze > Fit Y by X and place CHD (y/n) in the Y box and age in the X box.
The resulting output is shown below. Because the response is a dichotomous categorical
variable, logistic regression is performed.
The curve is a plot of:

$$\hat{P}(CHD \mid Age) = \frac{\exp(\hat{\eta}_0 + \hat{\eta}_1 Age)}{1 + \exp(\hat{\eta}_0 + \hat{\eta}_1 Age)}$$

Example:
$\hat{P}(CHD \mid Age = 40) =$
$\hat{P}(CHD \mid Age = 60) =$
Interpretation of Model Parameters

$$P(CHD = 1 \mid Age) = \theta(x) = \frac{e^{\eta_0 + \eta_1 Age}}{1 + e^{\eta_0 + \eta_1 Age}}$$

Odds for Success

$$\frac{\theta(x)}{1 - \theta(x)} = e^{\eta_0 + \eta_1 Age}$$

Thus

$$\ln\left(\frac{\theta(x)}{1 - \theta(x)}\right) = \eta_0 + \eta_1 Age$$

Suppose we contrast individuals who are Age = x to those who are Age = x + c. What
can we say about the increased risk associated with a c year increase in age? The logistic
model gives us a means to do this through the odds ratio (OR).

$$\ln(\text{OR associated with a } c \text{ year increase in age}) = \ln\left(\frac{\theta(Age = x + c) \,/\, [1 - \theta(Age = x + c)]}{\theta(Age = x) \,/\, [1 - \theta(Age = x)]}\right)$$

$$= \ln\left(\frac{\theta(Age = x + c)}{1 - \theta(Age = x + c)}\right) - \ln\left(\frac{\theta(Age = x)}{1 - \theta(Age = x)}\right) = \eta_0 + \eta_1 (Age + c) - (\eta_0 + \eta_1 Age) = c\,\eta_1$$

Exponentiating both sides gives

$$\text{OR} = e^{c\,\eta_1}.$$

Thus the multiplicative increase (or decrease if η1 < 0) in odds associated with a c year
increase in age is $e^{c\,\eta_1}$.
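As a numerical illustration of this result, the sketch below (Python, standard library only) plugs in the coefficient estimates reported in the R output for the age model later in this handout (intercept -5.30945, slope 0.11092). The function names are illustrative assumptions.

```python
import math

# Estimates copied from the R summary of the CHD-by-age model in this handout
INTERCEPT, SLOPE = -5.30945, 0.11092

def odds(age):
    """Fitted odds of CHD at a given age: theta / (1 - theta) = exp(intercept + slope * age)."""
    theta = 1.0 / (1.0 + math.exp(-(INTERCEPT + SLOPE * age)))
    return theta / (1.0 - theta)

def odds_ratio(c):
    """Odds ratio for a c-year increase in age: exp(c * slope)."""
    return math.exp(c * SLOPE)

or_10 = odds_ratio(10)
# The ratio of fitted odds is the same whatever the starting age
ratios = [odds(age + 10) / odds(age) for age in (30, 45, 60)]

print(round(or_10, 2))
print([round(r, 2) for r in ratios])
```

Under this model the odds of CHD are multiplied by roughly 3 for each additional 10 years of age, regardless of the starting age.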
Example: Interpreting a c year increase in age.
Question: Is it reasonable to assume that the effect of a c unit increase in a continuous
predictor is constant regardless of the starting point? For example, does the risk associated
with a 5 year increase in age remain constant throughout one’s life?
Statistical Inference for the Logistic Regression Model
Given estimates for the model parameters and their estimated standard errors what types
of statistical inferences can be made? One approach is to use the normal-theory based
methods outlined below.
Hypothesis Testing
For testing (here β_i denotes a coefficient of the model, written η_i elsewhere in this handout):

$$H_o: \beta_i = 0 \quad \text{versus} \quad H_a: \beta_i \neq 0$$

Large sample test for significance of the “slope” parameter (β_i):

$$z = \frac{\hat{\beta}_i}{SE(\hat{\beta}_i)} \sim N(0,1)$$

Confidence Intervals for Parameters and Corresponding OR’s
For dichotomous categorical predictors (i.e. 0/1 predictors)
100(1 - α)% CI for β_i:

$$\hat{\beta}_i \pm z_{1-\alpha/2}\, SE(\hat{\beta}_i)$$

100(1 - α)% CI for the OR associated with β_i:

$$\exp\left(\hat{\beta}_i \pm z_{1-\alpha/2}\, SE(\hat{\beta}_i)\right)$$

If β_i corresponds to a continuous predictor and we wish to examine the OR
associated with a c unit increase, the CI for the OR becomes

$$\exp\left(c\,\hat{\beta}_i \pm z_{1-\alpha/2}\, c\, SE(\hat{\beta}_i)\right)$$

Often categorical predictors have more than two levels; we will see how to handle
that case later in the notes.
Example:
What is the OR for CHD associated with a 10 year increase in age? Give a 95%
confidence interval based on this estimate.
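A sketch of this calculation in Python (standard library only) is given below. It assumes the slope estimate 0.11092 and standard error 0.02406 reported for age in the R output later in this handout; the variable names are illustrative.

```python
import math
from statistics import NormalDist

slope, se = 0.11092, 0.02406      # estimate and standard error for age (from the R output)
c = 10                            # size of the increase in age, in years
z_crit = NormalDist().inv_cdf(0.975)   # z_{1 - alpha/2} for a 95% interval

wald_z = slope / se                           # large-sample test statistic
or_hat = math.exp(c * slope)                  # estimated OR for a 10 year increase
lower = math.exp(c * slope - z_crit * c * se)
upper = math.exp(c * slope + z_crit * c * se)

print(f"z = {wald_z:.3f}")
print(f"OR = {or_hat:.2f}, 95% Wald CI = ({lower:.2f}, {upper:.2f})")
```

The interval is built on the log-odds scale and then exponentiated, so it is not symmetric around the estimated odds ratio.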
Some Mathematical Details: Estimation of the Model Parameters
In a linear regression analysis, the regression coefficients are estimated based on the
least squares method. That is, the estimates are obtained by minimizing the sum of
the squared residuals. In a logistic regression analysis, the model parameters are
estimated through a process called the maximum likelihood method. The basic
principle of maximum likelihood is to choose as estimates those parameter values which,
if true, would maximize the probability of observing what we have actually observed.
This involves:
1. Finding an expression (i.e., the likelihood function) for the
probability of the data as a function of the unknown parameters.
For the logistic model, the binary response variable is assumed to follow a
binomial distribution with a single trial (n=1) and probability of “success” equal
to θ(x_i). Therefore, for the ith observed pair (x_i, y_i), the contribution to the
likelihood is

$$\theta(x_i)^{y_i} \, (1 - \theta(x_i))^{1 - y_i}$$

where

$$\theta(x_i) = \frac{e^{\eta_0 + \eta_1 x_i}}{1 + e^{\eta_0 + \eta_1 x_i}} \quad \text{and} \quad y_i \in \{0, 1\}.$$

Then, since we assume independence across observations, the likelihood
function is given by

$$L = L(\eta_0, \eta_1) = \prod_{i=1}^{n} \theta(x_i)^{y_i} \, (1 - \theta(x_i))^{1 - y_i}$$
2. Finding the values of the unknown parameters which make the value
of this expression as large as possible.
For computational purposes it is usually easier to maximize the logarithm of the
likelihood function rather than the likelihood function itself. This works because
the logarithm is a monotonic increasing function; therefore, the maximizing
parameters are the same for the likelihood and log-likelihood functions. The
log-likelihood function is given by

$$\ln L(\eta_0, \eta_1) = \sum_{i=1}^{n} \left[ y_i \ln \theta(x_i) + (1 - y_i) \ln\left(1 - \theta(x_i)\right) \right]$$

To find the parameter estimates, we solve simultaneously the equations given by
setting the partial derivatives with respect to each parameter equal to 0:

$$\frac{\partial}{\partial \eta_0} \ln L(\eta_0, \eta_1) = 0, \qquad \frac{\partial}{\partial \eta_1} \ln L(\eta_0, \eta_1) = 0$$
Several different nonlinear optimization routines are used to find solutions to
such systems. This process gets increasingly computationally intensive as the
number of terms in the model increases.
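To make the two steps concrete, here is a minimal Python sketch of one such routine, Newton-Raphson, for a model with a single predictor. This is an illustration under stated assumptions, not the algorithm any particular package is confirmed to use. The data are the 100 observations summarized in the table of CHD by Over55 shown in the R example later in this handout; the function name `fit_logistic` is hypothetical.

```python
import math

def fit_logistic(xs, ys, max_iter=25, tol=1e-10):
    """Maximize the log-likelihood of a one-predictor logistic model by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0              # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0      # negative Hessian (information matrix)
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det     # Newton step: inverse information times gradient
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# (Over55, CHD) -> count, taken from the 2x2 table in the R example
counts = {(0, 0): 51, (0, 1): 22, (1, 0): 6, (1, 1): 21}
xs = [x for (x, y), n in counts.items() for _ in range(n)]
ys = [y for (x, y), n in counts.items() for _ in range(n)]

b0_hat, b1_hat = fit_logistic(xs, ys)
print(round(b0_hat, 4), round(b1_hat, 4))
```

The estimates agree with the values reported by `glm` in the R example (-0.8408 and 2.0935).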
Example: Estimating Model Parameters with a Single Dichotomous Predictor
CHD and Indicator of Age Over 55
Computed using standard approach
Logistic Model
There are two different ways to code dichotomous variables: (0,1) coding or (-1,+1) coding
(i.e. contrast coding). JMP uses contrast coding by default, whereas other packages will
generally use the (0,1) coding as the default. The two coding types are shown below.

(0,1) coding:      Age 55+ =  1 if Age ≥ 55,    0 if Age < 55
(-1,+1) coding:    Age 55+ = +1 if Age ≥ 55,   -1 if Age < 55

For the purposes of discussion we will consider the (0,1) coding.
Recall

$$\theta(x) = P(CHD = 1 \mid x) = \frac{e^{\eta_0 + \eta_1 x}}{1 + e^{\eta_0 + \eta_1 x}},$$

where x = Age 55+ indicator. We have the following.

            Age ≥ 55 (x = 1)                                  Age < 55 (x = 0)
CHD = 1     θ(x = 1) = exp(η0 + η1) / [1 + exp(η0 + η1)]       θ(x = 0) = exp(η0) / [1 + exp(η0)]
CHD = 0     1 - θ(x = 1) = 1 / [1 + exp(η0 + η1)]              1 - θ(x = 0) = 1 / [1 + exp(η0)]
Estimating the model parameters “by hand”

$$OR = \frac{\theta(x = 1) \,/\, (1 - \theta(x = 1))}{\theta(x = 0) \,/\, (1 - \theta(x = 0))}$$
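With a single 0/1 predictor, the estimates can be read off the 2×2 table directly. The sketch below (Python, standard library only) uses the counts from the table of CHD by Over55 printed in the R example later in this handout; the variable names are illustrative assumptions.

```python
import math

# Counts from the table of CHD by Over55 in the R example
chd_over, no_chd_over = 21, 6        # Age 55 or over (x = 1)
chd_under, no_chd_under = 22, 51     # Age under 55   (x = 0)

theta1 = chd_over / (chd_over + no_chd_over)       # P(CHD = 1 | x = 1)
theta0 = chd_under / (chd_under + no_chd_under)    # P(CHD = 1 | x = 0)

odds1 = theta1 / (1 - theta1)
odds0 = theta0 / (1 - theta0)
odds_ratio = odds1 / odds0

eta0_hat = math.log(odds0)           # log odds when x = 0
eta1_hat = math.log(odds_ratio)      # log odds ratio

# Going back from the coefficients to the probability for x = 1
theta1_from_model = math.exp(eta0_hat + eta1_hat) / (1 + math.exp(eta0_hat + eta1_hat))

print(round(eta0_hat, 4), round(eta1_hat, 4), round(odds_ratio, 2))
print(round(theta0, 4), round(theta1, 4))
```

The intercept is the log odds of CHD in the under-55 group and the slope is the log odds ratio, so exponentiating the slope recovers the odds ratio from the table.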
EXAMPLE: Using the data in the file CHD.sas we will create a dummy variable
indicating whether the subject is over age 55 or not. Then instead of examining the
relationship between the CONTINUOUS variable age and the presence or absence of
evidence of coronary heart disease (CHD), we could consider the dichotomous predictor:

$$Over55 = \begin{cases} 0 & \text{if Age} < 55 \\ 1 & \text{if Age} \geq 55 \end{cases}$$
data CHD;
input ageGrp$ age CHD;
if (age ge 55) then Over55=1; else Over55=0;
datalines;
1 20 0
. . .
. . .
proc sort data=CHD;
by descending CHD descending Over55;
run;
proc freq order=data;
tables CHD*Over55 / all;
run;
Standard OR output from SAS:
Using PROC LOGISTIC to fit the model:
proc logistic data=CHD descending;
model CHD = Over55 / link=logit;
output out=probs predicted=predicted_probabilities;
run;
Questions:
1. Use the model parameters to predict the probability of having CHD for a person
who is 55 or over and for a person who is younger than 55.
2. Given only the estimates of the model parameters, could you find the odds ratio
for having CHD associated with being 55 or over?
Verify these values from the SAS output:
proc print data=probs;
run;
Analysis in JMP (CHD55.JMP)
To fit a logistic regression model it is best to use the Analyze > Fit Model option.
We place CHD (1 = Yes, 2 = No) in the Y box and Age > 55 (1 = Yes, 2 = No) in the
model effects box. The key is to have “Yes” for risk and disease alpha-numerically
before “No”, thus the use of 1 for “Yes” and 2 for “No”.
The summary of the fitted logistic model is shown below. Notice that the parameter
estimates are not the same as those obtained from SAS. This is because JMP uses
contrast coding for the Age > 55 predictor (+1 = Age > 55 and -1 = Age < 55).
OR’s and Fitted Probabilities
Using JMP to Compute OR’s, CI’s, and Fitted Probabilities
Because the disease and the risk factor are alpha-numerically ordered, the OR’s
are correct as given.
By selecting Save Probability Formula we can save the fitted probabilities to the
spreadsheet.
Example: CHD and Age Over 55 in R
> Over55
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[53] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Levels: 0 1
> chd
[1] 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 1 0 0 1 1
[53] 0 1 0 1 0 0 1 0 1 1 0 0 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0 1 1 1
Levels: 0 1
> table(chd,Over55)
   Over55
chd  0  1
  0 51  6
  1 22 21
> chd55 = glm(chd~Over55,family="binomial")
> summary(chd55)

Call:
glm(formula = chd ~ Over55, family = "binomial")

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.734  -0.847  -0.847   0.709   1.549

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.8408     0.2551  -3.296  0.00098 ***
Over55        2.0935     0.5285   3.961 7.46e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 136.66  on 99  degrees of freedom
Residual deviance: 117.96  on 98  degrees of freedom
AIC: 121.96

Number of Fisher Scoring iterations: 4
How do we measure discrepancy between observed and fitted values?
In OLS regression with a continuous response we used

$$RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \hat{\eta}^T u_i)^2 = \sum_{i=1}^{n} \left( y_i - (\hat{\eta}_0 + \hat{\eta}_1 u_{1i} + \cdots + \hat{\eta}_k u_{ki}) \right)^2$$
In logistic regression modeling we can use the deviance (typically denoted D or
G²), which is defined as

$$D = 2 \ln\left(\frac{\text{likelihood of saturated model}}{\text{likelihood of fitted model}}\right)$$

$$D = 2 \sum_{i=1}^{n} \left[ y_i \ln\left(\frac{y_i}{\hat{\theta}(x_i)}\right) + (1 - y_i) \ln\left(\frac{1 - y_i}{1 - \hat{\theta}(x_i)}\right) \right]$$

Because the likelihood function of the saturated model is equal to 1 when the
response ($y_i$) is 0 or 1, the deviance reduces to:

$$D = -2 \ln(\text{likelihood of the fitted model}) = -2 \sum_{i=1}^{n} \left[ y_i \ln\left(\hat{\theta}(x_i)\right) + (1 - y_i) \ln\left(1 - \hat{\theta}(x_i)\right) \right]$$
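The reduced formula is easy to evaluate by hand for a model with one 0/1 predictor, because every observation in a group shares the same fitted probability. The Python sketch below uses the counts from the table of CHD by Over55 in the R example earlier in this handout; the `deviance` helper and its grouped input format are illustrative assumptions.

```python
import math

def deviance(groups):
    """-2 * log-likelihood for 0/1 data.

    groups: list of (number of observations, number with y = 1, fitted probability).
    """
    log_lik = 0.0
    for n_obs, n_events, p in groups:
        log_lik += n_events * math.log(p) + (n_obs - n_events) * math.log(1.0 - p)
    return -2.0 * log_lik

# Model with Over55: fitted probability is the observed proportion in each age group
fitted_model = [(73, 22, 22 / 73), (27, 21, 21 / 27)]
# Model with no predictors: one overall proportion, 43 CHD cases out of 100
null_model = [(73, 22, 43 / 100), (27, 21, 43 / 100)]

residual_deviance = deviance(fitted_model)
null_deviance = deviance(null_model)

print(round(residual_deviance, 2), round(null_deviance, 2))
```

These reproduce the residual deviance (117.96) and null deviance (136.66) in the R summary of the Over55 model.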
The deviance can be used to compare two potential models where one model is
nested within the other by using the “General Chi-Square Test” for comparing
rival logistic regression models. We will see more applications of this in more
detail when we discuss multiple logistic regression and model development;
however, we will demonstrate this process below when considering a single
predictor x.
The general nested model concept:
General Chi-Square Test
Consider comparing two rival models where the null hypothesis model is nested within
the alternative hypothesis model:

$$H_o: \log\left(\frac{\theta(x)}{1 - \theta(x)}\right) = \eta_1^T x_1 \quad \text{(reduced model OK)}$$

$$H_1: \log\left(\frac{\theta(x)}{1 - \theta(x)}\right) = \eta_1^T x_1 + \eta_2^T x_2 \quad \text{(full model needed)}$$
General Chi-Square Statistic
χ² = (residual deviance of reduced model) – (residual deviance of full model)

$$\chi^2 = D(\text{for model without the terms in } x_2) - D(\text{for model with the terms in } x_2) \sim \chi^2_{df}$$

If the full model is needed, χ² is BIG and the associated p-value = $P(\chi^2_{df} > \chi^2)$ is
small.
Example: CHD and Age ~ a single numeric predictor
Ho :
H1 :
From JMP
From R
> summary(chd.glm)

Call:
glm(formula = chd ~ Age, family = "binomial")

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.9718  -0.8456  -0.4576   0.8253   2.2859

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.30945    1.13365  -4.683 2.82e-06 ***
Age          0.11092    0.02406   4.610 4.02e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Null deviance: 136.66  on 99  degrees of freedom
Residual deviance: 107.35  on 98  degrees of freedom
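The general chi-square test for this example can be carried out from the two deviances in the R output. The following Python sketch (standard library only) is one way to do the arithmetic; it relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal, so its upper tail probability is `erfc(sqrt(x / 2))`. The helper name is illustrative.

```python
import math

def chi2_upper_tail_df1(x):
    """P(chi-square with 1 df > x), using chi-square(1) = Z^2 for standard normal Z."""
    return math.erfc(math.sqrt(x / 2.0))

null_deviance = 136.66        # model with no predictors (99 df)
residual_deviance = 107.35    # model with Age (98 df)

test_statistic = null_deviance - residual_deviance
df = 99 - 98
p_value = chi2_upper_tail_df1(test_statistic)

print(f"chi-square = {test_statistic:.2f} on {df} df, p-value = {p_value:.2e}")
```

The statistic is large and the p-value is far below any conventional significance level, so the model containing Age is needed.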
Statistical Inference for the Logistic Regression Model (In SAS)
First, consider the following output from PROC LOGISTIC:
proc logistic descending;
model CHD = age / link=logit;
output out=get_values predicted=predicted_probabilities;
run;
All of these statistics are testing the same null hypothesis:
Ho: all explanatory variables in the model have coefficients of zero.
Ha: at least one explanatory variable in the model has a coefficient different from zero.
• The Likelihood Ratio test compares the log-likelihood for the fitted model with
the log-likelihood for a model with NO explanatory variables. PROC LOGISTIC
reports -2×log-likelihood for each of these models, and the chi-square test statistic is the
difference of these two numbers. Note that the df = 1 corresponds to the one
independent variable in the model.
• The Score statistic is a function of the first and second derivatives of the
log-likelihood function under the null hypothesis. There is some evidence that this
test does not perform as well as the likelihood ratio test for small samples.
• The Wald statistic is an approximation that is more accurate with larger sample
sizes.
Hypothesis Testing For Individual Coefficients

$$H_o: \beta_i = 0 \quad \text{versus} \quad H_a: \beta_i \neq 0$$

When the sample size is large, the test for significance of the “slope” parameter (β_i) can
be calculated as follows:

$$z = \frac{\hat{\beta}_i}{SE(\hat{\beta}_i)} =$$

$$\chi^2 = z^2 =$$

Confidence Intervals for Coefficients and Corresponding Odds Ratios
A 100(1 - α)% confidence interval for β_i can be calculated as follows:

$$\hat{\beta}_i \pm z_{1-\alpha/2}\, SE(\hat{\beta}_i)$$

A 100(1 - α)% confidence interval for the odds ratio associated with β_i is calculated as
follows:

$$\exp\left(\hat{\beta}_i \pm z_{1-\alpha/2}\, SE(\hat{\beta}_i)\right)$$
These intervals can be calculated in SAS PROC LOGISTIC as follows:
proc logistic descending;
model CHD = age / link=logit clparm=wald;
run;
If β_i corresponds to a continuous predictor and we wish to examine the odds ratio
associated with a c unit increase, the confidence interval for the odds ratio becomes

$$\exp\left(c\,\hat{\beta}_i \pm z_{1-\alpha/2}\, c \cdot SE(\hat{\beta}_i)\right)$$
Example: Find the odds ratio for CHD associated with a 10 year increase in age, and give
a 95% confidence interval based on this estimate.
proc logistic descending;
model CHD = age / link=logit clparm=wald clodds=wald;
units age=10;
run;
The preceding intervals are all known as Wald intervals (based on normal-theory methods).
These may not be appropriate for small samples; therefore, you may want to consider another
method called the Profile Likelihood method. This involves an iterative evaluation of the
likelihood function and produces intervals that may not be symmetric around the estimate.
proc logistic descending;
model CHD = age / link=logit clparm=pl;
run;
Questions:
1. How do these compare to the Wald confidence intervals?
2. How would you find the profile likelihood confidence interval for the odds ratio?
Statistics Measuring Predictive Power
Once again, consider the model using the continuous variable age to predict CHD:
proc logistic descending;
model CHD = age / link=logit;
output out=get_values predicted=predicted_probabilities;
run;
Recall that the p-values shown above are used to test the usefulness of the logistic
regression model. We can also consider a few other statistics to investigate the model’s
predictive power:
• Generalized R²
This is calculated as follows:

$$R^2 = 1 - \exp\left(-\frac{\text{Likelihood Ratio Chi-Square}}{n}\right)$$
You can also request this quantity from SAS:
proc logistic descending;
model CHD = age / link=logit rsq;
run;
Note that the upper-bound of the generalized R2 is less than 1. Therefore, PROC
LOGISTIC also reports a quantity labeled the “Max-rescaled R-Square,” which
divides the original generalized R2 by its upper bound.
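As a rough check of these two quantities, the Python sketch below applies the formula to the age model, taking the likelihood ratio chi-square as the difference between the null deviance (136.66) and residual deviance (107.35) reported in the R output earlier in this handout, with n = 100. The upper bound used here, 1 - exp(-(null deviance)/n), is the standard bound for 0/1 data and is stated as an assumption since the handout does not write it out.

```python
import math

n = 100
null_deviance = 136.66         # -2 log-likelihood of the model with no predictors
residual_deviance = 107.35     # -2 log-likelihood of the model with age

lr_chi_square = null_deviance - residual_deviance

generalized_r2 = 1.0 - math.exp(-lr_chi_square / n)
upper_bound = 1.0 - math.exp(-null_deviance / n)     # assumed form of the upper bound
max_rescaled_r2 = generalized_r2 / upper_bound

print(round(generalized_r2, 4), round(upper_bound, 4), round(max_rescaled_r2, 4))
```

The generalized R² comes out near 0.25 and the rescaled version near 0.34; compare these with the values SAS prints when the `rsq` option is used.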

Ordinal Measures of Association
SAS PROC LOGISTIC also reports the following statistics:
The idea behind these statistics is as follows. For the 100 observations in the data set,
there exist 100×(99)/2 = 4,950 different ways to pair them up (without pairing an
observation with itself). Of these pairs, 2,499 have either both 1s or both 0s for an
observed response. These are ignored, leaving 2,451 pairs in which one case has a 0 and
the other case has a 1. For these pairs, SAS determines whether the observation with a 1
has a higher predicted value (based on the model) than does the observation with a 0. If
this is the case, the pair is called concordant. If not, the pair is discordant.
Let C = the number of concordant pairs =
D = the number of discordant pairs =
T = the number of ties =
N = the total number of pairs (before eliminating any) =
The four measures of association are given as
1. Somers’ D = (C - D) / (C + D + T)
2. Gamma = (C - D) / (C + D)
3. Tau-a = (C - D) / N
4. C = .5×(1 + Somers’ D)
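The pair-counting idea can be sketched in a few lines of Python. The four observations below are a made-up toy example, not the CHD data, and the function name is hypothetical; the formulas are the four listed above.

```python
from itertools import combinations

def association_measures(y, p):
    """Count concordant, discordant, and tied pairs, then compute the four measures.

    y: observed 0/1 responses; p: predicted probabilities from the model.
    """
    concordant = discordant = ties = 0
    for i, j in combinations(range(len(y)), 2):
        if y[i] == y[j]:
            continue                      # pairs with the same response are ignored
        one, zero = (i, j) if y[i] == 1 else (j, i)
        if p[one] > p[zero]:
            concordant += 1
        elif p[one] < p[zero]:
            discordant += 1
        else:
            ties += 1
    total_pairs = len(y) * (len(y) - 1) // 2      # N, before eliminating any pairs
    somers_d = (concordant - discordant) / (concordant + discordant + ties)
    return {
        "C_pairs": concordant, "D_pairs": discordant, "T_pairs": ties, "N": total_pairs,
        "somers_d": somers_d,
        "gamma": (concordant - discordant) / (concordant + discordant),
        "tau_a": (concordant - discordant) / total_pairs,
        "c": 0.5 * (1 + somers_d),
    }

# Toy data (hypothetical): two observations with y = 0 and two with y = 1
result = association_measures([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80])
print(result)
```

In the toy example three of the four usable pairs are concordant and one is discordant, which gives Somers’ D = 0.5 and c = 0.75.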
Somers’ D, Gamma, and Tau-a can range from -1 to 1, and C from 0 to 1; for a model that
predicts better than chance all four are positive (with C above 0.5), and larger values
correspond to stronger associations between the predicted and observed values. Finally,
note that the measure known as C has another familiar interpretation. Consider the
following programming statements.
ods html;
ods graphics on;
proc logistic data=CHD descending;
model CHD = age / link=logit outroc=roc_data;
run;
ods graphics off;
ods html close;
proc print data=roc_data; run;
These statements request the following output.
The ROC curve is obtained by changing the classification rule based on the estimated
probability. Note that the area under the ROC curve is the same as C.
More Analysis in JMP – Logistic Regression with a Single Numeric Predictor
OPTIONS FOR LOGISTIC REGRESSION
Likelihood Ratio Tests – same as in SAS
Wald Tests – normal-theory based
Confidence Intervals – gives CI’s for population parameters in the model.
Odds Ratios – gives the odds ratio associated with a unit increase in x (i.e. c = 1) and the
odds ratio associated with being at the maximum of x vs. the minimum of x.
ROC Curve – if we use $\hat{\theta}(x) = \hat{P}(CHD \mid x)$ to construct a rule for classifying a
patient as having CHD vs. No CHD, this option gives the ROC curve coming from all
possible cutpoints based on this estimated probability.
Estimated Odds Ratios
ROC Curve and Table
By changing the classification rule based on estimated probability we can obtain an ROC curve.
Analysis in R – Logistic Regression with Single Numeric Predictor
> CHD <- read.table(file.choose(),header=T)
> CHD
    agegrp age chd
1        1  20   0
2        1  23   0
3        1  24   0
4        1  25   0
5        1  25   1
.        .   .   .
.        .   .   .
96       8  63   1
97       8  64   0
98       8  64   1
99       8  65   1
100      8  69   1
> names(CHD)
[1] "agegrp" "age"    "chd"
Make sure that you specify family="binomial" or
R will perform ordinary least squares.
> attach(CHD)
> chd <- factor(chd)
> chd.glm <- glm(chd~age,family="binomial")
> summary(chd.glm)

Call:
glm(formula = chd ~ age, family = "binomial")

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.9718  -0.8456  -0.4576   0.8253   2.2859

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.30945    1.13263  -4.688 2.76e-06 ***
age          0.11092    0.02404   4.614 3.95e-06 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 136.66  on 99  degrees of freedom
Residual deviance: 107.35  on 98  degrees of freedom
AIC: 111.35

Number of Fisher Scoring iterations: 3
> probCHD <- exp(-5.30945 + .11092*age)/(1+exp(-5.30945 + .11092*age))
> plot(age,probCHD,type="b",ylab="P(CHD|Age)",xlab="Age")

$$\hat{P}(CHD \mid Age) = \frac{e^{\hat{\eta}_0 + \hat{\eta}_1 Age}}{1 + e^{\hat{\eta}_0 + \hat{\eta}_1 Age}}, \qquad \hat{\eta}_0 = -5.310, \quad \hat{\eta}_1 = .11092$$

An easier way to obtain the estimated probabilities is to extract them from the model
object.
> probCHD <- fitted(chd.glm)
> plot(age,probCHD,type="b",ylab="P(CHD|Age)") # This produces the plot above
We can obtain the estimated logit ($\hat{L}_i = \hat{\eta}_0 + \hat{\eta}_1 Age_i$) by using the predict
function.
> chd.logit = predict(chd.glm)
> plot(age,chd.logit,type="b",ylab="L = no + n1*Age")
> title(main="Plot of Estimated Logit vs. Age")
Multiple Logistic Regression
The multiple logistic mean function has the basic form,

$$\ln\left(\frac{\theta(\tilde{x})}{1 - \theta(\tilde{x})}\right) = \eta_0 + \eta_1 u_1 + \eta_2 u_2 + \cdots + \eta_{k-1} u_{k-1}$$

where the $u_i$ are terms based on the $x_j$’s.
What are terms?
EXAMPLE 1: The data in the file OC_Use.sas are from a case-control study
comparing the use of oral contraceptives and the occurrence of myocardial infarctions.
Subjects were also classified into one of five age groups.
data OCUse;
input AgeGrp$ Status$ OCuse$ count;
datalines;
1 Case    Yes 4
1 Case    No  2
1 Control Yes 62
1 Control No  224
2 Case    Yes 9
2 Case    No  12
2 Control Yes 33
2 Control No  390
3 Case    Yes 4
3 Case    No  33
3 Control Yes 26
3 Control No  330
4 Case    Yes 6
4 Case    No  65
4 Control Yes 9
4 Control No  362
5 Case    Yes 6
5 Case    No  93
5 Control Yes 5
5 Control No  301
;
In Handout 14, we discussed using the Cochran-Mantel-Haenszel test for controlling for
a single categorical covariate while assessing the association between two other
variables. We could apply this test to these data in order to “adjust” for age group when
examining the relationship between oral contraceptive use and disease status:
proc freq order=data;
tables AgeGrp*status*OCuse / cmh all;
weight count;
run;
Question: What do you conclude from this test?
The “/ all” option gives us the odds ratio for having myocardial infarction associated
with oral contraceptive use for each age group:
Age Group    Odds Ratio
1
2
3
4
5
Recall that we can test for a difference in these odds ratios:
Finally, the CMH methods also provide us with a common odds ratio:
You can think of this as an estimate of the odds ratio for having myocardial infarction
associated with oral contraceptive use after controlling for age group.
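The stratum-specific odds ratios and a common odds ratio can be computed directly from the counts in the OCUse data step above. The Python sketch below uses the Mantel-Haenszel estimator of the common odds ratio, the sum of a·d/n over strata divided by the sum of b·c/n; whether this matches every line of the SAS output should be checked against the actual PROC FREQ listing, which is not reproduced here.

```python
# Per age group: (cases using OC, cases not using OC, controls using OC, controls not using OC)
strata = {
    1: (4, 2, 62, 224),
    2: (9, 12, 33, 390),
    3: (4, 33, 26, 330),
    4: (6, 65, 9, 362),
    5: (6, 93, 5, 301),
}

# Odds ratio for MI associated with OC use within each age group
stratum_or = {g: (a * d) / (b * c) for g, (a, b, c, d) in strata.items()}

# Mantel-Haenszel common odds ratio across age groups
numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
mh_or = numerator / denominator

for g, value in stratum_or.items():
    print(f"Age group {g}: OR = {value:.2f}")
print(f"Mantel-Haenszel common OR = {mh_or:.2f}")
```

The within-group odds ratios vary from about 1.5 to about 8.9, while the common estimate is close to 4.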
Fitting Multiple Logistic Regression Models in SAS
Logistic regression methods also provide us with a method for controlling for
confounding variables. Note that we can use a multiple logistic regression model to
predict the probability of myocardial infarction based on oral contraceptive use.
Moreover, we can add age group to the model in order to adjust for age.
First, we must make our binary response variable numeric in order to use PROC
LOGISTIC:
data OCuse2;
set OCuse;
if Status="Case" then MI = 1; else MI = 0;
run;
We can leave the two predictor variables (OC use and Age group) in categorical format;
however, we must place these variable names in the ‘class’ statement in PROC
LOGISTIC:
proc logistic descending;
class OCUse agegrp;
model MI = OCUse agegrp;
weight count;
run;
Questions:
1. Is the logistic regression model useful? Explain.
2. Why does SAS report four coefficients for Age Group?
The Multiple Logistic Mean Function
The multiple logistic regression model for this example is parameterized as follows:

$$E(MI \mid \tilde{x} = (OCuse, AgeGrp)) = \theta(\tilde{x}) = \frac{\exp(\eta_0 + \eta_1 OCuse + \eta_2 Age1 + \eta_3 Age2 + \eta_4 Age3 + \eta_5 Age4)}{1 + \exp(\eta_0 + \eta_1 OCuse + \eta_2 Age1 + \eta_3 Age2 + \eta_4 Age3 + \eta_5 Age4)}.$$

Also, recall that

$$\ln\left(\frac{\theta(\tilde{x})}{1 - \theta(\tilde{x})}\right) = \eta_0 + \eta_1 OCuse + \eta_2 Age1 + \eta_3 Age2 + \eta_4 Age3 + \eta_5 Age4.$$
Note that since Age Group has five levels, its definition requires four dummy (or
indicator) variables. SAS lists the value of these dummy variables in the PROC
LOGISTIC output:
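This method is known as “effects coding” (the reference group is identified by -1):

$$Age1 = \begin{cases} 1 & \text{if Age group 1} \\ -1 & \text{if Age group 5} \\ 0 & \text{otherwise} \end{cases} \qquad Age2 = \begin{cases} 1 & \text{if Age group 2} \\ -1 & \text{if Age group 5} \\ 0 & \text{otherwise} \end{cases}$$

$$Age3 = \begin{cases} 1 & \text{if Age group 3} \\ -1 & \text{if Age group 5} \\ 0 & \text{otherwise} \end{cases} \qquad Age4 = \begin{cases} 1 & \text{if Age group 4} \\ -1 & \text{if Age group 5} \\ 0 & \text{otherwise} \end{cases}$$

$$OCuse = \begin{cases} 1 & \text{if non-user} \\ -1 & \text{if user} \end{cases}$$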
PROC LOGISTIC reports the following parameter estimates and odds ratios:
Questions:
1. How does SAS calculate the odds ratio for MI associated with not using oral
contraceptives for those in Age group 1?

$$\ln\left(\frac{\theta(\tilde{x})}{1 - \theta(\tilde{x})}\right) = \eta_0 + \eta_1 OCuse + \eta_2 Age1 + \eta_3 Age2 + \eta_4 Age3 + \eta_5 Age4$$
2. How does SAS calculate the odds ratio for MI associated with not using oral
contraceptives for those in Age group 2?
3. How does SAS calculate the odds ratio for MI associated with being in Age group
1 versus Age group 5, adjusted for oral contraceptive use?
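A sketch of how these questions can be approached under effects coding, using only the coding definitions and the logit equation given above: subtract the log odds for the two groups being compared, so that every term they share cancels.

```latex
\begin{align*}
% Non-users (OCuse = 1) versus users (OCuse = -1), same age group:
\ln(\text{OR}_{\text{non-user vs. user}})
  &= \eta_1(1) - \eta_1(-1) = 2\eta_1
  &&\Rightarrow\quad \text{OR} = e^{2\eta_1} \\[6pt]
% Age group 1 (Age1 = 1, others 0) versus Age group 5 (Age1 = ... = Age4 = -1), same OC use:
\ln(\text{OR}_{\text{group 1 vs. group 5}})
  &= \eta_2 - (-\eta_2 - \eta_3 - \eta_4 - \eta_5) \\
  &= 2\eta_2 + \eta_3 + \eta_4 + \eta_5
  &&\Rightarrow\quad \text{OR} = e^{2\eta_2 + \eta_3 + \eta_4 + \eta_5}
\end{align*}
```

Because the model contains no interaction between OC use and age group, the age terms cancel in the first comparison, so the same odds ratio applies in Age group 1, Age group 2, and every other age group.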
Reordering the Factors
To examine the effects of OC use and Age Group, we may want to “reorder” the levels of
both variables. That is, we may want to use the non-OC users as the reference group.
Also, we may want to use the youngest age group as our baseline.
PROC LOGISTIC allows you to specify a reference group in the class statement:
proc logistic descending;
class OCUse(param=ref ref='No') agegrp(param=ref ref='1');
model MI = OCUse agegrp;
weight count;
run;
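Note that the indicator variables are now defined using a method known as “dummy
coding”:

$$Age2 = \begin{cases} 1 & \text{if Age group 2} \\ 0 & \text{otherwise} \end{cases} \qquad Age3 = \begin{cases} 1 & \text{if Age group 3} \\ 0 & \text{otherwise} \end{cases}$$

$$Age4 = \begin{cases} 1 & \text{if Age group 4} \\ 0 & \text{otherwise} \end{cases} \qquad Age5 = \begin{cases} 1 & \text{if Age group 5} \\ 0 & \text{otherwise} \end{cases}$$

$$OCuse = \begin{cases} 0 & \text{if non-user} \\ 1 & \text{if user} \end{cases}$$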
Questions:
1. Suppose we want to find the odds ratio associated with being in Age group 5
when compared to being in Age group 1 after adjusting for oral contraceptive use.

$$\ln\left(\frac{\theta(\tilde{x})}{1 - \theta(\tilde{x})}\right) = \eta_0 + \eta_1 OCuse + \eta_2 Age2 + \eta_3 Age3 + \eta_4 Age4 + \eta_5 Age5$$
2. Find the odds ratio associated with oral contraceptive use ADJUSTED for age.
How does this compare to the CMH estimate?
3. Find a 95% confidence interval for this odds ratio.
Note that PROC LOGISTIC returns these odds ratios and their confidence intervals:
4. Interpret the age effect in terms of odds ratios after adjusting for OC use.
5. How would you compare OC users in Age group 5 to non-OC users in Age group
1?
6. How would you compare OC users in Age group 4 to non-OC users in Age group
3?
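Under dummy coding, comparisons like those in the questions above reduce to adding and subtracting coefficients. The following is a sketch using the dummy-coded logit equation given earlier, with non-users and Age group 1 as the reference levels; it is offered as one way to set up the answers, not as the SAS output itself.

```latex
\begin{align*}
% Age group 5 versus Age group 1, same OC use:
\ln(\text{OR}) &= (\eta_0 + \eta_1 OCuse + \eta_5) - (\eta_0 + \eta_1 OCuse) = \eta_5
  &&\Rightarrow\quad \text{OR} = e^{\eta_5} \\[6pt]
% OC users in Age group 5 versus non-users in Age group 1:
\ln(\text{OR}) &= (\eta_0 + \eta_1 + \eta_5) - \eta_0 = \eta_1 + \eta_5
  &&\Rightarrow\quad \text{OR} = e^{\eta_1 + \eta_5} \\[6pt]
% OC users in Age group 4 versus non-users in Age group 3:
\ln(\text{OR}) &= (\eta_0 + \eta_1 + \eta_4) - (\eta_0 + \eta_3) = \eta_1 + \eta_4 - \eta_3
  &&\Rightarrow\quad \text{OR} = e^{\eta_1 + \eta_4 - \eta_3}
\end{align*}
```

Each odds ratio is estimated by replacing the coefficients with their estimates from the PROC LOGISTIC output.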
Example 1 in JMP
To fit a logistic regression model using OC use and Age group as covariates in JMP
select Analyze > Fit Model and place both Age and OC use in the Construct Model
Effects box and Case-Control status as the response as shown below:
Then click Run to fit the model for these data. The resulting output is shown below.
We can see that the odds for MI will be given, which is the response category of
interest here. Had it read No MI/MI, we would want to recode the response so that MI
was the response value of interest.
OC Use = No is the category of interest and OC Use = Yes is being used as the reference
group, i.e. the denominator odds in the odds ratio. This is not what we want. We want to
find the OR for having an MI associated with OC Use = Yes, i.e. the risk associated with
oral contraceptive use. We can achieve this by recoding OC Use so that OC Use = Yes is
the category of interest, using the Value Ordering option in Column Info… This process
is shown below.
Highlight Yes and click Move Up so that Yes is at the top of the list. This will make
OC Use = Yes the value of interest in terms of computing the odds ratio for MI
associated with OC Use.
Repeating the model fit above we obtain the results shown below. I have selected the
Wald Tests and the Odds Ratio options from the Nominal Logistic Fit pull-down menu.
Interpretation of the results from JMP:
EXAMPLE 2: Consider the data found in the file Lowbirth.JMP. These data are
from a study to identify potential risk factors for low birth weight. A random sample of
new mothers was taken and the following variables were recorded:
• Low = birth weight less than 2500 grams (Y or N)
• Prev = previous history of premature labor (History or None)
• Hyper = hypertension during pregnancy (HT or Normal)
• Smoke = smoked during pregnancy (Cig or No Cig)
• Uterine = uterine irritability during pregnancy (Irritation or None)
• Minority = minority status of mother (White or Nonwhite)
• Age = mother’s age in years (yrs.)
• Lwt = mother’s weight at last menstrual cycle (lbs.)
Let’s begin by fitting a model with all predictors in JMP using Analyze > Fit Model.
Questions:
1. Is the overall model useful? Explain.
2. Are all predictors significant in the model? Explain.
Comparing Models with the Likelihood Ratio Test
We can fit the reduced model eliminating those terms that are not significant and then
test whether the reduced model is adequate.

$$H_o: \ln\left(\frac{\theta(\tilde{x})}{1 - \theta(\tilde{x})}\right) = \eta_0 + \eta_2 Lwt + \eta_3 Minority + \eta_4 Smoke + \eta_5 Prev + \eta_6 Hyper$$

$$H_a: \ln\left(\frac{\theta(\tilde{x})}{1 - \theta(\tilde{x})}\right) = \eta_0 + \eta_1 Age + \eta_2 Lwt + \eta_3 Minority + \eta_4 Smoke + \eta_5 Prev + \eta_6 Hyper + \eta_7 Uterine$$
The test statistic is given by
χ2 = (Residual Deviance of reduced model) - (Residual Deviance of full model)
Note: Residual Deviance = -2×log-likelihood
Under the null hypothesis, this test statistic follows the chi-square distribution with
degrees of freedom equal to the change in degrees of freedom between the two
competing models.
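The change-in-deviance test described above can be sketched in R as a small helper. The function name lr_test and its argument names are assumptions made for this illustration, not part of the course code, and the deviances passed to it below are made-up numbers. It only needs the two residual deviances and their residual degrees of freedom, which can be read from the model summaries.

```r
# Likelihood ratio (change-in-deviance) test for two nested logistic models.
# lr_test is a hypothetical helper name, not a built-in R function.
lr_test <- function(dev_reduced, df_reduced, dev_full, df_full) {
  stat <- dev_reduced - dev_full                      # chi-square test statistic
  df   <- df_reduced - df_full                        # number of terms dropped
  p    <- pchisq(stat, df = df, lower.tail = FALSE)   # upper-tail p-value
  c(statistic = stat, df = df, p.value = p)
}

# Made-up deviances, purely for illustration
res <- lr_test(dev_reduced = 110.4, df_reduced = 96,
               dev_full = 104.1, df_full = 93)
print(res)
```

When both fitted glm objects are available, anova(reduced, full, test = "Chisq") reports the same statistic and p-value directly.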
Fitting the Null Hypothesis Model: (Age and Uterine dropped from the model)
Fitting the Alternative Hypothesis Model:
Carrying out the test:

Residual Deviance for Null Hypothesis Model:

Residual Deviance for Alternative Hypothesis Model:

Test Statistic, χ2 =

df =

To find the p-value use R:
> 1 - pchisq(3.60666,df=2)
[1] 0.1647543
 Conclusion:
Interpretation of Model Parameters for Reduced Model
Questions:
1. Find and interpret the odds ratio for low birth weight associated with being a
minority.
2. Find and interpret the odds ratio for low birth weight associated with being a
smoker.
3. Find and interpret the odds ratio for low birth weight associated with having
hypertension.
4. Find and interpret the odds ratio for low birth weight associated with having a
history of preterm labor.
5. Find and interpret the odds ratio for low birth weight associated with a 10 pound
increase in pre-pregnancy weight.
Logistic Regression Diagnostics: Residuals and Influence Statistics
As in the case of ordinary least squares (OLS) regression, we need to be wary of cases
that are poorly fit and those that may have excessive influence on our results.
Residuals
Pearson and Deviance residuals are useful in identifying observations that are not
explained well by the model. Pearson residuals are components of the Pearson chi-square
statistic and deviance residuals are components of the deviance.

Pearson Residual: The Pearson residual for the ith observation is defined by
$$\hat{e}_{\chi_i} = \frac{y_i - \hat{y}_i}{\sqrt{n_i\,\hat{\theta}(\tilde{x}_i)\,\bigl(1 - \hat{\theta}(\tilde{x}_i)\bigr)}}$$
Note that the Pearson chi-square statistic is the sum of the squared chi-residuals.

Deviance Residual: The deviance residual for the ith observation is defined by
$$D_i = \operatorname{sgn}\bigl(y_i - n_i\hat{\theta}(\tilde{x}_i)\bigr)\left[\,2y_i \ln\left(\frac{y_i}{n_i\hat{\theta}(\tilde{x}_i)}\right) + 2(n_i - y_i)\ln\left(\frac{n_i - y_i}{n_i\bigl(1 - \hat{\theta}(\tilde{x}_i)\bigr)}\right)\right]^{1/2}$$
Note that the deviance is the sum of squares of the deviance residuals.
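As a rough check on the two residual definitions, the sketch below computes them by hand in R and compares them with the values R returns. It uses the grouped oral contraceptive / MI counts that are printed in Example 1 of the R section later in this handout; the object names (ocmi, fit, pearson, dev_res) are chosen here for illustration only.

```r
# Grouped oral contraceptive / MI data as printed in Example 1 of the R section
ocmi <- data.frame(
  MI    = c(4, 2, 9, 12, 4, 33, 6, 65, 6, 93),
  NoMI  = c(62, 224, 33, 390, 26, 330, 9, 362, 5, 301),
  Age   = factor(rep(1:5, each = 2)),
  OCuse = factor(rep(c("Yes", "No"), times = 5), levels = c("No", "Yes"))
)
fit <- glm(cbind(MI, NoMI) ~ Age + OCuse, family = binomial, data = ocmi)

y     <- ocmi$MI                   # observed number of events y_i
n     <- ocmi$MI + ocmi$NoMI       # number of trials n_i per covariate pattern
theta <- as.numeric(fitted(fit))   # estimated probabilities
yhat  <- n * theta                 # fitted counts

# Pearson residuals
pearson <- (y - yhat) / sqrt(n * theta * (1 - theta))

# Deviance residuals (all counts here are positive, so no 0*log(0) terms arise)
dev_res <- sign(y - yhat) *
  sqrt(2 * y * log(y / yhat) + 2 * (n - y) * log((n - y) / (n * (1 - theta))))

# Compare with R's built-in residuals
all.equal(pearson, as.numeric(resid(fit, type = "pearson")))
all.equal(dev_res, as.numeric(resid(fit, type = "deviance")))

# The sum of squared deviance residuals reproduces the residual deviance
c(sum(dev_res^2), deviance(fit))
```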
Influence Statistics
These measures can be used to identify cases that are highly influential on the logistic
regression estimates.

DFBETAS: For each parameter estimate, a DFBETAS diagnostic is calculated for
each observation. This is the standardized difference in the parameter estimate
due to deleting the observation, and it can be used to assess the effect of an
individual observation on each estimated parameter of the fitted model. These
measures are useful for detecting observations that are causing instability in the
selected coefficients.

C and CBAR: These diagnostics provide scalar measures of the influence of
individual observations on the regression estimates. They are based on the same
idea as the Cook distance in linear regression theory.

DIFDEV and DIFCHISQ: These are diagnostics for detecting ill-fitted
observations; in other words, observations that contribute heavily to the
disagreement between the data and the predicted values of the fitted model.
DIFDEV is the change in the deviance due to deleting an individual observation
while DIFCHISQ is the change in the Pearson chi-square statistic for the same
deletion.
In cases of both poor fit and high influence, it is good to look at the covariate values for
these individuals to address the role they play in the analysis. In many cases there will
be several individuals with the same covariate pattern, especially if most or all of the
predictors are categorical in nature.
To obtain these measures from SAS PROC LOGISTIC, use the following code:
ods html;
ods graphics on;
*Reduced model;
proc logistic data=LowBirthWeight descending;
class MINORITY(param=ref ref='0') SMOKE(param=ref ref='0')
PTL(param=ref ref='0') HTN(param=ref ref='0');
model LOW = WEIGHT MINORITY SMOKE PTL HTN / link=logit influence;
run;
ods graphics off;
ods html close;
SAS returns the following plots:
Logistic Regression in R
In this section of the notes we examine logistic regression in R. There are several
functions that I wrote for plotting diagnostics similar to what SAS does, although the
inspiration for them came from work Prof. Malone and I did for OLS as part of his senior
project.
Example 1: Oral Contraceptive Use and Myocardial Infarctions
Set up a text file with the data in columns with variable names at the top. The case and control
counts are in separate columns. The risk factor OC use and stratification variable Age follow.
> OCMI.data = read.table(file.choose(),header=T)   # read in text file
> OCMI.data
   MI NoMI Age OCuse
1   4   62   1   Yes
2   2  224   1    No
3   9   33   2   Yes
4  12  390   2    No
5   4   26   3   Yes
6  33  330   3    No
7   6    9   4   Yes
8  65  362   4    No
9   6    5   5   Yes
10 93  301   5    No
> attach(OCMI.data)
> OC.glm <- glm(cbind(MI,NoMI)~Age+OCuse,family=binomial)   # fit model
> summary(OC.glm)
Call:
glm(formula = cbind(MI, NoMI) ~ Age + OCuse, family = binomial)
Deviance Residuals:
 [1]  0.456248 -0.520517  1.377693 -0.886710 -1.685521  0.714695 -0.130922  0.033643
 [9] -0.045061  0.008822
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -4.3698     0.4347 -10.054  < 2e-16 ***
Age2          1.1384     0.4768   2.388   0.0170 *
Age3          1.9344     0.4582   4.221 2.43e-05 ***
Age4          2.6481     0.4496   5.889 3.88e-09 ***
Age5          3.1943     0.4474   7.140 9.36e-13 ***
OCuseYes      1.3852     0.2505   5.530 3.19e-08 ***
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 158.0085 on 9 degrees of freedom
Residual deviance:   6.5355 on 4 degrees of freedom
AIC: 58.825
Number of Fisher Scoring iterations: 3
Find OR associated with oral contraceptive use ADJUSTED for age.
Recall: CMH procedure gave 3.97.
> exp(1.3852)
[1] 3.995625
Find a 95% CI for OR associated with OC use.
> exp(1.3852-1.96*.2505)
[1] 2.445428
> exp(1.3852+1.96*.2505)
[1] 6.528518
Interpreting the age effect in terms of OR’s ADJUSTING for OC use.
Note: The reference group is Age = 1 which was women 25 – 29 years of age.
> OC.glm$coefficients
(Intercept)        Age2        Age3        Age4        Age5    OCuseYes
  -4.369850    1.138363    1.934401    2.648059    3.194292    1.385176
> Age.coefs <- OC.glm$coefficients[2:5]
> exp(Age.coefs)
     Age2      Age3      Age4      Age5
 3.121653  6.919896 14.126585 24.392906
Find 95% CI for age = 5 group.
> exp(3.1943-1.96*.4474)
[1] 10.14921
> exp(3.1943+1.96*.4474)
[1] 58.62751
Example 2: Coffee Drinking and Myocardial Infarctions
> CoffeeMI.data = read.table(file.choose(),header=T)
> CoffeeMI.data
      Smoking Coffee MI NoMI
1       Never    > 5  7   31
2       Never    < 5 55  269
3      Former    > 5  7   18
4      Former    < 5 20  112
5   1-14 Cigs    > 5  7   24
6   1-14 Cigs    < 5 33  114
7  15-25 Cigs    > 5 40   45
8  15-25 Cigs    < 5 88  172
9  25-34 Cigs    > 5 34   24
10 25-34 Cigs    < 5 50   55
11 35-44 Cigs    > 5 27   24
12 35-44 Cigs    < 5 55   58
13   45+ Cigs    > 5 30   17
14   45+ Cigs    < 5 34   17
> attach(CoffeeMI.data)
> Coffee.glm = glm(cbind(MI,NoMI)~Smoking+Coffee,family=binomial)
> summary(Coffee.glm)
Call:
glm(formula = cbind(MI, NoMI) ~ Smoking + Coffee, family = binomial)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.7650  -0.4510  -0.0232   0.2999   0.7917
Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)        -1.2981     0.1819  -7.136 9.60e-13 ***
Smoking15-25 Cigs   0.6892     0.2119   3.253  0.00114 **
Smoking25-34 Cigs   1.2462     0.2398   5.197 2.02e-07 ***
Smoking35-44 Cigs   1.1988     0.2389   5.017 5.24e-07 ***
Smoking45+ Cigs     1.7811     0.2808   6.342 2.27e-10 ***
SmokingFormer      -0.3291     0.2778  -1.185  0.23616
SmokingNever       -0.3153     0.2279  -1.384  0.16646
Coffee> 5           0.3200     0.1377   2.324  0.02012 *
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 173.7899 on 13 degrees of freedom
Residual deviance:   3.7622 on  6 degrees of freedom
AIC: 84.311
Number of Fisher Scoring iterations: 3
OR for drinking 5 or more cups of coffee per day.
Note: CMH procedure gave OR = 1.375
> exp(.3200)
[1] 1.377128
95% CI for OR associated with heavy coffee drinking
> exp(.3200 - 1.96*.1377)
[1] 1.051385
> exp(.3200 + 1.96*.1377)
[1] 1.803794
Reordering a Factor
To examine the effect of smoking we might want to “reorder” the levels of smoking
status so that individuals who have never smoked are used as the reference group. To do
this in R you must do the following:
> Smoking = factor(Smoking,levels=c("Never","Former","1-14 Cigs","15-25 Cigs",
+     "25-34 Cigs","35-44 Cigs","45+ Cigs"))
The first level specified in the levels argument will be used as the reference group,
“Never” in this case. Refitting the model with the reordered smoking status factor gives
the following:
> Coffee.glm2 <- glm(cbind(MI,NoMI)~Smoking+Coffee,family=binomial)
> summary(Coffee.glm2)
Call:
glm(formula = cbind(MI, NoMI) ~ Smoking + Coffee, family = binomial)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.7650  -0.4510  -0.0232   0.2999   0.7917
Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)       -1.61344    0.14068 -11.469  < 2e-16 ***
SmokingFormer     -0.01376    0.25376  -0.054   0.9568
Smoking1-14 Cigs   0.31533    0.22789   1.384   0.1665
Smoking15-25 Cigs  1.00451    0.17976   5.588 2.30e-08 ***
Smoking25-34 Cigs  1.56150    0.21254   7.347 2.03e-13 ***
Smoking35-44 Cigs  1.51417    0.21132   7.165 7.77e-13 ***
Smoking45+ Cigs    2.09646    0.25855   8.108 5.13e-16 ***
Coffee> 5          0.31995    0.13766   2.324   0.0201 *
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 173.7899 on 13 degrees of freedom
Residual deviance:   3.7622 on  6 degrees of freedom
AIC: 84.311
Number of Fisher Scoring iterations: 3
Notice that “SmokingNever” is now absent from the output so we know it is being used
as the reference group. The OR’s associated with the various levels of smoking are
computed below.
> Smoke.coefs = Coffee.glm2$coefficients[2:7]
> exp(Smoke.coefs)
    SmokingFormer  Smoking1-14 Cigs Smoking15-25 Cigs Smoking25-34 Cigs
         0.986338          1.370715          2.730561          4.765984
Smoking35-44 Cigs   Smoking45+ Cigs
         4.545632          8.137279
Confidence intervals for each could be computed in the standard way.
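A minimal sketch of that "standard way" in R, applied to every smoking level at once. The estimates and standard errors are copied from the summary of the refitted model above; the object names est, se, and or_ci are chosen here for illustration.

```r
# Estimates and standard errors copied from the summary of the refitted model
est <- c(Former = -0.01376, "1-14 Cigs" = 0.31533, "15-25 Cigs" = 1.00451,
         "25-34 Cigs" = 1.56150, "35-44 Cigs" = 1.51417, "45+ Cigs" = 2.09646)
se  <- c(0.25376, 0.22789, 0.17976, 0.21254, 0.21132, 0.25855)

z <- 1.96  # normal critical value for 95% confidence, as used elsewhere in these notes

# Wald interval on the log-odds scale, then exponentiate to the OR scale
or_ci <- cbind(OR    = exp(est),
               Lower = exp(est - z * se),
               Upper = exp(est + z * se))
round(or_ci, 3)
```

An interval that contains 1 (as happens for a coefficient that is not significant at the 5% level) is consistent with no difference in odds relative to never smokers.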
Some Details for Categorical Predictors with More Than Two Levels
Consider the coffee drinking/MI study above. The stratification variable smoking has
seven levels. Thus it requires six dummy variables to define it. The level that is not
defined using a dichotomous dummy variable serves as the reference group. The table
below shows the values of the dummy variables for each level:
Level                      D2  D3  D4  D5  D6  D7
Never (Reference Group)     0   0   0   0   0   0
Former                      1   0   0   0   0   0
1 – 14 Cigs                 0   1   0   0   0   0
15 – 24 Cigs                0   0   1   0   0   0
25 – 34 Cigs                0   0   0   1   0   0
35 – 44 Cigs                0   0   0   0   1   0
45+ Cigs                    0   0   0   0   0   1
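The dummy coding in the table can be reproduced in R, which builds these columns automatically for a factor. The sketch below is illustrative only: the object names smoking, D, and X are assumptions, and the level labels follow the table rather than the labels in the data file.

```r
# Smoking levels in the order used in the table; listing "Never" first
# makes it the reference group
lev <- c("Never", "Former", "1-14 Cigs", "15-24 Cigs",
         "25-34 Cigs", "35-44 Cigs", "45+ Cigs")
smoking <- factor(lev, levels = lev)

# Default treatment contrasts: one 0/1 dummy column per non-reference level
D <- contrasts(smoking)
D

# The design matrix adds an intercept column in front of the six dummies
X <- model.matrix(~ smoking)
dim(X)
```

The row for "Never" is all zeros, which is what makes it the reference group in the fitted model.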
Example: Coffee Drinking and Myocardial Infarctions
The Logistic Model
$$\ln\left(\frac{\theta(\tilde{x})}{1-\theta(\tilde{x})}\right) = \eta_0 + \eta_1\,\text{Coffee} + \eta_2 D_2 + \eta_3 D_3 + \eta_4 D_4 + \eta_5 D_5 + \eta_6 D_6 + \eta_7 D_7$$
where Coffee is a dichotomous predictor equal to 1 if they drink 5 or more cups of coffee
per day.
Comparing the log-odds of a heavy coffee drinker who smokes 15-25 cigarettes per day
to a heavy coffee drinker who has never smoked, we have
$$\ln\left(\frac{\theta_1(\tilde{x})}{1-\theta_1(\tilde{x})}\right) = \eta_0 + \eta_1 + \eta_4$$
$$\ln\left(\frac{\theta_2(\tilde{x})}{1-\theta_2(\tilde{x})}\right) = \eta_0 + \eta_1$$
Taking the difference gives
$$\ln\left(\frac{\theta_1(\tilde{x})\,/\,\bigl(1-\theta_1(\tilde{x})\bigr)}{\theta_2(\tilde{x})\,/\,\bigl(1-\theta_2(\tilde{x})\bigr)}\right) = \eta_4$$
thus $e^{\eta_4}$ = the odds ratio associated with smoking 15-24 cigarettes per day when compared to
individuals who have never smoked, amongst heavy coffee drinkers. Because $\eta_1$ is not
involved in the odds ratio, the result is the same for non-heavy coffee drinkers as well!
You can also consider combinations of factors, e.g. if we compared heavy coffee drinkers
who smoked 15-24 cigarettes to non-heavy coffee drinkers who have never smoked, the
associated OR would be given by $e^{\eta_1 + \eta_4}$.
Using our fitted model, the odds ratios discussed above are computed as follows.
> summary(Coffee.glm2)
Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)       -1.61344    0.14068 -11.469  < 2e-16 ***
SmokingFormer     -0.01376    0.25376  -0.054   0.9568
Smoking1-14 Cigs   0.31533    0.22789   1.384   0.1665
Smoking15-25 Cigs  1.00451    0.17976   5.588 2.30e-08 ***
Smoking25-34 Cigs  1.56150    0.21254   7.347 2.03e-13 ***
Smoking35-44 Cigs  1.51417    0.21132   7.165 7.77e-13 ***
Smoking45+ Cigs    2.09646    0.25855   8.108 5.13e-16 ***
Coffee> 5          0.31995    0.13766   2.324   0.0201 *
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
OR for 15-24 cigarette smokers vs. never smokers (regardless of coffee drinking status)
> exp(1.00451)
[1] 2.730569
OR for 15-24 cigarette smokers who are also heavy coffee drinkers vs. non-smokers who
are not heavy coffee drinkers
> exp(.31995 + 1.00451)
[1] 3.760154
Similar calculations could be done for other combinations of coffee and cigarette use.
Example 3: Risk Factors for Low Birth Weight
Response
Low = low birth weight, i.e. birth weight < 2500 grams(1 = yes, 0 = no)
Set of potential predictors
Prev = previous history of premature labor (1 = yes, 0 = no)
Hyper = hypertension during pregnancy (1 = yes, 0 = no)
Smoke = smoker (1 = yes, 0 = no)
Uterine = uterine irritability (1 = yes, 0 = no)
Minority = minority (1 = yes, 0 = no)
Age = mother’s age in years
Lwt = mother’s weight at last menstrual cycle
Analysis in R
> Lowbirth = read.table(file.choose(),header=T)
> Lowbirth[1:5,]   # print first 5 rows of the data set
  Low Prev Hyper Smoke Uterine Minority Age Lwt race  bwt
1   0    0     0     0       1        1  19 182    2 2523
2   0    0     0     0       0        1  33 155    3 2551
3   0    0     0     1       0        0  20 105    1 2557
4   0    0     0     1       1        0  21 108    1 2594
5   0    0     0     1       1        0  18 107    1 2600
Make sure categorical variables are interpreted as factors by using the factor command
> Low = factor(Low)
> Prev = factor(Prev)
> Hyper = factor(Hyper)
> Smoke = factor(Smoke)
> Uterine = factor(Uterine)
> Minority = factor(Minority)
Note: This is not really necessary for dichotomous variables that are coded (0,1).
Fit a preliminary model using all available covariates
> low.glm = glm(Low~Prev+Hyper+Smoke+Uterine+Minority+Age+Lwt,family=binomial)
> summary(low.glm)
Call:
glm(formula = Low ~ Prev + Hyper + Smoke + Uterine + Minority +
    Age + Lwt, family = binomial)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6010  -0.8149  -0.5128   1.0188   2.1977
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.378479   1.170627   0.323  0.74646
Prev1        1.196011   0.461534   2.591  0.00956 **
Hyper1       1.452236   0.652085   2.227  0.02594 *
Smoke1       0.959406   0.405302   2.367  0.01793 *
Uterine1     0.647498   0.466468   1.388  0.16511
Minority1    0.990929   0.404969   2.447  0.01441 *
Age         -0.043221   0.037493  -1.153  0.24900
Lwt         -0.012047   0.006422  -1.876  0.06066 .
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
    Null deviance: 232.40 on 185 degrees of freedom
Residual deviance: 196.71 on 178 degrees of freedom
AIC: 212.71
Number of Fisher Scoring iterations: 3
It appears that both uterine irritability and mother’s age are not significant. We can fit
the reduced model eliminating both terms and test whether the model is significantly
degraded by using the general chi-square test (see the JMP example above).
> low.reduced = glm(Low~Prev+Hyper+Smoke+Minority+Lwt,family=binomial)
> summary(low.reduced)
Call:
glm(formula = Low ~ Prev + Hyper + Smoke + Minority + Lwt, family =
    binomial)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.7277  -0.8219  -0.5368   0.9867   2.1517
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.261274   0.885803  -0.295  0.76803
Prev1        1.181940   0.444254   2.661  0.00780 **
Hyper1       1.397219   0.656271   2.129  0.03325 *
Smoke1       0.981849   0.398300   2.465  0.01370 *
Minority1    1.044804   0.394956   2.645  0.00816 **
Lwt         -0.014127   0.006387  -2.212  0.02697 *
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 232.40 on 185 degrees of freedom
Residual deviance: 200.32 on 180 degrees of freedom
AIC: 212.32
Number of Fisher Scoring iterations: 3
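Reduced Model:
$$H_0:\ \ln\left(\frac{\theta(\tilde{x})}{1-\theta(\tilde{x})}\right) = \eta_0 + \eta_2\,\text{Lwt} + \eta_3\,\text{Minority} + \eta_4\,\text{Smoke} + \eta_5\,\text{Prev} + \eta_6\,\text{Hyper}$$
Full Model:
$$\ln\left(\frac{\theta(\tilde{x})}{1-\theta(\tilde{x})}\right) = \eta_0 + \eta_1\,\text{Age} + \eta_2\,\text{Lwt} + \eta_3\,\text{Minority} + \eta_4\,\text{Smoke} + \eta_5\,\text{Prev} + \eta_6\,\text{Hyper} + \eta_7\,\text{Uterine}$$
* Recall: $\theta(\tilde{x}) = P(\text{Low} = 1 \mid \tilde{X})$
Residual Deviance Null Hypothesis Model: $D_{H_0} = 200.32$, df = 180
Residual Deviance Alternative Hypothesis Model: $D_{H_1} = 196.71$, df = 178
General Chi-Square Test
$$\chi^2 = D_{H_0} - D_{H_1} = 200.32 - 196.71 \approx 3.607, \qquad df = 180 - 178 = 2$$
$$p\text{-value} = P(\chi^2_2 > 3.607) = .1647$$
Fail to reject the null; the reduced model is adequate.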
Interpretation of Model Parameters
OR’s Associated with Categorical Predictors
> low.reduced
Call: glm(formula = Low ~ Prev + Hyper + Smoke + Minority + Lwt,
    family = binomial)
Coefficients:
(Intercept)        Prev1       Hyper1       Smoke1    Minority1          Lwt
   -0.26127      1.18194      1.39722      0.98185      1.04480     -0.01413
Degrees of Freedom: 185 Total (i.e. Null);  180 Residual
Null Deviance:     232.4
Residual Deviance: 200.3     AIC: 212.3
Estimated OR’s
> exp(low.reduced$coefficients[2:5])
    Prev1    Hyper1    Smoke1 Minority1
 3.260693  4.043938  2.669388  2.842841
95% CI for OR Associated with History of Premature Labor (Wald Intervals)
> exp(1.182 - 1.96*.444)
[1] 1.365827
> exp(1.182 + 1.96*.444)
[1] 7.78532
Holding everything else constant we estimate that the odds of having an infant with low birth
weight are between 1.366 and 7.785 times larger for mothers with a history of premature labor.
95% CI for OR Associated with Hypertension
> exp(1.397 - 1.96*.6563)
[1] 1.117006
> exp(1.397 + 1.96*.6563)
[1] 14.63401
Holding everything else constant we estimate that the odds of having an infant with low birth
weight are between 1.117 and 14.63 times larger for mothers with hypertension during
pregnancy.
95% CI for OR Associated with Smoking
> exp(.981849 - 1.96*.3983)
[1] 1.222846
> exp(.981849 + 1.96*.3983)
[1] 5.827086
Holding everything else constant we estimate that the odds of having an infant with low birth
weight are between 1.223 and 5.827 times larger for mothers who smoked during pregnancy.
95% CI for OR Associated with Minority Status
> exp(1.0448 - 1.96*.3950)
[1] 1.310751
> exp(1.0448 + 1.96*.3950)
[1] 6.16569
Holding everything else constant we estimate that the odds of having an infant with low birth
weight are between 1.311 and 6.166 times larger for non-white mothers.
OR Associated with Mother’s Weight at Last Menstrual Cycle
Because this is a continuous predictor with values over 100 we should use an increment
larger than one when considering the effect of mother’s weight on birth weight. Here we
will use an increment of c = 10 lbs., although certainly there are other possibilities.
> exp(-10*.014127)
[1] 0.8682549
i.e. the estimated odds of low birth weight decrease by about 13.2% for each additional 10 lbs. of mother’s weight at the last menstrual cycle.
A 95% CI for this OR is:
> exp(10*(-.014127) - 1.96*10*.006387)
[1] 0.7660903
> exp(10*(-.014127) + 1.96*10*.006387)
[1] 0.9840439
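The increment calculation generalizes to any step size: multiply both the coefficient and its standard error by the increment before exponentiating. The helper below is a sketch under that reading; or_increment and its argument names are hypothetical, and 1.96 is used to match the hand calculations above.

```r
# Odds ratio and Wald CI for an increase of `incr` units in a continuous predictor.
# or_increment is a hypothetical helper name used only for this sketch.
or_increment <- function(beta, se, incr = 1, z = 1.96) {
  c(OR    = exp(incr * beta),
    Lower = exp(incr * beta - z * incr * se),
    Upper = exp(incr * beta + z * incr * se))
}

# Lwt coefficient and standard error from the reduced model, 10 lb increment
lwt10 <- or_increment(beta = -0.014127, se = 0.006387, incr = 10)
round(lwt10, 4)

# Percent change in the odds per 10 lbs
100 * (lwt10["OR"] - 1)
```

Changing incr to 20, say, would give the OR for a 20 lb difference without refitting the model.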
Create a sequence of weights from smallest observed weight to the largest observed weight by ½
pound increments.
> x = seq(min(Lwt),max(Lwt),.5)
Here I have set the other covariates as follows: previous history (1 = yes), hypertension
(0 = no), smoking status (1 = yes), and minority (0 = no).
> fit = predict(low.reduced,data.frame(Prev=factor(rep(1,length(x))),
Hyper=factor(rep(0,length(x))),Smoke=factor(rep(1,length(x))),Minority=
factor(rep(0,length(x))),Lwt=x),type="response")
> plot(x,fit,xlab="Mother's Weight",ylab="P(Low|Prev=1,Smoke=1,Lwt)")
This is a plot of the effect of
premenstrual weight for smoking
mothers with a history of premature
labor. Using the predict command
above similar plots could be
constructed by examining other
combinations of the categorical
predictors.
Case Diagnostics (Delta Deviance and Cook’s Distance)
As in the case of ordinary least squares (OLS) regression we need to be wary of cases
that may have unduly high influence on our results and those that are poorly fit. The
most common influence measure is Cook’s Distance and a good measure of poorly fit
cases is the Delta Deviance.
Essentially Cook’s Distance ($\Delta\hat{\beta}_{(-i)}$ or $D_i$) measures the changes in the estimated
parameters when the ith observation is deleted. This change is measured for each of the
observations and can be plotted versus $\hat{\theta}(\tilde{x})$ or observation number to aid in the
identification of high influence cases. Several cut-offs have been proposed for Cook’s
Distance, the most common being to classify an observation as having large influence if
$\Delta\hat{\beta}_{(-i)} > 1$ or, in the case of a large sample size n, $\Delta\hat{\beta}_{(-i)} > 4/n$.
Cook’s Distance
$$\Delta\hat{\beta}_{(-i)} = \frac{1}{k}\left(\frac{\hat{e}_{\chi_i}}{\sqrt{1-h_i}}\right)^{2}\left(\frac{h_i}{1-h_i}\right)$$
where
$$\hat{e}_{\chi_i} = \frac{y_i - \hat{y}_i}{\sqrt{n_i\,\hat{\theta}(\tilde{x}_i)\,\bigl(1 - \hat{\theta}(\tilde{x}_i)\bigr)}}$$
is the Pearson residual defined above, $h_i$ is the leverage of the ith case, and k is the
number of estimated parameters.
Delta deviance measures the change in the deviance (D) when the ith case is deleted.
Values around 4 or larger indicate cases that are poorly fit.
These correspond to individuals for whom $y_i = 1$ but $\hat{\theta}(\tilde{x})$ is small, or
$y_i = 0$ but $\hat{\theta}(\tilde{x})$ is large.
In cases of both high influence and poor fit it is good to look at the covariate values for
these individuals and we can begin to address the role they play in the analysis. In many
cases there will be several individuals with the same covariate pattern, especially if most
or all of the predictors are categorical in nature.
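The two diagnostics can be computed directly from a fitted glm. The sketch below uses simulated data, purely for illustration, and the same expressions that appear in the Diagplot.log function listed later in these notes; the delta deviance line is that function's approximation, not a formula derived here.

```r
set.seed(1)                        # simulated data, purely illustrative
n <- 100
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x))
fit <- glm(y ~ x, family = binomial)

k  <- length(coef(fit))                 # number of estimated parameters
h  <- hatvalues(fit)                    # leverages h_i
pr <- resid(fit, type = "pearson")      # Pearson residuals
dr <- resid(fit, type = "deviance")     # deviance residuals

cook      <- (1 / k) * pr^2 * h / (1 - h)^2   # Cook's distance
delta_dev <- dr^2 / (1 - h)                   # delta deviance, as in Diagplot.log

# Flag cases using the cut-offs mentioned in the text
which(cook > 4 / n)
which(delta_dev > 4)

# The hand computation agrees with R's built-in cooks.distance() for this fit
all.equal(as.numeric(cook), as.numeric(cooks.distance(fit)))
```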
> Diagplot.glm(low.reduced)
> Diagplot.log(low.reduced)
Cases 11 and 13 have the highest Cook’s distances although they are not that large. It
should be noted also that they are also somewhat poorly fit. Cases 129, 144, 152, and
180 appear to be poorly fit. The information on all of these cases is shown below.
> Lowbirth[c(11,13,129,144,152,180),]
    Low Prev Hyper Smoke Uterine Minority Age Lwt race  bwt
11    0    0     1     0       0        1  19  95    3 2722
13    0    0     1     0       0        1  22  95    3 2750
129   1    0     0     0       1        0  29 130    1 1021
144   1    0     0     0       1        1  21 200    2 1928
152   1    0     0     0       0        0  24 138    1 2100
180   1    0     0     1       0        0  26 190    1 2466
Case 152 had a low birth weight infant even in the absence of the identified potential risk
factors. The fitted values for all four of the poorly fit cases are quite small.
> fitted(low.reduced)[c(11,13,129,144,152,180)]
        11         13        129        144        152        180
0.69818500 0.69818500 0.10930602 0.11486743 0.09877858 0.12307383
Cases 11 and 13 have high predicted probabilities despite the fact that they had babies
with normal birth weight. Their relatively high leverage might come from the fact that
there were very few hypertensive minority women in the study. These two facts
combined lead to the relatively large Cook’s Distances for these two cases.
Plotting Estimated Conditional Probabilities $P(\text{Low} = 1 \mid \tilde{x})$
A summary of the reduced model is given below:
> low.reduced
Call: glm(formula = Low ~ Prev + Hyper + Smoke + Minority + Lwt,
    family = binomial)
Coefficients:
(Intercept)        Prev1       Hyper1       Smoke1    Minority1          Lwt
   -0.26127      1.18194      1.39722      0.98185      1.04480     -0.01413
Degrees of Freedom: 185 Total (i.e. Null);  180 Residual
Null Deviance:     232.4
Residual Deviance: 200.3     AIC: 212.3
To easily plot probabilities in R we can write a function that takes covariate values and
computes the desired conditional probability.
> x <- seq(min(Lwt),max(Lwt),.5)
> PrLwt <- function(x,Prev,Hyper,Smoke,Minority) {
+   L <- -.26127 + 1.18194*Prev + 1.39722*Hyper + .98185*Smoke +
+        1.0448*Minority - .01413*x
+   exp(L)/(1 + exp(L))
+ }
> plot(x,PrLwt(x,1,1,1,1),xlab="Mother's Weight",ylab="P(Low=1|x)",
+      ylim=c(0,1),type="l")
> title(main="Plot of P(Low=1|X) vs. Mother's Weight")
> lines(x,PrLwt(x,0,0,0,0),lty=2,col="red")
> lines(x,PrLwt(x,1,1,0,0),lty=3,col="blue")
> lines(x,PrLwt(x,0,0,1,1),lty=4,col="green")
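One way to sanity-check a hand-coded probability function like PrLwt is to evaluate it at the covariate values of specific cases and compare with the fitted values reported in the case diagnostics section. The sketch below does this with plogis(), R's built-in inverse logit; the name PrLwt2 is introduced here only to avoid overwriting the original function.

```r
# Same logit as PrLwt, written with plogis(), R's built-in inverse logit
PrLwt2 <- function(x, Prev, Hyper, Smoke, Minority) {
  plogis(-.26127 + 1.18194*Prev + 1.39722*Hyper + .98185*Smoke +
         1.0448*Minority - .01413*x)
}

# Case 11: hypertensive minority mother, Lwt = 95, no other risk factors
p11 <- PrLwt2(95, Prev = 0, Hyper = 1, Smoke = 0, Minority = 1)

# Case 152: Lwt = 138 with none of the categorical risk factors present
p152 <- PrLwt2(138, Prev = 0, Hyper = 0, Smoke = 0, Minority = 0)

round(c(case11 = p11, case152 = p152), 4)
```

Both values agree with fitted(low.reduced) for those cases to about three decimal places; the small differences come from using rounded coefficients.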
R Function – Diagplot.log
Plot Cook’s Distance and Delta Deviance for Logistic Regression Models
Diagplot.log = function(glm1)
{
    k <- length(glm1$coef)                  # number of estimated parameters
    h <- lm.influence(glm1)$hat             # leverages
    fv <- fitted(glm1)
    pr <- resid(glm1, type = "pearson")
    dr <- resid(glm1, type = "deviance")
    par(mfrow = c(2, 1))
    n <- length(fv)
    index <- seq(1, n, 1)
    Ck <- (1/k)*((pr^2) * h)/((1 - h)^2)    # Cook's distance
    Cd <- dr^2/(1 - h)                      # delta deviance
    plot(index, Ck, type = "n", xlab = "Index", ylab =
        "Cook's Distance", cex = 0.7, main =
        "Plot of Cook's Distance vs. Index", col = 1)
    points(index, Ck, col = 2)
    identify(index, Ck)
    plot(index, Cd, type = "n", xlab = "Index", ylab =
        "Delta Deviance", cex = 0.7, main =
        "Plot of Delta Deviance vs. Index")
    points(index, Cd, col = 2)
    identify(index, Cd)
    par(mfrow = c(1, 1))
    invisible()
}
Diagplot.glm - displays case diagnostic plots for a logistic regression
Diagplot.glm <- function (lm1, lms = summary(lm1), lmi = lm.influence(lm1))
{
    par(mfrow = c(2, 2))
    h <- lmi$hat                            # leverages
    pr <- residuals(lm1, type = "pearson")
    dr <- residuals(lm1, type = "deviance")
    dB <- ((pr^2) * h)/((1 - h)^2)          # delta beta (influence)
    dD <- dr^2/(1 - h)                      # delta deviance (poor fit)
    fv <- lm1$fitted.values
    plot(fv, dB, main = "Plot of dB vs. Fitted Values", xlab = "Fitted Values",
        ylab = "dB")
    points(fv[dB > 1], dB[dB > 1], col = "blue")
    plot(fv, dD, main = "Plot of dD vs. Fitted Values", xlab = "Fitted Values",
        ylab = "dD")
    points(fv[dD > 4], dD[dD > 4], col = "blue")
    index <- seq_along(fv)
    plot(dB, main = "Plot of dB vs. Index Number", xlab = "Index Number")
    points(index[dB > 1], dB[dB > 1], col = "blue")
    identify(index, dB, cex = 0.75)
    plot(dD, main = "Plot of dD vs. Index Number", xlab = "Index Number")
    points(index[dD > 4], dD[dD > 4], col = "blue")
    identify(index, dD, cex = 0.75)
    par(mfrow = c(1, 1))
    invisible()
}
Interactions and Higher Order Terms (Note ~ uses data frame: Lowbwt)
Here we work with a slightly different version of the low birth weight data, which
includes an additional predictor, ftv, a factor that indicates the number of first
trimester doctor visits the woman had (coded as: 0, 1, or 2+). We will examine how the
model below was developed in the next section where we discuss model development.
In the model below we have added an interaction between age and the number of first
trimester visits. The logistic model is:
$$\log\left(\frac{\theta(\tilde{x})}{1-\theta(\tilde{x})}\right) = \eta_0 + \eta_1\,\text{Age} + \eta_2\,\text{Lwt} + \eta_3\,\text{Smoke} + \eta_4\,\text{Prev} + \eta_5\,\text{HT} + \eta_6\,\text{UI} + \eta_7\,\text{FTV1} + \eta_8\,\text{FTV2} + \eta_9\,\text{Age}\cdot\text{FTV1} + \eta_{10}\,\text{Age}\cdot\text{FTV2} + \eta_{11}\,\text{Smoke}\cdot\text{UI}$$
> summary(bigmodel)
Call:
glm(formula = low ~ age + lwt + smoke + ptd + ht + ui + ftv +
    age:ftv + smoke:ui, family = binomial)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.8945  -0.7128  -0.4817   0.7841   2.3418
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.582389   1.420834  -0.410 0.681885
age          0.075538   0.053945   1.400 0.161428
lwt         -0.020372   0.007488  -2.721 0.006513 **
smoke1       0.780047   0.420043   1.857 0.063302 .
ptd1         1.560304   0.496626   3.142 0.001679 **
ht1          2.065680   0.748330   2.760 0.005773 **
ui1          1.818496   0.666670   2.728 0.006377 **
ftv1         2.921068   2.284093   1.279 0.200941
ftv2+        9.244460   2.650495   3.488 0.000487 ***
age:ftv1    -0.161823   0.096736  -1.673 0.094360 .
age:ftv2+   -0.411011   0.118553  -3.467 0.000527 ***
smoke1:ui1  -1.916644   0.972366  -1.971 0.048711 *
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 183.07 on 177 degrees of freedom
AIC: 207.07
Number of Fisher Scoring iterations: 4
> bigmodel$coefficients
(Intercept)         age         lwt      smoke1       prev1         ht1
-0.58238913  0.07553844 -0.02037234  0.78004747  1.56030401  2.06567991
        ui1        ftv1       ftv2+    age:ftv1   age:ftv2+  smoke1:ui1
 1.81849631  2.92106773  9.24445985 -0.16182328 -0.41101103 -1.91664380
Calculate P(Low|Age,FTV) for women of average pre-pregnancy weight with all other
risk factors absent. Similar calculations could be done if we wanted to add in other
factors as well.
First we calculate the logits as function of age for three levels of FTV 0, 1, and 2+
respectively.
> L <- -.5824 + .0755*agex - .02037*mean(lwt)
> L1 <- -.5824 + .0755*agex - .02037*mean(lwt) + 2.9211 - .16182*agex
> L2 <- -.5824 + .0755*agex - .02037*mean(lwt) + 9.2445 - .4110*agex
Next we calculate the associated conditional probabilities.
> P <- exp(L)/(1+exp(L))
> P1 <- exp(L1)/(1+exp(L1))
> P2 <- exp(L2)/(1+exp(L2))
Finally we plot the probability curves as function of age and FTV.
> plot(agex,P,type="l",xlab="Age",ylab="P(Low|Age,FTV)",ylim=c(0,1))
> lines(agex,P1,lty=2,col="blue")
> lines(agex,P2,lty=3,col="red")
> title(main="Interaction Between Age and First Trimester Visits",cex=.6)
The interaction between age and
FTV produces differences in
direction and magnitude of the age
effect. For women with no first
trimester doctor visits their
probability of low birth weight
increases with age. However for
women with at least one first
trimester visit the probability of low
birth weight decreases with age.
The magnitude of that drop is
largest for women with 2 or more
first trimester visits.
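The direction of the age effect in each FTV group can be read off the logit-scale slopes, and the three curves can be recomputed in a self-contained way. In the sketch below the age grid agex and the value used in place of mean(lwt) are assumptions made for illustration, since the notes do not show how either was obtained; the coefficients are the rounded values used in the commands above.

```r
agex     <- seq(14, 45, by = 1)  # assumed age grid; not defined in the notes
lwt_mean <- 130                  # assumed stand-in for mean(lwt); actual mean not reported

eta0 <- -.5824 + .0755*agex - .02037*lwt_mean   # logit for FTV = 0
eta1 <- eta0 + 2.9211 - .16182*agex             # logit for FTV = 1
eta2 <- eta0 + 9.2445 - .4110*agex              # logit for FTV = 2+

# plogis() is the inverse logit, equivalent to exp(L)/(1 + exp(L))
P  <- plogis(eta0)
P1 <- plogis(eta1)
P2 <- plogis(eta2)

# Age slope on the logit scale within each FTV group
c(FTV0 = .0755, FTV1 = .0755 - .16182, FTV2plus = .0755 - .4110)
```

The slope is positive only for FTV = 0, which is why that curve rises with age while the other two fall.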
We also have an interaction between smoking and uterine irritability added to the model.
This will affect how we interpret the two in terms of odds ratios. We need to consider
the OR associated with smoking for women without uterine irritability, the OR associated
with uterine irritability for nonsmokers, and finally the OR associated with smoking and
having uterine irritability during pregnancy.
These estimated odds ratios are given below:
OR for Smoking with No Uterine Irritability
> exp(.7800)
[1] 2.181472
OR for Uterine Irritability with No Smoking
> exp(1.8185)
[1] 6.162608
OR for Smoking and Uterine Irritability
> exp(.7800+1.8185-1.91664)
[1] 1.977553
This result is hard to explain physiologically and so this interaction term might be
removed from the model.
Model Selection Methods
Stepwise methods used in logistic regression are the same as those used in ordinary least
squares regression; however, the measure used is the AIC (Akaike Information Criterion) as
opposed to Mallows’ Ck statistic. Like Mallows’ statistic, AIC balances residual
deviance and the number of parameters in the model.
$$\text{AIC} = D + 2k\hat{\phi}$$
where D = residual deviance, k = total number of estimated parameters, and $\hat{\phi}$ is an
estimate of the dispersion parameter, which is taken to be 1 in models where
overdispersion is not present. Overdispersion occurs when the data consist of the
number of successes out of mi > 1 trials and the trials are not independent (e.g. male birth
data from your last homework).
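As a quick arithmetic check, the formula reproduces the AIC values R reported for the binary-response (0/1) low birth weight fits shown earlier, taking the dispersion as 1. The helper name aic_from_deviance is hypothetical and used only for this sketch.

```r
# AIC = D + 2*k*phi with phi = 1 (no overdispersion)
aic_from_deviance <- function(D, k, phi = 1) D + 2 * k * phi

aic_from_deviance(196.71, 8)    # full model: intercept + 7 coefficients
aic_from_deviance(200.32, 6)    # reduced model: intercept + 5 coefficients
aic_from_deviance(183.07, 12)   # interaction model: 12 coefficients
```

This shortcut matches R's output only for ungrouped 0/1 responses. For the grouped binomial examples (e.g. the oral contraceptive data) R's reported AIC is based on the full log-likelihood and does not equal the residual deviance plus 2k.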
Forward, backward, both forward and backward simultaneously, and all possible subsets
regression methods can be employed to find models with small AIC values. By default R
uses both forward and backward selection simultaneously. The command to do this in R
has the basic form:
> step(current model name)
To have it select from models containing all potential two-way interactions use:
> step(current model name, scope=~.^2)
This sometimes will have problems with convergence due to overfitting (i.e. the
estimated probabilities approach 0 and 1 as in the saturated model). If this occurs you
can have R consider adding each of the potential interaction terms and then you can scan
the list and decide which you might want to add to your existing model. You can then
continue adding terms until the AIC criterion suggests additional terms do not improve
the current model.
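One way to have R list each candidate interaction without running the full stepwise search is the add1() function, which evaluates single-term additions to a fitted model. The sketch below uses simulated data and made-up variable names (x1, x2, x3), purely for illustration; it is not the low birth weight analysis.

```r
set.seed(2)                       # simulated data, purely illustrative
n  <- 200
d  <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rbinom(n, 1, 0.4))
d$y <- rbinom(n, 1, plogis(-0.5 + 0.8*d$x1 - 0.6*d$x3))

base <- glm(y ~ x1 + x2 + x3, family = binomial, data = d)

# One-at-a-time additions of every two-way interaction,
# reporting deviance, AIC, and a likelihood ratio test for each
cand <- add1(base, scope = ~ .^2, test = "Chisq")
cand
```

Terms whose AIC is below the "<none>" row are candidates to add; update(base, . ~ . + x1:x3), for example, would then refit with a chosen term.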
These commands are illustrated for the low birth weight data with first trimester visits
included in the output shown below.
Base Model
> low.glm <- glm(low~age+lwt+race+smoke+ht+ui+ptd+ftv,family=binomial)
> summary(low.glm)
Call:
glm(formula = low ~ age + lwt + race + smoke + ht + ui + ptd +
    ftv, family = binomial)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.7038  -0.8068  -0.5009   0.8836   2.2151
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.822706   1.240174   0.663  0.50709
age         -0.037220   0.038530  -0.966  0.33404
lwt         -0.015651   0.007048  -2.221  0.02637 *
race2        1.192231   0.534428   2.231  0.02569 *
race3        0.740513   0.459769   1.611  0.10726
smoke1       0.755374   0.423246   1.785  0.07431 .
ht1          1.912974   0.718586   2.662  0.00776 **
ui1          0.680162   0.463464   1.468  0.14222
ptd1         1.343654   0.479409   2.803  0.00507 **
ftv1        -0.436331   0.477792  -0.913  0.36112
ftv2+        0.178939   0.455227   0.393  0.69426
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 195.48 on 178 degrees of freedom
AIC: 217.48
Find “best” model that includes all potential two-way interactions.
> low.step <- step(low.glm,scope=~.^2)
Start: AIC= 217.48
low ~ age + lwt + race + smoke + ht + ui + ptd + ftv
             Df Deviance    AIC
+ age:ftv     2   183.00 209.00
- ftv         2   196.83 214.83
- age         1   196.42 216.42
<none>            195.48 217.48
- ui          1   197.59 217.59
+ smoke:ui    1   193.76 217.76
+ lwt:smoke   1   194.04 218.04
+ ui:ptd      1   194.24 218.24
+ lwt:ui      1   194.28 218.28
+ ptd:ftv     2   192.38 218.38
+ ht:ptd      1   194.55 218.55
+ age:ptd     1   194.58 218.58
+ age:ht      1   194.59 218.59
+ age:smoke   1   194.61 218.61
+ race:ui     2   192.63 218.63
- smoke       1   198.67 218.67
+ smoke:ht    1   195.03 219.03
+ smoke:ptd   1   195.16 219.16
- race        2   201.23 219.23
+ race:smoke  2   193.24 219.24
+ lwt:ptd     1   195.35 219.35
+ lwt:ht      1   195.44 219.44
+ age:lwt     1   195.46 219.46
+ age:ui      1   195.47 219.47
+ ht:ftv      2   194.00 220.00
+ lwt:ftv     2   194.19 220.19
+ smoke:ftv   2   194.47 220.47
+ age:race    2   194.58 220.58
+ lwt:race    2   194.63 220.63
+ race:ptd    2   194.83 220.83
- lwt         1   200.95 220.95
+ race:ht     2   195.19 221.19
+ ui:ftv      2   195.32 221.32
- ht          1   202.93 222.93
- ptd         1   203.58 223.58
+ race:ftv    4   193.81 223.81
Step: AIC= 209
low ~ age + lwt + race + smoke + ht + ui + ptd + ftv + age:ftv
             Df Deviance    AIC
+ smoke:ui    1   179.94 207.94
+ lwt:smoke   1   180.89 208.89
- race        2   186.99 208.99
<none>            183.00 209.00
+ ui:ptd      1   181.42 209.42
+ lwt:ui      1   181.90 209.90
+ ht:ptd      1   182.06 210.06
- smoke       1   186.11 210.11
+ age:smoke   1   182.16 210.16
+ race:ui     2   180.32 210.32
+ age:ptd     1   182.50 210.50
- ui          1   186.61 210.61
+ smoke:ht    1   182.71 210.71
+ lwt:ptd     1   182.75 210.75
+ smoke:ptd   1   182.82 210.82
+ age:ht      1   182.90 210.90
+ age:ui      1   182.96 210.96
+ age:lwt     1   183.00 211.00
+ lwt:ht      1   183.00 211.00
+ race:smoke  2   181.23 211.23
+ lwt:ftv     2   181.44 211.44
+ ptd:ftv     2   181.57 211.57
+ age:race    2   181.62 211.62
+ smoke:ftv   2   181.65 211.65
+ ht:ftv      2   181.82 211.82
+ lwt:race    2   182.55 212.55
+ race:ht     2   182.78 212.78
+ race:ptd    2   182.85 212.85
- lwt         1   188.88 212.88
+ ui:ftv      2   182.94 212.94
- ht          1   190.13 214.13
- ptd         1   191.05 215.05
+ race:ftv    4   181.69 215.69
- age:ftv     2   195.48 217.48
Step: AIC= 207.94
low ~ age + lwt + race + smoke + ht + ui + ptd + ftv + age:ftv +
smoke:ui
             Df Deviance    AIC
- race        2   183.07 207.07
<none>            179.94 207.94
+ lwt:smoke   1   178.34 208.34
+ ht:ptd      1   178.89 208.89
- smoke:ui    1   183.00 209.00
+ ui:ptd      1   179.07 209.07
+ age:ptd     1   179.35 209.35
+ age:smoke   1   179.37 209.37
+ smoke:ptd   1   179.58 209.58
+ lwt:ptd     1   179.61 209.61
+ lwt:ui      1   179.76 209.76
+ age:ht      1   179.78 209.78
+ smoke:ht    1   179.82 209.82
+ age:lwt     1   179.84 209.84
+ age:ui      1   179.86 209.86
+ lwt:ht      1   179.94 209.94
+ lwt:ftv     2   178.25 210.25
+ ptd:ftv     2   178.53 210.53
+ smoke:ftv   2   178.64 210.64
+ race:smoke  2   178.73 210.73
+ age:race    2   178.84 210.84
+ ht:ftv      2   178.89 210.89
+ race:ui     2   179.13 211.13
+ ui:ftv      2   179.50 211.50
+ race:ht     2   179.52 211.52
+ lwt:race    2   179.68 211.68
+ race:ptd    2   179.86 211.86
- lwt         1   187.15 213.15
- ht          1   187.66 213.66
+ race:ftv    4   178.51 214.51
- ptd         1   188.83 214.83
- age:ftv     2   193.76 217.76
Step: AIC= 207.07
low ~ age + lwt + smoke + ht + ui + ptd + ftv + age:ftv + smoke:ui
             Df Deviance    AIC
<none>            183.07 207.07
+ lwt:smoke   1   181.40 207.40
+ ui:ptd      1   181.88 207.88
+ ht:ptd      1   181.93 207.93
+ race        2   179.94 207.94
+ age:smoke   1   181.97 207.97
+ age:ht      1   182.64 208.64
+ age:ptd     1   182.69 208.69
+ lwt:ptd     1   182.73 208.73
+ lwt:ui      1   182.76 208.76
+ smoke:ptd   1   182.85 208.85
+ age:lwt     1   182.92 208.92
- smoke:ui    1   186.99 208.99
+ age:ui      1   182.99 208.99
+ smoke:ht    1   183.02 209.02
+ lwt:ht      1   183.06 209.06
+ smoke:ftv   2   181.48 209.48
+ lwt:ftv     2   181.69 209.69
+ ptd:ftv     2   181.85 209.85
+ ui:ftv      2   182.28 210.28
+ ht:ftv      2   182.41 210.41
- ht          1   191.21 213.21
- lwt         1   191.56 213.56
- ptd         1   193.59 215.59
- age:ftv     2   199.00 219.00
Summarize the model returned from the stepwise search
> summary(low.step)
Call:
glm(formula = low ~ age + lwt + smoke + ht + ui + ptd + ftv +
age:ftv + smoke:ui, family = binomial)
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.582389   1.420834  -0.410 0.681885
age          0.075538   0.053945   1.400 0.161428
lwt         -0.020372   0.007488  -2.721 0.006513 **
smoke1       0.780047   0.420043   1.857 0.063302 .
ht1          2.065680   0.748330   2.760 0.005773 **
ui1          1.818496   0.666670   2.728 0.006377 **
ptd1         1.560304   0.496626   3.142 0.001679 **
ftv1         2.921068   2.284093   1.279 0.200941
ftv2+        9.244460   2.650495   3.488 0.000487 ***
age:ftv1    -0.161823   0.096736  -1.673 0.094360 .
age:ftv2+   -0.411011   0.118553  -3.467 0.000527 ***
smoke1:ui1  -1.916644   0.972366  -1.971 0.048711 *
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 183.07 on 177 degrees of freedom
AIC: 207.07
Number of Fisher Scoring iterations: 4
This is the model used to demonstrate model interpretation in the presence of
interactions.
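As a small illustration of what interpretation in the presence of an interaction involves, the sketch below computes the odds ratio for smoke = 1 versus smoke = 0 separately at ui = 0 and ui = 1. It uses only the smoke1 and smoke1:ui1 estimates copied from the summary(low.step) output; the variable names are made up for this example.

```r
# Estimates copied from the summary(low.step) output
b_smoke    <- 0.780047    # smoke1
b_smoke_ui <- -1.916644   # smoke1:ui1

# Odds ratio for smoke = 1 vs smoke = 0, other covariates held fixed
or_smoke_ui0 <- exp(b_smoke)                # when ui = 0
or_smoke_ui1 <- exp(b_smoke + b_smoke_ui)   # when ui = 1

round(c(ui0 = or_smoke_ui0, ui1 = or_smoke_ui1), 2)
```

The two estimated odds ratios are about 2.18 and 0.32. Because of the smoke:ui term, the smoking effect cannot be summarized by exp(0.780047) alone. These are point estimates only; confidence intervals would be needed before drawing conclusions.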
An alternative to the full-blown search above is to consider adding a single interaction
term, chosen from the set of all possible two-way interactions, to the “Base Model”.
> add1(low.glm,scope=~.^2)
Single term additions
Model:
low ~ age + lwt + race + smoke + ht + ui + ptd + ftv
           Df Deviance    AIC
<none>          195.48 217.48
age:lwt     1   195.46 219.46
age:race    2   194.58 220.58
age:smoke   1   194.61 218.61
age:ht      1   194.59 218.59
age:ui      1   195.47 219.47
age:ptd     1   194.58 218.58
age:ftv     2   183.00 209.00 *
lwt:race    2   194.63 220.63
lwt:smoke   1   194.04 218.04
lwt:ht      1   195.44 219.44
lwt:ui      1   194.28 218.28
lwt:ptd     1   195.35 219.35
lwt:ftv     2   194.19 220.19
race:smoke  2   193.24 219.24
race:ht     2   195.19 221.19
race:ui     2   192.63 218.63
race:ptd    2   194.83 220.83
race:ftv    4   193.81 223.81
smoke:ht    1   195.03 219.03
smoke:ui    1   193.76 217.76
smoke:ptd   1   195.16 219.16
smoke:ftv   2   194.47 220.47
ht:ui       0   195.48 217.48
ht:ptd      1   194.55 218.55
ht:ftv      2   194.00 220.00
ui:ptd      1   194.24 218.24
ui:ftv      2   195.32 221.32
ptd:ftv     2   192.38 218.38
We can then “manually” add this term to our base model by using the update
command in R.
> low.glm2 <- update(low.glm,.~.+age:ftv)
> summary(low.glm2)
Call:
glm(formula = low ~ age + lwt + race + smoke + ht + ui + ptd +
ftv + age:ftv, family = binomial)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.0338  -0.7690  -0.4510   0.8354   2.3383
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.636485   1.558677  -1.050  0.29376
age          0.085461   0.055734   1.533  0.12519
lwt         -0.017599   0.007653  -2.300  0.02147 *
race2        0.994134   0.550962   1.804  0.07118 .
race3        0.700669   0.491400   1.426  0.15391
smoke1       0.792972   0.452303   1.753  0.07957 .
ht1          1.936204   0.747576   2.590  0.00960 **
ui1          0.938620   0.492240   1.907  0.05654 .
ptd1         1.373390   0.495738   2.770  0.00560 **
ftv1         2.877889   2.253710   1.277  0.20162
ftv2+        8.264965   2.594444   3.186  0.00144 **
age:ftv1    -0.149619   0.096342  -1.553  0.12043
age:ftv2+   -0.359454   0.115429  -3.114  0.00185 **
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 183.00 on 176 degrees of freedom
AIC: 209
Number of Fisher Scoring iterations: 4
Next we could use add1 to consider the remaining interaction terms for addition to this
model.
> add1(low.glm2,scope=~.^2)
Single term additions
Model:
low ~ age + lwt + race + smoke + ht + ui + ptd + ftv + age:ftv
           Df Deviance    AIC
<none>          183.00 209.00
age:lwt     1   183.00 211.00
age:race    2   181.62 211.62
age:smoke   1   182.16 210.16
age:ht      1   182.90 210.90
age:ui      1   182.96 210.96
age:ptd     1   182.50 210.50
lwt:race    2   182.55 212.55
lwt:smoke   1   180.89 208.89 *
lwt:ht      1   183.00 211.00
lwt:ui      1   181.90 209.90
lwt:ptd     1   182.75 210.75
lwt:ftv     2   181.44 211.44
race:smoke  2   181.23 211.23
race:ht     2   182.78 212.78
race:ui     2   180.32 210.32
race:ptd    2   182.85 212.85
race:ftv    4   181.69 215.69
smoke:ht    1   182.71 210.71
smoke:ui    1   179.94 207.94 **
smoke:ptd   1   182.82 210.82
smoke:ftv   2   181.65 211.65
ht:ui       0   183.00 209.00
ht:ptd      1   182.06 210.06
ht:ftv      2   181.82 211.82
ui:ptd      1   181.42 209.42
ui:ftv      2   182.94 212.94
ptd:ftv     2   181.57 211.57
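The repeated add1/update cycle can be automated. The sketch below is one possible way to do so, not the handout's own procedure: it again assumes the data match MASS::birthwt, and the names bwt, fit, upper and base_aic and the factor recoding are hypothetical. Candidates with 0 degrees of freedom (such as ht:ui above) are skipped because they do not change the fit.

```r
library(MASS)  # ASSUMPTION: the handout's data match MASS::birthwt

# ASSUMPTION: factor coding chosen to mimic the coefficient names in the output
bwt <- with(birthwt, data.frame(
  low   = low,
  age   = age,
  lwt   = lwt,
  race  = factor(race),
  smoke = factor(smoke),
  ht    = factor(ht),
  ui    = factor(ui),
  ptd   = factor(as.integer(ptl > 0)),
  ftv   = cut(ftv, breaks = c(-Inf, 0, 1, Inf), labels = c("0", "1", "2+"))
))

fit <- glm(low ~ age + lwt + race + smoke + ht + ui + ptd + ftv,
           family = binomial, data = bwt)
base_aic <- AIC(fit)

# Fixed upper scope: all two-way interactions among the main effects
upper <- ~ (age + lwt + race + smoke + ht + ui + ptd + ftv)^2

repeat {
  cand <- add1(fit, scope = upper)
  current_aic <- cand$AIC[1]                 # first row is "<none>"
  ok <- !is.na(cand$Df) & cand$Df > 0        # skip "<none>" and 0-df terms
  if (!any(ok)) break
  best <- which(ok)[which.min(cand$AIC[ok])]
  if (cand$AIC[best] >= current_aic) break   # no addition lowers the AIC
  term <- rownames(cand)[best]
  fit <- update(fit, as.formula(paste(". ~ . +", term)))
}

formula(fit)
```

If the data do match, the tables in this handout suggest the loop would add age:ftv, then smoke:ui, and stop at AIC 207.94, since the best remaining addition (lwt:smoke) has AIC 208.34. The stepwise search reaches 207.07 only because it is also allowed to drop race.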