Regression with a Dichotomous Dependent Variable: An Introduction to Logistic Regression

Prepared by Steven Prus, Carleton University

To accompany Statistics: A Tool for Social Research, Second Canadian Edition, by Joseph F. Healey and Steven G. Prus

COPYRIGHT © 2013 by Nelson Education Ltd. Nelson is a registered trademark used herein under licence. All rights reserved.

For more information, contact Nelson, 1120 Birchmount Road, Toronto, ON M1K 5G4. Or you can visit our Internet site at www.nelson.com.

ALL RIGHTS RESERVED. No part of this work covered by the copyright hereon may be reproduced or used in any form or by any means (graphic, electronic, or mechanical, including photocopying, recording, taping, web distribution or information storage and retrieval systems) without the written permission of the publisher.
Regression with a Dichotomous Dependent
Variable: An Introduction to Logistic Regression
Introduction
In the on-line chapter titled “Linear Regression with Dummy Variables,” we looked at how to
use a dichotomous (dummy) independent variable in linear regression. It is also possible for
the dependent variable to be dichotomous. In this situation we commonly use a
statistical technique similar to linear regression called logistic regression.
Logistic regression shares the same objective as linear regression. Logistic and linear regression
are statistical methods used to predict the dependent variable based on one or more independent
variables. They both produce a regression model to summarize the relationship between these
variables, as well as other statistics to describe how well the model fits the data. Accordingly,
you may wish to review the material on linear regression in Chapters 13 and 14 before reading
any further.
While linear and logistic regression are forms of regression analysis, the former is used for the
modeling of an interval-ratio dependent variable and the latter for a dichotomous (dummy)
variable. Both procedures require the independent variable to be measured at the interval-ratio
level, though nominal and ordinal independent variables can be converted to a set of dummy
variables. Here, we will consider the simplest (bivariate) form of logistic regression, though
like linear regression, it can be extended to include more than one independent variable.
Probabilities, Odds, and Logits
To understand logistic regression, we must first understand the concepts of probabilities, odds,
and logits. Each concept is considered in turn.
Probabilities In logistic regression, the dependent variable is a dichotomous variable, coded as
1 for the category of “interest” or “success” and 0 for the “reference” or “failure” category.
Specifically, we are interested in the probability that the dependent variable, Y, equals 1. (The category of interest/success is arbitrarily selected by the researcher, so the researcher chooses which category to code as 1.)
As defined in Chapter 4 of your textbook, a probability, p, is the likelihood that a particular
event will occur. The probability of an event, which ranges from 0 to 1, is expressed by the
ratio of the number of actual occurrences of the event to the total number of possible
occurrences of the event:
FORMULA 1
Probability (p) = number of occurrences of event / number of possible occurrences of event
For example, suppose we are interested in the probability that the variable self-rated health, Y,
equals 1 (where 1 = good health and 0 = poor health). If we have 1,000 people in our study, and
750 are coded as 1 and 250 coded as 0, then the proportion of the sample with good health is
0.75. This value is also the probability, p, that Y equals 1 (i.e., the proportion and probability of
Y = 1 are the same):
p = 750 / 1,000 = 0.75
Odds Odds are an alternative way of expressing the chance that a particular event will occur,
and are directly related to probabilities. While probabilities measure chance by using the “ratio
of occurrence to the whole," odds measure chance by using the "ratio of occurrence to non-occurrence." That is, odds are the ratio of the probability that an event will occur to the
probability that it will not occur:
FORMULA 2
odds = p / (1 - p)
where p = probability of Y = 1.
Continuing with our previous example, if the probability of good health (Y = 1) is 0.75, then the
odds of good health occurring are:

odds = 0.75 / (1 - 0.75) = 0.75 / 0.25 = 3
So, the odds of good health are 3 to 1, often written as 3:1.
As another example, if the probability of rain tomorrow (Y = 1) is 0.80 (i.e., if there's an 80%
chance of rain tomorrow), then the odds of rain occurring tomorrow are 4 to 1:

odds = 0.80 / (1 - 0.80) = 0.80 / 0.20 = 4
Odds Ratio Extending our discussion of odds, we call the ratio of two odds an “odds ratio,”
symbolized as OR. That is, the odds ratio is the ratio of the odds of an event occurring (odds
that Y =1) in one group to the odds of the event occurring (odds that Y =1) in another group.
The groups might be men and women, lower educated persons and higher educated persons,
and so on. We write the formula for the odds ratio as:
FORMULA 3
odds ratio (OR) = [p1 / (1 - p1)] / [p2 / (1 - p2)] = odds1 / odds2
where p1 = probability of Y = 1 in Group 1
p2 = probability of Y = 1 in Group 2
For example, let's say that 90% of persons with a post-secondary education (Group 1) and 75%
of persons with a high-school education or less (Group 2) have good health. Thus, the odds of
good health for persons with a post-secondary education are 0.90/0.10 = 9 and the odds of good
health for persons with a high-school education or less are 0.75/0.25 = 3. Hence, the odds ratio is 9/3 = 3; that is, the odds of a person with a post-secondary education having good health are 3 times higher than the odds of a person with a high-school education or less:

odds ratio (OR) = [0.90 / (1 - 0.90)] / [0.75 / (1 - 0.75)] = 9 / 3 = 3
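The same arithmetic extends directly to the odds ratio; a short Python sketch of the good-health example (our own illustration) follows:

```python
# Odds ratio of good health: post-secondary (Group 1) vs. high school or less (Group 2).
p1 = 0.90                      # probability of good health, post-secondary education
p2 = 0.75                      # probability of good health, high school or less

odds1 = p1 / (1 - p1)          # 9.0
odds2 = p2 / (1 - p2)          # 3.0
odds_ratio = odds1 / odds2     # Formula 3

print(f"odds1 = {odds1:.0f}, odds2 = {odds2:.0f}, OR = {odds_ratio:.0f}")   # 9, 3, 3
```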
Odds Ratio vs. Relative Risk  As a caveat to this discussion, people sometimes mistake the odds ratio (OR) for the risk ratio, also called the relative risk (RR), though they are different statistical concepts.

The risk ratio is the ratio of the probability of the event occurring in one group relative to another group, while the odds ratio is the ratio of the odds of the event occurring in one group relative to another group. Hence, the RR is a ratio of probabilities and the OR a ratio of odds:

FORMULA 4
odds ratio (OR) = [p1 / (1 - p1)] / [p2 / (1 - p2)] = odds1 / odds2

FORMULA 5
risk ratio (i.e., relative risk) (RR) = p1 / p2
where p1 = probability of Y = 1 in Group 1
p2 = probability of Y = 1 in Group 2
Let’s suppose that we are interested in the odds and risks of unemployment for persons under
the age of 25 (Group 1) and for persons 25 years or older (Group 2), and we collect the
following data:
                <25    25+
Employed         40     80
Unemployed       60     20
Total           100    100
So, if the probability of unemployment for persons <25, p1, is 0.60 (i.e., 60/100) and the probability of unemployment for persons 25+, p2, is 0.20 (i.e., 20/100), the relative risk (RR) is 3:

RR = p1 / p2 = 0.60 / 0.20 = 3
That is, the probability (or risk) of unemployment for persons <25 is 3 times higher than the probability (risk) of unemployment for persons 25+ (i.e., a person <25 is 3 times more likely to be unemployed than a person 25+).
On the other hand, if the odds of unemployment for persons <25, odds1, are 1.5 (i.e., 0.60 / (1 - 0.60)) and the odds of unemployment for persons 25+, odds2, are 0.25 (i.e., 0.20 / (1 - 0.20)), then the odds ratio (OR) is 6:

OR = odds1 / odds2 = 1.5 / 0.25 = 6

That is, the odds of unemployment for persons <25 are 6 times higher than the odds of unemployment for persons 25+.
Clearly, the OR and RR are different from each other. When p is small, the OR and RR will be similar; otherwise, as in our example, they are not at all similar and must not be confused. It is not uncommon, however, for people to confuse them; you cannot say that an odds ratio of 6 means that "a person <25 is 6 times more likely to be unemployed than a person 25+." This is confusing and wrong, since saying "6 times more likely" implies a ratio of probabilities (that is, an RR) and not a ratio of odds (OR). Logistic regression results are interpreted using the OR, so it is important to remember its precise definition.
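The distinction is easy to check by computing both statistics from the unemployment table above; this short Python sketch (ours, with illustrative variable names) does exactly that:

```python
# 2x2 unemployment table from the example: counts by age group.
unemployed_under_25, total_under_25 = 60, 100
unemployed_25_plus, total_25_plus = 20, 100

p1 = unemployed_under_25 / total_under_25    # 0.60
p2 = unemployed_25_plus / total_25_plus      # 0.20

rr = p1 / p2                                 # Formula 5: relative risk  -> 3
or_ = (p1 / (1 - p1)) / (p2 / (1 - p2))      # Formula 4: odds ratio     -> 6

print(f"RR = {rr:.0f}, OR = {or_:.0f}")      # RR = 3, OR = 6: not the same thing
```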
Logits In logistic regression, for reasons described in the next section, the dependent variable is
transformed from a probability to what is called a logit. By taking the natural logarithm,
symbolized as ln, of the odds of an event occurring (i.e., odds that Y = 1), we get the “natural
logarithm of the odds of an event occurring” (i.e., natural logarithm of the odds that Y = 1), or
what we simply call the “log odds” or “logit”. It is sometimes written as logit(p), but we will
refer to it as logit:
FORMULA 6
logit = ln(p / (1 - p))
where p = probability of Y = 1
ln = natural logarithm
But this invites the question, “what is a natural logarithm?” To answer the question let’s first
define a “logarithm.”
Generally speaking, the logarithm of a number is the exponent (or power) by which a given
value, called the base, has to be raised to produce that number. As an example, the logarithm of
the number 100 to base 10 is 2 (i.e., 100 is 10 to the power 2, or 10 × 10), which is written as log10(100) = 2. In this example the number is 100, the base is 10, and the logarithm is 2. Continuing with this example, the logarithm of the number 1,000 to base 10 is 3, or 10 × 10 × 10 [log10(1,000) = 3]; the logarithm of the number 10,000 to base 10 is 4, or 10 × 10 × 10 × 10 [log10(10,000) = 4]; and so on. This logarithm, with base 10, is often called the "common" logarithm.
Another frequently used base is 2. For example, the logarithm of the number 8 with base 2 is 3, since 2 × 2 × 2 = 8. That is, we need to multiply 2 by itself three times to get 8. This is written as log2(8) = 3.
Finally, besides 2 and 10, another commonly used base is e, which is approximately 2.718. (e, often considered the most important number in all of mathematics, is a mathematical constant like pi, π. A mathematical constant is a number that occurs in many areas of mathematics, is significantly important in some way, and is completely fixed, as opposed to a mathematical variable, whose symbol stands for a value that may vary; e.g., in the algebraic expression "x² + y²", x and y are variables since the numbers they represent can vary.) A logarithm with base e is called a "natural" logarithm. It is written as loge or simply ln. As we will see in the next section, the natural logarithm plays a central role in logistic regression.
Continuing with our earlier examples, if the odds of good health occurring (odds that Y = 1) are
3, then the natural logarithm of the odds (logit) is about 1.10, or simply written as “ln(3) =
1.10”:
logit = ln(0.75 / (1 - 0.75)) = ln(0.75 / 0.25) = ln(3) = 1.10

In other words, e raised to the power of about 1.10 gives the number 3, or 2.718^1.10 ≈ 3.
And, if the odds of rain tomorrow are 4, then the natural logarithm of the odds is about 1.39:

logit = ln(0.80 / (1 - 0.80)) = ln(0.80 / 0.20) = ln(4) = 1.39

That is, 2.718^1.39 ≈ 4.
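These natural-logarithm calculations are easy to verify with Python's math module (an illustrative snippet of ours, not part of the chapter):

```python
import math

# Logit (log odds) for the good-health and rain examples.
for p in (0.75, 0.80):
    odds = p / (1 - p)
    logit = math.log(odds)                    # natural logarithm, ln
    print(f"p = {p:.2f}  odds = {odds:.2f}  ln(odds) = {logit:.2f}")
    # Exponentiating the logit recovers the odds: e**1.10 is about 3, e**1.39 about 4.
    print(f"   e**logit = {math.exp(logit):.2f}")
```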
Summary
Let’s review what we have learned so far:
1. A probability is the number of times an event actually occurs divided by the number of times
an event might occur.
2. An odds is the probability that an event will occur divided by the probability that an event
will not occur. An odds ratio is the ratio of the odds for one group divided by the odds for the
other group.
3. A log odds (logit) is the natural logarithm of the odds of an event occurring.
Also take note of the following relationships between the log odds, odds, and probability:
1. logit = 0 corresponds to odds = 1 and probability = 0.50
2. logit < 0 corresponds to odds < 1 and probability < 0.50
3. logit > 0 corresponds to odds > 1 and probability > 0.50
What this means is, for example, that when the logit is equal to 0, the odds will be equal to 1
and the probability will be equal to 0.50.
Overview of Logistic Regression
Why Use Logistic Regression?
Now that you understand logits, odds, and probabilities,
let’s begin our examination of logistic regression with an illustration. Suppose we want to
analyze the relationship between how much students prepare for a final exam and the final
exam results. Table 1 presents the hypothetical scores of 65 students in a course on each of the
two variables: actual number of hours spent studying for the final exam, X, and the result of the
final exam, Y, where 1 = pass and 0 = fail. Given the pass/fail dichotomy of the dependent
variable (it’s a dummy variable), logistic regression can be used to analyze this relationship.
Table 1 Number of Hours Studied for Final Exam, X, and Final Exam Result, Y. (Raw data)
Case   X  Y     Case   X  Y     Case   X  Y     Case   X  Y     Case   X  Y
  1.   0  0      14.   7  0      27.  15  0      40.  30  0      53.  39  1
  2.   0  0      15.   7  0      28.  18  0      41.  30  0      54.  40  1
  3.   0  0      16.   7  0      29.  18  1      42.  32  0      55.  40  1
  4.   1  0      17.   8  0      30.  19  0      43.  32  1      56.  40  1
  5.   1  0      18.  10  0      31.  20  0      44.  33  1      57.  40  1
  6.   2  0      19.  10  0      32.  20  0      45.  34  1      58.  42  1
  7.   2  0      20.  10  0      33.  21  0      46.  35  0      59.  42  1
  8.   3  0      21.  10  1      34.  24  1      47.  35  1      60.  45  1
  9.   5  0      22.  11  0      35.  25  0      48.  35  1      61.  45  1
 10.   5  0      23.  12  0      36.  26  0      49.  36  1      62.  48  1
 11.   5  0      24.  14  0      37.  28  1      50.  37  1      63.  48  1
 12.   6  0      25.  14  0      38.  28  1      51.  38  1      64.  49  1
 13.   6  0      26.  15  0      39.  29  1      52.  39  1      65.  49  1
A quick glance at Table 1 suggests that the two variables are related—students who spend more
time studying for the exam are more likely to pass the exam. While it is completely appropriate
to use logistic regression to analyze the relationship using the raw scores of the students in
Table 1, it is easier to explain and visualize logistic regression if the independent variable is
collapsed into categories, and, as such, we calculate the proportion of students who pass the
exam at different categories of hours of study.
Table 2 shows the data from Table 1 grouped into ten equal categories of 5-hour intervals, and
provides the following information:
1. the independent variable X, the number of hours spent studying for the final exam;
2. the dependent variable Y, the result of the final exam, where 0 = fail and 1 = pass;
3. the total number of cases (students) of Y in each category of X, n;
4. the proportion or probability of Y = 1 or "pass" for each category of X, p. (p is calculated as the ratio of the number of cases of Y = 1 to the total number of cases of Y for that category. For example, the proportion of students with the score 1 on the dependent variable in the 25-29 hour category is 0.60, or 3 ÷ 5.)

Note that the interval width of 5 hours was arbitrarily selected; an interval of another size, such as a 2- or 3-hour interval, could just as easily have been used for this example. We also arbitrarily selected "pass" as the category of interest (Y = 1); we could alternatively have coded the dependent variable as 1 for "fail" and 0 for "pass." As a general rule of thumb, the value coded 1 on the dependent variable should represent the desired outcome or category of "interest" to the researcher.
Table 2 Number of Hours Studied, X, Final Exam Result, Y, and Probability of Y = 1, p, for each Category of X. (Grouped data based on Table 1).
X                 Y = 0    Y = 1      n        p
(Hours Studied)   (fail)   (pass)            (Y = 1)
0-4                  8        0       8      0.000
5-9                  9        0       9      0.000
10-14                7        1       8      0.125
15-19                4        1       5      0.200
20-24                3        1       4      0.250
25-29                2        3       5      0.600
30-34                3        3       6      0.500
35-39                1        7       8      0.875
40-44                0        6       6      1.000
45-49                0        6       6      1.000
To get a visual sense of the relationship between the two variables in Table 2, Figure 1 plots the data in the form of a scattergram. The figure plots the probability or proportion, p, of observations with the value 1 (pass) on the dependent variable (Y = 1), arrayed along the vertical axis, for each category of the independent variable (hours of study), X, arrayed along the horizontal axis. (Note that we are plotting the proportion of students with the score 1 on the dependent variable from Table 2, not the actual raw scores of students from Table 1.) So, for example, the scattergram shows that the proportion of students who spent between 25 and 29 hours studying for the exam and passed the exam is 0.60. In other words, the probability of passing the exam for those who study between 25 and 29 hours is 0.60. (Remember, the proportion and probability of Y = 1 are the same.)
Figure 1 Probability of Passing Exam, p, on Hours of Study, X. (Based on data from Table 2).
[Scattergram: p, from 0.0 to 1.0 on the vertical axis, plotted against X, hours of study, from 0 to 50 on the horizontal axis.]
Figure 1 clearly reveals that the two variables are related, though the observation points in the
scattergram form a pattern that is non-linear—Figure 1 shows that the variables have a
curvilinear relationship. In fact, it is often the case that when we plot the relationship between a
dichotomous dependent variable and an independent variable, the relationship is curvilinear.
So, what approach should we use to model a relationship with a dichotomous dependent
variable? While the relationship between number of hours studied and exam results is nonlinear, it is still possible to fit it with an ordinary linear (straight-line) regression model (see
Chapter 13 for more detail on the linear regression model). This model indeed provides a good
fit as Figure 2 illustrates.
Nonetheless, it might be apparent to you that we should not use a linear regression model to fit
a relationship between a dichotomous dependent variable and an independent variable. For one
thing, and importantly, a linear regression model will predict probabilities of less than 0 or
greater than 1 as you move far enough down or up the X-axis. In our example, if we fit the data
with a linear regression model, the predicted probability of passing the final exam for those
who study 70 hours would be about 1.65. Such a probability is not logical or meaningful.
Remember, a probability of an event ranges from 0 to 1.
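One way to see this numerically (our own illustration, not from the chapter) is to fit an ordinary least-squares line to the grouped proportions in Table 2, here using the interval midpoints as the X values, which is one of several reasonable coding choices, and then evaluate the line at 70 hours. The exact prediction depends on how X is coded and whether the fit is weighted, but it comes out well above 1, consistent with the chapter's figure of about 1.65:

```python
import numpy as np

# Grouped data from Table 2: interval midpoints for hours studied and proportion passing.
x_mid = np.arange(2, 50, 5)                    # 2, 7, 12, ..., 47 (one midpoint per 5-hour interval)
p_obs = np.array([0.000, 0.000, 0.125, 0.200, 0.250,
                  0.600, 0.500, 0.875, 1.000, 1.000])

# Ordinary least-squares straight line: p = a + b*X.
b, a = np.polyfit(x_mid, p_obs, deg=1)

# The "probability" predicted at 70 hours of study falls well outside the 0-1 range.
print(f"a = {a:.3f}, b = {b:.3f}, predicted p at X = 70: {a + b * 70:.2f}")
```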
Figure 2 Probability of Passing Exam, p, on Hours of Study, X, with Fitted Linear and Logistic
Regression Models. (Based on data from Table 2).
[Scattergram of p against X with the two fitted models superimposed: a straight line labelled "Linear Regression Model" and an S-shaped curve labelled "Logistic Regression Model." The vertical axis runs from -0.2 to 1.2 and the horizontal axis from 0 to 50.]
A model called the logistic regression model, on the other hand, uses an S-shaped curve
(sometimes referred to as a sigmoid or logistic curve) to fit the relationship between a
dichotomous dependent variable and an independent variable. This curve tapers off at 0 on one
side and at 1 on the other and thus constrains predicted probabilities to the 0 to 1 range—
predicted probabilities are between 0 and 1.
This is evident in Figure 2, which compares the linear regression model (straight-line) and
logistic regression model (S-shaped curve) of the probability of passing the exam (Y = 1) on
hours of study, X. Arrows have been added to the models to show direction of prediction of
each model; points in the scattergram are actual observations. The logistic regression model
obviously provides a superior fit over the linear regression model when the relationship is S-shaped and the dependent variable is restricted to a 0-1 range. This fit is precisely the reason why
we use logistic regression to examine a relationship with a dichotomous dependent variable.
Logistic Regression Model The logistic curve (logistic regression model) shown in Figure 2
is expressed mathematically in the following equation:
FORMULA 7
p = e^(a+bX) / (1 + e^(a+bX))
where p = probability of Y = 1
e = base of the natural logarithm (approximately equal to 2.718)
a and b = parameters of the logistic model
Because of the curvilinear relationship between p and X, the value for b in this model (Formula
7) does not have the same clear-cut interpretation that the b value has in the ordinary linear
regression model.
The same problem also arises if we transform the dependent variable from a probability to an
odds. The relationship between odds and X is non-linear with the equation:
FORMULA 8
p / (1 - p) = e^(a+bX)
where p / (1 - p) = odds of Y = 1
e = base of the natural logarithm
a and b = parameters of the logistic model
There is a way, however, to "linearize" the logistic regression model (i.e., to convert it from an
S-shaped model to a straight-line model) so that the b coefficient in this model has the same
straightforward interpretation as it does in ordinary linear regression. We linearize the logistic
regression model by transforming the dependent variable from a probability to a logit (log
odds). This transformed logistic regression model is represented with the following equation:

FORMULA 9
ln(p / (1 - p)) = a + bX
where ln(p / (1 - p)) = log odds (logit) of Y = 1
a and b = parameters of the logistic model

The logit, ln(p / (1 - p)), is therefore called a link function since it provides a linear transformation of
the logistic regression model. More specifically, the log transformation of the probability
values allows us to create a link with the ordinary linear regression model. Logistic regression
is thus considered a “generalized” linear model.
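The three equivalent forms of the model (Formulas 7, 8, and 9) can be written as simple functions. The following Python sketch is our own illustration, with made-up parameter values a and b chosen only to show how the probability, odds, and logit forms relate:

```python
import math

def predicted_logit(x, a, b):
    """Formula 9: the logit (log odds) is a linear function of X."""
    return a + b * x

def predicted_odds(x, a, b):
    """Formula 8: exponentiating the logit gives the odds, e**(a + bX)."""
    return math.exp(a + b * x)

def predicted_prob(x, a, b):
    """Formula 7: the S-shaped logistic curve, always between 0 and 1."""
    return math.exp(a + b * x) / (1 + math.exp(a + b * x))

# Made-up parameters, for illustration only: as X grows, the logit rises in a straight
# line, the odds grow multiplicatively, and the probability follows an S-shaped curve.
a, b = -3.0, 0.15
for x in (0, 10, 20, 30, 40, 50):
    print(x,
          round(predicted_logit(x, a, b), 2),
          round(predicted_odds(x, a, b), 2),
          round(predicted_prob(x, a, b), 2))
```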
Continuing with our example data from Table 2, Figure 3 summarizes this transformation. Each
graph illustrates the shape of the relationship between the independent variable and dependent
variable, along with the equations that produce each graph. As we convert probabilities to
logits, the relationship transforms from an S-shaped curve to a straight line and the range of
values changes from 0 to 1 for probabilities to -∞ to +∞, negative infinity to positive infinity,
for logits.
Figure 3 Logistic Regression of Predicted Probability (Graph A), Predicted Odds (Graph B), and Predicted Logit (Graph C) of
Passing Exam on Hours of Study, X. (Models fitted from data in Table 2).
[Figure 3, Graph A: the predicted probability, p = e^(a+bX) / (1 + e^(a+bX)), plotted against X as an S-shaped curve (probability axis 0.0 to 1.0). Graph B: the predicted odds, p / (1 - p) = e^(a+bX), plotted against X as an upward-curving line (odds axis 0 to 40). Graph C: the predicted logit, ln(p / (1 - p)) = a + bX, plotted against X as a straight line (logit axis -5 to 5). X runs from 0 to 50 in each graph.]
Given that the relationship between X (independent variable) and the log odds or logit (dependent variable) is now linear (i.e., ln(p / (1 - p)) = a + bX), the a and b coefficients in the
logistic regression model are interpreted following the same logic used when interpreting
ordinary linear regression coefficients, but taking into account that the quantity represented on
the left-hand side of the equation is the log odds (logit) of dependent variable and not the
dependent variable itself as it is in ordinary linear regression.
More specifically, as discussed in Chapter 13 of your textbook, the b, or slope coefficient, in
ordinary linear regression is interpreted as the amount of change predicted in the dependent
variable for a one-unit change in the independent variable—the slope is expressed in the units
of the dependent variable. In logistic regression, the slope coefficient, b, is interpreted in the
same way, except the dependent variable is measured in log odds, so b tells us the amount of
change in the predicted log odds for a one-unit change in the independent variable. Likewise,
just as the intercept (or constant), a, in the ordinary linear regression model tells us the value of
Y when X is zero, the a coefficient in the logistic regression model tells us the value of the log
odds when X is zero.
Estimating the Logistic Regression Model
Since the logistic regression model has
been linearized, it may be tempting to use the least-squares method used in ordinary linear
regression (again, see Chapter 13) to obtain estimates of the a and b parameters in the logistic
regression model (Formula 9). Instead, for reasons we will not consider here, the parameters a
and b are commonly estimated (by many statistical software packages including SPSS) by a
method called “maximum likelihood estimation.”
Maximum likelihood estimation is a way of finding the parameter values that give the smallest possible difference between the observed and predicted values. It uses an iterative numerical process, trying successive solutions until it finds the one with the smallest possible difference. It should be pointed out that in many situations there is little difference between the results of the two estimation methods, and in certain cases ordinary least-squares estimation and maximum likelihood estimation produce the same estimates.
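As an illustration only (the chapter itself uses SPSS, and this sketch is ours), grouped data such as Table 2 can be fitted by maximum likelihood in Python using the statsmodels package, which accepts the pass/fail counts for a binomial model with a logit link:

```python
import numpy as np
import statsmodels.api as sm

# Grouped exam data from Table 2: one row per 5-hour category, coded 1 to 10.
category = np.arange(1, 11)
passes = np.array([0, 0, 1, 1, 1, 3, 3, 7, 6, 6])
fails  = np.array([8, 9, 7, 4, 3, 2, 3, 1, 0, 0])

# A binomial GLM with the default logit link is a logistic regression; statsmodels
# fits it by maximum likelihood (iteratively reweighted least squares).
endog = np.column_stack([passes, fails])    # successes and failures per category
exog = sm.add_constant(category)            # adds the intercept term a
result = sm.GLM(endog, exog, family=sm.families.Binomial()).fit()

print(result.params)    # estimates of a (constant) and b (slope), on the log-odds scale
print(result.bse)       # their standard errors
```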
Assumptions of Logistic Regression
Logistic regression fortunately does not make
many of the assumptions of ordinary linear regression such as normality and homoscedasticity.
Nonetheless, a few key assumptions of logistic regression need to be met:
1. the dependent variable must be a dichotomous (dummy) variable with just two values. (Note
that a closely related method, called multinomial logistic regression, allows for more than two
categories on the dependent variable. Multinomial logistic regression can be thought of as
running a series of logistic regression models, where the number of models is equal to the
number of categories minus 1. We will not consider this method here).
2. the logit is linearly related to the independent variable. (Note, the independent and dependent
variables do not have to be linearly related; the independent variable must be related linearly to
only the log odds).
3. the observations must be independent from each other.
4. there should be little multicollinearity: the independent variables should be independent of each other.
A Second Look at Logistic Regression
Let’s take another look at logistic regression by considering a hypothetical example of a
university that received 900 applications for admission to its school. Table 3 below provides
information about the applicants on their overall high school average (HS average), X, and
whether or not they were offered admission to the university (admission), Y, coded as 1 for
“yes” and 0 for “no.” 5 As we did in our previous example, we again group the independent
variable into categories. 6 As a result, we calculate the proportion of applicants with a 1 or
“yes” (p of Y = 1) for each level of the independent variable, X, as shown in Table 3, as well as
the associated odds and log odds (logits) of Y = 1.
Table 3 High School Average, X, University Admission, Y, and Probability, Odds, and Log
Odds of Y = 1 for each Category of X.
X              Y = 0   Y = 1       n          p      Odds     Logit
(HS average)   (no)    (yes)  (observations) (Y = 1) (Y = 1)  (Y = 1)
1. <60           60      40       100        0.40     0.67    -0.41
2. 60-64         59      41       100        0.41     0.69    -0.36
3. 65-69         50      50       100        0.50     1.00     0.00
4. 70-74         37      63       100        0.63     1.70     0.53
5. 75-79         33      67       100        0.67     2.03     0.71
6. 80-84         21      79       100        0.79     3.76     1.32
7. 85-89         20      80       100        0.80     4.00     1.39
8. 90-94         10      90       100        0.90     9.00     2.20
9. 95+            8      92       100        0.92    11.50     2.44
Note that the way the categories of the dependent variable are coded affects the direction and sign of the logistic regression coefficients: the results will differ depending on whether admission is coded as 1 for "yes" and 0 for "no" versus 1 for "no" and 0 for "yes." As previously mentioned, the value coded 1 on the dependent variable should represent the desired outcome or category of "interest" to the researcher.

We also emphasize again that it is just as appropriate to use logistic regression to analyze a relationship using the actual raw, ungrouped scores of the independent variable. In our hypothetical example, the raw scores (not shown here) would be the applicants' actual HS grade-point averages. The independent variable is grouped into categories in this example simply for ease of demonstration.
Since the dependent variable is a dichotomous dummy variable, we will use logistic regression to examine the relationship between HS average and admission. Of course the most practical way to conduct a logistic regression is to use specialized software such as SPSS. Thus, Table 4 provides an excerpt of the output for a logistic regression on the data in Table 3. The SPSS output contains more information than shown here, though for our purposes the most important statistics are provided in this table.

In the SPSS logistic regression command, the dichotomous dependent variable is regressed on an independent variable, which is treated the same way as it is in ordinary linear regression: an interval-ratio independent variable is entered directly into the logistic regression analysis, while nominal and ordinal independent variables are treated as dummy variables and entered together as a group into the analysis. A notable difference is that the SPSS logistic regression command will create dummy variables for you, while, as we saw in the "Linear Regression with Dummy Variables" chapter, you must first manually create dummy variables using the SPSS recode command before you run the SPSS ordinary linear regression command.
Table 4 SPSS Logistic Regression Output of Admission on HS Average, X.
                   B      S.E.      Wald     df    Sig.    Exp(B)
X (HS average)   .367     .033   123.284      1    .000     1.443
Constant        -.986     .160    37.958      1    .000      .373
Log Odds Ratio  Recall that the transformed logistic regression model is expressed in the equation ln(p / (1 - p)) = a + bX. Using the results in Table 4, we can write the equation for our example data as ln(p / (1 - p)) = -.986 + .367X. (SPSS puts the a and b logistic regression coefficients in the column labelled "B.")
Just like ordinary linear regression, we can use the logistic regression model to make
predictions of the dependent variable—to predict log odds of the dependent variable having a
value of 1. Furthermore, these logistic regression coefficients are interpreted in the same way as
the regression coefficients in ordinary linear regression, except that they are measured in units
of log-odds (on the logit scale).
So, the constant or intercept a is the point on the Y-axis crossed by the regression line when X is
0. In the present example, the coefficient for the constant, -0.986, is the predicted log odds of
being offered admission to the university when an applicant has a HS average of zero. Since the
a coefficient often has very little practical interpretation, we will not consider it further.
Much more useful is the slope coefficient b. We interpret the slope b as the amount of change in the predicted log odds that Y = 1 for each one-unit change in X. Thus, b is the log of the odds ratio, since it compares the log odds for one unit of the independent variable to the next.
In our example, we interpret the slope coefficient, 0.367, as the amount of change in the predicted log odds of admission to the university when HS average is increased by one unit. So, as we move up one category to the next on the HS average scale (for example, from category 4, a HS average of 70 to 74, to category 5, 75 to 79; see Table 3), the predicted log odds of admission increase by 0.367. (Hypothetically speaking, had the slope coefficient b been a negative value, -0.367, we would say that the predicted log odds of admission decrease by 0.367; thus, the log odds increase when b > 0 and decrease when b < 0.) Again, since we are comparing the log odds for one unit of the independent variable to the next, the slope coefficient b is the log odds ratio.
Next, we will want to know if the coefficient is statistically significant. The “Wald” statistic is
used to test the significance of the coefficients in the logistic regression model, and is
calculated as follows:
FORMULA 10
Wald statistic = (coefficient / SE of coefficient)^2
where SE = standard error

So, for the b coefficient in our example:

Wald statistic = (0.367 / 0.033)^2 ≈ 123
Each Wald statistic is compared with a chi-square distribution with 1 degree of freedom, df, though it is more convenient to assess the Wald statistic by looking at the significance value in the column labelled "Sig." in Table 4. If the value is less than 0.05, we reject the null hypothesis that the variable makes no contribution (i.e., that its coefficient is zero). The Sig. value of .000 for HS average in Table 4 tells us that the probability that the observed relationship between HS average and admission occurred by chance alone is very low.
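Formula 10 and the associated significance test are easy to reproduce with a short Python snippet (ours, using scipy for the chi-square tail probability); the small difference from the 123.284 reported by SPSS in Table 4 comes from rounding b and its standard error:

```python
from scipy.stats import chi2

b, se = 0.367, 0.033             # slope and standard error from Table 4
wald = (b / se) ** 2             # Formula 10
p_value = chi2.sf(wald, df=1)    # upper-tail probability of a chi-square with 1 df

print(f"Wald = {wald:.1f}, Sig. = {p_value:.3f}")   # Wald about 124; Sig. rounds to .000
```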
Odds Ratios (OR) Of course, the b coefficient is not very intuitive as it is in units of log
odds. The usual method of interpreting b in logistic regression analysis is to take its antilog, the
odds ratio, since it is easier to understand effects on the “odds” scale than the “log odds” scale.
(See the discussion at the beginning of this chapter for more information on the odds ratio). By
getting rid of the log, the logit becomes an odds. What is more, as we will see later, while the
log odds ratio, b, gives the additive (i.e., linear) effect on the logit, the odds ratio gives the
multiplicative effect on the odds.
We get the odds ratio by exponentiating b: the odds ratio is the exponential of the slope, that is, e raised to the power of b. It is often written as exp(b) or e^b. Recall that e, the base of the natural logarithm, is approximately equal to 2.718.
So, in our example where b = 0.367, e^b is:

e^b = 2.718^0.367 = 1.443
That is, the odds ratio (OR) is 1.443.
You can find the value of e raised to an exponent using most standard calculators: scientific calculators typically have an e^x function key, so you enter the exponent and then press that key. SPSS, however, conveniently reports this value, the odds ratio, in the column labelled "Exp(B)." (See Table 4.)
By converting the b coefficient to an odds ratio (OR), we find out how much the predicted odds of Y = 1 (the dependent variable having a value of 1) change multiplicatively with a one-unit change in the independent variable, X. In other words, the odds ratio is simply the ratio of the odds at any two consecutive values of the independent variable. We can write this statement as:

OR = ratio of the odds of Y = 1 at X + 1 to the odds of Y = 1 at X, for all values of X

In our example, the odds ratio is 1.443, which we will round off to 1.4 for convenience. Therefore, the predicted odds of admission to the university for applicants with a HS average of 60 to 64 are 1.4 times as large as the predicted odds for applicants with a HS average of <60; the predicted odds for applicants with a HS average of 65 to 69 are 1.4 times as large as the predicted odds for applicants with a HS average of 60 to 64; the predicted odds for applicants with a HS average of 70 to 74 are 1.4 times as large as the predicted odds for applicants with a HS average of 65 to 69; and so on. Hence, the predicted odds increase in a multiplicative fashion (they are 1.4 times as large) with each unit increase in the independent variable.

If you prefer, you can subtract 1 from the odds ratio and multiply by 100 to get the percent change in the odds (i.e., for a one-unit increase in X, we expect to see that percent increase in the odds of the dependent variable). So, in our example, we can say either "the predicted odds of admission are about 1.4 times higher with each unit increase in HS average" or "the predicted odds of admission are about 40% higher with each unit increase in HS average," since (1.4 - 1.0) × 100 = 40%. These descriptions are equivalent, so it is your choice which style you use to report the results.

The odds ratio, e^b, has a simpler interpretation in the case of a nominal or ordinal (i.e., dummy) independent variable. In this case the odds ratio represents the odds of having the characteristic of interest (Y = 1) for one category compared with the other. Let's say, for instance, that we are interested in the relationship between sex, X, and admission, Y, where sex is coded as 1 for "female" and 0 for "male." If e^b = 1.09, we would interpret this value as: the odds of admission are 1.09 times (or 9%) greater for a female applicant than for a male applicant.
To further illustrate how a change in a unit of X multiplies the predicted odds by 1.4, consider the respective predicted odds of admission for an applicant with a HS average of <60 (category 1 in Table 3), 60-64 (category 2), 65-69 (category 3), and 70-74 (category 4). (Note that to calculate predicted odds we use Formula 8, odds = e^(a+bX).)

Predicted Odds for Category 1 = e^(-0.986 + 0.367(1)) = 2.718^(-0.619) = 0.538

Predicted Odds for Category 2 = e^(-0.986 + 0.367(2)) = 2.718^(-0.252) = 0.777
Predicted Odds for Category 3 = e^(-0.986 + 0.367(3)) = 2.718^(0.115) = 1.122

Predicted Odds for Category 4 = e^(-0.986 + 0.367(4)) = 2.718^(0.482) = 1.619
So, the predicted odds of admission for those in category 2 (HS average of 60 to 64) and category 1 (HS average of <60) are 0.777 and 0.538 respectively, giving an odds ratio of 1.4. Recall, from Formula 3, the odds ratio (OR) is:

OR = odds2 / odds1 = 0.777 / 0.538 = 1.4
Next, the predicted odds of admission for those in category 3 (HS average of 65 to 69) and category 2 (HS average of 60 to 64) are 1.122 and 0.777 respectively, giving an odds ratio of 1.4:

OR = odds3 / odds2 = 1.122 / 0.777 = 1.4
Next, the predicted odds of admission for those in category 4 (HS average of 70 to 74) and category 3 (HS average of 65 to 69) are 1.619 and 1.122 respectively, giving an odds ratio of 1.4:

OR = odds4 / odds3 = 1.619 / 1.122 = 1.4
and so on. Therefore, an odds ratio of 1.4 means that the odds increase by a factor of 1.4 no matter the value of the independent variable: going from any value of the independent variable to the next value will always increase the odds by a factor of 1.4.
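This constant multiplicative effect can be verified directly; the short Python check below (our own, using the coefficients from Table 4) shows that the ratio of each predicted odds to the previous one is always e^b, about 1.443:

```python
import math

a, b = -0.986, 0.367    # coefficients from Table 4

# Predicted odds for HS average categories 1 through 9 (Formula 8).
odds = [math.exp(a + b * x) for x in range(1, 10)]

# Each ratio of consecutive predicted odds equals e**b, the odds ratio.
ratios = [odds[i + 1] / odds[i] for i in range(len(odds) - 1)]
print([round(r, 3) for r in ratios])    # 1.443 every time
print(round(math.exp(b), 3))            # 1.443
```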
This multiplicative effect, such that the ratio of any odds to the odds in the next category is
constant, is graphically illustrated in Figure 4. The figure shows the logistic regression of
predicted odds of admission on HS average. The model (odds = e^(-0.986 + 0.367X)) is fitted from
data in Table 3—the predicted odds are marked with the symbol *, with the logistic regression
model superimposed through them. For comparative purposes, the observed odds (□) taken
from Table 3 are also plotted. Importantly, note that the general shape of the relationships
between the independent variable and dependent variable is the same in both Figure 4 and
Graph B in Figure 3. Both graphs show the change in odds on a multiplicative scale.
Figure 4 Logistic Regression of Predicted Odds of Admission on HS Average, X. (Model fitted from data in Table 3).

[Plot of the fitted model odds = e^(-0.986 + 0.367X): the predicted odds (curve) and the observed odds (points) are shown against X, with the odds axis running from 0 to 15 and the X axis from 0 to 10.]
As a final comment, note the following relationships between b, e^b (odds ratio), odds, and probability:
1. b = 0 and e^b = 1: the odds and probability are the same at all levels of X
2. b < 0 and e^b < 1: the odds and probability decrease as X increases
3. b > 0 and e^b > 1: the odds and probability increase as X increases
Back to Probabilities  In logistic regression, we can interpret the results not only in terms of predicted log odds or odds of Y = 1, but also in terms of predicted probabilities of Y = 1. In the previous section we saw how to transform the logit back to an odds using Formula 8 (odds = e^(a+bX)). Once we have converted from log odds to odds, we can easily convert the odds back to probabilities using Formula 7 (p = e^(a+bX) / (1 + e^(a+bX))).
Let's consider the following calculations for X = 1 in the present example (i.e., category 1 in Table 3, or an applicant with a HS average of <60):

the predicted log odds would be: ln(p / (1 - p)) = -.986 + .367(1) = -0.619;

the corresponding predicted odds would be: odds = e^(a+bX) = 2.718^(-0.619) = 0.538;

and finally, the corresponding predicted probability would be: p = e^(a+bX) / (1 + e^(a+bX)) = 0.538 / (1 + 0.538) = 0.350.
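The same chain of conversions can be run for every category at once. The sketch below (ours) reproduces the predicted probabilities plotted in Figure 5 and compares them with the observed proportions in Table 3:

```python
import math

a, b = -0.986, 0.367                                                  # coefficients from Table 4
observed_p = [0.40, 0.41, 0.50, 0.63, 0.67, 0.79, 0.80, 0.90, 0.92]   # from Table 3

for x in range(1, 10):
    logit = a + b * x              # Formula 9: predicted log odds
    odds = math.exp(logit)         # Formula 8: predicted odds
    p = odds / (1 + odds)          # Formula 7: predicted probability
    print(f"X = {x}: predicted p = {p:.2f}, observed p = {observed_p[x - 1]:.2f}")
```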
If you perform this conversion to probabilities throughout the range of X values, you go from
the straight line of the logit to the S-shaped curve of p as revealed in Figure 5. Again notice the
general shape of the relationships between the independent variable and logit is the same in
both the top graph of Figure 5 and Graph C in Figure 3—both graphs are linear—and that the
general shape of the relationships between the independent variable and probability is the same
in both the bottom graph of Figure 5 and Graph A in Figure 3—both graphs are S-shaped.
We should finally point out that the model provides a “good fit.” For example, the predicted
probability for X = 1 of 0.35 is fairly close in value to the observed probability of 0.40.
Comparison of the observed to predicted values from the fitted models in Figure 5 suggests that
the models provide an overall good fit.
Figure 5 Logistic Regression of Predicted Logit and Predicted Probability of Admission on HS
Average, X. (Models fitted from data in Table 3)

[Figure 5, top graph: observed and predicted logits, ln(p / (1 - p)) = a + bX, plotted against X (logit axis from -1.5 to 2.5). Bottom graph: observed and predicted probabilities, p = e^(a+bX) / (1 + e^(a+bX)), plotted against X (probability axis from 0.2 to 1.0). X runs from 0 to 10 in both graphs.]