8 Categorical/Limited Dependent Variables and Logistic Regression

Reading:
Kennedy, P. “A Guide to Econometrics” chapter 15
Field, A. “Discovering Statistics”, chapter 5.
For a more comprehensive treatment of this topic, you may want to consider purchasing: Long, J. S. (1997) “Regression Models for Categorical and Limited Dependent Variables”, Sage: Thousand Oaks, California.
Aim:
The aim of this section is to consider the appropriate modelling technique to use when
the dependent variable is dichotomous.
Objectives:
By the end of this chapter, students should be aware of the problems of applying OLS
to situations where the dependent variable is dichotomous. They should know the
appropriate statistical model to turn to for different types of categorical dependent
variable and be able to understand the intuition behind logit modelling. They should
be able to run logit in SPSS and be able to interpret the output from a logit regression
using the proportionate change in odds method.
Plan:
8.1 Introduction
8.2 Choosing the Appropriate Statistical Models
8.3 Linear Probability Model
8.4 A more appropriate functional form
8.5 More than one explanatory variable
8.6 Goodness of fit
8.7 Estimation of the logistic model
8.8 Interpreting Output
8.9 Multivariate Logit
8.10 Odds
8.11 Proportionate Change in Odds
8.12 Interpreting Exp(B)
8.1 Introduction
The linear regression model that has formed the main focus of Module II is the most commonly used statistical tool in the social sciences. It has many advantages (hence its popularity), but it also has a number of limiting assumptions. One is that the dependent variable is an uncensored “scale numeric” variable. In other words, it is presumed that the data on the dependent variable in your model are continuous and have been measured for all cases in the sample.
In many situations of interest to social scientists, however, the dependent variable is not continuous or measured for all cases. The table below offers a list, based on Long (1997, pp. 1-3), of different types of dependent variable, typical social science examples, and the appropriate estimation method. It's worth spending some time working through the table, particularly if you are seeking to apply regression analysis to data you have collected yourself or as part of a research project. Consider which category most appropriately describes your data. If the correct estimation technique is neither regression nor dichotomous logit or probit, then you will need to refer to the reading suggested at the start of this chapter, since this text only covers regression and logit.
Don’t panic, however: once you have learnt to understand the output from OLS and
logit, the results from most of the other methods should not be too difficult to interpret
with a little extra background reading.
Note though, that to some extent, the categorisation of variables given in the table
hides the fact that the level of measurement of a variable is often ambiguous:
“...statements about levels of measurement of a [variable] cannot be sensibly made in
isolation from the theoretical and substantive context in which the [variable] is to be
used” (Carter, 1971, p.12, quoted in Long 1997, p. 2)
Education, for example, could be measured as a binary variable:
1 if only attained High School or less,
0 if other.
as an ordinal variable:
6 if has PhD,
5 if has Masters,
4 if has Degree,
3 if has Highers,
2 if has Standard Grades,
1 if no qualifications
or as a count variable
number of school years completed
8.2 Choosing the Appropriate Statistical Models:
If we choose a model that assumes a level of measurement of the dependent variable
different to that of our data, then our estimates may be biased, inefficient or
simply inappropriate. For example, if we apply standard OLS to dependent variables
that fall into any of the above categories of data, it will assume that the variable is
unbounded and continuous and construct a line of best fit accordingly. We shall look
at this case in more detail below and then focus on applying the logit model to binary
dependent variables.
More elaborate techniques are needed to estimate other types of limited dependent
variables (see table) which are beyond the scope of this course (and of SPSS).
Nevertheless, students that gain a good grasp of logit models will find that many of
the same issues and methods crop up in the more advanced techniques.
Table 1 Categorising and Modelling Limited Dependent Variables

Binary variables: made up of two categories, coded 1 if an event has occurred, 0 if not. It has to be a decision or a category that can be explained by other variables (i.e. male/female is not something amenable to social scientific explanation -- it is not usually a dependent variable). Examples:
- Did the person vote or not?
- Did the person take out MPPI or not?
- Does the person own their own home or not?
If the dependent variable is binary, estimate using: binary logit (also called logistic regression) or probit.

Ordinal variables: made up of categories that can be ranked (ordinal = “has an inherent order”). Examples:
- coded 4 if strongly agree, 3 if agree, 2 if disagree, 1 if strongly disagree
- coded 4 if often, 3 if occasionally, 2 if seldom, 1 if never
- coded 3 if radical, 2 if liberal, 1 if conservative
- coded 6 if has PhD, 5 if has Masters, 4 if has Degree, 3 if has Highers, 2 if has Standard Grades, 1 if no qualifications
If the dependent variable is ordinal, estimate using: ordered logit or ordered probit.

Nominal variables: made up of multiple outcomes that cannot be ordered. Examples:
- marital status: single, married, divorced, widowed
- mode of transport: car, van, bus, train, bicycle
If the dependent variable is nominal, estimate using: multinomial logit.

Count variables: indicate the number of times that an event has occurred. Examples:
- how many times has a person been married?
- how often did a person visit the doctor last year?
- how many strikes occurred?
- how many articles has an academic published?
- how many years of education has a person completed?
If the dependent variable is a count variable, estimate using: Poisson or negative binomial regression.

Censored variables: occur when the value of a variable is unknown over a certain range of the variable. Examples:
- variables measuring percentages: censored below at zero and above at 100
- hourly wage rates: censored below by the minimum wage rate
If the dependent variable is censored, estimate using: Tobit.

Grouped data: occur when we have apparently ordered data but where the threshold values for the categories are known. Example: a survey of incomes coded as follows:
= 1 if income < 5,000
= 2 if 5,000 ≤ income < 7,000
= 3 if 7,000 ≤ income < 10,000
= 4 if 10,000 ≤ income < 15,000
= 5 if income ≥ 15,000
If the dependent variable is grouped, estimate using: grouped Tobit (available in e.g. LIMDEP).
8.3 Linear Probability Model
What happens if we try to fit a line of best fit to a regression where the dependent
variable is binary? This is the situation depicted in the scatter plot below (Figure 1) of
a dichotomous dependent variable y on a continuous explanatory variable x. The
advantage of this kind of model (called the Linear Probability Model) is that
interpretation of coefficients is straightforward because they are essentially
interpreted in the same way as in linear regression. For example, if the data on the
dependent variable records labour force participation (1 if in the labour force, 0
otherwise), then we could think of our regression as being a model of the probability
of labour force participation. In this case, a coefficient on x of 0.4,
yhat = 1.2 + 0.4x
would imply that the predicted probability of labour force participation increases by
0.4 for every unit increase in x, holding all other variables constant.
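To make this concrete, here is a minimal sketch in Python (illustrative only, not part of the SPSS workflow; the coefficients are the hypothetical ones above). It shows the constant marginal effect of the LPM and foreshadows the boundary problem discussed below:

# Sketch of LPM predictions, assuming the hypothetical fit yhat = 1.2 + 0.4x
b0, b1 = 1.2, 0.4

for x in [-4, -2, 0, 2]:
    yhat = b0 + b1 * x          # interpreted as a predicted probability
    note = "" if 0 <= yhat <= 1 else "  <- outside [0, 1]"
    print(f"x = {x:3}: yhat = {yhat:5.2f}{note}")

Each unit increase in x raises the prediction by exactly 0.4, whatever the value of x -- and nothing stops the prediction straying outside [0, 1].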
Figure 1 Scatter Plot and Line of Best Fit of a Linear Probability Model
[Figure: scatter of the binary outcome y (0 or 1) against x, with a straight line of best fit; sections of the line extend above y = 1 and below y = 0.]
8.3.1 Disadvantages:
As tempting as it is to apply the linear probability model (LPM) to dichotomous data,
its problems are so severe and so fundamental that it should only be used in practice
for comparison purposes.
Heteroscedasticity:
The first major problem with the LPM is that the error term will tend to be larger for
middle values of x. As a result, the OLS estimates are inefficient and standard errors
are biased, resulting in incorrect t-statistics.
Non-normal errors:
Second, the errors will not be normally distributed (but note that normality is not required for OLS to be BLUE).
Nonsensical Predictions:
The most serious problem is that the underlying functional form is incorrect and so
the predicted values can be < 0, or > 1. This can be seen in Figure 1 by the fact that
there are sections of the line of best fit that lie above y = 1 or below y = 0. This is of
course meaningless if we are interpreting the predicted values as measures of the
probability of an observation falling into the y = 1 category since probabilities can
only take on values between zero and one.
8.4 A more appropriate functional form:
What kind of model/transformation of our data could be used to represent this kind of relationship? If you think about it, what we require is a functional form that will produce a line of best fit that is “s” shaped: one that converges to zero at one end and converges to 1 at the other end. At first thought, one might suggest a cubic transformation of the
data, but cubic transformations are ruled out because they are unbounded (i.e. there
would be nothing to stop such a model producing predicted values greater than one or
less than zero). Note also that we may well have more than one explanatory variable,
so we need a model that can be applied to the whole right hand side of the equation,
b0 + b1x1 + b2x2 + b3x3,
and still result in predicted values for y that range between 0 and 1.
Figure 2 A more appropriate line of best fit
[Figure: the same scatter of y against x, now fitted with an “s” shaped curve bounded between y = 0 and y = 1.]
8.4.1 Logistic transformation:
One appropriate (and popular) transformation that fits these requirements is the logit
(also called the logistic) transformation which is simply the exponent of x divided by
one plus the exponent of x:
exp(x) / (1 + exp(x))
If we have a constant term and more than one explanatory variable, then the logit would be:
exp(b0 + b1x1 + b2x2 + b3x3) / (1 + exp(b0 + b1x1 + b2x2 + b3x3))
An example of the logit transformation of a variable x is given in the table below. To take the exponent of a number, you have to raise the value of e (which, like π = 3.1415927…, is one of those very special numbers that keeps cropping up in mathematics and which has an infinite number of decimal places: e = 2.7182818…) by that number. So, suppose you want to take the exponent of 2: you raise e to the power of 2, which is approximately the same as raising 2.7 to the power of 2,
exp(2) = e^2 ≈ 2.7^2 ≈ 7.3
If you want to take the logit of 2, you would divide exp(2) by one plus exp(2):
logit(2) = exp(2) / (1 + exp(2)) = e^2 / (1 + e^2) ≈ 7.3 / 8.3 = 0.88
If you wanted to take the exponent of a negative number, such as -3, you raise e to the power of -3, which is approximately equivalent to raising 2.7 to the power of -3. Because the number you've chosen is negative, it has the effect of inverting the figure you want to raise to a power (i.e. making it the denominator of a fraction with 1 as the numerator, e.g. 2^-1 = 1/2) and then raising that denominator to the power of the absolute value of the number (e.g. 2^-2 = 1/(2^2) = 1/4). So,
exp(-3) = e^-3 ≈ 2.7^-3 = 1 / (2.7 × 2.7 × 2.7) = 1 / 19.7 ≈ 0.05
The logit of -3 would be:
logit(-3) = exp(-3) / (1 + exp(-3)) = e^-3 / (1 + e^-3) ≈ 0.05 / 1.05 ≈ 0.05
Table 2 Logits for particular values of x

 x   exp(x)   1 + exp(x)   logit(x)
-3     0.05         1.05       0.05
-2     0.14         1.14       0.12
-1     0.37         1.37       0.27
 0     1.00         2.00       0.50
 1     2.72         3.72       0.73
 2     7.39         8.39       0.88
 3    20.09        21.09       0.95
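If you want to verify these figures yourself, a small illustrative Python sketch (any language with an exp function would do) reproduces Table 2:

import math

def logit(x):
    # logistic transformation: exp(x) / (1 + exp(x))
    return math.exp(x) / (1 + math.exp(x))

for x in range(-3, 4):
    print(f"x = {x:2}: exp(x) = {math.exp(x):6.2f}, "
          f"1 + exp(x) = {1 + math.exp(x):6.2f}, logit(x) = {logit(x):.2f}")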
The values of logit(x) listed in the above table are plotted against x in Figure 3. A more complete picture of the logit function is given in the following figure (Figure 4), which has been plotted for much smaller intervals and for a much wider range of values of x. Both figures demonstrate that the logit transformation does indeed produce the desired “s” shaped outcome.
Figure 3 Logit(x) Plotted for 7 values of x
[Figure: logit(x) plotted against the integer values x = -3 to 3, rising from 0.05 to 0.95.]
Figure 4 The Logit Curve
[Figure: the logit function plotted at small intervals over x = -3.5 to 3.5, tracing a smooth “s” shaped curve from near 0 to near 1.]
8.5 More than one explanatory variable:
Suppose we have more than one explanatory variable, how do we apply the logit
transformation? Consider the case where we have the following linear function of
explanatory variables:
230 - 4x1 + 7x2 + 8x3
The logit transformation of this function would be:
logit(230 - 4x1 + 7x2 + 8x3)
which is simply the logit of the value of the linear function for a particular set of values for x1, x2 and x3. For example, if x1 = 0, x2 = -0.7, and x3 = -31, then
230 - 4x1 + 7x2 + 8x3 = -22.9
and the logit of the function for this set of values would be
logit(230 - 4x1 + 7x2 + 8x3) = logit(-22.9) = 0.0000000001134 = 1.134E-10
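As a quick check on this arithmetic, here is an illustrative Python sketch (the function names and layout are my own, not from the text) that evaluates the logit of the linear function at the values just used:

import math

def logit(z):
    return math.exp(z) / (1 + math.exp(z))

b0, b1, b2, b3 = 230, -4, 7, 8      # coefficients of the linear function above

z = b0 + b1 * 0 + b2 * (-0.7) + b3 * (-31)
print(z)         # -22.9
print(logit(z))  # ~1.134e-10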
In Table 4 the logit transformations of the above function are listed for a variety of values of the three variables. A full set of logit transformations for this function is then plotted in Figure 5. It can be seen that, when we take the logit transformation of a whole linear function, rather than just an individual variable, we still end up with our desired “s” shaped curve lying between y = 0 and y = 1.
Table 4 Logit transformation of a linear function: b0 + b1x1 + b2x2 + b3x3 = 230 - 4x1 + 7x2 + 8x3

 b0   x1     x2     x3    b0 + b1x1 + b2x2 + b3x3   Logistic()
230    0   -0.70  -31.0   -22.90                    1.13411E-10
230    1   -0.65  -30.5   -22.55                    1.60938E-10
230    2   -0.60  -30.0   -22.20                    2.28382E-10
230    3   -0.55  -29.5   -21.85                    3.24090E-10
230    4   -0.50  -29.0   -21.50                    4.59906E-10
230    5   -0.45  -28.5   -21.15                    6.52637E-10
230    6   -0.40  -28.0   -20.80                    9.26136E-10
230    7   -0.35  -27.5   -20.45                    1.31425E-09
230    8   -0.30  -27.0   -20.10                    1.86501E-09
230    9   -0.25  -26.5   -19.75                    2.64657E-09
230   10   -0.20  -26.0   -19.40                    3.75567E-09
230   11   -0.15  -25.5   -19.05                    5.32954E-09
230   12   -0.10  -25.0   -18.70                    7.56298E-09
230   13   -0.05  -24.5   -18.35                    1.07324E-08
230   14    0.00  -24.0   -18.00                    1.52300E-08
230   15    0.05  -23.5   -17.65                    2.16124E-08
230   16    0.10  -23.0   -17.30                    3.06694E-08
230   17    0.15  -22.5   -16.95                    4.35220E-08
Figure 5 Logit curve for a linear function
[Figure: Logistic(b0 + b1x1 + b2x2 + b3x3) plotted against b0 + b1x1 + b2x2 + b3x3 over roughly -23 to 26, tracing the “s” shaped curve from 0 to 1.]
8.6 Goodness of fit
The discussion so far would suggest that the logit transformation is an ideal candidate for fitting a line of best fit to models of dichotomous dependent variables. A particular logit curve will not necessarily fit the data well, however. If the actual values of y in Figure 6 were observed for a wide range of the possible values of x, then the plotted curve would not be a very good line of best fit: values of b0 + b1x1 + b2x2 + b3x3 that are less than -4 or greater than 4 have very little effect on the probability, yet most of the values of x lie outside the (-4, 4) range. Perhaps if we alter the estimated values of the bk we can improve our line of best fit.
Figure 6 Logit Line of Best Fit
[Figure: the logit curve of Figure 5 treated as a candidate line of best fit; most of the data fall in the flat tails of the curve, where changes in b0 + b1x1 + b2x2 + b3x3 barely affect the predicted probability.]
Suppose we try the following set of values for the coefficients:
b0 = 22, b1 = -0.4, b2 = 0.5 and b3 = 0.98:
Table 5 displays the logit values of the revised function which are plotted in Figure 7.
You can see that the logit curve is now much less steep and so will produce more
middle range values of the predicted probabilities (the vertical axis) for a wider range
of the right hand side function (horizontal axis).
Table 5 Effect of Modifying the Coefficients: 22 - 0.4x1 + 0.5x2 + 0.98x3

 b0   x1     x2     x3    22 - 0.4x1 + 0.5x2 + 0.98x3   Logistic()
 22    0   -0.70  -31.0   -8.730                        0.000162
 22    1   -0.65  -30.5   -8.615                        0.000181
 22    2   -0.60  -30.0   -8.500                        0.000203
 22    3   -0.55  -29.5   -8.385                        0.000228
 22    4   -0.50  -29.0   -8.270                        0.000256
 22    5   -0.45  -28.5   -8.155                        0.000287
 22    6   -0.40  -28.0   -8.040                        0.000322
 22    7   -0.35  -27.5   -7.925                        0.000361
 22    8   -0.30  -27.0   -7.810                        0.000405
 22    9   -0.25  -26.5   -7.695                        0.000455
 22   10   -0.20  -26.0   -7.580                        0.000510
 22   11   -0.15  -25.5   -7.465                        0.000572
 22   12   -0.10  -25.0   -7.350                        0.000642
 22   13   -0.05  -24.5   -7.235                        0.000720
 22   14    0.00  -24.0   -7.120                        0.000808
 22   15    0.05  -23.5   -7.005                        0.000907
 22   16    0.10  -23.0   -6.890                        0.001017
 22   17    0.15  -22.5   -6.775                        0.001141
 22   18    0.20  -22.0   -6.660                        0.001280
 22   19    0.25  -21.5   -6.545                        0.001435
Figure 7 Effect of Modifying the Coefficients
[Figure: Logistic() plotted against 22 - 0.4x1 + 0.5x2 + 0.98x3 over roughly -8.7 to 7.4; the “s” curve is much less steep than in Figure 5.]
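A short illustrative Python sketch (it simply re-evaluates the first row of Tables 4 and 5) shows how the modified coefficients move the same observation out of the extreme tail of the curve:

import math

def logit(z):
    return math.exp(z) / (1 + math.exp(z))

original = (230, -4.0, 7.0, 8.0)    # coefficients behind Table 4
modified = (22, -0.4, 0.5, 0.98)    # coefficients behind Table 5
x1, x2, x3 = 0, -0.7, -31.0         # first row of both tables

for b0, b1, b2, b3 in (original, modified):
    z = b0 + b1 * x1 + b2 * x2 + b3 * x3
    print(f"b0 = {b0:3}: z = {z:7.2f}, logit(z) = {logit(z):.3g}")
# b0 = 230: z = -22.90, logit(z) = 1.13e-10
# b0 =  22: z =  -8.73, logit(z) = 0.000162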
8.7 Estimation of the logistic model:
The above discussion leads naturally to a probability model of the form:
Pr(y = 1 | x) = exp(b0 + b1x1 + b2x2 + ... + bKxK) / (1 + exp(b0 + b1x1 + b2x2 + ... + bKxK))
We now need to find a way of estimating values of bk that will best fit the data.
Unfortunately, OLS cannot be applied since the above model is non-linear in
parameters, so an alternative has to be found. The method used to estimate logit is
called “maximum likelihood”. The intuition behind this technique is as follows: first ask, for a given set of parameter values, what is the probability of observing the current sample? Then try various values of the parameters to arrive at estimates that make the observed data most likely. This is rather like the numerical example above, where we modified the values of the bs in such a way that the fit of the model to the data improved; with maximum likelihood you are concerned with the fit of the data to the model.
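The following illustrative Python sketch shows this idea in its crudest form: a grid search over candidate parameter values, keeping the pair that makes the observed sample most probable. The toy data are invented for demonstration; a real analysis would let SPSS (or a statistics library) do the estimation:

import math

def logit(z):
    return math.exp(z) / (1 + math.exp(z))

def log_likelihood(params, xs, ys):
    # log of the probability of observing this sample, given the parameters
    b0, b1 = params
    return sum(math.log(logit(b0 + b1 * x) if y == 1 else 1 - logit(b0 + b1 * x))
               for x, y in zip(xs, ys))

# Toy data: y = 1 becomes more common as x rises
xs = [-2, -1, 0, 0, 1, 1, 2, 3]
ys = [ 0,  0, 0, 1, 0, 1, 1, 1]

# Crude grid search (only positive slopes, since the toy data clearly rise)
candidates = [(b0 / 10, b1 / 10) for b0 in range(-30, 31) for b1 in range(0, 51)]
best = max(candidates, key=lambda b: log_likelihood(b, xs, ys))
print("maximum likelihood estimates (grid search):", best)

In practice the search is carried out far more efficiently by iterative numerical methods, but the principle is the same.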
8.8 Interpreting Output
Because logit regression is fundamentally non-linear, interpretation of output can be
difficult. Many published studies that use logit overlook this fact: they either interpret the magnitude of coefficients incorrectly (mistakenly treating them like OLS coefficients) or pass comment only on the signs (i.e. whether positive or negative) of coefficients.
The difference between the interpretation of coefficients from OLS and logit is best seen by comparing the impact of increasing a coefficient by 1 in each model. In OLS, the effect would simply be to increase the slope of the (straight) line of best fit by 1 (i.e. make the line steeper). This means that any given increase in the relevant explanatory variable will now increase the dependent variable by that much more, and the effect is the same for all values of the x variable.
This is not so in logit estimation. Continuing our numerical example from above,
Figure 8 shows the effect of increasing b2 by 1. You can see that the change in the
value of y (the vertical axis) varies considerably depending on the value of the
horizontal axis (measuring the total linear function). For very low or very high values
of the horizontal axis, the impact is trivial, but much more substantial for middle
values of the horizontal axis.
Perhaps even more surprising is the impact of changing the value of the constant term. In OLS, this will just cause a vertical shift in the line and so have an identical effect for all values of the explanatory variable concerned, whereas in the logit model, a unit increase in the constant term will have a varied impact across different values of the explanatory variables.
Figure 8 Impact of increasing b2 by 1
[Figure: two logit curves plotted against b0 + b1x1 + b2x2 + b3x3, one with b2 = 0.5 and one with b2 = 1.5; the gap between them is largest for middle values of the horizontal axis and negligible in the tails.]
Figure 9 Impact of increasing b0 by 1
[Figure: two logit curves, one with b0 = 22 and one with b0 = 23; the unit increase in the constant changes the predicted probability by different amounts at different points of the curve.]
Let's look at another example. Consider the impact of the number of children on the take-up of mortgage payment protection insurance (MPPI). We might expect that families with more children have greater outlays relative to earnings and may therefore be less likely to consider MPPI to be affordable. If we run a logit regression model of MPPI take-up on children, we obtain the following output:
Table 6 Logit Output from MPPI Regression

--------- Variables in the Equation ---------
Variable        B      S.E.      Wald   df    Sig
CHILDREN    -.0446    .0935     .2278    1  .6331
Constant   -1.0711    .1143   87.8056    1  .0000
If we use the estimated coefficients to plot the predicted probability of taking out
MPPI given a wide range of values for x we get the following list of predicted
probabilities (last column of Table 7 – plotted in Figure 10 against number of
children).
Table 7 Predicted Probability of MPPI Take-up

   b0        b1       x1   b0 + b1x1   Predicted Probability of taking out MPPI
-1.0711   -0.0446   -130     4.7269    0.991223828
-1.0711   -0.0446   -120     4.2809    0.986358456
-1.0711   -0.0446   -110     3.8349    0.978853343
-1.0711   -0.0446   -100     3.3889    0.967355826
-1.0711   -0.0446    -90     2.9429    0.949926848
-1.0711   -0.0446    -80     2.4969    0.923924213
-1.0711   -0.0446    -70     2.0509    0.886038527
-1.0711   -0.0446    -60     1.6049    0.832702114
-1.0711   -0.0446    -50     1.1589    0.761132782
-1.0711   -0.0446    -40     0.7129    0.671041637
-1.0711   -0.0446    -30     0.2669    0.566331702
-1.0711   -0.0446    -20    -0.1791    0.455344304
-1.0711   -0.0446    -10    -0.6251    0.348622427
-1.0711   -0.0446      0    -1.0711    0.255193951
-1.0711   -0.0446     10    -1.5171    0.179888956
-1.0711   -0.0446     20    -1.9631    0.123131948
-1.0711   -0.0446     30    -2.4091    0.082481403
-1.0711   -0.0446     40    -2.8551    0.054418290
-1.0711   -0.0446     50    -3.3011    0.035533472
Figure 10 Predicted Probability of Taking out MPPI
[Figure: Prob(MPPI) plotted against number of children over the (mostly meaningless) range -130 to 140, tracing the full “s” shaped logit curve.]
Clearly, it is meaningless to talk about having “-130” children, so let's look now at predicted probabilities for the relevant range of the explanatory variable only (Table 8 and Figure 11). You can see from the graph that, to all intents and purposes, the relationship between the probability of MPPI take-up and number of children appears linear, and is negative as we would expect. However, in other contexts the predicted probabilities may not have such a near-linear relationship with the explanatory variable, so it can be quite useful to plot predicted probabilities over the relevant range of a variable, as in the sketch below.
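A short illustrative Python sketch (using the estimates from Table 6) generates exactly these predicted probabilities:

import math

b0, b1 = -1.0711, -0.0446    # estimates from Table 6

for children in range(9):    # the relevant range: 0 to 8 children
    z = b0 + b1 * children
    p = math.exp(z) / (1 + math.exp(z))
    print(f"{children} children: Pr(MPPI) = {p:.4f}")
# runs from 0.2552 (no children) down to 0.1934 (8 children)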
Table 8: Predicted probabilities over relevant values of x

   b0        b1      x1   b0 + b1x1   Predicted Probability of taking out MPPI
-1.0711   -0.0446    0    -1.0711     0.255193951
-1.0711   -0.0446    1    -1.1157     0.246809760
-1.0711   -0.0446    2    -1.1603     0.238612778
-1.0711   -0.0446    3    -1.2049     0.230604681
-1.0711   -0.0446    4    -1.2495     0.222786703
-1.0711   -0.0446    5    -1.2941     0.215159652
-1.0711   -0.0446    6    -1.3387     0.207723924
-1.0711   -0.0446    7    -1.3833     0.200479528
-1.0711   -0.0446    8    -1.4279     0.193426099
Figure 11 Predicted probabilities over relevant values of x
[Figure: Prob(MPPI) plotted against number of children from 0 to about 5.4; the curve declines almost linearly from about 0.26 to 0.20.]
8.9 Multivariate Logit:
Logit models are more complex if there is more than one x since the effect on the
dependent variable will depend on the values of the other explanatory variables. So
we cannot simply interpret the coefficients from multivariate logit as the individual
effects of each variable. The effect of each variable in the logit model will be
different for each value of the other variables (in mathematical terms, the first partial
derivatives are not constant: they have variables in them).
There are various ways round this. One solution is to calculate the effect of each
variable at the mean value of all variables (called the marginal effect at the mean), but
this tells us nothing about the effect of the variable at other values.
8.10 Odds
Another solution to this is to use the odds, which is the probability of an event occurring divided by the probability of the event not occurring:
odds = P(event) / P(no event) = P(event) / (1 - P(event))
So in the case of the above example, the odds of having MPPI would equal the probability of having MPPI divided by the probability of not having MPPI. From Table 8 it can be seen that for families with no children the predicted probability of take-up is 26% (see the final column of Table 8, which reads 0.255193951 for x1 = 0), so the odds would be:
Odds of MPPI take-up for HHs with 0 children = 0.26 / (1 - 0.26) = 0.26 / 0.74 = 0.35
which is almost exactly a third – i.e. childless households are roughly three times more likely not to have MPPI than to have it. In the parlance of bookmakers, the odds of MPPI take-up amongst households with no children are 3 to 1 against (or, 1 to 3 on).
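The same calculation takes a couple of lines of illustrative Python:

p = 0.2552                # Pr(take-up | 0 children), from Table 8
odds = p / (1 - p)
print(round(odds, 4))     # 0.3426, i.e. roughly 3 to 1 against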
8.11 Proportionate Change in Odds
One of the attractions of using logit rather than probit analysis of dichotomous dependent variable models is that the proportionate change in odds due to a change in an explanatory variable can easily be calculated. SPSS computes this automatically for each explanatory variable and calls it “Exp(B)”: the effect on the predicted odds of a unit change in the explanatory variable, holding all other variables constant.
For the regression listed in Table 6, Exp(B) = 0.9564 would be given in the SPSS output as follows:

Variable        B      S.E.   Exp(B)
CHILDREN    -.0446    .0935    .9564
Constant   -1.0711    .1143
Returning to Table 8 we could calculate manually the effect of having 1 more child on
the odds of taking out MPPI. First we need to calculate the predicted odds before the
increase in children, which we have already done for zero children:
Prob(HH has MPPI given child = 0) = 0.2552
Odds(HH has MPPI given child = 0) = 0.2552 / (1 - 0.2552) = 0.3426

Next we need to calculate the odds for households with one child:

Prob(HH has MPPI given child = 1) = 0.2468
Odds(HH has MPPI given child = 1) = 0.2468 / (1 - 0.2468) = 0.3277
To calculate the proportionate change in odds from having one more child, we divide the second figure by the first:

Proport. change in odds = Exp(B) = (odds after a unit change in the explanatory variable) / (original odds) = 0.3277 / 0.3426 = 0.956
Note that this figure would turn out the same if we chose to look at moving from 1
child to 2, or 3 to 4 etc.
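This invariance is easy to confirm numerically: in the logit model, the odds ratio for a unit change in x is exactly exp(b1). An illustrative Python sketch using the Table 6 estimates:

import math

b0, b1 = -1.0711, -0.0446    # estimates from Table 6

def odds(children):
    z = b0 + b1 * children
    p = math.exp(z) / (1 + math.exp(z))
    return p / (1 - p)

print(odds(1) / odds(0))     # ~0.9564
print(odds(4) / odds(3))     # the same, whatever unit step we take
print(math.exp(b1))          # Exp(B) = exp(b1) = 0.9564...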
8.12 Interpreting Exp(B)
If the proportionate change in odds is greater than one, then as the explanatory
variable increases, the odds of the outcome increase (i.e. in the case of a variable that
has a positive effect on the dependent variable). The larger the value, the more
sensitive the predicted probability is to unit changes in the variable.
If, as in the above example, the proportionate change in odds is less than one, then as
the explanatory variable (number of children) increases, the predicted probability
declines (i.e. the variable has a negative effect on the dependent variable). In this
case, the smaller the value of Exp(B) the more sensitive the predicted probability is to
unit changes in the variable.
So,
- if the value of Exp(B) is > 1, then as the explanatory variable increases, the odds of the outcome occurring increase;
- if the value of Exp(B) is < 1, then as the explanatory variable increases, the odds of the outcome occurring decrease.
If you want to compare the magnitudes of positive and negative effects, simply take the inverse of the negative effects. For example, if Exp(B) = 2 on a positive-effect variable, this has the same magnitude as a variable with Exp(B) = 0.5 = 1/2, but in the opposite direction.
The useful thing about this approach to interpreting the effect of coefficients is that in
a multivariate context, the Exp(B) is calculated holding all other variables constant.
One must remember, however, that “a constant change in the odds does not
correspond to a constant change … in the probability” (Long p. 82). For example, a
doubling of the odds from 1/1000 to 2/1000 results in an Exp(B) of 2, and this is true
if we move from 1/10 to 2/10 or 1/1 to 2/1 etc. However, whilst the proportionate
change in the odds is constant at 2 for these examples, the change in probability is
not: the change in odds from 1/1000 to 2/1000 results in a change in probability of
0.001, whereas the change in odds from 1/10 to 2/10 results in a change in probability
of 0.076 (Long, p. 82).
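Long's point is easy to verify with a few lines of illustrative Python:

# A doubling of the odds (Exp(B) = 2) implies different changes in
# probability depending on the starting point (Long, p. 82)
for odds_before in [1 / 1000, 1 / 10, 1 / 1]:
    odds_after = 2 * odds_before
    p_before = odds_before / (1 + odds_before)
    p_after = odds_after / (1 + odds_after)
    print(f"odds {odds_before:.3f} -> {odds_after:.3f}: "
          f"probability rises by {p_after - p_before:.3f}")
# rises of 0.001, 0.076 and 0.167 respectively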
Exercise:

1. Open the Project 2 data (surv9295) and create a dummy variable from q16a that
equals 1 if the borrower has MPPI and = 0 otherwise.
if (q16a = 2) MPPI = 0.
execute.
if ( q16a = 1) MPPI = 1.
execute.
Alternatively, given that some respondents may have confused MPPI with other
forms of insurance, you may want to screen out people who do not have
mortgages; and only assume that a person has MPPI if they know the monthly
premium (q16b > 0):
If (q16a >= 2 and q10c = 1) MPPI = 0 .
EXECUTE .
IF (q16b > 0 and q10c = 1) MPPI = 1 .
EXECUTE .
2. Go to Analyse, Regression, Binary Logistic, enter your newly created dummy for
MPPI take-up in the dependent variable box and enter possible explanatory
variables in the covariates box, such as:
Loan to value ratio (see q9 and q10b)
Q65.6 = educational achievement
Q58a = number of adults,
q58b = number of children,
q58c = total number of people in HH;
8-18
Alternatively you can use the syntax version:
LOGISTIC REGRESSION VAR=mppi
/METHOD=ENTER q58b
/CRITERIA PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .
3. How would you interpret the Exp(B) values in the output from this regression?
4. Look at the coding of missing values for q58b. How might this distort your
results? How might you solve this problem?