8 Categorical/Limited Dependent Variables and Logistic Regression

Reading:
Kennedy, P. "A Guide to Econometrics", chapter 15.
Field, A. "Discovering Statistics", chapter 5.
For a more comprehensive treatment of this topic, you may want to consider purchasing: Long, J. S. (1997) "Regression Models for Categorical and Limited Dependent Variables", Sage: Thousand Oaks, California.

Aim:
The aim of this section is to consider the appropriate modelling technique to use when the dependent variable is dichotomous.

Objectives:
By the end of this chapter, students should be aware of the problems of applying OLS to situations where the dependent variable is dichotomous. They should know the appropriate statistical model to turn to for different types of categorical dependent variable and be able to understand the intuition behind logit modelling. They should be able to run logit in SPSS and be able to interpret the output from a logit regression using the proportionate change in odds method.

Plan:
8.1 Introduction
8.2 Choosing the Appropriate Statistical Models
8.3 Linear Probability Model
8.4 A More Appropriate Functional Form
8.5 More Than One Explanatory Variable
8.6 Goodness of Fit
8.7 Estimation of the Logistic Model
8.8 Interpreting Output
8.9 Multivariate Logit
8.10 Odds
8.11 Proportionate Change in Odds
8.12 Interpreting Exp(B)

8.1 Introduction

The linear regression model that has formed the main focus of Module II is the most commonly used statistical tool in the social sciences. It has many advantages (hence its popularity), but it has a number of limiting assumptions. One is that the dependent variable is an uncensored "scale numeric" variable. In other words, it is presumed that the data on the dependent variable in your model is continuous and has been measured for all cases in the sample.

In many situations of interest to social scientists, however, the dependent variable is not continuous or measured for all cases. The table below offers a list, based on Long (1997, pp. 1-3), of different types of dependent variable, typical social science examples, and the appropriate estimation method. It's worth spending some time working through the table, particularly if you are seeking to apply regression analysis to data you have collected yourself or as part of a research project. Consider which category most appropriately describes your data.
If the correct estimation technique is neither regression nor dichotomous logit or probit, then you will need to refer to the reading suggested at the start of this chapter, since this text covers only regression and logit. Don't panic, however: once you have learnt to understand the output from OLS and logit, the results from most of the other methods should not be too difficult to interpret with a little extra background reading. Note, though, that to some extent the categorisation of variables given in the table hides the fact that the level of measurement of a variable is often ambiguous:

"...statements about levels of measurement of a [variable] cannot be sensibly made in isolation from the theoretical and substantive context in which the [variable] is to be used" (Carter, 1971, p. 12, quoted in Long 1997, p. 2)

Education, for example, could be measured:

as a binary variable: 1 if only attained High School or less, 0 if other;
as an ordinal variable: 6 if has PhD, 5 if has Masters, 4 if has Degree, 3 if has Highers, 2 if has Standard Grades, 1 if no qualifications;
or as a count variable: number of school years completed.

8.2 Choosing the Appropriate Statistical Models:

If we choose a model that assumes a level of measurement of the dependent variable different to that of our data, then our estimates may be biased, inefficient or simply inappropriate. For example, if we apply standard OLS to dependent variables that fall into any of the above categories of data, it will assume that the variable is unbounded and continuous and construct a line of best fit accordingly. We shall look at this case in more detail below and then focus on applying the logit model to binary dependent variables. More elaborate techniques are needed to estimate other types of limited dependent variables (see table), which are beyond the scope of this course (and of SPSS). Nevertheless, students who gain a good grasp of logit models will find that many of the same issues and methods crop up in the more advanced techniques.
Table 1 Categorising and Modelling Models with Limited Dependent Variables

Binary variables: made up of two categories, coded 1 if the event has occurred, 0 if not. It has to be a decision or a category that can be explained by other variables (i.e. male/female is not something amenable to social scientific explanation; it is not usually a dependent variable).
Examples: Did the person vote or not? Did the person take out MPPI or not? Does the person own their own home or not?
If the dependent variable is binary, estimate using: binary logit (also called logistic regression) or probit.

Ordinal variables: made up of categories that can be ranked (ordinal = "has an inherent order").
Examples: coded 4 if strongly agree, 3 if agree, 2 if disagree, 1 if strongly disagree; coded 4 if often, 3 if occasionally, 2 if seldom, 1 if never; coded 3 if radical, 2 if liberal, 1 if conservative; coded 6 if has PhD, 5 if has Masters, 4 if has Degree, 3 if has Highers, 2 if has Standard Grades, 1 if no qualifications.
If the dependent variable is ordinal, estimate using: ordered logit or ordered probit.

Nominal variables: made up of multiple outcomes that cannot be ordered.
Examples: marital status (single, married, divorced, widowed); mode of transport (car, van, bus, train, bicycle).
If the dependent variable is nominal, estimate using: multinomial logit.

Count variables: indicate the number of times that an event has occurred.
Examples: How many times has a person been married? How often did a person visit the doctor last year? How many strikes occurred? How many articles has an academic published? How many years of education has a person completed?
If the dependent variable is a count variable, estimate using: Poisson or negative binomial regression.

Censored variables: occur when the value of a variable is unknown over a certain range of the variable.
Examples: variables measuring percentages, censored below at zero and above at 100; hourly wage rates, censored below by the minimum wage rate.
If the dependent variable is censored, estimate using: Tobit.

Grouped data: occur when we have apparently ordered data but where the threshold values for the categories are known, e.g. a survey of incomes coded as follows: = 1 if income < 5,000; = 2 if 5,000 ≤ income < 7,000; = 3 if 7,000 ≤ income < 10,000; = 4 if 10,000 ≤ income < 15,000; = 5 if income ≥ 15,000.
If the dependent variable is grouped, estimate using: grouped Tobit (available in e.g. LIMDEP).

8.3 Linear Probability Model

What happens if we try to fit a line of best fit to a regression where the dependent variable is binary? This is the situation depicted in the scatter plot below (Figure 1) of a dichotomous dependent variable y on a continuous explanatory variable x. The advantage of this kind of model (called the Linear Probability Model) is that interpretation of coefficients is straightforward, because they are essentially interpreted in the same way as in linear regression. For example, if the data on the dependent variable records labour force participation (1 if in the labour force, 0 otherwise), then we could think of our regression as being a model of the probability of labour force participation. In this case, a coefficient on x of 0.4,

yhat = 1.2 + 0.4x

would imply that the predicted probability of labour force participation increases by 0.4 for every unit increase in x, holding all other variables constant.

Figure 1 Scatter Plot and Line of Best Fit of a Linear Probability Model
[Figure: binary y (0/1) plotted against x, with a straight line of best fit that passes above y = 1 and below y = 0]

8.3.1 Disadvantages:

As tempting as it is to apply the linear probability model (LPM) to dichotomous data, its problems are so severe and so fundamental that it should only be used in practice for comparison purposes.

Heteroscedasticity: The first major problem with the LPM is that the error term will tend to be larger for middle values of x. As a result, the OLS estimates are inefficient and the standard errors are biased, resulting in incorrect t-statistics.

Non-normal errors: Second, the errors will not be normally distributed (though note that normality is not required for OLS to be BLUE).

Nonsensical predictions: The most serious problem is that the underlying functional form is incorrect, and so the predicted values can be < 0 or > 1. This can be seen in Figure 1 from the fact that there are sections of the line of best fit that lie above y = 1 or below y = 0. This is of course meaningless if we are interpreting the predicted values as measures of the probability of an observation falling into the y = 1 category, since probabilities can only take on values between zero and one.
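If you want to see the LPM's problems for yourself, it is easy to fit one in SPSS for comparison purposes. A minimal sketch, assuming (hypothetically) that the active dataset contains a binary dependent variable named y and an explanatory variable named x:

* Linear probability model: OLS on a binary dependent variable.
* Assumes (hypothetically) variables named y and x in the active dataset.
REGRESSION
  /DEPENDENT y
  /METHOD=ENTER x
  /SAVE PRED .
* The saved predicted values (PRE_1) can fall below 0 or above 1,
* illustrating the "nonsensical predictions" problem described above.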
8.4 A more appropriate functional form:

What kind of model/transformation of our data could be used to represent this kind of relationship? If you think about it, what we require is a functional form that will produce a line of best fit that is "s" shaped: it converges to zero at one end and converges to one at the other end. At first thought, one might suggest a cubic transformation of the data, but cubic transformations are ruled out because they are unbounded (i.e. there would be nothing to stop such a model producing predicted values greater than one or less than zero). Note also that we may well have more than one explanatory variable, so we need a model that can be applied to the whole right hand side of the equation, b0 + b1x1 + b2x2 + b3x3, and still result in predicted values for y that range between 0 and 1.

Figure 2 A more appropriate line of best fit
[Figure: binary y (0/1) plotted against x, with an s-shaped curve of best fit bounded between 0 and 1]

8.4.1 Logistic transformation:

One appropriate (and popular) transformation that fits these requirements is the logit (also called the logistic) transformation, which is simply the exponent of x divided by one plus the exponent of x:

\text{logit}(x) = \frac{\exp(x)}{1 + \exp(x)}

If we have a constant term and more than one explanatory variable, then the logit would be:

\text{logit}(b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3) = \frac{\exp(b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3)}{1 + \exp(b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3)}

An example of the logit transformation of a variable x is given in the table below. To take the exponent of a number, you raise the value of e by that number (e, like π = 3.1415927..., is one of those very special numbers that keeps cropping up in mathematics and has an infinite number of decimal places: e = 2.7182818...). So, suppose you want to take the exponent of 2: you raise e to the power of 2, which is approximately the same as raising 2.7 to the power of 2, exp(2) = e² ≈ 2.7² ≈ 7.3. If you want to take the logit of 2, you divide exp(2) by one plus exp(2):

logit(2) = exp(2) / (1 + exp(2)) = 7.3/8.3 = 0.88

If you want to take the exponent of a negative number, such as -3, you raise e to the power of -3, which is approximately equivalent to raising 2.7 to the power of -3. Because the exponent is negative, this has the effect of inverting the base (i.e. making it the denominator of a fraction with 1 as the numerator) and then raising the denominator to the power of the absolute value of the exponent (e.g. 2⁻¹ = 1/2, and 2⁻² = 1/(2²) = 1/4). So,

exp(-3) = e⁻³ ≈ 2.7⁻³ = 1/(2.7 × 2.7 × 2.7) = 1/19.7 ≈ 0.05

The logit of -3 would then be:

logit(-3) = exp(-3) / (1 + exp(-3)) = 0.05/1.05 = 0.05

Table 2 Logits for particular values of x

 x    exp(x)   1 + exp(x)   logit(x)
-3     0.05       1.05        0.05
-2     0.14       1.14        0.12
-1     0.37       1.37        0.27
 0     1.00       2.00        0.50
 1     2.72       3.72        0.73
 2     7.39       8.39        0.88
 3    20.09      21.09        0.95

The values of logit(x) listed in the above table are plotted against x in Figure 3. A more complete picture of the logit function is given in the following figure (Figure 4), which has been plotted for much smaller intervals and for a much wider range of values of x. Both figures demonstrate that the logit transformation does indeed produce the desired "s" shaped outcome.

Figure 3 Logit(x) Plotted for 7 values of x
[Plot of logit(x) against x for x = -3, -2, -1, 0, 1, 2, 3]

Figure 4 The Logit Curve
[Plot of logit(x) against x for x from -3.5 to 3.5 in steps of 0.1]
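You can reproduce Table 2 directly in SPSS with COMPUTE statements, using the built-in EXP() function. A minimal sketch, assuming (hypothetically) that the active dataset contains a numeric variable named x:

* Compute the exponent and the logit of x (Table 2).
COMPUTE expx = EXP(x) .
COMPUTE logitx = EXP(x) / (1 + EXP(x)) .
EXECUTE .
* For x = 2 this gives logitx = 0.88, matching Table 2.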
8.5 More than one explanatory variable:

Suppose we have more than one explanatory variable: how do we apply the logit transformation? Consider the case where we have the following linear function of explanatory variables:

230 - 4x1 + 7x2 + 8x3

The logit transformation of this function would be:

logit(230 - 4x1 + 7x2 + 8x3)

which is simply the logit of the value of the linear function for a particular set of values for x1, x2 and x3. For example, if x1 = 0, x2 = -0.7 and x3 = -31, then

230 - 4x1 + 7x2 + 8x3 = -22.9

and the logit of the function for this set of values would be

logit(230 - 4x1 + 7x2 + 8x3) = logit(-22.9) = 0.0000000001134 = 1.134E-10

In Table 4 the logit transformations of the above function are listed for a variety of values of the three variables. A full set of logit transformations for this function is then plotted in Figure 5. It can be seen that, when we take the logit transformation of a whole linear function, rather than just an individual variable, we still end up with our desired "s" shaped curve that lies between y = 0 and y = 1.

Table 4 Logit transformation of a linear function: b0 + b1x1 + b2x2 + b3x3 = 230 - 4x1 + 7x2 + 8x3

b0    x1    x2      x3      b0 + b1x1 + b2x2 + b3x3   Logistic()
230    0   -0.70   -31.0          -22.90              1.13411E-10
230    1   -0.65   -30.5          -22.55              1.60938E-10
230    2   -0.60   -30.0          -22.20              2.28382E-10
230    3   -0.55   -29.5          -21.85              3.24090E-10
230    4   -0.50   -29.0          -21.50              4.59906E-10
230    5   -0.45   -28.5          -21.15              6.52637E-10
230    6   -0.40   -28.0          -20.80              9.26136E-10
230    7   -0.35   -27.5          -20.45              1.31425E-09
230    8   -0.30   -27.0          -20.10              1.86501E-09
230    9   -0.25   -26.5          -19.75              2.64657E-09
230   10   -0.20   -26.0          -19.40              3.75567E-09
230   11   -0.15   -25.5          -19.05              5.32954E-09
230   12   -0.10   -25.0          -18.70              7.56298E-09
230   13   -0.05   -24.5          -18.35              1.07324E-08
230   14    0.00   -24.0          -18.00              1.52300E-08
230   15    0.05   -23.5          -17.65              2.16124E-08
230   16    0.10   -23.0          -17.30              3.06694E-08
230   17    0.15   -22.5          -16.95              4.35220E-08

Figure 5 Logit curve for a linear function
[Plot of Logistic(b0 + b1x1 + b2x2 + b3x3) against b0 + b1x1 + b2x2 + b3x3, for values from about -23 to 26]
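The last two columns of Table 4 can be computed in SPSS in the same way as before. A minimal sketch, assuming (hypothetically) that variables named x1, x2 and x3 exist in the active dataset:

* Evaluate the linear function and its logit (Table 4).
COMPUTE index = 230 - 4*x1 + 7*x2 + 8*x3 .
COMPUTE logitfn = EXP(index) / (1 + EXP(index)) .
EXECUTE .
* For x1 = 0, x2 = -0.7, x3 = -31 this gives index = -22.9 and
* logitfn = 1.134E-10, matching the first row of Table 4.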
8.6 Goodness of fit

The discussion so far would suggest that the logit transformation might be an ideal candidate for fitting a line of best fit to models of dichotomous dependent variables. However, if the actual values of y shown in Figure 6 were observed over a wide range of the possible values of x, then the plotted curve would not be a very good line of best fit: values of b0 + b1x1 + b2x2 + b3x3 that are less than -4 or greater than 4 have very little effect on the probability, yet most of the values of x lie outside the -4 to 4 range. Perhaps if we alter the estimated values of bk then we might improve our line of best fit.

Figure 6 Logit Line of Best Fit
[Plot of Logistic(b0 + b1x1 + b2x2 + b3x3) against b0 + b1x1 + b2x2 + b3x3: the curve rises from 0 to 1 over only a narrow middle range of the horizontal axis]

Suppose we try the following set of values for the coefficients: b0 = 22, b1 = -0.4, b2 = 0.5 and b3 = 0.98. Table 5 displays the logit values of the revised function, which are plotted in Figure 7. You can see that the logit curve is now much less steep and so will produce more middle range values of the predicted probabilities (the vertical axis) for a wider range of the right hand side function (horizontal axis).

Table 5 Effect of Modifying the Coefficients

b0    x1    x2      x3      22 - 0.4x1 + 0.5x2 + 0.98x3   Logistic()
22     0   -0.70   -31.0            -8.730                0.000162
22     1   -0.65   -30.5            -8.615                0.000181
22     2   -0.60   -30.0            -8.500                0.000203
22     3   -0.55   -29.5            -8.385                0.000228
22     4   -0.50   -29.0            -8.270                0.000256
22     5   -0.45   -28.5            -8.155                0.000287
22     6   -0.40   -28.0            -8.040                0.000322
22     7   -0.35   -27.5            -7.925                0.000361
22     8   -0.30   -27.0            -7.810                0.000405
22     9   -0.25   -26.5            -7.695                0.000455
22    10   -0.20   -26.0            -7.580                0.000510
22    11   -0.15   -25.5            -7.465                0.000572
22    12   -0.10   -25.0            -7.350                0.000642
22    13   -0.05   -24.5            -7.235                0.000720
22    14    0.00   -24.0            -7.120                0.000808
22    15    0.05   -23.5            -7.005                0.000907
22    16    0.10   -23.0            -6.890                0.001017
22    17    0.15   -22.5            -6.775                0.001141
22    18    0.20   -22.0            -6.660                0.001280
22    19    0.25   -21.5            -6.545                0.001435

Figure 7 Effect of Modifying the Coefficients
[Plot of Logistic(b0 + b1x1 + b2x2 + b3x3) against 22 - 0.4x1 + 0.5x2 + 0.98x3: a much flatter s-shaped curve]

8.7 Estimation of the logistic model:

The above discussion leads naturally to a probability model of the form:

\Pr(y = 1 \mid x_k) = \frac{\exp(b_0 + b_1 x_1 + b_2 x_2 + \dots + b_K x_K)}{1 + \exp(b_0 + b_1 x_1 + b_2 x_2 + \dots + b_K x_K)}

We now need to find a way of estimating values of bk that will best fit the data. Unfortunately, OLS cannot be applied, since the above model is non-linear in the parameters, so an alternative has to be found. The method used to estimate logit is called "maximum likelihood". The intuition behind this technique is as follows: first ask, for a given set of parameter values, what is the probability of observing the current sample? Then try various values of the parameters to arrive at estimates that make the observed data most likely. This is rather like the numerical example above, where we modified the values of the bs in such a way that the fit of the model to the data improved; with maximum likelihood you are concerned with the fit of the data to the model.
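More formally (this derivation is standard, see e.g. Long 1997, though it is not spelled out in this chapter), maximum likelihood chooses the bk that maximise the log-likelihood of the sample:

\ln L(b) = \sum_{i=1}^{n} \Big[ y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \Big], \qquad p_i = \frac{\exp(b_0 + b_1 x_{1i} + \dots + b_K x_{Ki})}{1 + \exp(b_0 + b_1 x_{1i} + \dots + b_K x_{Ki})}

Each observation contributes ln(p_i) if the event occurred for that case (y_i = 1) and ln(1 - p_i) if it did not, so parameter values that assign high predicted probabilities to the outcomes actually observed yield a higher log-likelihood.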
8.8 Interpreting Output

Because logit regression is fundamentally non-linear, interpretation of output can be difficult. Many published studies that use logit overlook this fact, and either interpret the magnitude of coefficients incorrectly (mistakenly treating them like OLS coefficients) or only pass comment on the signs (i.e. whether positive or negative) of coefficients.

The difference between the interpretation of coefficients from OLS and logit is best seen by comparing the different impact of increasing a coefficient by 1. In OLS, the effect would simply be to increase the slope of the (straight) line of best fit by 1 (i.e. make the line steeper). This means that any increase in the relevant explanatory variable will now increase the dependent variable by that much more, and the effect is the same for all values of the x variable. This is not so in logit estimation. Continuing our numerical example from above, Figure 8 shows the effect of increasing b2 by 1. You can see that the change in the value of y (the vertical axis) varies considerably depending on the value of the horizontal axis (measuring the total linear function). For very low or very high values of the horizontal axis the impact is trivial, but it is much more substantial for middle values of the horizontal axis.

Perhaps even more surprising is the impact of changing the value of the constant term. In OLS, this will just cause a vertical shift in the line, and so have an identical effect for all values of the explanatory variable concerned, whereas in the logit model a unit increase in the constant term will have a varied impact for different values of the explanatory variables.

Figure 8 Impact of increasing b2 by 1
[Plot of Logistic(b0 + b1x1 + b2x2 + b3x3) against the linear function, for b2 = 0.5 and b2 = 1.5]

Figure 9 Impact of increasing b0 by 1
[Plot of Logistic(b0 + b1x1 + b2x2 + b3x3) against the linear function, for b0 = 22 and b0 = 23]

Let's look at another example. Consider the impact of the number of children on the take-up of mortgage payment protection insurance (MPPI). We might expect that families with more children have greater outlays relative to earnings and may therefore be less likely to consider MPPI to be affordable. If we run a logit regression model of MPPI take-up on children, we obtain the following output:

Table 6 Logit Output from MPPI Regression

--------- Variables in the Equation ---------
Variable        B      S.E.      Wald    df    Sig
CHILDREN    -.0446    .0935     .2278     1   .6331
Constant   -1.0711    .1143   87.8056     1   .0000

If we use the estimated coefficients to plot the predicted probability of taking out MPPI given a wide range of values for x, we get the following list of predicted probabilities (last column of Table 7, plotted in Figure 10 against number of children).

Table 7 Predicted Probability of MPPI Take-up

b0        b1         x1    b0 + b1x1   Predicted Probability of taking out MPPI
-1.0711   -0.0446   -130     4.7269       0.991223828
-1.0711   -0.0446   -120     4.2809       0.986358456
-1.0711   -0.0446   -110     3.8349       0.978853343
-1.0711   -0.0446   -100     3.3889       0.967355826
-1.0711   -0.0446    -90     2.9429       0.949926848
-1.0711   -0.0446    -80     2.4969       0.923924213
-1.0711   -0.0446    -70     2.0509       0.886038527
-1.0711   -0.0446    -60     1.6049       0.832702114
-1.0711   -0.0446    -50     1.1589       0.761132782
-1.0711   -0.0446    -40     0.7129       0.671041637
-1.0711   -0.0446    -30     0.2669       0.566331702
-1.0711   -0.0446    -20    -0.1791       0.455344304
-1.0711   -0.0446    -10    -0.6251       0.348622427
-1.0711   -0.0446      0    -1.0711       0.255193951
-1.0711   -0.0446     10    -1.5171       0.179888956
-1.0711   -0.0446     20    -1.9631       0.123131948
-1.0711   -0.0446     30    -2.4091       0.082481403
-1.0711   -0.0446     40    -2.8551       0.05441829
-1.0711   -0.0446     50    -3.3011       0.035533472

Figure 10 Predicted Probability of Taking out MPPI
[Plot of Prob(MPPI) against number of children, for values from -130 to 140]

Clearly, it is meaningless to talk about having "-130" children, so let's look now at predicted probabilities for the relevant range of the explanatory variable only (Table 8 and Figure 11). You can see from the graph that, to all intents and purposes, the relationship between the probability of MPPI take-up and the number of children appears linear, and is negative as we would expect. However, in other contexts the predicted probabilities may not have such a near-linear relationship with the explanatory variable, and so it can be quite useful to plot predicted probabilities over the relevant range of a variable; a sketch of how to compute them follows below.
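The predicted probabilities in Tables 7 and 8 can be reproduced from the Table 6 coefficients with a COMPUTE statement. A minimal sketch, assuming (hypothetically) that the number of children is held in a variable named children:

* Predicted probability of MPPI take-up from the Table 6 coefficients.
COMPUTE probmppi = EXP(-1.0711 - 0.0446*children) / (1 + EXP(-1.0711 - 0.0446*children)) .
EXECUTE .
* For children = 0 this gives probmppi = 0.2552, matching Table 8.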
Table 8 Predicted probabilities over relevant values of x

b0        b1        x1    b0 + b1x1   Predicted Probability of taking out MPPI
-1.0711   -0.0446    0     -1.0711       0.255193951
-1.0711   -0.0446    1     -1.1157       0.24680976
-1.0711   -0.0446    2     -1.1603       0.238612778
-1.0711   -0.0446    3     -1.2049       0.230604681
-1.0711   -0.0446    4     -1.2495       0.222786703
-1.0711   -0.0446    5     -1.2941       0.215159652
-1.0711   -0.0446    6     -1.3387       0.207723924
-1.0711   -0.0446    7     -1.3833       0.200479528
-1.0711   -0.0446    8     -1.4279       0.193426099

Figure 11 Predicted probabilities over relevant values of x
[Plot of Prob(MPPI) against number of children, for 0 to about 5.4 children]

8.9 Multivariate Logit:

Logit models are more complex if there is more than one x, since the effect of each variable on the dependent variable will depend on the values of the other explanatory variables. So we cannot simply interpret the coefficients from a multivariate logit as the individual effects of each variable. The effect of each variable in the logit model will be different for each value of the other variables (in mathematical terms, the first partial derivatives are not constant: they have variables in them). There are various ways round this. One solution is to calculate the effect of each variable at the mean value of all variables (called the marginal effect at the mean), but this tells us nothing about the effect of the variable at other values.

8.10 Odds

Another solution is to use the odds, which is the probability of an event occurring divided by the probability of the event not occurring:

\text{odds} = \frac{P(\text{event})}{P(\text{no event})} = \frac{P(\text{event})}{1 - P(\text{event})}

So in the case of the above example, the odds of having MPPI would equal the probability of having MPPI divided by the probability of not having MPPI. From Table 8 it can be seen that for families with no children the predicted probability of take-up is 26% (see the final column of Table 8, which reads 0.255193951 for x1 = 0), so the odds would be:

Odds of MPPI take-up for HHs with 0 children = 0.26 / (1 - 0.26) = 0.26/0.74 = 0.35

which is almost exactly a third: that is, childless households are three times more likely not to have MPPI than to have it. In the parlance of bookmakers, the odds of MPPI take-up amongst households with no children are 3 to 1 against (or, 1 to 3 on).
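In SPSS, the odds can be computed directly from the predicted probabilities. A minimal sketch, continuing from the (hypothetical) probmppi variable created in the earlier sketch:

* Odds of MPPI take-up = probability / (1 - probability).
COMPUTE oddsmppi = probmppi / (1 - probmppi) .
EXECUTE .
* For a childless household, probmppi = 0.2552, so oddsmppi = 0.3426,
* i.e. roughly 1 to 3 on.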
8.11 Proportionate Change in Odds

One of the attractions of using logit rather than probit analysis of dichotomous dependent variable models is that the proportionate change in odds due to a change in an explanatory variable can easily be calculated. SPSS computes this automatically for each explanatory variable and calls it "Exp(B)": the effect on the predicted odds of a unit change in the explanatory variable, holding all other variables constant. For the regression listed in Table 6, the Exp(B) = 0.9564 would be given in the SPSS output as follows:

Variable        B      S.E.    Exp(B)
CHILDREN    -.0446    .0935    .9564
Constant   -1.0711    .1143

Returning to Table 8, we could calculate manually the effect of having one more child on the odds of taking out MPPI. First we need to calculate the predicted odds before the increase in children, which we have already done for zero children:

Prob(HH has MPPI given child = 0) = 0.2552
Odds(HH has MPPI given child = 0) = 0.2552/(1 - 0.2552) = 0.3426

Next we need to calculate the odds for households with one child:

Prob(HH has MPPI given child = 1) = 0.2468
Odds(HH has MPPI given child = 1) = 0.2468/(1 - 0.2468) = 0.3277

To calculate the proportionate change in odds from having one more child, we divide the second figure by the first:

Proportionate change in odds = Exp(B) = (odds after a unit change in the explanatory variable) / (original odds) = 0.3277/0.3426 = 0.956

Note that this figure would turn out the same if we chose to look at moving from 1 child to 2, or from 3 to 4, etc.

8.12 Interpreting Exp(B)

If the proportionate change in odds is greater than one, then as the explanatory variable increases, the odds of the outcome increase (i.e. the variable has a positive effect on the dependent variable). The larger the value, the more sensitive the predicted probability is to unit changes in the variable. If, as in the above example, the proportionate change in odds is less than one, then as the explanatory variable (number of children) increases, the predicted probability declines (i.e. the variable has a negative effect on the dependent variable). In this case, the smaller the value of Exp(B), the more sensitive the predicted probability is to unit changes in the variable. So:

if the value of Exp(B) is > 1, it indicates that as the explanatory variable increases, the odds of the outcome occurring increase;
if the value of Exp(B) is < 1, it indicates that as the explanatory variable increases, the odds of the outcome occurring decrease.

If you want to compare the magnitudes of positive and negative effects, simply take the inverse of the negative effects. For example, if Exp(B) = 2 on a positive-effect variable, this has the same magnitude as a variable with Exp(B) = 0.5 = 1/2, but in the opposite direction.

The useful thing about this approach to interpreting the effect of coefficients is that, in a multivariate context, the Exp(B) is calculated holding all other variables constant. One must remember, however, that "a constant change in the odds does not correspond to a constant change ... in the probability" (Long, p. 82). For example, a doubling of the odds from 1/1000 to 2/1000 results in an Exp(B) of 2, and this is also true if we move from 1/10 to 2/10, or from 1/1 to 2/1, etc. However, whilst the proportionate change in the odds is constant at 2 for these examples, the change in probability is not: the change in odds from 1/1000 to 2/1000 results in a change in probability of 0.001, whereas the change in odds from 1/10 to 2/10 results in a change in probability of 0.076 (Long, p. 82).
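It is no coincidence that Exp(B) = 0.9564 = exp(-0.0446), the exponent of the estimated coefficient. Since the predicted odds equal exp(b0 + b1x1) (dividing p by 1 - p cancels the denominators of the logit formula), a unit increase in x1 multiplies the odds by exp(b1), regardless of the starting value of x1:

\frac{\text{odds}(x_1 + 1)}{\text{odds}(x_1)} = \frac{\exp(b_0 + b_1(x_1 + 1))}{\exp(b_0 + b_1 x_1)} = \exp(b_1)

This is why the manual calculation above gives the same answer whether we move from 0 children to 1, or from 3 children to 4.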
Exercise

1. Open the Project 2 data (surv9295) and create a dummy variable from q16a that equals 1 if the borrower has MPPI and 0 otherwise:

if (q16a = 2) MPPI = 0.
execute.
if (q16a = 1) MPPI = 1.
execute.

Alternatively, given that some respondents may have confused MPPI with other forms of insurance, you may want to screen out people who do not have mortgages, and only assume that a person has MPPI if they know the monthly premium (q16b > 0):

IF (q16a >= 2 and q10c = 1) MPPI = 0 .
EXECUTE .
IF (q16b > 0 and q10c = 1) MPPI = 1 .
EXECUTE .

2. Go to Analyse, Regression, Binary Logistic; enter your newly created dummy for MPPI take-up in the dependent variable box and enter possible explanatory variables in the covariates box, such as:

loan to value ratio (see q9 and q10b);
Q65.6 = educational achievement;
q58a = number of adults, q58b = number of children, q58c = total number of people in the HH.

Alternatively, you can use the syntax version:

LOGISTIC REGRESSION VAR=mppi
  /METHOD=ENTER q58b
  /CRITERIA PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .

3. How would you interpret the Exp(B) values in the output from this regression?

4. Look at the coding of missing values for q58b. How might this distort your results? How might you solve this problem?
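As a hint for question 4, one possible fix is to declare the missing-value code as user-missing so that those cases are excluded from the regression rather than treated as genuine values. A minimal sketch, assuming (hypothetically) that missing values on q58b are coded as 99; check the survey's actual coding in the data dictionary before using this:

* Declare 99 (a hypothetical missing-value code) as user-missing on q58b,
* so those cases are excluded rather than treated as 99 children.
MISSING VALUES q58b (99) .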