Psychology 522/622
Winter 2008
Lab Lecture 7
Logistic Regression

DATAFILE: BIRTHWEIGHT.SAV

The following example involves data collected at Baystate Medical Center, Springfield, Massachusetts, during 1986. We are interested in understanding the variables that predict the likelihood of a mother giving birth to a baby with low birth weight (defined as a baby weighing less than 2500 grams). The variables we'll use to predict low birth weight will be the mother's age, whether the mother smoked during the pregnancy, whether the mother had hypertension, and the weight of the mother at her last menstrual period.

Before we begin, here's a table that should help you "translate" odds, log odds, and likelihood of the outcome (i.e., whichever level of the variable is coded 1).

Likelihood of Outcome (e.g., low birth weight)    Odds         Log Odds
Outcome is more likely to occur                   > 1          > 0
Outcome is not more or less likely to occur       1            0
Outcome is less likely to occur                   0 to .99     < 0

Also, a quick conversion table might be helpful:

Desired Conversion         Calculation
Probability → Odds         probability of outcome / probability of not outcome = odds
Odds → Log Odds            ln(odds) = log odds
Log Odds → Odds            e^(log odds) = odds
Log Odds → Probability     e^(log odds) / (1 + e^(log odds)) = probability of outcome

Descriptive Analysis

Let's run descriptive statistics for our data.

Analyze → Descriptive Statistics → Descriptives
Move the following variables into the Variable(s) box: Low birth weight, age of mother, mother's weight at last menstrual period, smoking status during pregnancy, history of hypertension.
Click OK.

Descriptive Statistics

                                            N     Minimum   Maximum   Mean       Std. Deviation
Low Birth Weight                            189   0         1         .31        .465
Age of Mother                               189   14.0      45.0      23.238     5.2987
Mother's Weight at Last Menstrual Period    189   80.00     250.00    129.8148   30.57938
Smoking Status During Pregnancy             189   0         1         .39        .489
History of Hypertension                     189   0         1         .06        .244
Valid N (listwise)                          189

Note: 31% of mothers had children with low birth weight.
The odds of a low birth weight baby are .31/.69 = .45, meaning that having a low birth weight baby is less likely than having a baby of normal weight (because the odds are less than 1). The log of the odds of having a low birth weight baby is ln(.45) = -.80. Again, because the log of the odds is less than 0, low birth weight babies are less likely than normal birth weight babies.

Now let's look at the correlations among the study variables.

Analyze → Correlate → Bivariate
Move the following variables into the Variable(s) box: Low birth weight, age of mother, mother's weight at last menstrual period, smoking status during pregnancy, history of hypertension.
*I unchecked the "flag significant correlations" box because we are not interested in p-values associated with these correlations. They are for descriptive purposes only.*
Click OK.

Correlations
(Pearson correlation, with Sig. (2-tailed) in parentheses; N = 189 for every pair)

                                                     LBW            Age            Weight         Smoke          Hyper
Low Birth Weight (LBW)                               1              -.119 (.103)   -.170 (.020)   .161 (.027)    .152 (.036)
Age of Mother (Age)                                  -.119 (.103)   1              .180 (.013)    -.044 (.545)   -.016 (.829)
Mother's Weight at Last Menstrual Period (Weight)    -.170 (.020)   .180 (.013)    1              -.044 (.546)   .236 (.001)
Smoking Status During Pregnancy (Smoke)              .161 (.027)    -.044 (.545)   -.044 (.546)   1              .013 (.855)
History of Hypertension (Hyper)                      .152 (.036)    -.016 (.829)   .236 (.001)    .013 (.855)    1

Note: We look at this to get a descriptive sense of how the variables are related and don't pay much attention to the significance of the correlations. We see that the tendency to have a low birth weight baby is lower for older mothers (-.119) and heavier mothers (-.170), while it is higher for smoking mothers (.161) and hypertensive mothers (.152).
We also inspect the correlations among the predictors to make sure that collinearity issues are not present. Nothing appears out of the ordinary: the largest correlation between any two predictors is .236 (mother's weight with history of hypertension).

Logistic Regression

Analyze → Regression → Binary Logistic
Make LBW the dependent variable (1 = low birth weight, 0 = normal weight). Move age, weight, smoke, and hyper to the Covariates box.
Click OK.

Logistic Regression

Case Processing Summary

Unweighted Cases(a)                            N     Percent
Selected Cases     Included in Analysis        189   100.0
                   Missing Cases               0     .0
                   Total                       189   100.0
Unselected Cases                               0     .0
Total                                          189   100.0
a. If weight is in effect, see classification table for the total number of cases.

Dependent Variable Encoding

Original Value    Internal Value
Normal            0
Low               1

Note: It is good to double-check this table to confirm that the group you think you coded 1 is the group SPSS kept as 1, and likewise for the group coded 0. If SPSS switches the coding, every relationship in the output is the opposite of what it appears to be, and the interpretation will be backwards. Nothing funny happened here, so we proceed.

Block 0: Beginning Block

Classification Table(a,b)

                                        Predicted Low Birth Weight
Observed                                Normal    Low     Percentage Correct
Step 0   Low Birth Weight    Normal     130       0       100.0
                             Low        59        0       .0
         Overall Percentage                               68.8
a. Constant is included in the model.
b. The cut value is .500

Note: Because having a low birth weight baby is less likely, when no predictors are considered, all mothers are predicted to have a normal birth weight baby. That prediction is correct for the 130 of 189 mothers (68.8%) who had a normal birth weight baby.

Variables in the Equation

                     B       S.E.    Wald      df    Sig.    Exp(B)
Step 0   Constant    -.790   .157    25.327    1     .000    .454

Note: The logistic regression coefficient (here, just the constant) is the log of the odds of having a low birth weight baby. This number agrees, within rounding error, with what we calculated by hand above: ln(.45) = -.80. Exp(B) is the odds of having a low birth weight baby, which also agrees, within rounding error, with what we computed by hand above: .31/.69 = .45.
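The agreement between the hand calculations and the Step 0 constant can be checked with a few lines of code. The following is a minimal sketch in Python rather than SPSS; the function names are illustrative assumptions, and the only data used are the counts from the Block 0 classification table (59 low, 130 normal).

```python
import math


def prob_to_odds(p):
    """Probability of outcome / probability of not outcome."""
    return p / (1.0 - p)


def odds_to_log_odds(odds):
    """Natural log of the odds."""
    return math.log(odds)


def log_odds_to_odds(log_odds):
    """e raised to the log odds."""
    return math.exp(log_odds)


def log_odds_to_prob(log_odds):
    """e^(log odds) / (1 + e^(log odds))."""
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


# Counts taken from the Block 0 classification table.
n_low, n_normal = 59, 130

p_low = n_low / (n_low + n_normal)         # proportion of low birth weight babies
odds_low = prob_to_odds(p_low)             # compare with Exp(B) for the constant
log_odds_low = odds_to_log_odds(odds_low)  # compare with B for the constant

print(f"probability = {p_low:.3f}")
print(f"odds        = {odds_low:.3f}")
print(f"log odds    = {log_odds_low:.3f}")
```

Working from the raw counts instead of the rounded .31 and .69 is why the code lands on the SPSS values for B and Exp(B) rather than on the slightly rounded hand-calculated ones.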
Variables not in the Equation

                                  Score     df    Sig.
Step 0   Variables    weight      5.438     1     .020
                      age         2.674     1     .102
                      smoke       4.924     1     .026
                      hyper       4.388     1     .036
         Overall Statistics       18.060    4     .001

Block 1: Method = Enter

Omnibus Tests of Model Coefficients

                   Chi-square    df    Sig.
Step 1   Step      18.988        4     .001
         Block     18.988        4     .001
         Model     18.988        4     .001

Note: This output is similar to the F-statistic for a multiple regression model. These statistics ask whether the four predictor variables taken all together are helpful in predicting the outcome (i.e., the log odds of having a low birth weight baby). The null hypothesis here is that these predictors are not useful (i.e., their logistic regression coefficients are all zero). We reject this null hypothesis. Taken together, these 4 predictor variables appear to be useful in predicting the log odds of having a low birth weight baby.

Model Summary

Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1       215.684(a)           .096                    .134
a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

Note: These R Square-like statistics are what they seem: they are read in roughly the same way as R Square in multiple linear regression (i.e., the bigger the better), although they are not literally the proportion of variance explained.

Classification Table(a)

                                        Predicted Low Birth Weight
Observed                                Normal    Low     Percentage Correct
Step 1   Low Birth Weight    Normal     121       9       93.1
                             Low        49        10      16.9
         Overall Percentage                               69.3
a. The cut value is .500

Note: If we were interested in using logistic regression for classification, this table would tell us how well we would do. Comparing this table with the one above with no predictors, our classification accuracy improves, but just a little (from 68.8% to 69.3%). This is partly due to the fact that having a low birth weight baby is a low probability event.

Variables in the Equation

                        B        S.E.     Wald     df    Sig.    Exp(B)
Step 1(a)   weight      -.017    .007     6.554    1     .010    .983
            age         -.036    .033     1.144    1     .285    .965
            smoke       .679     .332     4.185    1     .041    1.972
            hyper       1.788    .686     6.797    1     .009    5.978
            Constant    1.767    1.052    2.819    1     .093    5.852
a. Variable(s) entered on step 1: weight, age, smoke, hyper.

Note: The B column contains our logistic regression coefficients. Remember that these are partial logistic regression coefficients (e.g., the weight coefficient is the relationship between mother's weight and the log odds of having a low birth weight baby controlling for mother's age, smoking history, and hypertension history).

Age: The age coefficient is not statistically significant, whereas the coefficients for the other three predictors are. Controlling for the other variables, age does not appear to be much related to the log odds of having a low birth weight baby.

Weight: Controlling for the other variables, weight is negatively related to the log odds of having a low birth weight baby (i.e., controlling for the other variables, heavier mothers are less likely to have a low birth weight baby). Specifically, for every additional pound of mother's weight, the log odds of having a low birth weight baby decreases by .017.

Smoke: Controlling for the other variables, smoking increases the log odds of having a low birth weight baby by .679. *This is a categorical variable, smoked/didn't smoke.

Hyper: Controlling for the other variables, a history of hypertension increases the log odds of having a low birth weight baby by 1.788. *This is a categorical variable, had hypertension/didn't have hypertension.

Regarding Exp(B): this is the coefficient in column B converted from the log odds scale to the odds scale. For a predictor, it is the factor by which the odds are multiplied for a one-unit increase in that predictor (the odds ratio). For example: exp(-.017) = .983; ln(.983) = -.02. Likewise, a history of hypertension multiplies the odds of a low birth weight baby by exp(1.788) = 5.978.

Some Potential Follow-up Analyses for Additional Interpretation

Let's compare two groups of women to get a sense of the effect of a history of hypertension on the likelihood of having a low birth weight baby. The groups will be the same on age (the mean age of 23.2), the same weight (the mean weight of 129.8), and they will not smoke.
However, one group will have a history of hypertension (Hypertension group) and one group will not have a history of hypertension (No Hypertension group).

No Hypertension Group

Log odds (LBW) = 1.767 - .017*(weight = 129.8) - .036*(age = 23.2) + .679*(smoke = 0) + 1.788*(hyper = 0) = -1.27

$$\text{prob(LBW)} = \frac{e^{-1.27}}{1 + e^{-1.27}} = \frac{.281}{1 + .281} = .219$$

A low birth weight baby is fairly unlikely (probability .219) for a mother who is of average weight, average age, did not smoke during the pregnancy, and has no history of hypertension.

Hypertension Group

Log odds (LBW) = 1.767 - .017*(weight = 129.8) - .036*(age = 23.2) + .679*(smoke = 0) + 1.788*(hyper = 1) = .51

$$\text{prob(LBW)} = \frac{e^{.51}}{1 + e^{.51}} = \frac{1.665}{1 + 1.665} = .625$$

The probability of a low birth weight child being born to a mother with this constellation of predictors (i.e., average weight, average age, didn't smoke during pregnancy, history of hypertension) is .625.

High Risk Group

Let's predict the outcome for a high risk group. This group will consist of smokers with a history of hypertension. They will be 1 SD below the mean for weight and 1 SD below the mean for age.

Log odds (LBW) = 1.767 - .017*(weight = 129.8 - 30.6 = 99.2) - .036*(age = 23.2 - 5.3 = 17.9) + .679*(smoke = 1) + 1.788*(hyper = 1)
= 1.767 - .017(99.2) - .036(17.9) + .679(1) + 1.788(1)
= 1.767 - 1.686 - .644 + .679 + 1.788
= 1.90

$$\text{prob(LBW)} = \frac{e^{1.90}}{1 + e^{1.90}} = \frac{6.686}{1 + 6.686} = .870$$

Here the probability of having a low birth weight baby within this high risk group is .870, a pretty high probability.

Sample Results

A logistic regression analysis was conducted to evaluate the relationships between the likelihood of having a low birth weight (LBW) baby and certain maternal characteristics. The predictors included: mother's weight at last menstrual period, mother's age, whether or not the mother smoked during pregnancy, and whether or not the mother had a history of hypertension. The outcome variable was birth weight of the child, with 1 = low birth weight (less than 2500 grams, about 5 lbs. 8 oz.), 0 = normal birth weight (2500 grams or more).

The four predictors together were significantly related to the log odds of having a LBW baby, χ²(4) = 18.99, p = .001, Cox-Snell R² = .096. With the exception of mother's age, all predictors were significant. Controlling for the other variables, weight was negatively related to the log odds of having a low birth weight baby, slope = -.02, Wald χ² statistic = 6.55, p = .01. Specifically, for every additional pound of mother's weight, the log odds of having a low birth weight baby decreased by .017. Controlling for the other variables, smoking increased the log odds of having a low birth weight baby, slope = .68, Wald χ² statistic = 4.19, p = .04. Controlling for the other variables, a history of hypertension increased the log odds of having a low birth weight baby, slope = 1.79, Wald χ² statistic = 6.80, p = .009. The probability of "high risk" mothers (1 SD below the mean weight and age, smoked during pregnancy, and had a history of hypertension) having a LBW baby was .87.
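
The follow-up predictions for the three groups can also be reproduced programmatically. The sketch below is in Python rather than SPSS and assumes only the rounded Step 1 coefficients from the Variables in the Equation table; the names `COEFFICIENTS`, `log_odds_lbw`, `prob_lbw`, and `GROUPS` are made up for illustration. Because the code does not round the log odds before exponentiating, its probabilities may differ from the hand calculations in the last decimal place.

```python
import math

# Rounded Step 1 coefficients copied from the "Variables in the Equation" table.
COEFFICIENTS = {
    "constant": 1.767,
    "weight": -0.017,
    "age": -0.036,
    "smoke": 0.679,
    "hyper": 1.788,
}


def log_odds_lbw(weight, age, smoke, hyper):
    """Predicted log odds of a low birth weight baby for one profile."""
    return (
        COEFFICIENTS["constant"]
        + COEFFICIENTS["weight"] * weight
        + COEFFICIENTS["age"] * age
        + COEFFICIENTS["smoke"] * smoke
        + COEFFICIENTS["hyper"] * hyper
    )


def prob_lbw(weight, age, smoke, hyper):
    """Convert the predicted log odds to a probability."""
    odds = math.exp(log_odds_lbw(weight, age, smoke, hyper))
    return odds / (1.0 + odds)


# Profiles described in the follow-up analyses (smoke/hyper: 1 = yes, 0 = no).
GROUPS = {
    "no hypertension": dict(weight=129.8, age=23.2, smoke=0, hyper=0),
    "hypertension": dict(weight=129.8, age=23.2, smoke=0, hyper=1),
    "high risk": dict(weight=129.8 - 30.6, age=23.2 - 5.3, smoke=1, hyper=1),
}

for name, profile in GROUPS.items():
    z = log_odds_lbw(**profile)
    p = prob_lbw(**profile)
    print(f"{name}: log odds = {z:.2f}, probability = {p:.3f}")
```

Keeping the coefficients in one dictionary makes it easy to try other profiles, for example a smoker of average age and weight with no history of hypertension, without retyping the equation.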