
Standardizing Continuous Predictors in Binary Logistic Regression

Determining the relative importance of continuously distributed predictors in a binary logistic regression can be made much easier by standardizing those predictors. Typically such predictors are not measured in the same metric, and often the metric is arbitrary. The odds ratio tells one by what multiplicative factor the odds of the predicted event increase per one-unit change in the predictor variable. A one-unit change in the predictor may be a very small change (a one-gram increase in body mass), a very large change (number of limbs amputated), or a change of uncertain magnitude (score on a Likert-type scale designed to measure political conservatism). Standardizing the continuous predictor variables puts them all on the same scale (standard deviation units).

To illustrate the benefit of standardizing continuous predictor variables in binary logistic regression, I present the analysis of data on engineering students at East Carolina University. The predicted variable is whether or not the student was retained in the engineering program one year after entering it (0 = not retained, 1 = retained; retention is the predicted event). The predictor variables are high school grade point average (for which a one-point difference is a very large difference), score on the quantitative section of the Scholastic Aptitude Test (for which a one-point difference is trivial), and score on the Openness to Experience scale of a Big Five personality inventory (for which a one-point change is of uncertain magnitude to the typical consumer of our research report). Here is the annotated output from the statistical analysis.

DESCRIPTIVES VARIABLES=HSGPA SATQ Openness retention
  /SAVE
  /STATISTICS=MEAN STDDEV MIN MAX.

Descriptive Statistics

                         N   Minimum   Maximum      Mean   Std. Deviation
  High School GPA      130     2.223     4.000   3.07143          .393092
  Quant SAT            130       410       770    553.54           68.425
  Openness             130        14        37     26.22            4.351
  retention            130         0         1       .47             .501
  Valid N (listwise)   130
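The /SAVE subcommand on DESCRIPTIVES is what creates the z-score versions of the variables (ZHSGPA, ZSATQ, ZOpenness) used later. The same standardization is easy to do in any environment; here is a minimal sketch in Python (the data values are made up for illustration, not taken from the study):

```python
# Minimal sketch of the standardization that DESCRIPTIVES /SAVE performs:
# z = (x - mean) / sd, using the sample standard deviation (n - 1
# denominator), which is what SPSS reports.
import statistics

def standardize(values):
    """Return z-scores: (x - mean) / sample SD."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)  # n - 1 denominator, as in SPSS
    return [(x - m) / sd for x in values]

hsgpa = [2.8, 3.1, 3.4, 2.9, 3.6]  # hypothetical GPA values
z = standardize(hsgpa)
# By construction the z-scores have mean 0 and sample SD 1.
```

Regression coefficients fitted to such z-scores are then in "per one SD" units, which is the whole point of the exercise below.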
LOGISTIC REGRESSION VARIABLES retention
  /METHOD=ENTER HSGPA SATQ Openness
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).

Case Processing Summary

  Unweighted Cases                          N   Percent
  Selected Cases    Included in Analysis  130     100.0
                    Missing Cases           0        .0
                    Total                 130     100.0
  Unselected Cases                          0        .0
  Total                                   130     100.0

Dependent Variable Encoding

  Original Value   Internal Value
  Gone                          0
  Retained                      1

Block 0: Beginning Block

Classification Table(a,b)

                                    Predicted retention   Percentage
  Observed                             Gone    Retained      Correct
  Step 0   retention   Gone              69           0        100.0
                       Retained          61           0           .0
           Overall Percentage                                   53.1
  a. Constant is included in the model.
  b. The cut value is .500

Variables in the Equation

                       B   S.E.   Wald   df   Sig.   Exp(B)
  Step 0  Constant  -.123  .176   .492    1   .483     .884

Variables not in the Equation

                               Score   df   Sig.
  Step 0   HSGPA               6.425    1   .011
           SATQ                6.180    1   .013
           Openness            2.954    1   .086
           Overall Statistics 15.680    3   .001

Block 1: Method = Enter

Omnibus Tests of Model Coefficients

                  Chi-square   df   Sig.
  Step 1   Step       16.631    3   .001
           Block      16.631    3   .001
           Model      16.631    3   .001

Model Summary

          -2 Log        Cox & Snell   Nagelkerke
  Step    likelihood    R Square      R Square
  1       163.094(a)    .120          .160
  a. Estimation terminated at iteration number 4 because parameter
     estimates changed by less than .001.

Classification Table(a)

                                    Predicted retention   Percentage
  Observed                             Gone    Retained      Correct
  Step 1   retention   Gone              51          18         73.9
                       Retained          28          33         54.1
           Overall Percentage                                   64.6
  a. The cut value is .500

Variables in the Equation

                           B    S.E.     Wald   df   Sig.   Exp(B)
  Step 1(a)  HSGPA      1.296    .506    6.569    1   .010    3.656
             SATQ        .006    .003    4.791    1   .029    1.006
             Openness    .100    .045    4.832    1   .028    1.105
             Constant -10.286   2.743   14.063    1   .000     .000
  a. Variable(s) entered on step 1: HSGPA, SATQ, Openness.
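Recall that the Exp(B) column is simply e raised to the B weight, the multiplicative change in the odds of retention per one-unit change in that predictor. The arithmetic can be checked directly with the coefficients reported above; a quick sketch:

```python
# Exp(B) in the SPSS output is exp(B): the multiplicative change in
# the odds of the predicted event per one-unit change in the predictor.
# B weights below are copied from the Step 1 output table.
import math

b_weights = {"HSGPA": 1.296, "SATQ": 0.006, "Openness": 0.100}

odds_ratios = {name: math.exp(b) for name, b in b_weights.items()}
# Matches the Exp(B) column to three decimals:
# HSGPA 3.656, SATQ 1.006, Openness 1.105.
```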
The novice might look at the B weights or the odds ratios and conclude that high school GPA has a much greater unique contribution to predicting retention than do the other two predictors. Here is the analysis with standardized predictors. Most of the output is identical to that already presented, so I have deleted that output.

LOGISTIC REGRESSION VARIABLES retention
  /METHOD=ENTER ZHSGPA ZSATQ ZOpenness
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).

Block 1: Method = Enter

Variables in the Equation

                           B   S.E.    Wald   df   Sig.   Exp(B)
  Step 1(a)  ZHSGPA      .510   .199   6.569    1   .010    1.665
             ZSATQ       .440   .201   4.791    1   .029    1.553
             ZOpenness   .435   .198   4.832    1   .028    1.545
             Constant   -.121   .188    .414    1   .520     .886
  a. Variable(s) entered on step 1: ZHSGPA, ZSATQ, ZOpenness.

Ah-hah. Now it is clear that the contributions of the three predictors differ little from each other. A one standard deviation increase in high school GPA increases the odds of retention by a multiplicative factor of 1.665, a one SD increase in quantitative SAT by 1.553, and a one SD increase in openness to experience by 1.545. Notice that the partially standardized B weights are simply the unstandardized B weights multiplied by the predictor's standard deviation. For example, for openness, 4.351(.100) = .4351. The odds ratio is e^.4351 = 1.545.

Karl L. Wuensch, October, 2009

Return to Karl's Binary Logistic Regression Lesson
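The multiplication described above (partially standardized B = unstandardized B × predictor SD, so the standardized odds ratio is e^(B × SD)) can be verified for all three predictors using only figures reported in the two output tables; a quick sketch:

```python
# Verify that each partially standardized B equals the unstandardized B
# times the predictor's standard deviation, and that exp(B * SD)
# reproduces the standardized odds ratios. All figures are copied from
# the output tables above; small discrepancies reflect rounding in the
# printed output.
import math

# predictor: (unstandardized B, SD, standardized Exp(B) as reported)
rows = {
    "HSGPA":    (1.296, 0.393092, 1.665),
    "SATQ":     (0.006, 68.425,   1.553),
    "Openness": (0.100, 4.351,    1.545),
}

b_std = {name: b * sd for name, (b, sd, _) in rows.items()}
or_std = {name: math.exp(b) for name, b in b_std.items()}
# Openness: .100 * 4.351 = .4351, and exp(.4351) = 1.545, as in the text.
# SATQ agrees only loosely (about 1.508 vs the reported 1.553) because
# its printed B of .006 is heavily rounded.
```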