Standardizing Continuous Predictors In Binary Logistic Regression

Determining the relative importance of continuously distributed predictors in a
binary logistic regression can be made much easier by standardizing those predictors.
Typically such predictors are not measured in the same metric, and often the metric is
arbitrary. The odds ratio tells one by what multiplicative factor the odds of the predicted
event increase per one unit change in the predictor variable. A one unit change in the
predictor variable may be a very small change (one gram increase in body mass) or it
may be a very large change (number of limbs amputated) or it may be a change of
uncertain magnitude (score on a Likert-type scale designed to measure political
conservatism). Standardizing the continuous predictor variables puts them on the same
scale (standard deviation units).
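Because the logistic model is multiplicative in the odds, converting a per-unit odds ratio to a per-SD odds ratio is straightforward: if the odds ratio for a one unit change is exp(B), the odds ratio for a one SD change is exp(B·SD), i.e. exp(B) raised to the power SD. A minimal Python sketch, using hypothetical values (an odds ratio of 1.01 per gram of body mass with SD = 250 grams, not data from this report):

```python
import math

# Hypothetical per-unit odds ratio and predictor SD (illustration only)
or_per_unit = 1.01   # odds multiply by 1.01 per one-gram increase in body mass
sd = 250.0           # standard deviation of body mass, in grams

# Per-SD odds ratio: exp(B * SD), where B = ln(per-unit odds ratio)
b = math.log(or_per_unit)
or_per_sd = math.exp(b * sd)   # identical to or_per_unit ** sd

print(round(or_per_sd, 2))     # a "trivial" 1.01 per gram is about 12.03 per SD
```

A seemingly negligible per-unit odds ratio can thus correspond to a very large per-SD effect, which is why comparing unstandardized odds ratios across predictors with different metrics is misleading.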
To illustrate the benefit of standardizing continuous predictor variables in binary
logistic regression, I present the analysis of data on engineering students at East
Carolina University. The predicted variable is whether or not the student was retained
in the engineering program one year after entering it (0 = not retained, 1 = retained –
retention is the predicted event). The predictor variables are high school grade point
average (for which a one point difference is a very large difference), score on the
quantitative section of the Scholastic Aptitude Test (for which a one point difference is
trivial) and score on the Openness to Experience scale of a Big Five personality
inventory (for which a one point change is of uncertain magnitude to the typical
consumer of our research report).
Here is the annotated output from the statistical analysis.
DESCRIPTIVES VARIABLES=HSGPA SATQ Openness retention
/SAVE
/STATISTICS=MEAN STDDEV MIN MAX.
Descriptives

Descriptive Statistics
                        N   Minimum   Maximum      Mean   Std. Deviation
High School GPA       130     2.223     4.000   3.07143          .393092
Quant SAT             130       410       770    553.54           68.425
Openness              130        14        37     26.22            4.351
retention             130         0         1       .47             .501
Valid N (listwise)    130
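The /SAVE subcommand on DESCRIPTIVES writes a z-scored copy of each variable back to the data file, named with a Z prefix (ZHSGPA, ZSATQ, ZOpenness); those variables are used in the second analysis below. The same standardization can be sketched in Python (using the sample standard deviation, n − 1 in the denominator, as SPSS does); the data here are a small hypothetical example, not the study's data:

```python
import statistics

def standardize(values):
    """Return z scores: (x - mean) / sample SD, mimicking SPSS Z-prefixed variables."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD (n - 1 denominator), as SPSS reports
    return [(x - m) / sd for x in values]

# Hypothetical GPA values for illustration only
z = standardize([2.8, 3.0, 3.2, 3.4])
print([round(v, 3) for v in z])
```

The resulting variable has mean 0 and standard deviation 1, so a one unit change in it is a one SD change in the original metric.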
LOGISTIC REGRESSION VARIABLES retention
/METHOD=ENTER HSGPA SATQ Openness
/CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).
Logistic Regression
Case Processing Summary
Unweighted Cases(a)                          N    Percent
Selected Cases    Included in Analysis     130      100.0
                  Missing Cases              0         .0
                  Total                    130      100.0
Unselected Cases                             0         .0
Total                                      130      100.0
Dependent Variable Encoding
Original Value   Internal Value
Gone                          0
Retained                      1
Block 0: Beginning Block
Classification Table(a,b)
                                           Predicted
                                     retention           Percentage
Observed                          Gone    Retained        Correct
Step 0   retention    Gone          69           0          100.0
                      Retained      61           0             .0
         Overall Percentage                                  53.1
a. Constant is included in the model.
b. The cut value is .500
Variables in the Equation
                        B     S.E.    Wald   df   Sig.   Exp(B)
Step 0   Constant    -.123    .176    .492    1   .483     .884
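The Block 0 model contains only the constant, so its B is simply the log odds of the base retention rate: 61 retained versus 69 gone gives odds of 61/69. A quick check in Python:

```python
import math

# Counts from the Block 0 classification table
retained, gone = 61, 69

odds = retained / gone   # Exp(B) for the intercept-only model
b0 = math.log(odds)      # B for the intercept-only model

print(round(b0, 3), round(odds, 3))  # -0.123 and 0.884, matching the output
```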
Variables not in the Equation
                                Score   df   Sig.
Step 0   Variables   HSGPA      6.425    1   .011
                     SATQ       6.180    1   .013
                     Openness   2.954    1   .086
         Overall Statistics    15.680    3   .001
Block 1: Method = Enter
Omnibus Tests of Model Coefficients
                  Chi-square   df   Sig.
Step 1   Step         16.631    3   .001
         Block        16.631    3   .001
         Model        16.631    3   .001
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1             163.094(a)                   .120                  .160
a. Estimation terminated at iteration number 4 because parameter
estimates changed by less than .001.
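Both pseudo-R² statistics can be recovered from the model chi-square (16.631) and n = 130: Cox & Snell R² = 1 − exp(−χ²/n), and Nagelkerke rescales that by its maximum possible value, 1 − exp(−(−2LL_null)/n), where −2LL for the constant-only model is 163.094 + 16.631. A sketch:

```python
import math

n = 130
chi_square = 16.631                # model chi-square from the omnibus test
neg2ll_model = 163.094             # -2 log likelihood of the fitted model
neg2ll_null = neg2ll_model + chi_square   # -2LL of the constant-only model

cox_snell = 1 - math.exp(-chi_square / n)
max_cox_snell = 1 - math.exp(-neg2ll_null / n)   # Cox & Snell's upper bound
nagelkerke = cox_snell / max_cox_snell

print(round(cox_snell, 3), round(nagelkerke, 3))  # .120 and .160
```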
Classification Table(a)
                                           Predicted
                                     retention           Percentage
Observed                          Gone    Retained        Correct
Step 1   retention    Gone          51          18           73.9
                      Retained      28          33           54.1
         Overall Percentage                                  64.6
a. The cut value is .500
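The overall percentage correct is just the proportion of cases on the main diagonal of the table: (51 + 33) correct classifications out of 130 cases.

```python
# Correct classifications from the Step 1 classification table (cut value .500)
correct_gone, correct_retained, n = 51, 33, 130

overall_pct = 100 * (correct_gone + correct_retained) / n
print(round(overall_pct, 1))  # 64.6, up from the 53.1% base-rate accuracy at Block 0
```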
Variables in the Equation
                            B     S.E.     Wald   df   Sig.   Exp(B)
Step 1(a)   HSGPA       1.296     .506    6.569    1   .010    3.656
            SATQ         .006     .003    4.791    1   .029    1.006
            Openness     .100     .045    4.832    1   .028    1.105
            Constant  -10.286    2.743   14.063    1   .000     .000
a. Variable(s) entered on step 1: HSGPA, SATQ, Openness.
The novice might look at the B weights or the odds ratios and conclude that high
school GPA has a much greater unique contribution to predicting retention than do the
other two predictors.
Here is the analysis with standardized predictors. Most of the output is identical to that
already presented, so I have deleted that output.
LOGISTIC REGRESSION VARIABLES retention
/METHOD=ENTER ZHSGPA ZSATQ ZOpenness
/CRITERIA=PIN(.05) POUT(.10) ITERATE(20) CUT(.5).
Logistic Regression
Block 1: Method = Enter
Variables in the Equation
                            B     S.E.    Wald   df   Sig.   Exp(B)
Step 1(a)   ZHSGPA       .510     .199   6.569    1   .010    1.665
            ZSATQ        .440     .201   4.791    1   .029    1.553
            ZOpenness    .435     .198   4.832    1   .028    1.545
            Constant    -.121     .188    .414    1   .520     .886
a. Variable(s) entered on step 1: ZHSGPA, ZSATQ, ZOpenness.
Ah-hah. Now it is clear that the contributions of the three predictors differ little
from one another. A one standard deviation increase in high school GPA increases the
odds of retention by a multiplicative factor of 1.665, a one SD increase in quantitative
SAT by 1.553, and a one SD increase in openness to experience by 1.545.
Notice that the partially standardized B weights are simply the unstandardized B
weights multiplied by the predictor’s standard deviation. For example, for openness,
4.351(.100) = .4351. The odds ratio is e^.4351 ≈ 1.545.
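This relationship can be checked for all three predictors with the coefficients and standard deviations reported above. The checks agree to within rounding error for HSGPA and Openness; for SATQ the printed B (.006) is too coarsely rounded to reproduce 1.553 closely.

```python
import math

# Unstandardized B weights (Block 1 output) and predictor SDs (descriptives)
predictors = {
    "HSGPA":    (1.296, 0.393092),
    "SATQ":     (0.006, 68.425),
    "Openness": (0.100, 4.351),
}

for name, (b, sd) in predictors.items():
    b_std = b * sd                   # partially standardized B weight
    odds_ratio = math.exp(b_std)     # odds ratio per one-SD change
    print(f"{name}: B_z = {b_std:.3f}, Exp(B_z) = {odds_ratio:.3f}")
```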
Karl L. Wuensch, October, 2009