Sequential Multinomial Logistic Regression Analysis

Multinomial Logistic Regression
As with binomial logistic regression, this technique is employed to predict a categorical variable
from a collection of continuous and/or categorical predictors. Unlike with binomial logistic regression,
there are more than two levels of the predicted categorical variable.
In the summer of 2014 my colleagues and I received feedback on a manuscript we had
submitted to a scholarly journal. The categorical variable being predicted was the status of
engineering students here at ECU – they were classified as still being in the program, having left the
program but in good status, or having left the program in poor status. One of my coauthors had used a
discriminant function analysis, but one of the reviewers suggested using a multinomial logistic
regression instead, to avoid the restrictive assumptions associated with a discriminant function
analysis. So, I taught myself how to do a multinomial logistic regression, with some help from a
colleague in biostatistics. Since the data were in SPSS format, I employed SPSS.
Below I present the multinomial logistic analysis recommended by one of our reviewers.
Although I have done it in a sequential fashion, for pedagogical purposes, we reported a simultaneous
analysis (all the variables thrown in at once, that is, the last step shown below). All of the predictor
variables were continuous. To make it easier to compare predictors’ relative importance, I
standardized them all to mean 0, standard deviation 1.
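The standardization described above can be sketched in a few lines. This is only an illustration in Python (the analysis itself was run in SPSS); the function name and the scores are hypothetical, and the sketch assumes the sample standard deviation (n - 1 denominator) is the one used.

```python
from statistics import mean, stdev

def standardize(scores):
    """Convert raw scores to z scores with mean 0 and standard deviation 1."""
    m = mean(scores)
    s = stdev(scores)  # sample SD (n - 1 denominator); an assumption of this sketch
    return [(x - m) / s for x in scores]

# Hypothetical math SAT scores for illustration only; these are not the study data.
hypothetical_msat = [410, 520, 560, 600, 780]
z_msat = standardize(hypothetical_msat)
print([round(z, 3) for z in z_msat])
```

With every predictor on this scale, each B weight describes the change in log odds per one standard deviation, which is what makes the predictors' weights comparable.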
MSAT is score on the math SAT. VSAT is score on the verbal SAT. HSGPA is high school
GPA. ALEKS is score on a mathematics assessment test designed to test a college student’s
readiness to take courses that require mastery of mathematics. LOC is locus of control, with high
scores representing an external locus of control. The NEO predictors are scores on a Big Five
personality test: Openness, Conscientiousness, Extroversion, Agreeableness, and Neuroticism.
Descriptive Statistics
                      N      Minimum   Maximum   Mean      Std. Deviation
MSAT                  256    410       780       565.47    62.174
VSAT                  256    350       670       492.93    59.728
HSGPA                 256    2.22      4.00      3.1167    .34986
ALEKS                 256    17        97        53.74     18.985
LOC                   256    0         36        13.79     5.950
NEOOpen               256    11        50        26.83     5.663
NEOC                  256    14        49        31.57     6.649
NEOE                  256    10        46        30.68     5.946
NEOA                  256    12        43        28.73     5.460
NEON                  256    6         53        25.31     11.284
Valid N (listwise)    256
First I entered the Big Five predictors as a set. Analyze, Regression, Multinomial Logistic.
Case Processing Summary
                    N       Marginal Percentage
groups    Poor      68      26.6%
          Good      85      33.2%
          Stay      103     40.2%
Valid               256     100.0%
Missing             0
Total               256
Subpopulation       256a
a. The dependent variable has only one value observed in 256 (100.0%) subpopulations.
Model Fitting Information
                  Model Fitting Criteria    Likelihood Ratio Tests
Model             -2 Log Likelihood         Chi-Square    df    Sig.
Intercept Only    555.273
Final             522.381                   32.892        10    .000
Using these predictors significantly improved the model (compared to a model based only on the
differences in group sample sizes).
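The model chi-square is the drop in -2 log likelihood from the intercept-only model to the final model. The following is a minimal sketch of that arithmetic in Python, not SPSS's own routine. The helper name is hypothetical, and its closed-form tail probability holds only for even degrees of freedom, which happens to cover this case.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # (x/2)^k / k!
        total += term
    return math.exp(-half) * total

neg2ll_intercept_only = 555.273
neg2ll_final = 522.381
chi_square = neg2ll_intercept_only - neg2ll_final
df = 10  # 5 predictors x (3 groups - 1) parameters each
p_value = chi2_sf_even_df(chi_square, df)
print(round(chi_square, 3), df, p_value)
```

The p value is well below .001, which SPSS displays as .000.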
Pseudo R-Square
Cox and Snell    .121
Nagelkerke       .136
McFadden         .059
This is an R-squared-like statistic, but cannot really be interpreted as a proportion of variance. I
avoid it, but one of our reviewers wanted it.
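For readers who want to see where these three values come from, here is a sketch using the standard textbook formulas. It assumes the reported -2 log likelihoods are full log likelihoods with no constant dropped; the function name is hypothetical.

```python
import math

def pseudo_r_squares(neg2ll_null, neg2ll_model, n):
    """Return (Cox and Snell, Nagelkerke, McFadden) from -2 log likelihoods."""
    chi_square = neg2ll_null - neg2ll_model
    cox_snell = 1.0 - math.exp(-chi_square / n)
    max_cox_snell = 1.0 - math.exp(-neg2ll_null / n)  # upper bound of Cox and Snell
    nagelkerke = cox_snell / max_cox_snell
    mcfadden = 1.0 - neg2ll_model / neg2ll_null
    return cox_snell, nagelkerke, mcfadden

cox_snell, nagelkerke, mcfadden = pseudo_r_squares(555.273, 522.381, 256)
print(round(cox_snell, 3), round(nagelkerke, 3), round(mcfadden, 3))
```

Under that assumption the sketch reproduces the tabled values of .121, .136, and .059.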
Likelihood Ratio Tests
             Model Fitting Criteria    Likelihood Ratio Tests
             -2 Log Likelihood of
Effect       Reduced Model             Chi-Square   df   Sig.
Intercept    533.145                   10.764       2    .005
ZNEOOpen     523.656                   1.274        2    .529
ZNEOC        537.587                   15.206       2    .000
ZNEOE        523.370                   .989         2    .610
ZNEOA        523.208                   .826         2    .662
ZNEON        527.838                   5.457        2    .065
The chi-square statistic is the difference in -2 log-likelihoods between the final model and
a reduced model. The reduced model is formed by omitting an effect from the final model.
The null hypothesis is that all parameters of that effect are 0.
Removing conscientiousness from the model would significantly lower fit between model and data.
Neuroticism is nearly significant (but look below).
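The per-effect tests can be checked by hand. The sketch below (an illustration in Python, with hypothetical names) subtracts the final model's -2 log likelihood from each reduced model's value. It uses the fact that, for 2 degrees of freedom, the chi-square upper-tail probability is exactly exp(-chi-square / 2).

```python
import math

NEG2LL_FINAL = 522.381  # final model with all five NEO predictors

# -2 log likelihood of each reduced model, copied from the table above
neg2ll_reduced = {
    "ZNEOOpen": 523.656,
    "ZNEOC": 537.587,
    "ZNEOE": 523.370,
    "ZNEOA": 523.208,
    "ZNEON": 527.838,
}

def effect_test(reduced, final=NEG2LL_FINAL):
    """Likelihood ratio test for dropping one effect (df = 2: one B per non-reference group)."""
    chi_square = reduced - final
    p_value = math.exp(-chi_square / 2.0)  # exact only for df = 2
    return chi_square, p_value

results = {name: effect_test(value) for name, value in neg2ll_reduced.items()}
for name, (chi_square, p_value) in results.items():
    print(f"{name:9s} chi-square = {chi_square:6.3f}, p = {p_value:.3f}")
```

Small discrepancies in the third decimal place, compared with the table, reflect rounding of the printed -2 log likelihoods.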
Each predictor has k-1 B weights, each one comparing the reference group with one of the other
groups. Here I designated the poor group as the reference group.
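To make the k-1 sets of B weights concrete, the sketch below turns them into predicted group probabilities. It uses the B weights reported in the Parameter Estimates table for this step, with the poor group as the reference (its logit fixed at 0). The function and variable names are hypothetical, and this illustrates the general baseline-category logit model rather than SPSS's internals.

```python
import math

# B weights from the Step 1 Parameter Estimates table (reference group: Poor)
COEFS = {
    "Good": {"Intercept": .404, "ZNEOOpen": -.135, "ZNEOC": .658,
             "ZNEOE": .078, "ZNEOA": -.030, "ZNEON": .233},
    "Stay": {"Intercept": .561, "ZNEOOpen": -.208, "ZNEOC": .741,
             "ZNEOE": -.092, "ZNEOA": .121, "ZNEON": .467},
}

def predicted_probabilities(z_scores):
    """z_scores maps predictor name -> z score; omitted predictors are taken as 0 (the mean)."""
    logits = {"Poor": 0.0}  # the reference group has all weights fixed at 0
    for group, b in COEFS.items():
        logits[group] = b["Intercept"] + sum(b[name] * z for name, z in z_scores.items())
    denom = sum(math.exp(v) for v in logits.values())
    return {group: math.exp(v) / denom for group, v in logits.items()}

at_mean = predicted_probabilities({})             # average on every predictor
high_c = predicted_probabilities({"ZNEOC": 1.0})  # one SD above the mean on conscientiousness
print({g: round(p, 3) for g, p in at_mean.items()})
print({g: round(p, 3) for g, p in high_c.items()})
```

For a student at the mean on every predictor, the predicted probabilities are about .235 (poor), .352 (good), and .412 (stay). Raising conscientiousness by one standard deviation lowers the predicted probability of the poor group.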
Parameter Estimates
groups  Predictor        B   Std. Error   Wald     df   Sig.   Exp(B)
Good    Intercept     .404   .184         4.846    1    .028
        ZNEOOpen     -.135   .187         .519     1    .471   .874
        ZNEOC         .658   .213         9.562    1    .002   1.932
        ZNEOE         .078   .200         .154     1    .695   1.081
        ZNEOA        -.030   .189         .025     1    .873   .970
        ZNEON         .233   .214         1.185    1    .276   1.262
Stay    Intercept     .561   .179         9.791    1    .002
        ZNEOOpen     -.208   .185         1.270    1    .260   .812
        ZNEOC         .741   .211         12.372   1    .000   2.099
        ZNEOE        -.092   .196         .221     1    .638   .912
        ZNEOA         .121   .189         .410     1    .522   1.129
        ZNEON         .467   .211         4.893    1    .027   1.595
For each one standard deviation increase in conscientiousness, the odds of being in the stay
group rather than the poor group more than doubled.
For each one standard deviation increase in conscientiousness, the odds of being in the good
group rather than the poor group nearly doubled.
For each one standard deviation increase in neuroticism the odds of being in the stay group
rather than the poor group increased multiplicatively by 1.60.
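The Wald and Exp(B) columns behind these statements follow directly from B and its standard error. This sketch uses a hypothetical helper name and the printed three-decimal values, so its results differ slightly from the SPSS output, which was computed from unrounded estimates.

```python
import math

def wald_and_odds_ratio(b, se):
    """Wald chi-square (df = 1) is (B / SE)^2; the odds ratio Exp(B) is e^B."""
    wald = (b / se) ** 2
    odds_ratio = math.exp(b)
    return wald, odds_ratio

# Stay versus poor, from the Step 1 Parameter Estimates
wald_c, or_c = wald_and_odds_ratio(0.741, 0.211)  # conscientiousness
wald_n, or_n = wald_and_odds_ratio(0.467, 0.211)  # neuroticism
print(round(wald_c, 2), round(or_c, 2))
print(round(wald_n, 2), round(or_n, 2))
```

Because the predictors are standardized, each Exp(B) is the multiplicative change in the odds per one standard deviation increase, for example about 2.10 for conscientiousness and about 1.60 for neuroticism.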
Locus of control was added in the next step. Its addition did not significantly improve the model.
Model Fitting Information
                  Model Fitting Criteria    Likelihood Ratio Tests
Model             -2 Log Likelihood         Chi-Square    df    Sig.
Intercept Only    555.273
Final             520.187                   35.086        12    .000
Pseudo R-Square
Cox and Snell    .128
Nagelkerke       .145
McFadden         .063
Likelihood Ratio Tests
             Model Fitting Criteria    Likelihood Ratio Tests
             -2 Log Likelihood of
Effect       Reduced Model             Chi-Square   df   Sig.
Intercept    531.087                   10.901       2    .004
ZNEOOpen     521.362                   1.175        2    .556
ZNEOC        536.245                   16.058       2    .000
ZNEOE        521.040                   .853         2    .653
ZNEOA        521.134                   .947         2    .623
ZNEON        524.591                   4.405        2    .111
ZLOC         522.381                   2.194        2    .334
Parameter Estimates
groupsa Predictor        B   Std. Error   Wald     df   Sig.   Exp(B)
Good    Intercept     .403   .185         4.759    1    .029
        ZNEOOpen     -.128   .188         .459     1    .498   .880
        ZNEOC         .706   .218         10.528   1    .001   2.026
        ZNEOE         .062   .200         .097     1    .755   1.064
        ZNEOA        -.065   .193         .112     1    .738   .938
        ZNEON         .091   .236         .148     1    .700   1.095
        ZLOC          .282   .198         2.035    1    .154   1.326
Stay    Intercept     .567   .180         9.946    1    .002
        ZNEOOpen     -.201   .186         1.172    1    .279   .818
        ZNEOC         .759   .214         12.605   1    .000   2.136
        ZNEOE        -.096   .196         .240     1    .624   .909
        ZNEOA         .105   .192         .299     1    .585   1.111
        ZNEON         .410   .230         3.160    1    .075   1.506
        ZLOC          .114   .192         .355     1    .551   1.121
a. The reference category is: Poor.
On the third step, ALEKS was added to the model.
Model Fitting Information
                  Model Fitting Criteria    Likelihood Ratio Tests
Model             -2 Log Likelihood         Chi-Square    df    Sig.
Intercept Only    555.273
Final             502.495                   52.777        14    .000
Pseudo R-Square
Cox and Snell    .186
Nagelkerke       .210
McFadden         .095
Likelihood Ratio Tests
             Model Fitting Criteria    Likelihood Ratio Tests
             -2 Log Likelihood of
Effect       Reduced Model             Chi-Square   df   Sig.
Intercept    514.751                   12.255       2    .002
ZNEOOpen     502.969                   .473         2    .789
ZNEOC        517.760                   15.265       2    .000
ZNEOE        503.311                   .816         2    .665
ZNEOA        503.760                   1.265        2    .531
ZNEON        505.689                   3.193        2    .203
ZLOC         504.877                   2.382        2    .304
ZALEKS       520.187                   17.691       2    .000
Parameter Estimates
groups  Predictor        B   Std. Error   Wald     df   Sig.   Exp(B)
Good    Intercept     .502   .197         6.501    1    .011
        ZNEOOpen     -.104   .191         .294     1    .587   .901
        ZNEOC         .743   .222         11.184   1    .001   2.103
        ZNEOE         .081   .203         .162     1    .687   1.085
        ZNEOA        -.075   .194         .150     1    .698   .928
        ZNEON         .084   .239         .122     1    .727   1.087
        ZLOC          .290   .198         2.136    1    .144   1.337
        ZALEKS        .341   .187         3.338    1    .068   1.406
Stay    Intercept     .630   .194         10.536   1    .001
        ZNEOOpen     -.129   .193         .451     1    .502   .879
        ZNEOC         .740   .220         11.271   1    .001   2.096
        ZNEOE        -.076   .202         .140     1    .708   .927
        ZNEOA         .124   .197         .395     1    .530   1.132
        ZNEON         .362   .238         2.314    1    .128   1.436
        ZLOC          .107   .198         .289     1    .591   1.112
        ZALEKS        .733   .186         15.499   1    .000   2.081
Adding ALEKS significantly improved the model. Each increase of one standard deviation in
ALEKS was associated with a more than doubling of the odds of being in the stay group rather than the
poor group. The effect of ALEKS on the odds ratio for good versus poor fell just short of statistical
significance.
In Step 4 the SAT variables were added.
Model Fitting Information
                  Model Fitting Criteria    Likelihood Ratio Tests
Model             -2 Log Likelihood         Chi-Square    df    Sig.
Intercept Only    555.273
Final             493.748                   61.525        18    .000
The chi-square for this step is 502.495 – 493.748 = 8.747 on 18-14 = 4 degrees of freedom.
That yields a p value of .068.
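The same subtraction gives the test for each step of the sequential analysis so far. The sketch below is an illustration with hypothetical names. It differences the final -2 log likelihoods and degrees of freedom of successive models, and its chi-square tail probability uses a closed form that is valid only for even degrees of freedom (true of every step here).

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# (label, final -2 log likelihood, model df) for Steps 1 through 4
steps = [
    ("Big Five", 522.381, 10),
    ("+ LOC", 520.187, 12),
    ("+ ALEKS", 502.495, 14),
    ("+ MSAT, VSAT", 493.748, 18),
]

step_tests = {}
for (_, prev_ll, prev_df), (label, ll, df) in zip(steps, steps[1:]):
    chi_square = prev_ll - ll   # improvement in fit for this step
    step_df = df - prev_df      # parameters added in this step
    step_tests[label] = (chi_square, step_df, chi2_sf_even_df(chi_square, step_df))

for label, (chi_square, step_df, p_value) in step_tests.items():
    print(f"{label:13s} chi-square = {chi_square:6.3f}, df = {step_df}, p = {p_value:.3f}")
```

For the SAT step this reproduces the chi-square of 8.747 on 4 degrees of freedom and the p value of .068.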
Pseudo R-Square
Nagelkerke       .241
Likelihood Ratio Tests
             Model Fitting Criteria    Likelihood Ratio Tests
             -2 Log Likelihood of
Effect       Reduced Model             Chi-Square   df   Sig.
Intercept    505.474                   11.726       2    .003
ZNEOOpen     494.425                   .677         2    .713
ZNEOC        509.567                   15.819       2    .000
ZNEOE        494.480                   .732         2    .693
ZNEOA        494.550                   .802         2    .670
ZNEON        496.586                   2.838        2    .242
ZLOC         496.634                   2.886        2    .236
ZALEKS       504.006                   10.258       2    .006
ZMSAT        500.976                   7.228        2    .027
ZVSAT        496.824                   3.076        2    .215
Removing math SAT from the model would significantly reduce the fit of the model to the data,
but the effects of math SAT on the two contrasts (good versus poor and stay versus poor) fall short of
statistical significance. In another analysis I found that math SAT was significantly associated with the
difference between the stay and the good groups, with the odds of being in the stay group rather than
the good group increasing multiplicatively by 1.63 for each standard deviation increase in math SAT.
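In a model with poor as the reference group, the stay versus good contrast is not estimated directly, but its B weight is the difference between the stay and good weights. The sketch below is a hypothetical illustration of that identity using the math SAT weights printed for this step (.266 for stay versus poor, -.241 for good versus poor). It is not a reproduction of the separate analysis mentioned above, and it gives no significance test because the standard error of the difference is not reported here.

```python
import math

def contrast_odds_ratio(b_target_vs_ref, b_other_vs_ref):
    """Odds ratio for target versus other, from two B weights sharing the same reference group."""
    return math.exp(b_target_vs_ref - b_other_vs_ref)

# Math SAT B weights from the Step 4 Parameter Estimates (reference group: Poor)
b_stay_vs_poor = 0.266
b_good_vs_poor = -0.241

stay_vs_good = contrast_odds_ratio(b_stay_vs_poor, b_good_vs_poor)
print(round(stay_vs_good, 2))
```

From the Step 4 weights this comes to about 1.66, in the neighborhood of the 1.63 reported from the other analysis.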
Parameter Estimates
groups  Predictor        B   Std. Error   Wald     df   Sig.   Exp(B)
Good    Intercept     .494   .201         6.045    1    .014
        ZNEOOpen     -.124   .194         .408     1    .523   .883
        ZNEOC         .752   .225         11.178   1    .001   2.121
        ZNEOE         .058   .205         .080     1    .777   1.060
        ZNEOA        -.048   .196         .060     1    .807   .953
        ZNEON         .084   .244         .118     1    .731   1.088
        ZLOC          .327   .202         2.620    1    .106   1.387
        ZALEKS        .406   .205         3.927    1    .048   1.500
        ZMSAT        -.241   .213         1.278    1    .258   .786
        ZVSAT         .315   .206         2.335    1    .126   1.370
Stay    Intercept     .629   .197         10.151   1    .001
        ZNEOOpen     -.157   .195         .649     1    .421   .855
        ZNEOC         .777   .225         11.965   1    .001   2.176
        ZNEOE        -.091   .205         .199     1    .655   .913
        ZNEOA         .112   .198         .319     1    .572   1.119
        ZNEON         .348   .240         2.101    1    .147   1.416
        ZLOC          .126   .201         .389     1    .533   1.134
        ZALEKS        .630   .202         9.735    1    .002   1.878
        ZMSAT         .266   .214         1.545    1    .214   1.305
        ZVSAT         .071   .206         .120     1    .729   1.074
In the last step, high school GPA was added to the model.
Model Fitting Information
                  Model Fitting Criteria    Likelihood Ratio Tests
Model             -2 Log Likelihood         Chi-Square    df    Sig.
Intercept Only    555.273
Final             473.253                   82.020        20    .000
Pseudo R-Square
Cox and Snell    .274
Nagelkerke       .310
McFadden         .148
Likelihood Ratio Tests
             Model Fitting Criteria    Likelihood Ratio Tests
             -2 Log Likelihood of
Effect       Reduced Model             Chi-Square   df   Sig.
Intercept    488.053                   14.800       2    .001
ZNEOOpen     473.641                   .388         2    .824
ZNEOC        488.933                   15.680       2    .000
ZNEOE        473.844                   .591         2    .744
ZNEOA        473.951                   .698         2    .705
ZNEON        475.236                   1.983        2    .371
ZLOC         475.350                   2.096        2    .351
ZALEKS       482.546                   9.292        2    .010
ZMSAT        480.010                   6.757        2    .034
ZVSAT        475.947                   2.694        2    .260
ZHSGPA       493.748                   20.495       2    .000
Parameter Estimates
groups  Predictor        B   Std. Error   Wald     df   Sig.   Exp(B)
Good    Intercept     .625   .214         8.526    1    .004
        ZNEOOpen     -.102   .202         .251     1    .616   .903
        ZNEOC         .763   .228         11.140   1    .001   2.144
        ZNEOE         .118   .215         .301     1    .583   1.125
        ZNEOA        -.114   .202         .319     1    .573   .892
        ZNEON         .056   .253         .049     1    .825   1.058
        ZLOC          .276   .209         1.754    1    .185   1.318
        ZALEKS        .404   .208         3.762    1    .052   1.498
        ZMSAT        -.238   .222         1.150    1    .284   .788
        ZVSAT         .288   .215         1.796    1    .180   1.334
        ZHSGPA        .667   .197         11.480   1    .001   1.949
Stay    Intercept     .734   .212         12.034   1    .001
        ZNEOOpen     -.125   .204         .373     1    .541   .883
        ZNEOC         .807   .230         12.298   1    .000   2.241
        ZNEOE        -.008   .216         .001     1    .972   .992
        ZNEOA         .030   .207         .022     1    .883   1.031
        ZNEON         .289   .252         1.314    1    .252   1.335
        ZLOC          .087   .211         .172     1    .678   1.091
        ZALEKS        .619   .208         8.833    1    .003   1.858
        ZMSAT         .249   .226         1.215    1    .270   1.283
        ZVSAT         .049   .217         .051     1    .820   1.051
        ZHSGPA        .838   .202         17.161   1    .000   2.312
Conscientiousness, ALEKS, math SAT, and high school GPA contributed significantly to
the model.
For each one standard deviation increase in high school GPA, the odds of being in the good
group rather than the poor group nearly doubled, and the odds of being in the stay group rather than the
poor group more than doubled.
For each one standard deviation increase in conscientiousness, the odds of being in the stay
group rather than the poor group more than doubled, and the same was true of the odds of being in
the good group rather than the poor group.
For each one standard deviation increase in ALEKS, the odds of being in the stay group rather
than the poor group were multiplied by 1.86. The effect of ALEKS on the contrast between the good
group and the poor group fell just short of statistical significance.
Although the removal of math SAT from the model would significantly reduce the fit of the model
to the data, the effect of math SAT on the two focal contrasts fell short of statistical significance. Recall
that math SAT did
Given the final model, I thought it would be helpful to compare the group means on
conscientiousness, ALEKS, math SAT, and high school GPA. I did so with REGWQ tests. When
interpreting the results of these tests, it is important to remember that each tests the group differences
on one continuous variable ignoring the other continuous variables. The corresponding effects in the
logistic regression test the group differences after controlling for all of the other continuous variables.
A Posteriori Pairwise Comparisons Between Group Means
              Variable
Group         Conscientiousness   HS GPA    ALEKS      Math SAT
Persisting    33.23 A             3.21 A    59.82 A    583.30 A
LGS           32.24 A             3.14 A    52.34 B    554.00 B
LPS           28.21 B             2.94 B    46.28 B    552.79 B
Note: Within each column, means sharing a letter are not significantly different from each other.
N = 256.
Karl L. Wuensch, July, 2014.