Multinomial Logistic Regression Analysis: Student Status Prediction

As with binomial logistic regression, this technique is employed to predict a categorical variable from a collection of continuous and/or categorical predictors. Unlike binomial logistic regression, it handles a predicted categorical variable with more than two levels.

In the summer of 2014 my colleagues and I received feedback on a manuscript we had submitted to a scholarly journal. The categorical variable being predicted was the status of engineering students here at ECU: they were classified as still being in the program, having left the program but in good status, or having left the program in poor status. One of my coauthors had used a discriminant function analysis, but one of the reviewers suggested using a multinomial logistic regression instead, to avoid the restrictive assumptions associated with a discriminant function analysis. So I taught myself how to do a multinomial logistic regression, with some help from a colleague in biostatistics. Since the data were in SPSS format, I employed SPSS.

Below I present the multinomial logistic analysis recommended by one of our reviewers. Although I have done it in a sequential fashion, for pedagogical purposes, we reported a simultaneous analysis (all the variables thrown in at once, that is, the last step shown below).

All of the predictor variables were continuous. To make it easier to compare the predictors' relative importance, I standardized them all to mean 0, standard deviation 1.

MSAT is score on the math SAT.
VSAT is score on the verbal SAT.
HSGPA is high school GPA.
ALEKS is score on a mathematics assessment test designed to test a college student's readiness to take courses that require mastery of mathematics.
LOC is locus of control, with high scores representing an external locus of control.
The NEO predictors are scores on a Big Five personality test: Openness, Conscientiousness, Extroversion, Agreeableness, and Neuroticism.
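The standardization step can be sketched in a few lines of Python. This is only an illustration, not the SPSS procedure used for the analysis: the scores below are made up, the function name is hypothetical, and the sketch assumes the sample (n - 1) standard deviation.

```python
from statistics import mean, stdev


def standardize(values):
    """Convert raw scores to z scores: (x - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample SD, n - 1 in the denominator (an assumption)
    return [(x - m) / s for x in values]


# Hypothetical math SAT scores, for illustration only (not the study data).
msat = [410, 520, 565, 610, 780]
z_msat = standardize(msat)

for raw, z in zip(msat, z_msat):
    print(f"MSAT {raw} -> ZMSAT {z:+.3f}")
```

After this transformation each B weight refers to a one standard deviation change in its predictor, which is what makes the Exp(B) values comparable across predictors measured on very different scales.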
Descriptive Statistics

Variable             N     Minimum   Maximum   Mean      Std. Deviation
MSAT                 256   410       780       565.47    62.174
VSAT                 256   350       670       492.93    59.728
HSGPA                256   2.22      4.00      3.1167    .34986
ALEKS                256   17        97        53.74     18.985
LOC                  256   0         36        13.79     5.950
NEOOpen              256   11        50        26.83     5.663
NEOC                 256   14        49        31.57     6.649
NEOE                 256   10        46        30.68     5.946
NEOA                 256   12        43        28.73     5.460
NEON                 256   6         53        25.31     11.284
Valid N (listwise)   256

First I entered the Big Five predictors as a set. In SPSS: Analyze, Regression, Multinomial Logistic.

Case Processing Summary

                 N        Marginal Percentage
groups  Poor     68       26.6%
        Good     85       33.2%
        Stay     103      40.2%
Valid            256      100.0%
Missing          0
Total            256
Subpopulation    256(a)

a. The dependent variable has only one value observed in 256 (100.0%) subpopulations.

Model Fitting Information

Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   555.273
Final            522.381             32.892       10   .000

Using these predictors significantly improved the model (compared to a model based only on the differences in group sample sizes).

Pseudo R-Square

Cox and Snell   .121
Nagelkerke      .136
McFadden        .059

These are R-squared-like statistics, but they cannot really be interpreted as proportions of variance. I avoid them, but one of our reviewers wanted one.

Likelihood Ratio Tests

Effect      -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept   533.145                              10.764       2    .005
ZNEOOpen    523.656                              1.274        2    .529
ZNEOC       537.587                              15.206       2    .000
ZNEOE       523.370                              .989         2    .610
ZNEOA       523.208                              .826         2    .662
ZNEON       527.838                              5.457        2    .065

The chi-square statistic is the difference in -2 log-likelihoods between the final model and a reduced model. The reduced model is formed by omitting an effect from the final model. The null hypothesis is that all parameters of that effect are 0.

Removing conscientiousness from the model would significantly lower the fit between model and data. Neuroticism is nearly significant (but look below).
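The model chi-square and the three pseudo R-square values above can be reproduced from the two -2 log likelihoods and N. The sketch below is an illustration under stated assumptions rather than a description of what SPSS does internally: the function names are hypothetical, the p value uses a closed form that is valid only for even degrees of freedom, and the pseudo R-square formulas assume the -2 log likelihoods are case-level (ungrouped) values. The intercept-only check from the group counts is consistent with that assumption.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """Upper-tail p value for a chi-square statistic.

    Uses exp(-x/2) * sum over i < df/2 of (x/2)^i / i!,
    which holds only when df is a positive even integer.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def intercept_only_neg2ll(group_counts):
    """-2 log likelihood when each case is predicted from group proportions alone."""
    n = sum(group_counts)
    return -2.0 * sum(c * math.log(c / n) for c in group_counts)


def pseudo_r_squares(neg2ll_null, neg2ll_model, n):
    """Cox and Snell, Nagelkerke, and McFadden statistics from -2 log likelihoods."""
    chi_sq = neg2ll_null - neg2ll_model
    cox_snell = 1.0 - math.exp(-chi_sq / n)
    max_cox_snell = 1.0 - math.exp(-neg2ll_null / n)  # upper bound of Cox and Snell
    return {
        "cox_snell": cox_snell,
        "nagelkerke": cox_snell / max_cox_snell,
        "mcfadden": 1.0 - neg2ll_model / neg2ll_null,
    }


# Values copied from the Big Five output above.
N = 256
NEG2LL_NULL = 555.273
NEG2LL_BIG5 = 522.381

chi_sq = NEG2LL_NULL - NEG2LL_BIG5
p_value = chi2_upper_tail_even_df(chi_sq, df=10)
r2 = pseudo_r_squares(NEG2LL_NULL, NEG2LL_BIG5, N)
null_check = intercept_only_neg2ll([68, 85, 103])  # Poor, Good, Stay

print(f"chi-square = {chi_sq:.3f} on 10 df, p = {p_value:.5f}")
print(f"intercept-only -2LL from group counts = {null_check:.3f}")
for name, value in r2.items():
    print(f"{name} = {value:.3f}")
```

The 10 degrees of freedom come from five predictors times k - 1 = 2 contrasts.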
Each predictor has k-1 B weights (two here, since there are three groups), each one comparing the reference group with one of the other groups. Here I designated the poor group as the reference group.

Parameter Estimates

groups(a)  Predictor   B       Std. Error   Wald     df   Sig.   Exp(B)
Good       Intercept   .404    .184         4.846    1    .028
           ZNEOOpen    -.135   .187         .519     1    .471   .874
           ZNEOC       .658    .213         9.562    1    .002   1.932
           ZNEOE       .078    .200         .154     1    .695   1.081
           ZNEOA       -.030   .189         .025     1    .873   .970
           ZNEON       .233    .214         1.185    1    .276   1.262
Stay       Intercept   .561    .179         9.791    1    .002
           ZNEOOpen    -.208   .185         1.270    1    .260   .812
           ZNEOC       .741    .211         12.372   1    .000   2.099
           ZNEOE       -.092   .196         .221     1    .638   .912
           ZNEOA       .121    .189         .410     1    .522   1.129
           ZNEON       .467    .211         4.893    1    .027   1.595

a. The reference category is: Poor.

For each one standard deviation increase in conscientiousness, the odds of being in the stay group rather than the poor group more than doubled. For each one standard deviation increase in conscientiousness, the odds of being in the good group rather than the poor group nearly doubled. For each one standard deviation increase in neuroticism, the odds of being in the stay group rather than the poor group increased multiplicatively by 1.60.

Locus of control was added in the next step. Its addition did not significantly improve the model.

Model Fitting Information

Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   555.273
Final            520.187             35.086       12   .000

Pseudo R-Square

Cox and Snell   .128
Nagelkerke      .145
McFadden        .063

Likelihood Ratio Tests

Effect      -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept   531.087                              10.901       2    .004
ZNEOOpen    521.362                              1.175        2    .556
ZNEOC       536.245                              16.058       2    .000
ZNEOE       521.040                              .853         2    .653
ZNEOA       521.134                              .947         2    .623
ZNEON       524.591                              4.405        2    .111
ZLOC        522.381                              2.194        2    .334

Parameter Estimates

groups(a)  Predictor   B       Std. Error   Wald     df   Sig.   Exp(B)
Good       Intercept   .403    .185         4.759    1    .029
           ZNEOOpen    -.128   .188         .459     1    .498   .880
           ZNEOC       .706    .218         10.528   1    .001   2.026
           ZNEOE       .062    .200         .097     1    .755   1.064
           ZNEOA       -.065   .193         .112     1    .738   .938
           ZNEON       .091    .236         .148     1    .700   1.095
           ZLOC        .282    .198         2.035    1    .154   1.326
Stay       Intercept   .567    .180         9.946    1    .002
           ZNEOOpen    -.201   .186         1.172    1    .279   .818
           ZNEOC       .759    .214         12.605   1    .000   2.136
           ZNEOE       -.096   .196         .240     1    .624   .909
           ZNEOA       .105    .192         .299     1    .585   1.111
           ZNEON       .410    .230         3.160    1    .075   1.506
           ZLOC        .114    .192         .355     1    .551   1.121

a. The reference category is: Poor.

On the third step, ALEKS was added to the model.

Model Fitting Information

Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   555.273
Final            502.495             52.777       14   .000

Pseudo R-Square

Cox and Snell   .186
Nagelkerke      .210
McFadden        .095

Likelihood Ratio Tests

Effect      -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept   514.751                              12.255       2    .002
ZNEOOpen    502.969                              .473         2    .789
ZNEOC       517.760                              15.265       2    .000
ZNEOE       503.311                              .816         2    .665
ZNEOA       503.760                              1.265        2    .531
ZNEON       505.689                              3.193        2    .203
ZLOC        504.877                              2.382        2    .304
ZALEKS      520.187                              17.691       2    .000

Parameter Estimates

groups(a)  Predictor   B       Std. Error   Wald     df   Sig.   Exp(B)
Good       Intercept   .502    .197         6.501    1    .011
           ZNEOOpen    -.104   .191         .294     1    .587   .901
           ZNEOC       .743    .222         11.184   1    .001   2.103
           ZNEOE       .081    .203         .162     1    .687   1.085
           ZNEOA       -.075   .194         .150     1    .698   .928
           ZNEON       .084    .239         .122     1    .727   1.087
           ZLOC        .290    .198         2.136    1    .144   1.337
           ZALEKS      .341    .187         3.338    1    .068   1.406
Stay       Intercept   .630    .194         10.536   1    .001
           ZNEOOpen    -.129   .193         .451     1    .502   .879
           ZNEOC       .740    .220         11.271   1    .001   2.096
           ZNEOE       -.076   .202         .140     1    .708   .927
           ZNEOA       .124    .197         .395     1    .530   1.132
           ZNEON       .362    .238         2.314    1    .128   1.436
           ZLOC        .107    .198         .289     1    .591   1.112
           ZALEKS      .733    .186         15.499   1    .000   2.081

a. The reference category is: Poor.

Adding ALEKS significantly improved the model.
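Whether a step improved the model can be checked directly from the change in the final -2 log likelihood between successive steps. The sketch below is illustrative only; the function and dictionary names are hypothetical. It relies on the fact that each predictor added on its own contributes k - 1 = 2 parameters, and that for 2 degrees of freedom the upper-tail chi-square probability is exactly exp(-chi-square / 2).

```python
import math

# Final-model -2 log likelihoods at the first three steps, copied from the output above.
NEG2LL = {
    "Big Five": 522.381,
    "Big Five + LOC": 520.187,
    "Big Five + LOC + ALEKS": 502.495,
}


def step_test_df2(neg2ll_before, neg2ll_after):
    """Likelihood ratio test for a step that adds exactly 2 parameters."""
    chi_sq = neg2ll_before - neg2ll_after
    p = math.exp(-chi_sq / 2.0)  # exact upper tail of chi-square with 2 df
    return chi_sq, p


loc_chi, loc_p = step_test_df2(NEG2LL["Big Five"], NEG2LL["Big Five + LOC"])
aleks_chi, aleks_p = step_test_df2(
    NEG2LL["Big Five + LOC"], NEG2LL["Big Five + LOC + ALEKS"]
)

print(f"Adding LOC:   chi-square = {loc_chi:.3f}, p = {loc_p:.3f}")
print(f"Adding ALEKS: chi-square = {aleks_chi:.3f}, p = {aleks_p:.5f}")
```

These step statistics agree, within rounding, with the ZLOC and ZALEKS rows of the Likelihood Ratio Tests tables, because the reduced model for the most recently added predictor is simply the final model of the previous step.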
Each increase of one standard deviation in ALEKS was associated with a more than doubling of the odds of being in the stay group rather than the poor group. The effect of ALEKS on the odds ratio for good versus poor fell just short of statistical significance.

In Step 4 the SAT variables were added.

Model Fitting Information

Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   555.273
Final            493.748             61.525       18   .000

The chi-square for this step is 502.495 - 493.748 = 8.747 on 18 - 14 = 4 degrees of freedom. That yields a p value of .068.

Pseudo R-Square

Nagelkerke   .241

Likelihood Ratio Tests

Effect      -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept   505.474                              11.726       2    .003
ZNEOOpen    494.425                              .677         2    .713
ZNEOC       509.567                              15.819       2    .000
ZNEOE       494.480                              .732         2    .693
ZNEOA       494.550                              .802         2    .670
ZNEON       496.586                              2.838        2    .242
ZLOC        496.634                              2.886        2    .236
ZALEKS      504.006                              10.258       2    .006
ZMSAT       500.976                              7.228        2    .027
ZVSAT       496.824                              3.076        2    .215

Removing math SAT from the model would significantly reduce the fit of the model to the data, but the effects of math SAT on the two contrasts (good versus poor and stay versus poor) fall short of statistical significance. In another analysis I found that math SAT was significantly associated with the difference between the stay and the good groups, with the odds of being in the stay group rather than the good group increasing multiplicatively by 1.63 for each standard deviation increase in math SAT.

Parameter Estimates

groups(a)  Predictor   B       Std. Error   Wald     df   Sig.   Exp(B)
Good       Intercept   .494    .201         6.045    1    .014
           ZNEOOpen    -.124   .194         .408     1    .523   .883
           ZNEOC       .752    .225         11.178   1    .001   2.121
           ZNEOE       .058    .205         .080     1    .777   1.060
           ZNEOA       -.048   .196         .060     1    .807   .953
           ZNEON       .084    .244         .118     1    .731   1.088
           ZLOC        .327    .202         2.620    1    .106   1.387
           ZALEKS      .406    .205         3.927    1    .048   1.500
           ZMSAT       -.241   .213         1.278    1    .258   .786
           ZVSAT       .315    .206         2.335    1    .126   1.370
Stay       Intercept   .629    .197         10.151   1    .001
           ZNEOOpen    -.157   .195         .649     1    .421   .855
           ZNEOC       .777    .225         11.965   1    .001   2.176
           ZNEOE       -.091   .205         .199     1    .655   .913
           ZNEOA       .112    .198         .319     1    .572   1.119
           ZNEON       .348    .240         2.101    1    .147   1.416
           ZLOC        .126    .201         .389     1    .533   1.134
           ZALEKS      .630    .202         9.735    1    .002   1.878
           ZMSAT       .266    .214         1.545    1    .214   1.305
           ZVSAT       .071    .206         .120     1    .729   1.074

a. The reference category is: Poor.

In the last step, high school GPA was added to the model.

Model Fitting Information

Model            -2 Log Likelihood   Chi-Square   df   Sig.
Intercept Only   555.273
Final            473.253             82.020       20   .000

Pseudo R-Square

Cox and Snell   .274
Nagelkerke      .310
McFadden        .148

Likelihood Ratio Tests

Effect      -2 Log Likelihood of Reduced Model   Chi-Square   df   Sig.
Intercept   488.053                              14.800       2    .001
ZNEOOpen    473.641                              .388         2    .824
ZNEOC       488.933                              15.680       2    .000
ZNEOE       473.844                              .591         2    .744
ZNEOA       473.951                              .698         2    .705
ZNEON       475.236                              1.983        2    .371
ZLOC        475.350                              2.096        2    .351
ZALEKS      482.546                              9.292        2    .010
ZMSAT       480.010                              6.757        2    .034
ZVSAT       475.947                              2.694        2    .260
ZHSGPA      493.748                              20.495       2    .000

Parameter Estimates

groups(a)  Predictor   B       Std. Error   Wald     df   Sig.   Exp(B)
Good       Intercept   .625    .214         8.526    1    .004
           ZNEOOpen    -.102   .202         .251     1    .616   .903
           ZNEOC       .763    .228         11.140   1    .001   2.144
           ZNEOE       .118    .215         .301     1    .583   1.125
           ZNEOA       -.114   .202         .319     1    .573   .892
           ZNEON       .056    .253         .049     1    .825   1.058
           ZLOC        .276    .209         1.754    1    .185   1.318
           ZALEKS      .404    .208         3.762    1    .052   1.498
           ZMSAT       -.238   .222         1.150    1    .284   .788
           ZVSAT       .288    .215         1.796    1    .180   1.334
           ZHSGPA      .667    .197         11.480   1    .001   1.949
Stay       Intercept   .734    .212         12.034   1    .001
           ZNEOOpen    -.125   .204         .373     1    .541   .883
           ZNEOC       .807    .230         12.298   1    .000   2.241
           ZNEOE       -.008   .216         .001     1    .972   .992
           ZNEOA       .030    .207         .022     1    .883   1.031
           ZNEON       .289    .252         1.314    1    .252   1.335
           ZLOC        .087    .211         .172     1    .678   1.091
           ZALEKS      .619    .208         8.833    1    .003   1.858
           ZMSAT       .249    .226         1.215    1    .270   1.283
           ZVSAT       .049    .217         .051     1    .820   1.051
           ZHSGPA      .838    .202         17.161   1    .000   2.312

a. The reference category is: Poor.

High school GPA, conscientiousness, and ALEKS contributed significantly to the model. For each one standard deviation increase in high school GPA, the odds of being in the good group rather than the poor group nearly doubled, and the odds of being in the stay group rather than the poor group more than doubled. For each one standard deviation increase in conscientiousness, the odds of being in the stay group rather than the poor group more than doubled, and the same was true of the odds of being in the good group rather than the poor group. For each one standard deviation increase in ALEKS, the odds of being in the stay group rather than the poor group were multiplied by 1.86. The effect of ALEKS on the contrast between the good group and the poor group fell just short of statistical significance. Although the removal of math SAT from the model would significantly reduce the fit of the model to the data, the effect of math SAT on the two focal contrasts fell short of statistical significance. Recall that math SAT did

Given the final model, I thought it would be helpful to compare the group means on conscientiousness, ALEKS, math SAT, and high school GPA. I did so with REGWQ tests.
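As a brief aside on the final-model parameter estimates: each Exp(B) is the exponential of its B weight, each Wald statistic is (B / SE) squared, and the contrast between the two non-reference groups is the difference of their B weights. The sketch below is only an illustration with hypothetical function names. It works from the rounded B and SE values printed in the table, so its results match the SPSS output only approximately, and the standard error of the stay-versus-good contrast cannot be recovered this way because the covariance between the two B weights is not reported.

```python
import math


def odds_ratio(b):
    """Exp(B): multiplicative change in the odds per one-SD increase in a predictor."""
    return math.exp(b)


def wald_chi_square(b, se):
    """Wald statistic on 1 df for a single B weight."""
    return (b / se) ** 2


def odds_ratio_ci95(b, se):
    """Approximate 95% Wald confidence interval for Exp(B)."""
    z = 1.959964  # two-tailed .05 critical value of the standard normal
    return math.exp(b - z * se), math.exp(b + z * se)


# Final model, reference category Poor; values copied from the table above.
B_HSGPA_STAY, SE_HSGPA_STAY = 0.838, 0.202
B_MSAT_STAY, B_MSAT_GOOD = 0.249, -0.238

or_hsgpa = odds_ratio(B_HSGPA_STAY)
wald_hsgpa = wald_chi_square(B_HSGPA_STAY, SE_HSGPA_STAY)
ci_low, ci_high = odds_ratio_ci95(B_HSGPA_STAY, SE_HSGPA_STAY)

# Stay versus good: subtract the two B weights, then exponentiate.
or_msat_stay_vs_good = odds_ratio(B_MSAT_STAY - B_MSAT_GOOD)

print(f"HSGPA, stay vs poor: Exp(B) = {or_hsgpa:.3f}, Wald = {wald_hsgpa:.2f}")
print(f"  approximate 95% CI for Exp(B): {ci_low:.2f} to {ci_high:.2f}")
print(f"MSAT, stay vs good: odds ratio = {or_msat_stay_vs_good:.2f}")
```

With the final-model weights, the math SAT odds ratio for stay versus good comes out at about 1.63. Testing its significance would require rerunning the model with the good group as the reference category.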
When interpreting the results of the REGWQ tests, it is important to remember that each tests the group differences on one continuous variable, ignoring the other continuous variables. The corresponding effects in the logistic regression test the group differences after controlling for all of the other continuous variables.

A Posteriori Pairwise Comparisons Between Group Means

Group        Conscientiousness   HS GPA    ALEKS     Math SAT
Persisting   33.23 A             3.21 A    59.82 A   583.30 A
LGS          32.24 A             3.14 A    52.34 B   554.00 B
LPS          28.21 B             2.94 B    46.28 B   552.79 B

Note: Within each column, means sharing a superscript letter are not significantly different from each other. Persisting, LGS, and LPS are the stay, good, and poor groups, respectively. N = 256.

Karl L. Wuensch, July, 2014.
