Multinomial Logit Preliminaries

Outcomes are unordered, and hence arbitrarily shifting the "baseline" category makes no difference. This is easy to see even in a garden-variety binary model (here a linear probability model). Here is our model:

. reg winlose leader priorm scandal

      Source |       SS       df       MS              Number of obs =    5036
-------------+------------------------------           F(  3,  5032) =   69.97
       Model |  21.9022157     3  7.30073858           Prob > F      =  0.0000
    Residual |  525.065814  5032  .104345353           R-squared     =  0.0400
-------------+------------------------------           Adj R-squared =  0.0395
       Total |   546.96803  5035  .108633174           Root MSE      =  .32303

------------------------------------------------------------------------------
     winlose |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      leader |  -.0727539   .0274624    -2.65   0.008    -.1265921   -.0189157
      priorm |  -.0016756   .0001633   -10.26   0.000    -.0019956   -.0013555
     scandal |   .4251082   .0404513    10.51   0.000      .345806    .5044103
       _cons |   .1448303     .00733    19.76   0.000     .1304603    .1592003
------------------------------------------------------------------------------

Now here is the same d.v. but with the category scores reversed:

. gen winlose2=1-winlose
(476 missing values generated)

. reg winlose2 leader priorm scandal

      Source |       SS       df       MS              Number of obs =    5036
-------------+------------------------------           F(  3,  5032) =   69.97
       Model |  21.9022157     3  7.30073858           Prob > F      =  0.0000
    Residual |  525.065814  5032  .104345353           R-squared     =  0.0400
-------------+------------------------------           Adj R-squared =  0.0395
       Total |   546.96803  5035  .108633174           Root MSE      =  .32303

------------------------------------------------------------------------------
    winlose2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      leader |   .0727539   .0274624     2.65   0.008     .0189157    .1265921
      priorm |   .0016756   .0001633    10.26   0.000     .0013555    .0019956
     scandal |  -.4251082   .0404513   -10.51   0.000    -.5044103    -.345806
       _cons |   .8551697     .00733   116.67   0.000     .8407997    .8695397
------------------------------------------------------------------------------

Differences? None, except a sign flip on the covariates. The intercept changes value to reflect the fact that the baseline category is different (here .8551697 = 1 - .1448303).

Now let's enter the world of MNL. Here is my working dependent variable. It denotes career outcomes: 0 = win election; 1 = lose in the general election; 2 = lose in the primary; 3 = retire; 4 = run for higher office. Are these categories unordered? Most likely yes.

. table event

----------------------
    event |      Freq.
----------+-----------
        0 |      4,593
        1 |        313
        2 |         69
        3 |        250
        4 |        226
----------------------

Suppose I estimated the following model:

. reg event leader priorm scandal

      Source |       SS       df       MS              Number of obs =    5429
-------------+------------------------------           F(  3,  5425) =    7.25
       Model |  22.3391973     3   7.4463991           Prob > F      =  0.0001
    Residual |  5570.61346  5425  1.02684119           R-squared     =  0.0040
-------------+------------------------------           Adj R-squared =  0.0034
       Total |  5592.95266  5428  1.03038922           Root MSE      =  1.0133

------------------------------------------------------------------------------
       event |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      leader |  -.0387411   .0825463    -0.47   0.639     -.200565    .1230828
      priorm |  -.0014561   .0004937    -2.95   0.003    -.0024239   -.0004883
     scandal |    .465824   .1254223     3.71   0.000      .219946    .7117021
       _cons |   .4312325   .0221377    19.48   0.000     .3878336    .4746313
------------------------------------------------------------------------------

What problems do you see with this? What does y-hat refer to in this model? (Not much.) Treating the arbitrary category codes 0-4 as if they were an interval-level outcome is a dumb thing to do, so be careful and don't do it!
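To see concretely why y-hat is uninformative here, one could inspect the fitted values from the regression just shown. A minimal sketch (the variable name yhat is just illustrative):

* Linear prediction from the regression of the nominal outcome on the covariates
. predict yhat, xb

* yhat is a weighted average of arbitrary category codes; a fitted value between
* 0 and 1 would sit "between" winning and losing the general election, which is
* not a meaningful quantity for an unordered outcome.
. summarize yhat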
Here is another approach (which is better than the above, but still troublesome). Since we know that MNL will yield (k+1)(J-1) sets of parameters, we may want to collapse scores on y to produce a simpler binary dependent variable (i.e., 1/0). Suppose I did that with the "event" dependent variable.

. logit event leader priorm scandal

Iteration 0:   log likelihood = -2359.1841
Iteration 1:   log likelihood = -2306.8604
Iteration 2:   log likelihood = -2301.2029
Iteration 3:   log likelihood = -2301.1905

Logit estimates                                   Number of obs   =       5429
                                                  LR chi2(3)      =     115.99
                                                  Prob > chi2     =     0.0000
Log likelihood = -2301.1905                       Pseudo R2       =     0.0246

------------------------------------------------------------------------------
       event |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      leader |  -.1651033   .2451446    -0.67   0.501    -.6455779    .3153713
      priorm |   -.013588   .0016015    -8.48   0.000    -.0167268   -.0104492
     scandal |   1.728522   .2574409     6.71   0.000     1.223947    2.233096
       _cons |  -1.275881   .0593058   -21.51   0.000    -1.392118   -1.159643
------------------------------------------------------------------------------

What does this model reveal? It is a garden-variety logit, but it has problems. What are they? (For one, collapsing the outcome throws away the distinctions among the four different ways of leaving office.)
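A side note on the command above: it hands the raw 0-4 event variable to logit, which works only because Stata treats any nonzero, nonmissing value of the dependent variable as a "success." The implied collapse is therefore "won reelection" (0) versus "left office for any reason" (1). Making the recode explicit is safer; a minimal sketch (the name event_bin is just illustrative):

* Collapse the five career outcomes into a 0/1 indicator:
* 0 = won reelection, 1 = exited for any reason; missing stays missing.
. gen byte event_bin = (event > 0) if !missing(event)

* This should reproduce the logit reported above.
. logit event_bin leader priorm scandal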
How about a multinomial logit?

. mlogit event leader priorm scandal

Iteration 0:   log likelihood = -3451.2538
Iteration 1:   log likelihood = -3376.4258
Iteration 2:   log likelihood = -3285.2495
Iteration 3:   log likelihood = -3269.0735
Iteration 4:   log likelihood = -3267.4885
Iteration 5:   log likelihood =   -3267.32
Iteration 6:   log likelihood = -3267.2616
Iteration 7:   log likelihood = -3267.2402
Iteration 8:   log likelihood = -3267.2324
Iteration 9:   log likelihood = -3267.2295
Iteration 10:  log likelihood = -3267.2284
Iteration 11:  log likelihood =  -3267.228
Iteration 12:  log likelihood = -3267.2279
Iteration 13:  log likelihood = -3267.2278
Iteration 14:  log likelihood = -3267.2278
Iteration 15:  log likelihood = -3267.2278
Iteration 16:  log likelihood = -3267.2278
Iteration 17:  log likelihood = -3267.2278
Iteration 18:  log likelihood = -3267.2278
Iteration 19:  log likelihood = -3267.2278
Iteration 20:  log likelihood = -3267.2278
Iteration 21:  log likelihood = -3267.2278
Iteration 22:  log likelihood = -3267.2278
Iteration 23:  log likelihood = -3267.2278
Iteration 24:  log likelihood = -3267.2278
Iteration 25:  log likelihood = -3267.2278
Iteration 26:  log likelihood = -3267.2278
Iteration 27:  log likelihood = -3267.2278
Iteration 28:  log likelihood = -3267.2278
Iteration 29:  log likelihood = -3267.2278
Iteration 30:  log likelihood = -3267.2278
Iteration 31:  log likelihood = -3267.2278

Multinomial regression                            Number of obs   =       5429
                                                  LR chi2(12)     =     368.05
                                                  Prob > chi2     =     0.0000
Log likelihood = -3267.2278                       Pseudo R2       =     0.0533

------------------------------------------------------------------------------
       event |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1            |
      leader |  -.4305763   .5058561    -0.85   0.395    -1.422036    .5608835
      priorm |  -.0606966   .0047344   -12.82   0.000    -.0699759   -.0514174
     scandal |    2.60861   .3772726     6.91   0.000     1.869169     3.34805
       _cons |  -1.362504   .0938093   -14.52   0.000    -1.546367   -1.178642
-------------+----------------------------------------------------------------
2            |
      leader |  -30.89904    2509503    -0.00   1.000     -4918566     4918504
      priorm |  -.0036454   .0045901    -0.79   0.427    -.0126418      .005351
     scandal |   3.192746   .4085509     7.81   0.000     2.392001     3.993491
       _cons |  -4.171352   .2008908   -20.76   0.000    -4.565091    -3.777613
-------------+----------------------------------------------------------------
3            |
      leader |    .722304   .2820702     2.56   0.010     .1694565     1.275151
      priorm |  -.0030767   .0024212    -1.27   0.204    -.0078222     .0016688
     scandal |   1.276717   .4076773     3.13   0.002     .4776845      2.07575
       _cons |  -2.855778   .1064539   -26.83   0.000    -3.064424    -2.647132
-------------+----------------------------------------------------------------
4            |
      leader |  -1.857627   1.006686    -1.85   0.065    -3.830696     .1154417
      priorm |  -.0003354   .0024513    -0.14   0.891    -.0051398      .004469
     scandal |  -32.30673    8431210    -0.00   1.000    -1.65e+07     1.65e+07
       _cons |  -2.977372   .1115288   -26.70   0.000    -3.195965     -2.75878
------------------------------------------------------------------------------
(Outcome event==0 is the comparison group)

I left the iteration log in the handout to demonstrate that the likelihood function for the MNL model is more complicated than for the standard logit, insofar as there are multiple equations and many more parameters to estimate. Note that we have 5 outcomes, hence J-1 = 4 contrasts, with 3+1 parameters to estimate per contrast. This gives us 4*4 = 16 total parameters to interpret. Interpretation is important! Simply reporting the J-1 logits is not sufficient. The question is how to interpret. Note that these models are very similar to "stand-alone" logits on the relevant pairs of outcomes.
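The stand-alone logits that follow use four indicator variables, contrast1 through contrast4, whose construction is not shown in the handout. Presumably each contrastJ equals 1 for outcome J, 0 for the baseline outcome 0, and missing for every other category, so that each logit compares only the two relevant outcomes. A minimal sketch of that recode, under that assumption:

* contrast1: lost the general election (1) vs. won reelection (0);
* all other outcomes are left missing so they drop out of the logit.
. gen byte contrast1 = 1 if event == 1
. replace contrast1 = 0 if event == 0

* contrast2, contrast3, and contrast4 would be built the same way,
* replacing event == 1 with event == 2, 3, and 4, respectively.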
To see this, consider the following:

. logit contrast1 leader priorm scandal

Iteration 0:   log likelihood = -1157.6154
Iteration 1:   log likelihood = -1056.2882
Iteration 2:   log likelihood = -1009.2034
Iteration 3:   log likelihood = -1002.0104
Iteration 4:   log likelihood = -1001.6943
Iteration 5:   log likelihood = -1001.6936

Logit estimates                                   Number of obs   =       4888
                                                  LR chi2(3)      =     311.84
                                                  Prob > chi2     =     0.0000
Log likelihood = -1001.6936                       Pseudo R2       =     0.1347

------------------------------------------------------------------------------
   contrast1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      leader |  -.5861779   .5372326    -1.09   0.275    -1.639135    .4667787
      priorm |  -.0617046   .0048102   -12.83   0.000    -.0711324   -.0522767
     scandal |   2.865274   .4096536     6.99   0.000     2.062368     3.66818
       _cons |  -1.343004   .0947278   -14.18   0.000    -1.528667   -1.157341
------------------------------------------------------------------------------

. logit contrast2 leader priorm scandal

note: leader~=0 predicts failure perfectly
      leader dropped and 136 obs not used

Iteration 0:   log likelihood = -356.88573
Iteration 1:   log likelihood = -341.31356
Iteration 2:   log likelihood =   -338.189
Iteration 3:   log likelihood = -338.15703
Iteration 4:   log likelihood = -338.15702

Logit estimates                                   Number of obs   =       4510
                                                  LR chi2(2)      =      37.46
                                                  Prob > chi2     =     0.0000
Log likelihood = -338.15702                       Pseudo R2       =     0.0525

------------------------------------------------------------------------------
   contrast2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      priorm |  -.0040041   .0046536    -0.86   0.390    -.0131249    .0051168
     scandal |   3.274799   .4165694     7.86   0.000     2.458338     4.09126
       _cons |  -4.159832   .2017215   -20.62   0.000    -4.555198   -3.764465
------------------------------------------------------------------------------

. logit contrast3 leader priorm scandal

Iteration 0:   log likelihood = -980.57871
Iteration 1:   log likelihood = -979.30912
Iteration 2:   log likelihood = -972.50717
Iteration 3:   log likelihood = -972.38181
Iteration 4:   log likelihood = -972.38146

Logit estimates                                   Number of obs   =       4826
                                                  LR chi2(3)      =      16.39
                                                  Prob > chi2     =     0.0009
Log likelihood = -972.38146                       Pseudo R2       =     0.0084

------------------------------------------------------------------------------
   contrast3 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      leader |   .7326877   .2810882     2.61   0.009     .1817649     1.28361
      priorm |  -.0031108    .002431    -1.28   0.201    -.0078755    .0016539
     scandal |   1.294154   .4066588     3.18   0.001     .4971172     2.09119
       _cons |   -2.85535   .1067299   -26.75   0.000    -3.064537   -2.646164
------------------------------------------------------------------------------

. logit contrast4 leader priorm scandal

note: scandal~=0 predicts failure perfectly
      scandal dropped and 37 obs not used

Iteration 0:   log likelihood = -900.40208
Iteration 1:   log likelihood = -897.69912
Iteration 2:   log likelihood = -897.13307
Iteration 3:   log likelihood = -897.07418
Iteration 4:   log likelihood = -897.07303
Iteration 5:   log likelihood = -897.07303

Logit estimates                                   Number of obs   =       4763
                                                  LR chi2(2)      =       6.66
                                                  Prob > chi2     =     0.0358
Log likelihood = -897.07303                       Pseudo R2       =     0.0037

------------------------------------------------------------------------------
   contrast4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      leader |  -1.842536   1.006826    -1.83   0.067    -3.815878    .1308067
      priorm |  -.0002995   .0024752    -0.12   0.904    -.0051509    .0045518
       _cons |  -2.979101   .1122009   -26.55   0.000    -3.199011   -2.759191
------------------------------------------------------------------------------

What does this exercise show? It illustrates the very close connection between logit and MNL. Indeed, it is easy to see that logit is a special case of the MNL model. The parameter estimates you get from MNL will be very similar to the estimates you would get from "stand-alone" logit models. (What is going on with "contrast2"? Note the message that leader perfectly predicts failure: this is the same separation problem that produces the absurd leader coefficient and enormous standard error in the outcome-2 equation of the MNL.)

Now return to the MNL results. The question is how to interpret them. We could compute probabilities. In Stata this requires us to reference equations, as mlogit is a multiequation model. We could use Stata's predict command to generate 5 sets of probabilities. This requires us to create 5 new variables containing the probabilities.

. predict p1 p2 p3 p4 p5, pr
(22 missing values generated)

Here, p1 is the predicted probability of outcome "0" (the baseline); p2 is the predicted probability of outcome "1"; p3 of outcome "2"; p4 of outcome "3"; and p5 of outcome "4". Each is the MNL probability of that outcome, evaluated at the observation's own covariate values. Note that Stata's canned predict procedure computes these probabilities for every covariate profile in the data. Sometimes, as with the logit model, you will want to compute "scenario" probabilities instead. If this is the case, you need to compute the probabilities directly.
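For reference, the quantity computed directly below is the standard MNL probability. With outcome 0 as the baseline (its coefficient vector normalized to zero) and four non-baseline outcomes, as in this example,

\[
\Pr(y = m \mid x) = \frac{\exp(x'\beta_m)}{1 + \sum_{j=1}^{4} \exp(x'\beta_j)}, \quad m = 1, \dots, 4,
\qquad
\Pr(y = 0 \mid x) = \frac{1}{1 + \sum_{j=1}^{4} \exp(x'\beta_j)}.
\]

The generate command that follows simply plugs the estimated coefficients into the m = 1 version of this formula, with leader and scandal set to zero.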
Suppose we want to compute the probability of outcome "1" (losing the general election) versus the baseline, letting the prior-margin variable priorm take on all of its observed values, for an incumbent who is not in the leadership and is not embroiled in scandal.

. gen prob_scen1 = exp([1]_b[leader]*0 + [1]_b[scandal]*0 + [1]_b[priorm]*priorm) /
>     (1 + exp([1]_b[leader]*0 + [1]_b[scandal]*0 + [1]_b[priorm]*priorm)
>        + exp([2]_b[leader]*0 + [2]_b[scandal]*0 + [2]_b[priorm]*priorm)
>        + exp([3]_b[leader]*0 + [3]_b[scandal]*0 + [3]_b[priorm]*priorm)
>        + exp([4]_b[leader]*0 + [4]_b[scandal]*0 + [4]_b[priorm]*priorm))
(22 missing values generated)

Can you read that? This is ugly, but it is the way to compute the probability directly, and it is useful if you want to compute probabilities for various scenarios.

Of course, since the MNL is a direct extension of the logit model, odds ratios are also possible (and advisable). The interpretation of the odds ratio in this setting is identical to the interpretation in the binary setting; there are just more odds ratios to compute. Stata will compute these for you. If after estimating an MNL model you type:

. mlogit, rr

Multinomial regression                            Number of obs   =       5429
                                                  LR chi2(12)     =     368.05
                                                  Prob > chi2     =     0.0000
Log likelihood = -3267.2278                       Pseudo R2       =     0.0533

------------------------------------------------------------------------------
       event |        RRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1            |
      leader |   .6501343   .3288744    -0.85   0.395     .2412224      1.75222
      priorm |   .9411087   .0044556   -12.82   0.000     .9324163     .9498821
     scandal |   13.58016   5.123422     6.91   0.000     6.482906     28.44722
-------------+----------------------------------------------------------------
2            |
      leader |   3.81e-14   9.56e-08    -0.00   1.000            0            .
      priorm |   .9963613   .0045734    -0.79   0.427     .9874378     1.005365
     scandal |   24.35523   9.950349     7.81   0.000     10.93536     54.24395
-------------+----------------------------------------------------------------
3            |
      leader |   2.059172   .5808311     2.56   0.010     1.184661     3.579243
      priorm |    .996928   .0024138    -1.27   0.204     .9922083      1.00167
     scandal |   3.584852   1.461463     3.13   0.002     1.612337     7.970522
-------------+----------------------------------------------------------------
4            |
      leader |   .1560425   .1570858    -1.85   0.065     .0216945     1.122369
      priorm |   .9996646   .0024504    -0.14   0.891     .9948734     1.004479
     scandal |   9.32e-15   7.86e-08    -0.00   1.000            0            .
------------------------------------------------------------------------------
(Outcome event==0 is the comparison group)

then Stata will give you the odds ratios, which it labels "relative risk ratios" (RRR). Verify that the odds ratios here are equivalent to the exponentiated coefficients from the MNL model. So interpretation of the MNL model is not a problem: it follows directly on the heels of the binary logit model. The only kick is that there are more parameters to estimate.
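One quick way to do the verification asked for above is to exponentiate a coefficient from the original output; a minimal sketch using the scandal coefficient in the outcome-1 equation:

* The relative risk ratio is just the exponentiated MNL coefficient:
* exp(2.60861) should reproduce the reported RRR of 13.58016.
. display exp([1]_b[scandal])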
One final thing: note that the baseline category is arbitrary. Suppose we make outcome "1" the baseline. In Stata we can do this by typing:

. mlogit event leader priorm scandal, base(1)

Iteration 0:   log likelihood = -3451.2538
  …
Iteration 33:  log likelihood = -3267.2278

Multinomial regression                            Number of obs   =       5429
                                                  LR chi2(12)     =     368.05
                                                  Prob > chi2     =     0.0000
Log likelihood = -3267.2278                       Pseudo R2       =     0.0533

------------------------------------------------------------------------------
       event |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
0            |
      leader |   .4305763   .5058561     0.85   0.395    -.5608835     1.422036
      priorm |   .0606966   .0047344    12.82   0.000     .0514174     .0699759
     scandal |   -2.60861   .3772726    -6.91   0.000     -3.34805    -1.869169
       _cons |   1.362504   .0938093    14.52   0.000     1.178642     1.546367
-------------+----------------------------------------------------------------
2            |
      leader |  -32.46847    6821535    -0.00   1.000    -1.34e+07     1.34e+07
      priorm |   .0570513   .0065238     8.75   0.000     .0442649     .0698377
     scandal |   .5841368   .4925805     1.19   0.236    -.3813033     1.549577
       _cons |  -2.808848    .218587   -12.85   0.000     -3.23727    -2.380425
-------------+----------------------------------------------------------------
3            |
      leader |    1.15288   .5597458     2.06   0.039     .0557986     2.249962
      priorm |   .0576199   .0052518    10.97   0.000     .0473267     .0679132
     scandal |  -1.331892   .5028736    -2.65   0.008    -2.317506    -.3462784
       _cons |  -1.493273   .1374234   -10.87   0.000    -1.762618    -1.223929
-------------+----------------------------------------------------------------
4            |
      leader |  -1.427051   1.121111    -1.27   0.203    -3.624387     .7702861
      priorm |   .0603612   .0052794    11.43   0.000     .0500138     .0707087
     scandal |  -36.91534   2.29e+07    -0.00   1.000    -4.49e+07     4.49e+07
       _cons |  -1.614868   .1416112   -11.40   0.000    -1.892421    -1.337315
------------------------------------------------------------------------------
(Outcome event==1 is the comparison group)

Note that the log likelihood is the same as in the previous model, as it must be. The only difference is that the equations now reference category 1, not category 0, as the baseline. The coefficients seem to change, but this is only because the baseline category has changed; the inferences from one model are identical to those from the other.

To see something neat, look at the coefficient estimates for the scandal variable for outcome "2" and outcome "0". If we take the following difference:

. display [2]_b[scandal]-[0]_b[scandal]

we obtain 3.192746 (that is, .5841368 - (-2.60861)). This is exactly the value of the scandal coefficient for outcome "2" in our first MNL model. Thus, for the scandal coefficient, the difference in contrasts (2 v. 1) - (0 v. 1) = (2 v. 0). Because the baseline category is arbitrary, any contrast can be derived from any MNL model with just a little bit of manipulation.
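The same manipulation works for any covariate. A minimal check using the priorm coefficients, run immediately after the base(1) model above:

* (2 v. 1) minus (0 v. 1) should recover the (2 v. 0) contrast: up to rounding,
* this matches the priorm coefficient for outcome 2 in the first MNL model (-.0036454).
. display [2]_b[priorm]-[0]_b[priorm]

If you also want a standard error for such a cross-equation difference, Stata's lincom command can compute the linear combination along with its standard error after mlogit.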