Multinomial Logit
Preliminaries
Outcomes are unordered, and hence arbitrarily shifting the “baseline” category makes no
difference. This is easy to see even with a garden-variety binary dependent variable.
Here is our model:
. reg winlose leader priorm scandal
      Source |       SS       df       MS              Number of obs =    5036
-------------+------------------------------           F(  3,  5032) =   69.97
       Model |  21.9022157     3  7.30073858           Prob > F      =  0.0000
    Residual |  525.065814  5032  .104345353           R-squared     =  0.0400
-------------+------------------------------           Adj R-squared =  0.0395
       Total |   546.96803  5035  .108633174           Root MSE      =  .32303

------------------------------------------------------------------------------
     winlose |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      leader |  -.0727539   .0274624    -2.65   0.008    -.1265921   -.0189157
      priorm |  -.0016756   .0001633   -10.26   0.000    -.0019956   -.0013555
     scandal |   .4251082   .0404513    10.51   0.000      .345806    .5044103
       _cons |   .1448303     .00733    19.76   0.000     .1304603    .1592003
------------------------------------------------------------------------------

Now here is the same d.v. but with the category scores reversed:
. gen winlose2=1-winlose
(476 missing values generated)
. reg winlose2 leader priorm scandal
      Source |       SS       df       MS              Number of obs =    5036
-------------+------------------------------           F(  3,  5032) =   69.97
       Model |  21.9022157     3  7.30073858           Prob > F      =  0.0000
    Residual |  525.065814  5032  .104345353           R-squared     =  0.0400
-------------+------------------------------           Adj R-squared =  0.0395
       Total |   546.96803  5035  .108633174           Root MSE      =  .32303

------------------------------------------------------------------------------
    winlose2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      leader |   .0727539   .0274624     2.65   0.008     .0189157    .1265921
      priorm |   .0016756   .0001633    10.26   0.000     .0013555    .0019956
     scandal |  -.4251082   .0404513   -10.51   0.000    -.5044103    -.345806
       _cons |   .8551697     .00733   116.67   0.000     .8407997    .8695397
------------------------------------------------------------------------------
Differences? None, except for the sign-flip on the covariates. The intercept changes
value to reflect the fact that the baseline category is different.
Now let’s enter the world of MNL. Here is my working dependent variable. It
denotes career outcomes: 0=win election; 1=lose in general; 2=lose in primary;
3=retire; 4=run for higher office.
Are these categories unordered?
Most likely yes.
. table event
----------------------
    event |      Freq.
----------+-----------
        0 |      4,593
        1 |        313
        2 |         69
        3 |        250
        4 |        226
----------------------
Suppose I estimated the following model:
. reg event leader priorm scandal
      Source |       SS       df       MS              Number of obs =    5429
-------------+------------------------------           F(  3,  5425) =    7.25
       Model |  22.3391973     3   7.4463991           Prob > F      =  0.0001
    Residual |  5570.61346  5425  1.02684119           R-squared     =  0.0040
-------------+------------------------------           Adj R-squared =  0.0034
       Total |  5592.95266  5428  1.03038922           Root MSE      =  1.0133

------------------------------------------------------------------------------
       event |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      leader |  -.0387411   .0825463    -0.47   0.639     -.200565    .1230828
      priorm |  -.0014561   .0004937    -2.95   0.003    -.0024239   -.0004883
     scandal |    .465824   .1254223     3.71   0.000      .219946    .7117021
       _cons |   .4312325   .0221377    19.48   0.000     .3878336    .4746313
------------------------------------------------------------------------------
What problems do you see with this? What does y-hat refer to in this model?
(Not much.) Treating an unordered, multi-category outcome as if it were
interval-level is a dumb thing to do, so be careful and don’t do it!
Here is another approach (better than the above, but still troublesome).
Since we know that MNL will yield (k+1)(J-1) sets of parameters, we may want to
collapse scores on y to produce a simpler binary dependent variable (i.e., 1/0).
Suppose I did that with the “event” dependent variable.
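The handout does not show the collapsing step itself. A minimal sketch of one way to do it (the variable name anyloss is hypothetical; note that Stata’s logit treats any nonzero, nonmissing value of the dependent variable as a 1, which is why the command below on the raw event variable also works):

* hypothetical recode: 0 = win election, 1 = any other career outcome
gen byte anyloss = (event > 0) if !missing(event)

* equivalent to the logit reported below, because -logit- treats any
* nonzero value of the outcome as a "success"
logit anyloss leader priorm scandal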
. logit event leader priorm scandal
Iteration 0:   log likelihood = -2359.1841
Iteration 1:   log likelihood = -2306.8604
Iteration 2:   log likelihood = -2301.2029
Iteration 3:   log likelihood = -2301.1905

Logit estimates                                   Number of obs   =       5429
                                                  LR chi2(3)      =     115.99
                                                  Prob > chi2     =     0.0000
Log likelihood = -2301.1905                       Pseudo R2       =     0.0246

------------------------------------------------------------------------------
       event |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      leader |  -.1651033   .2451446    -0.67   0.501    -.6455779    .3153713
      priorm |   -.013588   .0016015    -8.48   0.000    -.0167268   -.0104492
     scandal |   1.728522   .2574409     6.71   0.000     1.223947    2.233096
       _cons |  -1.275881   .0593058   -21.51   0.000    -1.392118   -1.159643
------------------------------------------------------------------------------
What does this model reveal? It is a garden-variety logit, but it has problems.
What are they?
How about a multinomial logit?
. mlogit event leader priorm scandal
Iteration 0:   log likelihood = -3451.2538
Iteration 1:   log likelihood = -3376.4258
Iteration 2:   log likelihood = -3285.2495
Iteration 3:   log likelihood = -3269.0735
Iteration 4:   log likelihood = -3267.4885
Iteration 5:   log likelihood =   -3267.32
Iteration 6:   log likelihood = -3267.2616
Iteration 7:   log likelihood = -3267.2402
Iteration 8:   log likelihood = -3267.2324
Iteration 9:   log likelihood = -3267.2295
Iteration 10:  log likelihood = -3267.2284
Iteration 11:  log likelihood =  -3267.228
Iteration 12:  log likelihood = -3267.2279
Iteration 13:  log likelihood = -3267.2278
Iteration 14:  log likelihood = -3267.2278
Iteration 15:  log likelihood = -3267.2278
Iteration 16:  log likelihood = -3267.2278
Iteration 17:  log likelihood = -3267.2278
Iteration 18:  log likelihood = -3267.2278
Iteration 19:  log likelihood = -3267.2278
Iteration 20:  log likelihood = -3267.2278
Iteration 21:  log likelihood = -3267.2278
Iteration 22:  log likelihood = -3267.2278
Iteration 23:  log likelihood = -3267.2278
Iteration 24:  log likelihood = -3267.2278
Iteration 25:  log likelihood = -3267.2278
Iteration 26:  log likelihood = -3267.2278
Iteration 27:  log likelihood = -3267.2278
Iteration 28:  log likelihood = -3267.2278
Iteration 29:  log likelihood = -3267.2278
Iteration 30:  log likelihood = -3267.2278
Iteration 31:  log likelihood = -3267.2278

Multinomial regression                            Number of obs   =       5429
                                                  LR chi2(12)     =     368.05
                                                  Prob > chi2     =     0.0000
Log likelihood = -3267.2278                       Pseudo R2       =     0.0533

------------------------------------------------------------------------------
       event |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1            |
      leader |  -.4305763   .5058561    -0.85   0.395    -1.422036    .5608835
      priorm |  -.0606966   .0047344   -12.82   0.000    -.0699759   -.0514174
     scandal |    2.60861   .3772726     6.91   0.000     1.869169     3.34805
       _cons |  -1.362504   .0938093   -14.52   0.000    -1.546367   -1.178642
-------------+----------------------------------------------------------------
2            |
      leader |  -30.89904    2509503    -0.00   1.000     -4918566     4918504
      priorm |  -.0036454   .0045901    -0.79   0.427    -.0126418     .005351
     scandal |   3.192746   .4085509     7.81   0.000     2.392001    3.993491
       _cons |  -4.171352   .2008908   -20.76   0.000    -4.565091   -3.777613
-------------+----------------------------------------------------------------
3            |
      leader |    .722304   .2820702     2.56   0.010     .1694565    1.275151
      priorm |  -.0030767   .0024212    -1.27   0.204    -.0078222    .0016688
     scandal |   1.276717   .4076773     3.13   0.002     .4776845     2.07575
       _cons |  -2.855778   .1064539   -26.83   0.000    -3.064424   -2.647132
-------------+----------------------------------------------------------------
4            |
      leader |  -1.857627   1.006686    -1.85   0.065    -3.830696    .1154417
      priorm |  -.0003354   .0024513    -0.14   0.891    -.0051398     .004469
     scandal |  -32.30673    8431210    -0.00   1.000    -1.65e+07    1.65e+07
       _cons |  -2.977372   .1115288   -26.70   0.000    -3.195965     -2.75878
------------------------------------------------------------------------------
(Outcome event==0 is the comparison group)
I left the iteration log in the handout to demonstrate that the likelihood
function for the MNL model is more complicated than for the standard logit,
insofar as there are multiple equations and many more parameters to estimate.
Note that we have 5 outcomes and 3+1 parameters to estimate per contrast. This
gives us 4*4=16 total parameters to interpret.
Interpretation is important! Simply reporting the J-1 logits is not
sufficient. The question is how to interpret them. Note that these models are
very similar to “stand-alone” logits, each contrasting one outcome with the
baseline category. To see this, consider the following stand-alone logits on
four “contrast” variables:
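The handout does not show how these contrast variables were built; a minimal sketch of one plausible coding (the construction is an assumption; each variable pits one outcome against winning, outcome 0, and leaves the other categories missing):

* 1 = the named outcome, 0 = win election, missing otherwise
gen byte contrast1 = (event == 1) if inlist(event, 0, 1)   // lose in general vs. win
gen byte contrast2 = (event == 2) if inlist(event, 0, 2)   // lose in primary vs. win
gen byte contrast3 = (event == 3) if inlist(event, 0, 3)   // retire vs. win
gen byte contrast4 = (event == 4) if inlist(event, 0, 4)   // run for higher office vs. win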
. logit contrast1 leader priorm scandal
Iteration 0:   log likelihood = -1157.6154
Iteration 1:   log likelihood = -1056.2882
Iteration 2:   log likelihood = -1009.2034
Iteration 3:   log likelihood = -1002.0104
Iteration 4:   log likelihood = -1001.6943
Iteration 5:   log likelihood = -1001.6936

Logit estimates                                   Number of obs   =       4888
                                                  LR chi2(3)      =     311.84
                                                  Prob > chi2     =     0.0000
Log likelihood = -1001.6936                       Pseudo R2       =     0.1347

------------------------------------------------------------------------------
   contrast1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      leader |  -.5861779   .5372326    -1.09   0.275    -1.639135    .4667787
      priorm |  -.0617046   .0048102   -12.83   0.000    -.0711324   -.0522767
     scandal |   2.865274   .4096536     6.99   0.000     2.062368     3.66818
       _cons |  -1.343004   .0947278   -14.18   0.000    -1.528667   -1.157341
------------------------------------------------------------------------------

. logit contrast2 leader priorm scandal
note: leader~=0 predicts failure perfectly
leader dropped and 136 obs not used
Iteration 0:   log likelihood = -356.88573
Iteration 1:   log likelihood = -341.31356
Iteration 2:   log likelihood =   -338.189
Iteration 3:   log likelihood = -338.15703
Iteration 4:   log likelihood = -338.15702

Logit estimates                                   Number of obs   =       4510
                                                  LR chi2(2)      =      37.46
                                                  Prob > chi2     =     0.0000
Log likelihood = -338.15702                       Pseudo R2       =     0.0525

------------------------------------------------------------------------------
   contrast2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      priorm |  -.0040041   .0046536    -0.86   0.390    -.0131249    .0051168
     scandal |   3.274799   .4165694     7.86   0.000     2.458338     4.09126
       _cons |  -4.159832   .2017215   -20.62   0.000    -4.555198   -3.764465
------------------------------------------------------------------------------

. logit contrast3 leader priorm scandal
Iteration 0:   log likelihood = -980.57871
Iteration 1:   log likelihood = -979.30912
Iteration 2:   log likelihood = -972.50717
Iteration 3:   log likelihood = -972.38181
Iteration 4:   log likelihood = -972.38146

Logit estimates                                   Number of obs   =       4826
                                                  LR chi2(3)      =      16.39
                                                  Prob > chi2     =     0.0009
Log likelihood = -972.38146                       Pseudo R2       =     0.0084

------------------------------------------------------------------------------
   contrast3 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      leader |   .7326877   .2810882     2.61   0.009     .1817649     1.28361
      priorm |  -.0031108    .002431    -1.28   0.201    -.0078755    .0016539
     scandal |   1.294154   .4066588     3.18   0.001     .4971172     2.09119
       _cons |   -2.85535   .1067299   -26.75   0.000    -3.064537   -2.646164
------------------------------------------------------------------------------

. logit contrast4 leader priorm scandal
note: scandal~=0 predicts failure perfectly
scandal dropped and 37 obs not used
Iteration 0:   log likelihood = -900.40208
Iteration 1:   log likelihood = -897.69912
Iteration 2:   log likelihood = -897.13307
Iteration 3:   log likelihood = -897.07418
Iteration 4:   log likelihood = -897.07303
Iteration 5:   log likelihood = -897.07303

Logit estimates                                   Number of obs   =       4763
                                                  LR chi2(2)      =       6.66
                                                  Prob > chi2     =     0.0358
Log likelihood = -897.07303                       Pseudo R2       =     0.0037

------------------------------------------------------------------------------
   contrast4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      leader |  -1.842536   1.006826    -1.83   0.067    -3.815878    .1308067
      priorm |  -.0002995   .0024752    -0.12   0.904    -.0051509    .0045518
       _cons |  -2.979101   .1122009   -26.55   0.000    -3.199011   -2.759191
------------------------------------------------------------------------------
What does this exercise show? It illustrates the very close connection between
logit and MNL. Indeed, it is easy to see that logit is a special case of the
MNL model. The parameter estimates you get from MNL will be very similar to the
estimates you would get if you estimated “stand-alone” logit models. (What is
going on with “contrast 2”?)
Now return to the MNL results. The question is how to interpret the results.
We could compute probabilities. In Stata this would require us to reference
equations, as mlogit is a multiequation model.
We could use Stata’s predict command and generate 5 sets of probabilities. This
requires us to create 5 new variables containing the probabilities.
. predict p1 p2 p3 p4 p5, pr
(22 missing values generated)
Here, p1 is the predicted probability of a “0” (the baseline category); p2 is the
predicted probability of a “1”; p3 is the predicted probability of a “2”; p4 is
the predicted probability of a “3”; p5 is the predicted probability of a “4”.
For each observation, the five probabilities sum to one.
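As a quick check on that last claim, you can add the predictions up (a sketch; psum is just a scratch variable name):

* the five predicted probabilities should sum to one wherever they are defined
gen psum = p1 + p2 + p3 + p4 + p5
summarize psum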
Note that if you use Stata’s canned predict procedure, this computes
probabilities for all covariate profiles. Sometimes, as with the logit model,
you will want to compute “scenario” probabilities. If this is the case, you
need to compute the probabilities directly. Suppose we want to compute the
probability of a “1” (losing in the general election), letting the prior-margin
variable assume all of its values, for an incumbent who is not in the
leadership and not embroiled in scandal.
. gen prob_scen1 = exp([1]_b[_cons] + [1]_b[leader]*0 + [1]_b[scandal]*0 + [1]_b[priorm]*priorm) /
>     (1 + exp([1]_b[_cons] + [1]_b[leader]*0 + [1]_b[scandal]*0 + [1]_b[priorm]*priorm)
>        + exp([2]_b[_cons] + [2]_b[leader]*0 + [2]_b[scandal]*0 + [2]_b[priorm]*priorm)
>        + exp([3]_b[_cons] + [3]_b[leader]*0 + [3]_b[scandal]*0 + [3]_b[priorm]*priorm)
>        + exp([4]_b[_cons] + [4]_b[leader]*0 + [4]_b[scandal]*0 + [4]_b[priorm]*priorm))
(22 missing values generated)
Can you read that? It is ugly, but it is the way to compute the probability
directly, and it is useful whenever you want probabilities for particular
scenarios.
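In more recent versions of Stata (11 and later), the margins command offers a less error-prone route to the same kind of scenario probability (a sketch under that assumption; the grid of prior-margin values is purely illustrative):

* predicted probability of outcome 1 for a non-leader with no scandal,
* evaluated at several hypothetical values of the prior margin
margins, predict(outcome(1)) at(leader=0 scandal=0 priorm=(0(20)100))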
Of course, since the MNL is a direct extension of the logit model, the use of
odds ratios is also possible (and advisable). The interpretation of the odds
ratio in this setting is identical to the interpretation in the binary
setting--there are just more odds ratios to compute. Stata will compute these
for you. If, after estimating an MNL model, you type:
. mlogit, rr
Multinomial regression                            Number of obs   =       5429
                                                  LR chi2(12)     =     368.05
                                                  Prob > chi2     =     0.0000
Log likelihood = -3267.2278                       Pseudo R2       =     0.0533

------------------------------------------------------------------------------
       event |        RRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1            |
      leader |   .6501343   .3288744    -0.85   0.395     .2412224     1.75222
      priorm |   .9411087   .0044556   -12.82   0.000     .9324163    .9498821
     scandal |   13.58016   5.123422     6.91   0.000     6.482906    28.44722
-------------+----------------------------------------------------------------
2            |
      leader |   3.81e-14   9.56e-08    -0.00   1.000            0           .
      priorm |   .9963613   .0045734    -0.79   0.427     .9874378    1.005365
     scandal |   24.35523   9.950349     7.81   0.000     10.93536    54.24395
-------------+----------------------------------------------------------------
3            |
      leader |   2.059172   .5808311     2.56   0.010     1.184661    3.579243
      priorm |    .996928   .0024138    -1.27   0.204     .9922083     1.00167
     scandal |   3.584852   1.461463     3.13   0.002     1.612337    7.970522
-------------+----------------------------------------------------------------
4            |
      leader |   .1560425   .1570858    -1.85   0.065     .0216945    1.122369
      priorm |   .9996646   .0024504    -0.14   0.891     .9948734    1.004479
     scandal |   9.32e-15   7.86e-08    -0.00   1.000            0           .
------------------------------------------------------------------------------
(Outcome event==0 is the comparison group)
Then Stata will give you the odds ratios (which it refers to as “relative risk
ratios,” or RRRs). Verify that the odds ratios here are equivalent to the
exponentiated coefficients from the MNL model. So interpretation of the MNL
model is not a problem: it follows directly on the heels of the binary logit
model. The only kick is that there are more parameters to estimate.
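A quick way to do that verification in Stata itself (a sketch; mlogit, rr only changes the display, so the log-odds coefficients remain stored in e(b)):

* exponentiating the equation-1 scandal coefficient should reproduce
* the RRR of roughly 13.58 reported above
display exp([1]_b[scandal])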
One final thing: note that the baseline case is arbitrary. Suppose we make
outcome “1” the baseline. In Stata we can do this by typing:
. mlogit event leader priorm scandal, base(1)
Iteration 0:   log likelihood = -3451.2538
  …
Iteration 33:  log likelihood = -3267.2278
Multinomial regression                            Number of obs   =       5429
                                                  LR chi2(12)     =     368.05
                                                  Prob > chi2     =     0.0000
Log likelihood = -3267.2278                       Pseudo R2       =     0.0533

------------------------------------------------------------------------------
       event |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
0            |
      leader |   .4305763   .5058561     0.85   0.395    -.5608835    1.422036
      priorm |   .0606966   .0047344    12.82   0.000     .0514174    .0699759
     scandal |   -2.60861   .3772726    -6.91   0.000     -3.34805   -1.869169
       _cons |   1.362504   .0938093    14.52   0.000     1.178642    1.546367
-------------+----------------------------------------------------------------
2            |
      leader |  -32.46847    6821535    -0.00   1.000    -1.34e+07    1.34e+07
      priorm |   .0570513   .0065238     8.75   0.000     .0442649    .0698377
     scandal |   .5841368   .4925805     1.19   0.236    -.3813033    1.549577
       _cons |  -2.808848    .218587   -12.85   0.000     -3.23727   -2.380425
-------------+----------------------------------------------------------------
3            |
      leader |    1.15288   .5597458     2.06   0.039     .0557986    2.249962
      priorm |   .0576199   .0052518    10.97   0.000     .0473267    .0679132
     scandal |  -1.331892   .5028736    -2.65   0.008    -2.317506   -.3462784
       _cons |  -1.493273   .1374234   -10.87   0.000    -1.762618   -1.223929
-------------+----------------------------------------------------------------
4            |
      leader |  -1.427051   1.121111    -1.27   0.203    -3.624387    .7702861
      priorm |   .0603612   .0052794    11.43   0.000     .0500138    .0707087
     scandal |  -36.91534   2.29e+07    -0.00   1.000    -4.49e+07    4.49e+07
       _cons |  -1.614868   .1416112   -11.40   0.000    -1.892421   -1.337315
------------------------------------------------------------------------------
(Outcome event==1 is the comparison group)
Note that the log-likelihood is the same, as it must be, compared to the
previous model. The only difference is that the equations now reference
category 1, not category 0, as the baseline category. The coefficients seem to
change, but this is only because the baseline category has changed. The
inferences from one model would be identical to the other.
To see something neat, look at the coefficient estimate for the scandal
variable for outcome “2” and outcome “0”. If we take the following difference:
. display [2]_b[scandal]-[0]_b[scandal]
we obtain 3.192746.
This is exactly equal to the value of the scandal coefficient for outcome “2”
in our first MNL model. Thus, the difference in contrasts, (2 v. 1) - (0 v. 1)
= (2 v. 0), holds for the scandal coefficient. Because the baseline category is
arbitrary, any contrast can be derived from any MNL model with just a little
bit of manipulation.
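If you also want a standard error and confidence interval for such a derived contrast, lincom will compute it from the same fit (a sketch, run after the base(1) model above):

* the 2-vs-0 scandal contrast, recovered from the model whose baseline is outcome 1
lincom [2]_b[scandal] - [0]_b[scandal]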