Multinomial Logistic Regression stat 557 Heike Hofmann

Outline
• Ordinal Co-variates
• Baseline Categorical Model
• Proportional Odds Logistic Regression
$$\log\frac{\pi(x)}{1 - \pi(x)} = \alpha + \beta_i$$
Here $\beta_i$ is the effect of the ith category of X on the log odds, i.e. one effect is estimated for each category. This means that the above model is overparameterized (the "last" category can be explained by the others). To make the solution unique again, we have to use an additional constraint. In R, the effect of the first category is set to zero by default. Whenever one of the effects is fixed to be zero, this is called a contrast coding: all other effects are compared to the baseline effect. For effect coding the constraint is on the sum of all effects of a variable: $\sum_i \beta_i = 0$. For a binary variable the two effects are then the negatives of each other.
Predictions and inference are independent of the specific coding used and are not affected by changes in the coding.

Example: Alcohol during pregnancy
Alcohol during pregnancy is believed to be associated with congenital malformation. The following data are from an observational study: after three months of pregnancy, expectant mothers were asked about their average daily number of alcoholic beverages; at birth the infant was checked for malformations:
   Alcohol  malformed  absent  P(malformed)
1  0               48   17066        0.0028
2  <1              38   14464        0.0026
3  1-2              5     788        0.0063
4  3-5              1     126        0.0079
5  ≥6               1      37        0.0263
Models m1 and m2 are the same in terms of statistical behavior: deviance, predictions and inference give the same numbers. The variable Alcohol is recoded for the second model, giving different parameter estimates.
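A minimal sketch of the two codings for the saturated alcohol model, assuming the data layout below (the data frame name `alcohol` and the model names `m.treat` and `m.sum` are our own; the models m1 and m2 mentioned above are not shown in these notes). `contr.treatment` fixes the effect of the first level at zero, `contr.sum` constrains the effects to sum to zero. Fitted values and deviance agree; only the coefficients change.

```r
# Alcohol and malformation counts from the table above
alcohol <- data.frame(
  Alcohol   = factor(c("0", "<1", "1-2", "3-5", ">=6"),
                     levels = c("0", "<1", "1-2", "3-5", ">=6")),
  malformed = c(48, 38, 5, 1, 1),
  absent    = c(17066, 14464, 788, 126, 37)
)

# contrast (treatment) coding: effect of the first level "0" is fixed at zero
m.treat <- glm(cbind(malformed, absent) ~ Alcohol, family = binomial(),
               data = alcohol, contrasts = list(Alcohol = "contr.treatment"))

# effect coding: the five effects are constrained to sum to zero
m.sum <- glm(cbind(malformed, absent) ~ Alcohol, family = binomial(),
             data = alcohol, contrasts = list(Alcohol = "contr.sum"))

coef(m.treat)  # intercept: log odds of malformation for Alcohol = 0
coef(m.sum)    # intercept: average of the five category log odds

# predictions and deviance do not depend on the coding
all.equal(fitted(m.treat), fitted(m.sum))
c(treatment = deviance(m.treat), sum = deviance(m.sum))
```

The treatment-coded fit corresponds to the saturated model output (intercept -5.87364, effect 2.26272 for Alcohol>=6).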
Saturated Model
glm(formula = cbind(malformed, absent) ~ Alcohol, family = binomial())

Deviance Residuals:
[1] 0 0 0 0 0

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.87364    0.14454 -40.637   <2e-16 ***
Alcohol<1   -0.06819    0.21743  -0.314   0.7538
Alcohol1-2   0.81358    0.47134   1.726   0.0843 .
Alcohol3-5   1.03736    1.01431   1.023   0.3064
Alcohol>=6   2.26272    1.02368   2.210   0.0271 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance:  6.2020e+00  on 4  degrees of freedom
Residual deviance: -3.0775e-13  on 0  degrees of freedom
AIC: 28.627

Number of Fisher Scoring iterations: 4
‘Linear’ Effect (levels: 1,2,3,4,5)
glm(formula = cbind(malformed, absent) ~ as.numeric(Alcohol),
    family = binomial())

Deviance Residuals:
      1        2        3        4        5
 0.7302  -1.1983   0.9636   0.4272   1.1692

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)          -6.2089     0.2873 -21.612   <2e-16 ***
as.numeric(Alcohol)   0.2278     0.1683   1.353    0.176
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6.2020  on 4  degrees of freedom
Residual deviance: 4.4473  on 3  degrees of freedom
AIC: 27.074

Number of Fisher Scoring iterations: 5
‘Linear’ Effect (levels: 0,0.5,1.5,4,7)
glm(formula = cbind(malformed, absent) ~ as.numeric(Alcohol),
    family = binomial())

Deviance Residuals:
      1        2        3        4        5
 0.5921  -0.8801   0.8865  -0.1449   0.1291

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)          -5.9605     0.1154 -51.637   <2e-16 ***
as.numeric(Alcohol)   0.3166     0.1254   2.523   0.0116 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6.2020  on 4  degrees of freedom
Residual deviance: 1.9487  on 3  degrees of freedom
AIC: 24.576

Number of Fisher Scoring iterations: 4
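The two ‘linear’ fits differ only in the scores attached to the ordered categories. A sketch that refits both side by side (the column names `score.default` and `score.midpoint` are our own) makes the comparison explicit: on 3 df, the midpoint scores leave a residual deviance of 1.9487 against 4.4473 for the default scores, so the choice of scores matters.

```r
alcohol <- data.frame(
  malformed = c(48, 38, 5, 1, 1),
  absent    = c(17066, 14464, 788, 126, 37)
)
# two choices of scores for the ordered categories 0, <1, 1-2, 3-5, >=6
alcohol$score.default  <- 1:5                    # default scores 1, ..., 5
alcohol$score.midpoint <- c(0, 0.5, 1.5, 4, 7)   # midpoint scores

m.default  <- glm(cbind(malformed, absent) ~ score.default,
                  family = binomial(), data = alcohol)
m.midpoint <- glm(cbind(malformed, absent) ~ score.midpoint,
                  family = binomial(), data = alcohol)

# residual deviance on 3 df: lack of fit relative to the saturated model
c(default = deviance(m.default), midpoint = deviance(m.midpoint))

# drop in deviance when the linear trend is added to the null model (1 df)
anova(m.midpoint, test = "Chisq")
```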
Ordinal X
• Scores of categorical variables critically influence a model
• usually, scores will be given by data experts
• various choices: e.g. midpoints of interval variables
• if no scores are given, the default scores are the values 1 to n
• Linear changes to scores do not affect the overall model (predictions, goodness of fit)
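To check the last point numerically, here is a sketch that refits the midpoint-score model with the linearly changed scores 1 + 2 · score (an arbitrary transformation chosen for illustration). Fitted values and deviance stay the same; only the slope is rescaled, by 1/2 here.

```r
alcohol <- data.frame(
  malformed = c(48, 38, 5, 1, 1),
  absent    = c(17066, 14464, 788, 126, 37),
  score     = c(0, 0.5, 1.5, 4, 7)          # midpoint scores
)
alcohol$score2 <- 1 + 2 * alcohol$score     # linear change a + b * score, b = 2

fit1 <- glm(cbind(malformed, absent) ~ score,  family = binomial(), data = alcohol)
fit2 <- glm(cbind(malformed, absent) ~ score2, family = binomial(), data = alcohol)

# predictions and goodness of fit are unchanged ...
all.equal(fitted(fit1), fitted(fit2))
c(deviance(fit1), deviance(fit2))

# ... only the slope is divided by b
coef(fit2)[["score2"]] / coef(fit1)[["score"]]
```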
Example: Alligator Food
• 219 alligators from four lakes in Florida were
examined with respect to their primary food
choice:
fish, invertebrate, birds, reptile, other.
• Additionally, size of alligators (≤2.3m, >2.3m)
and gender were recorded.
> summary(alligator)
       ID             food       size      gender      lake
 Min.   :  1.0   bird  :13   <2.3:124   f: 89   george  :63
 1st Qu.: 55.5   fish  :94   >2.3: 95   m:130   hancock :55
 Median :110.0   invert:61                      oklawaha:48
 Mean   :110.0   other :32                      trafford:53
 3rd Qu.:164.5   rep   :19
 Max.   :219.0
Baseline Categorical Model

Models for Multinomial Logit
• Let the response Y be a nominal variable with J > 2 categories
• define $\pi_j(x) = P(Y = j \mid X = x)$

Baseline Category Logit Models
• Baseline Categorical Model: pick one "reference" category i, e.g. i = 1, i = J, or i the largest category, and express the logits with respect to this reference:
$$\log\frac{\pi_j(x)}{\pi_i(x)} = \alpha_j + \beta_j' x \quad \text{for all } j = 1, \ldots, J$$
Multinomial Model
• Choices for baseline:
  • largest category gives most stable results
  • R picks first level
• Haberman: $G^2$ and $X^2$ are $\chi^2$ distributed if the data are categorical and not sparse; for sparse or continuous data, deviance differences between nested models are still $\chi^2$ distributed if the models differ in few parameters.
• Brian Ripley's nnet package can fit multinomial models:
library(nnet)
alli.main <- multinom(food~lake+size+gender, data=alligator)
> summary(alli.main)
Call:
multinom(formula = food ~ lake + size + gender, data = alligator)

Coefficients:
       (Intercept) lakehancock lakeoklawaha laketrafford   size>2.3
bird    -2.4321397   0.5754699  -0.55020075     1.237216  0.7300740
invert   0.1690702  -1.7805555   0.91304120     1.155722 -1.3361658
other   -1.4309095   0.7667093   0.02603021     1.557820 -0.2905697
rep     -3.4161432   1.1296426   2.53024945     3.061087  0.5571846
          genderm
bird   -0.6064035
invert -0.4629388
other  -0.2524299
rep    -0.6276217

Std. Errors:
       (Intercept) lakehancock lakeoklawaha laketrafford  size>2.3
bird     0.7706720   0.7952303    1.2098680    0.8661052 0.6522657
invert   0.3787475   0.6232075    0.4761068    0.4927795 0.4111827
other    0.5381162   0.5685673    0.7777958    0.6256868 0.4599317
rep      1.0851582   1.1928075    1.1221413    1.1297557 0.6466092
         genderm
bird   0.6888385
invert 0.3955162
other  0.4663546
rep    0.6852750

Residual Deviance: 537.8655
AIC: 585.8655
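The logit for any pair of categories a, b follows from the baseline logits:
$$\log\frac{\pi_a(x)}{\pi_b(x)} = \log\frac{\pi_a(x)}{\pi_i(x)} - \log\frac{\pi_b(x)}{\pi_i(x)} = (\alpha_a - \alpha_b) + (\beta_a - \beta_b)' x.$$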
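As an illustration, a sketch that applies this to the main-effects estimates from summary(alli.main), where fish is the baseline. The covariate profile (female, > 2.3m, lake hancock) is chosen arbitrarily; the log odds of invertebrates versus birds can be read off the coefficient differences directly or as the difference of two baseline logits.

```r
# main-effects estimates from summary(alli.main); baseline category: fish
coefs <- rbind(
  bird   = c(-2.4321397,  0.5754699, -0.55020075, 1.237216,  0.7300740, -0.6064035),
  invert = c( 0.1690702, -1.7805555,  0.91304120, 1.155722, -1.3361658, -0.4629388),
  other  = c(-1.4309095,  0.7667093,  0.02603021, 1.557820, -0.2905697, -0.2524299),
  rep    = c(-3.4161432,  1.1296426,  2.53024945, 3.061087,  0.5571846, -0.6276217)
)
colnames(coefs) <- c("(Intercept)", "lakehancock", "lakeoklawaha",
                     "laketrafford", "size>2.3", "genderm")

# hypothetical profile: female alligator > 2.3m from lake hancock
x <- c("(Intercept)" = 1, lakehancock = 1, lakeoklawaha = 0,
       laketrafford = 0, "size>2.3" = 1, genderm = 0)

# baseline logits log(pi_j / pi_fish)
eta <- drop(coefs %*% x)

# log(pi_invert / pi_bird) = (alpha_a - alpha_b) + (beta_a - beta_b)' x
direct <- sum((coefs["invert", ] - coefs["bird", ]) * x)
# the same value as the difference of the two baseline logits
via.baseline <- eta[["invert"]] - eta[["bird"]]
c(direct = direct, via.baseline = via.baseline)
```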
Alligator
Example: Alligator Food Choice. 219 alligators were examined with respect to their primary food choice (fish, invertebrate, birds, reptile, other). Explanatory variables are lake (4 categories), size (≤ 2.3m, > 2.3m) and gender. The full model then has the form
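• $$\log\frac{\pi_j(x)}{\pi_F(x)} = \alpha_j + \beta^{L}_{lj} + \beta^{S}_{sj} + \beta^{G}_{gj} + \beta^{LS}_{lsj} + \beta^{LG}_{lgj} + \beta^{SG}_{sgj} + \beta^{LSG}_{lsgj}, \quad j = 1, \ldots, 4,$$
where F (fish) is the baseline category and l, s, g index lake, size and gender.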
• Number of parameters estimated (in the above order): (1 + 3 + 1 + 1 + 3 + 3 + 1 + 3) · 4 = 16 · 4 = 64
• find suitable sub-model
• the full model has 0 degrees of freedom:
library(nnet)
options(contrasts=c("contr.treatment","contr.poly"))
alli.full <- multinom(food~lake*size*gender, data=table.7.1)  # saturated model
# weights:  85 (64 variable)
initial  value 352.466903
...0 value 261.200857
Alligator
• Corner-stone Models
  Model          Deviance  df
  Full           487.6018   0
  Two-way        489.5426  12
  Main Effects   537.8655  40
  Null           604.3629  60
• Suitable model ‘around’ main effects and all two-way interactions
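The df column can be checked by counting parameters. The full model is saturated: each of the 2 · 2 · 4 = 16 combinations of lake, size and gender has 4 free logits, giving the 64 parameters counted above. A small sketch of the bookkeeping (the vector names are our own):

```r
# parameters per logit, in the order of the full model:
# alpha, L (lake, 3), S (size, 1), G (gender, 1), LS, LG, SG, LSG
k <- c(alpha = 1, L = 3, S = 1, G = 1, LS = 3, LG = 3, SG = 1, LSG = 3)
logits <- 4   # J - 1 = 4 food categories besides the baseline fish

n.par <- logits * c(
  full    = sum(k),
  two.way = sum(k[c("alpha", "L", "S", "G", "LS", "LG", "SG")]),
  main    = sum(k[c("alpha", "L", "S", "G")]),
  null    = k[["alpha"]]
)
n.par                    # 64, 52, 24, 4
n.par[["full"]] - n.par  # df relative to the full model: 0, 12, 40, 60
```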
> anova(alli.full, alli.twoway, alli.main, alli.null)
                                    Model Resid. df Resid. Dev
1                                       1       872   604.3629
2                    lake + size + gender       852   537.8655
3 size * gender * lake - size:gender:lake       824   489.5426
4                    size * gender * lake       812   487.6018
    Test Df  LR stat.      Pr(Chi)
1        NA        NA           NA
2 1 vs 2 20 66.497442 6.723394e-07
3 2 vs 3 28 48.322909 9.889238e-03
4 3 vs 4 12  1.940776 9.994914e-01
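Estimated Response Probabilities
• For the model
$$\log\frac{\pi_j(x)}{\pi_i(x)} = \alpha_j + \beta_j' x \quad \text{for all } j = 1, \ldots, J \text{ and all } x$$
the response probabilities are given as
$$\pi_j(x) = \frac{\exp(\alpha_j + \beta_j' x)}{1 + \sum_{k \neq i} \exp(\alpha_k + \beta_k' x)} \quad \text{for all } j = 1, \ldots, J,$$
with $\alpha_i = 0$ and $\beta_i = 0$ for the baseline category i.
• This follows from the modeling assumption: for all j = 1, ..., J we have $\pi_j(x) = \pi_i(x) \exp(\alpha_j + \beta_j' x)$, and since the probabilities sum to one, $\pi_i(x) = 1 / \left(1 + \sum_{k \neq i} \exp(\alpha_k + \beta_k' x)\right)$.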
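The probability formula can be sketched as a small function, assuming the coefficients are stored as a (J - 1) × p matrix with one row per non-baseline category (the function name `baseline.probs` and the profile below are our own). Applied to the main-effects estimates of alli.main for a female alligator ≤ 2.3m from lake george, only the intercepts enter.

```r
# main-effects estimates from summary(alli.main); baseline category: fish
coefs <- rbind(
  bird   = c(-2.4321397,  0.5754699, -0.55020075, 1.237216,  0.7300740, -0.6064035),
  invert = c( 0.1690702, -1.7805555,  0.91304120, 1.155722, -1.3361658, -0.4629388),
  other  = c(-1.4309095,  0.7667093,  0.02603021, 1.557820, -0.2905697, -0.2524299),
  rep    = c(-3.4161432,  1.1296426,  2.53024945, 3.061087,  0.5571846, -0.6276217)
)

# response probabilities from baseline-category logits;
# the baseline has alpha_i = 0 and beta_i = 0, i.e. a logit of 0
baseline.probs <- function(coefs, x, baseline = "fish") {
  eta <- c(0, drop(coefs %*% x))   # log(pi_j / pi_i) for all J categories
  names(eta)[1] <- baseline
  exp(eta) / sum(exp(eta))         # exp(eta_j) / (1 + sum_{k != i} exp(eta_k))
}

# hypothetical profile: female, <= 2.3m, lake george (all indicators 0)
x <- c(1, 0, 0, 0, 0, 0)   # (Intercept), hancock, oklawaha, trafford, size>2.3, male
p <- baseline.probs(coefs, x)
round(p, 3)
```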
Model Diagnostics

Fitted Values
   lake      size   bird  fish  invert  other  rep
1  george    <2.3    1.2  18.5   16.9    3.8   0.5
2  george    >2.3    1.8  14.5    3.1    2.2   0.5
3  hancock   <2.3    2.7  20.9    3.6    9.9   1.9
4  hancock   >2.3    2.3   9.1    0.4    3.1   1.1
5  oklawaha  <2.3    0.2   5.2   12.0    1.1   1.5
6  oklawaha  >2.3    0.8  12.8    7.0    1.9   5.5
7  trafford  <2.3    0.9   4.4   12.4    4.2   2.1
8  trafford  >2.3    3.1   8.6    5.6    5.8   5.9

Observed Values
   lake      size  fish  bird  invert  other  rep
1  george    <2.3    16     2      19      3    1
2  george    >2.3    17     1       1      3    0
3  hancock   <2.3    23     2       4      8    2
4  hancock   >2.3     7     3       0      5    1
5  oklawaha  <2.3     5     0      11      3    1
6  oklawaha  >2.3    13     1       8      0    6
7  trafford  <2.3     5     1      11      5    2
8  trafford  >2.3     8     3       7      5    6

Differences (fitted - observed)
   lake      size   bird  fish  invert  other  rep
1  george    <2.3   -0.8   2.5   -2.1    0.8  -0.5
2  george    >2.3    0.8  -2.5    2.1   -0.8   0.5
3  hancock   <2.3    0.7  -2.1   -0.4    1.9  -0.1
4  hancock   >2.3   -0.7   2.1    0.4   -1.9   0.1
5  oklawaha  <2.3    0.2   0.2    1.0   -1.9   0.5
6  oklawaha  >2.3   -0.2  -0.2   -1.0    1.9  -0.5
7  trafford  <2.3   -0.1  -0.6    1.4   -0.8   0.1
8  trafford  >2.3    0.1   0.6   -1.4    0.8  -0.1
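A sketch of how these diagnostics are computed from the two count tables, with the observed counts rearranged into the same column order as the fitted ones (matrix names are our own). It only applies the formulas to the rounded counts shown here; it is not meant to reproduce the Pearson residual table that follows.

```r
food  <- c("bird", "fish", "invert", "other", "rep")
cells <- paste(rep(c("george", "hancock", "oklawaha", "trafford"), each = 2),
               rep(c("<2.3", ">2.3"), times = 4))

# fitted counts, rounded to one decimal as in the table above
fitted.counts <- matrix(c(
  1.2, 18.5, 16.9, 3.8, 0.5,
  1.8, 14.5,  3.1, 2.2, 0.5,
  2.7, 20.9,  3.6, 9.9, 1.9,
  2.3,  9.1,  0.4, 3.1, 1.1,
  0.2,  5.2, 12.0, 1.1, 1.5,
  0.8, 12.8,  7.0, 1.9, 5.5,
  0.9,  4.4, 12.4, 4.2, 2.1,
  3.1,  8.6,  5.6, 5.8, 5.9),
  ncol = 5, byrow = TRUE, dimnames = list(cells, food))

# observed counts, same row and column order (bird, fish, invert, other, rep)
observed <- matrix(c(
  2, 16, 19, 3, 1,
  1, 17,  1, 3, 0,
  2, 23,  4, 8, 2,
  3,  7,  0, 5, 1,
  0,  5, 11, 3, 1,
  1, 13,  8, 0, 6,
  1,  5, 11, 5, 2,
  3,  8,  7, 5, 6),
  ncol = 5, byrow = TRUE, dimnames = list(cells, food))

# raw differences (fitted - observed), as in the Differences table
round(fitted.counts - observed, 1)

# Pearson residuals (o_ij - e_ij) / sqrt(e_ij) from the rounded fitted counts
round((observed - fitted.counts) / sqrt(fitted.counts), 1)
```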
Model Diagnostics

Pearson Residuals
   lake      size   bird  fish  invert  other  rep
1  george    <2.3   -0.6   0.1   -0.1    0.2  -1.1
2  george    >2.3    0.4  -0.2    0.7   -0.4   1.0
3  hancock   <2.3    0.3  -0.1   -0.1    0.2  -0.1
4  hancock   >2.3   -0.3   0.2    1.0   -0.6   0.1
5  oklawaha  <2.3    1.0   0.0    0.1   -1.8   0.4
6  oklawaha  >2.3   -0.2   0.0   -0.1    1.0  -0.1
7  trafford  <2.3   -0.2  -0.1    0.1   -0.2   0.1
8  trafford  >2.3    0.0   0.1   -0.3    0.1   0.0
Comparing observed and fitted cell counts gives an idea of the sign of the residuals. We will use the same residuals as before, e.g. Pearson residuals
$$\frac{o_{ij} - e_{ij}}{\sqrt{e_{ij}}},$$
for which we have asymptotic distributions.

Proportional Odds Logistic Regression

Ordinal Response
• Y is a categorical variable with J > 2 levels that have a natural ordering
• Proportional Odds Model
• Assume $y_1 < y_2 < \ldots < y_J$
If the response variable Y is ordinal, we can take a different approach to modeling it, based on the cumulative probabilities $P(Y \le j \mid x)$ for $j = 1, \ldots, J$. We define the cumulative log odds as
$$\log\frac{P(Y \le j \mid x)}{1 - P(Y \le j \mid x)} = \log\frac{\pi_1(x) + \ldots + \pi_j(x)}{\pi_{j+1}(x) + \ldots + \pi_J(x)}.$$
The cumulative odds model or proportional odds logistic regression model is then given as
$$\log\frac{P(Y \le j \mid x)}{1 - P(Y \le j \mid x)} = \alpha_j + \beta' x, \quad \text{for } j = 1, \ldots, J - 1$$
(for j = J the cumulative probability is 1, so there are only J - 1 cumulative logits). The slope β is the same for all j: for two covariate values $x_1, x_2$ the cumulative odds ratio is $\exp(\beta'(x_1 - x_2))$ for every j, hence the name proportional odds.
• The values $\alpha_j$ are ordered, i.e. $\alpha_{j_1} \le \alpha_{j_2}$ for $j_1 < j_2$: the logit is a monotone increasing function, the cumulative probabilities satisfy $P(Y \le j_1 \mid x) \le P(Y \le j_2 \mid x)$ for $j_1 < j_2$, and $\beta' x$ is the same for all j.
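A sketch of how category probabilities follow from the cumulative logits, using made-up cutpoints $\alpha = (-1, 1)$ and slope $\beta = 0.5$ for J = 3 categories (the function names `po.probs` and `cum.logit` are our own). It also checks the proportional odds property: the cumulative log odds ratio between two x values is the same for every j.

```r
# hypothetical ordered cutpoints alpha_1 <= alpha_2 (J = 3 categories) and slope
alpha <- c(-1, 1)
beta  <- 0.5

# category probabilities under log(P(Y <= j | x) / (1 - P(Y <= j | x))) = alpha_j + beta * x
po.probs <- function(x, alpha, beta) {
  cum <- plogis(alpha + beta * x)   # P(Y <= j | x) for j = 1, ..., J - 1
  diff(c(0, cum, 1))                # pi_j(x) = P(Y <= j | x) - P(Y <= j - 1 | x)
}

p0 <- po.probs(0, alpha, beta)
p2 <- po.probs(2, alpha, beta)   # with beta > 0, larger x shifts mass to lower categories
rbind(x0 = p0, x2 = p2)

# cumulative logits recovered from the category probabilities
cum.logit <- function(p) qlogis(cumsum(p)[-length(p)])
# proportional odds: beta * (2 - 0) = 1 for every j
cum.logit(p2) - cum.logit(p0)
```

Unordered cutpoints would make some $\pi_j(x)$ negative, which is why the ordering of the $\alpha_j$ is part of the model.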
Happiness
library(MASS)  # polr() is part of the MASS package
happy.age <- polr(happy~poly(age,4)*sex, data=na.omit(happy[,c("happy","age","sex")]))
[Figure: estimated probabilities (0 to 1) of not.too.happy, pretty.happy and very.happy by age (20 to 80), separately for female and male]
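The happy data are not reproduced in these notes, so the following sketch swaps in the `housing` example from the MASS documentation of `polr`. Note that `polr` parameterizes the model as $\text{logit}\, P(Y \le j \mid x) = \zeta_j - \eta$ with $\eta = x'\beta$, so in the notation above $\alpha_j = \zeta_j$ and $\beta = -$coef(fit). The sketch rebuilds the fitted category probabilities from the stored cutpoints `zeta` and linear predictor `lp`.

```r
library(MASS)   # provides polr() and the housing data

# example fit from the MASS documentation: satisfaction Low < Medium < High
house.plr <- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)

# polr: logit P(Y <= j | x) = zeta_j - eta, with eta = x' beta stored in $lp
cum <- plogis(outer(house.plr$lp, house.plr$zeta, function(eta, zeta) zeta - eta))
probs <- cbind(cum, 1) - cbind(0, cum)   # pi_j = P(Y <= j) - P(Y <= j - 1)

house.plr$zeta                           # ordered cutpoints
max(abs(probs - fitted(house.plr)))      # agrees with the stored fitted probabilities
```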