Binary and Multinomial Logistic Regression stat 557 Heike Hofmann

Outline
• Logistic Regression:
• model checking by grouping
• Model selection
• scores
• Intro to Multinomial Regression
Example: Happiness Data
> summary(happy)
           happy                marital           year           age             sex
 not too happy: 5629   divorced     : 6131   Min.   :1972   Min.   : 18.00   female:28581
 pretty happy :25874   married      :27998   1st Qu.:1982   1st Qu.: 31.00   male  :22439
 very happy   :14800   never married:10064   Median :1990   Median : 43.00
 NA's         : 4717   separated    : 1781   Mean   :1990   Mean   : 45.43
                       widowed      : 5032   3rd Qu.:2000   3rd Qu.: 58.00
                       NA's         :   14   Max.   :2006   Max.   : 89.00
                                                            NA's   :184.00
            degree                   finrela            health
 bachelor      : 6918   above average    : 8536   excellent:11951
 graduate      : 3253   average          :23363   fair     : 7149
 high school   :26307   below average    :10909   good     :17227
 junior college: 2601   far above average:  898   poor     : 2164
 lt high school:11777   far below average: 2438   NA's     :12529
 NA's          :  164   NA's             : 4876
only consider the extremes: "very happy" and "not too happy" individuals
library(productplots)  # provides prodplot()
prodplot(data=happy, ~ happy+sex, c("vspine", "hspine"), na.rm=TRUE, subset=level==2)
# almost perfect independence
# try a model
happy.sex <- glm(happy~sex, family=binomial(), data=happy)
summary(happy.sex)
Call:
glm(formula = happy ~ sex, family = binomial(), data = happy)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6060  -1.6054   0.8027   0.8031   0.8031

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.96613    0.02075  46.551   <2e-16 ***
sexmale      0.00130    0.03162   0.041    0.967
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 24053  on 20428  degrees of freedom
Residual deviance: 24053  on 20427  degrees of freedom
AIC: 24057

Number of Fisher Scoring iterations: 4

[Figure: product plot of happy by sex; columns female, male]
> anova(happy.sex)
Analysis of Deviance Table

Model: binomial, link: logit

Response: happy

Terms added sequentially (first to last)

     Df  Deviance Resid. Df Resid. Dev
NULL                  20428      24053
sex   1 0.0016906     20427      24053

> confint(happy.sex)
Waiting for profiling to be done...
                  2.5 %     97.5 %
(Intercept)  0.92557962 1.00693875
sexmale     -0.06064378 0.06332427

• Deviance difference is asymptotically χ2 distributed
• Null hypothesis of independence cannot be rejected
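The two conclusions above can be checked by hand. A minimal sketch that uses only numbers printed in the output (deviance difference 0.0016906 on 1 degree of freedom, estimate 0.00130 with standard error 0.03162); the variable names are illustrative and not part of the original slides.

```r
# Likelihood-ratio test of independence: deviance difference on 1 df
dev.diff <- 0.0016906
p.lr <- pchisq(dev.diff, df = 1, lower.tail = FALSE)

# Wald 95% interval for the sexmale coefficient: estimate +/- z * SE
est <- 0.00130
se  <- 0.03162
wald.ci <- est + c(-1, 1) * qnorm(0.975) * se

round(p.lr, 3)     # compare with Pr(>|z|) for sexmale
round(wald.ci, 4)  # compare with the profile interval from confint()
```

The Wald interval is only an approximation to the profile-likelihood interval that confint() reports; with a sample this large the two should nearly coincide.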
Age and Happiness
qplot(age, geom="histogram", fill=happy, binwidth=1, data=happy)
[Figure: histogram of age (20 to 80), count (0 to 400), filled by happy (not too happy, very happy)]
qplot(age, geom="histogram", fill=happy, binwidth=1, position="fill", data=happy)
[Figure: the same histogram with position="fill": proportion (0.0 to 1.0) of happy (not too happy, very happy) by age]
# research paper claims that happiness is u-shaped
happy.age <- glm(happy~poly(age,2), family=binomial(),
    data=na.omit(happy[,c("age","happy")]))
> summary(happy.age)
Call:
glm(formula = happy ~ poly(age, 2), family = binomial(), data = na.omit(happy[,
    c("age", "happy")]))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6400  -1.5480   0.7841   0.8061   0.8707

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    0.96850    0.01571  61.660  < 2e-16 ***
poly(age, 2)1  6.41183    2.22171   2.886  0.00390 **
poly(age, 2)2 -7.81568    2.21981  -3.521  0.00043 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 23957  on 20351  degrees of freedom
Residual deviance: 23936  on 20349  degrees of freedom
AIC: 23942

Number of Fisher Scoring iterations: 4
# effect of age
X <- data.frame(cbind(age=20:85))
X$pred <- predict(happy.age, newdata=X, type="response")
qplot(age, pred, data=X) + ylim(c(0,1))
[Figure: predicted probability (pred, 0.0 to 1.0) against age (20 to 80)]

> anova(happy.age)
Analysis of Deviance Table

Model: binomial, link: logit

Response: happy

Terms added sequentially (first to last)

             Df Deviance Resid. Df Resid. Dev
NULL                         20351      23957
poly(age, 2)  2   20.739     20349      23936
# effect of age
X <- data.frame(expand.grid(age=20:85, sex=c("female","male")))
preds <- predict(happy.age, newdata=X, type="response", se.fit=TRUE)
X$pred <- preds$fit
X$pred.se <- preds$se.fit
limits <- aes(ymax = pred + pred.se, ymin = pred - pred.se)
# note: pred2, limits2 and happy.age.df are not defined on this slide
qplot(age, pred, data=X, size=I(1)) + ylim(c(0,1)) +
  geom_point(aes(age, pred2), size=1, colour="blue") +
  geom_errorbar(limits) + geom_errorbar(limits2, colour="blue") +
  geom_point(aes(x=age, y=happy/(happy+not), colour=sex), data=happy.age.df)
[Figure: predicted probability (pred3, 0.0 to 1.0) against age (20 to 80), coloured by sex (female, male)]

> anova(midlife.sex)
Analysis of Deviance Table

Model: binomial, link: logit

Response: happy

Terms added sequentially (first to last)

                 Df Deviance Resid. Df Resid. Dev
NULL                             20351      23957
poly(age, 4)      4   59.021     20347      23898
sex               1    0.000     20346      23898
poly(age, 4):sex  4   37.554     20342      23860
Problems with Deviance
• if X is continuous, the deviance no longer has a χ2 distribution. The assumptions are violated two-fold:
  • even if we regard X as categorical (with lots of categories), we might end up with a contingency table that has lots of small cells, which means that the χ2 approximation does not hold.
  • an increase in sample size most likely increases the number of different values of X, so the corresponding contingency table changes size (the asymptotic distribution for the smaller contingency table does not exist).
... but
• differences in deviance between models that are only a few degrees of freedom apart still have an asymptotic χ2 distribution
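A comparison of nested models through their difference in deviance can be sketched as follows. The data are simulated purely for illustration (x, y, m0, m1 and m2 are hypothetical names, not objects from the slides); the point is that the deviance difference, not the deviance itself, is referred to a χ2 distribution.

```r
set.seed(557)
n <- 500
x <- runif(n, 18, 89)                      # continuous covariate (simulated)
y <- rbinom(n, 1, plogis(-1 + 0.03 * x))   # binary response (simulated)

m0 <- glm(y ~ 1, family = binomial())
m1 <- glm(y ~ x, family = binomial())
m2 <- glm(y ~ poly(x, 2), family = binomial())

# sequential deviance differences, each compared with a chi-squared distribution
tab <- anova(m0, m1, m2, test = "Chisq")
tab

# the same p-value by hand for the step from m1 to m2
dev.diff <- deviance(m1) - deviance(m2)
df.diff  <- df.residual(m1) - df.residual(m2)
p.byhand <- pchisq(dev.diff, df = df.diff, lower.tail = FALSE)
```

Here m1 and m2 differ by a single parameter, so the comparison stays within the "few degrees of freedom apart" situation described above.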
Model Checking by Grouping
To get around the problems with the distribution of the deviance:
• Group the data along estimates, e.g. such that groups are approximately equal in size.
• Partition: smallest n1 estimates into group 1, second smallest batch of n2 estimates into group 2, ...
• If we assume g groups, we get the Hosmer-Lemeshow test statistic:

\[
\sum_{i=1}^{g} \frac{\left(\sum_{j=1}^{n_i} y_{ij} - \sum_{j=1}^{n_i} \hat{\pi}_{ij}\right)^2}{\left(\sum_{j=1}^{n_i} \hat{\pi}_{ij}\right)\left(1 - \sum_{j=1}^{n_i} \hat{\pi}_{ij}/n_i\right)} \sim \chi^2_{g-2}.
\]
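The grouping scheme can be written out directly. This is a minimal sketch under stated assumptions: the data are simulated, and hosmer_lemeshow() is a hypothetical helper written for this illustration (it is not a base R function). It splits the fitted probabilities into g groups of roughly equal size and evaluates the statistic term by term.

```r
set.seed(557)
n <- 2000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 0.8 * x))   # simulated binary response
fit <- glm(y ~ x, family = binomial())

hosmer_lemeshow <- function(y, pihat, g = 10) {
  # group observations by the rank of their fitted probability
  grp  <- cut(rank(pihat, ties.method = "first"), breaks = g, labels = FALSE)
  n.i  <- as.vector(tapply(y, grp, length))   # group sizes n_i
  obs  <- as.vector(tapply(y, grp, sum))      # observed successes per group
  expd <- as.vector(tapply(pihat, grp, sum))  # expected successes per group
  stat <- sum((obs - expd)^2 / (expd * (1 - expd / n.i)))
  list(statistic = stat, df = g - 2, n = n.i,
       p.value = pchisq(stat, df = g - 2, lower.tail = FALSE))
}

hl <- hosmer_lemeshow(y, fitted(fit), g = 10)
hl$statistic; hl$p.value
```

Rerunning with a different g changes the statistic and can change the decision.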
Problems with
Grouping
• Different groupings might (and will) lead to different decisions w.r.t. model fit
• Hosmer et al (1997): “A COMPARISON OF
GOODNESS-OF-FIT TESTS FOR THE LOGISTIC
REGRESSION MODEL” (on Blackboard)
Model Selection
Ideal Situation:
Theory for relationship between response and
outcome is well developed, model is fitted
because we want to fine-tune dependency
structure
Model Selection
Exploratory Modelling
After initial data check,
visually inspect relationship between response
and potential co-variates
include strongest co-variates first,
build up from there, check whether additions are
significant improvements
Model Selection
Stepwise Modelling
(not recommended by itself)
Include/Exclude variables based on goodness-of-fit criteria such as AIC, adjusted R2, ...
In Practice: combination of all three methods
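Stepwise selection by AIC is available in base R through step(). The sketch below runs a forward search on simulated data (d, x1, x2, x3 and the effect sizes are made up for illustration; x3 is pure noise), starting from the null model.

```r
set.seed(557)
n <- 400
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.5 + 1.2 * d$x1 - 0.8 * d$x2))  # x3 has no effect

null.fit <- glm(y ~ 1, family = binomial(), data = d)

# forward selection: add the term that lowers AIC most, stop when none does
fwd <- step(null.fit,
            scope = list(lower = ~ 1, upper = ~ x1 + x2 + x3),
            direction = "forward", trace = 0)

formula(fwd)
c(null = AIC(null.fit), selected = AIC(fwd))
```

Setting trace = 1 prints a table of candidate terms with Df, Deviance and AIC at every step.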
(Forward) Selection
• Results are often not easy to interpret - questionable value?
Step: AIC=18176
cbind(happy, not) ~ sex + poly(age, 4) + marital + degree + finrela +
    degree:finrela + poly(age, 4):degree + poly(age, 4):finrela +
    sex:finrela + sex:degree

                       Df Deviance   AIC
<none>                       16714 18176
+ sex:marital           4    16707 18177
+ marital:degree       16    16688 18182
+ poly(age, 4):marital 16    16688 18182
+ sex:poly(age, 4)      4    16714 18184
+ marital:finrela      16    16693 18187
(Forward) Selection
Step: AIC=18176
cbind(happy, not) ~ sex + poly(age, 4) + marital + degree + finrela +
    degree:finrela + poly(age, 4):degree + poly(age, 4):finrela +
    sex:finrela + sex:degree

                       Df Deviance   AIC
<none>                       16714 18176
- sex:degree            4    16722 18176
+ sex:marital           4    16707 18177
- sex:finrela           4    16724 18178
+ marital:degree       16    16688 18182
+ poly(age, 4):marital 16    16688 18182
+ sex:poly(age, 4)      4    16714 18184
+ marital:finrela      16    16693 18187
- poly(age, 4):finrela 16    16759 18189
- poly(age, 4):degree  16    16766 18196
- degree:finrela       16    16774 18204
- marital               4    18232 19686
Investigate Interactions
• Financial Relation / Gender
prodplot(happy, ~happy+sex+finrela, c("vspine","hspine","hspine"), subset=level==3)
[Figure: product plot of happy by sex within finrela (far below average, below average, average, above average, far above average, NA); bars labelled female, male]
Investigate Interactions
• Financial Relation / Gender
prodplot(happy, ~happy+finrela+sex, c("vspine","hspine","hspine"), subset=level==3)
[Figure: product plot of happy by finrela within sex; bars labelled female, male]
Effect plots
[Figure: predicted probability (pred, 0.2 to 0.8) against age (20 to 90); panels by degree (bachelor, graduate, high school, junior college, lt high school), marital status (divorced, married, never married, separated, widowed) and sex (male, female); coloured by finrela (far below average, below average, average, above average, far above average)]
Standardized Residuals
• Standardize by dividing by leverage values:
• Hat matrix is result of iterative weighted
fitting,
• with the weights determined by the link:
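The quantities named above can be computed by hand and compared with R's built-in functions. This is a sketch under stated assumptions: the data are simulated, the hat matrix is taken in the usual form H = W^(1/2) X (X'WX)^(-1) X' W^(1/2) with W holding the working weights from the final iteration, and the standardized Pearson residual divides by sqrt(1 - h_i), where h_i is the leverage.

```r
set.seed(557)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.3 + x))   # simulated binary response
fit <- glm(y ~ x, family = binomial())

X <- model.matrix(fit)
w <- fit$weights        # working weights from the final IWLS iteration
pi.hat <- fitted(fit)   # for the logit link, w is approximately pi.hat * (1 - pi.hat)

# hat matrix H = W^(1/2) X (X' W X)^(-1) X' W^(1/2)
WX <- sqrt(w) * X                         # scales row i of X by sqrt(w_i)
H  <- WX %*% solve(crossprod(WX)) %*% t(WX)
h  <- diag(H)                             # leverage values

# standardized Pearson residuals: e_i / sqrt(1 - h_i)
r.std <- residuals(fit, type = "pearson") / sqrt(1 - h)

# built-in equivalents for comparison
range(h - hatvalues(fit))
range(r.std - rstandard(fit, type = "pearson"))
```

In practice hatvalues() and rstandard() are the convenient route; the explicit matrix form is shown only to make the role of the weights visible.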
Diagnostics
• Residual Plots
• Predictive Power (corresponds to R2)
• Deletion Statistics (Belsley, Kuh and Welsch (1980), Cook and Weisberg (1982)):
dfbeta, dffits, covratio, cooks.distance
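All four deletion statistics listed above are base R functions that accept a fitted glm. A brief sketch on simulated data (fit, x and y are illustrative names):

```r
set.seed(557)
n <- 100
x <- rnorm(n)
y <- rbinom(n, 1, plogis(x))   # simulated binary response
fit <- glm(y ~ x, family = binomial())

db   <- dfbeta(fit)          # approximate change in each coefficient without case i
dfit <- dffits(fit)          # scaled change in the fitted value of case i
cr   <- covratio(fit)        # effect of case i on the coefficient covariance matrix
cd   <- cooks.distance(fit)  # overall influence of case i on the fit

# the most influential cases by Cook's distance
head(order(cd, decreasing = TRUE))
```

influence.measures(fit) collects these statistics in one table and flags unusual cases.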
Logit model for a categorical predictor X:

\[ \log \frac{\pi(x)}{1 - \pi(x)} = \alpha + \beta_i, \]

where βi is the effect of the ith category in X on the log odds, i.e. for each category one effect [...]. This means that the above model is overparameterized (the “last” category can be explained by the others). To make the solution unique again, we have to use an additional constraint [...]. Whenever one of the effects is fixed to be zero, this is called a contrast coding [...] comparison of all the other effects to the baseline effect. For effect coding the constraint is on the sum of the effects of a variable: Σi βi = 0. In a binary variable the effects are then the negatives of each other. Predictions and inference are independent of the specific coding used and are not affected by changes in the coding.

Example: Alcohol during pregnancy
Alcohol during pregnancy is believed to be associated with congenital malformation.
• Observational Study: at 3 months of pregnancy, expectant mothers were asked about their average daily alcohol consumption.
• infant checked for malformation at birth
  Alcohol  malformed  absent  P(malformed)
1 0               48   17066        0.0028
2 <1              38   14464        0.0026
3 1-2              5     788        0.0063
4 3-5              1     126        0.0079
5 ≥6               1      37        0.0263
Models m1 and m2 are the same in terms of statistical behavior: deviance, predictions and [...] the same numbers. The variable Alcohol is recoded for the second model, giving different [...]
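That the coding does not affect the fit can be checked on the table above. A sketch, assuming that the two models differ only in the coding of Alcohol: here m1 uses R's default treatment contrasts (first category fixed at zero) and m2 uses sum-to-zero effect coding via contr.sum. The data frame alc is typed in from the table, and the exact definitions of m1 and m2 in the original notes are not shown, so these are stand-ins.

```r
alc <- data.frame(
  Alcohol   = factor(c("0", "<1", "1-2", "3-5", ">=6"),
                     levels = c("0", "<1", "1-2", "3-5", ">=6")),
  malformed = c(48, 38, 5, 1, 1),
  absent    = c(17066, 14464, 788, 126, 37)
)

# contrast (treatment) coding: effect of the first category fixed at 0
m1 <- glm(cbind(malformed, absent) ~ Alcohol, family = binomial(), data = alc)

# effect coding: effects constrained to sum to 0
m2 <- glm(cbind(malformed, absent) ~ Alcohol, family = binomial(), data = alc,
          contrasts = list(Alcohol = "contr.sum"))

coef(m1)                         # differences to the baseline category "0"
coef(m2)                         # deviations from the average log odds
cbind(fitted(m1), fitted(m2))    # identical predictions
c(deviance(m1), deviance(m2))    # identical deviance
```

Only the interpretation of the individual coefficients changes between the two codings.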
Saturated Model
glm(formula = cbind(malformed, absent) ~ Alcohol, family = binomial())

Deviance Residuals:
[1] 0 0 0 0 0

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.87364    0.14454 -40.637   <2e-16 ***
Alcohol<1   -0.06819    0.21743  -0.314   0.7538
Alcohol1-2   0.81358    0.47134   1.726   0.0843 .
Alcohol3-5   1.03736    1.01431   1.023   0.3064
Alcohol>=6   2.26272    1.02368   2.210   0.0271 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance:  6.2020e+00  on 4  degrees of freedom
Residual deviance: -3.0775e-13  on 0  degrees of freedom
AIC: 28.627

Number of Fisher Scoring iterations: 4
‘Linear’ Effect
glm(formula = cbind(malformed, absent) ~ as.numeric(Alcohol),
    family = binomial())

Deviance Residuals:
      1        2        3        4        5
 0.7302  -1.1983   0.9636   0.4272   1.1692

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)          -6.2089     0.2873 -21.612   <2e-16 ***
as.numeric(Alcohol)   0.2278     0.1683   1.353    0.176
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6.2020  on 4  degrees of freedom
Residual deviance: 4.4473  on 3  degrees of freedom
AIC: 27.074

Number of Fisher Scoring iterations: 5

levels: 1,2,3,4,5
‘Linear’ Effect
glm(formula = cbind(malformed, absent) ~ as.numeric(Alcohol),
    family = binomial())

Deviance Residuals:
      1        2        3        4        5
 0.5921  -0.8801   0.8865  -0.1449   0.1291

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)          -5.9605     0.1154 -51.637   <2e-16 ***
as.numeric(Alcohol)   0.3166     0.1254   2.523   0.0116 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6.2020  on 4  degrees of freedom
Residual deviance: 1.9487  on 3  degrees of freedom
AIC: 24.576

Number of Fisher Scoring iterations: 4

levels: 0,0.5,1.5,4,7
Scores
• Scores of categorical variables critically influence a model
• usually, scores will be given by data experts
• various choices:
  • e.g. midpoints of interval variables,
  • default scores are the values 1 to n
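The effect of the score choice can be reproduced from the malformation table. In this sketch the counts and the two score sets (1 to 5, and 0, 0.5, 1.5, 4, 7) are taken from the slides; the object names alc, m.int and m.mid are illustrative.

```r
alc <- data.frame(
  malformed = c(48, 38, 5, 1, 1),
  absent    = c(17066, 14464, 788, 126, 37),
  score.int = 1:5,                    # default scores 1 to n
  score.mid = c(0, 0.5, 1.5, 4, 7)    # scores based on the alcohol intervals
)

m.int <- glm(cbind(malformed, absent) ~ score.int, family = binomial(), data = alc)
m.mid <- glm(cbind(malformed, absent) ~ score.mid, family = binomial(), data = alc)

# slope, its p-value and the residual deviance under each scoring
rbind(
  integer  = c(coef(summary(m.int))[2, c("Estimate", "Pr(>|z|)")], deviance = deviance(m.int)),
  midpoint = c(coef(summary(m.mid))[2, c("Estimate", "Pr(>|z|)")], deviance = deviance(m.mid))
)
```

Both models spend one parameter on the trend, yet the conclusion about the trend depends on the scores.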
Multinomial Models
Models for Multinomial Logit
• Response Y is categorical (nominal) with J > 2 categories
• define πj(x) = P(Y=j | X=x)
• Baseline Categorical Model: pick one reference category i (e.g. i = 1, or i = J, or i the largest category), express logit with respect to this reference:

\[ \log \frac{\pi_j(x)}{\pi_i(x)} = \alpha_j + \beta_j x \quad \text{for all } j = 1, \dots, J \]
Multinomial Model
• Choices for baseline:
largest category gives most stable results
• R picks first level
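The choice of baseline changes the parameterization but not the model. A short derivation, assuming the baseline model above with α_i = β_i = 0 for the reference category i (which follows from log(π_i(x)/π_i(x)) = 0): the response probabilities and the logit between any two categories j and k can be recovered from the baseline logits.

```latex
\[
\pi_j(x) = \frac{\exp(\alpha_j + \beta_j x)}{\sum_{h=1}^{J} \exp(\alpha_h + \beta_h x)},
\qquad \alpha_i = \beta_i = 0,
\]
\[
\log \frac{\pi_j(x)}{\pi_k(x)}
  = \log \frac{\pi_j(x)}{\pi_i(x)} - \log \frac{\pi_k(x)}{\pi_i(x)}
  = (\alpha_j - \alpha_k) + (\beta_j - \beta_k)\, x .
\]
```

Switching the baseline from i to k therefore only subtracts (α_k, β_k) from every parameter pair; fitted probabilities stay the same.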
Example: Alligator Food
• 219 alligators from four lakes in Florida were examined with respect to their primary food choice: fish, invertebrates, birds, reptiles, other.
• Additionally, size of alligators (≤2.3m, >2.3m) and gender were recorded.
> summary(alligator)
       ID             food        size      gender       lake
 Min.   :  1.0   bird  :13   <2.3:124   f: 89   george  :63
 1st Qu.: 55.5   fish  :94   >2.3: 95   m:130   hancock :55
 Median :110.0   invert:61                      oklawaha:48
 Mean   :110.0   other :32                      trafford:53
 3rd Qu.:164.5   rep   :19
 Max.   :219.0
[Figure: food (fish, bird, invert, other, rep) by size (<2.3, >2.3)]
xtabs(~lake + food, data = alligator)
[Figure: food (fish, bird, invert, other, rep) by lake (george, hancock, oklawaha, trafford)]
xtabs(~gender + food, data = alligator)
[Figure: food (fish, bird, invert, other, rep) by gender (f, m)]
library(nnet)
• Brian Ripley’s nnet package can be used to fit multinomial models:
library(nnet)
alli.main <- multinom(food~lake+size+gender, data=alligator)
> summary(alli.main)
Call:
multinom(formula = food ~ lake + size + gender, data = alligator)

Coefficients:
       (Intercept) lakehancock lakeoklawaha laketrafford   size>2.3    genderm
bird    -2.4321397   0.5754699  -0.55020075     1.237216  0.7300740 -0.6064035
invert   0.1690702  -1.7805555   0.91304120     1.155722 -1.3361658 -0.4629388
other   -1.4309095   0.7667093   0.02603021     1.557820 -0.2905697 -0.2524299
rep     -3.4161432   1.1296426   2.53024945     3.061087  0.5571846 -0.6276217

Std. Errors:
       (Intercept) lakehancock lakeoklawaha laketrafford  size>2.3   genderm
bird     0.7706720   0.7952303    1.2098680    0.8661052 0.6522657 0.6888385
invert   0.3787475   0.6232075    0.4761068    0.4927795 0.4111827 0.3955162
other    0.5381162   0.5685673    0.7777958    0.6256868 0.4599317 0.4663546
rep      1.0851582   1.1928075    1.1221413    1.1297557 0.6466092 0.6852750

Residual Deviance: 537.8655
AIC: 585.8655
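The coefficient table can be turned into fitted probabilities by hand. A sketch that uses only the printed estimates and assumes that fish is the baseline category (it is the one food type without a coefficient row) and that lake george, size <2.3 and gender f are the reference levels (they have no coefficient column). The object names are illustrative.

```r
# baseline logits for a small female alligator in lake george:
# only the intercepts contribute; the baseline category fish has logit 0
eta.small <- c(fish = 0, bird = -2.4321397, invert = 0.1690702,
               other = -1.4309095, rep = -3.4161432)
prob.small <- exp(eta.small) / sum(exp(eta.small))

# large (>2.3) female in lake george: add the size>2.3 coefficients
eta.large <- eta.small + c(fish = 0, bird = 0.7300740, invert = -1.3361658,
                           other = -0.2905697, rep = 0.5571846)
prob.large <- exp(eta.large) / sum(exp(eta.large))

round(rbind(small = prob.small, large = prob.large), 3)
```

With the fitted object available, predict(alli.main, newdata, type = "probs") should give the same values without manual bookkeeping.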