Connections between log-linear and logistic regression models

- Any log-linear model can be expressed as a logistic regression model
  - Logistic regression requires specification of a response variable
  - Log-linear models address association among all of the variables
- Several log-linear models may correspond to the same logistic regression model

Wisconsin Driver Data:

Sex (S):        (i=1) 'Male'      (i=2) 'Female'
Age (A):        (j=1) '16-36'     (j=2) '36-55'     (j=3) 'over 55'
Disease (D):    (k=1) 'Disease'   (k=2) 'Control'
Violation (V):  (ℓ=1) 'Some'      (ℓ=2) 'None'
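Conditional independence of Disease status and Violation status given any combination of sex and age categories:

$$
\log(m_{ijk\ell}) = \lambda + \lambda^{S}_{i} + \lambda^{A}_{j} + \lambda^{D}_{k} + \lambda^{V}_{\ell}
+ \lambda^{SA}_{ij} + \lambda^{SD}_{ik} + \lambda^{SV}_{i\ell} + \lambda^{AD}_{jk} + \lambda^{AV}_{j\ell}
+ \lambda^{SAD}_{ijk} + \lambda^{SAV}_{ij\ell}
$$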
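As a rough numerical check of what this model implies, the sketch below builds log expected counts from randomly drawn effects and confirms that the log-odds of a violation do not change with disease status k. The effect values are hypothetical placeholders (not estimates from the Wisconsin data), the names such as `lam_SAV` are chosen only for illustration, and indices are 0-based.

```python
import itertools
import random

random.seed(0)

# Index ranges: Sex (2), Age (3), Disease (2), Violation (2); 0-based.
S, A, D, V = range(2), range(3), range(2), range(2)

def rand_effects(*dims):
    """Hypothetical effects: one random value per index combination."""
    return {idx: random.gauss(0.0, 1.0) for idx in itertools.product(*dims)}

lam = random.gauss(0.0, 1.0)
lam_S, lam_A, lam_D, lam_V = (rand_effects(S), rand_effects(A),
                              rand_effects(D), rand_effects(V))
lam_SA, lam_SD, lam_SV = rand_effects(S, A), rand_effects(S, D), rand_effects(S, V)
lam_AD, lam_AV = rand_effects(A, D), rand_effects(A, V)
lam_SAD, lam_SAV = rand_effects(S, A, D), rand_effects(S, A, V)

def log_m(i, j, k, l):
    """Log expected count under the model with no D-V interaction terms."""
    return (lam + lam_S[(i,)] + lam_A[(j,)] + lam_D[(k,)] + lam_V[(l,)]
            + lam_SA[(i, j)] + lam_SD[(i, k)] + lam_SV[(i, l)]
            + lam_AD[(j, k)] + lam_AV[(j, l)]
            + lam_SAD[(i, j, k)] + lam_SAV[(i, j, l)])

def violation_log_odds(i, j, k):
    """log(m_ijk1 / m_ijk2): 'Some' versus 'None' violations."""
    return log_m(i, j, k, 0) - log_m(i, j, k, 1)

# Largest difference between the two disease groups over all sex-age cells.
max_gap = max(abs(violation_log_odds(i, j, 0) - violation_log_odds(i, j, 1))
              for i in S for j in A)
print(max_gap)  # zero up to floating-point rounding
```

Every term involving k cancels in the difference, which is why the gap is zero whatever values the effects take.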
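Conditional log-odds of a traffic violation:

$$
\begin{aligned}
\log\left(\frac{m_{ijk1}}{m_{ijk2}}\right)
&= \log(m_{ijk1}) - \log(m_{ijk2}) \\
&= \left[\lambda + \lambda^{S}_{i} + \lambda^{A}_{j} + \lambda^{D}_{k} + \lambda^{V}_{1}
 + \lambda^{SA}_{ij} + \lambda^{SD}_{ik} + \lambda^{SV}_{i1} + \lambda^{AD}_{jk} + \lambda^{AV}_{j1}
 + \lambda^{SAD}_{ijk} + \lambda^{SAV}_{ij1}\right] \\
&\quad - \left[\lambda + \lambda^{S}_{i} + \lambda^{A}_{j} + \lambda^{D}_{k} + \lambda^{V}_{2}
 + \lambda^{SA}_{ij} + \lambda^{SD}_{ik} + \lambda^{SV}_{i2} + \lambda^{AD}_{jk} + \lambda^{AV}_{j2}
 + \lambda^{SAD}_{ijk} + \lambda^{SAV}_{ij2}\right] \\
&= (\lambda^{V}_{1} - \lambda^{V}_{2}) + (\lambda^{SV}_{i1} - \lambda^{SV}_{i2})
 + (\lambda^{AV}_{j1} - \lambda^{AV}_{j2}) + (\lambda^{SAV}_{ij1} - \lambda^{SAV}_{ij2})
\end{aligned}
$$

Polychotomous Logistic Regression

There are J > 2 response categories:

$$
\pi_{ji} = \Pr\{\text{response is in the } j\text{-th category} \mid X_{1i}, X_{2i}, \ldots, X_{ki}\}
\quad \text{for } j = 1, 2, \ldots, J
$$

Note that

$$
\sum_{j=1}^{J} \pi_{ji} = 1
$$

for each i = 1, 2, ..., n.

There are several ways to construct "simultaneous" logistic regression models for such data.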
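Log-odds with respect to a baseline category (e.g., the last category)

$$
\begin{aligned}
\log\left(\frac{\pi_{1i}}{\pi_{Ji}}\right) &= \beta_{01} + \beta_{11}X_{1i} + \beta_{21}X_{2i} + \cdots + \beta_{k1}X_{ki} \\
\log\left(\frac{\pi_{2i}}{\pi_{Ji}}\right) &= \beta_{02} + \beta_{12}X_{1i} + \beta_{22}X_{2i} + \cdots + \beta_{k2}X_{ki} \\
&\;\;\vdots \\
\log\left(\frac{\pi_{J-1,i}}{\pi_{Ji}}\right) &= \beta_{0,J-1} + \beta_{1,J-1}X_{1i} + \cdots + \beta_{k,J-1}X_{ki}
\end{aligned}
$$

Then

$$
\pi_{ji} = \frac{\exp(\beta_{0j} + \beta_{1j}X_{1i} + \cdots + \beta_{kj}X_{ki})}
{1 + \sum_{\ell=1}^{J-1}\exp(\beta_{0\ell} + \beta_{1\ell}X_{1i} + \cdots + \beta_{k\ell}X_{ki})}
\quad \text{for } j = 1, 2, \ldots, J-1,
$$

and

$$
\pi_{Ji} = \frac{1}{1 + \sum_{j=1}^{J-1}\exp(\beta_{0j} + \beta_{1j}X_{1i} + \cdots + \beta_{kj}X_{ki})}
$$

- PROC CATMOD in SAS uses this when the LOGISTIC response is applied to a variable with more than two categories.
- It can be used for nominal response variables.

Other log-odds

$$
\begin{aligned}
\log\left(\frac{\pi_{ji}}{\pi_{\ell i}}\right)
&= \log\left(\frac{\pi_{ji}}{\pi_{Ji}} \cdot \frac{\pi_{Ji}}{\pi_{\ell i}}\right) \\
&= \log\left(\frac{\pi_{ji}}{\pi_{Ji}}\right) - \log\left(\frac{\pi_{\ell i}}{\pi_{Ji}}\right) \\
&= \left[\beta_{0j} + \beta_{1j}X_{1i} + \cdots + \beta_{kj}X_{ki}\right]
 - \left[\beta_{0\ell} + \beta_{1\ell}X_{1i} + \cdots + \beta_{k\ell}X_{ki}\right] \\
&= (\beta_{0j} - \beta_{0\ell}) + (\beta_{1j} - \beta_{1\ell})X_{1i} + \cdots + (\beta_{kj} - \beta_{k\ell})X_{ki}
\end{aligned}
$$

Adjacent-categories logits:

$$
\begin{aligned}
\log\left(\frac{\pi_{1i}}{\pi_{2i}}\right) &= \beta_{01} + \beta_{11}X_{1i} + \cdots + \beta_{k1}X_{ki} \\
\log\left(\frac{\pi_{2i}}{\pi_{3i}}\right) &= \beta_{02} + \beta_{12}X_{1i} + \cdots + \beta_{k2}X_{ki} \\
&\;\;\vdots \\
\log\left(\frac{\pi_{J-1,i}}{\pi_{J,i}}\right) &= \beta_{0,J-1} + \beta_{1,J-1}X_{1i} + \cdots + \beta_{k,J-1}X_{ki}
\end{aligned}
$$

- equivalent to the "baseline-category" logistic model
- sometimes used when the polychotomous response is an ordinal variable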
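The following sketch shows one way to turn baseline-category logits into probabilities and to check the "other log-odds" identity numerically. The coefficient values and the function name `baseline_probs` are hypothetical, chosen only for illustration with J = 3 categories and k = 2 covariates.

```python
import math

def baseline_probs(betas, x):
    """Probabilities [pi_1, ..., pi_J] from J-1 baseline-category logits.

    betas: list of J-1 coefficient lists [b0, b1, ..., bk]
    x:     list of k covariate values
    The last category J is the baseline.
    """
    etas = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], x)) for b in betas]
    denom = 1.0 + sum(math.exp(e) for e in etas)
    probs = [math.exp(e) / denom for e in etas]
    probs.append(1.0 / denom)  # baseline category
    return probs

# Hypothetical coefficients (not fitted to any data).
betas = [[0.5, 1.0, -0.3],   # log(pi_1 / pi_3)
         [-1.0, 0.4, 0.8]]   # log(pi_2 / pi_3)
x = [0.7, 1.5]

pi = baseline_probs(betas, x)

# log(pi_1 / pi_2) should be linear in x with coefficients beta_1 - beta_2.
diff = [b1 - b2 for b1, b2 in zip(betas[0], betas[1])]
eta_12 = diff[0] + diff[1] * x[0] + diff[2] * x[1]

print(sum(pi))
print(math.log(pi[0] / pi[1]), eta_12)
```

With three categories, log(pi_1/pi_2) is also the first adjacent-categories logit, so the same check illustrates why the adjacent-categories and baseline-category models are reparameterizations of each other.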
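Cumulative logits:

$$
\begin{aligned}
\log\left(\frac{\pi_{1i}}{\pi_{2i} + \pi_{3i} + \cdots + \pi_{Ji}}\right) &= \beta_{01} + \beta_{11}X_{1i} + \cdots + \beta_{k1}X_{ki} \\
\log\left(\frac{\pi_{1i} + \pi_{2i}}{\pi_{3i} + \cdots + \pi_{Ji}}\right) &= \beta_{02} + \beta_{12}X_{1i} + \cdots + \beta_{k2}X_{ki} \\
&\;\;\vdots \\
\log\left(\frac{\pi_{1i} + \pi_{2i} + \cdots + \pi_{J-1,i}}{\pi_{Ji}}\right) &= \beta_{0,J-1} + \beta_{1,J-1}X_{1i} + \cdots + \beta_{k,J-1}X_{ki}
\end{aligned}
$$

- used for ordinal response variables
- Does not yield the same estimates of $\{\pi_{ji}\}$ as the "baseline-category" logit model.
- $\log(\pi_{1i}/\pi_{2i})$, for example, is generally not a linear function of the parameters

Example: (3 response categories)

$$
\begin{aligned}
\log\left(\frac{\pi_{1i}}{\pi_{2i} + \pi_{3i}}\right) &= \beta_{01} + \beta_{11}X_{1i} + \beta_{21}X_{2i} = X_i'\beta_1 \\
\log\left(\frac{\pi_{1i} + \pi_{2i}}{\pi_{3i}}\right) &= \beta_{02} + \beta_{12}X_{1i} + \beta_{22}X_{2i} = X_i'\beta_2
\end{aligned}
$$

Then

$$
\begin{aligned}
\pi_{1i} &= (\pi_{2i} + \pi_{3i})\, e^{X_i'\beta_1} \\
\pi_{1i} + \pi_{2i} &= \pi_{3i}\, e^{X_i'\beta_2} \\
\pi_{1i} + \pi_{2i} + \pi_{3i} &= 1
\end{aligned}
$$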
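and

$$
\begin{aligned}
\pi_{1i} &= \frac{e^{X_i'\beta_1}}{1 + e^{X_i'\beta_1}} \\
\pi_{2i} &= \frac{e^{X_i'\beta_2} - e^{X_i'\beta_1}}{(1 + e^{X_i'\beta_1})(1 + e^{X_i'\beta_2})} \\
\pi_{3i} &= \frac{1}{1 + e^{X_i'\beta_2}}
\end{aligned}
$$

Consequently,

$$
\log\left(\frac{\pi_{1i}}{\pi_{2i}}\right)
= \log\left[\frac{e^{X_i'\beta_1}\,(1 + e^{X_i'\beta_2})}{e^{X_i'\beta_2} - e^{X_i'\beta_1}}\right]
$$

Continuation-ratio logits:

Fit J - 1 separate logistic regression models:

$$
\begin{aligned}
\log\left(\frac{\pi_{1i}}{\pi_{2i} + \pi_{3i} + \cdots + \pi_{Ji}}\right) &= \beta_{01} + \beta_{11}X_{1i} + \cdots + \beta_{k1}X_{ki} \\
\log\left(\frac{\pi_{2i}}{\pi_{3i} + \pi_{4i} + \cdots + \pi_{Ji}}\right) &= \beta_{02} + \beta_{12}X_{1i} + \cdots + \beta_{k2}X_{ki} \\
&\;\;\vdots \\
\log\left(\frac{\pi_{J-1,i}}{\pi_{Ji}}\right) &= \beta_{0,J-1} + \beta_{1,J-1}X_{1i} + \cdots + \beta_{k,J-1}X_{ki}
\end{aligned}
$$

- used for ordinal response variables
- $\log\left(\dfrac{\pi_{ji}}{\pi_{j+1,i} + \cdots + \pi_{J,i}}\right)$ is the conditional log-odds that a response falls in the j-th category given that it does not fall in a category preceding the j-th category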
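Both ordinal parameterizations above can be inverted to recover the category probabilities. The sketch below does this for a list of linear-predictor values; the values `eta1` and `eta2` and the function names are hypothetical, and the cumulative version assumes the predictors are increasing so that all probabilities are positive.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def probs_from_cumulative(etas):
    """etas[j-1] = log(P(Y <= j) / P(Y > j)) for j = 1, ..., J-1 (increasing)."""
    cum = [expit(e) for e in etas] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

def probs_from_continuation(etas):
    """etas[j-1] = log(pi_j / (pi_{j+1} + ... + pi_J)) for j = 1, ..., J-1."""
    probs, remaining = [], 1.0
    for e in etas:
        p = remaining * expit(e)   # P(Y = j | Y >= j) times P(Y >= j)
        probs.append(p)
        remaining -= p
    probs.append(remaining)
    return probs

eta1, eta2 = -1.0, 0.5  # hypothetical values of X'beta_1 and X'beta_2

# Cumulative logits, J = 3: compare with the closed forms in the example.
pc = probs_from_cumulative([eta1, eta2])
e1, e2 = math.exp(eta1), math.exp(eta2)
closed = [e1 / (1 + e1), (e2 - e1) / ((1 + e1) * (1 + e2)), 1 / (1 + e2)]
print(pc, closed)

# Continuation-ratio logits, J = 3.
pr = probs_from_continuation([eta1, eta2])
print(pr, math.log(pr[1] / pr[2]))  # second logit is recovered as eta2
```

The two models share the first logit but generally give different probabilities for the remaining categories, since the second linear predictor describes a different comparison in each case.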
Example: Toxicity study (Price, 1987)
(from Agresti, pp 320-321)

Administered diethylene-glycoldimethylether (DIEGdiME) to pregnant mice

Each mouse was exposed to one of 5 concentrations for exactly 10 days early in the pregnancy
- level 0 is a control group
- each fetus was classified as
  (1) non-live
  (2) malformed
  (3) normal

Concentration          Response rate (%)             Number
 (mg/kg/day)     non-live   malformed   normal      exposed
______________________________________________________
      0            5.05       0.34      94.61         297
     62.5          7.02       0         92.98         242
    125            7.05       2.24      90.71         312
    250           12.71      19.73      67.56         299
    500           50.53      46.32       3.16         285
______________________________________________________
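The response rates in the table can be reproduced from the fetus counts that appear in the data step of the SAS program (diegdime.sas). The sketch below is only an arithmetic check; the dictionary layout and variable names are illustrative choices.

```python
# Counts (non-live, malformed, normal) by concentration in mg/kg/day,
# copied from the data lines of the SAS program.
counts = {
    0: (15, 1, 281),
    62.5: (17, 0, 225),
    125: (22, 7, 283),
    250: (38, 59, 202),
    500: (144, 132, 9),
}

# Number exposed and response rates (%) at each concentration.
totals = {conc: sum(xs) for conc, xs in counts.items()}
rates = {conc: tuple(round(100.0 * x / totals[conc], 2) for x in xs)
         for conc, xs in counts.items()}

for conc in counts:
    print(conc, rates[conc], totals[conc])
```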
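Baseline-logit model (linear in concentration):

$$
\log\left(\frac{\hat\pi_{\text{non-live}}}{\hat\pi_{\text{normal}}}\right)
= \underset{(.191)}{-3.969} + \underset{(.0007)}{.0119}\,(\text{conc.})
$$

$$
\log\left(\frac{\hat\pi_{\text{malformed}}}{\hat\pi_{\text{normal}}}\right)
= \underset{(.249)}{-4.952} + \underset{(.0008)}{.014}\,(\text{conc.})
$$

Lack-of-fit test:
G² = 57.50 with 6 d.f. (p-value < .0001)

Baseline-logit model (quadratic in concentration):

$$
\log\left(\frac{\hat\pi_{\text{non-live}}}{\hat\pi_{\text{normal}}}\right)
= \underset{(.2009)}{-2.7824} - \underset{(.00208)}{.00168}\,(\text{conc.}) + \underset{(.0000042)}{.000025}\,(\text{conc.})^2
$$

$$
\log\left(\frac{\hat\pi_{\text{malformed}}}{\hat\pi_{\text{normal}}}\right)
= \underset{(.7724)}{-6.7156} + \underset{(.00512)}{.0252}\,(\text{conc.}) - \underset{(.0000077)}{.00001}\,(\text{conc.})^2
$$

Lack-of-fit test:
G² = 4.51 with 4 d.f. (p-value = .341)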
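Continuation-ratio logit model:

$$
\log\left(\frac{\hat\pi_{\text{non-live}}}{\hat\pi_{\text{malformed}} + \hat\pi_{\text{normal}}}\right)
= \underset{(.1577)}{-3.2479} + \underset{(.000435)}{.00639}\,(\text{conc.})
$$

$G^2_1 = 5.78$ on 3 d.f. (p-value = .123)

$$
\log\left(\frac{\hat\pi_{\text{malformed}}}{\hat\pi_{\text{normal}}}\right)
= \underset{(.3322)}{-5.7019} + \underset{(.00123)}{.0174}\,(\text{conc.})
$$

$G^2_2 = 6.06$ on 3 d.f. (p-value = .109)

Overall lack-of-fit test:
$G^2 = G^2_1 + G^2_2 = 11.84$ on 6 d.f. (p-value = 0.066)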
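The p-values quoted for the continuation-ratio fits can be checked with a small chi-square upper-tail function. The sketch below uses only the standard library and the usual recurrence in the degrees of freedom; the function name `chi2_sf` is an illustrative choice, and it assumes integer degrees of freedom and x > 0.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df d.f. > x).

    Starts from the closed form for df = 1 or df = 2 and applies
    Q(df + 2) = Q(df) + (x/2)^(df/2) * exp(-x/2) / Gamma(df/2 + 1).
    """
    h = x / 2.0
    if df % 2 == 0:
        nu, q = 2, math.exp(-h)
    else:
        nu, q = 1, math.erfc(math.sqrt(h))
    while nu < df:
        q += h ** (nu / 2.0) * math.exp(-h) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return q

g2_1, g2_2 = 5.78, 6.06          # component deviances, 3 d.f. each
g2_total = g2_1 + g2_2           # overall lack-of-fit statistic, 6 d.f.

p1 = chi2_sf(g2_1, 3)
p2 = chi2_sf(g2_2, 3)
p_total = chi2_sf(g2_total, 6)

print(round(p1, 3), round(p2, 3), round(g2_total, 2), round(p_total, 3))
```

The two component statistics add because the two continuation-ratio logits are fitted as separate binary regressions, so their degrees of freedom add as well.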
exp(.00639 × 100) = 1.9
- the odds of "non-live" increase by a factor of 1.9 for each 100 mg/kg/day increase in concentration (1.73, 2.07)

exp(0.0174 × 100) = 5.7
- given that a fetus survives, the odds of malformation increase by a factor of 5.7 for every 100 mg/kg/day increase in concentration (4.48, 7.25)

exp(-3.2479) = .039
- odds that a fetus fails to survive at zero concentration (.028, .053)
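These factors follow directly from the continuation-ratio estimates and their standard errors. The sketch below recomputes them and attaches Wald-type intervals; the multiplier z = 1.96 (a 95% interval) is an assumption, since the source does not say how its intervals were obtained, and the results agree with the quoted intervals only to within about 0.01.

```python
import math

def odds_factor(coef, se, delta=1.0, z=1.96):
    """Multiplicative change in the odds for a `delta` increase in the covariate,
    with a Wald interval exp((coef -/+ z*se) * delta). z = 1.96 is assumed."""
    est = math.exp(coef * delta)
    lo = math.exp((coef - z * se) * delta)
    hi = math.exp((coef + z * se) * delta)
    return est, lo, hi

# Slope for non-live vs. (malformed + normal), per 100 mg/kg/day.
nonlive = odds_factor(0.00639, 0.000435, delta=100)
# Slope for malformed vs. normal (given survival), per 100 mg/kg/day.
malformed = odds_factor(0.0174, 0.00123, delta=100)
# Intercept of the first logit: odds of non-live at zero concentration.
baseline_odds = odds_factor(-3.2479, 0.1577)

for name, res in [("non-live", nonlive), ("malformed", malformed),
                  ("odds at conc 0", baseline_odds)]:
    print(name, tuple(round(v, 3) for v in res))
```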
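PROC LOGISTIC and PROC GENMOD in SAS fit a special form of the "cumulative-logit" model
- Walker-Duncan model (Biometrika, 1967, pp 167-179).
- this model has a "proportional odds" constraint

$$
\begin{aligned}
\log\left(\frac{\pi_{1i}}{1 - \pi_{1i}}\right) &= \alpha_1 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} \\
\log\left(\frac{\pi_{1i} + \pi_{2i}}{1 - \pi_{1i} - \pi_{2i}}\right) &= \alpha_2 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} \\
&\;\;\vdots \\
\log\left(\frac{\pi_{1i} + \cdots + \pi_{J-1,i}}{\pi_{J,i}}\right) &= \alpha_{J-1} + \beta_1 X_{1i} + \cdots + \beta_k X_{ki}
\end{aligned}
$$

For the toxicity data:

$$
\log\left(\frac{\hat\pi_{\text{non-live}}}{\hat\pi_{\text{malform}} + \hat\pi_{\text{normal}}}\right)
= \underset{(.1783)}{-4.5311} + \underset{(.00044)}{.00962}\,(\text{conc})
$$

$$
\log\left(\frac{\hat\pi_{\text{malform}} + \hat\pi_{\text{non-live}}}{\hat\pi_{\text{normal}}}\right)
= \underset{(.1381)}{-3.1533} + \underset{(.00044)}{.00962}\,(\text{conc})
$$

Score test for the "proportional odds" hypothesis:
X² = 267.62 on 1 d.f. (p-value < .0001)

$$
\begin{aligned}
G^2 &= G^2_{\text{Walker-Duncan}} - G^2_{\text{cumulative logit model}} \\
&= (1640.41) - (1029.54 + 431.28) \\
&= 179.6 \text{ on 1 d.f.}
\end{aligned}
$$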
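A short sketch of the proportional-odds fit and the likelihood-ratio comparison follows. The intercepts are taken to be negative (the minus signs are an assumption, consistent with the low non-live rate in the control group), and the function and variable names are illustrative only.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

# Proportional-odds estimates for the toxicity data (signs of the
# intercepts assumed negative; common slope for both cumulative logits).
alpha = (-4.5311, -3.1533)
beta = 0.00962

def fitted_probs(conc):
    """Fitted (non-live, malformed, normal) probabilities at a concentration."""
    c1 = expit(alpha[0] + beta * conc)   # P(non-live)
    c2 = expit(alpha[1] + beta * conc)   # P(non-live or malformed)
    return c1, c2 - c1, 1.0 - c2

for conc in (0, 62.5, 125, 250, 500):
    print(conc, tuple(round(p, 4) for p in fitted_probs(conc)))

# Likelihood-ratio comparison with separately fitted cumulative logits.
g2 = 1640.41 - (1029.54 + 431.28)
p_value = math.erfc(math.sqrt(g2 / 2.0))  # chi-square upper tail, 1 d.f.
print(round(g2, 2), p_value)
```

Because the two cumulative logits share one slope, the fitted curves for the two cumulative probabilities are horizontal shifts of each other on the logit scale, which is the constraint the score and likelihood-ratio tests reject here.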
/* This file is stored as diegdime.sas */

/* Fit multi-response logit models
   to the diegdime exposure data for
   pregnant mice and make plots.
   First enter the counts and
   compute sample proportions. */

data set1;
  input conc x1-x3;
  conc2 = conc*conc;
  p1 = x1/(x1+x2+x3);
  p2 = x2/(x1+x2+x3);
  p3 = x3/(x1+x2+x3);
  cards;
0 15 1 281
62.5 17 0 225
125 22 7 283
250 38 59 202
500 144 132 9
run;

/* Now put the result for each
   response on a separate line */

data set2; set set1;
  y=1; x=x1; p=p1; output;
  y=2; x=x2; p=p2; output;
  y=3; x=x3; p=p3; output;
  keep conc conc2 y x p;
run;

/* Add some more concentration
   levels to be used to obtain
   estimated proportions for plotting */

data setp;
  do conc = 0 to 500 by 5;
    do y = 1 to 3;
      conc2=conc*conc;
      x=0;
      output; end; end;
run;

data set2p; set set2 setp;
run;

/* Fit two logistic regression models
   corresponding to the baseline
   logits. */

/* Non-live (y=1) versus normal (y=3) */
data set3; set set2p;
  if(y=1 or y=3); run;

proc logistic data=set3;
  model y = conc conc2 / itprint
        covb converge=.00001 maxiter=50;
  output out=setp1 xbeta=xbeta1;
  weight x;
run;

/* Malformed (y=2) versus normal (y=3) */
data set4; set set2p;
  if(y=2 or y=3); run;

proc logistic data=set4;
  model y = conc conc2 / itprint
        covb converge=.00001 maxiter=50;
  output out=setp2 xbeta=xbeta2;
  weight x;
run;

/* Keep one row per concentration
   from each output file */

data setp1; set setp1;
  if(y=1); run;

data setp2; set setp2;
  if(y=2); run;

/* Merge the files containing the mles
   and compute mles for probabilities
   of the three responses */

data setp2; merge setp1 setp2;
  p1 = exp(xbeta1)/(1. + exp(xbeta1) + exp(xbeta2));
  p2 = exp(xbeta2)/(1. + exp(xbeta1) + exp(xbeta2));
  p3 = 1.-p1-p2;
  keep conc p1 p2 p3;
run;

proc sort data=setp2;
  by conc; run;

proc print data=setp2;
run;

/* Use this to create graphs in Windows */
goptions cback=white colors=(black)
device=WIN target=ps
rotate=portrait;
/* Use this to produce a postscript plot
in the VINCENT system */
/* goptions cback=white colors=(black)
targetdevice=ps300 rotate=landscape;*/
axis1 label = (h=0.45 in r=0 a=90 f=swiss
'Proportion')
value =(h=0.35 in f=swiss)
length = 5.0 in
order = 0.0 to 1.0 by 0.2;
axis2 label = (h=0.40 in f=swiss
'DIEGdiME Concentration')
value=(h=0.30 in f=swiss)
length = 5.5 in
offset=(.1in)
order = 0 to 500 by 100;
legend1 frame across=1 down=3
value = (h=0.28 in f=swiss
'Dead' 'Malformed' 'Normal')
shape = line(2 in)
offset = (0.5in)
label = (h=0.28in 'Outcomes');
symbol1 v=none i=spline l=1 h=2 w=3;
symbol2 v=none i=spline l=3 h=2 w=3;
symbol3 v=none i=spline l=7 h=2 w=3;
proc gplot data=setp2;
plot (p1 p2 p3)*conc /
overlay vaxis=axis1 haxis=axis2
legend = legend1;
title ls=0.5 in h=0.5 in f=swiss
c=black ' DIEGdiME Data';
run;

/* Fit two logistic regression models
   corresponding to the continuation-ratio
   logits. */

/* Non-live (y=1) versus malformed or normal
   (y=2 and y=3 combined) */
data set3; set set2p;
  if(y>2) then y=2; run;

proc logistic data=set3;
  model y = conc / itprint
        covb converge=.00001 maxiter=50;
  output out=setp1 xbeta=xbeta1;
  weight x;
run;

/* Malformed (y=2) versus normal (y=3) */
data set4; set set2p;
  if(y=2 or y=3); run;

proc logistic data=set4;
  model y = conc / itprint
        covb converge=.00001 maxiter=50;
  output out=setp2 xbeta=xbeta2;
  weight x;
run;

data setp1; set setp1;
  if(y=1); run;

data setp2; set setp2;
  if(y=2); run;

/* Merge the files containing the mles
   and compute mles for probabilities
   of the three responses */

data setp2; merge setp1 setp2;
  p1 = exp(xbeta1)/(1. + exp(xbeta1));
  p2 = exp(xbeta2)/((1. + exp(xbeta1))*(1.+exp(xbeta2)));
  p3 = 1.-p1-p2;
  keep conc p1 p2 p3;
run;

proc sort data=setp2;
  by conc; run;

proc print data=setp2;
run;

axis1 label = (h=0.45 in r=0 a=90 f=swiss
'Proportion')
value =(h=0.35 in f=swiss)
length = 5.0 in
order = 0.0 to 1.0 by 0.2;
axis2 label = (h=0.40 in f=swiss
'DIEGdiME Concentration')
value=(h=0.30 in f=swiss)
length = 5.5 in
offset=(.1in)
order = 0 to 500 by 100;
legend1 frame across=1 down=3
value = (h=0.28 in f=swiss
'Dead' 'Malformed' 'Normal')
shape = line(2 in)
offset = (0.5in)
label = (h=0.28in 'Outcomes');
symbol1 v=none i=spline l=1 h=2 w=3;
symbol2 v=none i=spline l=3 h=2 w=3;
symbol3 v=none i=spline l=7 h=2 w=3;
proc gplot data=setp2;
plot (p1 p2 p3)*conc /
overlay vaxis=axis1 haxis=axis2
legend = legend1;
title ls=0.5 in h=0.5 in f=swiss
c=black ' DIEGdiME Data';
run;
/* Fit the Walker-Duncan model */
proc logistic data=set2;
model y = conc / itprint
covb converge=.00001 maxiter=50;
weight x;
run;