Connections between log-linear and logistic regression models

Any log-linear model can be expressed as a logistic regression model.
- Logistic regression requires specification of a response variable.
- Log-linear models address association among all of the variables.

Several log-linear models may correspond to the same logistic regression model.

Wisconsin Driver Data:

Sex (S):        (i=1) 'Male'      (i=2) 'Female'
Age (A):        (j=1) '16-36'     (j=2) '36-55'     (j=3) 'over 55'
Disease (D):    (k=1) 'Disease'   (k=2) 'Control'
Violation (V):  (ℓ=1) 'None'      (ℓ=2) 'Some'

Conditional independence of Disease status and Violation status given any combination of sex and age categories:

\[
\log(m_{ijk\ell}) = \lambda + \lambda^{S}_{i} + \lambda^{A}_{j} + \lambda^{D}_{k} + \lambda^{V}_{\ell}
  + \lambda^{SA}_{ij} + \lambda^{SD}_{ik} + \lambda^{SV}_{i\ell} + \lambda^{AD}_{jk} + \lambda^{AV}_{j\ell}
  + \lambda^{SAD}_{ijk} + \lambda^{SAV}_{ij\ell}
\]

Conditional log-odds of a traffic violation:

\[
\begin{aligned}
\log\left(\frac{m_{ijk1}}{m_{ijk2}}\right)
 &= \log(m_{ijk1}) - \log(m_{ijk2}) \\
 &= \left[\lambda + \lambda^{S}_{i} + \lambda^{A}_{j} + \lambda^{D}_{k} + \lambda^{V}_{1}
     + \lambda^{SA}_{ij} + \lambda^{SD}_{ik} + \lambda^{SV}_{i1} + \lambda^{AD}_{jk} + \lambda^{AV}_{j1}
     + \lambda^{SAD}_{ijk} + \lambda^{SAV}_{ij1}\right] \\
 &\quad - \left[\lambda + \lambda^{S}_{i} + \lambda^{A}_{j} + \lambda^{D}_{k} + \lambda^{V}_{2}
     + \lambda^{SA}_{ij} + \lambda^{SD}_{ik} + \lambda^{SV}_{i2} + \lambda^{AD}_{jk} + \lambda^{AV}_{j2}
     + \lambda^{SAD}_{ijk} + \lambda^{SAV}_{ij2}\right] \\
 &= \left(\lambda^{V}_{1} - \lambda^{V}_{2}\right) + \left(\lambda^{SV}_{i1} - \lambda^{SV}_{i2}\right)
   + \left(\lambda^{AV}_{j1} - \lambda^{AV}_{j2}\right) + \left(\lambda^{SAV}_{ij1} - \lambda^{SAV}_{ij2}\right)
\end{aligned}
\]

Every term that does not involve V cancels, so the log-odds depend on sex and age but not on disease status.

Polychotomous Logistic Regression

There are J > 2 response categories:

\[
\pi_{ji} = \Pr\left\{\text{response is in the } j\text{-th category} \mid X_{1i}, X_{2i}, \ldots, X_{ki}\right\}
\quad \text{for } j = 1, 2, \ldots, J
\]

Note that

\[
\sum_{j=1}^{J} \pi_{ji} = 1 \quad \text{for each } i = 1, 2, \ldots, n.
\]

There are several ways to construct "simultaneous" logistic regression models for such data.

Log-odds with respect to a baseline category (e.g., the last category):

\[
\begin{aligned}
\log\left(\frac{\pi_{1i}}{\pi_{Ji}}\right) &= \beta_{01} + \beta_{11}X_{1i} + \beta_{21}X_{2i} + \cdots + \beta_{k1}X_{ki} \\
\log\left(\frac{\pi_{2i}}{\pi_{Ji}}\right) &= \beta_{02} + \beta_{12}X_{1i} + \beta_{22}X_{2i} + \cdots + \beta_{k2}X_{ki} \\
 &\;\;\vdots \\
\log\left(\frac{\pi_{J-1,i}}{\pi_{Ji}}\right) &= \beta_{0,J-1} + \beta_{1,J-1}X_{1i} + \cdots + \beta_{k,J-1}X_{ki}
\end{aligned}
\]

Then

\[
\pi_{ji} = \frac{\exp(\beta_{0j} + \beta_{1j}X_{1i} + \cdots + \beta_{kj}X_{ki})}
                {1 + \sum_{\ell=1}^{J-1} \exp(\beta_{0\ell} + \beta_{1\ell}X_{1i} + \cdots + \beta_{k\ell}X_{ki})}
\quad \text{for } j = 1, 2, \ldots, J-1,
\]

and

\[
\pi_{Ji} = \frac{1}{1 + \sum_{j=1}^{J-1} \exp(\beta_{0j} + \beta_{1j}X_{1i} + \cdots + \beta_{kj}X_{ki})}
\]

- PROC CATMOD in SAS uses this when the LOGISTIC response is applied to a variable with more than two categories.
- It can be used for nominal response variables.

Other log-odds:

\[
\begin{aligned}
\log\left(\frac{\pi_{ji}}{\pi_{\ell i}}\right)
 &= \log\left(\frac{\pi_{ji}/\pi_{Ji}}{\pi_{\ell i}/\pi_{Ji}}\right)
  = \log\left(\frac{\pi_{ji}}{\pi_{Ji}}\right) - \log\left(\frac{\pi_{\ell i}}{\pi_{Ji}}\right) \\
 &= \left[\beta_{0j} + \beta_{1j}X_{1i} + \cdots + \beta_{kj}X_{ki}\right]
  - \left[\beta_{0\ell} + \beta_{1\ell}X_{1i} + \cdots + \beta_{k\ell}X_{ki}\right] \\
 &= (\beta_{0j} - \beta_{0\ell}) + (\beta_{1j} - \beta_{1\ell})X_{1i} + \cdots + (\beta_{kj} - \beta_{k\ell})X_{ki}
\end{aligned}
\]

Adjacent-categories logits:

\[
\begin{aligned}
\log\left(\frac{\pi_{1i}}{\pi_{2i}}\right) &= \beta_{01} + \beta_{11}X_{1i} + \cdots + \beta_{k1}X_{ki} \\
\log\left(\frac{\pi_{2i}}{\pi_{3i}}\right) &= \beta_{02} + \beta_{12}X_{1i} + \cdots + \beta_{k2}X_{ki} \\
 &\;\;\vdots \\
\log\left(\frac{\pi_{J-1,i}}{\pi_{Ji}}\right) &= \beta_{0,J-1} + \beta_{1,J-1}X_{1i} + \cdots + \beta_{k,J-1}X_{ki}
\end{aligned}
\]

- equivalent to the "baseline-category" logistic model
- sometimes used when the polychotomous response is an ordinal variable

Cumulative logits:

\[
\begin{aligned}
\log\left(\frac{\pi_{1i}}{\pi_{2i} + \pi_{3i} + \cdots + \pi_{Ji}}\right) &= \beta_{01} + \beta_{11}X_{1i} + \cdots + \beta_{k1}X_{ki} \\
\log\left(\frac{\pi_{1i} + \pi_{2i}}{\pi_{3i} + \cdots + \pi_{Ji}}\right) &= \beta_{02} + \beta_{12}X_{1i} + \cdots + \beta_{k2}X_{ki} \\
 &\;\;\vdots \\
\log\left(\frac{\pi_{1i} + \pi_{2i} + \cdots + \pi_{J-1,i}}{\pi_{Ji}}\right) &= \beta_{0,J-1} + \beta_{1,J-1}X_{1i} + \cdots + \beta_{k,J-1}X_{ki}
\end{aligned}
\]

- used for ordinal response variables
- Does not yield the same estimates of {π_ji} as the "baseline-category" logit model. For example, log(π_1i/π_2i) is generally not a linear function of the parameters.

Example: (3 response categories)

\[
\begin{aligned}
\log\left(\frac{\pi_{1i}}{\pi_{2i} + \pi_{3i}}\right) &= \beta_{01} + \beta_{11}X_{1i} + \beta_{21}X_{2i} = X_i'\beta_1 \\
\log\left(\frac{\pi_{1i} + \pi_{2i}}{\pi_{3i}}\right) &= \beta_{02} + \beta_{12}X_{1i} + \beta_{22}X_{2i} = X_i'\beta_2
\end{aligned}
\]

Then

\[
\begin{aligned}
\pi_{1i} &= (\pi_{2i} + \pi_{3i})\, e^{X_i'\beta_1} \\
\pi_{1i} + \pi_{2i} &= \pi_{3i}\, e^{X_i'\beta_2} \\
\pi_{1i} + \pi_{2i} + \pi_{3i} &= 1
\end{aligned}
\]

and

\[
\pi_{1i} = \frac{e^{X_i'\beta_1}}{1 + e^{X_i'\beta_1}}, \qquad
\pi_{2i} = \frac{e^{X_i'\beta_2} - e^{X_i'\beta_1}}{(1 + e^{X_i'\beta_1})(1 + e^{X_i'\beta_2})}, \qquad
\pi_{3i} = \frac{1}{1 + e^{X_i'\beta_2}}
\]

Consequently,

\[
\log\left(\frac{\pi_{1i}}{\pi_{2i}}\right)
 = \log\left[\frac{e^{X_i'\beta_1}\left(1 + e^{X_i'\beta_2}\right)}{e^{X_i'\beta_2} - e^{X_i'\beta_1}}\right]
\]

Continuation-ratio logits:

Fit J-1 separate logistic regression models:

\[
\begin{aligned}
\log\left(\frac{\pi_{1i}}{\pi_{2i} + \pi_{3i} + \cdots + \pi_{Ji}}\right) &= \beta_{01} + \beta_{11}X_{1i} + \cdots + \beta_{k1}X_{ki} \\
\log\left(\frac{\pi_{2i}}{\pi_{3i} + \pi_{4i} + \cdots + \pi_{Ji}}\right) &= \beta_{02} + \beta_{12}X_{1i} + \cdots + \beta_{k2}X_{ki} \\
 &\;\;\vdots \\
\log\left(\frac{\pi_{J-1,i}}{\pi_{Ji}}\right) &= \beta_{0,J-1} + \beta_{1,J-1}X_{1i} + \cdots + \beta_{k,J-1}X_{ki}
\end{aligned}
\]

- used for ordinal response variables
- \( \log\left( \pi_{ji} / (\pi_{j+1,i} + \cdots + \pi_{J,i}) \right) \) is the conditional log-odds that a response falls in the j-th category given that it does not fall in a category preceding the j-th category

Example: Toxicity study (Price, 1987) (from Agresti, pp 320-321)

Administered diethylene-glycoldimethylether (DIEGdiME) to pregnant mice.

Each mouse was exposed to one of 5 concentrations for exactly 10 days early in the pregnancy.
- level 0 is a control group
- each fetus was classified as (1) non-live, (2) malformed, (3) normal

Concentration        Response rate (%)            Number
(mg/kg/day)      non-live  malformed   normal     exposed
______________________________________________________
    0              5.05      0.34      94.61        297
   62.5            7.02      0         92.98        242
  125              7.05      2.24      90.71        312
  250             12.71     19.73      67.56        299
  500             50.53     46.32       3.16        285
______________________________________________________

In the fitted models below, standard errors appear in parentheses beneath the estimates.

Baseline-logit model (linear in concentration):

\[
\begin{aligned}
\log\left(\frac{\hat\pi_{\text{non-live}}}{\hat\pi_{\text{normal}}}\right)
 &= \underset{(.191)}{-3.969} + \underset{(.0007)}{.0119}\,(\text{conc.}) \\
\log\left(\frac{\hat\pi_{\text{malformed}}}{\hat\pi_{\text{normal}}}\right)
 &= \underset{(.249)}{-4.952} + \underset{(.0008)}{.014}\,(\text{conc.})
\end{aligned}
\]

Lack-of-fit test: G² = 57.50 with 6 d.f. (p-value < .0001)

Baseline-logit model (quadratic in concentration):

\[
\begin{aligned}
\log\left(\frac{\hat\pi_{\text{non-live}}}{\hat\pi_{\text{normal}}}\right)
 &= \underset{(.2009)}{-2.7824} - \underset{(.00208)}{.00168}\,(\text{conc.}) + \underset{(.0000042)}{.000025}\,(\text{conc.})^2 \\
\log\left(\frac{\hat\pi_{\text{malformed}}}{\hat\pi_{\text{normal}}}\right)
 &= \underset{(.7724)}{-6.7156} + \underset{(.00512)}{.0252}\,(\text{conc.}) - \underset{(.0000077)}{.00001}\,(\text{conc.})^2
\end{aligned}
\]

Lack-of-fit test: G² = 4.51 with 4 d.f. (p-value = .341)

Continuation-ratio logit model:

\[
\log\left(\frac{\hat\pi_{\text{non-live}}}{\hat\pi_{\text{malformed}} + \hat\pi_{\text{normal}}}\right)
 = \underset{(.1577)}{-3.2479} + \underset{(.000435)}{.00639}\,(\text{conc.})
\]

G²_1 = 5.78 on 3 d.f. (p-value = .123)

\[
\log\left(\frac{\hat\pi_{\text{malformed}}}{\hat\pi_{\text{normal}}}\right)
 = \underset{(.3322)}{-5.7019} + \underset{(.00123)}{.0174}\,(\text{conc.})
\]

G²_2 = 6.06 on 3 d.f. (p-value = .109)

Overall lack-of-fit test: G² = G²_1 + G²_2 = 11.84 on 6 d.f. (p-value = 0.066)

Interpretation (intervals in parentheses):
- exp(.00639 × 100) = 1.9: the odds of "non-live" increase by a factor of 1.9 for each 100 mg/kg/day increase in concentration (1.73, 2.07).
- exp(.0174 × 100) = 5.7: given that a fetus survives, the odds of malformation increase by a factor of 5.7 for every 100 mg/kg/day increase in concentration (4.48, 7.25).
- exp(-3.2479) = .039: the odds that a fetus fails to survive at zero concentration (.028, .053).

PROC LOGISTIC and PROC GENMOD in SAS fit a special form of the "cumulative-logit" model:
- Walker-Duncan model (Biometrika, 1967 pp 167-179)
- this model has a "proportional odds" constraint: the intercepts differ across the cumulative logits, but the slopes are common

\[
\begin{aligned}
\log\left(\frac{\pi_{1i}}{1 - \pi_{1i}}\right) &= \alpha_1 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} \\
\log\left(\frac{\pi_{1i} + \pi_{2i}}{1 - \pi_{1i} - \pi_{2i}}\right) &= \alpha_2 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} \\
 &\;\;\vdots \\
\log\left(\frac{\pi_{1i} + \cdots + \pi_{J-1,i}}{\pi_{Ji}}\right) &= \alpha_{J-1} + \beta_1 X_{1i} + \cdots + \beta_k X_{ki}
\end{aligned}
\]

For the toxicity data:

\[
\begin{aligned}
\log\left(\frac{\hat\pi_{\text{non-live}}}{\hat\pi_{\text{malform}} + \hat\pi_{\text{normal}}}\right)
 &= \underset{(.1783)}{-4.5311} + \underset{(.00044)}{.00962}\,(\text{conc}) \\
\log\left(\frac{\hat\pi_{\text{malform}} + \hat\pi_{\text{non-live}}}{\hat\pi_{\text{normal}}}\right)
 &= \underset{(.1381)}{-3.1533} + \underset{(.00044)}{.00962}\,(\text{conc})
\end{aligned}
\]

Score test for the "proportional odds" hypothesis: X² = 267.62 on 1 d.f. (p-value < .0001)

\[
G^2 = G^2_{\text{Walker-Duncan}} - G^2_{\text{cumulative logit model}}
    = (1640.41) - (1029.54 + 431.28) = 179.6 \text{ on 1 d.f.}
\]

/* This file is stored as diegdime.sas */

/* Fit multi-response logit models to the diegdime exposure
   data for pregnant mice and make plots. First enter the
   counts and compute sample proportions. */

data set1;
  input conc x1-x3;
  conc2 = conc*conc;
  p1 = x1/(x1+x2+x3);
  p2 = x2/(x1+x2+x3);
  p3 = x3/(x1+x2+x3);
  cards;
0 15 1 281
62.5 17 0 225
125 22 7 283
250 38 59 202
500 144 132 9
run;

/* Now put the result for each response on a separate line */

data set2; set set1;
  y=1; x=x1; p=p1; output;
  y=2; x=x2; p=p2; output;
  y=3; x=x3; p=p3; output;
  keep conc conc2 y x p;
run;

/* Add some more concentration levels to be used to obtain
   estimated proportions for plotting. These rows have x=0,
   so they carry zero weight in the fits. */

data setp;
  do conc = 0 to 500 by 5;
    do y = 1 to 3;
      conc2=conc*conc; x=0; output;
    end;
  end;
run;

data set2p; set set2 setp; run;

/* Fit two logistic regression models corresponding to the
   baseline logits. */

data set3; set set2p;
  if(y=1 or y=3);
run;

proc logistic data=set3;
  model y = conc conc2 / itprint covb converge=.00001 maxiter=50;
  output out=setp1 xbeta=xbeta1;
  weight x;
run;

data set4; set set2p;
  if(y=2 or y=3);
run;

proc logistic data=set4;
  model y = conc conc2 / itprint covb converge=.00001 maxiter=50;
  output out=setp2 xbeta=xbeta2;
  weight x;
run;

data setp1; set setp1; if(y=1); run;

data setp2; set setp2; if(y=2); run;

/* Merge the files containing the mles and compute mles
   for probabilities of the three responses */

data setp2; merge setp1 setp2;
  p1 = exp(xbeta1)/(1. + exp(xbeta1) + exp(xbeta2));
  p2 = exp(xbeta2)/(1. + exp(xbeta1) + exp(xbeta2));
  p3 = 1.-p1-p2;
  keep conc p1 p2 p3;
run;

proc sort data=setp2; by conc; run;

proc print data=setp2; run;

/* Use this to create graphs in Windows */

goptions cback=white colors=(black)
         device=WIN target=ps rotate=portrait;

/* Use this to produce a postscript plot in the VINCENT system */

/* goptions cback=white colors=(black)
            targetdevice=ps300 rotate=landscape; */

axis1 label = (h=0.45 in r=0 a=90 f=swiss 'Proportion')
      value = (h=0.35 in f=swiss)
      length = 5.0 in
      order = 0.0 to 1.0 by 0.2;

axis2 label = (h=0.40 in f=swiss 'DIEGdiME Concentration')
      value = (h=0.30 in f=swiss)
      length = 5.5 in
      offset = (.1in)
      order = 0 to 500 by 100;

legend1 frame across=1 down=3
        value = (h=0.28 in f=swiss 'Dead' 'Malformed' 'Normal')
        shape = line(2 in)
        offset = (0.5in)
        label = (h=0.28in 'Outcomes');

symbol1 v=none i=spline l=1 h=2 w=3;
symbol2 v=none i=spline l=3 h=2 w=3;
symbol3 v=none i=spline l=7 h=2 w=3;

proc gplot data=setp2;
  plot (p1 p2 p3)*conc / overlay vaxis=axis1 haxis=axis2
                         legend = legend1;
  title ls=0.5 in h=0.5 in f=swiss c=black ' DIEGdiME Data';
run;

/* Fit two logistic regression models corresponding to the
   continuation-ratio logits. */

/* First logit: non-live (y=1) versus malformed or normal,
   so recode y=3 as y=2 */

data set3; set set2p;
  if(y>2) then y=2;
run;

proc logistic data=set3;
  model y = conc / itprint covb converge=.00001 maxiter=50;
  output out=setp1 xbeta=xbeta1;
  weight x;
run;

/* Second logit: malformed (y=2) versus normal (y=3) */

data set4; set set2p;
  if(y=2 or y=3);
run;

proc logistic data=set4;
  model y = conc / itprint covb converge=.00001 maxiter=50;
  output out=setp2 xbeta=xbeta2;
  weight x;
run;

data setp1; set setp1; if(y=1); run;

data setp2; set setp2; if(y=2); run;

/* Merge the files containing the mles and compute mles
   for probabilities of the three responses */

data setp2; merge setp1 setp2;
  p1 = exp(xbeta1)/(1. + exp(xbeta1));
  p2 = exp(xbeta2)/((1. + exp(xbeta1))*(1.+exp(xbeta2)));
  p3 = 1.-p1-p2;
  keep conc p1 p2 p3;
run;

proc sort data=setp2; by conc; run;

proc print data=setp2; run;

/* The axis1, axis2, legend1, and symbol1-symbol3 definitions
   given above are global statements and remain in effect,
   so they are not repeated here. */

proc gplot data=setp2;
  plot (p1 p2 p3)*conc / overlay vaxis=axis1 haxis=axis2
                         legend = legend1;
  title ls=0.5 in h=0.5 in f=swiss c=black ' DIEGdiME Data';
run;

/* Fit the Walker-Duncan model */

proc logistic data=set2;
  model y = conc / itprint covb converge=.00001 maxiter=50;
  weight x;
run;
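As a hand check on the continuation-ratio fit, the fitted logits can be converted to response probabilities with the same formulas the merge step uses for p1, p2, and p3. The sketch below is illustrative only. It assumes the continuation-ratio estimates reported for the toxicity data (intercepts -3.2479 and -5.7019, slopes .00639 and .0174, slope standard error .00123), a Wald interval with multiplier 1.96, and a concentration of 250 mg/kg/day chosen here purely as an example. The symbols \(\hat\eta_1\) and \(\hat\eta_2\) are introduced here for the two linear predictors (xbeta1 and xbeta2 in the program). All numerical values are rounded.

```latex
\begin{align*}
\hat\eta_1 &= -3.2479 + 0.00639(250) = -1.6504,
 & \hat\pi_{\text{non-live}} &= \frac{e^{\hat\eta_1}}{1 + e^{\hat\eta_1}} \approx 0.161 \\
\hat\eta_2 &= -5.7019 + 0.0174(250) = -1.3519,
 & \hat\pi_{\text{malformed}} &= \frac{e^{\hat\eta_2}}{(1 + e^{\hat\eta_1})(1 + e^{\hat\eta_2})}
   \approx 0.839 \times 0.206 \approx 0.172 \\
 & & \hat\pi_{\text{normal}} &= \frac{1}{(1 + e^{\hat\eta_1})(1 + e^{\hat\eta_2})}
   \approx 0.839 \times 0.794 \approx 0.666
\end{align*}

% Wald interval for the odds multiplier per 100 mg/kg/day in the second logit
\begin{align*}
\exp\{100\,(0.0174 \pm 1.96 \times 0.00123)\} = \exp(1.74 \pm 0.241) \approx (4.48,\ 7.25)
\end{align*}
```

At 250 mg/kg/day the observed response rates were 12.71%, 19.73%, and 67.56%, so the fitted values are in the same neighborhood. The interval agrees with the (4.48, 7.25) quoted for the malformation odds.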