Association Models Stat 557 Heike Hofmann Outline • Uniform Association • Row/Column Association • Fitting Association Models in R • Some Examples Association Models • Framework of loglinear models, i.e. we want to model counts • For ordinal variable X we have scores u1≤ u2 ≤... ≤ up (most commonly ui=i) similarly, for ordinal variable Y assume scores vj • describe interaction term using scores • Note: main effects in X are still modeled in terms of the categories Association Models • Uniform association λXY ij = βui vj • λXY ij XY λ = βijj ui = βj ui Z Y X + λ + λ log m = λ + λ ijk i jmodel k Row/Column effects X Y Z XY XY log mXY = λ + λ + λ + λ + λ XY ijk ij λ i = βijvj k λij = βiji vj λij = βj ui Y Z XZ YZ + λ + λ + λ log mijk = λ + λX + λ j i k ik jk • XY XY XY Xλ Y = βu Z v λXZ YβZ v(not XYlinear) Column & Row effects model = βu v + λ + λ + λ + λ log mijk =λλij+ λ= + λ i j i j ij i ij ij j k ikij jk Y XY Z XZ YZ XY XY Z log mijk = λ + λX + λ + λ + λ + λ + λ + λ βj i jλij k= βiik ij jk ijk XY Y Z X log mijk = λ + λi + λj + λλ k ij = βui vj Example: Mental Health • SES of parents and mental health of 1660 students recorded. • SES is measured from A=high to F= low, • Mental Health is rated in four levels from well to poor Mental Health prodplot(data=mh, count~status+ses, c("vspine","hspine"))+aes(fill=status) status impaired moderate mild well A B C D E F Mental Health • Model of independence obviously does not fit • But: there is some pattern discernible: as socio-economic status improves, mental health improves steadily • Idea: make use of ordinal structure in variables Prepping the data > summary(mh) ses status A:4 impaired:6 B:4 moderate:6 C:4 mild :6 D:4 well :6 E:4 F:4 count Min. : 21.00 1st Qu.: 54.00 Median : 64.50 Mean : 69.17 3rd Qu.: 82.00 Max. :141.00 • Make two versions of the same variable: one as a factor, one as the numeric scores mh$sesn <- as.numeric(mh$ses) mh$statusn <- as.numeric(mh$status) factors scores glm(formula = count ~ ses + status + sesn:statusn, family = poisson(link = log), data = mh) Deviance Residuals: Min 1Q Median -1.2663 -0.3285 0.2025 3Q 0.3912 Max 1.0820 Mental Health Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 3.83169 0.09545 40.143 < 2e-16 *** sesB 0.17682 0.09881 1.790 0.07353 . sesC 0.57035 0.11955 4.771 1.84e-06 *** sesD 1.08802 0.14528 7.489 6.94e-14 *** sesE 0.93471 0.17854 5.235 1.65e-07 *** sesF 0.94357 0.20883 4.518 6.23e-06 *** statusmoderate 0.26461 0.09399 2.815 0.00487 ** statusmild 1.08882 0.12912 8.432 < 2e-16 *** statuswell 0.70991 0.17460 4.066 4.79e-05 *** sesn:statusn -0.09069 0.01501 -6.043 1.51e-09 *** --Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 uniform association (Dispersion parameter for poisson family taken to be 1) Null deviance: 217.3998 Residual deviance: 9.8951 AIC: 174.07 on 23 on 14 degrees of freedom degrees of freedom Number of Fisher Scoring iterations: 4 Raw vs Fitted status status status A B C D E F Combine ses A and B? impaired impaired impaired moderate moderate moderate mild mild well well well AA AB B B CC D D E E FF glm(formula = count ~ ses + status + sesn:status, family = poisson(link = log), data = mh) Deviance Residuals: Min 1Q -0.91405 -0.32206 Median 0.09134 3Q 0.28206 Mental Health Max 1.01943 Coefficients: (1 not defined because of singularities) Estimate Std. Error z value Pr(>|z|) (Intercept) 3.40215 0.15025 22.643 < 2e-16 *** sesB -0.20810 0.09325 -2.232 0.025634 * sesC -0.20056 0.10388 -1.931 0.053526 . sesD -0.06977 0.12221 -0.571 0.568039 sesE -0.61074 0.15445 -3.954 7.68e-05 *** sesF -0.99023 0.18850 -5.253 1.49e-07 *** statusmoderate 0.45446 0.18464 2.461 0.013844 * statusmild 1.02664 0.16609 6.181 6.36e-10 *** statuswell 0.82565 0.18492 4.465 8.01e-06 *** statusimpaired:sesn 0.30682 0.04894 6.270 3.62e-10 *** statusmoderate:sesn 0.16343 0.04888 3.343 0.000827 *** statusmild:sesn 0.14509 0.04423 3.281 0.001036 ** statuswell:sesn NA NA NA NA --Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 217.3998 Residual deviance: 6.2808 AIC: 174.45 on 23 on 12 degrees of freedom degrees of freedom Number of Fisher Scoring iterations: 4 row association Row effects model well status mild impaired status Difference to the uniform association model barely visible moderate mild well moderate Goodness of fit test impaired Analysis of Deviance Table Model 1: Model 2: Resid. 1 2 count ~ ses + status + sesn:statusn count ~ ses + status + sesn:status Df Resid. Dev Df Deviance Pr(>Chi) 14 9.8951 12 6.2808 2 3.6144 0.1641 A B C ses D E F Conclusion: stick with the uniform association model, maybe collapse levels A & B into one Party Affiliation Gender Female:6 Male :6 Race Black:6 White:6 Party Democrat :4 Independent:4 Republican :4 Number Min. : 4.00 1st Qu.: 14.25 Median : 91.50 Mean : 83.42 3rd Qu.:130.50 Max. :176.00 Party Female Male Party Democrat Democrat Independent Independent Republican Republican Black White Introduce score variables Any binary variable is automatically also an ordinal variable: # two uniform associations? party$Partyn <- as.numeric(party$Party) # 'order' in gender and race doesn't matter, since it is binary: party$Gendern <- as.numeric(party$Gender) party$Racen <- as.numeric(party$Race) party.uniform <- glm(Number~Party+Race+Gender+Racen:Partyn+Gendern:Partyn, family=poisson(link=log), data=party) Party Affiliation glm(formula = Number ~ Party + Race + Gender + Racen:Partyn + Gendern:Partyn, family = poisson(link = log), data = party) Deviance Residuals: 1 2 0.148681 -0.014165 8 9 -0.165061 -0.163533 3 -0.117567 10 -0.380321 4 -0.008349 11 0.135880 5 0.107635 12 0.449677 6 0.076275 7 -0.111428 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 2.600823 0.145496 17.876 < 2e-16 *** PartyIndependent -2.862458 0.304212 -9.409 < 2e-16 *** PartyRepublican -5.410051 0.618489 -8.747 < 2e-16 *** RaceWhite -0.005983 0.228950 -0.026 0.979151 GenderMale -0.576295 0.158528 -3.635 0.000278 *** Racen:Partyn 1.135962 0.147876 7.682 1.57e-14 *** Partyn:Gendern 0.289683 0.075876 3.818 0.000135 *** --Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 711.35267 Residual deviance: 0.48532 AIC: 82.555 on 11 on 5 degrees of freedom degrees of freedom Number of Fisher Scoring iterations: 3 Uniform association model is almost perfect! Party & Ideology Self-proclaimed ideology and Party affiliation Affil Republican Independent Democrat Conservative Moderate Liberal Models Analysis of Deviance Table Model 1: count ~ Affil + Ideology Model 2: count ~ Affil + Ideology + Affiln:Ideon Model 3: count ~ Affil + Ideology + Affil:Ideon Resid. Df Resid. Dev Df Deviance Pr(>Chi) 1 4 105.662 2 3 17.618 1 88.044 < 2.2e-16 *** 3 2 2.815 1 14.803 0.0001194 *** --Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 • Main effect obviously not good (not independent) • Uniform fits better, but is not a good fit yet • Row effects for Affiliation almost perfect fit Party & Ideology Raw vs Fit Affil Conservative Moderate Liberal Affil Republican Republican Independent Independent Democrat Democrat Conservative Moderate Liberal Next time: Poisson rate regression