Slide 1: A. Analysis of count data
Introduction to log-linear models

Slide 2: Log-linear analysis
• Contingency-table analysis
• Categorical data analysis
• Discrete multivariate analysis (Bishop, Fienberg and Holland, 1975)
• Analysis of cross-classified data
• Multivariate analysis of qualitative data (Goodman, 1978)
• Count data analysis

Slide 3: Contrast coding. Log-linear models for two-way tables
Saturated log-linear model:
$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$
• $\mu$: overall effect (level)
• $\mu_i^A$, $\mu_j^B$: main effects (marginal frequencies)
• $\mu_{ij}^{AB}$: interaction effect
In the case of a 2 x 2 table there are 4 observations and 9 parameters, so normalisation constraints are needed.

Slide 4: Survey: leaving parental home in the Netherlands
Number leaving parental home, by age and sex, 1961 birth cohort

| Sex | Age <20 | Age >=20 | Total | Censored | Total |
|---|---|---|---|---|---|
| Female | 135 | 143 | 278 | 13 | 291 |
| Male | 74 | 178 | 252 | 40 | 292 |
| Total | 209 | 321 | 530 | 53 | 583 |

The survey (September 1987 to February 1988):
• Sample of 583 young adults born in 1961
• 530 left home before the survey
• 53 censored cases

Slide 5: Leaving home. Descriptive statistics
• Counts
• Percentages

Row percentages (age at leaving home within each sex):

| Sex | Age <20 | Age >=20 | Total |
|---|---|---|---|
| Female | 48.6 | 51.4 | 100.0 |
| Male | 29.4 | 70.6 | 100.0 |
| Total | 39.4 | 60.6 | 100.0 |

Column percentages (sex within each age group):

| Sex | Age <20 | Age >=20 | Total |
|---|---|---|---|
| Female | 64.6 | 44.5 | 52.5 |
| Male | 35.4 | 55.5 | 47.5 |
| Total | 100.0 | 100.0 | 100.0 |

• Odds of leaving home early rather than late

| Sex | Female | Male | Total |
|---|---|---|---|
| Odds | 0.9441 | 0.4157 | 0.6511 |

Odds ratio (reference category: males): 0.9441 / 0.4157 = 2.271

Slide 6: Leaving home. Log-linear models for two-way tables (4 models)
Model 1: Null model or overall effect model
All categories are equiprobable (an observation is equally likely to fall into any cell).
$$\ln \lambda_{ij} = \mu \quad \text{for all } i \text{ and } j$$
$\mu = 4.887$, s.e. 0.0434
$\exp(4.887) = 132.5 = 530/4$
$\lambda_{ij}$ is the expected count (frequency) in cell $(i,j)$: category $i$ of variable A (row) and category $j$ of variable B (column).

Slide 7: Leaving home
$$\mu = \frac{1}{4}\sum_{ij} \ln \lambda_{ij} = \ln \sqrt[4]{\prod_{ij} \lambda_{ij}}$$
$$\mathrm{Var}[\mu] = \left(\frac{1}{4}\right)^2 \left[\frac{1}{132.50} + \frac{1}{132.50} + \frac{1}{132.50} + \frac{1}{132.50}\right] = \frac{1}{530}, \quad \text{so s.e.} = 0.0434$$
$$\mathrm{Var}[\ln \lambda_{ij}] \approx \frac{1}{\lambda_{ij}}$$
where $\lambda_{ij}$ is a cell frequency generated by a Poisson process, and $\mathrm{Var}[aX] = a^2\,\mathrm{Var}[X]$ where $a$ is a constant (e.g. Fingleton, 1984, p.
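29).
In general:
$$\mu = \frac{1}{4}\sum_{ij} \ln \lambda_{ij}, \qquad \mathrm{Var}[\mu] = \mathrm{Var}\left[\frac{1}{4}\sum_{ij} \ln \lambda_{ij}\right] = \left(\frac{1}{4}\right)^2 \sum_{ij} \frac{1}{\lambda_{ij}}$$

Slide 8: Leaving home. Log-linear models for two-way tables
Model 2: B null model
Categories of variable B (sex) are equiprobable within levels of variable A (age).
$$\ln \lambda_{ij} = \mu + \mu_i^A \quad \text{for all } j$$
GLIM estimates:

| Parameter | Symbol | Estimate | s.e. | Exp(parameter) |
|---|---|---|---|---|
| Overall effect | $\mu$ | 4.649 | 0.06914 | 104.5 |
| TIME(1) | $\mu_1^A$ | 0.0000 | . | . |
| TIME(2) | $\mu_2^A$ | 0.4291 | 0.08886 | 1.536 |

Slide 9: Leaving home. Log-linear models for two-way tables
Model 3: A null model
Categories of variable A (age) are equiprobable within levels of variable B (sex).
$$\ln \lambda_{ij} = \mu + \mu_j^B \quad \text{for all } i$$
SPSS estimates as printed on the slide (the parameters are labelled TIME and $\mu^A$ in the source):

| Parameter | Symbol | Estimate | s.e. | Exp(parameter) |
|---|---|---|---|---|
| Overall effect | $\mu$ | 5.773 | 0.0558 | 321.5 |
| TIME(1) | $\mu_1^A$ | -0.4283 | 0.0888 | 0.6516 |
| TIME(2) | $\mu_2^A$ | 0.0000 | . | . |

Slide 10: Leaving home. Log-linear models for two-way tables
Model 4: independence model (unsaturated model)
Categories of variable B (sex) are not equiprobable, but the probability is independent of the level of variable A (time).
$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B, \qquad \lambda_{ij} = \exp[\mu + \mu_i^A + \mu_j^B]$$
GLIM estimates:

| Parameter | Symbol | Estimate | s.e. | Exp(parameter) |
|---|---|---|---|---|
| Overall effect | $\mu$ | 4.697 | 0.0806 | 109.62 |
| TIME(2) | $\mu_2^A$ | 0.429 | 0.0889 | 1.536 |
| SEX(2) | $\mu_2^B$ | -0.098 | 0.0870 | 0.906 |

Slide 11: Leaving home. Log-linear model: predictions
• Females leaving home early: 109.62
• Females leaving home late: 109.62 * 1.536 = 168.37
• Males leaving home early: 109.62 * 0.906 = 99.37
• Males leaving home late: 109.62 * 1.536 * 0.906 = 152.63

Slide 12: Leaving home. Model 4, SPSS estimates

| No. | Parameter | Symbol | Estimate | SE |
|---|---|---|---|---|
| 1 | Overall effect | $\mu$ | 5.0280 | 0.0721 |
| 2 | Time(1) | $\mu_1^A$ | -0.4291 | 0.0889 |
| 3 | Time(2) | $\mu_2^A$ | 0.0000 | . |
| 4 | Sex(1) | $\mu_1^B$ | 0.0982 | 0.0870 |
| 5 | Sex(2) | $\mu_2^B$ | 0.0000 | . |

Slide 13: Leaving home. Log-linear models for two-way tables
Model 5: saturated model
The distribution over the categories of variable B (sex) depends on the level of variable A (time).
$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$
GLIM estimates:

| Parameter | Symbol | Estimate | s.e. |
|---|---|---|---|
| Overall effect | $\mu$ | 4.905 | 0.08607 |
| TIME(2) | $\mu_2^A$ | 0.05757 | 0.1200 |
| SEX(2) | $\mu_2^B$ | -0.6012 | 0.1446 |
| TIME(2).SEX(2) | $\mu_{22}^{AB}$ | 0.8201 | 0.1831 |

Slide 14: Leaving home. Model 5, SPSS estimates

| No. | Parameter | Symbol | Estimate | SE |
|---|---|---|---|---|
| 1 | Overall effect | $\mu$ | 5.1846 | 0.0748 |
| 2 | Time(1) | $\mu_1^A$ | -0.8738 | 0.1379 |
| 3 | Time(2) | $\mu_2^A$ | 0.0000 | . |
| 4 | Sex(1) | $\mu_1^B$ | -0.2183 | 0.1121 |
| 5 | Sex(2) | $\mu_2^B$ | 0.0000 | . |
| 6 | Time(1) * Sex(1) | $\mu_{11}^{AB}$ | 0.8164 | 0.1827 |
| 7 | Time(1) * Sex(2) | $\mu_{12}^{AB}$ | 0.0000 | . |
| 8 | Time(2) * Sex(1) | $\mu_{21}^{AB}$ | 0.0000 | . |
| 9 | Time(2) * Sex(2) | $\mu_{22}^{AB}$ | 0.0000 | . |

Slide 15: Leaving home. Log-linear model: predictions
Expected frequencies

| Cell | | Observed | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 |
|---|---|---|---|---|---|---|---|
| Fem_<20 | F11 | 135 | 132.50 | 104.50 | 139.00 | 109.63 | 135.00 |
| Mal_<20 | F12 | 74 | 132.50 | 104.50 | 126.00 | 99.37 | 74.00 |
| Fem_>=20 | F21 | 143 | 132.50 | 160.50 | 139.00 | 168.37 | 143.00 |
| Mal_>=20 | F22 | 178 | 132.50 | 160.50 | 126.00 | 152.63 | 178.00 |

Slide 16: Relation between the log-linear model and the Poisson regression model
$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$
$$\ln \lambda_{ij} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2j} + \beta_3 x_{3ij}$$
$x_{1i}$ and $x_{2j}$ are dummy variables ($x_{1i} = 0$ if $i = 1$ and $1$ if $i = 2$; $x_{2j} = 0$ if $j = 1$ and $1$ if $j = 2$), and the interaction variable is $x_{3ij} = x_{1i}\, x_{2j}$.

Slide 17: Observed

| Sex | Age < 20 | Age >= 20 | Total |
|---|---|---|---|
| F | 135 | 143 | 278 |
| M | 74 | 178 | 252 |
| Total | 209 | 321 | 530 |

Slide 18: Model 1: Null model
$$\ln \lambda_{ij} = \mu$$

| Sex | Age < 20 | Age >= 20 | Total |
|---|---|---|---|
| F | 132.5 | 132.5 | 265 |
| M | 132.5 | 132.5 | 265 |
| Total | 265 | 265 | 530 |

Slide 19: Model 2: B null model (sex equiprobable)
$$\ln \lambda_{ij} = \mu + \mu_i^A$$

| Sex | Age < 20 | Age >= 20 | Total |
|---|---|---|---|
| F | 104.5 | 160.5 | 265 |
| M | 104.5 | 160.5 | 265 |
| Total | 209 | 321 | 530 |

Slide 20: Model 3: A null model (age equiprobable)
$$\ln \lambda_{ij} = \mu + \mu_j^B$$

| Sex | Age < 20 | Age >= 20 | Total |
|---|---|---|---|
| F | 139 | 139 | 278 |
| M | 126 | 126 | 252 |
| Total | 265 | 265 | 530 |

Slide 21: Model 4: Independence model (no interaction)
$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$$

| Sex | Age < 20 | Age >= 20 | Total |
|---|---|---|---|
| F | 109.63 | 168.37 | 278 |
| M | 99.37 | 152.63 | 252 |
| Total | 209 | 321 | 530 |

Slide 22: Model 5: Saturated model
$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$

| Sex | Age < 20 | Age >= 20 | Total |
|---|---|---|---|
| F | 135 | 143 | 278 |
| M | 74 | 178 | 252 |
| Total | 209 | 321 | 530 |

Slide 23: Log-linear model: fit a model to a table of frequencies
Data: survey of political attitudes of British electors
Observed frequencies for vote by gender

| Party | Male | Female | Total |
|---|---|---|---|
| Conservative | 279 | 352 | 631 |
| Labour | 335 | 291 | 626 |
| Total | 614 | 643 | 1257 |

Source: Payne, C. (1977) The log-linear model for contingency tables. In: C. O'Muircheartaigh and C. Payne (eds.) The analysis of survey data. Vol. 2: Model fitting, Wiley, New York, pp. 105-144 [data p.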
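106]. (From Butler and Stokes, 'Political change in Britain', Macmillan, 2nd edition, 1974.)

Slide 24: The classical approach
• Geometric means (Birch, 1963)
• Effect coding (the mean is the reference category)
Birch, M.W. (1963) 'Maximum likelihood in three-way contingency tables', J. Royal Stat. Soc. (B), 25:220-233

Slide 25: Political attitudes. The basic model
Logarithm of frequencies

| Party | Male | Female | Total |
|---|---|---|---|
| Conservative | 5.6312 | 5.8636 | 11.4948 |
| Labour | 5.8141 | 5.6733 | 11.4875 |
| Total | 11.4453 | 11.5370 | 22.9823 |

Overall effect: 22.9823/4 = 5.7456
Effect of party:
• Conservative: 11.4948/2 - 5.7456 = 0.0018
• Labour: 11.4875/2 - 5.7456 = -0.0018
Effect of gender:
• Male: 11.4453/2 - 5.7456 = -0.0229
• Female: 11.5370/2 - 5.7456 = 0.0229
Interaction effects (gender-party interaction):
• Male conservative: 5.6312 - 5.7456 - 0.0018 + 0.0229 = -0.0933
• Female conservative: 5.8636 - 5.7456 - 0.0018 - 0.0229 = 0.0933
• Male labour: 5.8141 - 5.7456 + 0.0018 + 0.0229 = 0.0933
• Female labour: 5.6733 - 5.7456 + 0.0018 - 0.0229 = -0.0933

Slide 26: Political attitudes. The basic model (effect coding: mean; Birch, 1963)

| Effect | Category | Symbol | Estimate |
|---|---|---|---|
| Main effect | | $\mu$ | 5.7456 |
| Party effect | Conservative | $\mu_1^A$ | 0.0018 |
| | Labour | $\mu_2^A$ | -0.0018 |
| Gender effect | Male | $\mu_1^B$ | -0.0229 |
| | Female | $\mu_2^B$ | 0.0229 |
| Gender-party interaction | Male conservative | $\mu_{11}^{AB}$ | -0.0933 |
| | Female conservative | $\mu_{12}^{AB}$ | 0.0933 |
| | Male labour | $\mu_{21}^{AB}$ | 0.0933 |
| | Female labour | $\mu_{22}^{AB}$ | -0.0933 |

Coding: effect coding
$$\sum_i \mu_i^A = 0, \qquad \sum_j \mu_j^B = 0, \qquad \sum_i \mu_{ij}^{AB} = \sum_j \mu_{ij}^{AB} = 0$$
The parameters are subject to constraints (normalisation constraints).
Only first-order contrasts can be estimated: $\mu_2^A - \mu_1^A$

Slide 27: Political attitudes. The basic model (GLIM)

| Effect | Category | Symbol | Estimate | S.E. |
|---|---|---|---|---|
| Main effect | | $\mu$ | 5.6310 | 0.0599 |
| Party effect | Conservative | $\mu_1^A$ | 0.0000 | . |
| | Labour | $\mu_2^A$ | 0.1829 | 0.0811 |
| Gender effect | Male | $\mu_1^B$ | 0.0000 | . |
| | Female | $\mu_2^B$ | 0.2324 | 0.0802 |
| Gender-party interaction | Male conservative | $\mu_{11}^{AB}$ | 0.0000 | . |
| | Female conservative | $\mu_{12}^{AB}$ | 0.0000 | . |
| | Male labour | $\mu_{21}^{AB}$ | 0.0000 | . |
| | Female labour | $\mu_{22}^{AB}$ | -0.3732 | 0.1133 |

Slide 28: Political attitudes. The basic model (SPSS)

| Effect | Category | Estimate | SE | Asymptotic 95% CI lower | Asymptotic 95% CI upper |
|---|---|---|---|---|---|
| Main effect | | 5.6750 | 0.0586 | 5.56 | 5.79 |
| Party effect | Conservative | 0.1900 | 0.0792 | 0.03 | 0.35 |
| | Labour | 0.0000 | . | . | . |
| Gender effect | Male | 0.1406 | 0.0801 | -0.02 | 0.30 |
| | Female | 0.0000 | . | . | . |
| Gender-party interaction | Male conservative | -0.3726 | 0.1133 | -0.59 | -0.15 |
| | Female conservative | 0.0000 | . | . | . |
| | Male labour | 0.0000 | . | . | . |
| | Female labour | 0.0000 | . | . | . |

Slide 29: Political attitudes. The basic model (1)
$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}, \qquad \lambda_{ij} = \exp[\mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}]$$
• $\ln \lambda_{11}$ = 5.7456 + 0.0018 - 0.0229 - 0.0933 = 5.6312
• $\ln \lambda_{12}$ = 5.7456 + 0.0018 + 0.0229 + 0.0933 = 5.8636
• $\ln \lambda_{21}$ = 5.7456 - 0.0018 - 0.0229 + 0.0933 = 5.8142
• $\ln \lambda_{22}$ = 5.7456 - 0.0018 + 0.0229 - 0.0933 = 5.6734
Up to rounding, these reproduce the logarithms of the observed frequencies (5.6312, 5.8636, 5.8141 and 5.6733).

Slide 30: The design-matrix approach

Slide 31: I. Design matrix: effect coding
Unsaturated log-linear model: $\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$
$$\begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \\ \ln \lambda_{22} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} u \\ u_1^A \\ u_2^A \\ u_1^B \\ u_2^B \end{pmatrix}$$
$$Y = X\mu, \qquad \mu = (X'X)^{-1} X'Y$$
• The number of parameters exceeds the number of equations, so additional equations are needed.
• $X'X$ is singular (its inverse does not exist), so the linear dependencies must be identified.

Slide 32: I. Design matrix
Unsaturated log-linear model: $\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$
Additional equations (the coding):
$$\mu_2^A = -\mu_1^A, \qquad \mu_2^B = -\mu_1^B$$
$$\begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \\ \ln \lambda_{22} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & -1 \\ 1 & -1 & 1 \\ 1 & -1 & -1 \end{pmatrix} \begin{pmatrix} u \\ u_1^A \\ u_1^B \end{pmatrix}$$

Slide 33: 3 unknowns, 3 equations
$$\begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & -1 \\ 1 & -1 & 1 \end{pmatrix} \begin{pmatrix} u \\ u_1^A \\ u_1^B \end{pmatrix}$$
$$\begin{pmatrix} u \\ u_1^A \\ u_1^B \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & -1 \\ 1 & -1 & 1 \end{pmatrix}^{-1} \begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \end{pmatrix} = \begin{pmatrix} 0 & 0.5 & 0.5 \\ 0.5 & 0 & -0.5 \\ 0.5 & -0.5 & 0 \end{pmatrix} \begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \end{pmatrix}$$
where $\lambda_{ij} = \exp[\mu + \mu_i^A + \mu_j^B]$ is the frequency predicted by the model.

Slide 34: Political attitudes
Observed frequencies for vote by gender

| Party | Male | Female | Total |
|---|---|---|---|
| Conservative | 279 | 352 | 631 |
| Labour | 335 | 291 | 626 |
| Total | 614 | 643 | 1257 |

Predicted frequencies under the unsaturated (independence) model:
$$\lambda_{11} = \frac{F_{1+} F_{+1}}{F_{++}} = \frac{631 \times 614}{1257} = 308.22, \qquad \lambda_{12} = \frac{F_{1+} F_{+2}}{F_{++}} = \frac{631 \times 643}{1257} = 322.78$$
$$\lambda_{21} = \frac{F_{2+} F_{+1}}{F_{++}} = \frac{626 \times 614}{1257} = 305.78, \qquad \lambda_{22} = \frac{F_{2+} F_{+2}}{F_{++}} = \frac{626 \times 643}{1257} = 320.22$$
Predicted frequencies for vote by gender

| Party | Male | Female | Total |
|---|---|---|---|
| Conservative | 308.22 | 322.78 | 631.00 |
| Labour | 305.78 | 320.22 | 626.00 |
| Total | 614.00 | 643.00 | 1257.00 |

Slide 35: Political attitudes
$$\begin{pmatrix} u \\ u_1^A \\ u_1^B \end{pmatrix} = \begin{pmatrix} 0 & 0.5 & 0.5 \\ 0.5 & 0 & -0.5 \\ 0.5 & -0.5 & 0 \end{pmatrix} \begin{pmatrix} \ln 308.22 \\ \ln 322.78 \\ \ln 305.78 \end{pmatrix} = \begin{pmatrix} 0 & 0.5 & 0.5 \\ 0.5 & 0 & -0.5 \\ 0.5 & -0.5 & 0 \end{pmatrix} \begin{pmatrix} 5.7308 \\ 5.7770 \\ 5.7229 \end{pmatrix} = \begin{pmatrix} 5.74995 \\ 0.00395 \\ -0.02310 \end{pmatrix}$$
$$\tau = \exp[u] = 314.17, \qquad \tau_1^A = \exp[u_1^A] = 1.0040, \qquad \tau_1^B = \exp[u_1^B] = 0.9772$$
• $\lambda_{11} = \tau\, \tau_1^A\, \tau_1^B$ = 314.17 * 1.0040 * 0.9772 = 308.23
• $\lambda_{21} = \tau\, [1/\tau_1^A]\, \tau_1^B$ = 314.17 * [1/1.0040] * 0.9772 = 305.78

Slide 36: Design matrix. Saturated log-linear model
$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$
Constraints:
$$\mu_2^A = -\mu_1^A, \qquad \mu_2^B = -\mu_1^B, \qquad \mu_{12}^{AB} = -\mu_{11}^{AB}, \qquad \mu_{21}^{AB} = -\mu_{11}^{AB}, \qquad \mu_{22}^{AB} = \mu_{11}^{AB}$$
$$\begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \\ \ln \lambda_{22} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} \begin{pmatrix} u \\ u_1^A \\ u_1^B \\ u_{11}^{AB} \end{pmatrix}$$

Slide 37: Political attitudes
$$\begin{pmatrix} u \\ u_1^A \\ u_1^B \\ u_{11}^{AB} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}^{-1} \begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \\ \ln \lambda_{22} \end{pmatrix} = \begin{pmatrix} 0.25 & 0.25 & 0.25 & 0.25 \\ 0.25 & 0.25 & -0.25 & -0.25 \\ 0.25 & -0.25 & 0.25 & -0.25 \\ 0.25 & -0.25 & -0.25 & 0.25 \end{pmatrix} \begin{pmatrix} 5.6312 \\ 5.8636 \\ 5.8141 \\ 5.6733 \end{pmatrix} = \begin{pmatrix} 5.74555 \\ 0.00185 \\ -0.02290 \\ -0.09330 \end{pmatrix}$$
• $\lambda_{11} = \exp[\mu + \mu_1^A + \mu_1^B + \mu_{11}^{AB}]$ = exp[5.7456 + 0.0018 - 0.0229 - 0.0933] = exp[5.6312] = 279
• $\lambda_{21} = \exp[\mu + \mu_2^A + \mu_1^B + \mu_{21}^{AB}]$ = exp[5.7456 - 0.0018 - 0.0229 + 0.0933] = 335

Slide 38: Political attitudes. Log-linear model: expected frequencies

| Cell | | Observed | Model 1 (Overall) | Model 2 (Party) | Model 3 (Gender) | Model 4 (Unsatur.) | Model 5 (Saturated) |
|---|---|---|---|---|---|---|---|
| Mal_Cons | F11 | 279 | 314.25 | 315.50 | 307.00 | 308.22 | 279.00 |
| Fem_Cons | F12 | 352 | 314.25 | 315.50 | 321.50 | 322.78 | 352.00 |
| Mal_Labour | F21 | 335 | 314.25 | 313.00 | 307.00 | 305.78 | 335.00 |
| Fem_Labour | F22 | 291 | 314.25 | 313.00 | 321.50 | 320.22 | 291.00 |
| Chi-square | | | 11.58 | 11.54 | 10.9 | 10.89 | 0 |
| Degrees of freedom | | | 3 | 2 | 2 | 1 | 0 |

Log-linear model: parameters (contrast coding: first category = 0)

A. Additive model

| Effect | Overall | Party | Gender | Unsatur. | Satur. |
|---|---|---|---|---|---|
| Main effect | 5.7502 | 5.7542 | 5.7269 | 5.7308 | 5.6312 |
| Gender effect | 0.0000 | 0.0000 | 0.0461 | 0.0462 | 0.2324 |
| Party effect | 0.0000 | -0.0080 | 0.0000 | -0.0080 | 0.1829 |
| Gender-party interaction effect | 0.0000 | 0.0000 | 0.0000 | 0.0000 | -0.3732 |

B. Multiplicative model [exp(u)]

| Effect | Overall | Party | Gender | Unsatur. | Satur. |
|---|---|---|---|---|---|
| Main effect | 314.2500 | 315.5001 | 307.0007 | 308.2157 | 278.9967 |
| Gender effect | 1.0000 | 1.0000 | 1.0472 | 1.0472 | 1.2616 |
| Party effect | 1.0000 | 0.9920 | 1.0000 | 0.9920 | 1.2007 |
| Gender-party interaction effect | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.6885 |

Slide 39: Other ways of restricting
II. Design matrix: contrast coding

Slide 40: III.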
Design matrix: other restrictions on parameters
Saturated log-linear model: $\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$
Constraints (SPSS):
$$\mu_2^A = 0, \qquad \mu_2^B = 0, \qquad \mu_{12}^{AB} = \mu_{21}^{AB} = \mu_{22}^{AB} = 0$$
$$\begin{pmatrix} \ln \lambda_{11} \\ \ln \lambda_{12} \\ \ln \lambda_{21} \\ \ln \lambda_{22} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} u \\ u_1^A \\ u_1^B \\ u_{11}^{AB} \end{pmatrix}$$

Slides 41-42: Political attitudes
$$\begin{pmatrix} u \\ u_1^A \\ u_1^B \\ u_{11}^{AB} \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} \begin{pmatrix} \ln 279 \\ \ln 352 \\ \ln 335 \\ \ln 291 \end{pmatrix}$$
The logarithms of the observed frequencies are 5.6312, 5.8636, 5.8141 and 5.6733. The solution printed on the slide is the SPSS one: $u = 5.6750$, $u_1^A = 0.1900$, $u_1^B = 0.1406$, $u_{11}^{AB} = -0.3726$. SPSS adds 0.5 to each observed count before taking logarithms, which is why $u = \ln 291.5 = 5.6750$ rather than $\ln 291 = 5.6733$.

| Effect | Category | Coding 2 (SPSS) | Coding 1 (Birch) |
|---|---|---|---|
| Main effect | | 5.6750 | 5.7456 |
| Party effect | Conservative | 0.1900 | 0.0019 |
| | Labour | 0.0000 | -0.0019 |
| Gender effect | Male | 0.1406 | -0.0229 |
| | Female | 0.0000 | 0.0229 |
| Gender-party interaction | Male conservative | -0.3726 | -0.0933 |
| | Female conservative | 0.0000 | 0.0933 |
| | Male labour | 0.0000 | 0.0933 |
| | Female labour | 0.0000 | -0.0933 |

Slide 43: Political attitudes
Observed frequencies for vote by sex

| Party | Male | Female | Total |
|---|---|---|---|
| Conservative | 279 | 352 | 631 |
| Labour | 335 | 291 | 626 |
| Total | 614 | 643 | 1257 |

Parameter estimates

| Effect | Category | SPSS (contrast) mu | SPSS exp(mu) | GLIM (contrast) mu | GLIM exp(mu) | Birch (effect) mu | Birch exp(mu) |
|---|---|---|---|---|---|---|---|
| Main effect | | 5.6750 | 291.49 | 5.6312 | 279.00 | 5.7456 | 312.80 |
| Party effect | Conservative | 0.1900 | 1.2092 | 0.0000 | 1.0000 | 0.0019 | 1.0019 |
| | Labour | 0.0000 | 1.0000 | 0.1829 | 1.2007 | -0.0019 | 0.9982 |
| Gender effect | Male | 0.1406 | 1.1510 | 0.0000 | 1.0000 | -0.0229 | 0.9774 |
| | Female | 0.0000 | 1.0000 | 0.2324 | 1.2616 | 0.0229 | 1.0232 |
| Gender-party interaction | Male conservative | -0.3726 | 0.6889 | 0.0000 | 1.0000 | -0.0933 | 0.9109 |
| | Female conservative | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0933 | 1.0978 |
| | Male labour | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.0933 | 1.0978 |
| | Female labour | 0.0000 | 1.0000 | -0.3732 | 0.6885 | -0.0933 | 0.9109 |

Slide 44: Political attitudes
Parameter estimates and standard errors

| Effect | Category | SPSS (contrast) param. | SPSS s.e. | GLIM (contrast) param. | GLIM s.e. |
|---|---|---|---|---|---|
| Main effect | | 5.6750 | 0.0586 | 5.6312 | 0.0599 |
| Party effect | Conservative | 0.1900 | 0.0792 | 0.0000 | . |
| | Labour | 0.0000 | . | 0.1829 | 0.0811 |
| Gender effect | Male | 0.1406 | 0.0801 | 0.0000 | . |
| | Female | 0.0000 | . | 0.2324 | 0.0802 |
| Gender-party interaction | Male conservative | -0.3726 | 0.1133 | 0.0000 | . |
| | Female conservative | 0.0000 | . | 0.0000 | . |
| | Male labour | 0.0000 | . | 0.0000 | . |
| | Female labour | 0.0000 | . | -0.3732 | 0.1133 |

Slide 45: Political attitudes
Prediction of counts or frequencies:
A. Effect coding
• 279 = 312.80 * 0.97736 * 1.00185 * 0.91092
• 352 = 312.80 * 1.02316 * 1.00185 * 1.09779
• 335 = 312.80 * 0.97736 * 0.99815 * 1.09779
• 291 = 312.80 * 1.02316 * 0.99815 * 0.91092
B. Contrast coding: GLIM
• 291 = 279 * 1.2616 * 1.2007 * 0.6885 (females voting Labour)
• 279 = 279 * 1 * 1 * 1 (males voting Conservative = reference category)
• 352 = 279 * 1.2616 * 1 * 1 (females voting Conservative)
• 335 = 279 * 1 * 1.2007 * 1 (males voting Labour)
C. Contrast coding: SPSS (SPSS adds 0.5 to the observed values)
• 279.5 = 291.5 * 1.15096 * 1.20925 * 0.68894 (males voting Conservative)
• 352.5 = 291.5 * 1 * 1.20925 * 1 (females voting Conservative)
• 291.5 = 291.5 * 1 * 1 * 1 (females voting Labour = reference category)
• 335.5 = 291.5 * 1.15096 * 1 * 1 (males voting Labour)
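
The three codings compared in the slides on political attitudes can be checked numerically. The following is a minimal sketch, not the procedure used by GLIM or SPSS: it solves the saturated model exactly from a design matrix for each coding and computes the independence-model frequencies from the margins. The names `solve`, `DESIGNS`, `saturated_params` and `fitted` are illustrative assumptions, and the 0.5 added to the counts for the last-category coding follows the remark that SPSS adds 0.5 to the observed values.

```python
import math

# Observed counts from the vote-by-gender table.
# Cell order: male Conservative, female Conservative, male Labour, female Labour.
observed = [279.0, 352.0, 335.0, 291.0]


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]


# Design matrices for the saturated model (hypothetical names).
# Columns: overall effect, party effect, gender effect, interaction.
DESIGNS = {
    # Effect coding (Birch): parameters sum to zero over categories.
    "effect": [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, 1, -1], [1, -1, -1, 1]],
    # First category as reference (the GLIM coding in the slides).
    "first_ref": [[1, 0, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1]],
    # Last category as reference (the SPSS coding in the slides).
    "last_ref": [[1, 1, 1, 1], [1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 0]],
}


def saturated_params(design, counts):
    """Parameters u that satisfy X u = ln(counts) exactly."""
    return solve(design, [math.log(c) for c in counts])


def fitted(design, params):
    """Cell frequencies implied by the parameters: exp(X u)."""
    return [math.exp(sum(x * p for x, p in zip(row, params))) for row in design]


effect = saturated_params(DESIGNS["effect"], observed)
glim = saturated_params(DESIGNS["first_ref"], observed)
# Assumption taken from the slides: SPSS adds 0.5 to every observed count.
spss = saturated_params(DESIGNS["last_ref"], [c + 0.5 for c in observed])

# Independence (unsaturated) model: row total * column total / grand total.
total = sum(observed)
row_totals = [observed[0] + observed[1], observed[2] + observed[3]]
col_totals = [observed[0] + observed[2], observed[1] + observed[3]]
independence = [r * c / total for r in row_totals for c in col_totals]

for name, params in [("effect", effect), ("first_ref", glim), ("last_ref + 0.5", spss)]:
    print(name, [round(p, 4) for p in params])
print("independence", [round(v, 2) for v in independence])
```

Whatever the coding, `fitted` returns the same cell frequencies for the saturated model; only the meaning of the individual parameters changes with the choice of reference category.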