Analysis of collapsibility of categories in multidimensional contingency tables by stepwise MCA

Svend Kreiner and Peter Gundelach

Two-way tables

Definition 1. Let X and Y be two categorical variables. A set of X-categories, {x1,...,xr}, is collapsible in the XY-table if

$$P(Y = y \mid X = x_i) = P(Y = y \mid X = x_j)$$

for all pairs $\{x_i, x_j\} \subseteq \{x_1,\dots,x_r\}$ and all Y-categories y.

In the table below, the categories X = 2 and X = 3 are collapsible because they have the same conditional distribution of Y:

        Y=1   Y=2   Y=3
X=1     p1    p2    p3
X=2     q1    q2    q3
X=3     q1    q2    q3

Multi-way tables

Let X1,...,Xr be r categorical variables. We assume that the variables are coded 1,...,mi, where mi is the number of categories of the i'th variable. The stack procedure combines all these variables into one categorical variable with one category for each combination of categories of the r original variables:

$$\mathrm{Stack}(x_1,\dots,x_r) = x_1 + \sum_{i=2}^{r} (x_i - 1)\, d_i \qquad (1)$$

where

$$d_i = \prod_{j=1}^{i-1} m_j \qquad (2)$$

The stacked variable therefore has $m_1 m_2 \cdots m_r$ categories, coded $1,\dots,\prod_{i=1}^{r} m_i$.

Definition 2. Let X = (X1,...,Xa) and Y = (Y1,...,Yb) be a partition of the variables summarized in an (a+b)-dimensional contingency table. A subset {z1,...,zr} of the combined categories of the stacked variable Z = Stack(X1,...,Xa) is collapsible in the XY-table if

$$P(Y = y \mid Z = z_i) = P(Y = y \mid Z = z_j)$$

for all pairs $\{z_i, z_j\} \subseteq \{z_1,\dots,z_r\}$ and all Y-categories y.

Layout of a four-way table with dichotomous variables A, B, C and D, using Stack(A,B) as columns and Stack(C,D) as rows (stacked codes in parentheses):

                    B=1                 B=2
               A=1 (1)   A=2 (2)   A=1 (3)   A=2 (4)
D=1  C=1 (1)
     C=2 (2)
D=2  C=1 (3)
     C=2 (4)

An example: Political attitudes in Denmark, 1981-1999

Table 1. Means and standard deviations of political attitudes in Denmark in 1981, 1990 and 1999.

Year    n     Mean   S.d.
1981    870   5.68   1.88
1990    934   5.74   1.90
1999    553   5.50   2.07

Figure 1. Distribution of political attitude in Denmark in 1981, 1990 and 1999 (bar chart by year; x-axis: political attitude, from 1 = far left to 10 = far right; y-axis: percent).

Table 2. Distribution of political attitudes (%) in 1981, 1990 and 1999

        Political attitude
Year    1     2     3     4     5     6     7     8     9     10    n
1981    1.3   2.6   7.0   7.7   34.5  19.8  9.3   9.8   2.9   4.9   870
1990    0.7   1.9   9.4   12.0  28.1  12.8  14.0  13.9  4.0   3.1   934
1999    3.6   3.6   10.8  9.0   28.4  11.0  13.7  13.9  3.1   2.7   553

The column categories of Table 2 are collapsible into five groups: 1-2, 3-4, 5-6, 7-9 and 10.

Table 3.
Political attitudes (%) in 1981, 1990 and 1999, with collapsed attitude categories

        Political attitude
Year    1-2   3-4   5-6   7-9   10    n
1981    4.1   14.7  54.3  22.0  4.9   870
1990    2.7   21.4  40.9  31.9  3.1   934
1999    7.2   19.9  39.4  30.7  2.7   553

Table 4. Preferred political parties in 1981, 1990 and 1999. [Table body not recoverable from the source.]

Table 5. Trends in political attitudes during 1981-1999 among voters of the Danish political parties.

Party                                Trend    p
A: Social Democrats                  -0.20    0.0002
B: The Radical Left                  -0.41    0.0006
C: The Conservatives                 -0.24    0.0036
CD: Centre Democrats                 -0.53    0.0014
F: The People's Socialist Party      -0.16    0.0758
DF: The Danish People's Party        n.a.     n.a.
Q: The Christian People's Party      -0.29    0.1608
V: The Liberal Left                  -0.14    0.0332
Z: The Progressive Party             -0.06    0.6600
Ø: The Unity Party                    0.04    0.7884

Table 6. Frequencies of collapsed year*party categories.
Collapsed categories are those with identical distributions.

Year  Party    n    Distribution of political attitude (%)
81    A      155     1.19   2.38   8.33  40.48  29.37   9.92   5.56   2.78
81    B       18     1.19   2.38   8.33  40.48  29.37   9.92   5.56   2.78
81    C       67     3.37   0.00   0.00   7.87  13.48  13.48  39.33  22.47
81    CD       4     0.00   1.56   0.78  16.41  23.44  28.91  22.66   6.25
81    SF      34    12.81  33.88  26.86  19.42   1.65   3.31   2.07   0.00
81    DF       0     -
81    Q        9     0.00   4.92   8.20  27.87   6.56   6.56  34.43  11.48
81    V       54     0.00   1.56   0.78  16.41  23.44  28.91  22.66   6.25
81    Z       23     0.00   1.56   0.78  16.41  23.44  28.91  22.66   6.25
81    Ø       22    59.09  31.82   0.00   9.09   0.00   0.00   0.00   0.00
90    A      261     2.51  10.80  15.58  44.72  13.82   7.04   5.03   0.50
90    B       38     1.19   2.38   8.33  40.48  29.37   9.92   5.56   2.78
90    C      131     0.00   0.42   2.94   9.66  11.34  21.43  46.64   7.56
90    CD      47     0.00   1.56   0.78  16.41  23.44  28.91  22.66   6.25
90    SF     129    12.81  33.88  26.86  19.42   1.65   3.31   2.07   0.00
90    DF       0     -
90    Q       16     1.19   2.38   8.33  40.48  29.37   9.92   5.56   2.78
90    V      107     0.00   0.42   2.94   9.66  11.34  21.43  46.64   7.56
90    Z       45     0.00   4.92   8.20  27.87   6.56   6.56  34.43  11.48
90    Ø       17    12.81  33.88  26.86  19.42   1.65   3.31   2.07   0.00
99    A      113     2.51  10.80  15.58  44.72  13.82   7.04   5.03   0.50
99    B       24     2.51  10.80  15.58  44.72  13.82   7.04   5.03   0.50
99    C       37     2.07   5.70   3.11  15.03   9.33  27.46  33.16   4.15
99    CD      15     1.19   2.38   8.33  40.48  29.37   9.92   5.56   2.78
99    SF      62    12.81  33.88  26.86  19.42   1.65   3.31   2.07   0.00
99    DF      22     3.37   0.00   0.00   7.87  13.48  13.48  39.33  22.47
99    Q       10     1.19   2.38   8.33  40.48  29.37   9.92   5.56   2.78
99    V      156     2.07   5.70   3.11  15.03   9.33  27.46  33.16   4.15
99    Z        7     0.00   4.92   8.20  27.87   6.56   6.56  34.43  11.48
99    Ø       19    63.16  15.79   5.26  15.79   0.00   0.00   0.00   0.00

Nine views of category collapsibility

- Conditional independence
- Collapsibility over variables in graphical models for multidimensional tables
- Differential item functioning
- Stepwise exploratory model search
- Multiple comparisons
- Cluster analysis
- Classification
- Latent classes
- Property spaces

Collapsibility over variables in graphical models for multidimensional tables

Figure 2. Independence graph with two connected X-variables and three Y-variables. Two of the Y-variables, Y1 and Y3, are assumed to be conditionally independent.
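To make equations (1)-(2) and Definitions 1-2 concrete, here is a minimal Python sketch. It assumes that cell counts are available, and the names `stack` and `homogeneity_g2` are hypothetical helpers, not part of the original material. As a check of collapsibility it uses the standard likelihood-ratio (G²) test of homogeneity of the conditional Y-distributions in the selected rows; this is an illustrative assumption, not necessarily the test used by the authors.

```python
import math
from typing import Sequence, Tuple


def stack(xs: Sequence[int], ms: Sequence[int]) -> int:
    """Stacked code of categories xs (x_i coded 1..m_i), following eqs. (1)-(2)."""
    if len(xs) != len(ms) or not xs:
        raise ValueError("xs and ms must be non-empty and of equal length")
    code = xs[0]
    d = 1
    for i in range(1, len(xs)):
        d *= ms[i - 1]               # d_i = m_1 * ... * m_{i-1}
        code += (xs[i] - 1) * d      # add (x_i - 1) * d_i
    return code


def homogeneity_g2(table: Sequence[Sequence[float]],
                   rows: Sequence[int]) -> Tuple[float, int]:
    """Likelihood-ratio statistic G^2 for the hypothesis that the selected
    rows (0-based) of a two-way table of counts have the same conditional
    distribution of the column variable. Returns (G^2, degrees of freedom)."""
    sub = [list(table[r]) for r in rows]
    row_tot = [sum(row) for row in sub]
    col_tot = [sum(col) for col in zip(*sub)]
    n = sum(row_tot)
    g2 = 0.0
    for i, row in enumerate(sub):
        for j, obs in enumerate(row):
            if obs > 0:  # cells with zero counts contribute nothing to G^2
                expected = row_tot[i] * col_tot[j] / n
                g2 += 2.0 * obs * math.log(obs / expected)
    # Columns that are empty in every selected row carry no information.
    used_cols = sum(1 for c in col_tot if c > 0)
    df = max(0, (len(sub) - 1) * (used_cols - 1))
    return g2, df


# Usage: rows 2 and 3 (indices 1 and 2) have identical conditional distributions.
counts = [[30, 20, 10], [5, 10, 15], [10, 20, 30]]
print(stack([2, 1], [2, 2]), stack([1, 2], [2, 2]))   # codes 2 and 3
print(homogeneity_g2(counts, [1, 2]))                  # G^2 = 0.0 on 2 df
```

The stacked code of a multi-way X-configuration serves as a row index, so the same check covers Definition 2. A p-value can be obtained, for instance, from `scipy.stats.chi2.sf(g2, df)` where SciPy is available.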
Assume that some of the X1*X2 categories are collapsible. Include in the model a variable C defined by the collapsed categories:

Figure 3. Augmented graph corresponding to the graph in Figure 2, with a category-collapsed variable C combining categories of Stack(X1,X2).

This model is collapsible over X1 and X2 onto the CY-table.

Differential item functioning

- Items: (I1,...,Ik)
- S is a scale summarizing the item responses
- Y is an exogenous variable

Definition 3. An index scale, S, based on a set of items is free of DIF if the item responses are conditionally independent of all exogenous variables, Y, given the index scale:

$$(I_1,\dots,I_k) \perp\!\!\!\perp Y \mid S$$

Since S is a function of the item responses, this implies P(Y = y | I1,...,Ik) = P(Y = y | S), so response patterns with the same score have the same conditional distribution of Y.

Stack(Item 1, Item 2, Item 3)   Item 1   Item 2   Item 3   Score   Y=0   Y=1
1                               0        0        0        0       p10   p11
2                               1        0        0        1       p20   p21
3                               0        1        0        1       p30   p31
4                               0        0        1        1       p40   p41
5                               1        1        0        2       p50   p51
6                               1        0        1        2       p60   p61
7                               0        1        1        2       p70   p71
8                               1        1        1        3       p80   p81

Figure 4. A table relating item responses to an exogenous variable Y. No DIF implies that rows 2-4 and rows 5-7 are collapsible.

Stepwise exploratory model search

Possible collapses of three categories C1, C2 and C3, from no collapsing (top) to complete collapsing (bottom):

                 C1 | C2 | C3
(C1 = C2) | C3   (C1 = C3) | C2   (C2 = C3) | C1
                 C1 = C2 = C3

Multiple comparisons

Compare G groups with respect to group-specific parameters θ1,...,θG, where θi may be:
- the mean of some variable (as in ANOVA)
- odds ratio statistics (as in analysis of interaction in logistic regression)
- the distribution of a categorical variable (as in category collapsing)

Correspondence analysis

Correspondence analysis of the party*year categories in relation to the left-right scale (two-dimensional plot). Plotted points are the collapsed groups: Ø81; Ø99; SF81+SF90+SF99+Ø90; A90+A99+B99; A81+B81+B90+Q90+Q99+CD99; Q81+Z90+Z99; CD81+CD90+V81+Z81; C99+V99; C90+V90; C81+DF99.
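The stepwise search through the lattice of collapsing hypotheses, viewed as a multiple comparison of G groups with respect to their distributions, can be sketched as a greedy procedure. This minimal Python sketch rests on explicit assumptions: the helper names (`chi2_sf`, `g2_two_groups`, `stepwise_collapse`) are hypothetical; each step tests only the two groups to be merged with a pairwise G² test; the most compatible pair (largest p-value) is merged first; and the search stops at a fixed level alpha with no correction for multiple testing. The authors' stepwise MCA procedure may use different tests, stopping rules or adjustments.

```python
import math
from typing import List, Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi^2_df > x), computed from the series
    expansion of the regularized lower incomplete gamma function."""
    if df <= 0 or x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def g2_two_groups(r1: Sequence[float], r2: Sequence[float]) -> Tuple[float, int]:
    """G^2 test of equal conditional distributions for two rows of counts."""
    rows = [list(r1), list(r2)]
    row_tot = [sum(r) for r in rows]
    col_tot = [a + b for a, b in zip(r1, r2)]
    n = sum(row_tot)
    g2 = 0.0
    for i, row in enumerate(rows):
        for j, obs in enumerate(row):
            if obs > 0:  # obs / expected = obs * n / (row total * column total)
                g2 += 2.0 * obs * math.log(obs * n / (row_tot[i] * col_tot[j]))
    df = sum(1 for c in col_tot if c > 0) - 1
    return g2, df


def stepwise_collapse(table: Sequence[Sequence[float]],
                      alpha: float = 0.05) -> List[List[int]]:
    """Greedy stepwise collapsing of the rows (0-based) of a table of counts.

    At each step every pair of current groups is tested; the pair with the
    largest p-value is pooled, and the search stops when even the best
    merge is rejected at level alpha."""
    groups = [[i] for i in range(len(table))]
    counts = [[float(c) for c in row] for row in table]
    while len(groups) > 1:
        best = None  # (p-value, i, j)
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                g2, df = g2_two_groups(counts[i], counts[j])
                p = chi2_sf(g2, df)
                if best is None or p > best[0]:
                    best = (p, i, j)
        p, i, j = best
        if p < alpha:
            break
        groups[i] += groups[j]  # pool group j into group i
        counts[i] = [a + b for a, b in zip(counts[i], counts[j])]
        del groups[j], counts[j]
    return groups


table = [[10, 20, 30], [5, 10, 15], [30, 20, 10], [60, 40, 20]]
print(stepwise_collapse(table))  # [[0, 1], [2, 3]]
```

The artificial table has two pairs of rows with identical conditional distributions, so the search pools each pair and then stops because the two pooled groups differ clearly. The chi-square tail function uses only the standard library; `scipy.stats.chi2.sf` could replace it where SciPy is available.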