
Analysis of collapsibility of categories in multidimensional contingency
tables by stepwise MCA
Svend Kreiner and Peter Gundelach
Two-way tables
Definition 1
Let X and Y be two categorical variables. We say that a set of X-categories, {x1,...,xr}, is collapsible in the XY-table if P(Y = y | X = xi) = P(Y = y | X = xj) for all pairs {xi, xj} ⊆ {x1,...,xr} and all Y-categories y.
        Y=1   Y=2   Y=3
X=1     p1    p2    p3
X=2     q1    q2    q3
X=3     q1    q2    q3
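Definition 1 can be illustrated with a small check on exact proportions. This is a minimal sketch, not the authors' procedure: the function names and the counts are hypothetical, and with sampled data the equality would be assessed by a statistical test rather than by exact comparison.

```python
from fractions import Fraction
from itertools import combinations

def conditional_rows(table):
    """Return P(Y = y | X = x) for each row x of a table of counts."""
    return [[Fraction(c, sum(row)) for c in row] for row in table]

def collapsible(table, rows):
    """True if all listed rows (0-based) share one conditional distribution."""
    cond = conditional_rows(table)
    return all(cond[i] == cond[j] for i, j in combinations(rows, 2))

# Hypothetical counts: rows X=2 and X=3 are proportional, X=1 is not.
counts = [
    [10, 20, 30],  # X=1
    [5, 10, 5],    # X=2
    [20, 40, 20],  # X=3
]

print(collapsible(counts, [1, 2]))  # True
print(collapsible(counts, [0, 1]))  # False
```

The example mirrors the table above, where the rows X=2 and X=3 carry the same probabilities q1, q2, q3.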
Multi-way tables
Let X1,...,Xr be r categorical variables. We assume that the i'th variable is coded 1,...,mi, where mi is its number of categories. The stack procedure combines all these variables into one categorical variable with one category for each combination of categories of the r original variables:
$$\mathrm{Stack}(x_1,\dots,x_r) = x_1 + \sum_{i=2}^{r} (x_i - 1)\, d_i \qquad (1)$$
where
$$d_i = \prod_{j=1}^{i-1} m_j \qquad (2)$$
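The stack procedure in (1) and (2) amounts to a mixed-radix encoding of the category codes. The sketch below is one possible implementation; the function names `stack` and `unstack` are illustrative and not taken from the slides, and `unstack` is added only to show that the coding is invertible.

```python
def stack(x, m):
    """Combine 1-based codes x = (x1,...,xr) into a single 1-based code.

    m = (m1,...,mr) holds the number of categories of each variable.
    """
    code, d = x[0], 1
    for i in range(1, len(x)):
        d *= m[i - 1]           # d_i = m_1 * ... * m_(i-1)
        code += (x[i] - 1) * d
    return code

def unstack(code, m):
    """Recover the original 1-based codes from a stacked code."""
    digits, rest = [], code - 1
    for mi in m:
        digits.append(rest % mi + 1)
        rest //= mi
    return tuple(digits)

# Two binary variables: the first variable varies fastest.
print(stack((1, 1), (2, 2)))  # 1
print(stack((2, 1), (2, 2)))  # 2
print(stack((1, 2), (2, 2)))  # 3
print(stack((2, 2), (2, 2)))  # 4
```

With two binary variables the codes run from 1 to 4, with the first variable changing fastest.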
Definition 2
Let X = (X1,...,Xa) and Y = (Y1,...,Yb) be a partitioning of the variables summarized in an (a+b)-dimensional contingency table. A subset {z1,...,zr} of the combined categories of the stacked variable Z = Stack(X1,...,Xa) is collapsible in the XY-table if P(Y = y | Z = zi) = P(Y = y | Z = zj) for all pairs {zi, zj} ⊆ {z1,...,zr} and all Y-categories y.
                          B=1             B=2
                       A=1    A=2      A=1    A=2
                       (1)    (2)      (3)    (4)
D=1    C = 1   (1)
       C = 2   (2)
D=2    C = 1   (3)
       C = 2   (4)
An example: Political attitudes in Denmark, 1981-1999
Table 1. Means and standard deviations of political
attitudes in Denmark in 1981, 1990 and 1999.
Year     n      Mean    S.d.
1981     870    5.68    1.88
1990     934    5.74    1.90
1999     553    5.50    2.07
Fig 1. Distribution of political attitude in Denmark in 1981, 1990 and 1999. (Horizontal axis: political attitude, from 1: far left to 10: far right. Vertical axis: percent. One series per year.)
Table 2. Distribution of political attitudes in 1981, 1990 and 1999
                          Political attitude
Year     1     2     3     4     5     6     7     8     9    10      n
         %     %     %     %     %     %     %     %     %     %
1981   1.3   2.6   7.0   7.7  34.5  19.8   9.3   9.8   2.9   4.9    870
1990   0.7   1.9   9.4  12.0  28.1  12.8  14.0  13.9   4.0   3.1    934
1999   3.6   3.6  10.8   9.0  28.4  11.0  13.7  13.9   3.1   2.7    553
Column categories are collapsible:
Table 3. Political attitudes in 1981, 1990 and 1999
                Political attitude
Year    1-2    3-4    5-6    7-9     10      n
          %      %      %      %      %
1981    4.1   14.7   54.3   22.0    4.9    870
1990    2.7   21.4   40.9   31.9    3.1    934
1999    7.2   19.9   39.4   30.7    2.7    553
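The arithmetic behind collapsing columns can be sketched by summing the Table 2 percentages within each group of attitude categories. The snippet is illustrative only; the variable and function names are hypothetical. Because it adds percentages that are already rounded, the results agree with Table 3 only to within a few tenths of a percentage point, presumably because Table 3 was computed from the underlying counts.

```python
# Percentages from Table 2, attitude categories 1..10.
table2 = {
    1981: [1.3, 2.6, 7.0, 7.7, 34.5, 19.8, 9.3, 9.8, 2.9, 4.9],
    1990: [0.7, 1.9, 9.4, 12.0, 28.1, 12.8, 14.0, 13.9, 4.0, 3.1],
    1999: [3.6, 3.6, 10.8, 9.0, 28.4, 11.0, 13.7, 13.9, 3.1, 2.7],
}

# Collapsed categories used in Table 3 (inclusive ranges).
groups = [(1, 2), (3, 4), (5, 6), (7, 9), (10, 10)]

def collapse(percentages, groups):
    """Sum the percentages of the categories inside each group."""
    return [round(sum(percentages[lo - 1:hi]), 1) for lo, hi in groups]

for year, row in table2.items():
    print(year, collapse(row, groups))
```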
Table 4. Preferred political parties in 1981, 1990 and 1999
[Table 4 body: cell values are not recoverable from the extracted text.]
Table 5. Trends in political attitudes during 1981-1999 among voters of the Danish political parties.
Party                                 Trend     p
A: Social Democrats                   -0.20     0.0002
B: The Radical Left                   -0.41     0.0006
C: The Conservatives                  -0.24     0.0036
CD: Centre Democrats                  -0.53     0.0014
F: The Socialist People's Party       -0.16     0.0758
DF: The Danish People's Party         n.a.      n.a.
Q: The Christian People's Party       -0.29     0.1608
V: The Liberal Left                   -0.14     0.0332
Z: The Progressive Party              -0.06     0.6600
Ø: The Unity Party                     0.04     0.7884
Table 6. Frequencies of collapsed year*party categories.
Collapsed categories are denoted by different fonts
Party codes: 1 A, 2 B, 3 C, 4 CD, 5 SF, 6 DF, 7 Q, 8 V, 9 Z, 10 Ø
Rows that belong to the same collapsed category have identical distributions.

Year  Party    n    Percent distribution (eight columns, in source order)
81    A      155    1.19   2.38   8.33  40.48  29.37   9.92   5.56   2.78
81    B       18    1.19   2.38   8.33  40.48  29.37   9.92   5.56   2.78
81    C       67    3.37   0.00   0.00   7.87  13.48  13.48  39.33  22.47
81    CD       4    0.00   1.56   0.78  16.41  23.44  28.91  22.66   6.25
81    SF      34   12.81  33.88  26.86  19.42   1.65   3.31   2.07   0.00
81    DF       0
81    Q        9    0.00   4.92   8.20  27.87   6.56   6.56  34.43  11.48
81    V       54    0.00   1.56   0.78  16.41  23.44  28.91  22.66   6.25
81    Z       23    0.00   1.56   0.78  16.41  23.44  28.91  22.66   6.25
81    Ø       22   59.09  31.82   0.00   9.09   0.00   0.00   0.00   0.00
90    A      261    2.51  10.80  15.58  44.72  13.82   7.04   5.03   0.50
90    B       38    1.19   2.38   8.33  40.48  29.37   9.92   5.56   2.78
90    C      131    0.00   0.42   2.94   9.66  11.34  21.43  46.64   7.56
90    CD      47    0.00   1.56   0.78  16.41  23.44  28.91  22.66   6.25
90    SF     129   12.81  33.88  26.86  19.42   1.65   3.31   2.07   0.00
90    DF       0
90    Q       16    1.19   2.38   8.33  40.48  29.37   9.92   5.56   2.78
90    V      107    0.00   0.42   2.94   9.66  11.34  21.43  46.64   7.56
90    Z       45    0.00   4.92   8.20  27.87   6.56   6.56  34.43  11.48
90    Ø       17   12.81  33.88  26.86  19.42   1.65   3.31   2.07   0.00
99    A      113    2.51  10.80  15.58  44.72  13.82   7.04   5.03   0.50
99    B       24    2.51  10.80  15.58  44.72  13.82   7.04   5.03   0.50
99    C       37    2.07   5.70   3.11  15.03   9.33  27.46  33.16   4.15
99    CD      15    1.19   2.38   8.33  40.48  29.37   9.92   5.56   2.78
99    SF      62   12.81  33.88  26.86  19.42   1.65   3.31   2.07   0.00
99    DF      22    3.37   0.00   0.00   7.87  13.48  13.48  39.33  22.47
99    Q       10    1.19   2.38   8.33  40.48  29.37   9.92   5.56   2.78
99    V      156    2.07   5.70   3.11  15.03   9.33  27.46  33.16   4.15
99    Z        7    0.00   4.92   8.20  27.87   6.56   6.56  34.43  11.48
99    Ø       19   63.16  15.79   5.26      ?   0.00   0.00   0.00   0.00
Nine views of category collapsibility
• Conditional independence
• Collapsibility over variables in graphical models for multidimensional tables
• Differential item functioning
• Stepwise exploratory model search
• Multiple comparisons
• Cluster analysis
• Classification
• Latent classes
• Property spaces
Collapsibility over variables in graphical models
for multidimensional tables
Figure 2. Independence graph with two connected X-variables and three Y-variables. Two variables, Y1 and Y3, are assumed to be conditionally independent.
Assume that X1*X2 categories are collapsible
Include C defined by the collapsed categories in the model:
Figure 3. Augmented graph corresponding to the graph in Figure 2, with a category-collapsed variable C combining categories of Stack(X1,X2).
This model is collapsible over X1 and X2 onto the CY-table
Differential item functioning
Items: (I1,...,Ik)
S is a scale summarizing item responses
Y is an exogenous variable
Definition 3.
An index scale, S, is based on a set of items without DIF if item responses are conditionally independent of all exogenous variables, Y, given the index scale:
(I1,...,Ik) ⊥⊥ Y | S
Stack(Item 1,Item 2,Item 3)   Item 1   Item 2   Item 3   Score   Y=0   Y=1
1                             0        0        0        0       p10   p11
2                             1        0        0        1       p20   p21
3                             0        1        0        1       p30   p31
4                             0        0        1        1       p40   p41
5                             1        1        0        2       p50   p51
6                             1        0        1        2       p60   p61
7                             0        1        1        2       p70   p71
8                             1        1        1        3       p80   p81
Figure 4. A table relating item responses to an exogenous variable Y.
No DIF implies that rows 2-4 and rows 5-7 are collapsible.
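The row groups implied by the absence of DIF can be listed by grouping response patterns on their total score. This is an illustrative sketch with a hypothetical function name. Rows are numbered in order of increasing score as in Figure 4, although the order of patterns within a score level may differ from the figure.

```python
from collections import defaultdict
from itertools import product

def rows_by_score(k):
    """Group the 2**k binary response patterns by their total score.

    Rows are numbered 1..2**k after sorting the patterns by score.
    """
    groups = defaultdict(list)
    patterns = sorted(product((0, 1), repeat=k), key=sum)
    for row, pattern in enumerate(patterns, start=1):
        groups[sum(pattern)].append(row)
    return dict(groups)

print(rows_by_score(3))
# {0: [1], 1: [2, 3, 4], 2: [5, 6, 7], 3: [8]}
```

Under Definition 3, the rows within each score level should have the same conditional distribution of Y, so each list with more than one row is a candidate set of collapsible categories.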
Stepwise exploratory Model Search
C1 ≠ C2 ≠ C3
(C1 = C2) ≠ C3        (C1 = C3) ≠ C2        (C2 = C3) ≠ C1
C1 = C2 = C3
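The five hypotheses for three categories are the five ways of partitioning {C1, C2, C3} into blocks of equal categories. A minimal sketch that enumerates such partitions for any number of categories is given below; it only lists the candidate models and does not implement the authors' search strategy. The function name is hypothetical.

```python
def partitions(items):
    """Yield every partition of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + part

hypotheses = list(partitions(["C1", "C2", "C3"]))
for h in hypotheses:
    print(h)
print(len(hypotheses))  # 5
```

The number of partitions grows quickly (15 for four categories, 52 for five), which is one reason a stepwise search is attractive compared with an exhaustive one.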
Multiple Comparisons:
Compare G groups with respect to some criterion, θ1, ..., θG
θi may be
• The mean of some variable (as in ANOVA)
• Odds ratio statistics (as in analysis of interaction in logistic regression)
• The distribution of a categorical variable (as in category collapsing)
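For the last case, where the criterion is the distribution of a categorical variable, pairs of groups can be compared with a likelihood-ratio statistic for homogeneity. The sketch below uses hypothetical counts and a hypothetical function name; it is an assumption about how such a comparison might be coded, not the procedure used in the slides. In large samples the statistic is usually referred to a chi-square distribution with (number of columns - 1) degrees of freedom, and a multiple-comparison adjustment would still be needed across pairs.

```python
from itertools import combinations
from math import log

def g2(row_a, row_b):
    """Likelihood-ratio statistic for homogeneity of two rows of counts."""
    total = sum(row_a) + sum(row_b)
    stat = 0.0
    for row in (row_a, row_b):
        n_row = sum(row)
        for obs, a, b in zip(row, row_a, row_b):
            expected = n_row * (a + b) / total
            if obs > 0:  # cells with zero count contribute nothing
                stat += 2 * obs * log(obs / expected)
    return stat

# Hypothetical counts for three groups over three categories.
counts = {
    "g1": [10, 20, 30],
    "g2": [5, 10, 5],
    "g3": [20, 40, 20],
}

pairwise = {
    (a, b): g2(counts[a], counts[b]) for a, b in combinations(counts, 2)
}
closest = min(pairwise, key=pairwise.get)
print(closest)  # ('g2', 'g3')
```

The pair with the smallest statistic is the natural first candidate for collapsing in a stepwise procedure.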
Correspondence analysis
Figure: correspondence analysis of party*year in relation to the left-right scale. Points plotted: Ø81; Ø99; SF81+SF90+SF99+Ø90; A90+A99+B99; A81+B81+B90+Q90+Q99+CD99; Q81+Z90+Z99; CD81+CD90+V81+Z81; C99+V99; C90+V90; C81+DF99.