Correspondence Analysis - Winona State University

Correspondence Analysis in R and JMP

Correspondence analysis (CA) is a dimension reduction technique for contingency tables (cross-tabulations) of nominal or ordinal variables. It is similar to principal components analysis for continuous variables. The data matrix for simple correspondence analysis is typically a two-way contingency table, but it could be any other table of non-negative ratio-scale data where relative values are of primary interest. A singular value decomposition (SVD) of a properly scaled matrix is used to achieve the dimension reduction, typically to k = 2 or k = 3 dimensions. Multiple correspondence analysis (MCA) handles higher-dimensional situations, such as a three-way contingency table. We will not dig too deep into the theory in this course, especially for MCA, but we will consider how the SVD is used to create the lower-dimensional representation of the data.

Correspondence Analysis in R

The data for this example are taken from a study of suicides in the former West Germany in the years 1974 to 1977, reported by Van der Heijden and de Leeuw (1985). Nine methods of suicide were tabulated by sex and age category. The primary interest here is in the variation of suicide patterns by age and by sex. The data can be regarded as a two-way contingency table, with the 34 age/sex categories forming the rows and the nine methods of suicide forming the columns.
> Suicide
      pois cookgas toxgas hang drown gun knife jump other
m1015    4       0      0  247     1  17     1    6     9
m1520  348       7     67  578    22 179    11   74   175
m2025  808      32    229  699    44 316    35  109   289
m2530  789      26    243  648    52 268    38  109   226
m3035  916      17    257  825    74 291    52  123   281
m3540 1118      27    313 1278    89 299    53   78   198
m4045  926      13    250 1273    89 299    53   78   198
m4550  855       9    203 1381    71 347    68  103   190
m5055  684      14    136 1282    87 229    62   63   146
m5560  502       6     77  972    49 151    46   66    77
m6065  516       5     74 1249    83 162    62   92   122
m6570  513       8     31 1360    75 164    56  115    95
m7075  425       5     21 1268    90 121    44  119    82
m7580  266       4      9  866    63  78    30   79    34
m8085  159       2      2  479    39  18    18   46    19
m8590   70       1      0  259    16  10     9   18    10
m90p    18       0      1   76     4   2     4    6     2
w1015   28       0      3   20     0   1     0   10     6
w1520  353       2     11   81     6  15     2   43    47
w2025  540       4     20  111    24   9     9   78    67
w2530  454       6     27  125    33  26     7   86    75
w3035  530       2     29  178    42  14    20   92    78
w3540  688       5     44  272    64  24    14   98   110
w4045  566       4     24  343    76  18    22  103    86
w4550  716       6     24  447    94  13    21   95    88
w5055  942       7     26  691   184  21    37  129   131
w5560  723       3     14  527   163  14    30   92    92
w6065  820       8      8  702   245  11    35  140   114
w6570  740       8      4  785   271   4    38  156    90
w7075  624       6      4  610   244   1    27  129    46
w7580  495       8      1  420   161   2    29  129    35
w8085  292       3      2  223    78   0    10   84    23
w8590  113       4      0   83    14   0     6   34     2
w90p    24       1      0   19     4   0     2    7     0

> dim(Suicide)
[1] 34  9

> suicide.prop = Suicide/sum(Suicide)
> suicide.prop
(shown here with the methods as rows and the age/sex categories as columns, men first and then women)
          m1015   m1520   m2025   m2530   m3035   m3540   m4045   m4550   m5055   m5560   m6065   m6570   m7075   m7580   m8085   m8590    m90p
pois    7.5e-05 6.6e-03 1.5e-02 1.5e-02 1.7e-02 2.1e-02 1.7e-02 1.6e-02 1.3e-02 9.5e-03 9.7e-03 9.7e-03 8.0e-03 5.0e-03 3.0e-03 1.3e-03 3.4e-04
cookgas 0.0e+00 1.3e-04 6.0e-04 4.9e-04 3.2e-04 5.1e-04 2.4e-04 1.7e-04 2.6e-04 1.1e-04 9.4e-05 1.5e-04 9.4e-05 7.5e-05 3.8e-05 1.9e-05 0.0e+00
toxgas  0.0e+00 1.3e-03 4.3e-03 4.6e-03 4.8e-03 5.9e-03 4.7e-03 3.8e-03 2.6e-03 1.4e-03 1.4e-03 5.8e-04 4.0e-04 1.7e-04 3.8e-05 0.0e+00 1.9e-05
hang    0.00465 0.01088 0.01316 0.01220 0.01553 0.02406 0.02397 0.02600 0.02414 0.01830 0.02352 0.02561 0.02388 0.01631 0.00902 0.00488 0.00143
drown   1.9e-05 4.1e-04 8.3e-04 9.8e-04 1.4e-03 1.7e-03 1.7e-03 1.3e-03 1.6e-03 9.2e-04 1.6e-03 1.4e-03 1.7e-03 1.2e-03 7.3e-04 3.0e-04 7.5e-05
gun     3.2e-04 3.4e-03 6.0e-03 5.0e-03 5.5e-03 5.6e-03 5.6e-03 6.5e-03 4.3e-03 2.8e-03 3.1e-03 3.1e-03 2.3e-03 1.5e-03 3.4e-04 1.9e-04 3.8e-05
knife   1.9e-05 2.1e-04 6.6e-04 7.2e-04 9.8e-04 1.0e-03 1.0e-03 1.3e-03 1.2e-03 8.7e-04 1.2e-03 1.1e-03 8.3e-04 5.6e-04 3.4e-04 1.7e-04 7.5e-05
jump    0.00011 0.00139 0.00205 0.00205 0.00232 0.00147 0.00147 0.00194 0.00119 0.00124 0.00173 0.00217 0.00224 0.00149 0.00087 0.00034 0.00011
other   1.7e-04 3.3e-03 5.4e-03 4.3e-03 5.3e-03 3.7e-03 3.7e-03 3.6e-03 2.7e-03 1.4e-03 2.3e-03 1.8e-03 1.5e-03 6.4e-04 3.6e-04 1.9e-04 3.8e-05

          w1015   w1520   w2025   w2530   w3035   w3540   w4045   w4550   w5055   w5560   w6065   w6570   w7075   w7580   w8085   w8590    w90p
pois    5.3e-04 6.6e-03 1.0e-02 8.5e-03 1.0e-02 1.3e-02 1.1e-02 1.3e-02 1.8e-02 1.4e-02 1.5e-02 1.4e-02 1.2e-02 9.3e-03 5.5e-03 2.1e-03 4.5e-04
cookgas 0.0e+00 3.8e-05 7.5e-05 1.1e-04 3.8e-05 9.4e-05 7.5e-05 1.1e-04 1.3e-04 5.6e-05 1.5e-04 1.5e-04 1.1e-04 1.5e-04 5.6e-05 7.5e-05 1.9e-05
toxgas  5.6e-05 2.1e-04 3.8e-04 5.1e-04 5.5e-04 8.3e-04 4.5e-04 4.5e-04 4.9e-04 2.6e-04 1.5e-04 7.5e-05 7.5e-05 1.9e-05 3.8e-05 0.0e+00 0.0e+00
hang    0.00038 0.00153 0.00209 0.00235 0.00335 0.00512 0.00646 0.00842 0.01301 0.00992 0.01322 0.01478 0.01149 0.00791 0.00420 0.00156 0.00036
drown   0.0e+00 1.1e-04 4.5e-04 6.2e-04 7.9e-04 1.2e-03 1.4e-03 1.8e-03 3.5e-03 3.1e-03 4.6e-03 5.1e-03 4.6e-03 3.0e-03 1.5e-03 2.6e-04 7.5e-05
gun     1.9e-05 2.8e-04 1.7e-04 4.9e-04 2.6e-04 4.5e-04 3.4e-04 2.4e-04 4.0e-04 2.6e-04 2.1e-04 7.5e-05 1.9e-05 3.8e-05 0.0e+00 0.0e+00 0.0e+00
knife   0.0e+00 3.8e-05 1.7e-04 1.3e-04 3.8e-04 2.6e-04 4.1e-04 4.0e-04 7.0e-04 5.6e-04 6.6e-04 7.2e-04 5.1e-04 5.5e-04 1.9e-04 1.1e-04 3.8e-05
jump    0.00019 0.00081 0.00147 0.00162 0.00173 0.00185 0.00194 0.00179 0.00243 0.00173 0.00264 0.00294 0.00243 0.00243 0.00158 0.00064 0.00013
other   1.1e-04 8.9e-04 1.3e-03 1.4e-03 1.5e-03 2.1e-03 1.6e-03 1.7e-03 2.5e-03 1.7e-03 2.1e-03 1.7e-03 8.7e-04 6.6e-04 4.3e-04 3.8e-05 0.0e+00

> suicide.condprop = Suicide/apply(Suicide,1,sum)
> suicide.condprop
(shown here with the methods as rows and the age/sex categories as columns, men first and then women)
            m1015    m1520    m2025    m2530    m3035    m3540    m4045    m4550    m5055    m5560    m6065    m6570    m7075    m7580    m8085    m8590     m90p
pois        0.014    0.238    0.316    0.329    0.323    0.324    0.291    0.265    0.253    0.258    0.218    0.212    0.195    0.186    0.203    0.178    0.159
cookgas   0.00000  0.00479  0.01250  0.01084  0.00599  0.00782  0.00409  0.00279  0.00518  0.00308  0.00211  0.00331  0.00230  0.00280  0.00256  0.00254  0.00000
toxgas   0.000000 0.045859 0.089418 0.101292 0.090621 0.090646 0.078641 0.062907 0.050314 0.039568 0.031290 0.012826 0.009655 0.006298 0.002558 0.000000 0.008850
hang        0.867    0.396    0.273    0.270    0.291    0.370    0.400    0.428    0.474    0.499    0.528    0.563    0.583    0.606    0.613    0.659    0.673
drown     0.00351  0.01506  0.01718  0.02168  0.02609  0.02577  0.02800  0.02200  0.03219  0.02518  0.03510  0.03103  0.04138  0.04409  0.04987  0.04071  0.03540
gun      0.059649 0.122519 0.123389 0.111713 0.102609 0.086591 0.094055 0.107530 0.084721 0.077595 0.068499 0.067853 0.055632 0.054584 0.023018 0.025445 0.017699
knife     0.00351  0.00753  0.01367  0.01584  0.01834  0.01535  0.01667  0.02107  0.02294  0.02364  0.02622  0.02317  0.02023  0.02099  0.02302  0.02290  0.03540
jump       0.0211   0.0507   0.0426   0.0454   0.0434   0.0226   0.0245   0.0319   0.0233   0.0339   0.0389   0.0476   0.0547   0.0553   0.0588   0.0458   0.0531
other     0.03158  0.11978  0.11285  0.09421  0.09908  0.05734  0.06228  0.05888  0.05401  0.03957  0.05159  0.03930  0.03770  0.02379  0.02430  0.02545  0.01770

            w1015    w1520    w2025    w2530    w3035    w3540    w4045    w4550    w5055    w5560    w6065    w6570    w7075    w7580    w8085    w8590     w90p
pois        0.412    0.630    0.626    0.541    0.538    0.522    0.456    0.476    0.435    0.436    0.394    0.353    0.369    0.387    0.408    0.441    0.421
cookgas   0.00000  0.00357  0.00464  0.00715  0.00203  0.00379  0.00322  0.00399  0.00323  0.00181  0.00384  0.00382  0.00355  0.00625  0.00420  0.01562  0.01754
toxgas   0.044118 0.019643 0.023202 0.032181 0.029442 0.033359 0.019324 0.015957 0.011993 0.008444 0.003841 0.001908 0.002365 0.000781 0.002797 0.000000 0.000000
hang        0.294    0.145    0.129    0.149    0.181    0.206    0.276    0.297    0.319    0.318    0.337    0.375    0.361    0.328    0.312    0.324    0.333
drown     0.00000  0.01071  0.02784  0.03933  0.04264  0.04852  0.06119  0.06250  0.08487  0.09831  0.11762  0.12929  0.14429  0.12578  0.10909  0.05469  0.07018
gun      0.014706 0.026786 0.010441 0.030989 0.014213 0.018196 0.014493 0.008644 0.009686 0.008444 0.005281 0.001908 0.000591 0.001563 0.000000 0.000000 0.000000
knife     0.00000  0.00357  0.01044  0.00834  0.02030  0.01061  0.01771  0.01396  0.01707  0.01809  0.01680  0.01813  0.01597  0.02266  0.01399  0.02344  0.03509
jump       0.1471   0.0768   0.0905   0.1025   0.0934   0.0743   0.0829   0.0632   0.0595   0.0555   0.0672   0.0744   0.0763   0.1008   0.1175   0.1328   0.1228
other     0.08824  0.08393  0.07773  0.08939  0.07919  0.08340  0.06924  0.05851  0.06042  0.05549  0.05473  0.04294  0.02720  0.02734  0.03217  0.00781  0.00000

> suicide.rowprop = apply(suicide.prop,1,sum)
> suicide.rowprop
 m1015  m1520  m2025  m2530  m3035  m3540  m4045  m4550  m5055  m5560  m6065  m6570  m7075  m7580  m8085  m8590
0.0054 0.0275 0.0482 0.0452 0.0534 0.0650 0.0599 0.0608 0.0509 0.0366 0.0445 0.0455 0.0410 0.0269 0.0147 0.0074
  m90p  w1015  w1520  w2025  w2530  w3035  w3540  w4045  w4550  w5055  w5560  w6065  w6570  w7075  w7580  w8085
0.0021 0.0013 0.0105 0.0162 0.0158 0.0185 0.0248 0.0234 0.0283 0.0408 0.0312 0.0392 0.0395 0.0318 0.0241 0.0135
 w8590   w90p
0.0048 0.0011

> barplot(suicide.rowprop,xlab="Gender-Age")

> suicide.colprop = apply(suicide.prop,2,sum)
> suicide.colprop
   pois cookgas  toxgas    hang   drown     gun   knife    jump   other
0.33075 0.00476 0.04056 0.38370 0.04992 0.05882 0.01791 0.05252 0.06107

> barplot(suicide.colprop,xlab="Method Utilized")
> suicide.mat = as.matrix(Suicide)
> mosaicplot(suicide.mat,color=T,main="Suicides in West Germany")

The mosaic plot gives the breakdown of method chosen within each age/sex category. As some of the age/sex categories have very few people in them, the results are a bit hard to read. However, we can see some general trends with age and differences in methods chosen across gender. We will now see how correspondence analysis can be used to visualize the relationships between age/sex and method used.
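Before moving on, the proportion calculations used above can be checked on a small table. The sketch below uses a made-up 3 x 2 table (the counts and the object names tab, joint, row.prof, row.mass and col.mass are illustrative assumptions, not part of the suicide analysis) and base R's prop.table(), rowSums() and colSums() as an alternative way to write the same steps.

```r
# Toy 3 x 2 contingency table (made-up counts, for illustration only)
tab <- matrix(c(10, 20,
                30, 40,
                25, 75),
              nrow = 3, byrow = TRUE,
              dimnames = list(group = c("g1", "g2", "g3"),
                              method = c("a", "b")))

joint    <- prop.table(tab)              # same as tab / sum(tab)
row.prof <- prop.table(tab, margin = 1)  # same as tab / apply(tab, 1, sum)
row.mass <- rowSums(joint)               # same as apply(joint, 1, sum)
col.mass <- colSums(joint)               # same as apply(joint, 2, sum)

# Each row profile sums to 1, and the masses sum to 1
rowSums(row.prof)
sum(row.mass)
sum(col.mass)

# The mass-weighted average of the row profiles is the column mass vector
avg.prof <- colSums(row.prof * row.mass)
avg.prof
```

The last step shows why the column proportions can be read as the "average" row profile: correspondence analysis measures how far each row's conditional proportions sit from that average.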
Correspondence Analysis by Direct Computation

The mathematics behind this method of summarizing a contingency table involves performing an SVD of the "residuals" from the chi-square test of independence. With n the total count and $\hat{p}_{ij} = O_{ij}/n$,

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}} = n\sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(\hat{p}_{ij}-\hat{p}_{i\cdot}\,\hat{p}_{\cdot j})^2}{\hat{p}_{i\cdot}\,\hat{p}_{\cdot j}} \sim \chi^2_{(r-1)\times(c-1)}$$

> chisq.test(Suicide)

        Pearson's Chi-squared test

data:  Suicide
X-squared = 10061, df = 264, p-value < 2.2e-16

Here is the general idea. In terms of proportions, the matrix C has elements

$$c_{ij} = \frac{p_{ij}-p_{i\cdot}\,p_{\cdot j}}{\sqrt{p_{i\cdot}\,p_{\cdot j}}}$$

so that $c_{ij}^2$ is the contribution of cell (i, j) to $\chi^2/n$. The dimension of C for the suicide data is 34 rows and 9 columns. In order to form the elements of C (called E in the code) we need to form the following matrices in R:

> suicide.prop = as.matrix(Suicide/sum(Suicide))
> rdiag = diag(1/sqrt(suicide.rowprop))
> cdiag = diag(1/sqrt(suicide.colprop))
> suicide.rowtot = as.matrix(suicide.rowprop)
> suicide.coltot = as.matrix(suicide.colprop)
> E = rdiag%*%(suicide.prop - suicide.rowtot%*%t(suicide.coltot))%*%cdiag
> E
(each line below lists one column of E, for rows [1,] through [34,])
[,1] -0.04034 -0.02669 -0.00582 -0.00069 -0.00312 -0.00309 -0.01679 -0.02820 -0.03048 -0.02423 -0.04130 -0.04396 -0.04763 -0.04124 -0.02689 -0.02283 -0.01375 0.00504 0.05350 0.06551 0.04598 0.04910 0.05230 0.03323 0.04252 0.03645 0.03236 0.02167 0.00771 0.01187 0.01511 0.01567 0.01336 0.00514
[,2] -5.1e-03 6.6e-05 2.5e-02 1.9e-02 4.1e-03 1.1e-02 -2.4e-03 -7.1e-03 1.4e-03 -4.7e-03 -8.1e-03 -4.5e-03 -7.2e-03 -4.7e-03 -3.9e-03 -2.8e-03 -3.2e-03 -2.5e-03 -1.8e-03 -2.3e-04 4.3e-03 -5.4e-03 -2.2e-03 -3.4e-03 -1.9e-03 -4.5e-03 -7.6e-03 -2.6e-03 -2.7e-03 -3.1e-03 3.3e-03 -9.6e-04 1.1e-02 6.1e-03
[,3] -0.01475 0.00436 0.05327 0.06409 0.05744 0.06342 0.04626 0.02735 0.01093 -0.00094 -0.00971 -0.02938 -0.03105 -0.02791 -0.02290 -0.01732 -0.00726 0.00063 -0.01067 -0.01098 -0.00523 -0.00752 -0.00563 -0.01613 -0.02056 -0.02866 -0.02818 -0.03611 -0.03813 -0.03384 -0.03066 -0.02176 -0.01398 -0.00660
[,4] 0.0571 0.0032 -0.0393 -0.0390 -0.0346 -0.0056 0.0066 0.0176 0.0330 0.0358 0.0492 0.0616 0.0651 0.0589 0.0448 0.0382 0.0215 -0.0052 -0.0396 -0.0524 -0.0476 -0.0446 -0.0452 -0.0265 -0.0235 -0.0212 -0.0188 -0.0149 -0.0029 -0.0066 -0.0139 -0.0135 -0.0067 -0.0027
[,5] -1.5e-02 -2.6e-02 -3.2e-02 -2.7e-02 -2.5e-02 -2.8e-02 -2.4e-02 -3.1e-02 -1.8e-02 -2.1e-02 -1.4e-02 -1.8e-02 -7.7e-03 -4.3e-03 -2.5e-05 -3.5e-03 -3.0e-03 -8.0e-03 -1.8e-02 -1.3e-02 -6.0e-03 -4.4e-03 -9.9e-04 7.7e-03 9.5e-03 3.2e-02 3.8e-02 6.0e-02 7.1e-02 7.5e-02 5.3e-02 3.1e-02 1.5e-03 3.0e-03
[,6] 0.00025 0.04356 0.05846 0.04635 0.04172 0.02919 0.03554 0.04950 0.02409 0.01481 0.00842 0.00794 -0.00266 -0.00287 -0.01791 -0.01184 -0.00782 -0.00651 -0.01356 -0.02542 -0.01443 -0.02505 -0.02640 -0.02795 -0.03482 -0.04093 -0.03670 -0.04372 -0.04662 -0.04284 -0.03665 -0.02814 -0.01684 -0.00795
[,7] -0.00788 -0.01286 -0.00696 -0.00328 0.00074 -0.00487 -0.00226 0.00583 0.00848 0.00820 0.01310 0.00839 0.00351 0.00378 0.00463 0.00321 0.00603 -0.00479 -0.01100 -0.00711 -0.00898 0.00244 -0.00859 -0.00022 -0.00496 -0.00127 0.00025 -0.00163 0.00033 -0.00259 0.00551 -0.00340 0.00287 0.00421
[,8] -0.01006 -0.00135 -0.00954 -0.00657 -0.00922 -0.03330 -0.02987 -0.02216 -0.02876 -0.01554 -0.01254 -0.00460 0.00194 0.00198 0.00334 -0.00252 0.00012 0.01476 0.01087 0.02111 0.02742 0.02430 0.01498 0.02030 0.00782 0.00616 0.00229 0.01270 0.01899 0.01851 0.03270 0.03289 0.02433 0.01005
[,9] -0.00874 0.03941 0.04602 0.02850 0.03555 -0.00384 0.00121 -0.00218 -0.00644 -0.01665 -0.00810 -0.01879 -0.01913 -0.02474 -0.01806 -0.01240 -0.00810 0.00393 0.00950 0.00859 0.01441 0.00999 0.01424 0.00506 -0.00174 -0.00052 -0.00399 -0.00508 -0.01457 -0.02445 -0.02119 -0.01357 -0.01496 -0.00810

> suicide.sva = svd(E)
> suicide.sva
$d
[1] 3.1e-01 2.7e-01 1.0e-01 7.1e-02 5.1e-02 3.1e-02 2.6e-02 2.4e-02 4.9e-17

$u
(each line below lists one column of u, for rows [1,] through [34,])
[,1] -0.138 -0.163 -0.179 -0.151 -0.144 -0.182 -0.201 -0.235 -0.185 -0.141 -0.152 -0.144 -0.114 -0.095 -0.032 -0.046 -0.028 0.026 0.147 0.216 0.169 0.190 0.192 0.173 0.193 0.227 0.217 0.254 0.247 0.255 0.235 0.182 0.096 0.045
[,2] -0.214 0.079 0.329 0.313 0.283 0.173 0.098 0.034 -0.072 -0.111 -0.187 -0.262 -0.298 -0.278 -0.226 -0.178 -0.100 0.026 0.169 0.197 0.181 0.151 0.166 0.053 0.038 -0.017 -0.035 -0.099 -0.176 -0.166 -0.109 -0.056 -0.023 -0.017
[,3] -0.1394 -0.0757 0.1278 0.1747 0.1479 0.1457 0.1167 0.0105 0.0240 -0.1105 -0.0756 -0.2032 -0.1514 -0.1155 -0.1065 -0.0979 -0.0556 -0.1106 -0.3294 -0.3326 -0.1998 -0.2253 -0.1858 -0.1347 -0.1544 0.0018 0.0771 0.2271 0.3165 0.3736 0.1958 0.0486 -0.1170 -0.0291
[,4] 0.0413 0.5724 0.3399 0.0864 0.1202 -0.4865 -0.2722 -0.0738 -0.1595 -0.1643 0.0139 0.0911 0.1382 0.0606 -0.0092 -0.0266 -0.0350 0.0907 -0.0714 -0.0854 0.1374 -0.0026 -0.0445 0.0404 -0.1552 -0.0915 -0.1144 0.0607 0.0995 0.0043 0.1062 0.1305 0.0262 -0.0017
[,5] -0.1164 -0.3123 0.0012 0.2368 0.0014 0.1409 -0.0406 0.0104 -0.1209 0.0372 -0.0703 0.0372 0.0593 0.1386 0.0921 0.0011 0.0547 0.1380 -0.1034 0.0294 0.1330 0.0818 -0.1028 0.0177 -0.1358 -0.2722 -0.3073 -0.2209 -0.0545 0.0413 0.3374 0.3165 0.4368 0.2091
[,6] -0.43758 0.00765 -0.08003 -0.13647 -0.06274 -0.31954 -0.02373 0.52413 0.14021 0.23640 0.06487 0.17603 -0.09030 -0.00925 -0.25406 -0.22181 -0.09727 -0.20771 0.10104 0.07245 -0.02952 0.09283 -0.15594 -0.04242 -0.09728 -0.04405 0.10523 -0.04252 -0.08752 0.05583 0.16690 -0.00084 0.01442 0.05107
[,7] 0.1539 0.1566 0.2709 -0.0063 -0.4603 0.2057 -0.0313 -0.0240 0.0813 0.0291 -0.3393 0.1047 -0.0064 0.1068 -0.1270 -0.0604 -0.2136 -0.1640 0.2820 0.1003 0.0893 -0.3897 -0.0633 -0.2173 0.1162 -0.0122 -0.0171 0.0378 -0.0331 0.1614 0.0060 0.0045 0.1944 0.0649
[,8] 0.2091 0.2196 -0.3765 -0.1096 0.0439 0.0637 0.3168 0.2999 -0.2833 -0.0318 -0.1901 -0.1379 0.0797 0.0786 -0.0768 -0.1129 -0.0957 0.2407 0.1321 0.0221 0.0823 -0.0101 0.0689 -0.0313 -0.0831 -0.1679 -0.0742 -0.0956 0.0027 0.2438 -0.0443 0.3027 -0.2209 -0.2030
[,9] 0.290 -0.182 0.051 0.035 0.064 -0.247 -0.016 0.066 -0.134 0.357 -0.237 -0.124 -0.282 0.081 0.036 0.031 -0.042 -0.040 -0.019 -0.180 -0.026 -0.061 -0.114 0.211 0.172 -0.310 0.086 0.410 -0.180 -0.229 -0.037 0.108 -0.035 0.078

$v
        [,1]   [,2]   [,3]  [,4]   [,5]   [,6]   [,7]   [,8]  [,9]
[1,]   0.486  0.380 -0.290 -0.37 -0.115  0.154  0.180  0.023 0.575
[2,]  -0.026  0.097  0.119  0.10  0.322 -0.275  0.568 -0.677 0.069
[3,]  -0.346  0.458  0.383 -0.36  0.314 -0.376 -0.295  0.167 0.201
[4,]  -0.359 -0.633 -0.153 -0.10 -0.054 -0.221  0.026  0.031 0.619
[5,]   0.416 -0.242  0.824  0.10 -0.131  0.076  0.042  0.073 0.223
[6,]  -0.509  0.252  0.144  0.28  0.013  0.620  0.325  0.173 0.243
[7,]  -0.030 -0.075  0.037 -0.11  0.157  0.470 -0.584 -0.613 0.134
[8,]   0.278 -0.016 -0.175  0.55  0.669 -0.043 -0.185  0.241 0.229
[9,]  -0.060  0.329 -0.032  0.56 -0.541 -0.313 -0.273 -0.207 0.247

> delta = suicide.sva$d
> U = suicide.sva$u[,1:2]
> V = suicide.sva$v[,1:2]
> U[,1] = delta[1]*U[,1]/sqrt(suicide.rowtot)
> U[,2] = delta[2]*U[,2]/sqrt(suicide.rowtot)
> V[,1] = delta[1]*V[,1]/sqrt(suicide.coltot)
> V[,2] = delta[2]*V[,2]/sqrt(suicide.coltot)
> CA = rbind(U,V)
> inertia = sum(delta^2)
> inertia
[1] 0.19
> per1 = delta[1]^2/inertia
> per1
[1] 0.52
> per2 = delta[2]^2/inertia
> per2
[1] 0.38

The total inertia is the sum of the squared singular values and equals the chi-square statistic divided by the total count n. The first two dimensions account for roughly 52% and 38% of it.

> options(digits=5)
> plot(CA[,1],CA[,2],type="n",xlab = paste("coord 1% inertia =",format(per1*100)),
+      ylab = paste("coord 2% inertia =",format(per2*100)))
> text(CA[,1],CA[,2],labels=c(dimnames(Suicide)[[1]],dimnames(Suicide)[[2]]))
> abline(h=0,lty=2)
> abline(v=0,lty=2)
> title(main="Correspondence Analysis for West German Suicides (1974 - 1977)")

General Function for Conducting Correspondence Analysis (corresp)

Here is a function that performs all of the above operations given a two-way contingency table as input.
You can change the title to whatever you would like it to be.

> corresp = function(x,title="Correspondence Analysis") {
    # correspondence matrix and its row and column masses
    x.prop <- as.matrix(x/sum(x))
    x.row <- apply(x.prop,1,sum)
    x.col <- apply(x.prop,2,sum)
    rdiag <- diag(1/sqrt(x.row))
    cdiag <- diag(1/sqrt(x.col))
    x.row <- as.matrix(x.row)
    x.col <- as.matrix(x.col)
    # standardized residuals and their SVD
    E <- rdiag%*%(x.prop - x.row%*%t(x.col))%*%cdiag
    x.sva <- svd(E)
    delta <- x.sva$d
    # coordinates on the first two dimensions
    U <- x.sva$u[,1:2]
    V <- x.sva$v[,1:2]
    U[,1] <- delta[1]*U[,1]/sqrt(x.row)
    U[,2] <- delta[2]*U[,2]/sqrt(x.row)
    V[,1] <- delta[1]*V[,1]/sqrt(x.col)
    V[,2] <- delta[2]*V[,2]/sqrt(x.col)
    U <- rbind(U,V)
    # percent of total inertia carried by each of the two dimensions
    inertia <- sum(delta[delta>0]*delta[delta>0])
    per1 <- (delta[1]*delta[1]/inertia)*100
    per2 <- (delta[2]*delta[2]/inertia)*100
    # rows 1..dim1 of U are the row categories, rows ds..dt the column categories
    dim1 <- dim(x)[1]
    ds <- as.integer(dim1+1)
    dim2 <- dim(x)[2]
    dt <- dim1 + dim2
    plot(U[,1],U[,2],type="n",xlab=paste("coord 1 - % inertia =",format(per1)),
         ylab=paste("coord 2 - % inertia =",format(per2)))
    text(U[1:dim1,1],U[1:dim1,2],labels=dimnames(x)[[1]],col=2)
    text(U[ds:dt,1],U[ds:dt,2],labels=dimnames(x)[[2]],col=4)
    abline(h=0,v=0,lty=2)
    title(title)
  }

> corresp(Suicide,title="Correspondence Analysis of Suicides in West Germany (1974-1977)")

Correspondence Analysis in JMP

Right-click on the cells in the mosaic plot and select Cell Labeling > Show Percents.

SVD details

3-D Correspondence Analysis - uses JMP 12 (not released yet)

Multiple Correspondence Analysis (MCA) in R (will be available in JMP 12)

Not all tables are two dimensional. As a simple example, consider the following data dealing with graduate school admissions at the University of California, Berkeley. The categorical variables are the gender of the applicant, the program they are applying to, and whether or not they were admitted to the program. Thus we have three categorical/nominal variables to consider in our analysis, and they can be stored in a data array.
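UCBAdmissions ships with R's datasets package as a three-way table, so its array structure can be inspected before printing it in full. A brief sketch (the object name dept.totals is ours, chosen for illustration):

```r
# UCBAdmissions is a built-in 3-way table: Admit x Gender x Dept
dim(UCBAdmissions)               # three dimensions of sizes 2, 2 and 6
names(dimnames(UCBAdmissions))   # names of the three categorical variables
sum(UCBAdmissions)               # total number of applicants

# Collapse over Admit and Gender to get the number of applicants per department
dept.totals <- apply(UCBAdmissions, 3, sum)
dept.totals
```

The second argument of apply() names the dimension(s) to keep; every other dimension is summed over.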
> UCBAdmissions
, , Dept = A

          Gender
Admit      Male Female
  Admitted  512     89
  Rejected  313     19

, , Dept = B

          Gender
Admit      Male Female
  Admitted  353     17
  Rejected  207      8

, , Dept = C

          Gender
Admit      Male Female
  Admitted  120    202
  Rejected  205    391

, , Dept = D

          Gender
Admit      Male Female
  Admitted  138    131
  Rejected  279    244

, , Dept = E

          Gender
Admit      Male Female
  Admitted   53     94
  Rejected  138    299

, , Dept = F

          Gender
Admit      Male Female
  Admitted   22     24
  Rejected  351    317

[Figure: Sexual discrimination at UCB?]

Aggregated over departments, about 45% of male applicants were admitted compared with about 30% of female applicants, so it appears that there is gender discrimination in graduate school admissions at UCB. However, there is a third dimension that is being ignored here, namely the programs (departments) students are applying to.

If we take the department applied to into account, this is what we see. Conditioning on the program applied to, it is not clear there is gender discrimination. A higher percentage of female applicants are admitted in 4 of the 6 programs! This is an example of what is commonly referred to as Simpson's Paradox.

Multiple correspondence analysis will allow us to construct a 2-D display, which represents a dimension reduction, showing the relationship between the three categorical variables in these data.
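The reversal described above can be checked directly from the built-in UCBAdmissions array. The sketch below compares admission rates by gender with and without conditioning on department; the object names (agg, agg.rate, admitted, applied, dept.rate, female.higher) are illustrative and not part of the original analysis.

```r
# Aggregate over departments: a 2 x 2 Admit x Gender table
agg <- apply(UCBAdmissions, c(1, 2), sum)
agg.rate <- agg["Admitted", ] / colSums(agg)    # overall admission rate by gender

# Condition on department: admission rate by gender within each department
admitted <- UCBAdmissions["Admitted", , ]       # Gender x Dept counts of admits
applied  <- apply(UCBAdmissions, c(2, 3), sum)  # Gender x Dept counts of applicants
dept.rate <- admitted / applied

round(agg.rate, 3)
round(dept.rate, 3)

# Departments in which the female admission rate exceeds the male rate
female.higher <- names(which(dept.rate["Female", ] > dept.rate["Male", ]))
female.higher
```

Overall the male rate is higher, yet within departments A, B, D and F the female rate is higher. The aggregate gap arises largely because women applied in greater numbers to the departments with low admission rates.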
> library(ca)
> ucb.mca = mjca(UCBAdmissions)
> summary(ucb.mca)

Principal inertias (eigenvalues):

 dim    value      %    cum%   scree plot
 1      0.114945  80.5  80.5   ************************
 2      0.005694   4.0  84.5   *
 3      00000000   0.0  84.5
 4      00000000   0.0  84.5
 5      00000000   0.0  84.5
        -------- -----
 Total: 0.142840

Columns:
     name             mass  qlt  inr |  k=1 cor ctr |  k=2 cor ctr |
 1 | Admit:Admitted |  129  911   93 |  365 875 150 |   74  36 123 |
 2 | Admit:Rejected |  204  911   59 | -231 875  95 |  -47  36  78 |
 3 | Gender:Female  |  135  863   95 | -399 845 187 |   59  19  84 |
 4 | Gender:Male    |  198  863   65 |  272 845 127 |  -40  19  57 |
 5 | Dept:A         |   69  838  117 |  512 837 156 |   13   1   2 |
 6 | Dept:B         |   43  829  124 |  573 824 123 |  -45   5  15 |
 7 | Dept:C         |   68  731  108 | -270 594  43 |  130 137 199 |
 8 | Dept:D         |   58  832  106 | -110 828   6 |    7   3   0 |
 9 | Dept:E         |   43  812  117 | -384 787  55 |   69  25  35 |
10 | Dept:F         |   53  737  116 | -355 547  58 | -210 190 406 |

How does this plot help explain the apparent admission bias?

More examples of MCA

The next example uses an international survey on attitudes towards science. The data consist of four Likert-scale items A-D (1 = agree strongly, ..., 5 = disagree strongly) and demographic information for the respondents (sex, age, and education level).

> head(wg93)   # this data set is in the ca library; notice there is a line for each respondent!
  A B C D sex age edu
1 2 3 4 3   2   2   3
2 3 4 2 3   1   3   4
3 2 3 2 4   2   3   2
4 2 2 2 2   1   2   3
5 3 3 3 3   1   5   2
6 3 4 4 5   1   3   2
> ?wg93
> dim(wg93)
[1] 871   7
> wg93.mca = mjca(wg93)
> plot(wg93.mca)
> summary(wg93.mca)

Principal inertias (eigenvalues):

 dim    value      %    cum%   scree plot
 1      0.028924  36.5  36.5   ************
 2      0.017038  21.5  58.1   *******
 3      0.005828   7.4  65.4   **
 4      0.004047   5.1  70.6   **
 5      0.002142   2.7  73.3   *
 6      0.000845   1.1  74.3
 7      0.000676   0.9  75.2
 8      0.000383   0.5  75.7
 9      0.000283   0.4  76.0
 10     8.8e-050   0.1  76.1
 11     2.9e-050   0.0  76.2
 12     1e-06000   0.0  76.2
        -------- -----
 Total: 0.079144

Columns:
     name    mass  qlt  inr |  k=1 cor ctr |  k=2 cor ctr |
 1 | A:1   |   20  822   33 |  306 554  63 |  213 268  52 |
 2 | A:2   |   53  639   23 |  135 627  33 |  -18  12   1 |
 3 | A:3   |   33  719   29 |  -57  46   4 | -218 673  94 |
 4 | A:4   |   29  658   30 | -243 622  60 |   59  36   6 |
 5 | A:5   |    8  727   37 | -519 540  73 |  306 188  43 |
 6 | B:1   |   12  768   38 |  464 480  87 |  360 288  88 |
 7 | B:2   |   29  683   29 |  201 576  40 |  -87 107  13 |
 8 | B:3   |   34  648   27 |  104 247  13 | -133 401  35 |
 9 | B:4   |   46  564   25 | -144 465  33 |  -66  99  12 |
10 | B:5   |   23  686   36 | -350 450  97 |  253 236  86 |
11 | C:1   |   25  754   36 |  358 494 110 |  260 260  99 |
12 | C:2   |   52  569   23 |   93 281  15 |  -94 289  27 |
13 | C:3   |   32  566   29 |  -93 111  10 | -189 455  67 |
14 | C:4   |   25  657   32 | -285 640  71 |   47  17   3 |
15 | C:5   |    9  686   37 | -414 365  51 |  389 321  76 |
16 | D:1   |   10  656   34 |  169 132  10 |  337 524  65 |
17 | D:2   |   38   85   26 |  -21  14   1 |  -48  71   5 |
18 | D:3   |   33  504   29 |  -15   3   0 | -189 501  70 |
19 | D:4   |   37   74   26 |  -31  42   1 |  -27  32   2 |
20 | D:5   |   25  644   30 |   32  12   1 |  234 632  80 |
21 | sex:1 |   70  625   18 |  -85 501  17 |  -42 124   7 |
22 | sex:2 |   73  625   18 |   82 501  17 |   41 124   7 |
23 | age:1 |   15  108   33 | -138 101  10 |   36   7   1 |
24 | age:2 |   34  364   27 | -104 281  13 |   57  84   7 |
25 | age:3 |   26   55   28 |  -43  54   2 |   -5   1   0 |
26 | age:4 |   24  125   28 |  -18  10   0 |  -60 115   5 |
27 | age:5 |   20  166   30 |   69  96   3 |  -58  69   4 |
28 | age:6 |   23  611   31 |  249 610  50 |   11   1   0 |
29 | edu:1 |    6   64   34 |  131  62   4 |   21   2   0 |
30 | edu:2 |   62  586   22 |  148 557  47 |  -34  29   4 |
31 | edu:3 |   40  255   25 |  -54 120   4 |  -57 136   8 |
32 | edu:4 |   15  386   34 | -237 266  30 |  160 121  23 |
33 | edu:5 |    8  471   32 | -204 374  12 |  104  97   5 |
34 | edu:6 |   11  500   32 | -222 441  19 |   81  60   4 |

> attributes(wg93.mca)
$names
 [1] "sv"         "lambda"     "inertia.e"  "inertia.t"  "inertia.et" "levelnames" "factors"
 [8] "levels.n"   "nd"         "nd.max"     "rownames"   "rowmass"    "rowdist"    "rowinertia"
[15] "rowcoord"   "rowpcoord"  "rowctr"     "rowcor"     "colnames"   "colmass"    "coldist"
[22] "colinertia" "colcoord"   "colpcoord"  "colctr"     "colcor"     "colsup"     "subsetcol"
[29] "Burt"       "Burt.upd"   "subinertia" "JCA.iter"   "indmat"     "call"

$class
[1] "mjca"

> library(rgl)
> plot3d(wg93.mca$rowcoord[,1:3],type="n",xlab="Dim1",ylab="Dim2",zlab="Dim3")
> text3d(wg93.mca$rowcoord[,1:3],texts=wg93.mca$levelnames,col="blue")
