Power comparison of nonparametric test for latent root of covariance matrix in two populations

Shin-ichi Tsukada
Faculty of Physical Sciences and Engineering, Meisei University, 2-1-1 Hodokubo, Hino, Tokyo 191-8506, Japan. tsukada@ge.meisei-u.ac.jp

1 Introduction

We compare the actual significance levels and the powers of nonparametric tests of the hypothesis that the α-th largest latent roots of the covariance matrices of two populations are equal. In principal component analysis (PCA), the α-th largest latent root of the covariance matrix represents the contribution of the α-th principal component. Although there are many books on PCA, tests of the hypothesis that a latent root is the same in two populations have hardly been discussed. Sugiyama and Ushizawa [SU98] proposed a procedure that applies the Ansari-Bradley test, which tests the equality of variances, and examined its accuracy by simulation. Kamat's test, the Jackknife test and the Permutation test are also available for testing the equality of variances. We investigate the suitability of procedures applying these three tests. We compare the actual significance levels and the powers of the procedure of [SU98] and of our procedures, and show by simulation that the procedure applying the Permutation test is superior.

2 Test Procedure

Suppose that $\{x_i^{(g)};\ i = 1, \dots, n_g\}$ ($g = 1, 2$) are random observations from a p-variate population $\Lambda_p(\mu_g, \Sigma^{(g)})$ with mean $\mu_g$ and covariance matrix $\Sigma^{(g)}$. Let $\lambda_\alpha^{(g)}$ and $\gamma_\alpha^{(g)}$ be the α-th largest latent root and the latent vector corresponding to $\lambda_\alpha^{(g)}$, respectively. We consider the following hypothesis:

\[ H_0: \lambda_\alpha^{(1)} = \lambda_\alpha^{(2)} \ (= \lambda_\alpha), \qquad H_1: \lambda_\alpha^{(1)} \neq \lambda_\alpha^{(2)}. \]

We calculate the sample mean $\bar{x}_g$, the sample latent root $l_\alpha^{(g)}$ and the sample latent vector $h_\alpha^{(g)}$. Let

\[ Y_i = h_\alpha^{(1)\prime}(x_i^{(1)} - \bar{x}_1) \equiv (y_{\alpha i} - \bar{y}_\alpha), \quad (i = 1, \dots, n_1), \]
\[ Z_j = h_\alpha^{(2)\prime}(x_j^{(2)} - \bar{x}_2) \equiv (z_{\alpha j} - \bar{z}_\alpha), \quad (j = 1, \dots, n_2). \]
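The construction of the scores can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: it covers the case α = 1 alone (the case used in the simulations), it finds the leading latent vector by power iteration rather than a full eigendecomposition, and the function names `mean_vector`, `covariance_matrix`, `leading_latent_pair` and `pc_scores` are hypothetical.

```python
import math


def mean_vector(rows):
    """Componentwise sample mean of a list of p-dimensional observations."""
    n = len(rows)
    p = len(rows[0])
    return [sum(r[k] for r in rows) / n for k in range(p)]


def covariance_matrix(rows):
    """Sample covariance matrix with divisor n - 1."""
    n = len(rows)
    m = mean_vector(rows)
    p = len(m)
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[k] - m[k] for k in range(p)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += d[a] * d[b]
    return [[c / (n - 1) for c in row] for row in cov]


def leading_latent_pair(cov, iters=500):
    """Largest latent root and unit latent vector by power iteration.

    Assumes the largest root is simple and the start vector is not
    orthogonal to the leading latent vector.
    """
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    root = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
    return root, v


def pc_scores(rows):
    """Centered scores h'(x_i - xbar) for the first principal component."""
    m = mean_vector(rows)
    root, h = leading_latent_pair(covariance_matrix(rows))
    scores = [sum(h[k] * (r[k] - m[k]) for k in range(len(m))) for r in rows]
    return root, scores


# Tiny deterministic example: the first coordinate carries the largest variance.
rows = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
root, scores = pc_scores(rows)
```

The sign of a latent vector is arbitrary, so the scores are defined only up to a common sign; tests of equality of variances are unaffected by this. By construction the sample variance of the scores (divisor n - 1) equals the sample latent root.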
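Under the null hypothesis, the variances and the covariance of Y and Z are as follows:

\[ \mathrm{Var}[Y] = \lambda_\alpha + O(n_1^{-1}), \qquad \mathrm{Var}[Z] = \lambda_\alpha + O(n_2^{-1}), \qquad \mathrm{Cov}[Y, Z] = 0. \]

The scores Y and Z asymptotically have the same distribution and are uncorrelated. Under the alternative hypothesis, the variances and the covariance are as follows:

\[ \mathrm{Var}[Y] = \lambda_\alpha^{(1)} + O(n_1^{-1}), \qquad \mathrm{Var}[Z] = \lambda_\alpha^{(2)} + O(n_2^{-1}), \qquad \mathrm{Cov}[Y, Z] = 0. \]

The scores Y and Z are still uncorrelated, but they no longer have asymptotically the same distribution. Therefore we treat the test of the above hypothesis as a test of the equality of the variances of Y and Z. Sugiyama and Ushizawa [SU98] adopt the Ansari-Bradley test for the equality of variances. We propose procedures applying Kamat's test, the Jackknife test and the Permutation test.

Nonparametric tests require the observations within each sample to be independent. However, $y_{\alpha i}$ and $y_{\alpha j}$ ($i \neq j$) are no longer independent, and the same holds for $z_{\alpha i}$ and $z_{\alpha j}$. We now evaluate the degree of dependence. We omit the suffix representing the population and let $E(x_i) = 0$ without loss of generality. When $\Sigma = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$ and $\lambda_k$ is simple, the covariance of $y_{\alpha i}$ and $y_{\alpha j}$ is as follows:

\[
\begin{aligned}
E[y_{\alpha i} y_{\alpha j}]
&= E\Bigl[\sum_{u=1}^{p}\sum_{v=1}^{p} h_{u\alpha} h_{v\alpha} x_{ui} x_{vj}\Bigr] \\
&= -\frac{2}{n_1^2}\sum_{l\neq\alpha}^{p} \lambda_{l\alpha}^2\, \kappa^{21}_{\alpha l}\kappa^{21}_{\alpha l}
 - \frac{1}{n_1^2}\sum_{u\neq\alpha}^{p} \lambda_{u\alpha}^2 \bigl(\kappa^{21}_{\alpha u}\kappa^{21}_{\alpha u} + \kappa^{3}_{\alpha}\kappa^{12}_{\alpha u}\bigr)
 - \frac{1}{n_1^2}\sum_{v\neq\alpha}^{p} \lambda_{v\alpha}^2 \bigl(\kappa^{3}_{\alpha}\kappa^{12}_{\alpha v} + \kappa^{21}_{\alpha v}\kappa^{21}_{\alpha v}\bigr) \\
&\quad + \frac{1}{n_1^2}\sum_{\substack{l,u\neq\alpha \\ u\neq l}}^{p} \lambda_{u\alpha}\lambda_{l\alpha} \bigl(\kappa^{21}_{ul}\kappa^{21}_{\alpha l} + \kappa^{111}_{ul\alpha}\kappa^{111}_{ul\alpha}\bigr)
 + \frac{1}{n_1^2}\sum_{\substack{v,l\neq\alpha \\ v\neq l}}^{p} \lambda_{v\alpha}\lambda_{l\alpha} \bigl(\kappa^{111}_{vl\alpha}\kappa^{111}_{vl\alpha} + \kappa^{21}_{vl}\kappa^{21}_{\alpha l}\bigr) \\
&\quad + \frac{1}{n_1^2}\sum_{u,v\neq\alpha}^{p} \lambda_{u\alpha}\lambda_{v\alpha} \bigl(\kappa^{12}_{\alpha u}\kappa^{12}_{\alpha v} + \kappa^{111}_{\alpha uv}\kappa^{111}_{\alpha uv}\bigr) + O(n^{-3}), \qquad (1)
\end{aligned}
\]

where $\lambda_{\alpha\beta} = (\lambda_\alpha - \lambda_\beta)^{-1}$. The third moments are denoted by $\kappa^{3}_{i} = E(x_i x_i x_i)$, $\kappa^{21}_{ij} = E(x_i x_i x_j)$, $\kappa^{12}_{ij} = E(x_i x_j x_j)$ and $\kappa^{111}_{ijk} = E(x_i x_j x_k)$. Although we do not write out the higher-order terms of the expansion, the expansion consists of odd-order moments. For a symmetric population, $E[y_{\alpha i} y_{\alpha j}] = 0$ up to this order, so the degree of dependence is very weak.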
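The claim about symmetric populations can be checked in one line. The step below is a sketch that assumes "symmetric" means that $x$ and $-x$ have the same distribution (with $E(x) = 0$ as above); the text does not spell this definition out.

```latex
\[
\kappa^{111}_{ijk} = E(x_i x_j x_k)
= E\bigl((-x_i)(-x_j)(-x_k)\bigr)
= -E(x_i x_j x_k)
\quad\Longrightarrow\quad
\kappa^{111}_{ijk} = 0 ,
\]
\[
\text{and likewise}\quad \kappa^{3}_{i} = \kappa^{21}_{ij} = \kappa^{12}_{ij} = 0 .
\]
```

Every term displayed in (1) is a product of two third moments, so under this assumption each displayed term vanishes.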
This expansion shows that the degree of dependence is weak when the sample size is sufficiently large. Therefore, for large samples we may ignore the influence of the dependence. For an asymmetric population, however, the degree of dependence is influenced by the third moments. This appears in the simulation results.

3 Simulation

First we simulate the actual significance levels for multivariate normal, contaminated normal and log-normal populations. We set α = 1, the sample sizes to $n_1 = n_2 = 20$, 50 and 100, and the number of simulation runs to one hundred thousand. As populations we select

multivariate normal distributions: $N(0, \Sigma_3^{(g)})$, $N(0, \Sigma_5^{(g)})$ and $N(0, \Sigma_7^{(g)})$;

contaminated normal distributions: $0.4N(0, 2.0\Sigma_3^{(g)}) + 0.6N(0, \Sigma_3^{(g)})$, $0.4N(0, 2.0\Sigma_5^{(g)}) + 0.6N(0, \Sigma_5^{(g)})$ and $0.4N(0, 2.0\Sigma_7^{(g)}) + 0.6N(0, \Sigma_7^{(g)})$;

log-normal distributions: $LN(0, \mathrm{diag}(1.1890, 0.6931, 0.4812))$, $LN(0, \mathrm{diag}(2.0449, 1.5110, 0.9406, 0.6932, 0.4812))$ and $LN(0, \mathrm{diag}(2.1840, 1.9234, 1.5110, 1.2156, 0.9406, 0.6931, 0.4812))$,

where $\Sigma_3^{(g)} = \mathrm{diag}(7.5, 2.0, 1.0)$, $\Sigma_5^{(g)} = \mathrm{diag}(52.0, 16.0, 4.0, 2.0, 1.0)$, $\Sigma_7^{(g)} = \mathrm{diag}(70.0, 40.0, 16.0, 8.0, 4.0, 2.0, 1.0)$ and $g = 1, 2$.

Tables 1, 2 and 3 show the actual significance levels for the normal population, the contaminated normal population and the log-normal population, respectively. In each table, A denotes the Ansari-Bradley test, K Kamat's test, J the Jackknife test, P1 the Permutation test by Good [GO00] and P2 the Permutation test by Aly [AL90]. The number in parentheses is the standard error, and the nominal significance level is 0.05.

Table 1. Actual Significance Levels (Normal Population, Significance Level 5%)

Dimension  Test  n = 20         n = 50         n = 100
3          A     .0444 (.0015)  .0484 (.0019)  .0501 (.0022)
3          K     .0203 (.0012)  .0369 (.0014)  .0237 (.0015)
3          J     .0424 (.0019)  .0482 (.0026)  .0490 (.0019)
3          P1    .0428 (.0018)  .0474 (.0015)  .0489 (.0031)
3          P2    .0697 (.0018)  .0662 (.0018)  .0620 (.0022)
5          A     .0447 (.0020)  .0492 (.0027)  .0488 (.0016)
5          K     .0187 (.0011)  .0369 (.0015)  .0239 (.0011)
5          J     .0417 (.0014)  .0477 (.0018)  .0487 (.0022)
5          P1    .0426 (.0016)  .0462 (.0018)  .0494 (.0025)
5          P2    .0695 (.0018)  .0648 (.0025)  .0622 (.0024)
7          A     .0298 (.0011)  .0412 (.0020)  .0449 (.0031)
7          K     .0108 (.0009)  .0314 (.0011)  .0220 (.0014)
7          J     .0215 (.0015)  .0354 (.0012)  .0436 (.0018)
7          P1    .0257 (.0017)  .0370 (.0015)  .0448 (.0014)
7          P2    .0488 (.0022)  .0550 (.0019)  .0580 (.0025)

Table 2. Actual Significance Levels (Contaminated Normal Population, Significance Level 5%)

Dimension  Test  n = 20         n = 50         n = 100
3          A     .0488 (.0016)  .0501 (.0023)  .0502 (.0030)
3          K     .0180 (.0012)  .0366 (.0021)  .0221 (.0018)
3          J     .0513 (.0026)  .0554 (.0022)  .0517 (.0021)
3          P1    .0407 (.0019)  .0469 (.0022)  .0488 (.0026)
3          P2    .0648 (.0030)  .0641 (.0023)  .0609 (.0027)
5          A     .0469 (.0019)  .0501 (.0015)  .0504 (.0019)
5          K     .0182 (.0013)  .0343 (.0010)  .0241 (.0017)
5          J     .0498 (.0026)  .0539 (.0022)  .0533 (.0013)
5          P1    .0402 (.0022)  .0460 (.0021)  .0474 (.0022)
5          P2    .0643 (.0018)  .0633 (.0024)  .0609 (.0019)
7          A     .0367 (.0018)  .0468 (.0022)  .0482 (.0019)
7          K     .0081 (.0006)  .0240 (.0016)  .0185 (.0013)
7          J     .0255 (.0016)  .0344 (.0023)  .0421 (.0017)
7          P1    .0227 (.0013)  .0333 (.0014)  .0415 (.0016)
7          P2    .0458 (.0026)  .0519 (.0024)  .0548 (.0025)

Table 3. Actual Significance Levels (Log-Normal Population, Significance Level 5%)

Dimension  Test  n = 20         n = 50         n = 100
3          A     .2976 (.0037)  .4531 (.0052)  .5462 (.0057)
3          K     .2652 (.0042)  .3997 (.0063)  .4637 (.0032)
3          J     .0629 (.0016)  .0777 (.0020)  .0863 (.0031)
3          P1    .0167 (.0013)  .0285 (.0018)  .0387 (.0020)
3          P2    .0680 (.0025)  .0533 (.0024)  .0543 (.0015)
5          A     .3506 (.0065)  .5309 (.0042)  .6356 (.0050)
5          K     .3121 (.0045)  .4082 (.0033)  .4893 (.0080)
5          J     .0551 (.0018)  .0669 (.0028)  .0726 (.0027)
5          P1    .0104 (.0009)  .0184 (.0009)  .0242 (.0015)
5          P2    .0404 (.0016)  .0306 (.0016)  .0301 (.0010)
7          A     .2703 (.0045)  .4274 (.0059)  .5214 (.0044)
7          K     .2221 (.0041)  .2619 (.0057)  .3358 (.0048)
7          J     .0324 (.0009)  .0387 (.0017)  .0434 (.0023)
7          P1    .0048 (.0007)  .0083 (.0011)  .0121 (.0010)
7          P2    .0280 (.0015)  .0227 (.0009)  .0225 (.0019)

As a whole, the actual significance levels of Kamat's test are not satisfactory. The actual significance levels of the Ansari-Bradley test, the Jackknife test and the Permutation test by Good are satisfactory in the normal population. In the contaminated normal population, those of the Ansari-Bradley test converge to 0.05 faster than those of the Jackknife test and the Permutation test by Good. In the log-normal population, those of the Ansari-Bradley test are not satisfactory, and those of the Permutation test by Aly are somewhat satisfactory.

These results are to be expected from the covariance (1) of $y_{\alpha i}$ and $y_{\alpha j}$. The expansion of this covariance consists of odd-order moments, and odd-order moments are zero in a symmetric population. Therefore, when the population is symmetric, the degree of dependence among the scores is weak. When the population is asymmetric, a large sample size is necessary to reduce the degree of dependence. How fast the actual significance levels converge to 0.05 depends on the proportion of the latent root $l_\alpha$.

Next, we investigate the power of the tests by simulation when the sample size is 20, 50 and 100.
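Before the power study, it may help to see what a permutation test of equal variances applied to two sets of scores looks like. The sketch below is a generic Monte Carlo permutation test and makes several assumptions that are not taken from the paper: the statistic is the absolute difference of the two sample variances (it is neither Good's P1 nor Aly's P2 statistic), both samples are assumed to be centered at the same location (as the scores are by construction), and the names `sample_variance` and `permutation_variance_test` are hypothetical.

```python
import random


def sample_variance(values):
    """Sample variance with divisor n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def permutation_variance_test(y, z, n_perm=999, seed=0):
    """Two-sided Monte Carlo permutation p-value for equality of variances.

    Illustrative statistic (an assumption): |var(y) - var(z)|.
    """
    rng = random.Random(seed)
    observed = abs(sample_variance(y) - sample_variance(z))
    pooled = list(y) + list(z)
    n_y = len(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled scores
        stat = abs(sample_variance(pooled[:n_y]) - sample_variance(pooled[n_y:]))
        if stat >= observed:
            hits += 1
    # Counting the observed arrangement itself keeps the p-value above zero.
    return (hits + 1) / (n_perm + 1)


# Deterministic toy scores: same location, very different spread.
wide = [-9.0, -7.0, -5.0, -3.0, -1.0, 1.0, 3.0, 5.0, 7.0, 9.0]
narrow = [v / 10.0 for v in wide]
p_different = permutation_variance_test(wide, narrow)
p_same = permutation_variance_test(wide, list(wide))
```

Repeating such a test over many simulated data sets and recording the rejection rate at the 5% level gives an estimate of the actual significance level under the null hypothesis, or of the power under an alternative.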
We set up the alternative hypotheses as follows:

\[ H_{3i}: \Sigma_3^{(1)} = \mathrm{diag}(7.5, 2.0, 1.0), \quad \Sigma_3^{(2)} = \mathrm{diag}(7.5 - \delta_{3i}/\sqrt{n_2},\ 2.0, 1.0), \]
\[ H_{5i}: \Sigma_5^{(1)} = \mathrm{diag}(52.0, 16.0, 4.0, 2.0, 1.0), \quad \Sigma_5^{(2)} = \mathrm{diag}(52.0 - \delta_{5i}/\sqrt{n_2},\ 16.0, 4.0, 2.0, 1.0), \]
\[ H_{7i}: \Sigma_7^{(1)} = \mathrm{diag}(70.0, 40.0, 16.0, 8.0, 4.0, 2.0, 1.0), \quad \Sigma_7^{(2)} = \mathrm{diag}(70.0 - \delta_{7i}/\sqrt{n_2},\ 40.0, 16.0, 8.0, 4.0, 2.0, 1.0), \]

$(i = 1, 2)$. As in the simulation of the actual significance levels, we set α = 1, the sample sizes to $n_1 = n_2 = 20$, 50 and 100, and the number of simulation runs to one hundred thousand. We set $(\delta_{31}, \delta_{32}, \delta_{51}, \delta_{52}, \delta_{71}, \delta_{72})$ to (15, 20, 60, 80, 100, 130), (20, 30, 150, 200, 150, 200) and (30, 40, 250, 300, 200, 270) for $n_2 = 20$, 50 and 100, respectively.

The upper parts of Table 4 and Table 5 show the power of the tests for the alternative hypotheses $H_{31}$, $H_{51}$ and $H_{71}$, and the lower parts show the power for the hypotheses $H_{32}$, $H_{52}$ and $H_{72}$. Table 4 and Table 5 show the power of the tests in the normal population and the contaminated normal population, respectively.

Table 4. Powers of Test (Normal Population, Significance Level 5%)

Upper part: H31, H51 and H71

Dimension  Test  n = 20         n = 50         n = 100
3          A     .1367 (.0029)  .2340 (.0032)  .4924 (.0083)
3          K     .0835 (.0021)  .2047 (.0044)  .3061 (.0051)
3          J     .1782 (.0025)  .3449 (.0054)  .6917 (.0057)
3          P1    .2753 (.0063)  .4674 (.0058)  .7965 (.0029)
3          P2    .3412 (.0065)  .5152 (.0049)  .8165 (.0036)
5          A     .0659 (.0022)  .2762 (.0042)  .6986 (.0048)
5          K     .0337 (.0024)  .2374 (.0027)  .4895 (.0048)
5          J     .0718 (.0026)  .4027 (.0041)  .8843 (.0029)
5          P1    .1233 (.0038)  .5275 (.0042)  .9364 (.0019)
5          P2    .1728 (.0044)  .5752 (.0050)  .9434 (.0022)
7          A     .0463 (.0018)  .1060 (.0035)  .2026 (.0030)
7          K     .0195 (.0014)  .0871 (.0020)  .1073 (.0042)
7          J     .0422 (.0024)  .1375 (.0037)  .3024 (.0051)
7          P1    .0851 (.0037)  .2302 (.0052)  .4312 (.0062)
7          P2    .1361 (.0024)  .2818 (.0051)  .4698 (.0072)

Lower part: H32, H52 and H72

Dimension  Test  n = 20         n = 50         n = 100
3          A     .2472 (.0054)  .5724 (.0037)  .8185 (.0047)
3          K     .1755 (.0027)  .5294 (.0042)  .6259 (.0051)
3          J     .3462 (.0048)  .7804 (.0025)  .9560 (.0018)
3          P1    .4796 (.0093)  .8628 (.0033)  .9796 (.0013)
3          P2    .5419 (.0083)  .8802 (.0021)  .9816 (.0009)
5          A     .0877 (.0026)  .7633 (.0049)  .8942 (.0028)
5          K     .0481 (.0025)  .7302 (.0056)  .7285 (.0042)
5          J     .1038 (.0037)  .9348 (.0023)  .9845 (.0012)
5          P1    .1752 (.0043)  .8179 (.0033)  .9936 (.0009)
5          P2    .2338 (.0050)  .8423 (.0035)  .9943 (.0009)
7          A     .0609 (.0021)  .1631 (.0042)  .3413 (.0051)
7          K     .0274 (.0017)  .1371 (.0023)  .1902 (.0063)
7          J     .0619 (.0030)  .2320 (.0044)  .5172 (.0069)
7          P1    .1196 (.0036)  .3544 (.0071)  .6521 (.0059)
7          P2    .1791 (.0026)  .4125 (.0066)  .6851 (.0061)

Table 5. Powers of Test (Contaminated Normal Population, Significance Level 5%)

Upper part: H31, H51 and H71

Dimension  Test  n = 20         n = 50         n = 100
3          A     .1245 (.0032)  .2070 (.0052)  .4220 (.0068)
3          K     .0485 (.0019)  .1134 (.0028)  .1577 (.0028)
3          J     .1198 (.0038)  .2265 (.0047)  .4771 (.0059)
3          P1    .1864 (.0044)  .3217 (.0061)  .6043 (.0056)
3          P2    .2572 (.0038)  .3915 (.0055)  .6707 (.0064)
5          A     .0641 (.0023)  .2372 (.0042)  .6151 (.0068)
5          K     .0241 (.0014)  .1235 (.0034)  .2467 (.0034)
5          J     .0655 (.0024)  .2538 (.0040)  .6725 (.0037)
5          P1    .0929 (.0029)  .3662 (.0046)  .7880 (.0029)
5          P2    .1387 (.0029)  .4428 (.0059)  .8378 (.0030)
7          A     .0508 (.0031)  .1032 (.0027)  .1856 (.0038)
7          K     .0112 (.0009)  .0404 (.0029)  .0456 (.0027)
7          J     .0340 (.0013)  .0783 (.0039)  .1704 (.0037)
7          P1    .0563 (.0022)  .1380 (.0041)  .2687 (.0041)
7          P2    .1043 (.0031)  .1981 (.0048)  .3372 (.0057)

Lower part: H32, H52 and H72

Dimension  Test  n = 20         n = 50         n = 100
3          A     .2148 (.0042)  .4996 (.0063)  .7450 (.0038)
3          K     .0894 (.0028)  .2760 (.0044)  .3399 (.0046)
3          J     .2000 (.0048)  .5320 (.0055)  .8005 (.0036)
3          P1    .3107 (.0045)  .6696 (.0033)  .8863 (.0035)
3          P2    .4003 (.0037)  .7431 (.0029)  .9213 (.0031)
5          A     .0834 (.0026)  .6893 (.0053)  .8325 (.0040)
5          K     .0311 (.0019)  .3920 (.0049)  .4020 (.0062)
5          J     .0818 (.0027)  .7080 (.0048)  .8737 (.0034)
5          P1    .1240 (.0035)  .6098 (.0049)  .9378 (.0017)
5          P2    .1798 (.0043)  .6912 (.0056)  .9609 (.0019)
7          A     .0623 (.0036)  .1508 (.0039)  .3028 (.0022)
7          K     .0147 (.0009)  .0574 (.0033)  .0749 (.0041)
7          J     .0424 (.0013)  .1208 (.0042)  .2845 (.0039)
7          P1    .0742 (.0027)  .2048 (.0046)  .4194 (.0063)
7          P2    .1313 (.0033)  .2808 (.0052)  .5030 (.0051)

As a whole, the power of Kamat's test is smaller than the power of the other tests, because the actual significance levels of Kamat's test are not satisfactory. Kamat's test is not useful. Because the actual significance levels of the Ansari-Bradley test, the Jackknife test and the Permutation tests are comparatively well preserved, we compare their powers. The tendency is similar in the normal population and the contaminated normal population. The powers of the two Permutation tests are close to each other under the alternative hypotheses for which the power is large, and they are larger than the power of the Ansari-Bradley test. The power of the Jackknife test is larger than the powers of the Permutation tests under $H_{52}$ with $n_1 = n_2 = 50$. Under a wide range of the other alternative hypotheses, however, the powers of the Permutation tests are larger than the powers of the Ansari-Bradley test and the Jackknife test. By simulation, we find that the Permutation test by Good is superior to the Ansari-Bradley test in a symmetric population.

4 Conclusions
In this paper, we show the actual significance levels and the powers of procedures that apply nonparametric tests of the equality of variances in two populations. According to the simulation, the procedures applying the Ansari-Bradley test, the Jackknife test and the Permutation test may be useful for a symmetric population. One of the reasons may be that the degree of dependence among the scores is weak under a symmetric population. From the results of the power comparison, we may recommend the procedure applying the Permutation test. However, we still need to simulate various other situations, for example α = 2, unequal sample sizes, and populations in which the latent roots other than $\lambda_\alpha$ differ. For an asymmetric population, none of the tests is useful. It is necessary to develop a new method that is applicable to an asymmetric population.

5 Appendix

Kamat's test

Let $\{x_1, \dots, x_m\}$ and $\{y_1, \dots, y_n\}$ be random samples from two populations. We order the pooled sample and let

a  : the number of y that are larger than $x_{\max}$,
a′ : the number of x that are larger than $y_{\max}$,
b  : the number of y that are smaller than $x_{\min}$,
b′ : the number of x that are smaller than $y_{\min}$.

We test the equality of variances using $S = a + b - (a' + b')$.

Jackknife test

Let

\[ \bar{x}_i = \sum_{k \neq i}^{m} \frac{x_k}{m-1} \quad\text{and}\quad D_i^2 = \sum_{k \neq i}^{m} \frac{(x_k - \bar{x}_i)^2}{m-2} \]

be the sample average and the sample variance for the sample $\{x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_m\}$. In the same fashion, let

\[ \bar{y}_i = \sum_{k \neq i}^{n} \frac{y_k}{n-1} \quad\text{and}\quad E_i^2 = \sum_{k \neq i}^{n} \frac{(y_k - \bar{y}_i)^2}{n-2} \]

be the sample average and the sample variance for the sample $\{y_1, \dots, y_{i-1}, y_{i+1}, \dots, y_n\}$. Let

\[ S_0 = \log\Bigl[\sum_{k=1}^{m} \frac{(x_k - \bar{x}_0)^2}{m-1}\Bigr], \quad S_i = \log D_i^2 \ (i = 1, \dots, m), \quad \bar{x}_0 = \sum_{k=1}^{m} \frac{x_k}{m}, \]
\[ T_0 = \log\Bigl[\sum_{k=1}^{n} \frac{(y_k - \bar{y}_0)^2}{n-1}\Bigr], \quad T_j = \log E_j^2 \ (j = 1, \dots, n), \quad \bar{y}_0 = \sum_{k=1}^{n} \frac{y_k}{n}. \]

Compute

\[ A_i = m S_0 - (m-1) S_i \ (i = 1, \dots, m), \qquad B_j = n T_0 - (n-1) T_j \ (j = 1, \dots, n). \qquad (2) \]
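The pseudo-values defined so far can be computed with a short routine. This is a minimal sketch and not the author's code: it assumes that the multiplier of each leave-one-out term is the size of that sample minus one, as in the usual jackknife, and the names `log_variance` and `jackknife_pseudo_values` are hypothetical. The same function serves both samples.

```python
import math


def log_variance(values):
    """Logarithm of the sample variance (divisor len(values) - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.log(sum((v - mean) ** 2 for v in values) / (n - 1))


def jackknife_pseudo_values(values):
    """Pseudo-values m*S0 - (m-1)*S_i of the log sample variance.

    Requires at least three distinct observations so that every
    leave-one-out variance is positive.
    """
    m = len(values)
    s0 = log_variance(values)  # log variance of the full sample
    pseudo = []
    for i in range(m):
        rest = values[:i] + values[i + 1:]  # sample with the i-th value removed
        pseudo.append(m * s0 - (m - 1) * log_variance(rest))
    return pseudo


# Small worked example with m = 4.
a_values = jackknife_pseudo_values([1.0, 2.0, 3.0, 4.0])
```

For the sample {1, 2, 3, 4}, the full-sample variance is 5/3, removing an end point leaves a variance of 1, and removing an interior point leaves a variance of 7/3, so the pseudo-values are 4 log(5/3) at the two ends and 4 log(5/3) - 3 log(7/3) in the middle.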
We test the equality of variances using

\[ Q = \frac{\bar{A} - \bar{B}}{\sqrt{V_1 + V_2}}, \]

where

\[ \bar{A} = \sum_{i=1}^{m} \frac{A_i}{m}, \quad V_1 = \sum_{i=1}^{m} \frac{(A_i - \bar{A})^2}{m(m-1)}, \quad \bar{B} = \sum_{j=1}^{n} \frac{B_j}{n}, \quad V_2 = \sum_{j=1}^{n} \frac{(B_j - \bar{B})^2}{n(n-1)}. \]

The criterion Q is asymptotically distributed as the standard normal distribution.

References

[AL90] Aly, EE. AA.: Simple test for dispersive ordering. Stat. Prob. Lett., 9, 323–325 (1990)
[MI68] Miller, R.G., Jr.: Jackknifing variances. Ann. Math. Stat., 38, 567–582 (1968)
[SU98] Sugiyama, T. and Ushizawa, K.: A non-parametric method to test equality of intermediate latent roots of two populations in a principal component analysis. J. Japan Statist. Soc., 28, 227–235 (1998)
[GO00] Good, PI.: Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses. Springer-Verlag, New York (2000)
[HW99] Hollander, M. and Wolfe, D.: Nonparametric Statistical Methods. Wiley, New York (1999)
[PE01] Pesarin, F.: Multivariate Permutation Tests: With Applications in Biostatistics. Wiley, New York (2001)