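Comparison of Several Multivariate Means

Suggested Reading: Chapter Six
Kazeem Adepoju, PhD
July 9, 2019

Outline
• Univariate Analysis of Variance
• Multivariate Analysis of Variance

One-way Analysis of Variance (ANOVA)
Comparing k Populations

The F test for comparing k means

Situation
• We have k normal populations.
• Let $\mu_i$ and $\sigma_i$ denote the mean and standard deviation of population i, i = 1, 2, 3, …, k.
• Note: we assume that the standard deviation is the same for each population: $\sigma_1 = \sigma_2 = \dots = \sigma_k = \sigma$.

We want to test
\[ H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k \]
against
\[ H_A: \mu_i \neq \mu_j \text{ for at least one pair } i, j. \]

The F statistic
\[ F = \frac{\dfrac{1}{k-1}\displaystyle\sum_{i=1}^{k} n_i\,(\bar{x}_i - \bar{x})^2}{\dfrac{1}{N-k}\displaystyle\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2} \]
where
• $x_{ij}$ = the jth observation in the ith sample, $i = 1, 2, \dots, k$ and $j = 1, 2, \dots, n_i$
• $\bar{x}_i = \dfrac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}$ = mean for the ith sample, $i = 1, 2, \dots, k$
• $N = \sum_{i=1}^{k} n_i$ = Total sample size
• $\bar{x} = \dfrac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}$ = Overall mean

The ANOVA table

Source     S.S.                                                              d.f.     M.S.                    F
Between    $SS_B = \sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2$                 k - 1    $MS_B = SS_B/(k-1)$     $MS_B/MS_W$
Within     $SS_W = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2$      N - k    $MS_W = SS_W/(N-k)$

\[ F = \frac{MS_B}{MS_W} = \frac{\dfrac{1}{k-1}\displaystyle\sum_{i=1}^{k} n_i\,(\bar{x}_i - \bar{x})^2}{\dfrac{1}{N-k}\displaystyle\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2} \]

The ANOVA table is a tool for displaying the computations for the F test. It is very important when the between-sample variability is due to two or more factors.

The data
• Assume we have collected data from each of k populations.
• Let $x_{i1}, x_{i2}, x_{i3}, \dots$ denote the $n_i$ observations from population i, i = 1, 2, 3, …, k.

Computing Formulae:
Compute
1) $T_i = \sum_{j=1}^{n_i} x_{ij}$ = Total for sample i
2) $G = \sum_{i=1}^{k} T_i = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}$ = Grand Total
3) $N = \sum_{i=1}^{k} n_i$ = Total sample size
4) $\sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2$
5) $\sum_{i=1}^{k} \dfrac{T_i^2}{n_i}$

Then
1) $SS_{Between} = \sum_{i=1}^{k} \dfrac{T_i^2}{n_i} - \dfrac{G^2}{N}$
2) $SS_{Within} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k} \dfrac{T_i^2}{n_i}$
3) $F = \dfrac{SS_{Between}/(k-1)}{SS_{Within}/(N-k)}$

Anova Table

Source     d.f.     Sum of Squares     Mean Square      F-ratio
Between    k - 1    $SS_{Between}$     $MS_{Between}$   $MS_B/MS_W$
Within     N - k    $SS_{Within}$      $MS_{Within}$
Total      N - 1    $SS_{Total}$

where MS = SS/df.

Example
In the following example we are comparing weight gains resulting from the following six diets:
1. Diet 1 - High Protein, Beef
2. Diet 2 - High Protein, Cereal
3. Diet 3 - High Protein, Pork
4. Diet 4 - Low Protein, Beef
5. Diet 5 - Low Protein, Cereal
6. Diet 6 - Low Protein, Pork

Gains in weight (grams) for rats under six diets differing in level of protein (High or Low) and source of protein (Beef, Cereal, or Pork)

Diet   Observations                                  Mean    Std. Dev.   Σx     Σx²
1       73  102  118  104   81  107  100   87  117  111   100.0   15.14      1000   102062
2       98   74   56  111   95   88   82   77   86   92    85.9   15.02       859    75819
3       94   79   96   98  102  102  108   91  120  105    99.5   10.92       995   100075
4       90   76   90   64   86   51   72   90   95   78    79.2   13.89       792    64462
5      107   95   97   80   98   74   74   67   89   58    83.9   15.71       839    72613
6       49   82   73   86   81   97  106   70   61   82    78.7   16.55       787    64401

Thus
\[ SS_{Between} = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - \frac{G^2}{N} = 467846 - \frac{5272^2}{60} = 4612.933 \]
\[ SS_{Within} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \sum_{i=1}^{k} \frac{T_i^2}{n_i} = 479432 - 467846 = 11586 \]
\[ F = \frac{SS_{Between}/(k-1)}{SS_{Within}/(N-k)} = \frac{4612.933/5}{11586/54} = \frac{922.6}{214.56} = 4.3 \]
$F_{0.05} = 2.386$ with $\nu_1 = 5$ and $\nu_2 = 54$.
Thus, since F > 2.386, we reject $H_0$.

Anova Table

Source     d.f.   Sum of Squares   Mean Square   F-ratio
Between     5       4612.933        922.587      4.3** (p = 0.0023)
Within     54      11586.000        214.556
Total      59      16198.933

* - Significant at 0.05 (not 0.01)
** - Significant at 0.01

Equivalence of the F-test and the t-test when k = 2

The t-test
\[ t = \frac{\bar{x} - \bar{y}}{s_{Pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}, \qquad s_{Pooled} = \sqrt{\frac{(n-1)s_x^2 + (m-1)s_y^2}{n + m - 2}} \]

The F-test (k = 2)
\[ F = \frac{s_{Between}^2}{s_{Pooled}^2} = \frac{\displaystyle\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2 \Big/ (k-1)}{\displaystyle\sum_{i=1}^{k} (n_i - 1)s_i^2 \Big/ \Big(\sum_{i=1}^{k} n_i - k\Big)} = \frac{n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2}{\dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}} \]
The denominator is $s_{Pooled}^2$. The numerator is $n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2$, where
\[ n_1(\bar{x}_1 - \bar{x})^2 = n_1\left(\bar{x}_1 - \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}\right)^2 = \frac{n_1 n_2^2}{(n_1 + n_2)^2}(\bar{x}_1 - \bar{x}_2)^2 \]
\[ n_2(\bar{x}_2 - \bar{x})^2 = n_2\left(\bar{x}_2 - \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}\right)^2 = \frac{n_1^2 n_2}{(n_1 + n_2)^2}(\bar{x}_1 - \bar{x}_2)^2 \]
so that
\[ n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 = \frac{n_1 n_2^2 + n_1^2 n_2}{(n_1 + n_2)^2}(\bar{x}_1 - \bar{x}_2)^2 = \frac{n_1 n_2}{n_1 + n_2}(\bar{x}_1 - \bar{x}_2)^2 = \frac{(\bar{x}_1 - \bar{x}_2)^2}{\dfrac{1}{n_1} + \dfrac{1}{n_2}} \]
Hence
\[ F = \frac{(\bar{x}_1 - \bar{x}_2)^2}{s_{Pooled}^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)} = t^2 \]

The model
\[ y_{ij} = \mu_i + (y_{ij} - \mu_i) = \mu_i + \varepsilon_{ij} = \mu + (\mu_i - \mu) + \varepsilon_{ij} = \mu + \alpha_i + \varepsilon_{ij} \]
where
• $\varepsilon_{ij} = y_{ij} - \mu_i$ has $N(0, \sigma^2)$ distribution
• $\mu = \dfrac{1}{a}\sum_{i=1}^{a} \mu_i$ (overall mean effect)
• $\alpha_i = \mu_i - \mu$ (Effect of Factor A)
Note: $\sum_{i=1}^{a} \alpha_i = 0$ by their definition. Here a is the number of levels of Factor A, that is, the number of populations (k above).

Model 1: $y_{ij}$ (i = 1, …, a; j = 1, …, n) are independent Normal with mean $\mu_i$ and variance $\sigma^2$.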
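Model 2: $y_{ij} = \mu_i + \varepsilon_{ij}$, where $\varepsilon_{ij}$ (i = 1, …, a; j = 1, …, n) are independent Normal with mean 0 and variance $\sigma^2$.
Model 3: $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, where $\varepsilon_{ij}$ (i = 1, …, a; j = 1, …, n) are independent Normal with mean 0 and variance $\sigma^2$, and $\sum_{i=1}^{a} \alpha_i = 0$.

MANOVA
Multivariate Analysis of Variance

One-way Multivariate Analysis of Variance (MANOVA)
Comparing k p-variate Normal Populations

The test for comparing k mean vectors

Situation
• We have k normal populations.
• Let $\boldsymbol{\mu}_i$ and $\Sigma_i$ denote the mean vector and covariance matrix of population i, i = 1, 2, 3, …, k.
• Note: we assume that the covariance matrix is the same for each population: $\Sigma_1 = \Sigma_2 = \dots = \Sigma_k = \Sigma$.

We want to test
\[ H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \boldsymbol{\mu}_3 = \dots = \boldsymbol{\mu}_k \]
against
\[ H_A: \boldsymbol{\mu}_i \neq \boldsymbol{\mu}_j \text{ for at least one pair } i, j. \]

The data
• Assume we have collected data from each of k populations.
• Let $\mathbf{x}_{i1}, \mathbf{x}_{i2}, \dots, \mathbf{x}_{in}$ denote the n observations from population i, i = 1, 2, 3, …, k.

Computing Formulae:
Compute
1) Total vector for sample i:
\[ \mathbf{T}_i = \sum_{j=1}^{n} \mathbf{x}_{ij} = \begin{pmatrix} \sum_{j=1}^{n} x_{1ij} \\ \vdots \\ \sum_{j=1}^{n} x_{pij} \end{pmatrix} = \begin{pmatrix} T_{1i} \\ \vdots \\ T_{pi} \end{pmatrix} \]
2) Grand Total vector:
\[ \mathbf{G} = \sum_{i=1}^{k} \mathbf{T}_i = \sum_{i=1}^{k}\sum_{j=1}^{n} \mathbf{x}_{ij} = \begin{pmatrix} G_1 \\ \vdots \\ G_p \end{pmatrix} \]
3) $N = kn$ = Total sample size
4)
\[ \sum_{i=1}^{k}\sum_{j=1}^{n} \mathbf{x}_{ij}\mathbf{x}_{ij}' = \begin{pmatrix} \sum_{i}\sum_{j} x_{1ij}^2 & \cdots & \sum_{i}\sum_{j} x_{1ij}x_{pij} \\ \vdots & \ddots & \vdots \\ \sum_{i}\sum_{j} x_{1ij}x_{pij} & \cdots & \sum_{i}\sum_{j} x_{pij}^2 \end{pmatrix} \]
5)
\[ \frac{1}{n}\sum_{i=1}^{k} \mathbf{T}_i\mathbf{T}_i' = \begin{pmatrix} \frac{1}{n}\sum_{i} T_{1i}^2 & \cdots & \frac{1}{n}\sum_{i} T_{1i}T_{pi} \\ \vdots & \ddots & \vdots \\ \frac{1}{n}\sum_{i} T_{1i}T_{pi} & \cdots & \frac{1}{n}\sum_{i} T_{pi}^2 \end{pmatrix} \]

Let
\[ H = \frac{1}{n}\sum_{i=1}^{k} \mathbf{T}_i\mathbf{T}_i' - \frac{1}{N}\mathbf{G}\mathbf{G}' = \begin{pmatrix} \frac{1}{n}\sum_{i} T_{1i}^2 - \frac{G_1^2}{N} & \cdots & \frac{1}{n}\sum_{i} T_{1i}T_{pi} - \frac{G_1 G_p}{N} \\ \vdots & \ddots & \vdots \\ \frac{1}{n}\sum_{i} T_{1i}T_{pi} - \frac{G_1 G_p}{N} & \cdots & \frac{1}{n}\sum_{i} T_{pi}^2 - \frac{G_p^2}{N} \end{pmatrix} \]
\[ = \begin{pmatrix} n\sum_{i=1}^{k} (\bar{x}_{1i} - \bar{x}_1)^2 & \cdots & n\sum_{i=1}^{k} (\bar{x}_{1i} - \bar{x}_1)(\bar{x}_{pi} - \bar{x}_p) \\ \vdots & \ddots & \vdots \\ n\sum_{i=1}^{k} (\bar{x}_{1i} - \bar{x}_1)(\bar{x}_{pi} - \bar{x}_p) & \cdots & n\sum_{i=1}^{k} (\bar{x}_{pi} - \bar{x}_p)^2 \end{pmatrix} \]
= the Between SS and SP matrix

Let
\[ E = \sum_{i=1}^{k}\sum_{j=1}^{n} \mathbf{x}_{ij}\mathbf{x}_{ij}' - \frac{1}{n}\sum_{i=1}^{k} \mathbf{T}_i\mathbf{T}_i' = \begin{pmatrix} \sum_{i}\sum_{j} x_{1ij}^2 - \frac{1}{n}\sum_{i} T_{1i}^2 & \cdots & \sum_{i}\sum_{j} x_{1ij}x_{pij} - \frac{1}{n}\sum_{i} T_{1i}T_{pi} \\ \vdots & \ddots & \vdots \\ \sum_{i}\sum_{j} x_{1ij}x_{pij} - \frac{1}{n}\sum_{i} T_{1i}T_{pi} & \cdots & \sum_{i}\sum_{j} x_{pij}^2 - \frac{1}{n}\sum_{i} T_{pi}^2 \end{pmatrix} \]
\[ = \begin{pmatrix} \sum_{i}\sum_{j} (x_{1ij} - \bar{x}_{1i})^2 & \cdots & \sum_{i}\sum_{j} (x_{1ij} - \bar{x}_{1i})(x_{pij} - \bar{x}_{pi}) \\ \vdots & \ddots & \vdots \\ \sum_{i}\sum_{j} (x_{1ij} - \bar{x}_{1i})(x_{pij} - \bar{x}_{pi}) & \cdots & \sum_{i}\sum_{j} (x_{pij} - \bar{x}_{pi})^2 \end{pmatrix} \]
= the Within SS and SP matrix

The Manova Table

Source     SS and SP matrix
Between    $H = \begin{pmatrix} h_{11} & \cdots & h_{1p} \\ \vdots & \ddots & \vdots \\ h_{1p} & \cdots & h_{pp} \end{pmatrix}$
Within     $E = \begin{pmatrix} e_{11} & \cdots & e_{1p} \\ \vdots & \ddots & \vdots \\ e_{1p} & \cdots & e_{pp} \end{pmatrix}$

There are several test statistics for testing
\[ H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \boldsymbol{\mu}_3 = \dots = \boldsymbol{\mu}_k \]
against
\[ H_A: \boldsymbol{\mu}_i \neq \boldsymbol{\mu}_j \text{ for at least one pair } i, j. \]
1. Roy's largest root: the largest eigenvalue of $HE^{-1}$. This test statistic is derived using Roy's union-intersection principle.
2. Wilks' lambda ($\Lambda$):
\[ \Lambda = \frac{|E|}{|H + E|} = \frac{1}{|HE^{-1} + I|} \]
This test statistic is derived using the generalized likelihood ratio principle.
3. Lawley-Hotelling trace statistic: $T_0^2 = \operatorname{tr}(HE^{-1})$ = sum of the eigenvalues of $HE^{-1}$.
4. Pillai trace statistic (V): $V = \operatorname{tr}\left[H(H + E)^{-1}\right]$.

Example
In the following study, n = 15 first-year university students from each of three different school regions (A, B and C), who were each taking the following four courses (Math, Biology, English and Sociology), were observed. The marks on these courses are tabulated below.

The data

             Educational Region A          Educational Region B          Educational Region C
Student   Math  Biol  Engl  Soc         Math  Biol  Engl  Soc         Math  Biol  Engl  Soc
 1         62    65    67    76          65    55    35    43          47    47    98    78
 2         54    61    75    70          87    81    59    64          57    69    68    45
 3         53    53    53    59          75    67    56    68          65    71    77    62
 4         48    56    73    81          74    70    55    66          41    64    68    58
 5         60    55    49    60          83    71    40    52          56    54    86    64
 6         55    52    34    41          59    48    48    57          63    73    88    76
 7         76    71    35    40          61    47    46    54          43    62    84    78
 8         58    52    58    46          81    77    51    45          28    47    65    58
 9         75    71    60    59          77    68    42    49          47    54    90    78
10         55    51    69    75          82    84    63    70          42    44    79    73
11         72    74    64    59          68    64    35    44          50    53    89    89
12         72    75    51    47          60    53    60    65          46    61    91    82
13         76    69    69    57          94    88    51    63          74    78    99    86
14         44    48    65    65          96    88    67    81          63    66    94    86
15         89    71    59    67          84    75    46    67          69    82    78    73

Summary Statistics
\[ \bar{\mathbf{x}}_A = \begin{pmatrix} 63.267 \\ 61.600 \\ 58.733 \\ 60.133 \end{pmatrix}, \qquad S_A = \begin{pmatrix} 160.638 & 104.829 & -32.638 & -47.110 \\ 104.829 & 92.543 & -4.900 & -22.229 \\ -32.638 & -4.900 & 155.638 & 128.967 \\ -47.110 & -22.229 & 128.967 & 159.552 \end{pmatrix} \]
\[ \bar{\mathbf{x}}_B = \begin{pmatrix} 76.400 \\ 69.067 \\ 50.267 \\ 59.200 \end{pmatrix}, \qquad S_B = \begin{pmatrix} 141.257 & 155.829 & 45.100 & 60.914 \\ 155.829 & 185.924 & 61.767 & 71.057 \\ 45.100 & 61.767 & 96.495 & 93.371 \\ 60.914 & 71.057 & 93.371 & 123.600 \end{pmatrix} \]
\[ \bar{\mathbf{x}}_C = \begin{pmatrix} 52.733 \\ 61.667 \\ 83.600 \\ 72.400 \end{pmatrix}, \qquad S_C = \begin{pmatrix} 156.067 & 116.976 & 53.814 & 35.257 \\ 116.976 & 136.381 & 3.143 & -0.429 \\ 53.814 & 3.143 & 116.543 & 114.886 \\ 35.257 & -0.429 & 114.886 & 156.400 \end{pmatrix} \]
\[ \bar{\mathbf{x}} = \frac{15}{45}\bar{\mathbf{x}}_A + \frac{15}{45}\bar{\mathbf{x}}_B + \frac{15}{45}\bar{\mathbf{x}}_C = \begin{pmatrix} 64.133 \\ 64.111 \\ 64.200 \\ 63.911 \end{pmatrix} \]
\[ S_{Pooled} = \frac{14}{42}S_A + \frac{14}{42}S_B + \frac{14}{42}S_C = \begin{pmatrix} 152.654 & 125.878 & 22.092 & 16.354 \\ 125.878 & 138.283 & 20.003 & 16.133 \\ 22.092 & 20.003 & 122.892 & 112.408 \\ 16.354 & 16.133 & 112.408 & 146.517 \end{pmatrix} \]

Computations:
1) Total vectors $\mathbf{T}_i = \sum_{j=1}^{n} \mathbf{x}_{ij}$ for each sample, and
2) the Grand Total vector $\mathbf{G} = \sum_{i=1}^{k} \mathbf{T}_i = \sum_{i=1}^{k}\sum_{j=1}^{n} \mathbf{x}_{ij}$:

Totals             Math   Biology   English   Sociology
A                   949     924       881       902
B                  1146    1036       754       888
C                   791     925      1254      1086
Grand Totals (G)   2886    2885      2889      2876

3) $N = kn$ = Total sample size = 45
4)
\[ \sum_{i=1}^{k}\sum_{j=1}^{n} \mathbf{x}_{ij}\mathbf{x}_{ij}' = \begin{pmatrix} 195718 & 191674 & 180399 & 182865 \\ 191674 & 191321 & 184516 & 184542 \\ 180399 & 184516 & 199641 & 193125 \\ 182865 & 184542 & 193125 & 191590 \end{pmatrix} \]
5)
\[ \frac{1}{n}\sum_{i=1}^{k} \mathbf{T}_i\mathbf{T}_i' = \begin{pmatrix} 189306.53 & 186387.13 & 179471.13 & 182178.13 \\ 186387.13 & 185513.13 & 183675.87 & 183864.40 \\ 179471.13 & 183675.87 & 194479.53 & 188403.87 \\ 182178.13 & 183864.40 & 188403.87 & 185436.27 \end{pmatrix} \]

Now
\[ H = \frac{1}{n}\sum_{i=1}^{k} \mathbf{T}_i\mathbf{T}_i' - \frac{1}{N}\mathbf{G}\mathbf{G}' = \begin{pmatrix} 4217.733333 & 1362.466667 & -5810.066667 & -2269.333333 \\ 1362.466667 & 552.5777778 & -1541.133333 & -519.1555556 \\ -5810.066667 & -1541.133333 & 9005.733333 & 3764.666667 \\ -2269.333333 & -519.1555556 & 3764.666667 & 1627.911111 \end{pmatrix} \]
= the Between SS and SP matrix

Let
\[ E = \sum_{i=1}^{k}\sum_{j=1}^{n} \mathbf{x}_{ij}\mathbf{x}_{ij}' - \frac{1}{n}\sum_{i=1}^{k} \mathbf{T}_i\mathbf{T}_i' = \begin{pmatrix} 6411.467 & 5286.867 & 927.867 & 686.867 \\ 5286.867 & 5807.867 & 840.133 & 677.600 \\ 927.867 & 840.133 & 5161.467 & 4721.133 \\ 686.867 & 677.600 & 4721.133 & 6153.733 \end{pmatrix} \]
= the Within SS and SP matrix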
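
The four test statistics listed earlier (Roy's largest root, Wilks' lambda, the Lawley-Hotelling trace and the Pillai trace) can be evaluated directly from the Between and Within SS and SP matrices of the student-marks example. The following is only a minimal sketch, assuming NumPy is available; the function name `manova_statistics` and the dictionary keys are illustrative choices that do not come from the slides, and the sketch stops at the statistics themselves without converting them to p-values.

```python
import numpy as np

# Between (H) and Within (E) SS and SP matrices, copied from the example.
H = np.array([
    [ 4217.733333,  1362.466667,  -5810.066667, -2269.333333],
    [ 1362.466667,   552.5777778, -1541.133333,  -519.1555556],
    [-5810.066667, -1541.133333,   9005.733333,  3764.666667],
    [-2269.333333,  -519.1555556,  3764.666667,  1627.911111],
])
E = np.array([
    [6411.467, 5286.867,  927.867,  686.867],
    [5286.867, 5807.867,  840.133,  677.600],
    [ 927.867,  840.133, 5161.467, 4721.133],
    [ 686.867,  677.600, 4721.133, 6153.733],
])


def manova_statistics(H, E):
    """Hypothetical helper: the four MANOVA statistics from H and E."""
    # E^{-1}H is similar to HE^{-1}, so both have the same eigenvalues.
    # The eigenvalues are real in exact arithmetic; .real drops any tiny
    # imaginary parts introduced by rounding.
    eig = np.linalg.eigvals(np.linalg.solve(E, H)).real
    eig = np.sort(eig)[::-1]  # largest first
    return {
        "eigenvalues": eig,
        "roy_largest_root": eig[0],
        "wilks_lambda": np.linalg.det(E) / np.linalg.det(H + E),
        "lawley_hotelling_trace": np.trace(H @ np.linalg.inv(E)),
        "pillai_trace": np.trace(H @ np.linalg.inv(H + E)),
    }


stats = manova_statistics(H, E)
for name, value in stats.items():
    print(name, value)
```

Because H is built from k = 3 group mean vectors, its rank is at most k - 1 = 2, so at most two eigenvalues of $HE^{-1}$ are nonzero; any remaining values printed by the sketch are rounding error. The statistics are also linked through the eigenvalues $\lambda_r$: Wilks' lambda equals $\prod_r 1/(1 + \lambda_r)$ and the Pillai trace equals $\sum_r \lambda_r/(1 + \lambda_r)$, which gives a quick consistency check on the output.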