12.4 An Introduction to Analysis of Variance

Analysis of Variance (ANOVA): a statistical technique that can be used to test the hypothesis that the means of 3 or more populations are equal.

Motivating example:

Objective: we want to compare the mean scores of the employees at 3 different plants. Let

$\mu_1$: the mean score for plant 1
$\mu_2$: the mean score for plant 2
$\mu_3$: the mean score for plant 3

The sample means are

\[ \bar{x}_1 = \frac{\sum_{i=1}^{n_1} x_{1,i}}{n_1} = \frac{\sum_{i=1}^{6} x_{1,i}}{6} = 79 \] : the sample mean score for plant 1
\[ \bar{x}_2 = \frac{\sum_{i=1}^{n_2} x_{2,i}}{n_2} = \frac{\sum_{i=1}^{6} x_{2,i}}{6} = 74 \] : the sample mean score for plant 2
\[ \bar{x}_3 = \frac{\sum_{i=1}^{n_3} x_{3,i}}{n_3} = \frac{\sum_{i=1}^{6} x_{3,i}}{6} = 66 \] : the sample mean score for plant 3

with $n = n_1 = n_2 = n_3 = 6$ and $n_T = n_1 + n_2 + n_3 = 18$.

The sample variances are

\[ s_1^2 = \frac{\sum_{i=1}^{6} (x_{1,i} - \bar{x}_1)^2}{5} = 34 \] : the sample variance of the scores for plant 1
\[ s_2^2 = \frac{\sum_{i=1}^{6} (x_{2,i} - \bar{x}_2)^2}{5} = 20 \] : the sample variance of the scores for plant 2
\[ s_3^2 = \frac{\sum_{i=1}^{6} (x_{3,i} - \bar{x}_3)^2}{5} = 32 \] : the sample variance of the scores for plant 3

We want to test

\[ H_0: \mu_1 = \mu_2 = \mu_3 \quad \text{vs.} \quad H_a: \text{not all population means are equal} \]

with $\alpha = 0.05$.

3 assumptions for the above problem:
1. The scores for the employees in each plant are normally distributed.
2. The variances of the scores for the 3 plants are the same (a common variance $\sigma^2$).
3. The score for each employee must be independent of the scores for any other employees.

Intuitively, when $H_0$ is true, the scores for the employees in the 3 plants have the same distribution, since they have the same means and variances. Thus $\bar{x}_1, \bar{x}_2, \bar{x}_3$ can be regarded as 3 sample values of the sample mean $\bar{X}$. Then the variance of $\bar{X}$, $\sigma_{\bar{X}}^2$, can be estimated by

\[ s_{\bar{X}}^2 = \frac{\sum_{i=1}^{3} (\bar{x}_i - \bar{\bar{x}})^2}{3 - 1} = 43, \qquad \bar{\bar{x}} = \frac{\bar{x}_1 + \bar{x}_2 + \bar{x}_3}{3} = 73. \]

Since $\sigma_{\bar{X}}^2 = \sigma^2 / n$, we have $\sigma^2 = n \sigma_{\bar{X}}^2$, and the estimate of $\sigma^2$ is

\[ n s_{\bar{X}}^2 = 6 \times 43 = 258. \]

$n s_{\bar{X}}^2$ is referred to as the between-samples estimate of $\sigma^2$.

Note: $s_{\bar{X}}^2$ is an accurate estimate of $\sigma_{\bar{X}}^2$ only when $H_0$ is true. When $H_0$ is not true, $s_{\bar{X}}^2$ tends to be larger than (overestimate) $\sigma_{\bar{X}}^2$. Thus $n s_{\bar{X}}^2$ might not be an accurate estimate of $\sigma^2$ when $H_0$ is not true.
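The between-samples arithmetic can be checked with a few lines of Python. This is only a sketch using the three sample means and the common sample size from the example; the variable names are illustrative and not part of the original notes.

```python
from statistics import mean, variance

# Sample means for plants 1-3 and the common sample size, from the example.
sample_means = [79, 74, 66]
n = 6

# Overall mean of the three sample means.
grand_mean = mean(sample_means)

# Sample variance of the three means (divisor 3 - 1), i.e. the estimate
# of the variance of the sample mean.
var_of_means = variance(sample_means)

# Between-samples estimate of sigma^2: n times the variance of the means.
between_estimate = n * var_of_means

print(grand_mean, var_of_means, between_estimate)
```

Multiplying by `n` reflects the relation that the variance of a sample mean equals the population variance divided by the sample size.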
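The other estimate of $\sigma^2$, called the within-samples estimate of $\sigma^2$, is

\[ \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2}{(n_1 - 1) + (n_2 - 1) + (n_3 - 1)} = \frac{5s_1^2 + 5s_2^2 + 5s_3^2}{5 + 5 + 5} = \frac{s_1^2 + s_2^2 + s_3^2}{3} = \frac{34 + 20 + 32}{3} = 28.67. \]

Note: the within-samples estimate of $\sigma^2$ is unbiased (accurate) whether or not $H_0$ is true. The within-samples estimate of $\sigma^2$ is in fact the pooled estimate of $\sigma^2$.

The statistic

\[ f = \frac{\text{between-samples estimate of } \sigma^2}{\text{within-samples estimate of } \sigma^2} \]

can be used to test $H_0$: $f$ tends to be close to 1 when $H_0$ is true and larger than 1 when $H_0$ is not true. Thus, in this example,

\[ f = \frac{n s_{\bar{X}}^2}{\sum_{i=1}^{3} (n_i - 1) s_i^2 / (n_T - 3)} = \frac{258}{28.67} = 9. \]

General Case:

Suppose there are $k$ populations. The data are the following:

Population    Sample
1             $x_{1,1}, x_{1,2}, \ldots, x_{1,n_1}$
2             $x_{2,1}, x_{2,2}, \ldots, x_{2,n_2}$
...           ...
k             $x_{k,1}, x_{k,2}, \ldots, x_{k,n_k}$

Let

$n_T = n_1 + n_2 + \cdots + n_k$

$x_{j,i}$, $i = 1, \ldots, n_j$; $j = 1, \ldots, k$: the $i$'th sample value from population $j$.

\[ \bar{x}_j = \frac{\sum_{i=1}^{n_j} x_{j,i}}{n_j}, \quad j = 1, \ldots, k \] : the sample mean for population $j$.

\[ \bar{\bar{x}} = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{j,i}}{n_T} \] : the overall mean.

\[ s_j^2 = \frac{\sum_{i=1}^{n_j} (x_{j,i} - \bar{x}_j)^2}{n_j - 1}, \quad j = 1, \ldots, k \] : the sample variance for population $j$.

Two estimates of $\sigma^2$, Mean Square Between (MSB) and Mean Square Within (MSW), can be used. MSB is the between-samples estimate of $\sigma^2$, while MSW is the within-samples estimate of $\sigma^2$. MSB and MSW are

\[ MSB = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2}{k - 1} \]

and

\[ MSW = \frac{\sum_{j=1}^{k} (n_j - 1) s_j^2}{(n_1 - 1) + (n_2 - 1) + \cdots + (n_k - 1)} = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{j,i} - \bar{x}_j)^2}{n_T - k}. \]

When $H_0$ is not true, MSB might not be an unbiased (accurate) estimate of $\sigma^2$. On the other hand, MSW is an accurate estimate of $\sigma^2$ whether or not $H_0$ is true. Thus,

\[ f = \frac{\text{between-samples estimate of } \sigma^2}{\text{within-samples estimate of } \sigma^2} = \frac{MSB}{MSW} = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2 / (k - 1)}{\sum_{j=1}^{k} (n_j - 1) s_j^2 / (n_T - k)} \]

can be used to test $H_0$. As before, $f = MSB / MSW$ tends to be close to 1 when $H_0$ is true and larger than 1 when $H_0$ is not true.

Next question: how large must $f$ be to reject $H_0$?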
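The MSB, MSW and f formulas of the general case can be sketched in Python from summary statistics alone. The function name `anova_f_from_summary` and its argument layout are assumptions made for illustration, not part of the original notes. The overall mean is computed as the size-weighted average of the sample means, which equals the overall mean of all observations.

```python
def anova_f_from_summary(sizes, means, variances):
    """Return (MSB, MSW, f) from per-sample sizes, means and variances.

    Hypothetical helper: each list holds one entry per population.
    """
    k = len(sizes)
    n_total = sum(sizes)

    # Overall mean as the size-weighted average of the sample means.
    overall_mean = sum(n * m for n, m in zip(sizes, means)) / n_total

    # Mean Square Between: weighted squared deviations of the sample
    # means from the overall mean, divided by k - 1.
    msb = sum(n * (m - overall_mean) ** 2
              for n, m in zip(sizes, means)) / (k - 1)

    # Mean Square Within: pooled variance, divided by n_T - k.
    msw = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)

    return msb, msw, msb / msw


# Summary statistics of the three-plant example.
msb, msw, f = anova_f_from_summary([6, 6, 6], [79, 74, 66], [34, 20, 32])
print(round(msb, 2), round(msw, 2), round(f, 2))
```

With the plant data this reproduces, up to rounding, the between-samples estimate 258, the within-samples estimate 28.67 and f = 9 from the example. Working from summary statistics keeps the sketch usable even when the raw scores are not available.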