ANOVA

In the last set of notes, we looked at the pooled t-test, which is used to compare two group means under the assumption that the groups have equal variances. Next, we will explore a procedure called Analysis of Variance (ANOVA). The biggest advantage of ANOVA is that it allows you to compare as many group means as you want, whereas the pooled t-test limits you to only two.

The ANOVA tests the following hypotheses:

H0: \mu_1 = \mu_2 = \cdots = \mu_t (the t group means are all equal)
Ha: at least one group mean differs from the others

The test helps to determine whether the observed differences among the sample means could have happened by chance if the null hypothesis were true.

Just like the pooled t-test, the ANOVA is valid only if certain assumptions are met. We will discuss these in more detail later.

Assumptions:
1. The groups have the ______________ variances (standard deviations).
2. The response variables for each group are ___________________ distributed.
3. The observations were either __________________ selected from each group’s population or, in the case of designed experiments, were randomly assigned to each of the groups.

Idea behind ANOVA

The ANOVA is a statistical method used to compare group (or treatment) ___________. So, why is it called analysis of variance? It’s because the ANOVA test compares two types of variability. Consider the following figures A, B and C. Each shows data distributions for two groups, where the length of the red box corresponds to the variability within each group and the green line in the middle of the box represents the mean of each group.

[Figure A]   [Figure B]   [Figure C]

Questions:
1. What do you notice about the difference between the sample means in Figure A as compared to the difference between the sample means in Figure B?
2. What do you notice about the variability within the groups in Figure A as compared to Figure B?
3. Which figure do you think provides more evidence that the group population means differ from each other: Figure A or Figure B? Explain your reasoning.
4.
Which figure do you think provides more evidence that the group population means differ from each other: Figure B or Figure C? Explain your reasoning.

Your intuition should have led you in this direction:

The data in Figure _______ provide stronger evidence that the group means differ than do the data in Figure _______ because the variability within each sample is smaller, even though the variability between groups is the same (i.e., the group means are the same in both figures).

The data in Figure _______ provide stronger evidence that the group means differ than do the data in Figure _______ because the variability between the means is larger in this case.

As you can see, both types of variability (within groups and between groups) play a role in determining whether we have evidence that the group means differ. In general, the ANOVA works this way: the larger the variability _________________ groups relative to the variability _______________ groups, the more evidence we have that the group means differ.

The test statistic we will use is the F-statistic, which is calculated as follows:

\[ F = \frac{\text{Estimate of variability between groups}}{\text{Estimate of variability within groups}} \]

Questions:
5. What does it mean if the F-statistic is small?
6. What does it mean if the F-statistic is large?

Example: Consider the example dealing with the coagulation rate of rabbits from the last set of notes. Recall that an experiment was conducted to investigate how two diets (A and B) affect coagulation rate. The data are given below.

Diet   Coagulation Rate (time in seconds)   Mean
A      68, 72                               70
B      56, 60                               58

Estimating the variability WITHIN groups

To calculate this quantity, we will compare each observation to its group mean. The distance between each observation and its group mean is the __________ of interest.

Diet A average = __________
Diet B average = __________

Diet   Replicate   Time   Group Mean   Error    Error^2
A      1           68     ______       ______   ______
A      2           72     ______       ______   ______
B      1           56     ______       ______   ______
B      2           60     ______       ______   ______

To measure the variability within groups, we consider the sum of the squared error terms.
We call this sum _____, or the Error Sum of Squares.

SSE =

To get our estimate of variability within groups, we divide the SSE by the degrees of freedom (df) for error. Let N = the total number of observations and t = the number of groups.

df for error = N - t =

When we divide the SSE by the df for error, we call the result the Mean Square Error (_____).

MSE =

We can also think of this another way. Because the sample sizes for both diets are equal in this case, the estimate of the variability within groups (the MSE) is simply the average of the two sample variances:

\[ MSE = \frac{s_1^2 + s_2^2}{2} = \]

Note that if the sample sizes are different, the estimate of the variability within groups is a weighted average of the sample variances. In general,

\[ MSE = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_t - 1)s_t^2}{n_1 + n_2 + \cdots + n_t - t} \]

Estimating the variability BETWEEN groups

To calculate this quantity, we will compare each observation’s _______________ mean to the overall mean. So, the distance between each observation’s group mean and the overall mean is the deviation of interest when calculating the variability between groups.

Diet A average = __________
Diet B average = __________
Overall average = __________

Diet   Replicate   Time   Group Mean   Overall Mean   Deviance^2
A      1           68     ______       ______         ______
A      2           72     ______       ______         ______
B      1           56     ______       ______         ______
B      2           60     ______       ______         ______

To measure the variability between groups, we consider the sum of the squared deviances from the above table. We call this __________, or the Treatment Sum of Squares.

SSTrt =

Once again, to get our estimate of the variability between groups, we divide SSTrt by the df for treatment.

df for treatment = t - 1 =

Finally, when we divide SSTrt by the df for treatment, we call the result the Mean Square for Treatment (MSTrt).

MSTrt =

We can also think of this another way. To measure the variation between groups, we consider the following quantity:

\[ MSTrt = \frac{n_1(\bar{y}_1 - \bar{y})^2 + n_2(\bar{y}_2 - \bar{y})^2 + \cdots + n_t(\bar{y}_t - \bar{y})^2}{t - 1} \]

where \bar{y}_i represents the mean of the ith treatment group and \bar{y} represents the overall mean.
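The within-group and between-group calculations described above can be checked with a few lines of code. The following is a minimal sketch in plain Python (not part of the original notes, which use JMP); the variable names `groups`, `sse`, `mse`, `sstrt` and `mstrt` are chosen here for illustration only. It uses the coagulation times from the diet example.

```python
# Coagulation times (seconds) from the diet example in these notes.
groups = {"A": [68, 72], "B": [56, 60]}

N = sum(len(obs) for obs in groups.values())  # total number of observations
t = len(groups)                               # number of groups

group_means = {g: sum(obs) / len(obs) for g, obs in groups.items()}
overall_mean = sum(sum(obs) for obs in groups.values()) / N

# Within groups: squared distance of each observation from its own group mean.
sse = sum((y - group_means[g]) ** 2 for g, obs in groups.items() for y in obs)
mse = sse / (N - t)  # divide by df for error

# Between groups: squared distance of each observation's group mean from the
# overall mean, which contributes one identical term per observation in a group.
sstrt = sum(len(obs) * (group_means[g] - overall_mean) ** 2
            for g, obs in groups.items())
mstrt = sstrt / (t - 1)  # divide by df for treatment

print("SSE   =", sse, "| df error     =", N - t, "| MSE   =", mse)
print("SSTrt =", sstrt, "| df treatment =", t - 1, "| MSTrt =", mstrt)
```

Because both diets have two observations, the MSE computed this way equals the simple average of the two sample variances, as noted above. Try filling in the tables by hand first and use the printed values only as a check.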
Calculating the F-statistic and p-value

As mentioned earlier, the F-statistic is calculated as follows:

\[ F = \frac{\text{Estimate of variability between groups}}{\text{Estimate of variability within groups}} \]

When the null hypothesis (the group/treatment means are equal) is true, this test statistic follows the F-distribution with numerator df = __________ and denominator df = __________.

As mentioned earlier, the larger the variability between groups relative to the variability within groups, the more evidence we have that the group means are different. So, the larger the F-statistic, the more evidence we have __________________ the null hypothesis.

Recall that the p-value is the probability of obtaining a test statistic at least as extreme as our observed result, given that the null hypothesis is true. To find the p-value associated with the F-statistic, we find the probability of obtaining our F-statistic or one that is even larger. We can get the p-value from JMP.

Carrying out the ANOVA in JMP

First, you’ll want to enter the data in the following manner:

[Screenshot: JMP data table]

Next, choose Analyze > Fit Model. Then enter the following:

[Screenshot: Fit Model dialog]

Click Run. The output appears as follows:

[Screenshot: JMP output]

Questions:
7. Find the sums of squares in the output:
   SSTrt =
   SSE =
8. Find the degrees of freedom in the output:
   df Error =
   df Treatment =
9. Find the mean squares in the output:
   MSTrt =
   MSE =
10. Find the F-statistic in the output:
11. Find the p-value in the output:
12. What do we conclude from this study?
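As a cross-check on the F-statistic and p-value for the diet example, the calculation can also be sketched in plain Python. This is an illustration, not the method used in these notes (which rely on JMP). The helper `f_upper_tail` is a hypothetical name introduced here; it approximates the upper-tail probability of the F-distribution by numerically integrating the regularized incomplete beta function, which is an assumption of this sketch rather than anything JMP is known to do. The approximation is rough when the F-statistic is very close to zero.

```python
import math

def f_upper_tail(f_stat, df_num, df_den, steps=100_000):
    """Approximate P(F >= f_stat) for an F(df_num, df_den) distribution.

    Uses P(F >= f) = I_x(df_den / 2, df_num / 2) with
    x = df_den / (df_den + df_num * f), where I_x is the regularized
    incomplete beta function, evaluated here with a midpoint rule.
    """
    a, b = df_den / 2.0, df_num / 2.0
    x = df_den / (df_den + df_num * f_stat)
    # Substituting u = s**a removes the possible singularity at s = 0.
    upper = x ** a
    h = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += (1.0 - u ** (1.0 / a)) ** (b - 1.0)
    integral = total * h / a
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return integral / beta

# Coagulation times (seconds) from the diet example in these notes.
groups = {"A": [68, 72], "B": [56, 60]}
N = sum(len(obs) for obs in groups.values())
t = len(groups)
means = {g: sum(obs) / len(obs) for g, obs in groups.items()}
overall = sum(sum(obs) for obs in groups.values()) / N

mse = sum((y - means[g]) ** 2 for g, obs in groups.items() for y in obs) / (N - t)
mstrt = sum(len(obs) * (means[g] - overall) ** 2
            for g, obs in groups.items()) / (t - 1)

f_stat = mstrt / mse                           # between / within
p_value = f_upper_tail(f_stat, t - 1, N - t)   # numerator df, denominator df

print("F =", f_stat)
print("p-value =", round(p_value, 4))
```

For these data, F = 144 / 8 = 18 with 1 and 2 degrees of freedom, and the approximate p-value is about 0.051. Compare these with the values you find in the JMP output.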