STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 Introduction Suppose we wish to compare k population means ( k 2 ). This situation can arise in two ways. If the study is observational, we are obtaining independently drawn samples from k distinct populations and we wish to compare the population means for some numerical response of interest. If the study is experimental, then we are using a completely randomized design to obtain our data from k distinct treatment groups. In a completely randomized design the experimental units are randomly assigned to one of k treatments and the response value from each unit is obtained. The mean of the numerical response of interest is then compared across the different treatment groups. There two main questions of interest: 1) Are there at least two population means that differ? 2) If so, which population means differ and how much do they differ by? More formally: H o : 1 2 ... k i.e. all the population means are equal and have a common mean ( ) H a : i j for some i j, i.e. at least two population means differ. If we reject the null then we use comparative methods to answer question 2 above. Basic Idea: The test procedure compares the variation in observations between samples to the variation within samples. If the variation between samples is large relative to the variation within samples we are likely to conclude that the population means are not all equal. The diagrams below illustrate this idea... Between Group Variation >> Within Group Variation (Conclude population means differ) Between Group Variation Within Group Variation (Fail to conclude the population means differ) The name analysis of variance gets its name because we are using variation to decide whether the population means differ. 197 STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 Computational Details for Equal Sample Size Case ( n1 n2 nk n ) Preliminary notation: Yij jth obs. from population i, i 1,..., k , j 1,..., n n Yi sample mean for the sample from population i , Yi k Y grand mean of all obs. n Yij i 1 j 1 N Y j 1 n ij , i 1,...,k k Y i 1 i k N n1 n2 ... nk , i.e. N is the total sample size for the experiment TWO ESTIMATES OF THE COMMON VARIANCE ( 2 ) Estimate 1: Using Between Group Variation If the null hypothesis is true each of the Yi ' s represents a randomly selected observation from a normal distribution with mean (common mean) and std. deviation = the standard error of the mean, i.e. from the sampling distribution of 𝑌̅~𝑁(𝜇, 𝜎 √𝑛 n , i.e. ). Thus if we find the sample standard deviation of the Yi ' s we get an estimate of n , or in terms of the sample variance we have, k Sample variance of the Yi ' s (Y i 1 i Y ) 2 k 1 is an estimate of 2 n . Therefore if multiply this sample variance by n, k n (Yi Y ) 2 i 1 k 1 we obtain an estimate of 2 , if and only if the null hypothesis is true. However if the alternative hypothesis is true, this estimate of 2 will be too BIG. This formula will be large when there is substantial between group variation. This measure of between group variation is called the Mean Square for Treatments and is denoted MS Treat . 198 STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 Estimate 2: Using Within Group Variation Another estimate of the common variance ( 2 ) can be found by looking at the variation of the observations within each of the k treatment groups. By extending the pooledvariance from the two population case we have the following: (n 1) s12 (n2 1) s22 ... (nk 1) sk2 Pooled estimate of 2 1 n1 n2 ... nk k which simplifies to the mean of the k sample variances when the sample sizes are all equal. Another way to write this is as follows: k Pooled estimate of 2 ni (Y i 1 j 1 ij Yi ) 2 N k SS Error MS Error N k This will be an estimate of the common population variance ( 2 ) regardless of whether the null hypothesis is true or not. This is called the Mean Square for Error ( MS Error ) . Thus if the null hypothesis is true we have two estimates of the common variance ( 2 ) , namely the mean square for treatments (MSTreat ) and the Mean Square Error ( MS Error ) . If MSTreat MS Error we reject H o , i.e. the between group variation is large relative to the within group variation. If MS Treat MS Error we fail to reject H o , i.e. the between group variation is NOT large relative to the within group variation. Test Statistic The test statistic compares the mean squares for treatment and error in the form of a ratio, F MS Treat ~ F - distribution with numerator df k 1 and denominator df N - k MS Error Large values for F give small p-values and lead to rejection of the null hypothesis. 199 STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 EXAMPLE 13.1 - Weight Gain in Anorexia Patients Data File: Anorexia.JMP available on the course website These data give the pre- and post-weights of patients being treated for anorexia nervosa. There are actually three different treatment plans being used in this study, and we wish to compare their performance. The variables in the data file are: Treatment – Family, Standard, Behavioral prewt - weight at the beginning of treatment postwt - weight at the end of the study period Weight Gain - weight gained (or lost) during treatment (postwt-prewt) We begin our analysis by examining comparative displays for the weight gained across the three treatment methods. To do this select Fit Y by X from the Analyze menu and place the grouping variable, group, in the X box and place the response, Weight Gain, in the Y box and click OK. Here boxplots, mean diamonds, normal quantile plots, and comparison circles have been added. Things to consider from this graphical display: Do there appear to be differences in the mean weight gain? Are the weight changes normally distributed? Is the variation in weight gain equal across therapies? 200 STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 Checking the Equality of Variance Assumption To test whether it is reasonable to assume the population variances are equal for these three therapies select UnEqual Variances from the Oneway Analysis pull down-menu. Equality of Variance Test H o : 12 22 k2 H a : the population variances are not all equal We have no evidence to conclude that the variances/standard deviations of the weight gains for the different treatment programs differ (p >> .05). ONE-WAY ANOVA TEST FOR COMPARING THE THERAPY MEANS To test the null hypothesis that the mean weight gain is the same for each of the therapy methods we will perform the standard one-way ANOVA test. To do this in JMP select Means, Anova/t-Test from the Oneway Analysis pull-down menu. The results of the test are shown in the Analysis of Variance box. MS Treat 307.22 MS Error 56.68 The mean square for treatments is 5.42 times larger than the mean square for error! This provides strong evidence against the null hypothesis. The p-value contained in the ANOVA table is .0065, thus we reject the null hypothesis at the .05 level and conclude statistically significant differences in the mean weight gain experienced by patients in the different therapy groups exist. 201 STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 MULTIPLE COMPARISONS Because we have concluded that the mean weight gains across treatment method are not all equal it is natural to ask the secondary question: Which means are significantly different from one another? We could consider performing a series of two-sample t-Tests and constructing confidence intervals for independent samples to compare all possible pairs of means, however if the number of treatment groups is large we will almost certainly find two treatment means as being significantly different. Why? Consider a situation where we have k = 7 different treatments that we wish to compare. To compare all possible pairs of means (1 vs. 2, 1 vs. 3, …, 6 vs. 7) k (k 1) 21 two-sample t-Tests. If we used would require performing a total of 2 P(Type I Error) .05 for each test we expect to make 21(.05) 1 Type I Error, i.e. we expect to find one pair of means as being significantly different when in fact they are not. This problem only becomes worse as the number of groups, k, gets larger. Experiment-wise Error Rate (EER) Another way to think about this is to consider the probability of making no Type I Errors when making our pair-wise comparisons. When k = 7 for example, the probability of making no Type I Errors is (.95) 21 .3406 , i.e. the probability that we make at least one Type I Error is therefore .6596 or a 65.96% chance. Certainly this unacceptable! Why would conduct a statistical analysis when you know that you have a 66% of making an error in your conclusions? This probability is called the experiment-wise error rate. Bonferroni Correction There are several different ways to control the experiment-wise error rate (EER). One of the easiest ways to control experiment-wise error rate is use the Bonferroni Correction. If we plan on making m comparisons or conducting m significance tests the Bonferroni Correction is to simply use as our significance level m rather than . This simple correction guarantees that our experiment-wise error rate will be no larger than . This correction implies that our p-values will have to be less than m , rather than , to be considered statistically significant. 202 STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 Multiple Comparison Procedures for Pairwise Comparisons of k Pop. Means When performing pair-wise comparison of population means in ANOVA there are several different methods that can be employed. These methods depend on the types of pair-wise comparisons we wish to perform. The different types available in JMP are summarized briefly below: Compare each pair using the usual two-sample t-Test for independent samples. This choice does not provide any experiment-wise error rate protection! (DON’T USE) Compare all pairs using Tukey’s Honest Significant Difference (HSD) approach. This is best choice if you are interested comparing each possible pair of treatments. Compare with the means to the “Best” using Hsu’s method. The best mean can either be the minimum (if smaller is better for the response) or maximum (if bigger is better for the response). Compare each mean to a control group using Dunnett’s method. Compares each treatment mean to a control group only. You must identify the control group in JMP by clicking on an observation in your comparative plot corresponding to the control group before selecting this option. Multiple Comparison Options in JMP 203 STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 EXAMPLE 13.1 - Weight Gain in Anorexia Patients (cont’d) For these data we are probably interested in comparing each of the treatments to one another. For this we will use Tukey’s multiple comparison procedure for comparing all pairs of population means. Select Compare Means from the Oneway Analysis menu and highlight the All Pairs, Tukey HSD option. Beside the graph you will now notice there are circles plotted. There is one circle for each group and each circle is centered at the mean for the corresponding group. The size of the circles are inversely proportional to the sample size, thus larger circles will drawn for groups with smaller sample sizes. These circles are called comparison circles and can be used to see which pairs of means are significantly different from each other. To do this, click on the circle for one of the treatments. Notice that the treatment group selected will be appear in the plot window and the circle will become red & bold. The means that are significantly different from the treatment group selected will have circles that are gray. These color differences will also be conveyed in the group labels on the horizontal axis. In the plot below we have selected Standard treatment group. The results of the pairwise comparisons are also contained the output window (shown below). The matrix labeled Comparison for all pairs using Tukey-Kramer HSD identifies pairs of means that are significantly different using positive entries in this matrix. Here we see only treatments 2 and 3 significantly differ. I generally turn this option off as the other methods for displaying significant differences are better. The next table conveys the same information by using different letters to represent populations that have significantly different means. Notice treatments 2 and 3 are not connected by the same letter so they are significantly different. 204 STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 Finally the CI’s in the Ordered Differences section give estimates for the differences in the population means. Here we see that mean weight gain for patients in treatment 3 is estimated to be between 2.09 lbs. and 13.34 lbs. larger than the mean weight gain for patients receiving treatment 2 (see highlighted section below). 205 STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 Example 13.2: Study of Diet and Growth of Western Rock Lobsters The western rock lobster is recognized as a valued fishery estimated at 300 million dollars a year. Therefore, recent studies have focused on protecting this commodity and increasing the production. In one particular study, a large sample of juvenile lobsters were collected at various locations and held in a tank at the Fisheries Marine Research Lab for two months as an acclimation period. During this period, the lobsters were fed a diet of mussels and prawn pellets. Subsequently, juvenile lobsters were fed for nine weeks on either the fresh mussel diet (D1) or one of four artificial diets: two in semimoist form (D2 and D3) and two in dry-pelleted form (D4 and D5). At the end of the nine weeks, 10 animals were sampled at random from each diet treatment, and their Muscle Somatic Index (MSI) was measured. The MSI is the weight of wet tail muscle expressed as a percentage of the weight of the whole animal, wet. Question(s) of Interest: Is there evidence that lobsters perform as well on the artificial diets as they do on the natural diet? Source: Tsvetneko, E., J. Brown, B.D. Glencross, and L.H. Evans. 1999. Measures of condition in dietary studies on western rock lobster post-pueruli. Proceedings, International Symposium on Lobster Health Management, Adelaide, September 1999. Step 0: Check the assumptions to be sure that the ANOVA is valid. Make sure that the two groups are independent Check the normality assumption. 206 STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 Check whether we have any evidence that the group variances differ (the ANOVA assumes equal variances). Step 1: Convert the research question into Ho and Ha. H0: Ha: Step 2: Find the test statistic and p-value from your data. In JMP, select Means/Anova. 207 STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 If you question the assumption of equal variances: If you question the assumption of normality: Step 3: Write a conclusion in the context of the problem. 208 STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 Multiple Comparisons for the Western Rock Lobster data: The researchers regarded the fresh mussel diet (D1) as a control treatment. Therefore, we should use Dunnett’s test to compare the treatments. Using Dunnett’s test in JMP: Select Compare Means > With Control, Dunnett’s. First, you need to identify which group is the control group. Click on D1, the control diet. 209 STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 Questions: 1. Where do the significant differences exist? 2. Do the artificial diets appear to promote or retard body condition (based on the muscle somatic index)? 3. Which, if any, of the artificial diets would you recommend? 210 STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 EXAMPLE 13.3 – Butter Fat Content in Cow Milk Data File: Butterfat-cows.JMP This data set comes from the butter fat content for five breeds of dairy cows. The variables in the data file are: Breed – Ayshire, Canadian, Guernsey, Holstein-Fresian, Jersey Age Group – two different age class of cows (1 = younger, 2 = older) Butterfat - % butter fat found in the milk sample We begin our analysis by examining comparative displays for the butter fat content across the five breeds. To do this select Fit Y by X from the Analyze menu and place the grouping variable, Breed, in the X box and place the response, Butterfat, in the Y box and click OK. The resulting plot simply shows a scatter plot of butter fat content versus breed. In many cases there will numerous data points on top of each other in such a display making the plot harder to read. To help alleviate this problem we can stagger or jitter the points a bit from vertical by selecting Jitter Points from the Display Options menu. To help visualize breed differences we could add quantile boxplots, mean diamonds, or mean with standard error bars to the graph. To do any or all of these select the appropriate options from the Display Options menu. The display below has both quantile boxplots, sample mean with error bars, and normal quantile plots added. 211 STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 We can clearly see the butter fat content is largest for Jersey and Guernsey cows and lowest for Holstein-Fresian. There also appears to be some difference in the variation of butter fat content as well. The summary statistics on the following page confirm our findings based on the graph above. To obtain these summary statistics select the Quantiles and Means, Std Dev, Std Err options from the Oneway Analysis menu at the top of the window. Summary Statistics for Butter Fat Content by Breed To test the null hypothesis that the butter fat content is the same for each of the breeds we can perform a one-way ANOVA test. H o : Ayshire Canadian ... Jersey H a : at least two breeds have different mean percent butter fat content in their milk Again the assumptions for one-way ANOVA are as follows: 1.) The samples are drawn independently or come from using a completely randomized design. Here we can assume the cows sampled from each breed were independently sampled. 2.) The variable of interest is normally distributed for each population. 3.) The population variances are equal across groups. Here this means: 2 Ayshire 2 Canadian ... 2 Jersey If this assumption is violated we can use Welch’s ANOVA, which allows for inequality of the population variances or we can transform the response (see below). 212 STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 To check these assumptions in JMP select UnEqual Variances and Normal Quantile Plot > Actual by Quantile (the 1st option). To conduct the one-way ANOVA test select Means, Anova/t-Test from the Oneway Analysis pull-down menu. The graphical results and the results of the test are shown below. Butter Fat Content by Breed Normality appears to be satisfied 213 STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 The equality of variance test results are shown below. We have strong evidence against the equality of the population variances/standard deviations. (Use Welch’s ANOVA) We conclude at least two means differ. Because we have strong evidence against equality of the population variances we could use Welch’s ANOVA to test the equality of the means. This test allows for the population variances/standard deviations to differ when comparing the population means. For Welch’s ANOVA we see that the p-value < .0001, therefore we have strong evidence against the equality of the means and we conclude these dairy breeds differ in the mean butter fat content of their milk. 214 STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 Variance Stabilizing Transformations Another approach that is often times used when we encounter situations where there is evidence against the equality of variance assumption is to transform the response. Often times we find that the variation appears to increase as the mean increases. For these data we see that this is the case, the estimated standard deviations increase as the estimated means increase. To correct this we generally consider taking the log of the response. Sometimes the square root and reciprocal are used but these transformations do not allow for any meaningful interpretation in the original scale, whereas the log transform. As we can see, as the samples means for the groups increase so do the sample standard deviations. To stabilize the variances/standard deviations we can try a log transformation. The results of the analysis using the log response are shown below. Analysis of the log(Percent Butter Fat) Content for the Five Dairy Cow Breeds 215 STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 Normality is still satisfied, there is no evidence against the equality of the variances, and we have strong evidence that at least two means differ. Because we have concluded that the mean/median log butter fat content differs across breed (p < .0001), it is natural to ask the secondary question: Which breeds have typical values that are significantly different from one another? To do this we again use multiple comparison procedures. In JMP select Compare Means from the Oneway Analysis menu and highlight the All Pairs, Tukey HSD option. The results of Tukey’s HSD are shown below. Below is the plot showing the results when clicking on the circle for Guernsey’s. We can see that the mean log-Butterfat Content for Guernsey’s and Jersey’s do not significantly differ from each other, but they both differ significantly from the 216 STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 other three breeds. Here we see only Guernsey and Jersey do not significantly differ. The table using letters conveys the information, using different letters to represent populations that have significantly different means. The CI’s in the Ordered Differences section give estimates for the differences in the population means/medians in the log scale. As was the case with using the log transformation with two populations, we interpret the results in the original scale in terms of ratios of medians. For example, back transforming the interval comparing Jersey to Holstein-Fresian, we estimate that the median butter fat content found in Jersey milk is between 1.33 and 1.55 times larger than the median butter fat content in Holstein-Fresian milk. We could also state this in terms of percentages as follows: we estimate that the typical butter fat content found in Jersey milk is between 33% and 55% higher than the butter fat content found in Holstein-Fresian milk. 217 STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 ONE-WAY ANOVA MODEL (FYI only) In statistics we often use models to represent the mechanism that generates the observed response values. Before we introduce the one-way ANOVA model we first must introduce some notation. Let, ni sample size from group i , (i 1,..., k ) Yij the j th response value for i th group , ( j 1,..., ni ) mean common to all groups assuming the null hypothesis is true i shift in the mean due to the fact the observation came from group i ij random error for the j th observatio n from the i th group . Note: The test procedure we use requires that the random errors are normally distributed with mean 0 and variance 2 . Equivalently the test procedure requires that the response is normally distributed with a common standard deviation for all k groups. One-way ANOVA Model: Yij i ij i 1,...,k and j 1,...,ni The null hypothesis equivalent to saying that the i are all 0, and the alternative says that at least one i 0 , i.e. H o : i 0 for all i vs. H a : i 0 for some i . We must decide on the basis of the data whether we evidence against the null. We display this graphically as follows: Estimates of the Model Parameters: ̂ Y ~ the grand mean ˆi Yi Y ~ the estimate ith treatment effect Yˆ ˆ ˆ Y ~ these are called the fitted values. ij i i Note: Yij ˆ ˆi ˆij Yij Y (Yi Y ) (Yij Yi ) ˆij Yij Yˆij Yij Yi ~ these are called the residuals. 218 STAT 305: Chapter 13 – Comparing Several Population Means (One-way ANOVA) Fall 2013 An Alternative Derivation of the Test Statistic Based on the Model Starting with Yij ˆ ˆi ˆij Yij Y (Yi Y ) Yij Yi ) we obtain (Yij Y ) (Yi Y ) (Yij Yi ) After squaring both sides we have, (Yij Y ) 2 (Yi Y ) 2 (Yij Yi ) 2 2(Yi Y )(Yij Yi ) Finally after summing overall of the observations and simplying we obtain. k ni k k ni (Yij Y ) 2 ni (Yi Y ) 2 (Yij Yi ) 2 i 1 j 1 i 1 i 1 j 1 These measures of variation are called “Sum of Squares” and are denoted SS. The above expression can be written SSTotal SSTreat SS Error The degrees of freedom associated with each of these SS’s have the same relationship and are given below: df total = df treatment + df error. (n – 1) = (k – 1) + (n – k) The mean squares (i.e. variance estimates) discussed above are found by taking the sum of squares and dividing by their associated degrees of freedom, i.e. k MS Treat niˆi2 SSTreat 2 which is an estimate of the common pop. variance ( ) only if Ho is true i 1 k 1 k 1 and MS Error SS Error = ˆ 2 which is an estimate of the common population 2 regardless of Ho. nk Thus when testing H o : i 0 for all i. H a : i 0 for some i. we reject Ho when MSTreat >> MSError , i.e. when the estimated treatment effects are large! 219