COMPARING TWO OR MORE POPULATION MEANS: INDEPENDENT SAMPLES

The two-sample t-test procedures discussed in the previous examples can be used to compare the means of two populations. However, these methods do not apply when a study is intended to compare more than two groups. Next, we will discuss a method that allows us to compare two OR MORE groups using independent samples.

Recall the data used in Example 6.3. We used a pooled t-test to determine whether the average birth weight was lower when mothers were classified as smokers.

Questions:
1. What is the p-value for the two-tailed pooled t-test?
2. What is the p-value given in the analysis of variance section?

For two samples, the two-tailed pooled t-test and the analysis of variance will always give identical p-values. The difference is that the F-test used in the analysis of variance (ANOVA) can also be used to compare more than two populations.

The Main Idea Behind an Analysis of Variance

When we look for differences in a set of sample means, what matters is not simply how far apart those sample means are, but how far apart they are relative to the variability of the observations. Consider the following scenarios.

[Figure A and Figure B: side-by-side boxplots of the groups in each scenario]

Questions:
1. What can you say about the centers of the boxplots for each group in Figure A as compared to Figure B?
2. What can you say about the variation within each group in Figure A?
3. What can you say about the variation within each group in Figure B?
4. Which scenario illustrates an example where variation between groups is LARGE compared to variation within groups?

Comments:
1. In Figure A, there is a lot of variation within each group. Therefore, the observed differences among the sample means could easily happen by random chance alone.
2. In Figure B, the centers of the boxplots are the same as in Figure A, but there is much less variation within each group.
Therefore, it is not likely that any sample from Group C would have a mean as small as the mean from Group A, for example.
3. If the variation between groups is large relative to the variation within each group, then we have evidence that something other than random chance is at work.

Therefore, our test statistic is as follows:

$$F = \frac{\text{Variation Between Groups}}{\text{Variation Within Groups}}$$

Note that this F-statistic takes on values between zero and positive infinity.

Questions:
1. What does it mean if the F-statistic is zero?
2. What does it mean if the F-statistic is very large?

We will use JMP to carry out the ANOVA; JMP will calculate the F-statistic and its associated p-value. The ANOVA procedure for comparing two or more means (assuming independent samples) is outlined below.

Example 6.5: Jaffe, Parker, and Wilson investigated the concentration of several hydrophobic organic substances (such as hexachlorobenzene, chlordane, heptachlor, aldrin, dieldrin, and endrin) in the Wolf River in Tennessee. Measurements were taken downstream of an abandoned dump site that had previously been used by the pesticide industry to dispose of its waste products. It was expected that these hydrophobic substances might have a nonhomogeneous vertical distribution in the river, both because of differences in density between these compounds and water and because of the adsorption of these compounds on sediments, which could lead to higher concentrations on the bottom. It is important to check this hypothesis because the standard procedure of sampling at six-tenths of the depth could miss the bulk of these pollutants if the distribution were not uniform. Grab samples were taken with a La Motte-Vandorn water sampler of 1 liter capacity at various depths of the river. This sampler consists of a horizontal Plexiglas tube 7 centimeters in diameter with a plunger on each side that shuts the sampler when it is at the desired depth.
Ten surface, 10 mid-depth, and 10 bottom samples were collected, all within a relatively short period. The data can be found in the file WolfRiver.JMP (a small portion of the data set is shown below).

Source: Jaffe, P. R., Parker, F. L., and Wilson, D. J. (1982). Distribution of toxic substances in rivers. Journal of the Environmental Engineering Division, 108, 639-649.

In JMP, select Analyze > Fit Y by X. Move Aldrin to the Y, Response box and Depth to the X, Factor box. JMP returns side-by-side dotplots, and we can select Quantiles from the red drop-down arrow next to the variable name to construct side-by-side boxplots. The green lines represent the means for each group.

To determine whether there is a statistically significant difference in the Aldrin concentration across depth, we will use a one-way analysis of variance (ANOVA).

Question of Interest: Is there a difference in the Aldrin concentration across depth (Bottom, Mid-depth, or Surface)?

Step 0: Check the assumptions to be sure that the ANOVA is valid.
Make sure that the groups are independent.
Check the normality assumption.
Check whether we have any evidence that the group variances differ (the ANOVA assumes equal variances).

Step 1: Convert the research question into H0 and Ha.
H0:
Ha:

Step 2: Find the test statistic and p-value from your data. In JMP, select Means/Anova from the red drop-down arrow. JMP returns the following:

Step 3: Write a conclusion in the context of the problem.
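The main idea described earlier, that the F-statistic compares variation between groups with variation within groups, can be sketched in Python. This is only an illustration under stated assumptions: the data are invented to mimic Figure A and Figure B (they are not the WolfRiver.JMP or Example 6.3 data), the function names one_way_f and pooled_t are hypothetical, and the standard one-way ANOVA mean squares are used as the measures of variation. The p-value is left to JMP because it requires the F distribution.

```python
from math import sqrt
from statistics import mean


def one_way_f(groups):
    """One-way ANOVA F-statistic: mean square between / mean square within."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation between groups: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation within groups: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def pooled_t(a, b):
    """Pooled (equal-variance) two-sample t-statistic."""
    ss_a = sum((x - mean(a)) ** 2 for x in a)
    ss_b = sum((x - mean(b)) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / len(a) + 1 / len(b)))


# Hypothetical data: both scenarios have group means 6, 8, and 10.
scenario_a = [[2, 6, 10], [4, 8, 12], [6, 10, 14]]   # large within-group variation
scenario_b = [[5, 6, 7], [7, 8, 9], [9, 10, 11]]     # small within-group variation

print(f"F for scenario A: {one_way_f(scenario_a):.2f}")   # 0.75
print(f"F for scenario B: {one_way_f(scenario_b):.2f}")   # 12.00

# With only two groups, the F-statistic equals the squared pooled t-statistic.
first, last = scenario_b[0], scenario_b[2]
print(f"t squared: {pooled_t(first, last) ** 2:.2f}")     # 24.00
print(f"F:         {one_way_f([first, last]):.2f}")       # 24.00
```

The group means are identical in the two scenarios, so the between-group variation is the same; only the within-group variation changes, and that alone moves F from 0.75 to 12. The last two lines illustrate why the two-tailed pooled t-test and the ANOVA agree when there are exactly two groups.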