Inference about more than Two Population Central Values

• In Chapter 5 we studied inferences about a population mean μ using a single sample from the population.
• In Chapter 6 we studied inferences about the difference between two population means, μ1 − μ2, based on a random sample from each population.
• Chapter 8 deals with inferences about means μ1, μ2, . . . , μt from t > 2 populations.
• The first and foremost question we ask is whether the populations have different means.

• A natural approach, when faced with the task of determining whether the evidence points to μ1 = μ2 = · · · = μt or not, is to draw on the methods of Chapter 6 and perform t-tests of H0 : μi = μj for all pairs of means.
• If all pairwise tests of means fail to reject their null hypotheses, one may conclude that all the means μi are equal. This procedure has a serious flaw, which will now be described.
• Take the case t = 3, so that we have three populations with means μ1, μ2, μ3. A sample from each population yields sample means and variances ȳ1, ȳ2, ȳ3, s1², s2², s3².
• All possible differences μi − μj are: μ1 − μ2, μ1 − μ3, μ2 − μ3.
• To test each H0 : μi − μj = 0, we use a t-statistic of the form

      tij = (ȳi − ȳj) / ( sp √(1/ni + 1/nj) ),   where   sp² = [ (ni − 1)si² + (nj − 1)sj² ] / (ni + nj − 2).

• Suppose each test is made at the same level α. Then for each pair (i, j),

      P( |tij| > tα/2 | H0 is true ) = α.

• In other words, the Type I error probability for each test is α. We shall call α the per-comparison error rate.
• Now let us ask: what is the probability of making one or more Type I errors when we make all three tests? (We shall call this the overall error rate.)
• Theory says that if the three t random variables were statistically independent, the overall error rate would be 1 − (1 − α)³ [this works out to 0.14 if α = 0.05, much larger than .05].
• The three test statistics are not independent, however. For example, t12 and t13 both involve ȳ1, and all three have the same denominator. So they are not statistically independent.
• Thus the overall error rate is not exactly 0.14 for our three-test situation, and computing its exact value is difficult in theory.
• In general, for c tests made at level α, the overall error rate is larger than 1 − (1 − α)^c.
• This is the flaw in the multiple testing approach to testing H0 : μ1 = μ2 = · · · = μt.
• One solution is to use a very small α for each test in order to keep the overall error rate reasonably small. For example, in the situation above, if we used α = .01 then the overall error rate would be less than .03.
• We would like to avoid doing this if possible, because we lose power by using a small α for each test.
• The alternative to making multiple t-tests is to make a single F-test of H0 : μ1 = μ2 = · · · = μt versus Ha : at least one μj is different in value. This is called the analysis of variance F-test.
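• As an aside, the inflation of the overall error rate is easy to check by simulation. The short Python sketch below is my own illustration, not part of the text: the sample size, seed, and number of replications are arbitrary choices. It draws three samples from the same normal population (so H0 is true), runs all three pairwise t-tests at α = .05, and estimates how often at least one test rejects.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n, reps = 0.05, 10, 20_000   # per-comparison level, group size, replications
false_rejections = 0

for _ in range(reps):
    # Three samples from the SAME population, so H0: mu1 = mu2 = mu3 is true.
    y1, y2, y3 = rng.normal(0.0, 1.0, (3, n))
    pvals = [stats.ttest_ind(a, b).pvalue
             for a, b in [(y1, y2), (y1, y3), (y2, y3)]]
    if min(pvals) < alpha:          # at least one Type I error among the three tests
        false_rejections += 1

print("estimated overall error rate:", false_rejections / reps)
# Comes out well above the per-comparison rate 0.05 (roughly 0.12 in this setup).
```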
The Analysis of Variance and the F-test

• We can, at this point, begin talking in terms of experiments and their outcomes. Recall that an experiment is a planned data collection activity.
• A simplistic view is that an experiment is a planned way to observe the effect of treatments that are applied to experimental units. Examples:
  (a) Different amounts of a headache drug (treatments) are given to people with headaches (experimental units) to observe the effect.
  (b) Different furnace temperatures (treatments) are used to temper steel rods (experimental units) to see the extent of tempering at each temperature.
• In this context we can imagine, theoretically, a parent population of experimental units whose measure of interest has mean μ and variance σ².
• We imagine applying each treatment i to all elements in the parent population to obtain a treated population, say Ti.
• We will assume the treated population Ti has mean μi, possibly different from μ, but that its variance is still σ², i.e., the treatment does not affect the variance among experimental units.
• In the past we have talked about populations without worrying about how they might have materialized. Here we are simply saying that we can imagine some populations as arising through experimentation. Each of these populations has mean μi and variance σ², i = 1, 2, . . . , t.
• The simplest experimental design is the completely randomized design (CRD). Here we have (say) t treatments and nt experimental units. The experimental units are divided into t sets of n, and the sets are assigned at random, one to each treatment.
• We can, equivalently, characterize this design as one that acquires a simple random sample of size n from each of t different treated populations T1, T2, . . . , Tt.
• If all of the treatments leave the population mean at μ, then μ1 = μ2 = · · · = μt = μ. If this were the case, the variation among the n sample values taken from each treated population T1, T2, . . . , Tt (called the within sample variance) would be the same as the variation among all nt sample values.
  [Figure: dot plots of the samples from T1, T2, T3 along the measurement axis y when μ1 = μ2 = μ3; the three samples overlap.]
• If some of the μj are different, then the variance in the overall sample will be inflated by the differences in population means, while the within sample variance would remain the same for each sample.
  [Figure: dot plots of the samples from T1, T2, T3 along the measurement axis y with μ1, μ2, μ3 not all equal; the three samples are shifted apart.]
• This suggests that to investigate whether μ1 = μ2 = · · · = μt we may wish to compare the observed variation within samples from the populations Ti to the variation among (or between) the samples. This is the motivation for analyzing variances when looking for differences in population means.
• Note that the expected value of the mean square among treatment groups is σ² when μ1 = μ2 = · · · = μt. This expectation gets larger than σ² as the μi get further apart in value.
• Thus the ratio

      Mean Square between treatment groups / Mean Square within treatment groups

  has expected (average) value 1 when μ1 = μ2 = · · · = μt and a value greater than one otherwise.
• It will be difficult for us to decide how much larger than 1 the ratio must be before we declare that at least one μi is different from the others. The solution to this problem is to assume the treated populations are Normally distributed. Then the above ratio is an F random variable whenever μ1 = μ2 = · · · = μt.
• When the μi are not all equal, the distribution of this ratio is shifted to the right; the observed value of the ratio will then tend to be larger than the percentile value from the F table at a specified level α.
• Since our test involves essentially the sample means ȳ, the central limit theorem tells us that we will have approximately an F random variable even if the treated populations are not dramatically different from normal populations.
• Thus we may test the hypothesis

      H0 : μ1 = μ2 = · · · = μt   vs.   Ha : at least one μi different,

  using the statistic F = Mean Square Between Groups / Mean Square Within Groups.
• The F-test is carried out by constructing an analysis of variance table, discussed below.
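• Before turning to the table, here is a small numerical illustration of the motivation above. This Python sketch is mine, not the text's; the helper names (mean_squares, avg_ratio), group means, sample size, and replication count are arbitrary assumptions. It computes the between- and within-group mean squares directly from their definitions and compares the average ratio when the μi are equal with the average when they differ.

```python
import numpy as np

rng = np.random.default_rng(2)

def mean_squares(groups):
    """Between- and within-group mean squares for a list of 1-D samples."""
    t = len(groups)
    n_T = sum(len(g) for g in groups)
    grand = np.concatenate(groups).mean()
    msb = sum(len(g) * (g.mean() - grand) ** 2 for g in groups) / (t - 1)
    msw = sum(((g - g.mean()) ** 2).sum() for g in groups) / (n_T - t)
    return msb, msw

def avg_ratio(mus, n=10, reps=5000, sigma=1.0):
    """Average of MSB/MSW over repeated samples with the given group means."""
    ratios = []
    for _ in range(reps):
        groups = [rng.normal(mu, sigma, n) for mu in mus]
        msb, msw = mean_squares(groups)
        ratios.append(msb / msw)
    return np.mean(ratios)

print("equal means   :", avg_ratio([5.0, 5.0, 5.0]))   # close to 1
print("unequal means :", avg_ratio([4.0, 5.0, 6.0]))   # well above 1
```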
The Analysis of Variance (AOV) Table for a CRD

• Section 8.2 in the text gives details about the computation of the relevant analysis of variance table for this case. Here we summarize the results.
• Data:

      Treatment 1:  y11, y12, . . . , y1n1    with mean ȳ1.
      Treatment 2:  y21, y22, . . . , y2n2    with mean ȳ2.
        ...
      Treatment t:  yt1, yt2, . . . , ytnt    with mean ȳt.

  (Note: the sample sizes in the t treatment groups need not be the same.)
• Let nT = n1 + n2 + · · · + nt denote the total number of observations.
• The AOV table in this case is:

      Source      SS                          df        MS                    F
      Treatment   SSB = Σi ni (ȳi. − ȳ..)²    t − 1     SSB/(t − 1)           MSB/MSE
      Error       SSW = Σi Σj (yij − ȳi.)²    nT − t    SSW/(nT − t) ≡ MSE
      Total       Σi Σj (yij − ȳ..)²          nT − 1

  where

      SSW = Σi Σj (yij − ȳi.)² = (n1 − 1)s1² + (n2 − 1)s2² + · · · + (nt − 1)st²,
      MSE = SSW / [(n1 − 1) + (n2 − 1) + · · · + (nt − 1)].

• SSW is simply the familiar pooled sum of squares that we saw for the two-sample case back in Chapter 6, extended to the t > 2 case.
• MSE is the pooled estimator of the population variance σ².

Model for data from a CRD

• The model of the CRD which we will use is

      yij = μ + αi + εij,   i = 1, 2, . . . , t;  j = 1, 2, . . . , n,

  where E(εij) = 0, Var(εij) = σ², and the εij are all independent.
• Here αi is the effect due to treatment i. Thus the treated population means are μi = μ + αi in this notation.
• If none of the treatments has an effect, then α1, α2, · · · , αt are all zero, which is equivalent to μ1 = μ2 = · · · = μt = μ for some μ.
• We may talk of treatment means or treatment effects in single-treatment-factor CRD experiments, irrespective of sample size.

Example 8.1 (Old Edition):

• Data: A horticulturist was investigating the phosphorus content of tree leaves from three different varieties of apple trees (1, 2, and 3). Random samples of five leaves from each of the three varieties were analyzed for phosphorus content. The data are:

      Variety   Phosphorus Content         Sample Size   Sample Mean   Sample Variance
      1         .35  .40  .58  .50  .47    5             0.460         .00795
      2         .65  .70  .90  .84  .79    5             0.776         .01033
      3         .60  .80  .75  .73  .66    5             0.708         .00617

• Analysis of Variance Table:

      Source of Variation   S.S.    d.f.   M.S.   F
      Variety               .277    2      .138   17.25
      Error                 .0978   12     .008
      Total                 .3748   14

• Since the computed F value exceeds F.05,2,12 = 3.89, the null hypothesis that the mean phosphorus contents for the three varieties are all equal is rejected.
• It appears that the mean phosphorus content for Variety 1 is smaller than those for Varieties 2 and 3. Techniques for testing such hypotheses will be discussed in Chapter 9.
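• For readers who want to reproduce the table, a minimal Python sketch (my illustration, not part of the text; it assumes scipy is available) runs the one-way analysis of variance F-test on the leaf data:

```python
from scipy import stats

# Phosphorus content for the three apple-tree varieties (Example 8.1).
variety1 = [0.35, 0.40, 0.58, 0.50, 0.47]
variety2 = [0.65, 0.70, 0.90, 0.84, 0.79]
variety3 = [0.60, 0.80, 0.75, 0.73, 0.66]

# One-way ANOVA F-test of H0: mu1 = mu2 = mu3.
result = stats.f_oneway(variety1, variety2, variety3)
print(f"F = {result.statistic:.2f}, p-value = {result.pvalue:.5f}")
# F comes out around 17 (the table's 17.25 reflects rounding of the mean squares),
# and the p-value is far below .05, so H0 is rejected.
```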
• To summarize, our model for the multiple-population case (with possibly unequal sample sizes) is

      yij = μ + αi + εij,   i = 1, 2, . . . , t;  j = 1, 2, . . . , ni,

  where the εij are independent and normally distributed, and for each i the population parameters are (μi, σ²).

Checking the Equal Variance Assumption

• A problem occurs if there is heterogeneity of variance.
• When sample sizes are nearly equal, heterogeneity of variance is not as great a problem unless the variances are severely different. Such cases are usually detectable by simply looking at the sample variances.
• We may wish to test

      H0 : σ1² = σ2² = · · · = σt²   vs.   Ha : not all σi² are equal.

• A test of this hypothesis, proposed by Hartley (1940), was discussed in Chapter 7. It is the F-max test,

      Fmax = max(Si²) / min(Si²),

  which uses the largest and the smallest of the sample variances.
• Recall that Table 12 gives critical values for α = 0.05 and α = 0.01 with df = n − 1. The test statistic presumes equal sample sizes; if the sample sizes are not equal, use the largest ni.
• Hartley's test is very sensitive to departures from normality. An alternative, Levene's test, was also discussed in Chapter 7.
• When heterogeneity appears to be present, a variance stabilizing transformation can sometimes be found. Such transformations are discussed in Section 8.5 along with Hartley's test. Finding a useful variance stabilizing transformation is not, in general, easy.
• If an approximate relationship between σ² and μ is found by examining the sample means ȳi and the corresponding sample variances Si², then it is possible to determine an appropriate transformation using theoretical arguments. For example, if the relationship is of the form σ² = kμ for some constant k, then the transformation yT = √y is suggested. If the relationship is of the form σ² = kμ² for some constant k, then the transformation yT = log(y + 1) is recommended. (See Table 8.15 for other possibilities.)
• The transformed data are then analyzed using the usual methods. See Examples 8.4, 8.5, and 8.6.
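• As a final illustration, the equal-variance check for the apple-leaf data could be sketched in Python as follows. This is my own sketch, not the text's: Hartley's F-max is computed directly from its definition, and Levene's test is run via scipy.

```python
import numpy as np
from scipy import stats

variety1 = [0.35, 0.40, 0.58, 0.50, 0.47]
variety2 = [0.65, 0.70, 0.90, 0.84, 0.79]
variety3 = [0.60, 0.80, 0.75, 0.73, 0.66]
samples = [variety1, variety2, variety3]

# Hartley's F-max: ratio of the largest to the smallest sample variance.
variances = [np.var(s, ddof=1) for s in samples]
f_max = max(variances) / min(variances)
print(f"Fmax = {f_max:.2f}")   # about 1.67; compare with the Table 12 critical
                               # value for t = 3 groups and df = n - 1 = 4

# Levene's test, which is less sensitive to non-normality than Hartley's test.
stat, p = stats.levene(*samples)
print(f"Levene statistic = {stat:.3f}, p-value = {p:.3f}")
```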