Briand CSBS 320 Notes, Caldwell Ch.10

Ch.10 Analysis of Variance
IN-CLASS READING ASSIGNMENT 03/08/07

Definition

ANOVA: Analysis Of Variance (one-way ANOVA, i.e. one variable).
- It is an extension of the difference of means test, since it is based on a comparison of sample means.
- But it involves the comparison of different estimates of population variance -- hence the name ANOVA.

ANOVA is appropriate for situations involving 3 or more samples and a variable measured at the interval/ratio level of measurement.

e.g. An educational psychologist wants to know if students exposed to 3 different treatment conditions or learning environments (positive sanction, negative sanction, sanction neutral) exhibit different test scores. If test scores are based on an interval/ratio scale of measurement, ANOVA is appropriate.
e.g. A geographer is interested in the growth rates of 4 types of cities -- manufacturing centers, government centers, retail centers, and financial centers.
e.g. A market researcher wants to determine if there is a significant difference between the response rates to 5 different marketing campaigns.
e.g. A sociologist wants to determine if different types of school personnel (teachers, counselors, and coaches) vary in their abilities to recognize risk factors for youth suicide.
e.g. We want to know if scores on an aptitude test actually vary for students in different types of schooling environments -- home schooling, public schooling, and private schooling.

One might think of approaching research problems like the ones above with t tests on all possible pairs of sample means. The problems with that approach:
- With the example of 4 types of cities, we would need to run 6 t tests.
- The probability of a Type I error would be magnified.

e.g. from Spatz p. 225 Ch. 10: Suppose you have 15 samples that all come from the same population (because there is just one population, the null hypothesis is clearly true). These 15 sample means will vary from one another as a result of chance factors. Now suppose you calculate every possible t test (all 105 of them), retaining or rejecting each null hypothesis at the .05 level. How many times would you reject the null hypothesis? The answer is about 5. When the null hypothesis is true (as in this example) and alpha is .05, 100 t tests will produce about 5 Type I errors. Back to the reporting of the experiment: suppose you conducted a 15-group experiment, ran 105 t tests, and found five significant differences. If you then pulled out those 5 and said they were reliable differences (i.e. differences that are not due to chance), you don't understand statistics. You can protect yourself from making such a mistake by using a statistical technique that keeps the overall risk of Type I error at an acceptable level (such as .05 or .01).

Application we'll be working with:

We're interested in urban unemployment and whether or not the unemployment levels in cities vary by region of the country. We've used a random sampling technique to select cities in 4 different regions and we've recorded unemployment levels.

Table 10.1 Levels of unemployment by region

  North:  3.8, 7.1, 9.6, 8.4, 5.1, 11.6, 6.2, 7.9, 9.0, 10.3
  South:  4.2, 6.5, 4.4, 8.1, 7.6, 5.8, 4.0, 7.3, 5.2, 4.8
  East:   8.8, 5.1, 12.7, 6.4, 9.8, 6.3, 10.2, 8.5, 11.9, 8.6
  West:   4.8, 1.2, 8.0, 9.4, 3.6, 8.7, 6.5

  $n_{North} = 10$, $\sum X_{North} = 79.0$, $\bar{X}_{North} = 79/10 = 7.90$
  $n_{South} = 10$, $\sum X_{South} = 57.9$, $\bar{X}_{South} = 57.9/10 = 5.79$
  $n_{East} = 10$,  $\sum X_{East} = 88.3$,  $\bar{X}_{East} = 88.3/10 = 8.83$
  $n_{West} = 7$,   $\sum X_{West} = 42.2$,  $\bar{X}_{West} = 42.2/7 = 6.03$

Each sample mean or group mean is simply the average of the unemployment levels in each group.
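(Not part of the original notes: here is a minimal Python sketch that reproduces the Table 10.1 group means, just to make the "group mean = average within each group" point concrete. The variable names are mine.)

```python
# Group means for Table 10.1 (plain Python, no libraries needed).
north = [3.8, 7.1, 9.6, 8.4, 5.1, 11.6, 6.2, 7.9, 9.0, 10.3]
south = [4.2, 6.5, 4.4, 8.1, 7.6, 5.8, 4.0, 7.3, 5.2, 4.8]
east  = [8.8, 5.1, 12.7, 6.4, 9.8, 6.3, 10.2, 8.5, 11.9, 8.6]
west  = [4.8, 1.2, 8.0, 9.4, 3.6, 8.7, 6.5]

regions = {"North": north, "South": south, "East": east, "West": west}

# Each group mean is just the sum of that group's values over that group's n.
for name, values in regions.items():
    print(f"{name}: n = {len(values)}, sum = {sum(values):.1f}, mean = {sum(values) / len(values):.2f}")
```

Running this should print the same n, sums, and means listed under Table 10.1 (7.90, 5.79, 8.83, 6.03).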
We can also compute an overall mean or grand mean based on all the data in all groups:

$$\bar{X}_{grand} = \frac{\sum X_{all}}{n_{total}} = \frac{\sum X_{North} + \sum X_{South} + \sum X_{East} + \sum X_{West}}{n_{North} + n_{South} + n_{East} + n_{West}} = \frac{79 + 57.9 + 88.3 + 42.2}{10 + 10 + 10 + 7} = \frac{267.4}{37} = 7.23$$

NOTE:
(1) The number of cities in each sample or group is not necessarily the same.
(2) Because each group has a different number of cases, you can't simply take the average of the group means to compute the grand mean:
$$\bar{X}_{grand} \neq \frac{\bar{X}_{North} + \bar{X}_{South} + \bar{X}_{East} + \bar{X}_{West}}{K}, \text{ where } K \text{ is the number of groups (in the example above, } K = 4).$$
(3) To compute the grand mean, you could instead compute a weighted average of the group means:
$$\bar{X}_{grand} = \frac{n_{North}\bar{X}_{North} + n_{South}\bar{X}_{South} + n_{East}\bar{X}_{East} + n_{West}\bar{X}_{West}}{n_{North} + n_{South} + n_{East} + n_{West}}$$

The null hypothesis H0 simply states that the means of the regions are equal:
$$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$$

Logic of ANOVA

We want to look at the variation of scores within each group, i.e. how far the scores in each group deviate from their group mean. We want to compare this variation of scores within groups to the variation of the group means around their own mean -- the grand mean. If there is more variation between groups than within groups, then there is support for the assertion that unemployment levels in cities vary by region.

To compare the variation of scores within groups to the variation of scores between groups, we compute an F-statistic, which is the ratio of an estimate of the between-groups variance over an estimate of the within-groups variance:

$$F = \frac{\text{estimate of between-groups variance}}{\text{estimate of within-groups variance}}$$

If there is more variation between groups than within groups, our F-ratio or F-statistic is large, and that gives us grounds to reject H0. How large does our F-ratio need to be for us to reject H0? We qualify our F-ratio as being large by comparing it to a critical F-ratio (sounds familiar?).

If the F-statistic $\geq F_C$, reject H0. If the F-statistic $< F_C$, fail to reject H0.

NOTE: F stands for Fisher, after Sir Ronald A. Fisher (1890-1962), who invented ANOVA. Fisher wrote the book on statistics: Statistical Methods for Research Workers, first published in 1925.

The F-ratio

If you recall the definition of the variance for a sample, $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$, you'll see the similarity between $s^2$ and the estimates of between-groups variance and within-groups variance used in computing the F-ratio.

NOTE: the sum of squared deviations, $\sum (X - \bar{X})^2$, will be referred to, from now on, as a sum of squares.

$$F = \frac{\text{estimate of between-groups variance}}{\text{estimate of within-groups variance}} = \frac{MS_B}{MS_W}$$

where:

MS_B is the mean square between (or estimate of the between-groups variance):
$$MS_B = \frac{SS_B}{df_B}$$
where SS_B is the between-groups sum of squares:
$$SS_B = n_1(\bar{X}_1 - \bar{X}_{grand})^2 + n_2(\bar{X}_2 - \bar{X}_{grand})^2 + \dots + n_k(\bar{X}_k - \bar{X}_{grand})^2$$
and df_B are the between-groups degrees of freedom:
$$df_B = K - 1, \text{ where } K \text{ is the number of groups or samples.}$$

MS_W is the mean square within (or estimate of the within-groups variance):
$$MS_W = \frac{SS_W}{df_W}$$
where SS_W is the within-groups sum of squares:
$$SS_W = \sum(X_1 - \bar{X}_1)^2 + \sum(X_2 - \bar{X}_2)^2 + \dots + \sum(X_k - \bar{X}_k)^2$$
and df_W are the within-groups degrees of freedom:
$$df_W = n_{total} - K, \text{ where } n_{total} \text{ is the total number of cases across samples,}$$
or equivalently
$$df_W = (n_{North} - 1) + (n_{South} - 1) + (n_{East} - 1) + (n_{West} - 1).$$
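(Not part of the original notes: the formulas above translate almost line for line into Python. The sketch below recomputes SS_B, SS_W, the mean squares, and F from the raw data; the printed values differ from the hand calculation only by rounding.)

```python
# Direct translation of the SS/MS/F formulas above (variable names are mine).
north = [3.8, 7.1, 9.6, 8.4, 5.1, 11.6, 6.2, 7.9, 9.0, 10.3]
south = [4.2, 6.5, 4.4, 8.1, 7.6, 5.8, 4.0, 7.3, 5.2, 4.8]
east  = [8.8, 5.1, 12.7, 6.4, 9.8, 6.3, 10.2, 8.5, 11.9, 8.6]
west  = [4.8, 1.2, 8.0, 9.4, 3.6, 8.7, 6.5]
groups = [north, south, east, west]

K = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = sum(x for g in groups for x in g) / n_total

# SS_B: for each group, n_k times the squared gap between its mean and the grand mean.
ss_b = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# SS_W: squared deviations of each score from its own group mean, summed over all groups.
ss_w = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

df_b = K - 1           # 3
df_w = n_total - K     # 33
ms_b = ss_b / df_b
ms_w = ss_w / df_w
F = ms_b / ms_w

print(f"SS_B = {ss_b:.2f}, SS_W = {ss_w:.2f}")   # about 60.9 and 179.3
print(f"MS_B = {ms_b:.2f}, MS_W = {ms_w:.2f}")   # about 20.3 and 5.4
print(f"F = {F:.2f}")                            # about 3.74
```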
Calculating the F-ratio

Table 10.1 Levels of unemployment by region, with squared deviations from each group mean

North ($\bar{X}_{North} = 7.90$):
  X:               3.8    7.1    9.6    8.4    5.1    11.6   6.2    7.9    9.0    10.3
  $(X-\bar{X})^2$: 16.81  0.64   2.89   0.25   7.84   13.69  2.89   0.00   1.21   5.76
  $\sum (X_{North} - \bar{X}_{North})^2 = 51.98$

South ($\bar{X}_{South} = 5.79$):
  X:               4.2    6.5    4.4    8.1    7.6    5.8    4.0    7.3    5.2    4.8
  $(X-\bar{X})^2$: 2.53   0.50   1.93   5.34   3.28   0.00   3.20   2.28   0.35   0.98
  $\sum (X_{South} - \bar{X}_{South})^2 = 20.39$

East ($\bar{X}_{East} = 8.83$):
  X:               8.8    5.1    12.7   6.4    9.8    6.3    10.2   8.5    11.9   8.6
  $(X-\bar{X})^2$: 0.00   13.91  14.98  5.90   0.94   6.40   1.88   0.11   9.42   0.05
  $\sum (X_{East} - \bar{X}_{East})^2 = 53.60$

West ($\bar{X}_{West} = 6.03$):
  X:               4.8    1.2    8.0    9.4    3.6    8.7    6.5
  $(X-\bar{X})^2$: 1.51   23.32  3.89   11.37  5.90   7.14   0.22
  $\sum (X_{West} - \bar{X}_{West})^2 = 53.33$

$\bar{X}_{grand} = 7.23$

Between-groups sum of squares:
$$SS_B = n_{North}(\bar{X}_{North} - \bar{X}_{grand})^2 + n_{South}(\bar{X}_{South} - \bar{X}_{grand})^2 + n_{East}(\bar{X}_{East} - \bar{X}_{grand})^2 + n_{West}(\bar{X}_{West} - \bar{X}_{grand})^2$$
$$= 10(7.90 - 7.23)^2 + 10(5.79 - 7.23)^2 + 10(8.83 - 7.23)^2 + 7(6.03 - 7.23)^2 = 60.88$$

$$df_B = K - 1 = 4 - 1 = 3 \quad\Rightarrow\quad MS_B = \frac{SS_B}{df_B} = \frac{60.88}{3} = 20.29$$

Within-groups sum of squares:
$$SS_W = \sum(X_{North} - \bar{X}_{North})^2 + \sum(X_{South} - \bar{X}_{South})^2 + \sum(X_{East} - \bar{X}_{East})^2 + \sum(X_{West} - \bar{X}_{West})^2$$
$$= 51.98 + 20.39 + 53.60 + 53.33 = 179.30$$

$$df_W = n_{total} - K = 37 - 4 = 33 \quad\Rightarrow\quad MS_W = \frac{SS_W}{df_W} = \frac{179.30}{33} = 5.43$$

$$\Rightarrow\quad F = \frac{MS_B}{MS_W} = \frac{20.29}{5.43} = 3.74$$

Comparing F to F_C, and interpreting results

F = 3.74
Look for F_C in Appendix D or Appendix E, p. 305 and 306 of your textbook.
Let $\alpha$ = 5%, $df_B$ = 3 and $df_W$ = 33.
Choose $df_W$ = 30 (the nearest value available in the table below 33), since there is no row for $df_W$ = 33.
=> $F_C$ = 2.92

F > F_C, so reject H0: our result suggests that levels of unemployment in cities do vary by region.

NOTE: ANOVA allows us to determine whether or not there is a significant difference across groups or samples, but it doesn't tell us between which of those groups the differences lie.

Ch.10 Analysis of Variance
IN-CLASS READING ASSIGNMENT 03/09/07

As we pointed out earlier, ANOVA allows us to determine whether or not there is a significant difference across groups or samples, but it doesn't tell us between which of those groups the difference lies. In our urban unemployment example, our null hypothesis was that the mean unemployment levels in cities were equal across regions. Our implicit alternative hypothesis was that at least one of the regions had a different unemployment level than the others. When we rejected H0, we were able to conclude that the levels of unemployment in cities do vary by region. But we were not able to tell which region had a different unemployment level than the others, nor were we able to tell whether one or more regions had a different unemployment level than the others.
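(Not part of the original notes: for readers who want a software cross-check of the hand calculation, here is a short sketch using SciPy. The f_oneway and f.ppf calls are standard SciPy functions; the numeric values in the comments are approximate and assume the Table 10.1 data entered as above.)

```python
# Cross-check of the hand-computed F-ratio using SciPy (assumes SciPy is installed).
from scipy import stats

north = [3.8, 7.1, 9.6, 8.4, 5.1, 11.6, 6.2, 7.9, 9.0, 10.3]
south = [4.2, 6.5, 4.4, 8.1, 7.6, 5.8, 4.0, 7.3, 5.2, 4.8]
east  = [8.8, 5.1, 12.7, 6.4, 9.8, 6.3, 10.2, 8.5, 11.9, 8.6]
west  = [4.8, 1.2, 8.0, 9.4, 3.6, 8.7, 6.5]

# One-way ANOVA: F should come out near the hand-computed 3.74,
# with a p-value below .05, consistent with rejecting H0.
f_stat, p_value = stats.f_oneway(north, south, east, west)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# Exact critical F at alpha = .05 for df_B = 3 and df_W = 33
# (the appendix table stops at df_W = 30, hence its slightly larger 2.92).
f_crit = stats.f.ppf(0.95, dfn=3, dfd=33)
print(f"Critical F(3, 33) = {f_crit:.2f}")   # roughly 2.9
```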
Tukey's Honestly Significant Difference (HSD)

The Tukey Honestly Significant Difference test, henceforth called HSD, is a procedure that allows us to determine between which regions the levels of unemployment differ.

NOTE: The HSD test is used ONLY after significant results are found. In other words, it is used only if H0 was rejected; if H0 wasn't rejected, no additional test is needed.

How does the HSD test work?

The HSD test is equivalent to doing successive difference of means tests. We compare two sample means at a time, and compute a Q statistic for each comparison or for each pair. In our example of regional unemployment, we have four sample means, and thus we'll have 6 pairwise comparisons:

  North-South   North-West   North-East
  South-West    South-East   West-East

We compare each Q statistic to a critical Q value to determine whether or not the sample means are significantly different for each pair. If the Q-statistic $\geq Q_C$, the two means are significantly different. If the Q-statistic $< Q_C$, the two means are not significantly different.

Q-statistics

(1) When all sample sizes are equal:
$$Q = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{MS_W / n}}$$
where $\bar{X}_1$ and $\bar{X}_2$ are any two means, and n is the number of cases in each sample.

(2) When sample sizes are unequal:
$$Q = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{MS_W / \tilde{n}}}$$
where $\bar{X}_1$ and $\bar{X}_2$ are any two means, and $\tilde{n}$ is the harmonic mean of the sample sizes:
$$\tilde{n} = \frac{K}{\frac{1}{n_1} + \frac{1}{n_2} + \dots + \frac{1}{n_k}}, \text{ with } K \text{ the number of samples or groups.}$$

Calculating the Q-statistics

Since our regional unemployment example involves unequal sample sizes, we'll use equation (2) above.

Denominator of the Q statistic:
$$\tilde{n} = \frac{K}{\frac{1}{n_{North}} + \frac{1}{n_{South}} + \frac{1}{n_{East}} + \frac{1}{n_{West}}} = \frac{4}{\frac{1}{10} + \frac{1}{10} + \frac{1}{10} + \frac{1}{7}} = \frac{4}{.10 + .10 + .10 + .14} = \frac{4}{.44} = 9.09$$

$MS_W = 5.43$ (previous result)

$$\Rightarrow\quad \sqrt{\frac{MS_W}{\tilde{n}}} = \sqrt{\frac{5.43}{9.09}} = \sqrt{0.60} = 0.77$$

Q statistics:

  North-South comparison: $Q = \frac{|\bar{X}_{North} - \bar{X}_{South}|}{\sqrt{MS_W/\tilde{n}}} = \frac{|7.90 - 5.79|}{0.77} = \frac{2.11}{0.77} = 2.74$
  North-West comparison:  $Q = \frac{|\bar{X}_{North} - \bar{X}_{West}|}{\sqrt{MS_W/\tilde{n}}} = \frac{|7.90 - 6.03|}{0.77} = \frac{1.87}{0.77} = 2.43$
  North-East comparison:  $Q = \frac{|\bar{X}_{North} - \bar{X}_{East}|}{\sqrt{MS_W/\tilde{n}}} = \frac{|7.90 - 8.83|}{0.77} = \frac{0.93}{0.77} = 1.21$
  South-West comparison:  $Q = \frac{|\bar{X}_{South} - \bar{X}_{West}|}{\sqrt{MS_W/\tilde{n}}} = \frac{|5.79 - 6.03|}{0.77} = \frac{0.24}{0.77} = 0.31$
  South-East comparison:  $Q = \frac{|\bar{X}_{South} - \bar{X}_{East}|}{\sqrt{MS_W/\tilde{n}}} = \frac{|5.79 - 8.83|}{0.77} = \frac{3.04}{0.77} = 3.95$
  West-East comparison:   $Q = \frac{|\bar{X}_{West} - \bar{X}_{East}|}{\sqrt{MS_W/\tilde{n}}} = \frac{|6.03 - 8.83|}{0.77} = \frac{2.80}{0.77} = 3.64$

Comparing the Q statistics to Q_C, and interpreting results

Look for Q_C in Appendix F or Appendix G, p. 307 and 308 of your textbook.
Let $\alpha$ = 5%, K = 4 and $df_W$ = 33 (previous result).
Choose $df_W$ = 30 (the nearest value available in the table below 33), since there is no row for $df_W$ = 33.
=> $Q_C$ = 3.85

  North-South comparison: Q = 2.74 < Q_C => the means are not significantly different
  North-West comparison:  Q = 2.43 < Q_C => the means are not significantly different
  North-East comparison:  Q = 1.21 < Q_C => the means are not significantly different
  South-West comparison:  Q = 0.31 < Q_C => the means are not significantly different
  South-East comparison:  Q = 3.95 > Q_C => the means are significantly different
  West-East comparison:   Q = 3.64 < Q_C => the means are not significantly different

Only the South-East comparison exceeds Q_C: unemployment levels in Southern and Eastern cities differ significantly; no other pair of regions does.

ANNOUNCEMENTS:
(1) For practice problems, use the end-of-chapter ones.
(2) Monday, March 12: short exam on Ch.10 -- OPEN BOOK.
(3) For any question, please email me at gbriand@ewu.edu
(4) HAVE A GOOD WEEKEND!
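(Not part of the original notes: a short Python sketch of the unequal-n Q statistics above. MS_W is recomputed from the raw data so the sketch stands alone, and the critical value Q_C = 3.85 is simply carried over from the appendix lookup in the notes rather than computed; the printed Q values differ slightly from the hand values because less rounding is used.)

```python
# Pairwise Tukey Q statistics using the harmonic mean of the sample sizes.
from itertools import combinations
from math import sqrt

groups = {
    "North": [3.8, 7.1, 9.6, 8.4, 5.1, 11.6, 6.2, 7.9, 9.0, 10.3],
    "South": [4.2, 6.5, 4.4, 8.1, 7.6, 5.8, 4.0, 7.3, 5.2, 4.8],
    "East":  [8.8, 5.1, 12.7, 6.4, 9.8, 6.3, 10.2, 8.5, 11.9, 8.6],
    "West":  [4.8, 1.2, 8.0, 9.4, 3.6, 8.7, 6.5],
}

K = len(groups)
n_total = sum(len(g) for g in groups.values())

# Within-groups mean square (same SS_W / df_W as in the ANOVA section).
ss_w = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups.values())
ms_w = ss_w / (n_total - K)

# Harmonic mean of the sample sizes (unequal n, so equation (2) applies).
n_tilde = K / sum(1 / len(g) for g in groups.values())
denom = sqrt(ms_w / n_tilde)

Q_C = 3.85  # from the appendix: alpha = .05, K = 4, df_W = 30
for (name1, g1), (name2, g2) in combinations(groups.items(), 2):
    q = abs(sum(g1) / len(g1) - sum(g2) / len(g2)) / denom
    verdict = "significant" if q >= Q_C else "not significant"
    print(f"{name1}-{name2}: Q = {q:.2f} ({verdict})")
```

With these data, only the South-East pair should come out above Q_C, matching the conclusion reached by hand.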