Statistics for Managers Using Microsoft® Excel 5th Edition Chapter 11 Analysis of Variance Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-1 Learning Objectives After completing this chapter, you should be able to: Recognize situations in which to use analysis of variance (ANOVA) Understand different analysis of variance designs Evaluate assumptions of the model Perform a single-factor ANOVA and interpret the results Conduct and interpret a Tukey-Kramer post-analysis to determine which means are different Analyze two-factor analysis of variance tests Conduct and interpret a Tukey-Kramer post-analysis procedure to determine which factors are different Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-2 General ANOVA Analysis Investigator controls one or more independent variables Called factors or treatment variables One factor contains three or more levels or groups or categories/classifications Other factors contains two or more levels or groups or categories/classifications Experimental design: the plan used to test the hypothesis Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-3 One-Factor ANOVA Also known as Completely Randomized Design and One-way ANOVA Experimental units (subjects) are assigned randomly to treatments Subjects are assumed homogeneous Only one factor or independent variable With three or more treatment levels Analyzed by one-factor analysis of variance Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-4 One-Factor Analysis of Variance Evaluates the difference among the means of three or more groups Examples: Accident rates for 1st, 2nd, and 3rd shift Expected mileage for five brands of tires Assumptions Populations are normally distributed (test with Box plot or Normal Probability Plot) Populations have equal variances (use Levene’s Test for Homogeneity of Variance) Samples are randomly and independently drawn Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-5 Why Analysis of Variance? We could compare the means in pairs using a t test for difference of means Each t test contains Type 1 error The total Type 1 error with k pairs of means is 1- (1 - a) k If there are 5 means and you use a = .05 Must perform 10 comparisons Type I error is 1 – (.95) 10 = .40 40% of the time you will reject the null hypothesis of equal means in favor of the alternative even when the null is true! Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-6 Hypotheses: One-Factor ANOVA H0 : μ1 μ2 μ3 μc All population means are equal i.e., no treatment effect (no variation in means among groups) H1 : Not all of the population means are the same At least one population mean is different i.e., there is a treatment (groups) effect Does not mean that all population means are different (at least one of the means is different from the others) Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-7 Hypotheses: One-Factor ANOVA H0 : μ1 μ2 μ3 μc H1 : Not all μ j are the same All Means are the same: The Null Hypothesis is True (No Group Effect) μ1 μ 2 μ 3 Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-8 Hypotheses: One-Factor ANOVA At least one mean is different: The Null Hypothesis is NOT true (Treatment Effect is present) H0 : μ1 μ 2 μ 3 μ c H1 : Not all μj are the same or μ1 μ2 μ3 Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. μ1 μ2 μ3 Chap 11-9 One-Factor ANOVA Table Source of Variation df SS MS (Variance) P-value Between Groups c-1 SSA MSA P(X=F) Within Groups n-c SSW MSW Total n-1 SST = SSA+SSW F-Ratio F MSA MSW c = number of groups n = sum of the sample sizes from all groups df = degrees of freedom Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-10 One-Factor ANOVA Test Statistic H0: μ1= μ2 = … = μc H1: At least two population means are different Test statistic MSA F MSW MSA is mean squares among variances MSW is mean squares within variances Degrees of freedom df1 = c – 1 df2 = n – c (c = number of groups) (n = sum of all sample sizes) Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-11 One-Factor ANOVA Test Statistic The F statistic is the ratio of the among variance to the within variance The ratio must always be positive df1 = c -1 will typically be small df2 = n - c will typically be large Decision Rule: Reject H0 if F > FU, otherwise do not reject H0 a = .05 0 Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Do not reject H0 Reject H0 FU Chap 11-12 One-Factor ANOVA F Test Example You want to see if three different golf clubs yield different distances. You randomly select five measurements from trials on an automated driving machine for each club. At the .05 significance level, is there a difference in mean driving distance? Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Club 1 254 263 241 237 251 Club 2 234 218 235 227 216 Club 3 200 222 197 206 204 Chap 11-13 One-Way ANOVA Example Distance 270 Club 1 254 263 241 237 251 Club 2 234 218 235 227 216 Club 3 200 222 197 206 204 x1 249.2 x 2 226.0 x 3 205.8 x 227.0 Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. 260 250 240 230 220 • •• X 1 • • •• • •• X2 210 •• •• 200 190 • 1 2 Club X X3 3 Chap 11-14 ANOVA -- Single Factor: Excel Output EXCEL: Tools | Data Analysis | ANOVA: Single Factor SUMMARY Groups Count Sum Average Variance Club 1 5 1246 249.2 108.2 Club 2 5 1130 226 77.5 Club 3 5 1029 205.8 94.2 ANOVA Source of Variation SS df MS F P-value 25.275 4.99E-05 Between Groups 4716.4 2 2358.2 Within Groups 1119.6 12 93.3 Total 5836.0 14 Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. F crit 3.89 Chap 11-15 One-Factor ANOVA Example Solution H0: μ1 = μ2 = μ3 H1: μi not all equal a = .05 p-value: 4.99E-05 Decision: Reject H0 at a = 0.05 Conclusion: There is evidence that at least one μi differs from the rest Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-16 The Tukey-Kramer Procedure Tells which population means are significantly different e.g.: μ1 = μ2 ≠ μ3 Done after rejection of equal means in ANOVA Allows pair-wise comparisons Compare absolute mean differences with critical range μ1= μ2 Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. μ3 x Chap 11-17 Tukey-Kramer Critical Range Critical Range QU MSW 1 1 2 n j n j' where: QU = Value from Studentized Range Distribution with c and n - c degrees of freedom for the desired level of a (see appendix E.9 table) MSW = Mean Square Within nj and nj’ = Sample sizes from groups j and j’ Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-18 The Tukey-Kramer Procedure: Example Club 1 254 263 241 237 251 Club 2 234 218 235 227 216 Club 3 200 222 197 206 204 1. PhStat computes the absolute mean differences: x1 x 2 249.2 226.0 23.2 x1 x 3 249.2 205.8 43.4 x 2 x 3 226.0 205.8 20.2 2. You find a QU value from the table in appendix E.9 with c = 3 (across the table) and n – c = 15 – 3 = 12 degrees of freedom (down the table) for the desired level of a (a = .05 used here): QU 3.77 Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-19 The Tukey-Kramer Procedure: Example (continued) 3. PhStat computes the Critical Range: Critical Range QU MSW 1 1 93.3 1 1 3.77 16.285 2 n j n j' 2 5 5 4. Compare: 5. All of the absolute mean differences are greater than critical range. Therefore there is a significant difference between each pair of means at 5% level of significance. x1 x 2 23.2 x1 x 3 43.4 x 2 x 3 20.2 PhStat does all the calculations for you but you must input the Q value Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-20 Tukey-Kramer in PHStat Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-21 ANOVA Assumptions Levene’s Test Tests the assumption that the variances of each group are equal. First, define the null and alternative hypotheses: H0: σ21 = σ22 = …=σ2c H1: Not all σ2j are equal Second, compute the absolute value of the difference between each value and the median of each group. Third, perform a one-way ANOVA on these absolute differences. Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-22 Two-Factor ANOVA Examines the effect of Two factors of interest on the dependent variable e.g., Percent carbonation and line speed on soft drink bottling process Interaction between the different levels of these two factors e.g., Does the effect of one particular carbonation level depend on which level the line speed is set? Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-23 Two-Factor ANOVA Assumptions Populations are normally distributed Populations have equal variances Independent random samples are selected Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-24 Two-Factor ANOVA Sources of Variation Two Factors of interest: A and B r = number of levels of factor A c = number of levels of factor B n/ = number of replications for each cell n = total number of observations in all cells (n = rcn/) Xijk = value of the kth observation of level i of factor A and level j of factor B Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-25 Two-Factor ANOVA Summary Table With Replication Source of Variation Degrees of Freedom Sum of Squares Mean Squares F Statistic p-value Sample Factor A (Row) r–1 SSA MSA = SSA/(r – 1) MSA/ MSE f (FA) Columns Factor B c–1 SSB MSB = SSB/(c – 1) MSB/ MSE f (FB) SSAB MSAB = SSAB/ [(r – 1)(c – 1)] MSAB/ MSE f (FA&B) SSE MSE = SSE/[rc (n’ – 1)] Interaction (r – 1)(c – 1) (AB) Within Error Total rc (n’ – 1) rc n’ – 1 SST Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-26 Two-Factor ANOVA With Replication As production manager, you want to see if 3 filling machines have different mean filling times when used with 5 types of boxes. At the .05 level, is there a difference in machines, in boxes? Is there an interaction? Box Machine1 Machine2 Machine3 1 25.40 23.40 20.00 26.40 24.40 21.00 2 26.31 21.80 22.20 25.90 23.00 22.00 3 24.10 23.50 19.75 24.40 22.40 19.00 4 23.74 22.75 20.60 25.40 23.40 20.00 5 25.10 21.60 20.40 26.20 22.90 21.90 Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-28 Summary Table Source of Variation Sample (Boxes) Columns (Machines) Degrees of Sum of Mean Freedom Squares Square F P-Value 5-1=4 7.4714 1.8678 3.6868 3-1=2 106.298 53.149 104.908 1.52E-09 Interaction (5-1)(3-1) = 8 9.7032 1.2129 7.5994 .5066 Within (Error) 5·3·(2-1)=15 Total 3·5·2 -1 = 29 131.0720 Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. 2.3941 .0277 .0690 Chap 11-29 The Tukey-Kramer Procedure: With Replication Factor A critical range Qr ,rc( n ' 1) MSW cn' 1. No Macro - compute by formula only 2. MSW (Within) from ANOVA Printout 3. Q from Table E.9 in the Book page 860 alpha = .05 or .01 r is the number of levels of factor A (across table) rc(n’-1) (down the table) c is the number of levels of factor B n’ is the number of replications Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-30 The Tukey-Kramer Procedure: With Replication Factor B MSW critical range Qc ,rc( n ' 1) rn ' 1. No Macro compute by formula only 2. MSW (Within) from ANOVA Printout 3. Q from Table E.9 in the Book page 860 alpha = .05 or .01 c is the number of levels of factor B (across table) rc(n’-1) (down the table) r is the number of levels of factor A n’ is the number of replications Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-31 Chapter Summary Described one-factor analysis of variance ANOVA assumptions ANOVA test for difference in c means The Tukey-Kramer procedure for multiple comparisons Described two-factor analysis of variance Examined effects of multiple factors Examined interaction between factors for the model with replicated observations The Tukey-Kramer procedure for multiple comparisons for both factor A and factor B for the model with replication Statistics for Managers Using Microsoft Excel, 5e © 2008 Prentice-Hall, Inc. Chap 11-32