Statistics
Analysis of Variance: Comparing More Than 2 Means

Multivariate Studies
  Observational Study: the conditions to which subjects are exposed are not controlled by the investigator. (No attempt is made to control or influence the variables of interest.)
  Experimental Study: the conditions to which subjects are exposed are controlled by the investigator. (Treatments are used in order to observe the response.)

Experiment
  1. Investigator controls one or more independent variables
     - Called treatment variables or factors
     - Contain two or more levels (subcategories)
  2. Observes effect on dependent variable
     - Response to levels of independent variable
  3. Experimental design: plan used to test hypotheses

Examples of Experiments
  1. Thirty locations are randomly assigned 1 of 4 (levels) health promotion banners (independent variable) to see the effect on using stairs (dependent variable).
  2. Two hundred consumers are randomly assigned 1 of 3 (levels) brands of juice (independent variable) to study reaction (dependent variable).

Experimental Designs
  Design                   Analyzed by
  Completely randomized    One-way ANOVA
  Randomized block         Randomized block F test
  Factorial                Two-way ANOVA

Completely Randomized Design
  1. Experimental units (subjects) are assigned randomly to treatments
     - Subjects are assumed homogeneous
  2. One factor or independent variable
     - 2 or more treatment levels or classifications
  3. Analyzed by one-way ANOVA

Randomized Design Example
  Factor: training method. Factor levels (treatments): Level 1, Level 2, Level 3.
  Dependent variable (response), one value per experimental unit:

  Level 1    Level 2    Level 3
  21 hrs.    17 hrs.    31 hrs.
  27 hrs.    25 hrs.    28 hrs.
  29 hrs.    20 hrs.    22 hrs.

One-Way ANOVA F-Test
  1. Tests the equality of 2 or more (t) population means (µ1 = µ2 = … = µt)
  2. Variables
     - One nominal scaled independent variable with 2 or more (t) treatment levels or classifications
     - One interval or ratio scaled dependent variable
  3. Used to analyze completely randomized experimental designs

One-Way ANOVA F-Test Assumptions
  1. Randomness and independence of errors
     - Independent random samples are drawn
  2. Normality
     - Populations are normally distributed
  3. Homogeneity of variance (σ1² = σ2² = … = σt²)
     - Populations have equal variances

One-Way ANOVA F-Test Hypotheses
  H0: µ1 = µ2 = µ3 = ... = µt
     - All population means are equal
     - No treatment effect
  Ha: Not all µj are equal
     - At least 1 population mean is different
     - Treatment effect
     - Writing Ha as µ1 ≠ µ2 ≠ ... ≠ µt is wrong
  [Figure: two density plots of f(y) against y, one with µ1 = µ2 = µ3 and one in which µ3 is separated from µ1 = µ2.]

Why Variances?
  Example: hourly wage for three ethnic groups

                     Case I                  Case II
  Group          1      2      3         1      2      3
                5.90   6.31   4.52      5.92   5.51   5.01
                4.42   3.54   6.93      5.89   5.50   4.99
                7.51   4.73   4.48      5.91   5.49   4.98
                7.89   7.20   5.55      5.88   5.50   5.02
                3.78   5.72   3.52      5.90   5.50   5.00
  Average       5.90   5.50   5.00      5.90   5.50   5.00

  [Figure: plots of CASE1 and CASE2 wages against GROUPID.]

Why Variances?
  A. Same treatment variation, different random variation: variances WITHIN differ.
  B. Different treatment variation, same random variation: variances AMONG differ.
  Possible to conclude means are equal!
  [Figure: distributions of Pop 1 through Pop 6 for each of the two situations.]

One-Way ANOVA Basic Idea
  1. Compares 2 types of variation to test equality of means
  2. Comparison basis is a ratio of variances
  3. If treatment variation is significantly greater than random variation, then the means are not equal
  4. Variation measures are obtained by "partitioning" total variation

One-Way ANOVA Partitions Total Variation
  Total variation = variation due to treatment + variation due to random sampling
  Variation due to treatment is also called:
     - Sum of Squares Among
     - Sum of Squares Between
     - Sum of Squares Treatment
     - Among Groups Variation
  Variation due to random sampling is also called:
     - Sum of Squares Within
     - Sum of Squares Error
     - Within Groups Variation

Notations
  y_{ij}       : the j-th element from the i-th treatment
  \bar{y}_{i.} : the i-th treatment mean
  \bar{y}_{..} : the overall sample mean
  n_T          : the total sample size (n_1 + n_2 + … + n_t)

Total Variation
  TSS = (y_{11} - \bar{y}_{..})^2 + (y_{21} - \bar{y}_{..})^2 + \dots + (y_{t n_t} - \bar{y}_{..})^2
      = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{..})^2
  [Figure: response y for Groups 1 to 3 with the overall mean marked.]

Treatment Variation
  SSB = n_1 (\bar{y}_{1.} - \bar{y}_{..})^2 + n_2 (\bar{y}_{2.} - \bar{y}_{..})^2 + \dots + n_t (\bar{y}_{t.} - \bar{y}_{..})^2
      = \sum_{i=1}^{t} n_i (\bar{y}_{i.} - \bar{y}_{..})^2
  [Figure: response y for Groups 1 to 3 with the group means and the overall mean marked.]

Random (Error) Variation
  SSW = (y_{11} - \bar{y}_{1.})^2 + (y_{21} - \bar{y}_{2.})^2 + \dots + (y_{t n_t} - \bar{y}_{t.})^2
      = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})^2
      = \sum_{i=1}^{t} (n_i - 1) s_i^2
  [Figure: response y for Groups 1 to 3 with the group means marked.]

One-Way ANOVA F-Test Test Statistic
  1. Test statistic: F = MSB / MSW
     - MSB is the mean square for treatment
     - MSW is the mean square for error
  2. Degrees of freedom: ν1 = t - 1, ν2 = nT - t
     - t = number of populations, groups, or levels
     - nT = total sample size

One-Way ANOVA Summary Table
  Source of variation            Degrees of freedom   Sum of squares    Mean square (variance)   F
  Treatment (between samples)    t - 1                SSB               MSB = SSB/(t - 1)        MSB/MSW
  Error (within samples)         nT - t               SSW               MSW = SSW/(nT - t)
  Total                          nT - 1               TSS = SSB + SSW

One-Way ANOVA F-Test Critical Value
  If the means are equal, F = MSB / MSW ≈ 1. Only reject for large F!
  Reject H0 when F exceeds the critical value Fα(t - 1, nT - t); otherwise do not reject H0.
  Always one-tail!
  [Figure: F distribution with the rejection region of area α to the right of Fα(t - 1, nT - t).]

One-Way ANOVA F-Test Example
  As production manager, you want to see if 3 filling machines have different mean filling times. You assign 15 similarly trained and experienced workers, 5 per machine, to the machines. At the .05 level, is there a difference in mean filling times?

  Mach1    Mach2    Mach3
  25.40    23.40    20.00
  26.31    21.80    22.20
  24.10    23.50    19.75
  23.74    22.75    20.60
  25.10    21.60    20.40

One-Way ANOVA F-Test Solution
  H0: µ1 = µ2 = µ3
  Ha: Not all equal
  α = .05
  ν1 = 2, ν2 = 12
  Critical value: 3.89
  Test statistic: F = MSB / MSW = 23.5820 / .9211 = 25.6
  Decision: Reject at α = .05
  Conclusion: There is evidence that the population means are different.

Summary Table Solution (from computer)
  Source of variation     Degrees of freedom   Sum of squares   Mean square (variance)   F
  Treatment (Machines)    3 - 1 = 2            47.1640          23.5820                  25.60
  Error                   15 - 3 = 12          11.0532          .9211
  Total                   15 - 1 = 14          58.2172

One-Way ANOVA F-Test Thinking Challenge
  You're a trainer for Microsoft Corp. Is there a difference in mean learning times of 12 people using 4 different training methods (α = .05)? Use the following table.

  M1    M2    M3    M4
  10    11    13    18
   9    16     8    23
   5     9     9    25

Summary Table Solution*
  Source of variation     Degrees of freedom   Sum of squares   Mean square (variance)   F
  Treatment (Methods)     4 - 1 = 3            348              116                      11.6
  Error                   12 - 4 = 8           80               10
  Total                   12 - 1 = 11          428

One-Way ANOVA F-Test Solution*
  H0: µ1 = µ2 = µ3 = µ4
  Ha: Not all equal
  α = .05
  ν1 = 3, ν2 = 8
  Critical value: 4.07
  Test statistic: F = MSB / MSW = 116 / 10 = 11.6
  p-value = .003
  Decision: Reject at α = .05
  Conclusion: There is evidence that the population means are different.

SPSS Error Bar Chart
  [Figure: SPSS error bar chart of the 95% CI for SCORE by METHOD (1.00 to 4.00), N = 3 per method.]

Linear Model for CRD
  Let y_{ij} be the j-th sample observation from population i:
     y_{ij} = \mu + \alpha_i + \varepsilon_{ij}
  µ    : overall mean
  α_i  : i-th treatment effect
  ε_ij : error term, or random variation of y_{ij} about µ_i, where µ_i = µ + α_i

One-Way ANOVA F-Test Hypotheses
  H0: µ1 = µ2 = µ3 = ... = µt
     - All population means are equal
     - No treatment effect
  is equivalent to
  H0: α1 = α2 = α3 = ... = αt

Error Term Assumptions
  For the parametric F test, the ε_ij are independent and normally distributed with constant variance σ_ε².
  The normality assumption can be checked by using the estimates (residuals)
     e_{ij} = y_{ij} - \bar{y}_{i.}
  The equal variances assumption can be verified by using Hartley's test (very sensitive to departures from normality) or Levene's test. Levene's test can be done by applying ANOVA to
     z_{ij} = | y_{ij} - \tilde{y}_{i.} |
  where \tilde{y}_{i.} is the sample median of the i-th sample.

What if the assumptions are not satisfied?
  Try a nonparametric method: the Kruskal-Wallis test.
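
Worked computation (sketch)
  The partition TSS = SSB + SSW and the F ratio for the filling-machine example can be checked with a short script. What follows is a minimal sketch in Python using only the standard library, not part of the original slides. The function names one_way_anova and levene_median_f are hypothetical helpers chosen for illustration. The second helper applies the same ANOVA to the absolute deviations from each sample median, which is the median-based form of Levene's test described above; its F value would be compared with the same Fα(t - 1, nT - t) critical value.

```python
from statistics import mean, median


def one_way_anova(groups):
    """One-way ANOVA for a list of samples, one list per treatment."""
    t = len(groups)                                  # number of treatments
    n_total = sum(len(g) for g in groups)            # n_T
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # TSS: squared deviations of every observation from the overall mean
    tss = sum((y - grand_mean) ** 2 for g in groups for y in g)
    # SSB: n_i times the squared deviation of each treatment mean
    ssb = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means))
    # SSW: squared deviations of observations from their own treatment mean
    ssw = sum((y - m) ** 2
              for g, m in zip(groups, group_means) for y in g)

    df = (t - 1, n_total - t)                        # (nu_1, nu_2)
    msb, msw = ssb / df[0], ssw / df[1]
    return {"TSS": tss, "SSB": ssb, "SSW": ssw, "df": df,
            "MSB": msb, "MSW": msw, "F": msb / msw}


def levene_median_f(groups):
    """F statistic of ANOVA applied to z_ij = |y_ij - median_i|."""
    z = [[abs(y - median(g)) for y in g] for g in groups]
    return one_way_anova(z)["F"]


# Filling times from the production-manager example (5 workers per machine)
machines = [
    [25.40, 26.31, 24.10, 23.74, 25.10],   # Mach1
    [23.40, 21.80, 23.50, 22.75, 21.60],   # Mach2
    [20.00, 22.20, 19.75, 20.60, 20.40],   # Mach3
]

result = one_way_anova(machines)
print(f"SSB = {result['SSB']:.4f}")        # SSB = 47.1640
print(f"SSW = {result['SSW']:.4f}")        # SSW = 11.0532
print(f"F   = {result['F']:.2f}")          # F   = 25.60
print(f"Levene-type F on |y - median| = {levene_median_f(machines):.3f}")
```

  The printed SSB, SSW and F agree with the summary table for the machines, and F = 25.60 exceeds the critical value 3.89 quoted in the solution. The script does not compute p-values or critical values; those still come from an F table or statistical software.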