ANOVA (Analysis of Variance)
Martina Litschmannová
martina.litschmannova@vsb.cz
K210

The basic ANOVA situation
• Two variables: 1 categorical, 1 quantitative
• Main question: Does the mean of the quantitative variable depend on which group (given by the categorical variable) the individual is in?
• If the categorical variable has only 2 values:
  • null hypothesis: $\mu_1 - \mu_2 = D$
• ANOVA allows for 3 or more groups

An example ANOVA situation
• Subjects: 25 patients with blisters
• Treatments: Treatment A, Treatment B, Placebo
• Measurement: number of days until blisters heal
• Data [and means]:
    A: 5, 6, 6, 7, 7, 8, 9, 10            [7.25]
    B: 7, 7, 8, 9, 9, 10, 10, 11          [8.875]
    P: 7, 9, 9, 10, 10, 10, 11, 12, 13    [10.11]
• Are these differences significant?

Informal Investigation
• Graphical investigation:
  • side-by-side box plots
  • multiple histograms
• Whether the differences between the groups are significant depends on
  • the difference in the means
  • the standard deviations of each group
  • the sample sizes
• ANOVA determines the p-value from the F statistic

Side by Side Boxplots
[Figure: side-by-side box plots of days (vertical axis, 5 to 13) by treatment (A, B, P)]

What does ANOVA do?
At its simplest (there are extensions) ANOVA tests the following hypotheses:
H0: The means of all the groups are equal.
HA: Not all the means are equal.
• Doesn't say how or which ones differ.
• Can follow up with "multiple comparisons".
Note: We usually refer to the sub-populations as "groups" when doing ANOVA.

Assumptions of ANOVA
• each group is approximately normal
  ✓ check this by looking at histograms and/or normal quantile plots, or use assumptions
  ✓ can handle some nonnormality, but not severe outliers
  ✓ test of normality
• standard deviations of each group are approximately equal
  ✓ rule of thumb: the ratio of the largest to the smallest sample standard deviation must be less than 2:1
  ✓ test of homoscedasticity

Normality Check
We should check for normality using:
• assumptions about population
• histograms for each group
• normal quantile plot for each group
• test of normality (Shapiro-Wilk, Lilliefors, Anderson-Darling test, …)
With such small data sets, there isn't a really good way to check normality from the data, but we make the common assumption that physical measurements of people tend to be normally distributed.

Shapiro-Wilk test
• One of the most powerful tests of normality. [Shapiro, Wilk]
Online computer applet (Simon Ditto, 2009) for this test can be found here.

Standard Deviation Check
Variable   treatment   N   Mean     Median   StDev
days       A           8    7.250    7.000   1.669
           B           8    8.875    9.000   1.458
           P           9   10.111   10.000   1.764
Compare largest and smallest standard deviations:
• largest: 1.764
• smallest: 1.458
• 1.764/1.458 = 1.210 < 2, OK
Note: A standard deviation ratio greater than 2 signals heteroscedasticity.

ANOVA Notation
($X_{ij}$ denotes the $j$-th observation in group $i$.)
Group                   1               2               …   k
Sample                  $X_{11}$        $X_{21}$        …   $X_{k1}$
                        ⋮               ⋮                   ⋮
                        $X_{1n_1}$      $X_{2n_2}$      …   $X_{kn_k}$
Sample size             $n_1$           $n_2$           …   $n_k$
Sample average          $\bar{X}_1$     $\bar{X}_2$     …   $\bar{X}_k$
Sample std. deviation   $S_1$           $S_2$           …   $S_k$

Number of individuals all together: $n = \sum_{i=1}^{k} n_i$
Sample means: $\bar{X}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} X_{ij}$
Grand mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} X_{ij}$
Sample standard deviations: $S_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2$

Levene Test
Null and alternative hypothesis: $H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$, $H_A: \neg H_0$
Test statistic:
$F_Z = \frac{SS_{ZB} / (k - 1)}{SS_{Ze} / (n - k)}$,
where
$Z_{ij} = |X_{ij} - \bar{X}_i|$, $\bar{Z}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} Z_{ij}$, $\bar{Z} = \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} Z_{ij}$,
$SS_{ZB} = \sum_{i=1}^{k} n_i (\bar{Z}_i - \bar{Z})^2$, $SS_{Ze} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Z_{ij} - \bar{Z}_i)^2$.
p-value: $p\text{-value} = 1 - F_0(x_{OBS})$, where $F_0(x)$ is the CDF of the Fisher-Snedecor distribution with $k - 1$, $n - k$ degrees of freedom.

How ANOVA works (outline)
ANOVA measures two sources of variation in the data and compares their relative sizes.
• Sum of Squares between groups (difference between means),
  $SS_B = \sum_{i=1}^{k} n_i (\bar{X}_i - \bar{X})^2$,
  resp. Mean of Squares between groups, $MS_B = \frac{SS_B}{k - 1}$, where $k - 1$ is the degrees of freedom $df_B$.
• Sum of Squares of errors (difference within groups),
  $SS_e = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2 = \sum_{i=1}^{k} (n_i - 1) S_i^2$,
  resp. Mean of Squares of errors, $MS_e = \frac{SS_e}{n - k}$, where $n - k$ is the degrees of freedom $df_e$.

The ANOVA F-statistic is the ratio of the between-group variation to the within-group variation:
$F = \frac{MS_B}{MS_e}$
A large F is evidence against H0, since it indicates that there is more difference between groups than within groups.

ANOVA Output
Source of Variation   SS      DF   MS      F      p-value
Between Groups        34.74    2   17.37   6.45   0.006
Within Groups         59.26   22    2.69
Total                 94.00   24

How are these computations made?
Columns: Sum of Squares; Degrees of Freedom; Mean of Squares; F; p-value
Between Groups: $SS_B = \sum_{i=1}^{k} n_i (\bar{X}_i - \bar{X})^2$; $df_B = k - 1$; $MS_B = \frac{SS_B}{df_B}$; $F = \frac{MS_B}{MS_e}$; $1 - F_0(x_{OBS})$
Within Groups: $SS_e = \sum_{i=1}^{k} (n_i - 1) S_i^2$; $df_e = n - k$; $MS_e = \frac{SS_e}{df_e}$; ---; ---
Total: $SS_T = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X})^2$; $df_T = n - 1$; ---; ---; ---

Using the results of exploratory analysis and the ANOVA test, verify whether age has a statistically significant effect on BMI.

Age group        Count   Average   Variance
< 35 years          53   25.0796   10.3825
35 - 50 years      123   25.9492   16.2775
> 50 years          76   26.0982   12.3393
Total              252   25.8113   13.8971

[Figure: box plots of BMI (vertical axis, 18 to 58) for three age groups: less than 35 years, 35 to 50 years, more than 50 years]

Assumptions:
1. Normality
2. Homoscedasticity: $H_0: \sigma_1^2 = \sigma_2^2 = \sigma_3^2$, $H_A: \neg H_0$, p-value = 0.129 (Levene test)
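The Levene statistic $F_Z$ defined in the Levene Test slide can be sketched in a few lines of Python. The raw BMI data are not part of these slides, so this illustration reuses the blister data from the first example. The function names are hypothetical. The closed-form tail probability used for the p-value holds only when the numerator has 2 degrees of freedom, that is, for exactly three groups; for other designs a statistics library would be needed.

```python
# Blister-healing times (days) from the example ANOVA situation.
groups = {
    "A": [5, 6, 6, 7, 7, 8, 9, 10],
    "B": [7, 7, 8, 9, 9, 10, 10, 11],
    "P": [7, 9, 9, 10, 10, 10, 11, 12, 13],
}


def mean(values):
    return sum(values) / len(values)


def levene_statistic(samples):
    """Return (F_Z, df_between, df_within) using deviations from group means."""
    # Z_ij = |X_ij - mean of group i|
    z = [[abs(x - mean(s)) for x in s] for s in samples]
    k = len(z)
    n = sum(len(zi) for zi in z)
    z_bar = sum(sum(zi) for zi in z) / n
    # One-way ANOVA sums of squares, computed on the Z values
    ss_zb = sum(len(zi) * (mean(zi) - z_bar) ** 2 for zi in z)
    ss_ze = sum(sum((v - mean(zi)) ** 2 for v in zi) for zi in z)
    f_z = (ss_zb / (k - 1)) / (ss_ze / (n - k))
    return f_z, k - 1, n - k


def f_tail_numerator_df_2(f, df2):
    """P(F > f) for the F distribution with (2, df2) degrees of freedom.

    Assumption: numerator df is exactly 2 (three groups); only then does
    the tail probability reduce to this closed form.
    """
    return (1 + 2 * f / df2) ** (-df2 / 2)


f_z, df_b, df_e = levene_statistic(list(groups.values()))
assert df_b == 2  # the closed form below requires exactly three groups
p_value = f_tail_numerator_df_2(f_z, df_e)
print(f"F_Z = {f_z:.3f}, df = ({df_b}, {df_e}), p-value = {p_value:.3f}")
```

For the blister data the p-value comes out far above 0.05, which agrees with the rule-of-thumb check on the standard deviations. If SciPy is available, `scipy.stats.levene` with `center='mean'` should correspond to this mean-centred variant; its default centres on the median instead.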
Null and alternative hypothesis: $H_0: \mu_1 = \mu_2 = \mu_3$, $H_A: \neg H_0$
Calculation of the p-value (using the group counts, averages and variances, rounded to one decimal place):
$SS_B = \sum_{i=1}^{k} n_i (\bar{X}_i - \bar{X})^2 = 53 \cdot (25.1 - 25.8)^2 + 123 \cdot (25.9 - 25.8)^2 + 76 \cdot (26.1 - 25.8)^2 = 34.0$
$SS_e = \sum_{i=1}^{k} (n_i - 1) S_i^2 = 52 \cdot 10.4 + 122 \cdot 16.3 + 75 \cdot 12.3 = 3451.9$
Total sum of squares: $SS_T = SS_B + SS_e = 34.0 + 3451.9 = 3485.9$
Degrees of freedom, where k is the number of samples ($k = 3$) and n is the total sample size ($n = 252$):
$df_B = k - 1 = 2$, $df_e = n - k = 249$, $df_T = n - 1 = 251$
Mean squares:
$MS_B = \frac{SS_B}{df_B} = \frac{34.0}{2} = 17.0$, $MS_e = \frac{SS_e}{df_e} = \frac{3451.9}{249} = 13.9$
Completed ANOVA table, with $F = \frac{MS_B}{MS_e} = 1.23$:

Source of Variation   Sum of Squares   Degrees of Freedom   Mean of Squares   F      p-value
Between Groups                34.0                    2              17.0   1.23   0.294
Within Groups               3451.9                  249              13.9   ---    ---
Total                       3485.9                  251               ---   ---    ---

$p\text{-value} = 1 - F(1.23) = 0.294$, where $F(x)$ is the CDF of the Fisher-Snedecor distribution with 2 and 249 degrees of freedom.

Result: We do not reject the null hypothesis at the 0.05 significance level. There is no statistically significant difference in mean BMI between the age groups.

Where's the Difference?
Once ANOVA indicates that the groups do not all appear to have the same means, what do we do?

Analysis of Variance for days
Source     DF   SS      MS      F      P
treatmen    2   34.74   17.37   6.45   0.006
Error      22   59.26    2.69
Total      24   94.00

Level   N   Mean     StDev
A       8    7.250   1.669
B       8    8.875   1.458
P       9   10.111   1.764
Pooled StDev = 1.641

[Character plot: individual 95% CIs for each mean, based on the pooled StDev, on an axis marked 7.5, 9.0, 10.5]

Clearest difference: P is worse than A (CIs don't overlap)

Multiple Comparisons
Once ANOVA indicates that the groups do not all have the same means, we can compare them two by two using the 2-sample t test.
• We need to adjust our p-value threshold because we are doing multiple tests with the same data.
• There are several methods for doing this.
• If we really just want to test the difference between one pair of treatments, we should set the study up that way.
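The pairwise comparisons described above can be sketched for the blister data as follows. This is a minimal illustration under stated assumptions, not a definitive implementation. It recomputes the ANOVA quantities, then tests each pair with a t statistic built on the pooled $MS_e$, and compares each two-sided p-value with $\alpha$ divided by the number of pairs (the Bonferroni correction). Rejecting when the p-value is below that threshold is equivalent to comparing the absolute difference in means with the corresponding Student quantile. The helper `t_two_sided_p` is hypothetical; it approximates the Student tail numerically with Simpson's rule so that only the standard library is needed.

```python
import math
from itertools import combinations

# Blister-healing times (days) from the example ANOVA situation.
groups = {
    "A": [5, 6, 6, 7, 7, 8, 9, 10],
    "B": [7, 7, 8, 9, 9, 10, 10, 11],
    "P": [7, 9, 9, 10, 10, 10, 11, 12, 13],
}


def mean(values):
    return sum(values) / len(values)


def t_two_sided_p(t, df, steps=2000):
    """Approximate two-sided p-value of Student's t (Simpson's rule, even steps)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * central


# One-way ANOVA quantities
k = len(groups)
n = sum(len(v) for v in groups.values())
grand_mean = sum(sum(v) for v in groups.values()) / n
ss_b = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
ss_e = sum(sum((x - mean(v)) ** 2 for x in v) for v in groups.values())
ms_b = ss_b / (k - 1)
ms_e = ss_e / (n - k)
f_stat = ms_b / ms_e
pooled_sd = math.sqrt(ms_e)
print(f"SS_B = {ss_b:.2f}, SS_e = {ss_e:.2f}, F = {f_stat:.2f}, "
      f"pooled StDev = {pooled_sd:.3f}")

# Bonferroni-corrected pairwise comparisons
alpha = 0.05
alpha_star = alpha / math.comb(k, 2)  # corrected significance level
significant = []
for (name_i, x_i), (name_j, x_j) in combinations(groups.items(), 2):
    se = math.sqrt(ms_e * (1 / len(x_i) + 1 / len(x_j)))
    t = (mean(x_i) - mean(x_j)) / se
    p = t_two_sided_p(t, n - k)
    if p < alpha_star:
        significant.append((name_i, name_j))
    print(f"{name_i} vs {name_j}: t = {t:.3f}, p = {p:.4f}")
print("significant at corrected level:", significant)
```

The sums of squares, F and pooled standard deviation should reproduce the "Analysis of Variance for days" output, and only the A versus P comparison falls below the corrected threshold, in line with the non-overlapping confidence intervals.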
Bonferroni method – post hoc analysis
We reject the null hypothesis (equal means of groups $I$ and $J$) if
$|\bar{x}_I - \bar{x}_J| \ge t_{1 - \alpha^*/2}(n - k) \sqrt{MS_e \left( \frac{1}{n_I} + \frac{1}{n_J} \right)}$,
where $\alpha^*$ is the corrected significance level, $\alpha^* = \frac{\alpha}{\binom{k}{2}}$, and $t_{1 - \alpha^*/2}(n - k)$ is the $(1 - \alpha^*/2)$ quantile of the Student distribution with $n - k$ degrees of freedom.

Kruskal-Wallis test
• The Kruskal–Wallis test is most commonly used when there is one nominal variable and one measurement variable, and the measurement variable does not meet the normality assumption of an ANOVA.
• It is the non-parametric analogue of a one-way ANOVA.
• Like most non-parametric tests, it is performed on ranked data, so the measurement observations are converted to their ranks in the overall data set: the smallest value gets a rank of 1, the next smallest gets a rank of 2, and so on. The loss of information involved in substituting ranks for the original values can make this a less powerful test than an ANOVA, so the ANOVA should be used if the data meet the assumptions.
• If the original data set actually consists of one nominal variable and one ranked variable, you cannot do an ANOVA and must use the Kruskal–Wallis test.

1. A farm raised three breeds of rabbits. An experiment was carried out (rabbits.xls) to determine whether there is a statistically significant difference in weight between the breeds. Verify. http://vassarstats.net/anova1u.html
2. The effect of three drugs on blood clotting was studied. Among other indicators, the thrombin time was measured. Data on the 45 monitored patients are recorded in the file thrombin.xls. Does the thrombin time depend on the preparation used? http://vassarstats.net/kw3.html

Study materials:
• http://homel.vsb.cz/~bri10/Teaching/Bris%20Prob%20&%20Stat.pdf (p. 142 - p. 154)
• Shapiro, S.S., Wilk, M.B. An Analysis of Variance Test for Normality (Complete Samples). Biometrika. 1965, vol. 52, no. 3/4, pp. 591-611. Available at: http://www.jstor.org/stable/2333709.
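Returning to the Kruskal–Wallis test described earlier, the ranking step and the test statistic can be sketched as below. The slides do not give the formula, so this sketch assumes the usual H statistic with average ranks for tied values and the standard tie correction, and it assumes the chi-square approximation for the p-value. The files rabbits.xls and thrombin.xls are not reproduced here, so the blister data serve as the illustration. The function names are hypothetical, and the closed-form tail $e^{-H/2}$ is valid only for 2 degrees of freedom, that is, three groups.

```python
import math

# Blister-healing times (days) from the example ANOVA situation.
groups = {
    "A": [5, 6, 6, 7, 7, 8, 9, 10],
    "B": [7, 7, 8, 9, 9, 10, 10, 11],
    "P": [7, 9, 9, 10, 10, 10, 11, 12, 13],
}


def average_ranks(values):
    """Map each distinct value to its average 1-based rank in the pooled data."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # positions i+1 .. j hold the same value; give them their mean rank
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks


def kruskal_wallis_h(samples):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [x for s in samples for x in s]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term = sum(sum(ranks[x] for x in s) ** 2 / len(s) for s in samples)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values
    tie_sizes = [pooled.count(v) for v in set(pooled)]
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h / correction


samples = list(groups.values())
assert len(samples) == 3  # the closed-form p-value below needs 2 df
h_stat = kruskal_wallis_h(samples)
p_value = math.exp(-h_stat / 2)  # chi-square tail probability with 2 df
print(f"H = {h_stat:.3f}, p-value = {p_value:.4f}")
```

On these data the rank-based test points the same way as the ANOVA: the p-value is below 0.05. If SciPy is available, `scipy.stats.kruskal` offers a library version of this test.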