The Assumptions of ANOVA
Dennis Monday, Gary Klein, Sunmi Lee
May 10, 2005

Major Assumptions of Analysis of Variance

• The Assumptions
  – Independence
  – Normally distributed
  – Homogeneity of variances
• Our Purpose
  – Examine these assumptions
  – Provide various tests for these assumptions
    • Theory
    • Sample SAS code (SAS, Version 8.2)
  – Consequences when these assumptions are not met
  – Remedial measures

Normality

• Why normal?
  – ANOVA is an Analysis of Variance
  – More specifically, it is an analysis of the ratio of two variances
  – Statistical inference is based on the F distribution, which is the ratio of two independent chi-squared distributions, each divided by its degrees of freedom
  – It is no surprise, then, that each variance in the ANOVA ratio comes from a parent normal distribution
• The calculations can always be derived no matter what the distribution is. The calculations are algebraic properties separating the sums of squares. Normality is needed only for statistical inference.

Normality Tests

• There is a wide variety of tests we can perform to check whether the data follow a normal distribution.
• Mardia (1980) provides an extensive list for both the univariate and multivariate cases, categorizing the tests into two types:
  – Tests based on properties of the normal distribution, specifically its first four moments
    • Shapiro-Wilk's W (compares the ratio of the standard deviation to the variance, multiplied by a constant, to one)
  – Goodness-of-fit tests
    • Kolmogorov-Smirnov D
    • Cramer-von Mises W²
    • Anderson-Darling A²

Normality Tests

proc univariate data=temp normal plot;
  var expvar;
run;

proc univariate data=temp normal plot;
  var normvar;
run;

Tests for Normality (expvar)

Test                  ---Statistic---    -----p Value------
Shapiro-Wilk          W      0.731203    Pr < W      <0.0001
Kolmogorov-Smirnov    D      0.206069    Pr > D      <0.0100
Cramer-von Mises      W-Sq   1.391667    Pr > W-Sq   <0.0050
Anderson-Darling      A-Sq   7.797847    Pr > A-Sq   <0.0050

Tests for Normality (normvar)

Test                  ---Statistic---    -----p Value------
Shapiro-Wilk          W      0.989846    Pr < W       0.6521
Kolmogorov-Smirnov    D      0.057951    Pr > D      >0.1500
Cramer-von Mises      W-Sq   0.03225     Pr > W-Sq   >0.2500
Anderson-Darling      A-Sq   0.224264    Pr > A-Sq   >0.2500

[Normal probability plot, stem-and-leaf display, and box plot for expvar: the distribution is strongly right-skewed, with high outliers, and the points deviate markedly from the normal reference line]

[Normal probability plot, stem-and-leaf display, and box plot for normvar: the points follow the normal reference line closely]
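The slides do not show how the demonstration data were built; judging from the variable names, expvar is a right-skewed (exponential) sample and normvar a standard normal one. A minimal sketch of generating such data in SAS 8.2 (the RANEXP/RANNOR calls, the seed, and the sample size of 100 are assumptions, not from the original):

data temp;
  seed = 2005;               /* arbitrary seed (assumption) */
  do i = 1 to 100;
    expvar  = ranexp(seed);  /* exponential draw: should fail the normality tests */
    normvar = rannor(seed);  /* standard normal draw: should pass them */
    output;
  end;
  drop i seed;
run;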
Consequences of Non-Normality

• The F-test is very robust against non-normal data, especially in a fixed-effects model
• A large sample size will approximate normality by the Central Limit Theorem (recommended sample size > 50)
• Simulations have shown that unequal sample sizes between treatment groups magnify any departure from normality
• A large deviation from normality leads to hypothesis test conclusions that are too liberal, and to a decrease in power and efficiency

Remedial Measures for Non-Normality

• Data transformation
  – Be aware: a transformation may lead to a fundamental change in the relationship between the dependent and the independent variable, and is not always recommended.
• Don't use the standard F-test.
  – Modified F-tests
    • Adjust the degrees of freedom
    • Rank F-test (capitalizes on the F-test's robustness)
  – Randomization test on the F-ratio
  – Other non-parametric tests if the distribution is unknown
  – Make up our own test using a likelihood ratio if the distribution is known

Independence

• Independent observations
  – No correlation between error terms
  – No correlation between independent variables and errors
• Positively correlated data inflate the standard error
  – The estimates of the treatment means are more accurate than the standard error shows.

Independence Tests

• If we have some notion of how the data were collected, we can check whether any autocorrelation exists.
• The Durbin-Watson statistic looks at the correlation between each value and the value before it
  – The data must be sorted in the correct order for meaningful results
  – For example, samples would be ordered by collection time if we suspect the results could depend on time

proc glm data=temp;
  class trt;
  model y = trt / p;
  output out=out_ds r=resid_var;
run;
quit;

data out_ds;
  set out_ds;
  time = _n_;
run;

proc gplot data=out_ds;
  plot resid_var * time;
run;
quit;

Correlated data:

First Order Autocorrelation    0.90931
Durbin-Watson D                0.12405

Independent data:

First Order Autocorrelation    0.00479029
Durbin-Watson D                1.96904290
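PROC GLM itself does not print the Durbin-Watson statistic; output like that shown above can be obtained from a regression with the DW option. A sketch of one way to do it with PROC REG, which has no CLASS statement, so a two-level trt is recoded as a 0/1 dummy (the dummy coding and the two-level design are assumptions):

data temp_dw;
  set temp;               /* data must already be in time order */
  trt_dummy = (trt = 2);  /* 1 for the second treatment, 0 otherwise */
run;

proc reg data=temp_dw;
  model y = trt_dummy / dw;  /* DW prints Durbin-Watson D and the
                                first-order autocorrelation of the
                                residuals */
run;
quit;

A Durbin-Watson D near 2 indicates no first-order autocorrelation; values near 0 indicate strong positive autocorrelation, as in the first set of output above.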
Remedial Measures for Dependent Data

• The first defense against dependent data is proper study design and randomization
  – Designs can be implemented that take correlation into account, e.g., a crossover design
• Look for environmental factors unaccounted for
  – Add covariates to the model if they are causing correlation, e.g., quantified learning curves
• If no underlying factors can be found to which the autocorrelation can be attributed
  – Use a different model, e.g., a random-effects model
  – Transform the independent variables using the correlation coefficient

Homogeneity of Variances

• Eisenhart (1947) describes the problem of unequal variances as follows:
  – The ANOVA model is based on the ratio of the mean squares of the factors to the residual mean square
  – The residual mean square is the unbiased estimator of σ², the variance of a single observation
  – The between-treatment mean square takes into account not only the differences between observations, σ², just like the residual mean square, but also the variance between treatments
  – If there is non-constant variance among treatments, we can replace the residual mean square with some overall variance, σa², and a treatment variance, σt², which is some weighted version of σa²
  – The "neatness" of ANOVA is lost

Homogeneity of Variances

• The omnibus (overall) F-test is very robust against heterogeneity of variances, especially with fixed effects and equal sample sizes.
• Tests for treatment differences, such as t-tests and contrasts, are severely affected, resulting in inferences that may be too liberal or too conservative.

Tests for Homogeneity of Variances

– Levene's Test
  • Computes a one-way ANOVA on the absolute value (or sometimes the square) of the residuals, |yij − ŷi|, with t − 1 and N − t degrees of freedom
  • Considered robust to departures from normality, but too conservative
– Brown-Forsythe Test
  • A slight modification of Levene's test, in which the median is substituted for the mean (Kuehl (2000) refers to it as the Levene (med) test)
– The Fmax Test
  • The ratio of the largest treatment-group variance to the smallest, compared against a table of critical values
  • Tabachnik and Fidell (2001) use the Fmax ratio as a rule of thumb rather than consulting a table of critical values:
    – The Fmax ratio is no greater than 10
    – Sample sizes of the groups are approximately equal (ratio of smallest to largest no greater than 4)
  • No matter how the Fmax test is used, normality must be assumed.

Tests for Homogeneity of Variances

The same step was run twice, once on a data set with homogeneous variances and once on a data set with heterogeeous variances:

proc glm data=temp;
  class trt;
  model y = trt;
  means trt / hovtest=levene hovtest=bf;
run;
quit;

Homogeneous Variances (The GLM Procedure)

Levene's Test for Homogeneity of Y Variance
ANOVA of Squared Deviations from Group Means

Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
TRT       1          10.2533       10.2533      0.60   0.4389
Error    98           1663.5       16.9747

Brown and Forsythe's Test for Homogeneity of Y Variance
ANOVA of Absolute Deviations from Group Medians

Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
TRT       1           0.7087        0.7087      0.56   0.4570
Error    98            124.6        1.2710

Heterogeneous Variances (The GLM Procedure)

Levene's Test for Homogeneity of y Variance
ANOVA of Squared Deviations from Group Means

Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
trt       1          10459.1       10459.1     36.71   <.0001
Error    98          27921.5         284.9

Brown and Forsythe's Test for Homogeneity of y Variance
ANOVA of Absolute Deviations from Group Medians

Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
trt       1            318.3         318.3     93.45   <.0001
Error    98            333.8        3.4065

Tests for Homogeneity of Variances

• SAS (as far as I know) does not have a procedure to obtain Fmax, but it is easy to calculate; see the sketch below
• More importantly: VARIANCE TESTS ARE ONLY FOR ONE-WAY ANOVA

WARNING: Homogeneity of variance testing and Welch's ANOVA are only available for unweighted one-way models.
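A minimal sketch of calculating Fmax by hand (the data set and variable names follow the earlier one-way examples; PROC MEANS plus PROC SQL is just one convenient route):

proc means data=temp noprint;
  class trt;
  var y;
  output out=vars var=var_y;   /* one variance per treatment group */
run;

proc sql;
  /* Fmax = largest group variance / smallest group variance;  */
  /* compare to a critical value table, or to Tabachnik and    */
  /* Fidell's rule of thumb of 10                               */
  select max(var_y) / min(var_y) as Fmax
    from vars
    where _type_ = 1;          /* keep the per-treatment rows only */
quit;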
Tests for Homogeneity of Variances (Randomized Complete Block Design and/or Factorial Design)

• In a CRD, the variance of each treatment group is checked for homogeneity
• In a factorial/RCBD design, each cell's variance should be checked

  H0: σij² = σi′j′², for all i, j where i ≠ i′, j ≠ j′

Tests for Homogeneity of Variances (Randomized Complete Block Design and/or Factorial Design)

• Approach 1
  – Code each row/column combination as its own group
  – Run HOVTESTs as before

data newgroup;
  set oldgroup;
  if block = 1 and treat = 1 then newgroup = 1;
  if block = 1 and treat = 2 then newgroup = 2;
  if block = 2 and treat = 1 then newgroup = 3;
  if block = 2 and treat = 2 then newgroup = 4;
  if block = 3 and treat = 1 then newgroup = 5;
  if block = 3 and treat = 2 then newgroup = 6;
run;

proc glm data=newgroup;
  class newgroup;
  model y = newgroup;
  means newgroup / hovtest=levene hovtest=bf;
run;
quit;

• Approach 2
  – Recall that Levene's test and the Brown-Forsythe test are ANOVAs based on residuals
  – Find the residual for each observation
  – Run an ANOVA

proc sort data=oldgroup;
  by treat block;
run;

proc means data=oldgroup noprint;
  by treat block;
  var y;
  output out=stats mean=mean median=median;
run;

data newgroup;
  merge oldgroup stats;
  by treat block;
  resid = abs(mean - y);
  if block = 1 and treat = 1 then newgroup = 1;
  ………
run;

proc glm data=newgroup;
  class newgroup;
  model resid = newgroup;
run;
quit;

Tests for Homogeneity of Variances (Repeated-Measures Design)

• Recall the repeated-measures set-up:

        Treatment
    a1      a2      a3
    s1      s1      s1
    s2      s2      s2
    s3      s3      s3
    s4      s4      s4

Tests for Homogeneity of Variances (Repeated-Measures Design)

• As there is only one score per cell, the variance of each cell cannot be computed. Instead, four assumptions need to be tested/satisfied:
  – Compound symmetry
    • Homogeneity of variance in each column: σa1² = σa2² = σa3²
    • Homogeneity of covariance between columns: σa1a2 = σa2a3 = σa3a1
  – No A × S interaction (additivity)
  – Sphericity
    • The variances of the difference scores between pairs are equal: σYa1-Ya2² = σYa1-Ya3² = σYa2-Ya3²

Tests for Homogeneity of Variances (Repeated-Measures Design)

• Usually, testing sphericity will suffice
• Sphericity can be tested using Mauchly's test in SAS:

proc glm data=temp;
  class sub;
  model a1 a2 a3 = sub / nouni;
  repeated a 3 (1 2 3) polynomial / summary printe;
run;
quit;

Sphericity Tests

Variables                 DF   Mauchly's Criterion   Chi-Square   Pr > ChiSq
Transformed Variates       2   Det = 0                     6.01        .056
Orthogonal Components      2   Det = 0                     6.03        .062

Tests for Homogeneity of Variances (Latin-Squares/Split-Plot Design)

• If there is only one score per cell, homogeneity of variances needs to be shown for the marginals of each column and each row
  – Each factor for a Latin square
  – Whole plots and subplots for a split plot
• If there are repetitions, homogeneity is to be shown within each cell, as in an RCBD
• If there are repeated measures, follow the guidelines for sphericity, compound symmetry, and additivity as well

Remedial Measures for Heterogeneous Variances

• Studies that do not involve repeated measures
  – If normality is not violated, a weighted ANOVA is suggested (e.g., Welch's ANOVA; see the sketch after this list)
  – If normality is violated, the data transformation necessary to normalize the data will usually stabilize the variances as well
  – If the variances are still not homogeneous, non-ANOVA tests might be your option
• Studies with repeated measures
  – For violations of sphericity
    • Modifications of the degrees of freedom have been suggested
      – Greenhouse-Geisser
      – Huynh-Feldt
    • Only do specific comparisons (sphericity does not apply to only two groups; sphericity implies more than two)
    • MANOVA
    • Use an MLE procedure to specify the variance-covariance matrix
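The Welch suggestion in the list above can be requested directly in PROC GLM for a one-way layout (a minimal sketch; the data set and variable names follow the earlier one-way examples):

proc glm data=temp;
  class trt;
  model y = trt;
  means trt / hovtest=levene welch;  /* WELCH requests Welch's variance-
                                        weighted one-way ANOVA, which does
                                        not assume equal variances */
run;
quit;

As the earlier WARNING notes, the WELCH option, like HOVTEST, is available only for unweighted one-way models.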
Other Concerns

• Outliers and influential points
  – Data should always be checked for influential points that might bias statistical inference
    • Use scatterplots of residuals
    • Use regression-based statistics to detect outliers and influential points
      – DFBETAS
      – Cook's D

References

• Casella, G. and Berger, R. (2002). Statistical Inference. United States: Duxbury.
• Cochran, W. G. (1947). Some Consequences When the Assumptions for the Analysis of Variance Are Not Satisfied. Biometrics, Vol. 3, 22-38.
• Eisenhart, C. (1947). The Assumptions Underlying the Analysis of Variance. Biometrics, Vol. 3, 1-21.
• Ito, P. K. (1980). Robustness of ANOVA and MANOVA Test Procedures. In Handbook of Statistics 1: Analysis of Variance (P. R. Krishnaiah, ed.), 199-236. Amsterdam: North-Holland.
• Kaskey, G., et al. (1980). Transformations to Normality. In Handbook of Statistics 1: Analysis of Variance (P. R. Krishnaiah, ed.), 321-341. Amsterdam: North-Holland.
• Kuehl, R. (2000). Design of Experiments: Statistical Principles of Research Design and Analysis, 2nd edition. United States: Duxbury.
• Kutner, M. H., et al. (2005). Applied Linear Statistical Models, 5th edition. New York: McGraw-Hill.
• Mardia, K. V. (1980). Tests of Univariate and Multivariate Normality. In Handbook of Statistics 1: Analysis of Variance (P. R. Krishnaiah, ed.), 279-320. Amsterdam: North-Holland.
• Tabachnik, B. and Fidell, L. (2001). Computer-Assisted Research Design and Analysis. Boston: Allyn & Bacon.