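Residual Analysis for ANOVA Models
KNNL – Chapter 18

Residuals

Model errors (unobserved): $\varepsilon_{ij} = Y_{ij} - \mu_i$

Residuals: $e_{ij} = Y_{ij} - \hat{Y}_{ij} = Y_{ij} - \bar{Y}_{i\cdot}$, with
$$E\{e_{ij}\} = 0, \qquad \sigma^2\{e_{ij}\} = \sigma^2\,\frac{n_i - 1}{n_i}, \qquad s^2\{e_{ij}\} = MSE\,\frac{n_i - 1}{n_i}, \qquad s\{e_{ij}\} = \sqrt{MSE\,\frac{n_i - 1}{n_i}}$$

Semi-studentized residual (residual divided by the estimate of $\sigma$; trivial to compute):
$$e^*_{ij} = \frac{e_{ij}}{\sqrt{MSE}}$$

Studentized residual (residual divided by its standard error; messier to compute):
$$r_{ij} = \frac{e_{ij}}{s\{e_{ij}\}} = \frac{e_{ij}}{\sqrt{MSE\,(n_i - 1)/n_i}}$$

Studentized deleted residual:
$$t_{ij} = e_{ij}\left[\frac{n_T - r - 1}{SSE\left(1 - \frac{1}{n_i}\right) - e_{ij}^2}\right]^{1/2}$$

Model Departures Detected With Residuals and Plots
• Errors have non-constant variance
• Errors are not independent
• Outlying observations are present
• Important predictors are omitted
• Errors are not normally distributed

Common Plots
• Residuals versus treatment
• Residuals versus treatment mean
• Aligned dot plot (aka strip chart)
• Residuals versus time
• Residuals versus omitted variables
• Box plots, histograms, normal probability plots

Tests for Constant Variance

$H_0: \sigma_1^2 = \cdots = \sigma_r^2$

Hartley's Test (assumes normal data and equal sample sizes):
$$H^* = \frac{\max_i s_i^2}{\min_i s_i^2}$$
Reject $H_0$ if $H^* > H(1 - \alpha;\, r,\, n - 1)$, where $n_1 = \cdots = n_r = n$.

Brown-Forsythe Test (robust to non-normality; allows unequal sample sizes):
$$d_{ij} = \left|Y_{ij} - \tilde{Y}_i\right|, \quad i = 1, \ldots, r, \quad j = 1, \ldots, n_i, \qquad \tilde{Y}_i = \operatorname{median}\{Y_{i1}, \ldots, Y_{in_i}\}$$
$$\bar{d}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} d_{ij}, \qquad \bar{d}_{\cdot\cdot} = \frac{1}{n_T}\sum_{i=1}^{r}\sum_{j=1}^{n_i} d_{ij}$$
$$MSTR_{BF} = \frac{\sum_{i=1}^{r} n_i\left(\bar{d}_{i\cdot} - \bar{d}_{\cdot\cdot}\right)^2}{r - 1}, \qquad MSE_{BF} = \frac{\sum_{i=1}^{r}\sum_{j=1}^{n_i}\left(d_{ij} - \bar{d}_{i\cdot}\right)^2}{n_T - r}, \qquad F^*_{BF} = \frac{MSTR_{BF}}{MSE_{BF}}$$
Reject $H_0$ if $F^*_{BF} > F(1 - \alpha;\, r - 1,\, n_T - r)$.

Remedial Measures
• Normally distributed data, unequal variances
– Use weighted least squares with weights $w_{ij} = 1/s_i^2$, where $SSE_w(F)$ and $SSE_w(R)$ are the weighted error sums of squares of the full (separate treatment means) and reduced (common mean) models:
$$F^*_w = \frac{\left[SSE_w(R) - SSE_w(F)\right]/(r - 1)}{SSE_w(F)/(n_T - r)}$$
– Conclude the means are not all equal if $F^*_w > F(1 - \alpha;\, r - 1,\, n_T - r)$.
• Non-normal data (with possibly unequal variances)
– Variance-stabilizing transformations and the Box-Cox transformation:
   – Variance proportional to mean: $Y' = \sqrt{Y}$
   – Standard deviation proportional to mean: $Y' = \log Y$
   – Standard deviation proportional to mean$^2$: $Y' = 1/Y$
   – Response is a (binomial) proportion: $Y' = 2\arcsin\sqrt{Y}$
• Nonparametric tests
– F-test based on ranks, and the Kruskal-Wallis test

Effects of Model Departures
• Non-normal data
– Generally not a problem for the F-test, provided the data are not too far from normal and the sample sizes are reasonably large.
• Unequal error variances
– Generally not a problem for the F-test, as long as the sample sizes are approximately equal.
• Non-independence of error terms
– Can cause problems with tests. Use repeated measures ANOVA if the same subject receives each treatment.

Nonparametric Tests
Rank all observations across treatments from 1 to $n_T$, assigning average ranks when ties occur.
$$\bar{R}_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i} R_{ij}, \qquad \bar{R}_{\cdot\cdot} = \frac{1}{n_T}\sum_{i=1}^{r}\sum_{j=1}^{n_i} R_{ij} = \frac{1 + 2 + \cdots + n_T}{n_T} = \frac{n_T + 1}{2}$$
$$SSTO_R = \sum_{i=1}^{r}\sum_{j=1}^{n_i}\left(R_{ij} - \bar{R}_{\cdot\cdot}\right)^2, \qquad SSTR_R = \sum_{i=1}^{r} n_i\left(\bar{R}_{i\cdot} - \bar{R}_{\cdot\cdot}\right)^2, \qquad SSE_R = \sum_{i=1}^{r}\sum_{j=1}^{n_i}\left(R_{ij} - \bar{R}_{i\cdot}\right)^2$$

(Approximate) F test:
$$F^*_R = \frac{SSTR_R/(r - 1)}{SSE_R/(n_T - r)} = \frac{MSTR_R}{MSE_R}$$
Conclude the means are not all equal if $F^*_R > F(1 - \alpha;\, r - 1,\, n_T - r)$.

Simultaneous (Bonferroni) CIs for differences in mean ranks, for $g$ comparisons:
$$\left(\bar{R}_{i\cdot} - \bar{R}_{i'\cdot}\right) \pm z\!\left(1 - \frac{\alpha}{2g}\right)\sqrt{\frac{n_T(n_T + 1)}{12}\left(\frac{1}{n_i} + \frac{1}{n_{i'}}\right)}$$

Kruskal-Wallis Test (computed directly in most software packages):
$$X^2_{KW} = \frac{SSTR_R}{SSTO_R/(n_T - 1)} = \left[\frac{12}{n_T(n_T + 1)}\sum_{i=1}^{r}\frac{R_{i\cdot}^2}{n_i}\right] - 3(n_T + 1)$$
where $R_{i\cdot} = \sum_{j=1}^{n_i} R_{ij}$ is the rank sum for treatment $i$; the second form is exact only when there are no ties.
Conclude the means are not all equal if $X^2_{KW} > \chi^2(1 - \alpha;\, r - 1)$.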
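The three residual types defined above can be computed in a few lines. The following is a minimal sketch, assuming each treatment's observations are passed as a separate list; the function name `anova_residuals` and the input format are hypothetical. It uses the closed form for $t_{ij}$, so no model is refit for each deleted observation.

```python
import numpy as np


def anova_residuals(groups):
    """Residual diagnostics for the one-way (cell means) ANOVA model.

    `groups` is a list of 1-D sequences, one per treatment (a hypothetical
    input format chosen for this sketch). Returns one dict per treatment.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    r = len(groups)
    n_T = sum(len(g) for g in groups)

    # Residuals e_ij = Y_ij - Ybar_i. (the fitted value is the treatment mean)
    resid = [g - g.mean() for g in groups]
    sse = sum(float(np.sum(e ** 2)) for e in resid)
    mse = sse / (n_T - r)

    out = []
    for e in resid:
        n_i = len(e)
        out.append({
            "e": e,
            # semi-studentized: e_ij / sqrt(MSE)
            "semi": e / np.sqrt(mse),
            # studentized: e_ij / s{e_ij}, with s^2{e_ij} = MSE (n_i - 1) / n_i
            "stud": e / np.sqrt(mse * (n_i - 1) / n_i),
            # studentized deleted: closed form, no refitting needed
            "deleted": e * np.sqrt((n_T - r - 1) / (sse * (1 - 1 / n_i) - e ** 2)),
        })
    return out
```

For example, `anova_residuals([[4.1, 5.3, 6.0, 5.5], [7.2, 6.8, 8.1], [3.9, 4.4, 5.0, 4.7, 9.0]])` flags the value 9.0 with by far the largest `deleted` magnitude, which is the kind of observation one would inspect when checking for outliers.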
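Both constant-variance statistics above can be sketched directly from their definitions. This is an illustration under stated assumptions: the helper names `hartley_h` and `brown_forsythe` are hypothetical, and because Hartley's critical values $H(1-\alpha; r, n-1)$ come from a special table that, as far as we know, SciPy does not provide, the sketch returns only $H^*$. The Brown-Forsythe statistic is a one-way ANOVA F on the $d_{ij}$, so it should agree with `scipy.stats.levene(..., center="median")`.

```python
import numpy as np
from scipy import stats


def hartley_h(groups):
    """Hartley's H* = max s_i^2 / min s_i^2 (meant for normal data with equal n_i)."""
    s2 = [np.var(np.asarray(g, dtype=float), ddof=1) for g in groups]
    return max(s2) / min(s2)


def brown_forsythe(groups, alpha=0.05):
    """Brown-Forsythe F*_BF: one-way ANOVA on d_ij = |Y_ij - median_i|."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    r = len(groups)
    n = np.array([len(g) for g in groups])
    n_T = int(n.sum())

    d = [np.abs(g - np.median(g)) for g in groups]  # absolute deviations from treatment medians
    dbar_i = np.array([di.mean() for di in d])       # dbar_i.
    dbar = np.concatenate(d).mean()                  # dbar..

    mstr = np.sum(n * (dbar_i - dbar) ** 2) / (r - 1)
    mse = sum(np.sum((di - m) ** 2) for di, m in zip(d, dbar_i)) / (n_T - r)
    f_star = mstr / mse
    f_crit = stats.f.ppf(1 - alpha, r - 1, n_T - r)  # F(1 - alpha; r - 1, n_T - r)
    return f_star, f_crit, f_star > f_crit
```

Using medians rather than means for the $d_{ij}$ is what gives the Brown-Forsythe version its robustness to non-normal data.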
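The weighted least squares remedy above reduces to simple sums because the weight $w_{ij} = 1/s_i^2$ is constant within each treatment. The following is a minimal sketch (the name `weighted_anova_f` is hypothetical): the full-model fitted values are still the treatment means, and the reduced model's common mean is the weighted average $\hat{\mu}_w = \sum_i w_i \sum_j Y_{ij} \big/ \sum_i w_i n_i$.

```python
import numpy as np
from scipy import stats


def weighted_anova_f(groups, alpha=0.05):
    """F*_w for equal treatment means using WLS weights w_ij = 1 / s_i^2."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    r = len(groups)
    n_T = sum(len(g) for g in groups)
    w = [1.0 / np.var(g, ddof=1) for g in groups]  # one weight per treatment

    # Full model: separate means, so the WLS fit is still Ybar_i.
    sse_full = sum(wi * np.sum((g - g.mean()) ** 2) for wi, g in zip(w, groups))

    # Reduced model: one common mean, estimated by the weighted average.
    mu_w = sum(wi * g.sum() for wi, g in zip(w, groups)) / sum(wi * len(g) for wi, g in zip(w, groups))
    sse_red = sum(wi * np.sum((g - mu_w) ** 2) for wi, g in zip(w, groups))

    f_w = ((sse_red - sse_full) / (r - 1)) / (sse_full / (n_T - r))
    f_crit = stats.f.ppf(1 - alpha, r - 1, n_T - r)
    return f_w, f_crit, f_w > f_crit
```

One consequence of this choice of weights: $SSE_w(F) = \sum_i (n_i - 1)s_i^2/s_i^2 = n_T - r$, so the denominator of $F^*_w$ is always 1.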
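For the Box-Cox remedy, one way to choose $\lambda$ in an ANOVA setting is a grid search that minimizes the error sum of squares of the one-way model fitted to a standardized transform. This sketch assumes the standardized form KNNL uses in its regression treatment, $W = K_1(Y^\lambda - 1)$ for $\lambda \neq 0$ and $W = K_2 \log Y$ for $\lambda = 0$, with $K_2$ the geometric mean of all $Y$ and $K_1 = 1/(\lambda K_2^{\lambda - 1})$; the function names and the default grid are hypothetical. Note that `scipy.stats.boxcox` instead estimates $\lambda$ by maximum likelihood for a single sample and ignores the treatment structure.

```python
import numpy as np


def boxcox_anova_sse(groups, lam):
    """SSE of the one-way ANOVA fit to the standardized Box-Cox transform W (hypothetical helper)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    y = np.concatenate(groups)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive responses")
    k2 = np.exp(np.mean(np.log(y)))  # K2 = geometric mean of all Y
    if abs(lam) < 1e-12:
        w = k2 * np.log(y)           # lambda = 0 case
    else:
        k1 = 1.0 / (lam * k2 ** (lam - 1))
        w = k1 * (y ** lam - 1)
    # Split W back into treatments; SSE is the within-treatment sum of squares.
    parts = np.split(w, np.cumsum([len(g) for g in groups])[:-1])
    return sum(float(np.sum((p - p.mean()) ** 2)) for p in parts)


def boxcox_anova(groups, grid=None):
    """Pick the lambda on `grid` that minimizes the ANOVA SSE of W (hypothetical helper)."""
    grid = np.linspace(-2, 2, 41) if grid is None else np.asarray(grid, dtype=float)
    sses = np.array([boxcox_anova_sse(groups, lam) for lam in grid])
    return grid[int(np.argmin(sses))], sses
```

Standardizing by $K_1$ and $K_2$ puts every candidate transform on the same scale, which is what makes the SSE values comparable across $\lambda$. In practice one would usually round the chosen $\lambda$ to a nearby interpretable value such as $0$ (log), $0.5$ (square root), or $-1$ (reciprocal).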
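The rank-based quantities above ($SSTO_R$, $SSTR_R$, $SSE_R$, $F^*_R$, $X^2_{KW}$, and the Bonferroni intervals) can be computed together. The following is a minimal sketch, assuming all $g = r(r-1)/2$ pairwise comparisons are wanted; the function name `rank_tests` and the returned dictionary layout are hypothetical. It relies on `scipy.stats.rankdata`, whose default assigns tied values their average rank.

```python
import numpy as np
from scipy import stats


def rank_tests(groups, alpha=0.05):
    """Rank F test, Kruskal-Wallis X^2, and Bonferroni CIs for mean-rank differences."""
    groups = [np.asarray(x, dtype=float) for x in groups]
    r = len(groups)
    n = np.array([len(x) for x in groups])
    n_T = int(n.sum())

    # Rank all n_T observations jointly; tied values get their average rank.
    ranks = stats.rankdata(np.concatenate(groups))
    by_trt = np.split(ranks, np.cumsum(n)[:-1])
    rbar_i = np.array([R.mean() for R in by_trt])
    rbar = (n_T + 1) / 2  # average ranks keep the rank total, so this holds with ties too

    ssto = float(np.sum((ranks - rbar) ** 2))
    sstr = float(np.sum(n * (rbar_i - rbar) ** 2))
    sse = ssto - sstr

    f_r = (sstr / (r - 1)) / (sse / (n_T - r))
    x2_kw = sstr / (ssto / (n_T - 1))

    # All pairwise comparisons, so g = r(r - 1)/2 in the Bonferroni z multiplier.
    pairs = [(i, k) for i in range(r) for k in range(i + 1, r)]
    z = stats.norm.ppf(1 - alpha / (2 * len(pairs)))
    cis = {}
    for i, k in pairs:
        diff = rbar_i[i] - rbar_i[k]
        half = z * np.sqrt(n_T * (n_T + 1) / 12 * (1 / n[i] + 1 / n[k]))
        cis[(i, k)] = (diff - half, diff + half)

    return {
        "ssto": ssto, "sstr": sstr, "sse": sse,
        "f_r": f_r, "f_crit": stats.f.ppf(1 - alpha, r - 1, n_T - r),
        "x2_kw": x2_kw, "chi2_crit": stats.chi2.ppf(1 - alpha, r - 1),
        "cis": cis,
    }
```

Because $SSTO_R$ is computed from the average ranks, the first form $SSTR_R/(SSTO_R/(n_T-1))$ already includes the usual tie correction, so `x2_kw` should match `scipy.stats.kruskal` even when ties are present, while the rank-sum form in the slide matches only without ties.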