Chapters 8-9 1-Way Analysis of Variance Completely Randomized Design Comparing t > 2 Groups - Numeric Responses • Extension of Methods used to Compare 2 Groups • Independent Samples and Paired Data Designs • Normal and non-normal data distributions Data Design Normal Nonnormal Independent Samples (CRD) Paired Data (RBD) F-Test 1-Way ANOVA F-Test 2-Way ANOVA KruskalWallis Test Friedman’s Test Completely Randomized Design (CRD) • Controlled Experiments - Subjects assigned at random to one of the t treatments to be compared • Observational Studies - Subjects are sampled from t existing groups • Statistical model yij is measurement from the jth subject from group i: yij i ij i ij where is the overall mean, i is the effect of treatment i , ij is a random error, and i is the population mean for group i 1-Way ANOVA for Normal Data (CRD) • For each group obtain the mean, standard deviation, and sample size: y i. yij j ni si 2 ( y y ) ij i. j ni 1 • Obtain the overall mean and sample size N n1 ... nt n1 y1. ... nt y t . i j yij y.. N N Analysis of Variance - Sums of Squares • Total Variation TSS i 1 j 1 ( yij y.. ) 2 t ni dfTotal N 1 • Between Group (Sample) Variation SSB i 1 j 1 ( y i. y.. ) i 1 ni ( y i. y.. ) 2 t ni 2 t df B t 1 • Within Group (Sample) Variation SSW i 1 j 1 ( yij y i. ) i 1 (ni 1) si2 t ni TSS SSB SSW 2 t dfTotal df B dfW dfW N t Analysis of Variance Table and F-Test Source of Variation Treatments Error Total Sum of Squares SSB SSW TSS Degrees of Freedom t-1 nT-t nT -1 Mean Square MSB=SSB/(t-1) MSW=SSW/( nT -t) F F=MSB/MSW • Assumption: All distributions normal with common variance •H0: No differences among Group Means (1 t =0) • HA: Group means are not all equal (Not all i are 0) MSB T .S . : Fobs MSW R.R. : Fobs F ,t 1, N t P val : P ( F Fobs ) (Table 8) Expected Mean Squares • Model: yij = +i + ij with ij ~ N(0,s2), Si = 0: E ( MSW ) s 2 t E ( MSB) s 2 2 n ii i 1 t 1 t s 2 E ( MSB ) E ( MSW ) 2 n ii i 1 s t 1 1 2 t 2 n ii i 1 2 s (t 1) E ( MSB ) When H 0 : 1 t 0 is true, 1 E ( MSW ) E ( MSB ) otherwise (H a is true), 1 E ( MSW ) Expected Mean Squares • 3 Factors effect magnitude of F-statistic (for fixed t) – True group effects (1,…,t) – Group sample sizes (n1,…,nt) – Within group variance (s2) • Fobs = MSB/MSW • When H0 is true (1=…=t=0), E(MSB)/E(MSW)=1 • Marginal Effects of each factor (all other factors fixed) – As spread in (1,…,t) E(MSB)/E(MSW) – As (n1,…,nt) E(MSB)/E(MSW) (when H0 false) – As s2 E(MSB)/E(MSW) (when H0 false) 0.09 0.09 0.08 0.08 0.07 0.07 0.06 0.06 0.05 0.05 0.04 0.04 0.03 E ( MSB) E ( MSW ) 0.02 0.01 0 0 20 40 60 80 100 120 140 160 180 200 A) =100, 1=-20, 2=0, 3=20, s = 20 n 4 8 12 20 0.09 0.08 0.07 0.06 A 9 17 25 41 B 129 257 385 641 C 1.5 2 2.5 3.5 0.03 0.02 0.01 0 0 20 40 60 80 100 120 140 160 180 200 B) =100, 1=-20, 2=0, 3=20, s = 5 D 9 17 25 41 0.09 0.08 0.07 0.05 0.06 0.04 0.05 0.03 0.04 0.02 0.03 0.01 0.02 0 0.01 0 20 40 60 80 100 120 140 160 180 0 0 C) =100, 1=-5, 2=0, 3=5, s = 20 20 40 60 80 100 120 140 160 180 D) =100, 1=-5, 2=0, 3=5, s = 5 200 Example - Seasonal Diet Patterns in Ravens • “Treatments” - t = 4 seasons of year (3 “replicates” each) – – – – Winter: November, December, January Spring: February, March, April Summer: May, June, July Fall: August, September, October • Response (Y) - Vegetation (percent of total pellet weight) • Transformation (For approximate normality): Y Y ' arcsin 100 Source: K.A. Engel and L.S. Young (1989). “Spatial and Temporal Patterns in the Diet of Common Ravens in Southwestern Idaho,” The Condor, 91:372-378 Seasonal Diet Patterns in Ravens - Data/Means Y j=1 j=2 j=3 Winter(i=1) 94.3 90.3 83.0 Fall(i=2) 80.7 90.5 91.8 Summer(i=3) 80.5 74.3 32.4 Fall (i=4) 67.8 91.8 89.3 Y' j=1 j=2 j=3 Winter(i=1) 1.329721 1.254080 1.145808 Fall(i=2) 1.115957 1.257474 1.280374 Summer(i=3) 1.113428 1.039152 0.605545 Fall (i=4) 0.967390 1.280374 1.237554 1.329721 1.254080 1.145808 y 1. 1.24203 3 1.115957 1.257474 1.280374 y 2. 1.217935 3 1.113428 1.039152 0.605545 y 3. 0.919375 3 0.967390 1.280374 1.237554 y 4. 1.16773 3 1.329721 ... 1.237554 y .. 1.135572 12 Seasonal Diet Patterns in Ravens - Data/Means Plot of Transformed Data by Season 1.500000 1.400000 1.300000 Transformed % Vegetation 1.200000 1.100000 1.000000 0.900000 0.800000 0.700000 0.600000 0.500000 0 1 2 3 Season 4 5 Seasonal Diet Patterns in Ravens - ANOVA Total Variation: (dfTotal 12-1 11) TSS (1.329721 1.135572) 2 ... (1.27554 1.135572) 2 0.438425 Between Group Variation: (dfT 4-1 3) SSB 3 (1.24203 1.135572) 2 ... (1.161773 1.135572) 2 0.197387 Within Group Variation: (dfE 12-4 8) SSW (1.329721 1.243203) 2 ... (1.237554 1.161773) 2 0.241038 ANOVA Source of Variation Between Groups Within Groups SS 0.197387 0.241038 df 3 8 Total 0.438425 11 MS F P-value 0.065796 2.183752 0.167768 0.03013 F crit 4.06618 Do not conclude that seasons differ with respect to vegetation intake Seasonal Diet Patterns in Ravens - Spreadsheet Month NOV DEC JAN FEB MAR APR MAY JUN JUL AUG SEP OCT Season 1 1 1 2 2 2 3 3 3 4 4 4 Y' 1.329721 1.254080 1.145808 1.115957 1.257474 1.280374 1.113428 1.039152 0.605545 0.967390 1.280374 1.237554 Season MeanOverall Mean 1.243203 1.135572 1.243203 1.135572 1.243203 1.135572 1.217935 1.135572 1.217935 1.135572 1.217935 1.135572 0.919375 1.135572 0.919375 1.135572 0.919375 1.135572 1.161773 1.135572 1.161773 1.135572 1.161773 1.135572 Sum TSS 0.037694 0.014044 0.000105 0.000385 0.014860 0.020968 0.000490 0.009297 0.280928 0.028285 0.020968 0.010400 0.438425 Total SS Between Season SS (Y’-Overall Mean)2 (Group Mean-Overall Mean)2 SST 0.011584 0.011584 0.011584 0.006784 0.006784 0.006784 0.046741 0.046741 0.046741 0.000687 0.000687 0.000687 0.197387 SSE 0.007485 0.000118 0.009486 0.010400 0.001563 0.003899 0.037657 0.014346 0.098489 0.037785 0.014066 0.005743 0.241038 Within Season SS (Y’-Group Mean)2 Transformations for Constant Variance s 2 k s 2 k2 yT y or yT Used for Poisson Distribution k 1 y 0.375 yT ln y or yT ln y 1 s k 1 2 yT sin 1 y 1 Used for Binomial Distrbution k , n Box-Cox Transformation: Power Transformation used to obtain normality and often constant variance y 0 yT ln y 0 y ^ CRD with Non-Normal Data Kruskal-Wallis Test • Extension of Wilcoxon Rank-Sum Test to k > 2 Groups • Procedure: – Rank the observations across groups from smallest (1) to largest ( N = n1+...+nk ), adjusting for ties – Compute the rank sums for each group: T1,...,Tk . Note that T1+...+Tk = N(N +1)/2 Kruskal-Wallis Test • H0: The k population distributions are identical • HA: Not all k distributions are identical 2 12 k Ti T .S .: H 3( N 1) N ( N 1) i 1 ni R.R.: H ,k 1 2 P val : P( H ) 2 An adjustment to H is suggested when there are many ties in the data. H' H t 3j t j 1 j 3 N N t j number of observations in j th group of tied ranks Example - Seasonal Diet Patterns in Ravens Month NOV DEC JAN FEB MAR APR MAY JUN JUL AUG SEP OCT Season 1 1 1 2 2 2 3 3 3 4 4 4 Y' 1.329721 1.254080 1.145808 1.115957 1.257474 1.280374 1.113428 1.039152 0.605545 0.967390 1.280374 1.237554 H 0 : No seasonal difference Rank 12 8 6 5 9 10.5 4 3 1 2 10.5 7 • T1 = 12+8+6 = 26 • T2 = 5+9+10.5 = 24.5 • T3 = 4+3+1 = 8 • T4 = 2+10.5+7 = 19.5 H a : Seasonal Difference s (26) 2 (24.5) 2 (8) 2 (19.5) 2 12 T .S . : H 3(12 1) 44.12 39 5.12 12(12 1) 3 3 3 3 R.R.( 0.05) : H .205, 41 7.815 P value : P ( 2 H 5.12) .1632 Linear Contrasts • Linear functions of the treatment means (population and sample) such that the coefficients sum to 0. • Used to compare groups or pairs of treatment means, based on research question(s) of interest t Population Contrast: l a11 ... at t ai i where i 1 t a i 1 i t ^ Estimated Contrast: l a1 y1 ... at y t ai y i i 1 2 2 2 n a s s 2 2 i V l a12 s ... at i 1 ni n1 nt ^ ^ MSW t 2 n1 ... nt n V l ai n i 1 ^ ai2 V l MSW i 1 ni ^ ^ n 0 Orthogonal Contrasts & Sums of Squares n Two Contrasts: l1 ai i i 1 n l2 bi i i 1 n n a b 0 i 1 i i 1 i n ab l1 , l2 are orthogonal if: i i 0 i 1 ni n for balanced data, a b 0 i 1 2 ^ 2 n ai y i l Contrast Sum of Squares: SSC i 1 n 2 n 2 ai ai i 1 ni i 1 ni i i 2 ^ nl for balanced data, SSC n ai2 i 1 Among t treatments, we can obtain t 1 pairwise orthogonal Contrasts: l1 ,..., lt 1 Then, we can decompose the Between Treatment Sum of Squares into the Contrasts: SSB SSC1 ... SSCt 1 where each of the Contrast Sums of Squares has 1 degree of freedom For any contrast: Testing H 0 k : lk 0 H Ak : lk 0 TS : Fk SSCk MSW RR : Fk F ,1,nT t Simultaneous Tests of Multiple Contrasts • Using m contrasts for comparisons among t treatments • Each contrast to be tested at significance level, which we label as I for individual comparison Type I error rate • Probability of making at least one false rejection of one of the m null hypotheses is the experimentwise Type I error rate, which we label as E • Tests are not independent unless the error (Within Group) degrees are infinite, however Bonferroni inequality implies that E ≤ mI Choose I = E / m Scheffe’s Method for All Contrasts • Can be used for any number of contrasts, even those suggested by data. Conservative (Wide CI’s, Low Power) t l ai i i 1 ^ t l ai y i i 1 t s.t. ai 0 i 1 a ^ SE l MSW i i 1 ni t t t i 1 i 1 2 H 0 : l ai i 0 H A : l ai i 0 ^ Reject H 0 if l SE l (t 1) F E ,df1 ,df2 ^ df1 t 1, df 2 df Error N t ^ Simultaneous 1 100% Confidence Intervals: l SE l (t 1) F E ,df1 ,df2 ^ Post-hoc Comparisons of Treatments • If differences in group means are determined from the Ftest, researchers want to compare pairs of groups. Three popular methods include: – Fisher’s LSD - Upon rejecting the null hypothesis of no differences in group means, LSD method is equivalent to doing pairwise comparisons among all pairs of groups as in Chapter 6. – Tukey’s Method - Specifically compares all t(t-1)/2 pairs of groups. Utilizes a special table (Table 10, pp. 1110-1111). – Bonferroni’s Method - Adjusts individual comparison error rates so that all conclusions will be correct at desired confidence/significance level. Any number of comparisons can be made. Very general approach can be applied to any inferential problem Fisher’s Least Significant Difference Procedure • Protected Version is to only apply method after significant result in overall F-test • For each pair of groups, compute the least significant difference (LSD) that the sample means need to differ by to conclude the population means are not equal LSDij t / 2 1 1 MSE n n j i with df N t Conclude i j if y i. y j . LSDij Fisher' s Confidence Interval : y i. y j . LSDij Tukey’s W Procedure • More conservative than Fisher’s LSD (minimum significant difference and confidence interval width are higher). • Derived so that the probability that at least one false difference is detected is (experimentwise error rate) MSW Wij q (t , ) n q given in Table 10, pp. 1110-1111 with N - t Conclude i j if y i. y j . Wij Tukey's Confidence Interval: y i. y j . Wij 1 1 q (t , ) When the sample sizes are unequal, use Wij MSW n n 2 j i Bonferroni’s Method (Most General) • Wish to make C comparisons of pairs of groups with simultaneous confidence intervals or 2-sided tests •When all pair of treatments are to be compared, C = t(t-1)/2 • Want the overall confidence level for all intervals to be “correct” to be 95% or the overall type I error rate for all tests to be 0.05 • For confidence intervals, construct (1-(0.05/C))100% CIs for the difference in each pair of group means (wider than 95% CIs) • Conduct each test at =0.05/C significance level (rejection region cut-offs more extreme than when =0.05) • Critical t-values are given in table on class website, we will use notation: t/2,C, where C=#Comparisons, = df Bonferroni’s Method (Most General) 1 1 Bij t / 2,C ,v MSE n n j i (t given on class website with v N-t ) Conclude i j if y i. y j . Bij Bonferroni ' s Confidence Interval : y i. y j . Bij Example - Seasonal Diet Patterns in Ravens Note: No differences were found, these calculations are only for demonstration purposes MSE 0.03013 ni 3 t.025,8 2.306 q.05,t 4,df E 8 4.53 t.025,C 6,df E 8 3.479 1 1 LSDij 2.306 (0.03013) 0.3268 3 3 1 Wij 4.53 (0.03013) 0.4540 3 1 1 Bij 3.479 (0.03013) 0.4930 3 3 Comparison(i vs j) Group i Mean Group j MeanDifference 1 vs 2 1.243203 1.217935 0.025267 1 vs 3 1.243203 0.919375 0.323828 1 vs 4 1.243203 1.161773 0.081430 2 vs 3 1.217935 0.919375 0.298560 2 vs 4 1.217935 1.161773 0.056162 3 vs 4 0.919375 1.161773 -0.242398 Dunnett’s Method to Compare Trts to Control • Want to compare t-1 “active” treatments with a “control” condition simultaneously • Based on equal sample sizes of n subjects per treatment • Makes use of Dunnett’s d-table (Table 11, pp. 1112-15), indexed by k = t-1 comparisons, error degrees of freedom, = N – t, E, and whether the individual tests are 1-sided or 2-sided D d E k , 2 MSW n H Ai : i c Reject H 0i if H Ai : i c Reject H 0i if y i y c D1-sided H Ai : i c Reject H 0i if y i y c D1-sided H 0 i : i c y i y c D2-sided Nonparametric Comparisons • Based on results from Kruskal-Wallis Test • Makes use of the Rank-Sums for the t treatments • If K-W test is not significant, do not compare any pairs of treatments • Compute the t(t-1)/2 absolute mean differences • Makes use of the Studentized Range Table (table 10) • For large-samples, conclude treatments are different if: R i R j KWij q E t , 2 N N 1 1 1 12 ni n j