ANALYSIS OF VARIANCE

One-Way ANOVA

Assumptions:
1. Each of the k population or group distributions is normal. Check with a Normal Quantile Plot (or boxplot) of each group.
2. These distributions have identical variances (standard deviations). Check that the largest sd is no more than 2 times the smallest sd.
3. Each of the k samples is a random sample.
4. Each of the k samples is selected independently of one another.

H0: $\mu_1 = \mu_2 = \dots = \mu_k$ (no effect)
vs.
HA: not all k means are equal (the effect of the ‘treatment’ is significant)
These will always be the hypotheses; the only difference is the number of groups.

ANOVA Table:

Source            df     Sum of Squares   Mean Square          F value         p-value
Group (between)   k-1    SSG              MSG = SSG/(k-1)      F* = MSG/MSE    Pr(F_{k-1,N-k} > F*)
Error (within)    N-k    SSE              MSE = SSE/(N-k)
Total             N-1    SSTot            MST = SSTot/(N-1)

where
SSG = $\sum_i n_i (\bar{x}_i - \bar{x})^2$
SSE = $\sum_i (n_i - 1) s_i^2$
SSTot = $\sum_i \sum_j (x_{ij} - \bar{x})^2$

N = total number of observations = $\sum_i n_i$, where $n_i$ = number of observations for group i.

The F test statistic has two different degrees of freedom: the numerator = k-1 and the denominator = N-k, written $F_{k-1,N-k}$.

NOTE: SSE/(N-k) = MSE = $s_p^2$ (pooled sample variance)
$s_p^2 = \frac{(n_1 - 1)s_1^2 + \dots + (n_k - 1)s_k^2}{(n_1 - 1) + \dots + (n_k - 1)} = \hat{\sigma}^2$
This is the estimate of the assumed equal variance: the ‘average’ variance within the groups.

SSTot/(N-1) = MSTOT = $s^2$ = the total variance of the data (assuming NO groups).

F is approximately the variance of the (between) sample means divided by the average variance of the data. The larger the F (and the smaller the p-value), the more varied the means are, so the less likely it is that H0 is true. We reject H0 when the p-value < $\alpha$.

$R^2$ = proportion of the total variation explained by the difference in means = SSG/SSTot

One-way ANOVA F-test

Hypotheses:
H0: $\mu_1 = \mu_2 = \dots = \mu_k$ (there is no effect on the means due to the categorical variable)
HA: the means are NOT all equal (the effect due to the categorical variable is statistically significant)

The F test statistic has 2 dfs: k-1 and N-k, where k is the number of groups (means being tested) and N is the total number of observations.

Assumptions:
1. Each of the k population or group distributions is normal. Check with a Normal Quantile Plot (or boxplot) of each group.
2. These distributions have identical variances (standard deviations). Check that the largest sd is no more than 2 times the smallest sd, or use Levene's test. (MSE = pooled variance and is the estimate of the true variance within each group.)
3. Each of the k samples is a random sample.
4. Each of the k samples is selected independently of one another.
These assumptions are exactly the same as those for the pooled t-test.

Type I error: We claim that there is a statistically significant effect when there actually is not one. We claim the means are not all the same when actually they are.

Type II error: We fail to prove there is a statistically significant effect even though one does exist. We fail to prove the means are not all the same even though they are not all the same.

p-value interpretation: How often we would see an effect at least this strong if there were actually no effect (H0 true).

Conclusion: If p-value < $\alpha$, we reject H0 and conclude that there is a statistically significant effect. If p-value is NOT < $\alpha$, we fail to reject H0 and cannot prove that there is a statistically significant effect.
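Worked example: the ANOVA table quantities can be computed directly from the definitions of SSG, SSE, and SSTot. The sketch below is a minimal illustration using only the Python standard library. The function name one_way_anova and the three small groups of data are hypothetical, made up for this example, and the p-value is not computed (it would come from an F table or statistical software).

```python
from statistics import mean, stdev


def one_way_anova(groups):
    """Compute one-way ANOVA table quantities for a list of samples.

    Hypothetical helper for illustration; each sample needs at least
    two observations and a nonzero standard deviation.
    """
    k = len(groups)                      # number of groups
    sizes = [len(g) for g in groups]     # n_i for each group
    N = sum(sizes)                       # total number of observations
    grand_mean = sum(sum(g) for g in groups) / N
    group_means = [mean(g) for g in groups]
    group_sds = [stdev(g) for g in groups]   # sample sd (divides by n_i - 1)

    # SSG = sum of n_i * (group mean - grand mean)^2
    ssg = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, group_means))
    # SSE = sum of (n_i - 1) * s_i^2
    sse = sum((n - 1) * s ** 2 for n, s in zip(sizes, group_sds))
    # SSTot = sum over all observations of (x_ij - grand mean)^2
    sstot = sum((x - grand_mean) ** 2 for g in groups for x in g)

    msg = ssg / (k - 1)   # mean square for groups
    mse = sse / (N - k)   # mean square error = pooled sample variance
    return {
        "df_group": k - 1,
        "df_error": N - k,
        "SSG": ssg,
        "SSE": sse,
        "SSTot": sstot,
        "MSG": msg,
        "MSE": mse,
        "F": msg / mse,
        "R2": ssg / sstot,
        # rule-of-thumb check for equal variances: should be no more than 2
        "sd_ratio": max(group_sds) / min(group_sds),
    }


# Made-up data: three groups of three observations each
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
table = one_way_anova(groups)
for name, value in table.items():
    print(f"{name}: {value}")
```

For this made-up data, SSG = 24, SSE = 6, and SSTot = 30 (so SSG + SSE = SSTot), giving F = 12 on 2 and 6 degrees of freedom and R2 = 0.8. The sd ratio is 1, so the equal-variance rule of thumb is satisfied.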