Introduction to ANOVA •Research Designs for ANOVAs •Type I Error and Multiple Hypothesis Tests •The Logic of ANOVA •ANOVA vocabulary, notation, and formulas •The F table •Comparing pairs of means •Relationship between ANOVA and t-tests Research Designs Like the t-test, ANOVA is used when the independent variable in a study is categorical (e.g. treatment groups) and the dependent variable is continuous. A t-test is used when there are only two groups to compare. ANOVA is used when there are two or more groups Research Designs Terminology In analysis of variance, an independent variable is called a factor. (e.g. treatment groups) The groups that make up the independent variable are called levels of that factor. (e.g. low dose, high dose, control) A research study that involves only one factor is called a singlefactor design. A research study that involves more than one factor is called a factorial design (e.g. treatment group and gender) Research Designs Suppose that a psychologist wants to examine learning performance under three temperature conditions: 50o, 70o, 90o. Three samples of subjects are selected, one sample for each treatment condition. The subjects are taught some basic material and given a short test. Treatment 1 50o 0 1 3 1 0 X=1 Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 X=4 X=1 Factor: temperature Levels: 50, 70, 90 Research Designs Why don’t we just run three t-tests? compare 50o sample and 70o sample compare 50o sample and 90o sample compare 70o sample and 90o sample Type I Error and Multiple Tests With an = .05, there is a 5 % risk of a Type I error. Thus, for every 20 hypothesis tests, you expect to make one Type I error. The more tests you do, the more risk there is of a Type I error. Testwise alpha level Experimentwise alpha level The alpha level you select for each individual hypothesis test. The total probability of a Type I error accumulated from all of the separate tests in the experiment. e.g. three t-tests, each at = .05 -> an experimentwise alpha of .14 The Logic of ANOVA ANOVA uses one test with one alpha level to evaluate all the mean differences, thereby avoiding the problem of an inflated experimentwise alpha level Ho: 1 = 2 = 3 HA: at least one population mean is different The Logic of ANOVA Treatment 1 50o 0 1 3 1 0 X1 = 1 Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 X2 = 4 X3 = 1 There is some variability among the treatment means Is it more than we might expect from chance alone? The Logic of ANOVA t statistic Random + effect variation t = obtained difference between sample means difference expected by chance F statistic F = obtained variance between sample means variance expected by chance Random variation The Logic of ANOVA 7 6 5 4 3 2 1 0 0 1 2 Treatment Group 3 4 The Logic of ANOVA F statistic F = obtained variance between sample means 7 variance expected by chance 6 5 4 3 2 1 F = Between-Group Variance Within-Group Variance (sb2) 0 0 1 (sw2) Let’s talk about how to calculate this first 2 Treatment Group 3 4 The Logic of ANOVA Treatment 1 50o 0 1 3 1 0 X1 = 1 Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 X2 = 4 X3 = 1 Within-Groups Variance SS within s df within 2 w A summary of how much each datapoint varies from its own treatment mean The Logic of ANOVA Treatment 1 50o Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 0 1 3 1 0 X1 = 1 X2 = 4 X3 = 1 Within Groups variance is our estimate of random variation -use the variance of the datapoints around their own treatment mean -Pooled variance! The Logic of ANOVA Treatment 1 50o 0 1 3 1 0 X1 = 1 SS1 = 6 Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 X2 = 4 SS2 = 6 X3 = 1 SS3 = 4 Pooled variance SS1 SS 2 SS 3 6 6 4 1.33 s df1 df 2 df 3 4 4 4 2 p This comes from the “withintreatment” variabilities The Logic of ANOVA Treatment 1 50o 0 1 3 1 0 X1 = 1 SS1 = 6 Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 X2 = 4 SS2 = 6 X3 = 1 SS3 = 4 Pooled variance we now call Within-Groups variance SSi SS1 SS 2 SS 3 s df1 df 2 df 3 N k 2 w SS within df within 16 1.33 12 The Logic of ANOVA Treatment 1 50o 0 1 3 1 0 X1 = 1 Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 X2 = 4 X3 = 1 Within-Groups Variance SS within s df within 2 w A summary of how much each datapoint varies from its own treatment mean The Logic of ANOVA Treatment 1 50o 0 1 3 1 0 X1 = 1 Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 X3 = 1 X2 = 4 Now let’s calculate this F statistic F = Between-Groups Variance Within-Groups Variance (sb2) (sw2) The Logic of ANOVA Treatment 1 50o 0 1 3 1 0 X1 = 1 Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 X2 = 4 X3 = 1 Between-Group Variance SS between s df between 2 b A summary of how much each treatment mean varies from the grand mean The Logic of ANOVA Treatment 1 50o 0 1 3 1 0 X1 = 1 Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 X2 = 4 X3 = 1 So what is the variance between our treatment means? SS between s df between 2 b This is the “between-treatment” variability The Logic of ANOVA Treatment 1 50o Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 0 1 3 1 0 X1 = 1 We need to calculate SSbetween X2 = 4 X3 = 1 The Logic of ANOVA Treatment 1 50o 0 1 3 1 0 Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 X1 = 1 X2 = 4 X3 = 1 ( X1 X ) 1 ( X1 X ) 2 ( X1 X ) 1 SSbetween X 2 First look at the deviation between each treatment mean and the “grand mean”. The Logic of ANOVA Treatment 1 50o 0 1 3 1 0 Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 X1 = 1 X2 = 4 X3 = 1 ( X1 X ) 1 ( X1 X ) 2 ( X1 X ) 1 ( X1 X )2 1 ( X1 X )2 4 ( X1 X )2 1 SSbetween X 2 We need to square those deviations The Logic of ANOVA Treatment 1 50o 0 1 3 1 0 Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 X1 = 1 X2 = 4 X3 = 1 ( X1 X ) 1 ( X1 X ) 2 ( X1 X ) 1 ( X1 X )2 1 ( X1 X )2 4 ( X1 X )2 1 n2 ( X 2 X )2 20 n3 ( X 3 X ) 2 5 n1 ( X1 X ) 2 5 SSbetween X 2 And weight them by the sample size of each treatment group The Logic of ANOVA Treatment 1 50o 0 1 3 1 0 Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 X1 = 1 X2 = 4 X3 = 1 ( X1 X ) 1 ( X1 X ) 2 ( X1 X ) 1 ( X1 X )2 1 ( X1 X )2 4 ( X1 X )2 1 n2 ( X 2 X )2 20 n3 ( X 3 X ) 2 5 n1 ( X1 X ) 2 5 SSbetween X 2 SSbetween = 30 Now add them up to get SSbetween The Logic of ANOVA Treatment 1 50o 0 1 3 1 0 X1 = 1 n1 ( X1 X ) 2 5 Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 X2 = 4 n2 ( X 2 X )2 20 X3 = 1 n3 ( X 3 X ) 2 5 X 2 SSbetween = 30 Between-Groups Variance SS between s df between 2 b 2 n ( X X ) i i df between But what is the dfb? The Logic of ANOVA Treatment 1 50o 0 1 3 1 0 X1 = 1 n1 ( X1 X ) 2 5 Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 X2 = 4 n2 ( X 2 X )2 20 X3 = 1 n3 ( X 3 X ) 2 5 Between-Groups Variance sb2 2 n ( X X ) i i k 1 SSbetween 30 15 3 1 df between X 2 SSbetween = 30 The Logic of ANOVA Treatment 1 50o 0 1 3 1 0 X1 = 1 Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 X2 = 4 X3 = 1 Between-Group Variance SS between s df between 2 b A summary of how much each treatment mean varies from the grand mean The Logic of ANOVA Treatment 1 50o Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 0 1 3 1 0 X1 = 1 X2 = 4 X3 = 1 So what is our test statistic? F statistic 2 F = Between-Groups Variance (sb ) Within-Groups Variance (sw2) 15 11.3 1.33 The Logic of ANOVA Treatment 1 50o Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 0 1 3 1 0 X1 = 1 X2 = 4 What are its degrees of freedom? There are two! One for the numerator (dfbetween) One for the denominator (dfwithin) X3 = 1 The Logic of ANOVA F statistic F = obtained variance between sample means variance expected by chance F = between-treatment variance within-treatment variance F = sb2 sb2 sw2 sw2 SS between df between SS within df within These are the two dfs for the F test Step by Step Step 1: Calculate each treatment mean and the grand mean Step 2: Calculate the SSw, the dfw, and the sw2 Step 3: Calculate the SSb, the dfb, and the sb2 Step 4: Calculate F Step 5: Compare to critical values of F (using dfs and the F table) ANOVA table Source Between treatments Within treatments Total df s2 SSbetween df between SS b df b SS within df within SS w df w SS F sb2 sw2 Computations Treatment 1 50o 0 1 3 1 0 X1 = 1 Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 X2 = 4 X 2 X3 = 1 Step 1: Calculate each treatment mean and the grand mean Computations Treatment 1 50o 0 1 3 1 0 X1 = 1 SS1 = 6 Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 X3 = 1 SS3 = 4 X2 = 4 SS2 = 6 Step 2: Calculate the SSw, the dfw, and the sw2 SS w SS i s df w N k 2 w 664 1.33 15 3 X 2 Computations Treatment 1 50o Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 0 1 3 1 0 X1 = 1 n1 ( X1 X ) 2 5 X3 = 1 X2 = 4 n3 ( X 3 X ) 2 5 n2 ( X 2 X )2 20 Step 3: Calculate the SSb, the dfb, and the sb2 SSb s df b 2 b n (X i i X) k 1 2 5 20 5 15 3 1 X 2 Computations Treatment 1 50o Treatment 2 70o Treatment 3 90o 4 3 6 3 4 1 2 2 0 0 0 1 3 1 0 X1 = 1 X2 = 4 Step 4: Calculate F F2,12 sw2 2 sb 15 11.28 1.33 X3 = 1 X 2 ANOVA table Source SS df s2 F Between treatments Within treatments 30 2 15 11.28 16 12 1.33 Total ANOVA table Source Between treatments Within treatments Total df s2 SSbetween df between SS b df b SS within df within SS w df w SStotal df total SS F sb2 sw2 ANOVA table Treatment 1 50o 0 1 3 1 0 Treatment 2 70o 4 3 6 3 4 X1 = 1 Treatment 3 90o 1 2 2 0 0 SStotal = 46 SSbetween = 30 SSwithin = 16 X3 = 1 X2 = 4 Source SS df s2 F Between treatments Within treatments Total 30 2 15 11.28 16 12 1.33 46 14 “Partitioning of Variance” Total = Between + Within X ij X ( X i X ) ( X ij X i ) 7 6 5 4 3 2 1 0 0 1 2 Treatment Group 3 4 “Partitioning of Variance” Treatment 1 50o Treatment 2 70o Treatment 3 90o 0 1 3 1 0 4 3 6 3 4 1 2 2 0 0 X1 = 1 X2 = 4 X3 = 1 X 2 7 6 5 4 6 = 2 + (2) + (2) 3 2 X 1 0 0 1 2 Treatment Group 3 4 (Xi X ) ( X ij X i ) “Partitioning of Variance” Treatment 2 70o Treatment 1 50o 0 = 2 + (-1) + (-1) 1= 2 + (-1) + (0) 3= 2 + (-1) + (2) 1= 2 + (-1) + (0) 0= 2 + (-1) + (-1) X1 = 1 4 = 2 + (2) + (0) 3 = 2 + (2) + (-1) 6 = 2 + (2) + (2) 3 = 2 + (2) + (1) 4 = 2 + (2) + (0) X2 = 4 Treatment 3 90o 1 = 2 + (-1) + (0) 2 = 2 + (-1) + (1) 2 = 2 + (-1) + (1) 0 = 2 + (-1) + (-1) 0 = 2 + (-1) + (-1) X3 = 1 X 2 SSb sum of squared purples n( X i X ) 2 SSw sum of squared blues ( X ij X i )2 SSt sum of squared total deviations ( X ij X ) 2 ANOVA formulas Source SS df s2 F (MS) Between treatments n (X Within treatments Total ( X i ij (X ij k 1 SS b df b X i )2 N k SS w df w X )2 N 1 i X) 2 sb2 sw2 X ij an individual observatio n k the number of groups X i the mean of the ith group ni the number of subjects in the ith group X the grand mean N the number of subjects total The F Distribution F statistics • Because F-ratios are computed from two variances, F values will always be positive numbers. • When Ho is true, the numerator and denominator of the Fratio are measuring the same variance. In this case the two sample variances should be about the same size, so the ratio should be near 1. In other words, the distribution of F-ratios should pile up around 1.00. The F Table F Distribution – Positively skewed distribution: Concentration of values near 1, no values smaller than 0 are possible – Like t, F is a family of curves. Need to use two degrees of freedom (for sb2 and sw2). The F Table For an alpha of .05, the critical value of F with degrees of freedom 2,12 is Our F = 11.28 reject null = 3.88 Assumptions 1. The samples are independent simple random samples 2. The populations are normal 3. The populations have equal variance Exercise The data depicted below were obtained from an experiment designed to measure the effectiveness of three pain relievers (A, B, and C). A fourth group that received a placebo was also tested. Placebo Drug A Drug B Drug C 0 0 3 0 1 2 3 4 5 8 5 5 Is there any evidence for a significant difference between groups? Exercise Source Between treatments Within treatments Total Placebo Drug A Drug B Drug C 0 0 3 0 1 2 3 4 5 8 5 5 SS df MS Exercise Placebo Drug A Drug B Drug C 0 0 3 0 1 2 3 4 5 8 5 5 Source SS df MS Between treatments Within treatments 54 3 18 16 8 2 Total 70 11 F=9 F3,8* = 4.07, so reject null Comparisons of Specific Means The technique to use depends on the nature of the research No a priori Predictions Yes a priori Predictions If we do not have strong predictions about how the means will differ: If we do have strong predictions about how the means will differ: 1) Run an ANOVA 1) Run contrast tests on the expected differences 2) If ANOVA is significant, run post-hoc “multiple comparison” tests (e.g. LSD, Bonferroni, Tukey’s) conservative! Comparing F and t What is the relationship between F and t? Placebo Drug C 0 0 3 8 5 5 1. Compute an independent-samples t statistic for this data t = -3.54 2. Compute an F statistic for this data F = 12.53 Comparing F and t What is the relationship between F and t? F = t2