Analysis of Variance
Introduction

Analysis of Variance
- The Analysis of Variance is abbreviated as ANOVA.
- Used for hypothesis testing in:
  - Simple Regression
  - Multiple Regression
  - Comparison of Means

Sources
- There is variation anytime that all of the data values are not identical.
- This variation can come from different sources, such as the model or the factor.
- There is always left-over variation that can’t be explained by any of the other sources. This source is called the error.

Variation
- Variation is the sum of squares of the deviations of the values from the mean of those values.
- As long as the values are not identical, there will be variation.
- Abbreviated as SS, for Sum of Squares.

Degrees of Freedom
- The degrees of freedom are the number of values that are free to vary once certain parameters have been established.
- Usually, this is one less than the sample size, but in general, it’s the number of values minus the number of parameters being estimated.
- Abbreviated as df.

Variance
- The sample variance is the average squared deviation from the mean.
- Found by dividing the variation by the degrees of freedom: Variance = Variation / df
- Abbreviated as MS, for Mean Square (the mean of the squares): MS = SS / df

F
- F is the F test statistic.
- There will be an F test statistic for each source except for the error and total.
- F is the ratio of two sample variances.
- The MS column contains variances.
- The F test statistic for each source is the MS for that row divided by the MS of the error row.
- F requires a pair of degrees of freedom, one for the numerator and one for the denominator.
  - The numerator df is the df for the source.
  - The denominator df is the df for the error row.
- The F test is always a right-tail test.

The ANOVA Table
- The ANOVA table is composed of rows; each row represents one source of variation.
- For each source of variation:
  - The variation is in the SS column.
  - The degrees of freedom are in the df column.
  - The variance is in the MS column.
- The MS value is found by dividing the SS by the df.
- The complete ANOVA table can be generated by most statistical packages and spreadsheets.
- We’ll concentrate on understanding how the table works rather than the formulas for the variations.

Source       SS (variation)   df   MS (variance)   F
Explained*
Error
Total

* The explained variation has different names depending on the particular type of ANOVA problem.

Example 1
The Sum of Squares and Degrees of Freedom are given. Complete the table.

Source      SS     df   MS   F
Explained   18.9    3
Error       72.0   16
Total

Example 1 – Find Totals
Add the SS and df columns to get the totals.

Source      SS     df   MS   F
Explained   18.9    3
Error       72.0   16
Total       90.9   19

Example 1 – Find MS
Divide SS by df to get MS.

Source      SS     df   MS
Explained   18.9    3   18.9 ÷ 3 = 6.30
Error       72.0   16   72.0 ÷ 16 = 4.50
Total       90.9   19   90.9 ÷ 19 = 4.78

Example 1 – Find F
F = MS(Explained) / MS(Error) = 6.30 / 4.50 = 1.40

Source      SS     df   MS     F
Explained   18.9    3   6.30   1.40
Error       72.0   16   4.50
Total       90.9   19   4.78

Notes about the ANOVA
- The MS(Total) isn’t actually part of the ANOVA table, but it represents the sample variance of the response variable, so it’s useful to find.
- The total df is one less than the sample size.
- You would need to find either a critical F value or the p-value to finish the hypothesis test.

Example 2
Complete the table.

Source      SS      df   MS      F
Explained   106.6        21.32   2.60
Error               26
Total

Example 2 – Step 1
SS / df = MS, so 106.6 / df = 21.32. Solving for df gives df = 5.
F = MS(Source) / MS(Error), so 2.60 = 21.32 / MS(Error). Solving gives MS(Error) = 8.20.

Source      SS      df   MS      F
Explained   106.6    5   21.32   2.60
Error               26    8.20
Total

Example 2 – Step 2
SS / df = MS, so SS / 26 = 8.20. Solving for SS gives SS = 213.2.
The total df is the sum of the other df, so 5 + 26 = 31.

Source      SS      df   MS      F
Explained   106.6    5   21.32   2.60
Error       213.2   26    8.20
Total               31

Example 2 – Step 3
Find the total SS by adding the other SS values: 106.6 + 213.2 = 319.8.

Source      SS      df   MS      F
Explained   106.6    5   21.32   2.60
Error       213.2   26    8.20
Total       319.8   31

Example 2 – Step 4
Find the MS(Total) by dividing SS by df: 319.8 / 31 = 10.32.

Source      SS      df   MS      F
Explained   106.6    5   21.32   2.60
Error       213.2   26    8.20
Total       319.8   31   10.32

Example 2 – Notes
- Since the total df is 31, the sample size was 32.
- Since the sample variance was 10.32 and the standard deviation is the square root of the variance, the sample standard deviation is 3.21.

Example 3
The sample size is n = 20. Work this one out on your own!

Source      SS     df   MS      F
Explained   56.7
Error              14   13.50
Total

Example 3 – Solution
How did you do?

Source      SS      df   MS      F
Explained    56.7    5   11.34   0.84
Error       189.0   14   13.50
Total       245.7   19   12.93
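
The arithmetic used in the examples (add the SS and df columns for the totals, divide SS by df for MS, divide MS by MS(Error) for F) can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `complete_anova_table` and the dictionary layout are made up for this sketch and do not come from any statistical package. It stops at F and does not find the critical value or p-value.

```python
def complete_anova_table(ss_explained, df_explained, ss_error, df_error):
    """Fill in an ANOVA table from the SS and df of the explained and error rows.

    Hypothetical helper for illustration only; not part of any library.
    """
    # Totals: add the SS and df columns.
    ss_total = ss_explained + ss_error
    df_total = df_explained + df_error

    # MS = SS / df for each row.
    ms_explained = ss_explained / df_explained
    ms_error = ss_error / df_error
    # MS(Total) is not part of the table proper, but it is the
    # sample variance of the response variable.
    ms_total = ss_total / df_total

    # F = MS(source) / MS(error); only the explained row gets an F.
    f_stat = ms_explained / ms_error

    return {
        "Explained": {"SS": ss_explained, "df": df_explained,
                      "MS": ms_explained, "F": f_stat},
        "Error": {"SS": ss_error, "df": df_error, "MS": ms_error},
        "Total": {"SS": ss_total, "df": df_total, "MS": ms_total},
    }


# Example 1 from the slides: SS and df are given for both rows.
table = complete_anova_table(18.9, 3, 72.0, 16)
for source, row in table.items():
    print(source, {name: round(value, 2) for name, value in row.items()})
```

Rounded to two decimal places, the values for Example 1 agree with the completed table: MS of 6.30 and 4.50, F of 1.40, and MS(Total) of 4.78.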