Analysis of Variance Notes

Contents
- Experiments versus Studies
- Types of Experiments
- Assumptions & Assumption Checks
- Types of Analysis
- NCSS

1. Experiments versus Studies

1.1 Terminology
- Factors versus Independent Variables
  Example: hours studied and major are two factors affecting Grade
- Treatments
  Example: specific combinations of hours studied and teaching method

1.2 Purpose
- Observational Study
  o Correlational
  o Observe values of X
- Experiment
  o Cause-effect
  o Control values of X
- Designs
  o Balanced
  o Unbalanced

2. Types of Experimental Designs

2.1 Completely Randomized Design
- One factor
- Two factor

2.2 Randomized Block Design

2.3 Examples
- Teaching method only
- Teaching method and hours studied
- Teaching method within major

3. Assumptions & Assumption Checks

3.1 Assumptions
- Same variance
- Independence
- Normality

3.2 Assumption checks
- Modified Levene test – comparing absolute differences from the group center (median)
- Normality tests and box plots

4. Analysis
- Sources of variability and degrees of freedom
- Tests of effects of
  o One-factor designs
  o Each factor in two-factor designs
  o Combination (interaction) effects in two-factor designs
- Tests of assumptions
- Tests and estimation of differences in averages

4.1 Sources of variability and degrees of freedom
- Total: values around the overall average; divisor of (n - 1)
- Factor: variation of the factor averages; divisor of (# of averages - 1)
- Interaction: combination effects; divisor of (product of the factor divisors)
- Error: randomness; divisor of (n - # of averages or combinations of averages)

4.2 Tests of effects

4.2.1 One Factor – Completely Randomized Design or Independent-Sample Study

4.2.1.1 Test Template:
Null hypothesis: the average value of Y is the same for all levels of the factor
Alternative: at least two are different
Test statistic: compares the variation of the factor averages to the variation of random data (among-group variation to within-group variation)
Rejection region: the ratio above (the F ratio) is large, i.e., F > F table value; there are two degrees of freedom, the numerator degrees of freedom and the denominator degrees of freedom
Conclusion: We can (not) say the average value of Y differs for at least two levels of the factor.

4.2.1.2 Example:
Y = tensile strength of a product
Factor = 4 suppliers
Obtain samples of size 5 from each supplier (n = ____ )
MSA = sample factor variability = 21.095
MSW = sample error variability = 6.094
Null hypothesis: μ1 = μ2 = μ3 = μ4 (the average value of ______________ is the same for all ________________)
Alternative: at least two are different
Test statistic: MSA/MSW = ________
Rejection region: Reject Ho if F > F table value with numerator degrees of freedom = ______ and denominator d.f. = _____ ; F table value = _______
Conclusion: We can (not) say that the average _________________ differs for at least two ________________________

4.2.2 One Factor – Randomized Block

4.2.2.1 Test Template: same as in 4.2.1.1, but the denominator degrees of freedom = (# of factor means - 1) × (# of block means - 1)

4.2.2.2 Example:
Y = rating of a restaurant's service
Factor = 4 restaurants
Block = all restaurants reviewed by the same 6 raters (n = ____ )
MSA = sample factor variability = 595.8
MSE = sample error variability = 14.986
Null hypothesis: μ1 = μ2 = μ3 = μ4 (the average value of ______________ is the same for all ________________)
Alternative: at least two are different
Test statistic: MSA/MSE = ________
Rejection region: Reject Ho if F > F table value with numerator degrees of freedom = ______ and denominator d.f. = _____ ; F table value = _______
Conclusion: We can (not) say that the average _________________ differs for at least two ________________________
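The F-ratio mechanics in 4.2.1 and 4.2.2 can also be checked outside NCSS. Below is a minimal Python sketch, assuming the supplier example's values from 4.2.1.2 (MSA = 21.095, MSW = 6.094, four groups of five observations); scipy is used only to look up the F table value and p-value.

```python
from scipy import stats

# One-factor completely randomized design (supplier example, 4.2.1.2)
MSA = 21.095           # sample factor (among-group) variability
MSW = 6.094            # sample error (within-group) variability
k, n_per_group = 4, 5  # 4 suppliers, samples of size 5
n = k * n_per_group

F = MSA / MSW                    # among-group variation vs. within-group variation
df_num = k - 1                   # numerator d.f. = # of factor averages - 1
df_den = n - k                   # denominator d.f. = n - # of averages
F_table = stats.f.ppf(0.95, df_num, df_den)   # F table value at alpha = 0.05
p_value = stats.f.sf(F, df_num, df_den)

print(f"F = {F:.3f}, F table = {F_table:.3f}, p = {p_value:.4f}")
# Reject Ho (the average differs for at least two suppliers) when F > F table.
```

For the randomized block design in 4.2.2, only the degrees of freedom change: the denominator d.f. becomes (# of factor means - 1) × (# of block means - 1).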
4.2.3 Two Factors – Interaction or Combination Effects

4.2.3.1 Test Template:
Null hypothesis: (no interaction) the difference in the average value of Y between any two levels of factor one does not depend on the level of factor two
Alternative: (interaction) the difference in the average value of Y between any two levels of factor one does depend on the level of factor two
Test statistic: compares the variation of the interaction to the variation of random data (among-group variation to within-group variation)
Rejection region: the ratio above (the F ratio) is large, i.e., F > F table value; there are two degrees of freedom: numerator d.f. = product of the factor 1 and factor 2 d.f.; denominator d.f. = n - number of combinations of factor 1 and factor 2
Conclusion: we can (not) say that the difference in the average value of Y between any two levels of factor one does depend on the level of factor two.

4.2.3.2 Example:
Y = length of a ball bearing's life
Factor 1 = heat treatment (high or low)
Factor 2 = ring osculation (high or low)
Obtain samples of size 2 from each combination (n = ____ )
MSAB = sample interaction variability = 3280.5
MSE = sample error variability = 61
Null hypothesis: (no interaction) the difference in the average value of _______________ between any two levels of ________________ does not depend on the level of ___________________
Alternative: (interaction) the difference in the average value of _______________ between any two levels of ________________ does depend on the level of ___________________
Test statistic: F = MSAB/MSE = ________
Rejection region: Reject Ho if the F ratio is large, i.e., F > F table value; numerator d.f. = product of the factor 1 and factor 2 d.f. = ______ ; denominator d.f. = n - number of combinations of factor 1 and factor 2 = ______
Conclusion: we can (not) say that the difference in the average value of _____________________ between any two levels of ____________________ does depend on the level of _______________.

4.2.4 One of the Two Factors – Completely Randomized Design or Independent-Sample Study – NO SIGNIFICANT INTERACTION

4.2.4.1 Test Template: same as the one-factor test, but the denominator d.f. = n - (# of levels of factor 1) × (# of levels of factor 2)

4.2.4.2 Example:
Y = rating of a photographic plate
Factor A = 2 levels of development strength
Factor B = 2 levels of development time (10 and 14 minutes)
Randomly assign 4 plates to each of the 4 combinations
MSA = sample variability of factor A (strength) = 1.5625
MSB = sample variability of factor B (time) = 56.5625
MSE = sample error variability = 2.229
(no interaction was found – testing the time effect)
Null hypothesis: μ1 = μ2 (the average value of ______________ is the same for all ________________)
Alternative: at least two are different
Test statistic: MSB/MSE = ________
Rejection region: Reject Ho if F > F table value with numerator degrees of freedom = ______ and denominator d.f. = _____ ; F table value = _______
Conclusion: We can (not) say that the average _________________ differs for at least two ________________________

4.3 Tests of Assumptions

4.3.1 Equal Variance

4.3.1.1 Test Template:
Null hypothesis: the variation of Y is the same for all levels of the factor
Alternative: at least two are different
Compute the absolute difference between each value in a group and the median of that group
Test statistic and rejection region: same as for the factor tests, applied to the absolute differences
Conclusion: We can (not) say the variation of Y differs for at least two levels of the factor.
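A minimal sketch of the modified Levene check in Python, assuming made-up tensile-strength samples (they are not the course data): scipy's levene with center='median' runs the comparison of absolute differences from each group's median that the template above describes.

```python
from scipy import stats

# Hypothetical tensile-strength samples, for illustration only
supplier_1 = [19.1, 20.3, 18.7, 21.0, 18.5]
supplier_2 = [24.8, 23.9, 25.6, 22.7, 24.2]
supplier_3 = [22.5, 23.1, 21.8, 23.4, 22.9]
supplier_4 = [20.9, 21.5, 22.0, 20.2, 21.7]

# Modified Levene: an ANOVA-style comparison of |value - group median|
stat, p_value = stats.levene(supplier_1, supplier_2, supplier_3, supplier_4,
                             center='median')
print(f"F = {stat:.3f}, p = {p_value:.4f}")
# p < alpha -> we can say the variation of Y differs for at least two suppliers
```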
4.3.1.2 Example:
Y = tensile strength of a product
Factor = 4 suppliers
Obtain samples of size 5 from each supplier (n = ____ )
MSDiff = sample factor variability of the absolute differences = 0.59
MSE = sample error variability = 2.2853
Null hypothesis: σ1 = σ2 = σ3 = σ4 (the variability of __________ is the same for all ________________)
Alternative: at least two are different
Test statistic: MSDiff/MSE = ________
Rejection region: Reject Ho if F > F table value with numerator degrees of freedom = ______ and denominator d.f. = _____ ; F table value = _______
Conclusion: We can (not) say that the variability of _________________ differs for at least two ________________________

4.3.2 Normality

4.3.2.1 Test Template:
Null hypothesis: the distribution of Y is normal for all levels of the factor
Alternative: at least one is not normal
Test statistic and rejection region: use the normality tests in NCSS; if the p-value is less than alpha, reject normality.
Conclusion: We can (not) say the distribution of Y is not normal for at least one level of the factor.

4.3.2.2 Example:
Y = tensile strength of a product
Factor = 4 suppliers
Obtain samples of size 5 from each supplier (n = ____ )

Assumption Test                     Prob Level
Skewness Normality of Residuals     0.605780
Kurtosis Normality of Residuals     0.548522
Omnibus Normality of Residuals      0.731126

Null hypothesis: the distribution of __________ is normally distributed for all ________________
Alternative: the distribution of __________ is non-normally distributed for at least one level of ________________
Test statistic: p-value
Rejection region: p-value < alpha
Conclusion: We can (not) say that the distribution of __________ is non-normally distributed for at least one level of ________________

4.4 Testing the Difference in Means

4.4.1 Experimentwise error versus comparisonwise error

4.4.2 Testing one factor
Use NCSS. The output will tell you which means are statistically different.
Example: Y = tensile strength of a product; Factor = 4 suppliers; obtain samples of size 5 from each supplier

Tukey-Kramer Multiple-Comparison Test
Response: strength
Term A: supplier
Alpha = 0.050   Error Term = S(A)   DF = 16   MSE = 6.094   Critical Value = 4.046122

Group   Count   Mean    Different From Groups
1       5       19.52   2
4       5       21.16
3       5       22.84
2       5       24.26   1

Conclusions: We can say that the average value of (Y) _________ for (factor level) ________ differs from (factor level) ________. <Repeat for each difference> The averages of (Y) for the other (factor levels) ______________ are not significantly different.

4.4.3 The same procedure works for randomized block and two-factor studies without interaction.

4.5 Nonparametric Tests

4.5.1 Kruskal-Wallis Test
- One-factor designs
- Compares medians instead of means
- The test is similar to ANOVA but does not require normality
- Using NCSS: if the p-value < alpha, reject equality of the medians

4.5.2 Friedman's Test
- Randomized block designs
- Compares medians instead of means
- The test is similar to ANOVA but does not require normality
- Using NCSS: if the p-value < alpha, reject equality of the medians

5. NCSS

5.1 Data format: place all the values of Y in one column and let the next column(s) hold the values of the factor(s).
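The NCSS layout in 5.1 is the usual "long" format, and the same arrangement works elsewhere. A minimal Python sketch with hypothetical values: Y in one column, the factor in the next; grouping by the factor column then feeds the Kruskal-Wallis comparison of medians from 4.5.1.

```python
import pandas as pd
from scipy import stats

# Hypothetical data in the 5.1 layout: all Y values in one column, factor in the next
df = pd.DataFrame({
    "strength": [19.5, 20.1, 18.8, 24.0, 23.2, 24.9, 22.4, 23.0, 21.9],
    "supplier": [1, 1, 1, 2, 2, 2, 3, 3, 3],
})

# Group the Y values by factor level, then compare medians (Kruskal-Wallis, 4.5.1)
groups = [g["strength"].values for _, g in df.groupby("supplier")]
stat, p_value = stats.kruskal(*groups)
print(f"H = {stat:.3f}, p = {p_value:.4f}")
# p < alpha -> reject equality of the medians
```

Continuing from the same hypothetical data frame, a pairwise comparison of means along the lines of the Tukey-Kramer report in 4.4.2 can be sketched with statsmodels' Tukey HSD routine (which uses the Kramer adjustment when group sizes are unequal).

```python
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Pairwise comparison of the group means (compare the Tukey-Kramer report in 4.4.2)
result = pairwise_tukeyhsd(df["strength"], df["supplier"], alpha=0.05)
print(result)   # the 'reject' column flags which pairs of means differ
```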
5.2 Approach

5.2.1 One-factor designs
- Click on Analysis, ANOVA, One-Way ANOVA
- Choose the dependent variable and the factor
- In Reports, uncheck the EMS report and check the Tukey-Kramer test

5.2.2 Randomized block designs
- Click on Analysis, ANOVA, Analysis of Variance
- Choose
  o First, the dependent variable
  o Second, for factor 1, the block; choose Random from the Type list
  o Third, for factor 2, the factor of interest (fixed type)
- In Reports, uncheck the EMS report and check the Tukey-Kramer test

5.2.3 Two-factor designs
- Click on Analysis, ANOVA, Analysis of Variance
- Choose
  o First, the dependent variable
  o Second, factor 1, Type Fixed
  o Third, factor 2, Type Fixed
  o If an interaction exists, examine the test for the two-factor interaction before the individual factor tests
- In Reports, uncheck the EMS report and check the Tukey-Kramer test
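If NCSS is unavailable, the two-factor setup in 5.2.3 can be sketched in Python with statsmodels; the data frame below is hypothetical, and the formula y ~ C(a) * C(b) corresponds to two fixed factors plus their interaction.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical two-factor layout: Y in one column, the two factors in the next columns
df = pd.DataFrame({
    "y": [17, 26, 25, 85, 19, 21, 78, 80],
    "a": ["low", "low", "high", "high", "low", "low", "high", "high"],
    "b": ["low", "high", "low", "high", "low", "high", "low", "high"],
})

# Fixed factor a, fixed factor b, and their interaction (compare 4.2.3 and 5.2.3)
model = ols("y ~ C(a) * C(b)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # F and p-value for a, b, and a:b
# If the a:b (interaction) row is significant, interpret the combination effects
# before the individual factor effects (see 4.2.4).
```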