Simple Comparative Experiments – Introduction to ANOVA

Read Sections 2.1, 3.1 – 3.3 in the text.

Note: These notes were modified from lecture notes created by Tisha Hooks and Christopher Malone.

Example: Let’s consider an experiment that was conducted to study the bond strength of a Portland cement mortar. The data points given below were sampled from the Tension Bond Strength experiment discussed on pages 25 – 26 in your text. Enter these points in Excel as shown below:

Also, I’ve included the data points on the number line below.

Questions:

1. Does the modified formula seem to improve tension bond strength? Discuss.

2. What is the factor under consideration? What is the response under consideration?

3. Compute the average for each group and sketch the averages on the plot above.
   Average for modified: ________________
   Average for unmodified: ________________

4. Does there appear to be a considerable difference in tension bond strength between the two groups in this experiment? Discuss.

Measuring the effect of a factor

Ultimately, we are interested in determining whether the factor in the experiment has an effect on the response variable. For the above example in particular, we’re interested in whether the type of cement mortar (modified or unmodified) has any impact on tension bond strength.

To investigate whether the type of mortar has an impact on tension bond strength, let’s first look at the scenario assuming group (type of cement mortar) has _____ effect on tension bond strength. If that is the case, we would not expect there to be a difference between the two groups; that is, we could ignore group altogether. This scenario is depicted by the number line below.

Questions:

5. Calculate the average of these points and plot it on the number line above. Note: this is our best guess for the tension bond strength under the assumption that group has no effect.
Average of all observations: _________________

In statistics we use the concept of ________________ to determine which factor(s) are significant. For example, if the amount of error is __________________ significantly when we consider the groups, then we can say the groups are important and that the factor under consideration has some effect on the response variable. On the other hand, if the amount of error is not reduced much by considering the factor, then the factor is said to be statistically ___important.

6. Sketch the amount of “error” for each observation on the above number line (ignoring groups).
   Error observation #1: _________________________________ = __________________
   Error observation #2: _________________________________ = __________________
   Error observation #3: _________________________________ = __________________
   Error observation #4: _________________________________ = __________________

7. What is the total amount of “error”?

8. What problems exist with computing the total amount of “error” in this manner?

9. How might we overcome this problem?

Computing the total amount of error assuming group has NO effect

10. The total amount of error assuming group has NO effect = ____________________________.

Computing the total amount of error assuming group HAS an effect

Next, let’s suppose that the factor __________ have an effect on tension bond strength. In this case, we WOULD expect there to be differences between the two groups; that is, we should ______ ignore group!

Fill in the following information ASSUMING that the factor IS important.

11. What is the total amount of error assuming group HAS an effect?

12. What do we gain by considering group? That is, how much is the total amount of error reduced after we account for the group effect?
Difference: ________________________________

Recall that if the amount of error is reduced significantly when we consider the groups, then we say the groups are important and the factor under consideration has a significant effect. On the other hand, if the amount of error is not reduced much by considering the factor, then the factor is said to be statistically unimportant.

13. Consider the difference found in Question 12 above. Is this large enough to say that group (type of cement mortar) has a statistically significant effect?

To answer this question, we will use the statistical procedure known as Analysis of Variance (ANOVA). We’ll now take a look at the ANOVA procedure.

Analysis of Variance (ANOVA)

The analysis of variance procedure is derived from partitioning the total variability into two components. Suppose that _____ is the jth observation taken under factor level i. In general, we have _____ factor levels and _____ observations under the ith treatment.

Label the observations from the cement mortar example using this notation.

Partitioning the Sums of Squares

Though we haven’t used the terminology or seen the formulas given below, we have already calculated each of these ____________________________ for the cement mortar example.

Total Sum of Squares
  o SST =
  o This was calculated based on the assumption that groups had _____ effect.

Error Sum of Squares
  o SSE =
  o This was calculated based on the assumption that groups ______ an effect.

Treatment Sum of Squares
  o SSTrt =
  o This represents the _________________ in our measure of experimental error after accounting for groups.

Note that ________________________________. That is, we have partitioned the _______________ Sum of Squares into two parts: the variation _____________ factor level means (between treatments) and the variation ____________________ treatments, i.e., due to experimental error.
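The partition described above can be checked numerically. The following is a minimal Python sketch, not part of the original notes. The four data values are hypothetical (chosen so the arithmetic is easy to follow) and are NOT the points from the handout; the variable names are also just for illustration.

```python
# Hypothetical tension bond strengths, two observations per group.
# These are NOT the values from the handout.
groups = {
    "modified": [16.0, 17.0],
    "unmodified": [17.0, 18.0],
}


def mean(values):
    """Arithmetic average of a list of numbers."""
    return sum(values) / len(values)


# Pool every observation, ignoring group, and find the overall average.
all_obs = [y for ys in groups.values() for y in ys]
grand_mean = mean(all_obs)

# Total sum of squares: squared errors around the overall average
# (the "group has no effect" scenario).
sst = sum((y - grand_mean) ** 2 for y in all_obs)

# Error sum of squares: squared errors around each observation's own
# group average (the "group has an effect" scenario).
sse = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)

# Treatment sum of squares: squared distance of each group average from
# the overall average, counted once per observation in that group.
sstrt = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())

print(sst, sse, sstrt)  # prints: 2.0 1.0 1.0
```

For these hypothetical values, the reduction in error from considering group (SST minus SSE) equals the treatment sum of squares. Replacing the lists with the points from the worksheet should reproduce the hand calculations in Questions 10 through 12.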
Degrees of Freedom

The degrees of freedom (df) associated with each sum of squares can be regarded as the number of ______________ elements in the sum of squares. The df for each sum of squares is given below.

Total Sum of Squares
  o
  o df =

Error Sum of Squares
  o
  o df =

Treatment Sum of Squares
  o
  o df =

Mean Squares

These are obtained by dividing each sum of squares by its associated degrees of freedom. Therefore, we get:

Treatment Mean Square
  o MSTrt =

Mean Square Error
  o MSE =

F-Statistic

F =

Questions:

14. What does it mean if the F-statistic is large?

15. What does it mean if the F-statistic is small, say close to 1?

p-value

Recall, if the p-value is less than some predetermined error rate, usually ______, then the data are said to support the alternative hypothesis (i.e., there is statistically significant evidence for the research question).

H0:

Ha:

Carrying out the ANOVA in Minitab

Enter the data into a new worksheet as follows.

If you want to construct a dotplot of the data, select Graph > Dotplot… To make a dotplot with groups, choose the following, and then enter the information as shown.

Click OK and you should get the following dotplot.

To obtain the ANOVA, choose Stat > ANOVA > General Linear Model. Next, enter the information as shown below.

After clicking OK, you should get the following output.

Questions:

16. Find the sums of squares in the output.
    a. SST =
    b. SSE =
    c. SSTrt =

17. Find the degrees of freedom in the output.
    a. df Total =
    b. df Error =
    c. df Treatment =

18. Find the mean squares in the output.
    a. MSE =
    b. MSTrt =

19. Find the F-statistic in the output.

20. Find the p-value in the output.

21. What do we conclude from this study?

We’ve just been using a sample of the data collected from this study. Let’s take a look at the analysis using the complete data set. The data can be found on the course website in the file cement_mortar.mpj.

Questions:

22. Create a dotplot of the data by group.
Looking at the dotplot, does there appear to be a difference in tension bond strength based upon the type of cement mortar used? Explain your reasoning.

23. Carry out the ANOVA to determine if there is an effect of cement mortar on tension bond strength.
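As a cross-check on the Minitab output, the one-way ANOVA quantities can also be computed directly. The sketch below is an illustration in Python and is not part of the original notes. The function name `one_way_anova` and the data values are hypothetical (the values are NOT those in cement_mortar.mpj). It assumes the standard one-way layout, with treatment df equal to the number of factor levels minus one and error df equal to the total number of observations minus the number of factor levels.

```python
def one_way_anova(groups):
    """Return one-way ANOVA quantities for a dict of group name -> observations."""
    all_obs = [y for ys in groups.values() for y in ys]
    n_total = len(all_obs)      # total number of observations
    n_levels = len(groups)      # number of factor levels
    grand_mean = sum(all_obs) / n_total

    ss_trt = 0.0
    ss_error = 0.0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        # Between-treatment variation: group average vs. overall average.
        ss_trt += len(ys) * (group_mean - grand_mean) ** 2
        # Within-treatment variation: observations vs. their group average.
        ss_error += sum((y - group_mean) ** 2 for y in ys)

    df_trt = n_levels - 1
    df_error = n_total - n_levels
    ms_trt = ss_trt / df_trt
    ms_error = ss_error / df_error

    return {
        "SSTrt": ss_trt,
        "SSE": ss_error,
        "SST": ss_trt + ss_error,
        "df_trt": df_trt,
        "df_error": df_error,
        "MSTrt": ms_trt,
        "MSE": ms_error,
        "F": ms_trt / ms_error,
    }


# Hypothetical data for illustration only; NOT the values from the course file.
hypothetical = {
    "modified": [16.0, 17.0],
    "unmodified": [17.0, 18.0],
}
table = one_way_anova(hypothetical)
print(table["F"])  # prints: 2.0
```

The p-value is the upper-tail area beyond F in an F distribution with (df_trt, df_error) degrees of freedom. If SciPy happens to be installed, `scipy.stats.f.sf(table["F"], table["df_trt"], table["df_error"])` should return it; that call is left out of the sketch so the code runs with the standard library alone.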