Author(s): Brenda Gunderson, Ph.D., 2011

License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution–Non-commercial–Share Alike 3.0 License: http://creativecommons.org/licenses/by-nc-sa/3.0/

We have reviewed this material in accordance with U.S. Copyright Law and have tried to maximize your ability to use, share, and adapt it. The citation key on the following slide provides information about how you may share and adapt this material.

Copyright holders of content included in this material should contact open.michigan@umich.edu with any questions, corrections, or clarification regarding the use of content.

For more information about how to cite these materials visit http://open.umich.edu/education/about/terms-of-use.

Any medical information in this material is intended to inform and educate and is not a tool for self-diagnosis or a replacement for medical evaluation, advice, diagnosis or treatment by a healthcare professional. Please speak to your physician if you have questions about your medical condition.

Viewer discretion is advised: Some medical content is graphic and may not be suitable for all viewers.

Some material may be sourced from: Mind on Statistics, Utts/Heckard, 3rd Edition, Duxbury, 2006. Text Only: ISBN 0495667161. Bundled version: ISBN 1111978301. Material from this publication used with permission.

Attribution Key (for more information see: http://open.umich.edu/wiki/AttributionPolicy)

Use + Share + Adapt: Content the copyright holder, author, or law permits you to use, share and adapt.
Public Domain – Government: Works that are produced by the U.S. Government. (17 USC § 105)
Public Domain – Expired: Works that are no longer protected due to an expired copyright term.
Public Domain – Self Dedicated: Works that a copyright holder has dedicated to the public domain.
Creative Commons – Zero Waiver
Creative Commons – Attribution License
Creative Commons – Attribution Share Alike License
Creative Commons – Attribution Noncommercial License
Creative Commons – Attribution Noncommercial Share Alike License
GNU – Free Documentation License

Make Your Own Assessment: Content Open.Michigan believes can be used, shared, and adapted because it is ineligible for copyright.
Public Domain – Ineligible: Works that are ineligible for copyright protection in the U.S. (17 USC § 102(b)) *laws in your jurisdiction may differ

Content Open.Michigan has used under a Fair Use determination:
Fair Use: Use of works that is determined to be Fair consistent with the U.S. Copyright Act. (17 USC § 107) *laws in your jurisdiction may differ

Our determination DOES NOT mean that all uses of this 3rd-party content are Fair Uses and we DO NOT guarantee that your use of the content is Fair. To use this content you should do your own independent analysis to determine whether or not your use will be Fair.

Module 8: One-Way Analysis of Variance (ANOVA)

Objective: In this module you will perform a one-way analysis of variance, often abbreviated ANOVA. We have already seen that the two independent samples t-test can be used to compare the means of two populations when the samples are independent. What if we want to compare the means of three or more populations? We turn to a technique called analysis of variance (ANOVA). You can think of ANOVA as an extension of the two independent samples pooled t-test, since it compares several population means and requires the assumption that the populations have equal variances.

Overview: Analysis of variance (ANOVA) is a statistical tool for analyzing how the mean value of a quantitative response (or dependent) variable is affected by one or more categorical variables, known as treatment variables or factors. We base our conclusions regarding the equality of the population means on the F test that ANOVA produces.
For example, we might administer a new antibiotic drug to a random sample of people. The response variable is white blood cell count, and the grouping variable or factor could be age with levels 1 = 0 to 19 years; 2 = 20 to 29 years; 3 = 30 to 39 years; and 4 = 40 years and older. We would then use the ANOVA method to see whether the mean white blood cell counts for the four age-group populations are all the same. In this example the number of populations under study is k = 4.

Another example is a study of painkillers for relief of headache pain. The response variable might be the time to relief, and the factor or treatment variable might be the type of painkiller. The levels of the type of painkiller might be a new drug, a standard drug, and a placebo. In this example the number of populations under study, or treatment groups, is k = 3.

Several assumptions are made in ANOVA. The response for each population is assumed to be normally distributed, with equal variance across the populations. The data are assumed to consist of independent random samples.

The analysis of variance involves decomposing the total variation of the responses into two parts: (1) that due to the variation among sample means (Between Groups variation), and (2) that due to natural variation within groups (variation due to Error).

$\text{SS Total} = \text{SS Groups} + \text{SS Error}$

If the sum of squares between groups (SS Groups) is large relative to the sum of squares within groups (SS Error), the model of different treatment means explains a significant portion of the observed variability. In this case, the null hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ (that the population means are all equal) might be rejected in favor of the alternative hypothesis $H_a$: at least one of the population means $\mu_i$ is different.
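The decomposition SS Total = SS Groups + SS Error can be checked numerically. The minimal Python sketch below uses three small made-up samples (hypothetical values chosen only for illustration, not the GPA or weight-gain data) and computes each sum of squares from its standard definition: SS Groups weights each group mean's squared distance from the grand mean by that group's sample size, and SS Error adds up each observation's squared distance from its own group mean.

```python
# Hypothetical data: three small made-up samples (k = 3), NOT data from this module.
groups = {
    "A": [2.1, 2.5, 2.9, 3.1],
    "B": [2.8, 3.0, 3.4, 3.6],
    "C": [2.0, 2.2, 2.6, 2.8],
}


def mean(xs):
    return sum(xs) / len(xs)


def anova_sums_of_squares(groups):
    """Return (SS Total, SS Groups, SS Error) for a one-way layout."""
    all_obs = [x for xs in groups.values() for x in xs]
    grand_mean = mean(all_obs)
    # Total variation: every observation measured from the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_obs)
    # Between-groups variation: each group mean measured from the grand mean,
    # weighted by that group's sample size.
    ss_groups = sum(len(xs) * (mean(xs) - grand_mean) ** 2 for xs in groups.values())
    # Within-groups (error) variation: each observation measured from its own group mean.
    ss_error = sum((x - mean(xs)) ** 2 for xs in groups.values() for x in xs)
    return ss_total, ss_groups, ss_error


ss_total, ss_groups, ss_error = anova_sums_of_squares(groups)
print(f"SS Total  = {ss_total:.4f}")
print(f"SS Groups = {ss_groups:.4f}")
print(f"SS Error  = {ss_error:.4f}")
print(f"SS Groups + SS Error = {ss_groups + ss_error:.4f}")
```

With these hypothetical samples, SS Groups is 1.34 and SS Error is 1.39, which add up to SS Total = 2.73. Replacing the dictionary with real samples shows how the split between the two parts changes as the group means move apart.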
In order to determine what is "large" (for SS Groups relative to SS Error), the sum of squares values are divided by their respective degrees of freedom, and the resulting mean square terms are used to calculate an F-statistic. The degrees of freedom for SS Groups are the number of treatment groups, k, minus one (k − 1); for SS Error they are the total sample size, N, minus the number of treatment groups (N − k).

$\text{MS Groups} = \dfrac{\text{SS Groups}}{k-1} \qquad \text{MSE} = \text{MS Error} = \dfrac{\text{SS Error}}{N-k}$

The ratio of these two mean squares forms the F-statistic, with numerator degrees of freedom (k − 1) and denominator degrees of freedom (N − k):

$F = \dfrac{\text{variation among sample means}}{\text{natural variation within groups}} = \dfrac{\text{MS Groups}}{\text{MSE}}$

We can view this F-statistic as the ratio of two estimators of the common population variance, $\sigma^2$: the denominator (MSE) is a good (unbiased) estimator, while the numerator (MS Groups) is only good when H0 is true and otherwise tends to overestimate $\sigma^2$. Thus, large F values are evidence against the null hypothesis of equal population means.

If at least one of the population means appears to be different, then we can turn to a multiple comparisons procedure to learn which population mean(s) appear to be different and how they differ. The most commonly analyzed set of multiple comparisons is the set of all pairwise comparisons. Either of two equivalent techniques can be used for each pair of means: perform a test to see whether the two population means are significantly different, or construct a confidence interval for the difference in population means and see whether the value 0 is in the interval. Several multiple comparisons procedures are available that control the overall Type I error rate (overall significance level) or the overall confidence level. One such procedure is Tukey's procedure, which is one of the options available in SPSS.

Formula Card:

Activity: Is There a Difference among the Mean Freshman GPAs for Three Different Socioeconomic Classes?
Background: Sociologists often conduct studies to investigate the relationship between socioeconomic status and college performance. Socioeconomic status is generally partitioned into three groups: lower class, middle class, and upper class. Consider the problem of comparing the mean grade point average of college freshmen across the three socioeconomic populations. The grade point averages (GPAs) for random samples of seven college freshmen from each of the three socioeconomic classes (socclass) were selected from a university's files at the end of the first academic year. The data are in the GPA.sav data set (Source: Mendenhall and Sincich, 1996, page 589). Do the data provide sufficient evidence to indicate a difference among the mean freshman GPAs for the three socioeconomic classes? If so, which groups appear to be significantly different, and how do they differ?

Task: Perform and interpret an analysis of variance using the GPA data set.

Recall: Write out the Five Steps for conducting a test of hypotheses (Reference page 51).
1.
2.
3.
4.
5.

Before conducting any test, here is a set of questions to ask yourself:
How many populations are there? One / Two / More than two
How many variables are there? One / Two
What is the response variable?
What type of variable is the response? Categorical / Quantitative
What is the explanatory variable (if applicable)?
What type of variable is the explanatory variable (if applicable)? Categorical / Quantitative
What type of parameter would be useful for summarizing this response, considering the explanatory variable (if any)? Proportion / Mean / Other (see Supplement 3)

Based on the answers to these questions, you should be able to identify the appropriate inference procedure. You may refer back to Supplement 3 – Name that Scenario for assistance.

The appropriate inference procedure for this scenario is ______________________________ and the value of k for this problem is ___________________.

1.
State the hypotheses:
H0: ___________________
Ha: _______________________
where _____ represents ______________________________.
Your parameter definition should always be a statement about the population(s) under study.

2. Assumption Checks and Computing the Test Statistic:

Assumptions:
a. For this scenario, we need to assume that the k samples are ________________ of each other.
b. We need to assume that each sample is a ___________ sample. To check this assumption, we would make a __________ plot (if there were a time order) for each sample and look for ________________________________.
c. Each sample needs to come from a normally distributed _________________. To check this assumption, we would make a _______ plot for each __________.
d. Finally, for ANOVA, we need to assume all k populations have equal ________________.

Checking equal population variances: There are three ways to check the assumption of equal population variances.
o Examine the sample standard deviations. If they are similar, the assumption appears reasonable. (This is because variance is standard deviation squared.)
o Examine side-by-side boxplots of the sample data. If the IQRs are similar, the assumption appears reasonable.
o Use Levene's test. If the Levene's test p-value is greater than 0.05 (or the specified significance level), the assumption of equal population variances appears to hold.

e. Do the Assumptions Appear Valid? Comment on each assumption below, using graphs and output when appropriate.
Are the three samples independent?
Are the samples random samples? Note there is no time order for these data. If there were a time order, since you need EACH sample to be a random sample, how many time plots would you need to make to check this assumption? ______ time plot(s)
Construct the Q-Q plots to check the assumption about normally distributed populations.
Recall that if you need to split a data file, the command is: Data > Split File.
Does it appear that the assumption that each sample comes from a normally distributed population is met? Why?
Note: The equal population variances assumption will be considered after the ANOVA output is generated next.

Test Statistic:
e. Generate the ANOVA output. Use Analyze > Compare Means > One-Way ANOVA. Under Options, select Descriptive (gives you sample means and standard deviations) and Homogeneity of variance test (this is Levene's test). Use this and any additional output you feel is appropriate to answer the following questions.
o Choose a way to determine whether the assumption of equal population variances is valid. Check the assumption and comment.
o The assumption of equal population variances appears to be valid / not valid. Explain.
o Obtain an estimate of the common population standard deviation for the response. The notation for the common population standard deviation is ______. This value can be obtained by computing __________, and for this problem it is equal to ___________.
f. What is the notation for and value of the test statistic? ________ = ____________
g. What is the distribution of the test statistic if the null hypothesis is true? This is the same as asking what model you use to find the p-value.

3. Calculate the p-value:
a. What is the SPSS reported p-value? _____________.
b. Draw a picture of the p-value. Use the “pval()” function in R to check your work.

4. Decision: What is your decision at a 5% significance level? Reject H0 / Fail to reject H0
Remember: Reject H0 means the results are statistically significant; Fail to reject H0 means the results are not statistically significant.

5. Conclusion: What is your conclusion in context of the problem?
Conclusions should not be too strong; i.e., say "we have sufficient evidence" or equivalent, and do NOT say "we have proven."
Conclusions should always include a reference to the population parameter of interest.

6.
Follow-up Analyses: ANOVA assesses whether at least two of the group means appear to differ. A multiple comparison test can tell us which groups appear to be different and by how much those groups differ. Multiple comparison tests are a group of tests that follow an ANOVA, but only if significant differences have been found. It might seem that they could be used on their own, but because they are not as powerful as ANOVA, they can occasionally fail to find differences that the ANOVA F test would detect.

a. Obtain the multiple comparisons output. You can request multiple comparisons by clicking on the Post Hoc … button in the One-Way ANOVA dialog box. Choose Tukey from the list. The default significance level is 0.05. Click Continue and then OK. The multiple comparisons output contains p-values and confidence intervals for every possible pairwise comparison of groups, indicating where the differences are. P-values that are less than or equal to 0.05, or confidence intervals that do NOT contain 0, indicate a difference between those two population means.

b. Summarize the findings about the differences in population means for the GPAs of freshmen in the different socioeconomic classes. Which pairs are significantly different?

c. Calculate a 95% confidence interval for the mean GPA for the middle income group, where the sample mean based on the 7 subjects in the group was 3.25.

Check Your Understanding: Circle the appropriate words and fill in the blank line to complete the following sentences.

The p-value of 0.025 from this activity implies that if this study were repeated many times, we would see an F test statistic of 4.579 or (greater / less) in about ____________% of repetitions if the population means were really all (not equal / equal).

ANOVA procedures can be thought of as an extension of the two independent samples (pooled / unpooled) t-test and hence require the assumption of equal (population / sample) variances. One way to check this assumption is to use Levene's test and see if the p-value is (greater than / less than or equal to) 0.10 (or any reasonable significance level).

Think about it… For the p-value of an ANOVA test, would there be a situation in which we would need to divide the SPSS output p-value by 2? Why or why not?

Example Exam Question on ANOVA

A study was conducted to compare the effects of two different therapy treatments and a control condition on weight gain in anorexic girls. Group 1 was the control condition, whose subjects received no intervention; Group 2 subjects received a cognitive-behavioral treatment; and Group 3 subjects received family therapy. The response was weight gain over a fixed time period.

a. The ANOVA output provided below is used to test a set of hypotheses.

ANOVA: Gain in Weight
Source           Sum of Squares   df   Mean Square   F       Sig.
Between Groups   601.916           2   300.958       5.421   .007
Within Groups    3331.037         60    55.517
Total            3932.953         62

i. State the null and alternative hypotheses.
H0: ___________________________________________________________
Ha: ____________________________________________________________

ii. The p-value for this test is reported as 0.007. Draw a sketch of the appropriate distribution showing how the p-value was determined for this ANOVA study. Provide all details.

b. Multiple comparisons were performed on the weight gain data (using Tukey's method). Use the results to circle all pairs that are significantly different (using a 5% significance level).

control versus cognitive behavior     control versus family therapy     cognitive behavior versus family therapy

Multiple Comparisons
Dependent Variable: Gain in Weight
Tukey HSD
(I) Condition   (J) Condition   Mean Difference (I-J)   95% CI Lower Bound   95% CI Upper Bound
Control         Cog Behav       -3.65                   -8.77                 1.48
Control         Family          -8.29                   -14.36               -2.22
Cog Behav       Control          3.65                   -1.48                 8.77
Cog Behav       Family          -4.64                   -10.58                1.30
Family          Control          8.29                    2.22                14.36
Family          Cog Behav        4.64                   -1.30                10.58

c. Calculate a 95% confidence interval for the mean weight gain for the family therapy group, where the sample mean based on the 23 subjects in Group 3 was 7.4 pounds.
Final answer: ____________________________________
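The numbers in the example exam question can be cross-checked with a few lines of Python. The minimal sketch below (a) rebuilds the mean squares, F, and p-value from the sums of squares and degrees of freedom in the ANOVA table, (b) flags which Tukey intervals exclude 0, and (c) computes a confidence interval for one group mean as $\bar{x}_i \pm t^* \sqrt{\text{MSE}}/\sqrt{n_i}$ with df = N − k. Assumptions to note: the p-value uses the closed form $P(F \ge f) = (1 + 2f/d_2)^{-d_2/2}$, which holds only when the numerator df is 2 (k = 3 groups); the multiplier t* = 2.000 for df = 60 is read from a t table rather than computed; and the single-mean interval is the usual ANOVA interval that uses √MSE as the estimate of σ, not a value taken from the SPSS output. The helper names are hypothetical.

```python
import math

# (a) Values copied from the ANOVA table above (k = 3 groups, N = 63 subjects).
ss_groups, df_groups = 601.916, 2   # Between Groups
ss_error, df_error = 3331.037, 60   # Within Groups

ms_groups = ss_groups / df_groups   # MS Groups = SS Groups / (k - 1)
mse = ss_error / df_error           # MSE = SS Error / (N - k)
f_stat = ms_groups / mse            # F = MS Groups / MSE


def f_upper_tail_df1_is_2(f, df2):
    """Return P(F >= f) for an F(2, df2) distribution.

    This closed form is valid ONLY when the numerator df is 2 (k = 3 groups).
    """
    return (1 + 2 * f / df2) ** (-df2 / 2)


p_value = f_upper_tail_df1_is_2(f_stat, df_error)
print(f"MS Groups = {ms_groups:.3f}, MSE = {mse:.3f}, F = {f_stat:.3f}, p = {p_value:.4f}")

# (b) Tukey 95% intervals (lower, upper) for each distinct pair, copied from the table.
tukey_intervals = {
    ("Control", "Cog Behav"): (-8.77, 1.48),
    ("Control", "Family"): (-14.36, -2.22),
    ("Cog Behav", "Family"): (-10.58, 1.30),
}


def excludes_zero(interval):
    """A pair differs at the 5% level when its interval does not contain 0."""
    lower, upper = interval
    return lower > 0 or upper < 0


significant_pairs = [pair for pair, ci in tukey_intervals.items() if excludes_zero(ci)]
for (g1, g2), ci in tukey_intervals.items():
    verdict = "significant" if excludes_zero(ci) else "not significant"
    print(f"{g1} vs {g2}: {ci} -> {verdict}")


# (c) 95% CI for a single group mean, using sqrt(MSE) to estimate sigma (df = N - k).
def group_mean_ci(xbar, n_i, mse, t_star):
    margin = t_star * math.sqrt(mse) / math.sqrt(n_i)
    return xbar - margin, xbar + margin


T_STAR_DF60 = 2.000  # assumed t* for 95% confidence with df = 60, read from a t table
low, high = group_mean_ci(7.4, 23, mse, T_STAR_DF60)
print(f"Family therapy mean gain, 95% CI: ({low:.2f}, {high:.2f}) pounds")
```

The recomputed F (about 5.421) and p-value (about 0.007 after rounding) agree with the table. The same closed-form helper with df2 = 18 gives about 0.025 for the GPA activity's F = 4.579. When k is not 3, a general F-distribution routine such as `scipy.stats.f.sf(f, dfn, dfd)` would be needed instead.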