Analysis of Variance (ANOVA) - Solutions

One-way analysis of variance is a method for comparing several population means when the data are from independent samples. It can be thought of as a tool for examining the relationship between a quantitative response variable and a categorical explanatory variable.

1. In each part, explain whether or not one-way analysis of variance can be used to analyze the relationship.

a. An educational researcher compares three different teaching methods. Each method is used by 100 students (300 in all). Scores on a final exam will be used for the comparison.
Yes. The response (exam score) is quantitative, and we’re comparing groups, so the explanatory variable (teaching method) is categorical.

b. A sociologist looks at whether there is a relationship between racial group (Caucasian, African-American, etc.) and opinion about the death penalty (favor or oppose).
No. The response (favor or oppose) is categorical, not quantitative.

c. Restaurant servers draw a happy face on some bills, write a thank-you message on other bills and write nothing on a third batch of bills. Tip percents are compared for the three different “conditions.”
Yes. The response (tip percent) is quantitative and the explanatory variable (condition) is categorical.

d. A medical researcher examines the relationship between age and blood pressure.
No. Both variables are quantitative; there is no categorical explanatory variable defining groups.

2. The data are from the 2002 General Social Survey, a federally funded national survey done every other year by the University of Chicago. In the Datasets folder of the agenda page, click the link for the “GSS Dataset” to open Minitab with the data in place. This activity examines the relationship between number of children ever had (children) and the answer to a question about how often premarital sex is wrong (premarsx). There are four categories for how often premarital sex is wrong: Always, Almost always, Sometimes, and Never.

a. Use Stat>Basic Statistics>Display Descriptive Statistics. Enter the variable children in the Variables box and enter premarsx in the “By Variables” box.
Give the sample mean number of children ever had for each category of how often premarital sex is wrong.
Means are: Always 2.167; Almost always 1.789; Sometimes 1.637; Never 1.4286.

b. Write two or three sentences that describe how the sample means differ. For instance, which group has the greatest number of children (on average)?
People who think premarital sex is always wrong had the highest mean number of children ever had (2.167). People who think premarital sex is never wrong had the lowest mean (1.4286). Generally, the means decreased as the view of premarital sex became less severe.

c. In words, write a null hypothesis about the mean number of children for the four response categories of the premarital sex question.
Null: The population mean number of children is the same for the four categories.

d. Write the null hypothesis given in part c using appropriate statistical notation.
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$, where $\mu_1, \dots, \mu_4$ are the population mean numbers of children for the Always, Almost always, Sometimes and Never categories.

e. Use Stat>ANOVA>One-way to do a one-way analysis of variance F-test to compare mean number of children for the four categories of the premarital sex question. The “Response” is children and the “Factor” is premarsx. Locate the p-value in the output. What is the p-value?
p-value = 0.000 (Minitab rounds to three decimals, so the p-value is less than 0.0005).
State a conclusion about this situation.
We reject the null hypothesis and conclude that the population means are not all the same.

f. The output gives a graphical display of confidence intervals for the population means in the four categories of how often premarital sex is wrong. Using that display, describe how mean number of children differs for these categories.
Generally, the mean number of children decreases as premarital sex is judged wrong less often, from Always down to Never. The intervals for “Always” and “Never” do not overlap, so it’s reasonable to say those two population means differ.

3. In past class surveys, students were asked to rate how much they like Rap music on a scale of 1 to 6, with 1 = hate it and 6 = like it a lot. Students were also asked whether they are from a big city, rural area, small town or suburban area.
Following are analysis of variance results comparing mean ratings for the four categories of hometown.

Source      DF       SS      MS     F      P
Hometown     3    31.70   10.57  4.63  0.003
Error     2107  4810.77    2.28
Total     2110  4842.46

Individual 95% CIs For Mean Based on Pooled StDev

Level          N   Mean  StDev
Big_city     220  4.527  1.393
Rural        314  4.070  1.585
Small_town   533  4.128  1.535
Suburban    1044  4.180  1.500

(Character plot of the four individual 95% CIs on a scale from 4.00 to 4.75; not reproduced.)

a. Using appropriate statistical notation, write a null hypothesis for this situation.
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$, where $\mu_1, \dots, \mu_4$ are the population mean Rap ratings for students from big cities, rural areas, small towns and suburban areas.

b. What conclusion can be made about the null hypothesis? Justify your answer.
Reject the null hypothesis. The p-value of 0.003 is less than 0.05, so there is statistical evidence that the population mean ratings are not all equal for the four hometown categories.

c. Use the display of confidence intervals (and accompanying sample means) to describe any differences (and similarities) in ratings of Rap for the categories of hometown.
Since the intervals overlap for Rural, Small Town and Suburban, these three hometown categories appear to have similar mean “likeness” of Rap music. The Big City interval, however, lies above the others (sample mean 4.527 versus 4.070 to 4.180), indicating that students from big cities have a higher appreciation of Rap music.

d. A multiple comparisons analysis (not shown) includes these two confidence intervals for the difference in mean ratings:
Big City − Rural: 95% CI for difference in means is 0.20 to 0.72
Small Town − Rural: 95% CI for difference in means is −0.15 to +0.27
(i) Based on the confidence interval, explain whether it is reasonable to conclude that (population) mean ratings differ for students from big cities and rural areas.
Yes. The interval does not contain 0; it lies entirely above 0, so students from big cities appear to have a higher mean rating than students from rural areas.
(ii) Based on the confidence interval, explain whether it is reasonable to conclude that (population) mean ratings differ for students from small towns and rural areas.
No. The interval contains 0, so a difference of zero between the two population means is plausible.
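As a check on the hometown output, the sketch below rebuilds the one-way ANOVA table from the group sizes, means and standard deviations printed there (Factor SS from the spread of group means around the grand mean, Error SS from each group's variance times its df, MS = SS/DF, F = MS Factor / MS Error). Because those summaries are rounded, the rebuilt SS and F differ slightly from Minitab's. The p-value here is an assumption-laden shortcut, not what Minitab computes: it treats DF Factor × F as chi-square with DF Factor degrees of freedom, which is reasonable only because the error df (2107) is large. All function names are illustrative, not part of Minitab. The last helper mirrors the rule used in part d.

```python
import math

# Group summaries copied from the hometown output: (label, N, mean, StDev).
GROUPS = [
    ("Big_city", 220, 4.527, 1.393),
    ("Rural", 314, 4.070, 1.585),
    ("Small_town", 533, 4.128, 1.535),
    ("Suburban", 1044, 4.180, 1.500),
]


def one_way_anova_from_summary(groups):
    """Rebuild the one-way ANOVA table from each group's N, mean and StDev."""
    k = len(groups)
    n_total = sum(n for _, n, _, _ in groups)
    grand_mean = sum(n * mean for _, n, mean, _ in groups) / n_total
    # Factor (between-groups) SS: how far each group mean sits from the grand mean.
    ss_factor = sum(n * (mean - grand_mean) ** 2 for _, n, mean, _ in groups)
    # Error (within-groups) SS: each group's variance times its own df, summed.
    ss_error = sum((n - 1) * sd ** 2 for _, n, _, sd in groups)
    df_factor, df_error = k - 1, n_total - k
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return {
        "df": (df_factor, df_error, n_total - 1),
        "ss": (ss_factor, ss_error, ss_factor + ss_error),
        "ms": (ms_factor, ms_error),
        "F": ms_factor / ms_error,
    }


def chi2_approx_p_value(f_stat, df_factor):
    """Approximate P(F > f_stat) by treating df_factor * F as chi-square(df_factor).

    Large-sample shortcut (assumes a large error df); Minitab uses the exact F tail.
    """
    x = df_factor * f_stat
    if x <= 0:
        return 1.0
    a, z = df_factor / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - lower


def interval_contains_zero(low, high):
    """Part d rule: a zero difference is plausible when 0 lies inside the CI."""
    return low <= 0 <= high


table = one_way_anova_from_summary(GROUPS)
p_approx = chi2_approx_p_value(table["F"], table["df"][0])
print("DF:", table["df"])
print("SS: %.2f %.2f %.2f" % table["ss"])
print("MS: %.2f %.2f" % table["ms"])
print("F = %.2f, approx p = %.4f" % (table["F"], p_approx))
print("Big City - Rural CI contains 0:", interval_contains_zero(0.20, 0.72))
print("Small Town - Rural CI contains 0:", interval_contains_zero(-0.15, 0.27))
```

The rebuilt F comes out near 4.62 and the approximate p-value near 0.003, in line with the reported 4.63 and 0.003; the small gaps come from rounding in the printed means and standard deviations.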