Chapter 12 Section 1 Homework Set B

12.23 The importance of recreational sports to college satisfaction. The National Intramural-Recreational Sports Association (NIRSA) performed a survey to look at the value of recreational sports on college campuses.6 One of the questions asked each student to rate the importance of recreational sports to college satisfaction and success. Responses were on a 10-point scale with 1 indicating total lack of importance and 10 indicating very high importance. The following table summarizes these results:

Class    Freshman   Sophomore   Junior   Senior
n        724        536         593      437
x-bar    7.6        7.6         7.5      7.3

(a) To compare the mean scores across classes, what are the degrees of freedom for the ANOVA F statistic?

Numerator d.f. = 4 - 1 = 3
Denominator d.f. = 724 + 536 + 593 + 437 - 4 = 2286

(b) The MSG = 11.806. If Sp = 2.16, what is the F statistic?

Recall that Sp^2 = MSE, so

$$F = \frac{\text{MSG}}{\text{MSE}} = \frac{11.806}{2.16^2} = 2.53$$

(c) Give the P-value by using Excel, Fdist(f, df num, df denom), or the F-calculator found at http://www.stat.tamu.edu/~west/applets/fdemo.html . What do you conclude?

Fdist(2.53, 3, 2286) gives P(F > 2.53) = 0.05556.

This P-value is small enough to suggest that at least one of the classes scored differently. Mainly it looks like the seniors have a different mean. Notice that the differences in means are tiny, and the pooled standard deviation suggests a lot of overlap between the data values. But the sample sizes are very large, so even a small deviation from the null hypothesis (no difference in means) can be detected. However, I have a feeling something is wrong here. I calculated x-bar (the mean of all the data) to be 7.51, and calculating MSG from it I get 9.810, which produces a p-value of 0.0978.

12.45 How long should an infant be breast-fed? Recommendations regarding how long infants in developing countries should be breast-fed are controversial.
If the nutritional quality of the breast milk is inadequate because the mothers are malnourished, then there is risk of inadequate nutrition for the infant. On the other hand, the introduction of other foods carries the risk of infection from contamination. Further complicating the situation is the fact that companies that produce infant formulas and other foods benefit when these foods are consumed by large numbers of customers. One question related to this controversy concerns the amount of energy intake for infants who have other foods introduced into the diet at different ages. Part of one study compared the energy intakes, measured in kilocalories per day (kcal/d), for infants who were breast-fed exclusively for 4, 5, or 6 months.'6 Here are the data:

BF4: 499 620 469 485 660 588 675 517 649 209 404 738 628 609 617 704 558 653 548
BF5: 490 395 402 177 475 617 616 587 528 518 370 431 518 639 368 538 519 506
BF6: 585 647 477 445 485 703 528 465

(a) Make a table, using the data already typed into Excel, giving the sample size, mean, and standard deviation (= sqrt(variance)) for each group of infants. Is it reasonable to pool the variances? Write down the table or use the copy and paste feature of your computer.

SUMMARY
Groups   Count   Sum     Average   Variance      s
BF4      19      10830   570       15118.55556   122.9575
BF5      18      8694    483       12757.29412   112.9482
BF6      8       4335    541.875   8828.982143   93.96266

We do meet the rule that twice the smallest standard deviation exceeds the largest, 2(93.96) > 122.96, so it is not unreasonable to assume equal standard deviations.

(b) Make a Normal quantile plot (use CrunchIt!, upload data using file 12_45 at spot) for the data in each of the three treatment groups. Summarize the information in the plots and draw a conclusion regarding the Normality of these data. Make a copy of the plots.
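The coordinates behind a Normal quantile plot like the one requested in part (b) can be sketched in Python. This is only an illustration for the BF6 group, not what CrunchIt! does internally: the function name normal_quantile_points is hypothetical, and the plotting position (i - 0.5)/n is an assumption, since software packages use slightly different conventions.

```python
from statistics import NormalDist

# BF6 energy intakes (kcal/d), copied from the data listing in this problem.
bf6 = [585, 647, 477, 445, 485, 703, 528, 465]


def normal_quantile_points(data):
    """Pair each sorted observation with its expected z-score.

    Assumption: the i-th smallest value is matched with the standard
    Normal quantile at probability (i - 0.5) / n.
    """
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (std_normal.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(sorted(data), start=1)
    ]


points = normal_quantile_points(bf6)
for z, x in points:
    # Plot x against z; points close to a straight line suggest Normality.
    print(f"{z:6.2f}  {x}")
```

Plotting the second column against the first gives the Normal quantile plot; the same function can be applied to the BF4 and BF5 lists.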
[Figure: Normal quantile plot for BF5; vertical axis: data (0 to 700), horizontal axis: expected z-score (-2.5 to 2.5)]

The plot of BF6 is closest to being perfectly straight, indicating that the population we are sampling from is close to a Normal distribution. The other two are also close to straight except for the endpoints on the left. Since there is no pattern before the dip occurs that suggests a serious move away from a straight line, we will say that there is no strong evidence that the distributions we are sampling from are not close to Normal.

(c) Make a dotplot of the data using Excel as done in class. Does the dotplot indicate that the means of all the groups are equal or does it indicate that at least one of the means is not equal?

[Figure: dotplot of energy intake for BF4, BF5, and BF6; vertical axis from 100 to 800]

Notice that the dot plots clearly show the two extreme points from BF4 and BF5. A researcher would look at those two data values more carefully and try to understand why they are so much lower than the rest of the data points. I see that the means are very close together and that the spreads of the groups are close to being equal. The amount of overlap in the dot plots, along with the proximity of the means and the relatively small sample sizes, would indicate to me that the p-value will be high, leading to no evidence that the population means are not the same.

(d) Run the analysis of variance using Excel. Report the F statistic with its degrees of freedom and P-value. What do you conclude? Copy the table below.

ANOVA
Source of Variation   SS           df   MS         F             P-value    F crit
Between Groups        71288.325    2    35644.16   2.717910798   0.077625   3.219938
Within Groups         550810.875   42   13114.54
Total                 622099.2     44

The F statistic is 2.72 with 2 and 42 degrees of freedom. The p-value of 0.0776 shows that my evidence against the null hypothesis is not strong. It would be interesting to see what would happen if I removed those two extreme points and ran the test again.

1. Do we experience emotions differently?
Do people from different cultures experience emotions differently? One study designed to examine this question collected data from 416 college students from five different cultures.9 The participants were asked to record, on a 1 (never) to 7 (always) scale, how much of the time they typically felt eight specific emotions. These were averaged to produce the global emotion score for each participant. Here is a summary of this measure:

Culture             n     Mean   s
European American   16    4.39   1.03
Asian American      33    4.35   1.18
Japanese            91    4.72   1.13
Indian              160   4.34   1.26
Hispanic American   80    5.04   1.16

(a) Is it reasonable to use a pooled standard deviation for these data? Why or why not?

Yes, since twice the smallest standard deviation exceeds the largest: 2(1.03) > 1.26.

(b) Draw a rough sketch denoting the location of the mean for each group and use the value of the sample standard deviation of each group to indicate how spread out the data are.

I used roughly three standard deviations away from each sample mean.

(c) From the information given (and your sketch in (b), which allows you to visualize the information), do you think that we need to be concerned that a possible lack of Normality in the data will invalidate the conclusions that we might draw using ANOVA to analyze the data? Give reasons for your answer.

The data are not Normal because the underlying ratings are discrete (each rating is an integer from 1 through 7), similar to the Chapter 8 situation when dealing with proportions. Also, the means hover around 4 and the standard deviations around 1, so if you go three standard deviations away from the mean to capture about 99.7% of the data, you reach the ends of the possible values of the measurements. Thus, you hope the sample sizes are large enough to overcome the measurement type. The sample sizes of 16 and 33 are the most worrisome of the five.

(d) Fill out the table given below. Sketch a picture of the F distribution (using CrunchIt!) that illustrates the P-value. What do you conclude? Show your work.
How to calculate the sample mean of the entire data set regardless of group:

$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_I \bar{x}_I}{n_1 + n_2 + \cdots + n_I}$$

        d.f.   SS       MS     F      P
Group   4      30.25    7.56   5.31   0.000361
Error   375    534.12   1.42

$$\bar{x} = \frac{16(4.39) + 33(4.35) + 91(4.72) + 160(4.34) + 80(5.04)}{16 + 33 + 91 + 160 + 80} = 4.58$$

$$\text{SSG} = 16(4.39 - 4.58)^2 + 33(4.35 - 4.58)^2 + 91(4.72 - 4.58)^2 + 160(4.34 - 4.58)^2 + 80(5.04 - 4.58)^2 = 30.25$$

$$\text{SSE} = 15(1.03)^2 + 32(1.18)^2 + 90(1.13)^2 + 159(1.26)^2 + 79(1.16)^2 = 534.12$$

The total sample size is 16 + 33 + 91 + 160 + 80 = 380, so the error d.f. is 380 - 5 = 375.

P(F > 5.31) = 0.000361

The small P-value gives strong evidence that at least one of the population means is different.

(e) Without doing any additional formal analysis, describe the pattern in the means that appears to be responsible for your conclusion in part (d). Are there pairs of means that are quite similar?

The Hispanic American group has the largest mean and the second largest is the Japanese group.

2. If a supermarket product is offered at a reduced price frequently, do customers expect the price of the product to be lower in the future? This question was examined by researchers in a study conducted on students enrolled in an introductory management course at a large Midwestern university. For 10 weeks subjects received information about the products. The treatment conditions corresponded to the number of promotions (1, 2, 3, or 4) that were described during this 10-week period. Students were randomly assigned to four groups. Below are three possible outcomes of this study. Which one do you think produces the smallest p-value and why?

Column   n    Mean
1        40   4.224
2        40   4.06275
3        40   3.759
4        40   3.54875

Column   n    Mean
1        20   4.1405
2        20   4.027
3        20   3.828
4        20   3.583

Column   n    Mean
1        7    4.257143
2        7    4.04
3        7    3.7042856
4        7    3.602857

The outcome with the largest sample sizes (n = 40 per group) should produce the smallest p-value. Why? Notice that the group means are about the same in all three situations, so the squared deviations of the group means from the overall mean are about the same. SSG multiplies each of those squared deviations by the group size, so MSG is largest when n is largest.
SSE, on the other hand, is built from the standard deviation of each group (s), which is about the same in the three situations, so MSE = SSE / DFE stays about the same while the error degrees of freedom grow with the sample size.

3. Is there a relationship between the amount of time a battery lasts, measured in minutes, and the battery manufacturer? Four different manufacturers of batteries were tested under the same conditions.

Manufacturer     n    Mean     s
Manufacturer 1   10   265.31   5.32
Manufacturer 2   10   277.2    4.18
Manufacturer 3   10   268.2    5.13
Manufacturer 4   10   275.03   4.26

(a) Is it reasonable to use a pooled standard deviation for these data? Why or why not?

(b) Fill out the table given below. Sketch a picture of the F distribution (using CrunchIt!) that illustrates the P-value. What do you conclude? Show your work.

How to calculate the sample mean of the entire data set regardless of group:

$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_I \bar{x}_I}{n_1 + n_2 + \cdots + n_I}$$

        d.f.   SS   MS   F   P
Group
Error

$$\bar{x} = \frac{10(265.31) + 10(277.2) + 10(268.2) + 10(275.03)}{40} = 271.435$$

4. Is there a relationship between the amount of time a battery lasts, measured in minutes, and the battery manufacturer? Four different manufacturers of batteries were tested under the same conditions.

Manufacturer     n     Mean     s
Manufacturer 1   100   265.31   5.32
Manufacturer 2   100   277.2    4.18
Manufacturer 3   100   268.2    5.13
Manufacturer 4   100   275.03   4.26

(a) Is it reasonable to use a pooled standard deviation for these data? Why or why not?

(b) Fill out the table given below. Sketch a picture of the F distribution (using CrunchIt!) that illustrates the P-value. What do you conclude? Show your work.

How to calculate the sample mean of the entire data set regardless of group:

$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_I \bar{x}_I}{n_1 + n_2 + \cdots + n_I}$$

        d.f.   SS   MS   F   P
Group
Error
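Problems 3 and 4 use the same means and standard deviations and differ only in the group size, so both ANOVA tables can be worked from the summary statistics alone. The following is a minimal sketch in Python of that arithmetic, using the same weighted-mean, SSG, and SSE formulas as in problem 1. The function name anova_from_summary and the dictionary keys are hypothetical, chosen for illustration. The P-value is left out because it needs an F-distribution tail area, which can be taken from Excel's Fdist as in 12.23.

```python
def anova_from_summary(ns, means, sds):
    """One-way ANOVA table entries from group sizes, means, and standard deviations."""
    total_n = sum(ns)
    groups = len(ns)

    # Overall mean: each group mean weighted by its sample size.
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n

    # SSG: squared deviations of group means from the overall mean, weighted by n.
    ssg = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))

    # SSE: each group variance weighted by its degrees of freedom (n - 1).
    sse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))

    df_group = groups - 1
    df_error = total_n - groups
    msg = ssg / df_group
    mse = sse / df_error
    return {
        "grand_mean": grand_mean,
        "df_group": df_group,
        "df_error": df_error,
        "SSG": ssg,
        "SSE": sse,
        "MSG": msg,
        "MSE": mse,
        "F": msg / mse,
    }


# Summary statistics shared by problems 3 and 4.
means = [265.31, 277.2, 268.2, 275.03]
sds = [5.32, 4.18, 5.13, 4.26]

small = anova_from_summary([10] * 4, means, sds)   # problem 3
large = anova_from_summary([100] * 4, means, sds)  # problem 4

for label, table in (("n = 10", small), ("n = 100", large)):
    print(label, {key: round(value, 3) for key, value in table.items()})
```

Because the group sizes are equal, MSE is the same in both problems, while MSG and therefore F scale with n. This is the same effect described in the answer to problem 2.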