1. Which is wider (all else being equal), a 95% or a 90% confidence interval?

A 95% confidence interval is wider.

2. Students were tested on their statistics knowledge, then given a review sheet, after which they were tested again. The results are below:

ID number   before review   after review   increase
3472        52              58              6
4381        58              49             -9
7724        44              51              7
3737        67              66             -1
3465        39              48              9
5472        51              61             10

(a) If you were hoping that the review made the test scores go up, what would the null and alternative hypotheses be for a hypothesis test?

Let $\mu$ be the average increase.
$H_0: \mu \le 0$
$H_1: \mu > 0$

(b) Test this hypothesis.

The mean difference is $\bar{d} = 3.667$ and the standard deviation is $s = 7.312$, with sample size $n = 6$. Our t statistic is

$$t = \frac{\bar{d} - 0}{s/\sqrt{n}} = \frac{3.667 - 0}{7.312/\sqrt{6}} = 1.23.$$

Looking this up on the t-table with $n - 1 = 5$ degrees of freedom, we get an upper tail area of 0.1367. Since this is a one-sided test, that tail area is the p-value. Thus we fail to reject $H_0$ (p-value 0.1367) and find no evidence of improvement.

(c) Give a 95% confidence interval for how much scores went up.

This should be a one-sided confidence interval, since we want a lower bound on the score increase. Using the one-sided 95% t value for 5 degrees of freedom, 2.02:

$$\mu > \bar{d} - 2.02\,\frac{s}{\sqrt{n}} = 3.667 - 2.02 \cdot \frac{7.312}{\sqrt{6}} = -2.36.$$

Thus we are 95% confident that the average score increase is at least −2.36 points. In other words, on average scores may have dropped by as much as 2.36 points.

3. Students in a MATH 1343 class can be sorted into two groups, “Fools” and “Uncools.” The Uncools think they’re smarter than the Fools, while the Fools think they’re hipper than the Uncools. There are 17 Fools and 14 Uncools in the class.

(a) The class is given a statistics practice test. Because the math department gives this test every semester, they know the standard deviation of the scores is 3.2 points no matter who takes the test. The average score for the Fools is 63.2, while the average for the Uncools is 71.5. If you wanted to set up a hypothesis test to see whether the Uncools are really smarter, what would your null and alternative hypotheses be?
$H_0: \mu_{\text{uncools}} \le \mu_{\text{fools}}$
$H_1: \mu_{\text{uncools}} > \mu_{\text{fools}}$

(b) Test your hypothesis above.

Since the population standard deviation $\sigma = 3.2$ is known, we use a z statistic:

$$z = \frac{(71.5 - 63.2) - 0}{\sqrt{\frac{3.2^2}{14} + \frac{3.2^2}{17}}} = 7.19.$$

We look this up on the normal table and note that it is off the end of the table. Thus our p-value is less than 0.0001, so we reject the null hypothesis and conclude that Uncools are smarter than Fools (i.e., their average test scores are higher).

(c) Give a 90% confidence interval for the difference in average scores.

Note that this is again a one-sided CI, so we use the one-sided 90% critical value $z = 1.28$:

$$(71.5 - 63.2) - 1.28\sqrt{\frac{3.2^2}{17} + \frac{3.2^2}{14}} = 6.82.$$

We conclude, with 90% confidence, that on average Uncools score at least 6.82 points higher than Fools on this test.

(d) A CS 101 class, for a final class project, comes up with a website “coolornot.com” that allows users to rate one another’s coolness on a 0-10 scale, kind of like “hotornot.com” purports to do for looks. In spite of obvious design flaws (no 0 score for CS majors, no 11 score for Fonzie), the Fools in the MATH 1343 class decide to use the site to embarrass the Uncools. If you wanted to help the Fools set up a hypothesis test to test their belief, what would the null and alternative hypotheses be?

We let $\mu$ denote the average “coolornot.com” score, with a subscript indicating the group.
$H_0: \mu_{\text{fools}} - \mu_{\text{uncools}} \le 0$
$H_1: \mu_{\text{fools}} - \mu_{\text{uncools}} > 0$

(e) The average coolornot score for the Fools was 7.4, with a standard deviation of 1.3, while the average coolornot score for the Uncools was 6.2, with a standard deviation of 1.8. Test the hypothesis above.

The population standard deviations are not known here, so we use a pooled (equal-variance) two-sample t test. We need the pooled variance

$$s_p^2 = \frac{16(1.3)^2 + 13(1.8)^2}{17 + 14 - 2} = 2.38.$$

Now we have

$$t = \frac{(7.4 - 6.2) - 0}{\sqrt{\frac{2.38}{17} + \frac{2.38}{14}}} = 2.16.$$

We look this up on a t-table with $17 + 14 - 2 = 29$ degrees of freedom and find a lower tail area of 0.9804, and hence an upper tail area of 0.0196. Our p-value is thus 0.0196, so we reject $H_0$ and conclude that the average coolornot score is higher for Fools than for Uncools.
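The test statistics in Problems 2 and 3(b)-(e) can be reproduced with a short Python sketch. It is a minimal illustration under a few assumptions: only the standard library is used, the helper names (`paired_t`, `two_sample_z`, `pooled_t`) are hypothetical, and t critical values are hard-coded from a t-table as in the hand solutions. For the one-sided 90% bound in 3(c), the normal critical value comes from `NormalDist().inv_cdf(0.90)`, which is about 1.28.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_t(before, after):
    """Paired t statistic for the mean of (after - before)."""
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    d_bar, s_d = mean(d), stdev(d)  # stdev divides by n - 1
    return d_bar, s_d, d_bar / (s_d / sqrt(n)), n


def two_sample_z(x1, x2, sigma, n1, n2):
    """z statistic for mu1 - mu2 when a common population sigma is known."""
    se = sqrt(sigma**2 / n1 + sigma**2 / n2)
    return (x1 - x2) / se, se


def pooled_t(x1, s1, n1, x2, s2, n2):
    """Pooled (equal-variance) two-sample t statistic."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    return (x1 - x2) / sqrt(sp2 / n1 + sp2 / n2), sp2, df


# Problem 2: scores before and after the review sheet (same ID order as the table)
before = [52, 58, 44, 67, 39, 51]
after = [58, 49, 51, 66, 48, 61]
d_bar, s_d, t2, n = paired_t(before, after)
# One-sided 95% lower bound; 2.02 is the t-table value (5 df) used in the solution
lower2 = d_bar - 2.02 * s_d / sqrt(n)

# Problem 3(b)-(c): Uncools (n=14) minus Fools (n=17), known sigma = 3.2
z3, se3 = two_sample_z(71.5, 63.2, 3.2, 14, 17)
p3 = 1 - NormalDist().cdf(z3)                              # one-sided p-value
lower3 = (71.5 - 63.2) - NormalDist().inv_cdf(0.90) * se3  # one-sided 90% bound

# Problem 3(e): Fools (n=17) minus Uncools (n=14) on coolornot scores
t3e, sp2, df = pooled_t(7.4, 1.3, 17, 6.2, 1.8, 14)

print(f"Problem 2:    mean={d_bar:.3f}  s={s_d:.3f}  t={t2:.2f}  lower bound={lower2:.2f}")
print(f"Problem 3(b): z={z3:.2f}  p={p3:.1e}")
print(f"Problem 3(c): lower bound={lower3:.2f}")
print(f"Problem 3(e): sp^2={sp2:.3f}  t={t3e:.2f}  df={df}")
```

Unrounded arithmetic gives $t \approx 2.15$ in 3(e). The hand computation rounds $s_p^2$ to 2.38 first, which gives 2.16. The conclusion is the same either way.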
(f) The divide in the class has now deepened, despite the efforts of the foolish and uncool professor to mediate, and both sides go looking for support in the larger campus community. The trouble is, the typical student is not very self-aware. A sample of 83 students resulted in 56 of them self-reporting their Fool/Uncool status incorrectly. Give a 90% confidence interval for the percentage of all students who will self-report their Fool/Uncool status incorrectly.

We have $\hat{p} = 56/83 = 0.675$ as the proportion of the sample incorrectly self-reporting. To compute a 90% confidence interval we use

$$0.675 \pm 1.65\sqrt{\frac{0.675(1 - 0.675)}{83}} = (0.590, 0.760).$$

Thus we are 90% confident that between 59% and 76% of the campus community will self-report incorrectly.

(g) Fools accuse Uncools of improperly labeling themselves “cooler than the other side of the pillow” (and hence logically not in the Uncool group) with a different frequency than Fools rate themselves “inteligynt” (and hence logically not in the Fool group). Never able to quite grasp that negative number thing, they are unwilling to risk looking silly by making a more precise accusation than that. If you were to set up null and alternative hypotheses to test this accusation, what would they be?

Let $\rho$ denote the population proportion of incorrect self-reporters, with a subscript denoting the group.
$H_0: \rho_{\text{uncool}} - \rho_{\text{fool}} = 0$
$H_1: \rho_{\text{uncool}} - \rho_{\text{fool}} \ne 0$

(h) A sample of 48 Fools found 17 of them claimed they were “inteligynt,” while a sample of 53 Uncools found 22 of them claimed to be “cooler than the other side of the pillow.” Test the hypotheses above.

Jointly in the sample we have $\hat{p}_{\text{pooled}} = \frac{17 + 22}{48 + 53} = 0.386$. Separately we have $\hat{p}_{\text{fool}} = 17/48 = 0.354$ and $\hat{p}_{\text{uncool}} = 22/53 = 0.415$. We compute our test statistic

$$z = \frac{(0.415 - 0.354) - 0}{\sqrt{\frac{0.386(1 - 0.386)}{48} + \frac{0.386(1 - 0.386)}{53}}} = 0.63.$$

We look this up on the normal table to find a lower tail area of 0.7357. Thus we have an upper tail area of 0.2643.
Since this is a two-tailed test, we double this to get a p-value of 0.5286. Hence we fail to reject $H_0$ and are unable to detect any difference in incorrect self-reporting between the two populations.

(i) Compute a 95% confidence interval for the difference.

For the confidence interval we use the separate (unpooled) sample proportions, each divided by its own sample size (53 Uncools, 48 Fools):

$$(0.415 - 0.354) \pm 1.96\sqrt{\frac{0.415(1 - 0.415)}{53} + \frac{0.354(1 - 0.354)}{48}} = (-0.1285, 0.2505).$$

Not surprisingly, given the outcome of the hypothesis test above, the confidence interval contains zero.

4. A 2009 health and fitness study classified Americans into 4 groups according to their weekly level of exercise. The groups and study results are below.

I) Total Couch Potatoes
II) Lazy Bums
III) People who exercise less than they say they do
IV) People with an unhealthy obsession with exercise

Group   I     II    III   IV
Count   118   142   133   122

Test at the 5% level whether you can find unequal proportions of the population falling into the four groups.

Under $H_0$ (equal proportions), each expected count is $515/4 = 128.75$.

             I          II         III        IV
Obs          118        142        133        122
Exp          128.75     128.75     128.75     128.75
O - E        -10.75     13.25      4.25       -6.75
(O - E)^2    115.5625   175.5625   18.0625    45.5625
(O - E)^2/E  0.8976     1.3636     0.1403     0.3539

Summing the last row gives $\chi^2 = 2.7554$. We look this up on a $\chi^2$ table with $4 - 1 = 3$ degrees of freedom and find a p-value of about 0.43. Thus we fail to reject the null hypothesis that the population is evenly split among the four groups.

5. A more detailed analysis in the study elicited information about people’s honesty in reporting the number of days per week they exercised. It determined whether they A) exercised less than they claimed to, B) exercised exactly what they claimed to, or C) exercised more than they claimed to. Results are below. Note that the totals for the groups are the same as above.

        I     II    III   IV    Total
A)      22    34    72    5     133
B)      77    66    31    101   275
C)      19    42    30    16    107
Total   118   142   133   122   515

Is there a difference in honesty between the four groups? (Test at the 1% level.)
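The chi-square computations for Problems 4 and 5 can be cross-checked with a short Python sketch. It is an illustration rather than part of the original solutions, and the function names are hypothetical. For Problem 5 it recomputes every expected count from the observed table as (row total)(column total)/n. For p-values it uses the closed-form chi-square upper tail that holds for integer degrees of freedom, a choice made here so the sketch needs only the standard library and not SciPy.

```python
from math import exp, factorial, gamma, sqrt
from statistics import NormalDist


def chi2_sf(x, k):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df k."""
    h = x / 2
    if k % 2 == 0:
        # Even df: finite Poisson-type series
        return exp(-h) * sum(h**i / factorial(i) for i in range(k // 2))
    # Odd df: normal tail plus a finite half-integer series
    tail = 2 * (1 - NormalDist().cdf(sqrt(x)))
    return tail + exp(-h) * sum(h ** (i - 0.5) / gamma(i + 0.5) for i in range(1, (k + 1) // 2))


def chi_square_gof(observed, expected):
    """Pearson goodness-of-fit statistic and df = (number of categories - 1)."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


def chi_square_independence(table):
    """Pearson statistic and df for an R x C table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(table):
        for c, o in enumerate(row):
            e = row_totals[r] * col_totals[c] / n  # expected count under independence
            stat += (o - e) ** 2 / e
    return stat, (len(table) - 1) * (len(col_totals) - 1)


# Problem 4: equal split among the four groups under H0
observed4 = [118, 142, 133, 122]
expected4 = [sum(observed4) / 4] * 4
stat4, df4 = chi_square_gof(observed4, expected4)
p4 = chi2_sf(stat4, df4)

# Problem 5: rows A, B, C (honesty), columns I-IV (exercise group)
table5 = [
    [22, 34, 72, 5],
    [77, 66, 31, 101],
    [19, 42, 30, 16],
]
stat5, df5 = chi_square_independence(table5)
p5 = chi2_sf(stat5, df5)

print(f"Problem 4: chi2={stat4:.4f}  df={df4}  p={p4:.4f}")
print(f"Problem 5: chi2={stat5:.2f}  df={df5}  p={p5:.1e}")
```

For Problem 4 the tail probability comes out near 0.43, in line with the table lookup. For Problem 5 the statistic is roughly 123 on 6 degrees of freedom, so the p-value is far below 0.0001.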
Each expected count is (row total)(column total)/515.

Expected:
        I         II        III       IV
A)      30.4738   36.6718   34.3476   31.5068
B)      63.0097   75.8252   71.0194   65.1456
C)      24.5165   29.5029   27.6330   25.3476

Observed - Expected:
        I         II        III       IV
A)      -8.4738   -2.6718   37.6524   -26.5068
B)      13.9903   -9.8252   -40.0194  35.8544
C)      -5.5165   12.4971   2.3670    -9.3476

(O - E)^2:
        I         II        III        IV
A)      71.805    7.139     1417.703   702.610
B)      195.728   96.535    1601.552   1285.538
C)      30.432    156.178   5.603      87.378

(O - E)^2/E:
        I       II      III      IV
A)      2.356   0.195   41.275   22.300
B)      3.106   1.273   22.551   19.733
C)      1.241   5.294   0.203    3.447

If we sum these we get $\chi^2 \approx 122.97$. We “look this up” on a $\chi^2$ table with $(R - 1)(C - 1) = 2 \times 3 = 6$ degrees of freedom. This number is far off the chart, so the p-value is less than 0.0001. Thus we reject the null hypothesis of no relationship between the groups and their honesty about their exercise habits, and conclude that there is indeed a difference.

6. The follow-up study also measured the LDL cholesterol levels of people in the study. Results are summarized below. Note that the n for each group is the same as in the above problems.

Group   n     mean LDL   sd LDL
I       118   171        18.1
II      142   152        15.2
III     133   133        11.5
IV      122   102        16.4

Test whether there is a difference in the average LDL level for the four groups.

We need the overall mean. Recall that the group sizes are 118, 142, 133, and 122, for a total of 515.

$$\bar{x} = \frac{118 \cdot 171 + 142 \cdot 152 + 133 \cdot 133 + 122 \cdot 102}{118 + 142 + 133 + 122} = \frac{71895}{515} = 139.6019$$

We can now compute the group sum of squares:

$$SSG = 118(171 - 139.6019)^2 + 142(152 - 139.6019)^2 + 133(133 - 139.6019)^2 + 122(102 - 139.6019)^2 = 316449.4$$

The error sum of squares is:

$$SSE = 117 \cdot 18.1^2 + 141 \cdot 15.2^2 + 132 \cdot 11.5^2 + 121 \cdot 16.4^2 = 120908.2$$

We can now construct our ANOVA table:

Source   Sum Sq     DF    Mean Sq     F
Group    316449.4   3     105483.13   445.81
Error    120908.2   511   236.61
Total    437357.6   514

We “look this number up” on the F-table with 3 numerator and 511 denominator degrees of freedom. The p-value is clearly less than 0.01, so we reject $H_0$ and conclude that there is indeed a difference in LDL level among the groups.

7. Consider the data

x   4    3   7    5    1
y   -5   4   21   14   11

The line $y = 2x + 1$ is the least squares regression line.

(a) Compute the fitted values and the residuals.

x   y    ŷ = 2x + 1   y - ŷ
4   -5   9            -14
3   4    7            -3
7   21   15           6
5   14   11           3
1   11   3            8

(b) Compute a 95% confidence interval for m.

We’ll need

$$s = \sqrt{\frac{\sum r_i^2}{n - 2}} = \sqrt{\frac{196 + 9 + 36 + 9 + 64}{3}} = 10.231.$$

The standard error for the slope requires $\sum (x_i - \bar{x})^2$, so we compute this. Note that $\bar{x} = 4$.

x      x_i - 4   (x_i - 4)^2
4      0         0
3      -1        1
7      3         9
5      1         1
1      -3        9
sum:             20

We can now compute the standard error for the slope to be

$$\frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{10.231}{\sqrt{20}} = 2.288.$$

Thus a 95% confidence interval for the slope, using the t-table with $n - 2 = 3$ degrees of freedom ($t = 3.182$), is

$$2 \pm 3.182 \cdot 2.288 = (-5.28, 9.28).$$

(c) Based on your interval above, would you reject or fail to reject the null hypothesis $H_0: m = 0$?

Since 0 is in the confidence interval, we would not reject $H_0$.

(d) Predict y for an x-value of 6.

$$\hat{y} = 2 \cdot 6 + 1 = 13$$

(e) Compute a 90% confidence interval for your prediction.

Because we are predicting a single new y-value, the interval includes the extra “1 +” term, which makes it a prediction interval. With $n - 2 = 3$ degrees of freedom, $t = 2.353$:

$$13 \pm 2.353 \cdot 10.231\sqrt{1 + \frac{1}{5} + \frac{(6 - 4)^2}{20}} = (-15.48, 41.48)$$
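The ANOVA in Problem 6 and the regression intervals in Problem 7 can be checked with a short standard-library Python sketch.

It makes three assumptions:
- The helper names are hypothetical.
- Group IV has n = 122, as the problem states.
- The regression uses n − 2 = 3 degrees of freedom, with t critical values hard-coded from a t-table (3.182 for two-sided 95%, 2.353 for two-sided 90%).

The ANOVA works from group summaries. The between-group sum of squares weights each squared deviation by group size. The within-group sum of squares is the sum of (n_i − 1)s_i².

```python
from math import sqrt


def anova_from_summary(ns, means, sds):
    """One-way ANOVA from group sizes, means, and standard deviations."""
    N, k = sum(ns), len(ns)
    grand = sum(n * m for n, m in zip(ns, means)) / N  # size-weighted overall mean
    ssg = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))
    sse = sum((n - 1) * s**2 for n, s in zip(ns, sds))
    df_g, df_e = k - 1, N - k
    f_stat = (ssg / df_g) / (sse / df_e)
    return grand, ssg, sse, df_g, df_e, f_stat


def simple_regression(xs, ys):
    """Least-squares slope m, intercept b, residual SE s, Sxx, and mean of x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = ybar - m * xbar
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))  # residual standard error, n - 2 df
    return m, b, s, sxx, xbar


# Problem 6: group IV has n = 122, as stated in the problem
grand, ssg, sse6, df_g, df_e, f6 = anova_from_summary(
    [118, 142, 133, 122], [171, 152, 133, 102], [18.1, 15.2, 11.5, 16.4]
)

# Problem 7: regression with n - 2 = 3 df
xs, ys = [4, 3, 7, 5, 1], [-5, 4, 21, 14, 11]
m, b, s, sxx, xbar = simple_regression(xs, ys)
n7 = len(xs)
T95_3DF, T90_3DF = 3.182, 2.353  # two-sided 95% and 90% t-table values, 3 df
se_m = s / sqrt(sxx)
ci_m = (m - T95_3DF * se_m, m + T95_3DF * se_m)

x_new = 6
y_hat = m * x_new + b
se_pred = s * sqrt(1 + 1 / n7 + (x_new - xbar) ** 2 / sxx)  # single new y
pi_90 = (y_hat - T90_3DF * se_pred, y_hat + T90_3DF * se_pred)

print(f"Problem 6: mean={grand:.4f} SSG={ssg:.1f} SSE={sse6:.1f} F={f6:.2f} (df {df_g}, {df_e})")
print(f"Problem 7: m={m:.1f} b={b:.1f} s={s:.3f} slope CI=({ci_m[0]:.2f}, {ci_m[1]:.2f})")
print(f"           prediction at x=6: {y_hat:.0f}, 90% PI=({pi_90[0]:.2f}, {pi_90[1]:.2f})")
```

The critical values are hard-coded because the hand solutions read them from a table. If SciPy is available, `scipy.stats.t.ppf` is the usual way to obtain them instead.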