Stat 250.3 November 19, 2003
HOMEWORK 6 - SOLUTIONS

12.19
a. $\text{s.e.}(\bar{x}) = \dfrac{s}{\sqrt{n}} = \dfrac{3.7}{\sqrt{42}} \approx 0.57$ kg. Over all possible samples of n = 42 from this population, the average difference between the sample mean and the population mean is about 0.57 kg.
b. The interval is about 6.06 to 8.34 kg, computed as $7.2 \pm 2 \times 0.57$ kg. Calculate the interval as $\text{Sample estimate} \pm 2 \times \text{Standard error}$, which here is $\bar{x} \pm 2 \times \dfrac{s}{\sqrt{n}}$, where $\bar{x} = 7.2$ kg and the standard error of the mean was found in part (a).
Interpretation: With approximately 95% confidence, we can say that in the population of men represented by this sample, the mean weight loss for men who used the diet plan would be between 6.06 kg and 8.34 kg.

12.22
a. Approximate 95% confidence interval is .264 to .416, computed as $.34 \pm 2 \times .038$.
Parameter is $p_1 - p_2$ = difference in proportions of men (population 1) and women (population 2) who have driven after having too much to drink.
Calculate the interval as $\text{Sample estimate} \pm 2 \times \text{Standard error}$, which is $\hat{p}_1 - \hat{p}_2 \pm 2 \times \text{s.e.}(\hat{p}_1 - \hat{p}_2)$.
Sample estimate = $\hat{p}_1 - \hat{p}_2 = .63 - .29 = .34$
Standard error = $\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = \sqrt{\dfrac{.63(1 - .63)}{300} + \dfrac{.29(1 - .29)}{300}} = .038$
b. With 95% confidence, we can say the interval .264 to .416 covers the difference between the proportions of men and women in the population who would say they have driven after having too much to drink. The sample proportion for men was higher, so we estimate (with 95% confidence) that the proportion for men in the population is somewhere between .264 and .416 above the proportion for women.

12.26
a. 2.52 b. 2.13 c. 4.60 d. 2.06 e. 1.99

12.49
a. Confidence interval is about .032 to .338, computed as $.185 \pm (2)(.0763)$. If $z^* = 1.96$ is used for the multiplier, the answer is about .035 to .335.
Compute the interval as $\text{Sample estimate} \pm \text{Multiplier} \times \text{Standard error}$:
Sample estimate is $\hat{p}_1 - \hat{p}_2 = .611 - .426 = .185$
Standard error is $\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = \sqrt{\dfrac{.611(1 - .611)}{131} + \dfrac{.426(1 - .426)}{61}} = .0763$
Multiplier is $z^* = 2$ (or 1.96 would be more exact)
Interpretation: With 95% confidence, we can say that in the population(s) represented by the sample(s), the difference in proportions who would say yes to this question is between about .032 and .338. The proportion is higher for women.
b. This answer will vary. The issue is whether, for this question, students in a statistics class at Penn State represent a population of only Penn State students or a broader population of college students in the United States.
c. The interval computed in part (a) does not include the value 0. This tells us that it is reasonable to conclude that in the population represented by the sample, there is a difference between the proportion of women who would say "yes" to this question and the proportion of men who would say "yes" to this question. A higher proportion of women than men would say that they would date someone with a great personality even if they did not find them attractive.

12.50
a. Confidence interval is about -.012 to .382, computed as $.185 \pm (2.576)(.0763)$. The multiplier for the 0.99 confidence level is $z^* = 2.576$. This can be found in the table at the bottom of page 424 or in the last row of Table A.2. The sample estimate of the difference and the standard error of the difference were determined in part (a) of the previous exercise.
b. The 99% confidence interval is wider than the 95% confidence interval. In general, increasing the confidence level gives a wider interval.
c. A 90% confidence interval will be narrower than the 99% confidence interval. The greater the confidence level, the wider the interval.
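The interval arithmetic in Exercises 12.49 and 12.50 can be checked with a short Python sketch. The function name diff_proportions_ci and its argument names are illustrative choices, not part of the textbook or of Minitab; the inputs are the sample proportions and sample sizes quoted in Exercise 12.49.

```python
import math


def diff_proportions_ci(p1, n1, p2, n2, multiplier=2.0):
    """Return (estimate, standard error, lower, upper) for p1 - p2.

    Hypothetical helper: implements
    sample estimate +/- multiplier * standard error
    with the unpooled standard error used in the solutions.
    """
    estimate = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = multiplier * se
    return estimate, se, estimate - margin, estimate + margin


# Exercise 12.49 inputs: women are sample 1 (n = 131), men are sample 2 (n = 61).
# Multipliers: 2 (approximate 95%), 1.96 (95%), 2.576 (99%).
for z_star in (2.0, 1.96, 2.576):
    est, se, lower, upper = diff_proportions_ci(0.611, 131, 0.426, 61, z_star)
    print(f"z* = {z_star}: s.e. = {se:.4f}, interval = ({lower:.3f}, {upper:.3f})")
```

With the multiplier 2 this reproduces the .032 to .338 interval of Exercise 12.49(a); changing only the multiplier shows directly how the confidence level controls the width.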

12.74
First, create a two-way table for the two variables to determine relevant counts (using the cross tabulation procedure in Minitab). Of $n_1 = 316$ Republicans, the number owning a gun is 163, so $\hat{p}_1 = \dfrac{163}{316} = .516$. Of $n_2 = 373$ Democrats, the number owning a gun is 131, so $\hat{p}_2 = \dfrac{131}{373} = .351$. The 95% confidence interval for $p_1 - p_2$ is about .09 to .24 (note that it does not cover 0, which is evidence of a difference in the population).

Minitab Output for Exercise 12.74
    Sample    X    N  Sample p
    1       163  316  0.515823
    2       131  373  0.351206
    Estimate for p(1) - p(2): 0.164616
    95% CI for p(1) - p(2): (0.0912489, 0.237984)

12.75
a. The 98% confidence interval given by Minitab is 183.11 to 203.16.

Output for Exercise 12.75a
    Variable   N    Mean  StDev  SE Mean  98.0% CI
    control   30  193.13  22.30     4.07  (183.11, 203.16)

Interpretation: With 98% confidence, we can say that in the population of individuals who have not had a heart attack, the mean cholesterol level is between 183.11 and 203.16.
Assumption and necessary conditions: The sample represents a random sample from a larger population of individuals who have not had a heart attack. The data are roughly symmetric and there are no outliers. A comparative dotplot is shown here because part (b) asks about the patients who have had a heart attack.

Figure for Exercise 12.75 parts a and b

b. The 98% confidence interval given by Minitab is 231.63 to 276.22.

Output for Exercise 12.75b
    Variable   N    Mean  StDev  SE Mean  98.0% CI
    2-Day     28  253.93  47.71     9.02  (231.63, 276.22)

Interpretation: With 98% confidence, we can say that in the population of individuals who have had a heart attack, the mean cholesterol level two days after the attack is between 231.63 and 276.22.
Assumption and necessary conditions: The sample represents a random sample from a larger population of individuals who have had a heart attack. The data are roughly symmetric and there are no outliers.
c.
A 98% confidence interval (unpooled procedure) for the difference in population means is given by Minitab as 36.74 to 84.85.

Output for Exercise 12.75c
    Two-sample T for 2-Day vs control
              N   Mean  StDev  SE Mean
    2-Day    28  253.9   47.7      9.0
    control  30  193.1   22.3      4.1
    Difference = mu 2-Day - mu control
    Estimate for difference: 60.80
    98% CI for difference: (36.74, 84.85) ……… DF = 37

Interpretation: With 98% confidence we can say that in the population of people who have suffered a heart attack, the mean cholesterol (measured two days after the attack) is between 36.74 and 84.85 points higher than the mean cholesterol in a population of people who have not had a heart attack.
Assumption and necessary conditions: See parts (a) and (b).

13.5
Figure for Exercise 13.5a
a. The p-value is 0.028. df = n - 1 = 28 - 1 = 27. In Table A.3, the one-sided p-value under t = 2.0 is given as 0.028 (in the df = 27 row).
Figure for Exercise 13.5b
b. The p-value is 0.972. It is the area to the right of t = -2.0 in a t-distribution with df = 27. This area can be found as $P(t > -2.0) = 1 - P(t < -2.0)$, i.e., subtract the area to the left of -2.0 from 1. By symmetry, the area to the left of -2.0 equals the area to the right of 2.0. Table A.3 gives this area (probability) as 0.028, so the p-value = 1 - 0.028 = 0.972.
Figure for Exercise 13.5c and d
c. The p-value is 0.048. df = n - 1 = 81 - 1 = 80. In Table A.3, under t = 2.0, the one-sided p-value is given as 0.024. The two-sided p-value is 2(0.024) = 0.048.
d. The p-value = 0.048, as it was in part (c), and the figure will be the same as for part (c).

13.21
Step 1: $H_0: \mu_1 - \mu_2 = 0$, or equivalently, $\mu_1 = \mu_2$
$H_a: \mu_1 - \mu_2 \neq 0$, or equivalently, $\mu_1 \neq \mu_2$
$\mu_1$ = mean sleep hours for population of UC Davis students
$\mu_2$ = mean sleep hours for population of Penn State students
Step 2: The sample sizes are sufficiently large to proceed. We must assume the samples represent random samples from the larger populations of students at these schools.
Test statistic is $t = \dfrac{\text{Sample statistic} - \text{Null value}}{\text{Null standard error}} = \dfrac{-0.18 - 0}{0.192} = -0.94$
Sample statistic is $\bar{x}_1 - \bar{x}_2 = 6.93 - 7.11 = -0.18$ hours.
Standard error = $\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = \sqrt{\dfrac{1.71^2}{173} + \dfrac{1.95^2}{190}} = \sqrt{0.0369} = 0.192$
Step 3: p-value $\approx$ 0.35 for either procedure. It is calculated as $2P(t > 0.94)$ for the unpooled procedure, and as $2P(t > 0.93)$ for the pooled procedure, with df = 172 (the minimum of $n_1 - 1$ and $n_2 - 1$). With Table A.3, the two-sided p-value would be estimated to be greater than 2(.102) = 0.204.
Steps 4 and 5: Do not reject the null hypothesis. We do not reject the possibility that the mean hours of sleep are the same for the populations of students at the two schools.

13.78
Step 1: $H_0: \mu_d = 0$ vs $H_a: \mu_d > 0$
$\mu_d$ = mean "2 day - 4 day" cholesterol difference in population of heart attack patients
Step 2: Assume the sample represents a random sample from a larger population of heart attack patients. The sample size does not meet the arbitrary standard of n = 30 for a "large" sample size, so we must use the data set to check that there are no outliers and that the difference data are reasonably symmetric. A boxplot of the differences shows that the necessary conditions are met.
Null standard error is $\dfrac{s_d}{\sqrt{n}} = \dfrac{38.28}{\sqrt{28}} = 7.234$
Test statistic is $t = \dfrac{\text{Sample statistic} - \text{Null value}}{\text{Null standard error}} = \dfrac{23.29 - 0}{7.234} = 3.22$
Step 3: p-value < 0.003. It is the area (probability) to the right of t = 3.22 in a t-distribution with df = n - 1 = 28 - 1 = 27. From Table A.3 it can be determined that the p-value is less than 0.003. Software or an appropriate calculator can be used to determine more exactly that the p-value is 0.0017.
Steps 4 and 5: Reject the null hypothesis for $\alpha = 0.05$. The conclusion about the population of heart attack patients represented by the sample is that, on average, cholesterol levels are higher two days after the attack than they are four days after the attack.
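The two test statistics above (Exercises 13.21 and 13.78) follow the same pattern: the sample statistic minus the null value, divided by the null standard error. A minimal Python sketch of that arithmetic follows; the function names unpooled_t and paired_t are hypothetical, and the sketch stops at the t statistic because the Python standard library has no t-distribution, so p-values still come from Table A.3 or statistical software.

```python
import math


def unpooled_t(mean1, s1, n1, mean2, s2, n2, null_value=0.0):
    """Two-sample t statistic with the unpooled standard error.

    Hypothetical helper; returns (t, standard error).
    """
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    t = ((mean1 - mean2) - null_value) / se
    return t, se


def paired_t(mean_diff, s_diff, n, null_value=0.0):
    """Paired t statistic computed from summary statistics of the differences.

    Hypothetical helper; returns (t, standard error).
    """
    se = s_diff / math.sqrt(n)
    t = (mean_diff - null_value) / se
    return t, se


# Exercise 13.21: UC Davis (sample 1) vs. Penn State (sample 2) sleep hours.
t_sleep, se_sleep = unpooled_t(6.93, 1.71, 173, 7.11, 1.95, 190)
print(f"13.21: s.e. = {se_sleep:.3f}, t = {t_sleep:.2f}")

# Exercise 13.78: "2 day - 4 day" cholesterol differences, n = 28.
t_chol, se_chol = paired_t(23.29, 38.28, 28)
print(f"13.78: s.e. = {se_chol:.3f}, t = {t_chol:.2f}")
```

Working from summary statistics rather than raw data keeps the sketch aligned with the hand calculations in the solutions.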

13.80
Step 1: $H_0: \mu_d \leq 0$ vs $H_a: \mu_d > 0$
$\mu_d$ = mean "2 day - 14 day" cholesterol difference in population of heart attack patients
Step 2: Assume the sample represents a random sample from a larger population of heart attack patients. A dotplot (or a boxplot) shows that the sample of differences is reasonably symmetric and there are no outliers.

Figure for Exercise 13.80

The appropriate test is a t-test for paired data. Minitab output follows.

Output for Exercise 13.80
    Paired T for 2-Day - 14-Day
                 N   Mean  StDev  SE Mean
    2-Day       19  259.5   47.9     11.0
    14-Day      19  221.5   43.2      9.9
    Difference  19   38.0   50.4     11.6
    95% lower bound for mean difference: 18.0
    T-Test of mean difference = 0 (vs > 0): T-Value = 3.29  P-Value = 0.002

Step 2 continued and Steps 3, 4, and 5: The test statistic is t = 3.29 (df = 18) and the p-value is 0.002. Reject the null hypothesis and conclude that in the population of heart attack patients there is a decrease, on average, in cholesterol level from day 2 to day 14 after the attack. The observed magnitude of the decrease is $\bar{d} = 38$ points.

13.82
Step 1: $H_0: p_1 - p_2 = 0$, or equivalently, $p_1 = p_2$
$H_a: p_1 - p_2 \neq 0$, or equivalently, $p_1 \neq p_2$
$p_1$ = proportion favoring legalization of marijuana in the U.S. population of men, and
$p_2$ = proportion favoring legalization of marijuana in the U.S. population of women
Step 2: The sample represents a random sample from the U.S. population, and the sample size is sufficiently large that observed counts in both categories (legal or not legal) are greater than 10 in both groups (males and females). Minitab output is given below.

Output for Exercise 13.82
    Sample        X    N  Sample p
    1 (Male)    118  413  0.285714
    2 (Female)  116  591  0.196277
    Estimate for p(1) - p(2): 0.0894368
    95% CI for p(1) - p(2): (0.0353663, 0.143507)
    Test for p(1) - p(2) = 0 (vs not = 0): Z = 3.30  P-Value = 0.001

Step 2 continued and Steps 3, 4, and 5: The test statistic is z = 3.30 and the p-value is 0.001. Reject the null hypothesis and conclude that in the U.S.
population, different proportions of males and females favor legalization of marijuana. Note that the observed proportion favoring legalization is higher for males ($\hat{p}_1 = .29$) than for females ($\hat{p}_2 = .20$).
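
The z statistic in Exercise 13.82 can be reproduced from the counts in the Minitab output. The sketch below assumes the pooled null standard error, which is the usual choice when testing $p_1 = p_2$; the output itself does not say which standard error Minitab used, so the pooling is an assumption, and the function name two_proportion_z_test is illustrative only.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 = p2 from counts.

    Hypothetical helper; assumes the pooled standard error under H0.
    Returns (z, two-sided p-value).
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled estimate of the common proportion under the null hypothesis.
    pooled = (x1 + x2) / (n1 + n2)
    null_se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / null_se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Counts from the output for Exercise 13.82: males 118 of 413, females 116 of 591.
z, p_value = two_proportion_z_test(118, 413, 116, 591)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.3f}")
```

Under the pooling assumption, the result agrees with the z = 3.30 and p-value of 0.001 reported in the output.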