Chapter 2 solutions - Penn State Department of Statistics

Stat 250.3
November 19, 2003
HOMEWORK 6 - SOLUTIONS
12.19
a. $\text{s.e.}(\bar{x}) = \dfrac{s}{\sqrt{n}} = \dfrac{3.7}{\sqrt{42}} = 0.57$ kg. Over all possible samples of n = 42 from this population, the average difference between the sample mean and the population mean is about 0.57 kg.
b. The interval is about 6.06 to 8.34 kg, computed as $7.2 \pm 2 \times 0.57$ kg.
Calculate the interval as Sample estimate ± 2 × Standard error, which here is $\bar{x} \pm 2 \times \dfrac{s}{\sqrt{n}}$.
$\bar{x} = 7.2$ kg and the standard error of the mean was found in part (a).
Interpretation: With approximate 95% confidence, we can say that in the population of men represented by this
sample the mean weight loss for men if they used the diet plan would be between 6.06 kg and 8.34 kg.
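The calculation in 12.19 can be checked with a minimal Python sketch. The function name `mean_confidence_interval` and its `multiplier` parameter are illustrative choices, not part of the original solution; the default multiplier of 2 matches the approximate 95% rule used above.

```python
import math

def mean_confidence_interval(xbar, s, n, multiplier=2.0):
    """Return (standard error, lower, upper) for a one-sample mean.

    Hypothetical helper: interval = xbar +/- multiplier * s / sqrt(n).
    """
    se = s / math.sqrt(n)      # standard error of the sample mean
    margin = multiplier * se   # a multiplier of 2 gives roughly 95% confidence
    return se, xbar - margin, xbar + margin

# Summary values from Exercise 12.19: mean 7.2 kg, s = 3.7 kg, n = 42.
se, lower, upper = mean_confidence_interval(7.2, 3.7, 42)
print(f"s.e. = {se:.2f} kg, interval = {lower:.2f} to {upper:.2f} kg")
```

Passing a different `multiplier` (for example a t* value from Table A.2) gives the interval for another confidence level.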
12.22
a. Approximate 95% confidence interval is .264 to .416, computed as $.34 \pm 2 \times .038$.
Parameter is $p_1 - p_2$ = difference in proportions of men (population 1) and women (population 2) who have driven after having too much to drink.
Calculate interval as Sample estimate ± 2 × Standard error, which is $\hat{p}_1 - \hat{p}_2 \pm 2 \times \text{s.e.}(\hat{p}_1 - \hat{p}_2)$.
Sample estimate = $\hat{p}_1 - \hat{p}_2 = .63 - .29 = .34$
Standard error = $\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = \sqrt{\dfrac{.63(1-.63)}{300} + \dfrac{.29(1-.29)}{300}} = .038$
b. With 95% confidence, we can say the interval .264 to .416 covers the difference between the proportions of men
and women in the population who would say they have driven after having too much to drink. The sample proportion
for men was higher, so we estimate (with 95% confidence) the proportion for men in the population is somewhere
between .264 and .416 above the proportion for women.
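As a rough check on 12.22, the sketch below computes the unpooled standard error and the approximate 95% interval for a difference in two proportions. The function name `diff_proportions_ci` is hypothetical and the code uses only the sample proportions and sample sizes given above.

```python
import math

def diff_proportions_ci(p1, n1, p2, n2, multiplier=2.0):
    """Return (estimate, standard error, lower, upper) for p1 - p2.

    Hypothetical helper using the unpooled standard error
    sqrt(p1(1-p1)/n1 + p2(1-p2)/n2).
    """
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, se, diff - multiplier * se, diff + multiplier * se

# Exercise 12.22: men .63 of 300, women .29 of 300.
diff, se, lower, upper = diff_proportions_ci(0.63, 300, 0.29, 300)
print(f"estimate = {diff:.2f}, s.e. = {se:.4f}")
print(f"interval = {lower:.4f} to {upper:.4f}")
```

Because the code carries the unrounded standard error, its endpoints can differ from the hand calculation in the third decimal place; the solution rounds the standard error to .038 before multiplying.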
12.26
a. 2.52
b. 2.13
c. 4.60
d. 2.06
e. 1.99
12.49
a. Confidence interval is about .032 to .338, computed as $.185 \pm (2)(.0763)$.
If z* = 1.96 is used for the multiplier, the answer is about .035 to .335.
Compute interval as Sample estimate ± Multiplier × Standard error:
Sample estimate is $\hat{p}_1 - \hat{p}_2 = .611 - .426 = .185$
Standard error is $\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = \sqrt{\dfrac{.611(1-.611)}{131} + \dfrac{.426(1-.426)}{61}} = .0763$
Multiplier is z* = 2 (or 1.96 would be more exact)
Interpretation: With 95% confidence, we can say that in the population(s) represented by the sample(s), the difference
in proportions who would say yes to this question is between about .032 and .338. The proportion is higher for women.
b. This answer will vary. The issue concerns whether, for this question, students in a statistics class at Penn State
represent a population of only Penn State students or whether they represent a broader population of college students
in the United States.
c. The interval computed in part (a) does not include the value 0. This tells us that it is reasonable to conclude that in
the population represented by the sample, there is a difference between the proportion of women who would say yes to
this question and the proportion of men who would say “yes” to this question. A higher proportion of women than men
would say that they would date someone with a great personality even if they did not find them attractive.
12.50
a. Confidence interval is about .112 to .382, computed as .185  (2.576)(.0763).
Multiplier for 0.99 confidence level is z * = 2.576. This can be found the table at the bottom of page 424 or in the last
row of Table A.2.
The sample estimate of the difference and the standard error of the difference were determined in part (a) of the
previous exercise.
b. The 99% confidence interval is wider than the 95% confidence interval. In general, increasing the confidence level
gives a wider interval.
c. A 90% confidence interval will be narrower than the 99% confidence interval. The greater the confidence level, the
wider the interval.
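The relationship between confidence level and interval width can be illustrated with a short Python sketch. It assumes the sample estimate .185 and standard error .0763 from Exercise 12.49; the helper name `z_multiplier` is hypothetical, and the multipliers come from the standard normal distribution rather than from the textbook table.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """z* such that the central area under the standard normal equals `confidence`."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

estimate, se = 0.185, 0.0763  # values from Exercise 12.49

widths = []
for level in (0.90, 0.95, 0.99):
    z = z_multiplier(level)
    lower, upper = estimate - z * se, estimate + z * se
    widths.append(upper - lower)
    print(f"{level:.0%}: z* = {z:.3f}, interval = {lower:.3f} to {upper:.3f}")
```

The printed widths grow as the confidence level rises, which is the pattern described in parts (b) and (c).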
12.74
First, create a two-way table for the two variables to determine relevant counts (using the cross tabulation procedure in Minitab). Of $n_1$ = 316 Republicans, the number owning a gun is 163 so $\hat{p}_1 = \dfrac{163}{316} = .516$. Of $n_2$ = 373 Democrats, the number owning a gun is 131 so $\hat{p}_2 = \dfrac{131}{373} = .351$. The 95% confidence interval for $p_1 - p_2$ is about .09 to .24 (note that it does not cover 0 so is evidence of difference in the population).
Minitab Output for Exercise 12.74
Sample    X    N  Sample p
1       163  316  0.515823
2       131  373  0.351206
Estimate for p(1) - p(2): 0.164616
95% CI for p(1) - p(2): (0.0912489, 0.237984)
12.75
a. The 98% confidence interval given by Minitab is 183.11 to 203.16.
Output for Exercise 12.75a
Variable   N    Mean  StDev  SE Mean  98.0% CI
control   30  193.13  22.30     4.07  (183.11, 203.16)
Interpretation: With 98% confidence, we can say that in the population of individuals who have not had a heart attack,
the mean cholesterol level is between 183.11 and 203.16.
Assumption and necessary conditions: The sample represents a random sample from a larger population of individuals
who have not had a heart attack. The data are roughly symmetric and there are no outliers. A comparative dotplot is
shown here because part (b) asks about the patients who have had a heart attack.
Figure for Exercise 12.75 parts a and b
b. The 98% confidence interval given by Minitab is 231.63 to 276.22.
Output for Exercise 12.75b
Variable   N    Mean  StDev  SE Mean  98.0% CI
2-Day     28  253.93  47.71     9.02  (231.63, 276.22)
Interpretation: With 98% confidence, we can say that in the population of individuals who have had a heart attack, the
mean cholesterol level two days after the attack is between 231.63 and 276.22.
Assumption and necessary conditions: The sample represents a random sample from a larger population of individuals
who have had a heart attack. The data are roughly symmetric and there are no outliers.
c. A 98% confidence interval (unpooled procedure) for the difference in population means is given by Minitab
as 36.74 to 84.85.
Output for Exercise 12.75c
Two-sample T for 2-Day vs control
          N   Mean  StDev  SE Mean
2-Day    28  253.9   47.7      9.0
control  30  193.1   22.3      4.1
Difference = mu 2-Day - mu control
Estimate for difference: 60.80
98% CI for difference: (36.74, 84.85)
……… DF = 37
Interpretation: With 98% confidence we can say that in the population of people who have suffered a heart attack, the
mean cholesterol (measured two days after the attack) is between 36.74 and 84.85 points higher than the mean
cholesterol in a population of people who have not had a heart attack.
Assumption and necessary conditions: See parts (a) and (b).
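The unpooled output in part (c) can be partly reproduced from the summary statistics in parts (a) and (b). The sketch below is an illustration only: `welch_se_and_df` is a hypothetical helper that applies the usual Welch-Satterthwaite approximation for the degrees of freedom, and the multiplier is backed out from the reported interval rather than looked up in a t table.

```python
import math

def welch_se_and_df(s1, n1, s2, n2):
    """Unpooled standard error and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2   # estimated variance of each sample mean
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, df

# Summary statistics from the outputs for parts (a) and (b).
se, df = welch_se_and_df(47.71, 28, 22.30, 30)

# Half-width of the reported 98% interval (36.74, 84.85) divided by the
# standard error gives the t* multiplier the software must have used.
half_width = (84.85 - 36.74) / 2
implied_multiplier = half_width / se

print(f"s.e. = {se:.2f}, approximate df = {df:.2f}")
print(f"implied t* multiplier = {implied_multiplier:.2f}")
```

The approximate df is a little above 37, which appears consistent with the reported DF = 37 if the software rounds the value down to a whole number.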
13.5
Figure for Exercise 13.5a
a. The p-value is 0.028. df = n - 1 = 28 - 1 = 27. In Table A.3, the one-sided p-value under t = 2.0 is given as 0.028 (in the df = 27 row).
Figure for Exercise 13.5b
b. The p-value is 0.972. It is the area to the right of t = -2.0 in a t-distribution with df = 27. This area can be found as P(t > -2.0) = 1 - P(t < -2.0), i.e., subtract the area to the left of -2.0 from 1.
By symmetry, the area to the left of -2.0 equals the area to the right of 2.0. Table A.3 gives this area (probability) as 0.028, so the p-value = 1 - 0.028 = 0.972.
Figure for Exercise 13.5c and d
c. The p-value is 0.048. df = n - 1 = 81 - 1 = 80. In Table A.3, under t = 2.0, the one-sided p-value is given as 0.024. The two-sided p-value is 2(0.024) = 0.048.
d. The p-value = 0.048, as it was in part (c), and the figure will be the same as for part (c).
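The table lookups in 13.5 can be approximated numerically without statistical software. The following Python sketch integrates the t density with Simpson's rule; the function names `t_pdf` and `t_upper_tail` are hypothetical, and the result is a numerical approximation, not the textbook's table value.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with `df` degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3          # area to the right of |t|
    return upper if t >= 0 else 1 - upper  # symmetry handles negative t

p_a = t_upper_tail(2.0, 27)        # part (a): area to the right of 2.0
p_b = t_upper_tail(-2.0, 27)       # part (b): area to the right of -2.0
p_c = 2 * t_upper_tail(2.0, 80)    # part (c): two-sided p-value
print(f"(a) {p_a:.3f}  (b) {p_b:.3f}  (c) {p_c:.3f}")
```

Small differences from the answers above in the last digit are possible, since part (c) of the solution doubles a table value that was already rounded.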
13.21
Step 1: H0: $\mu_1 - \mu_2 = 0$, or equivalently $\mu_1 = \mu_2$
Ha: $\mu_1 - \mu_2 \neq 0$, or equivalently, $\mu_1 \neq \mu_2$
$\mu_1$ = mean sleep hours for population of UC Davis students
$\mu_2$ = mean sleep hours for population of Penn State students
Step 2: The sample sizes are sufficiently large to proceed. We must assume the samples represent random samples
from the larger populations of students at these schools.
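Test statistic is $t = \dfrac{\text{Sample statistic} - \text{Null value}}{\text{Null standard error}} = \dfrac{-0.18 - 0}{0.192} = -0.94$
Sample statistic is $\bar{x}_1 - \bar{x}_2 = 6.93 - 7.11 = -0.18$ hours.
Standard error = $\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = \sqrt{\dfrac{1.71^2}{173} + \dfrac{1.95^2}{190}} = \sqrt{0.0369} = 0.192$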
Step 3: p-value ≈ 0.35 for either procedure. It is calculated as 2P(t > 0.94) for the unpooled procedure, and as 2P(t > 0.93) for the pooled procedure. df ≈ 172 (minimum of $n_1 - 1$ and $n_2 - 1$). With Table A.3, the two-sided p-value would be
estimated to be greater than 2(.102) = 0.204.
Steps 4 and 5: Do not reject the null hypothesis. We do not reject the possibility that the mean hours of sleep are the
same for the populations of students at the two schools.
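A short Python sketch of the unpooled calculation in 13.21 follows. The function name `two_sample_t` is hypothetical. Because the degrees of freedom are large (about 172), the sketch approximates the two-sided p-value with the standard normal distribution; that substitution is an assumption of this example, not the procedure used in the solution.

```python
import math
from statistics import NormalDist

def two_sample_t(x1, s1, n1, x2, s2, n2):
    """Unpooled two-sample t statistic and its standard error."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (x1 - x2) / se, se

# Exercise 13.21: UC Davis (6.93, 1.71, n = 173), Penn State (7.11, 1.95, n = 190).
t, se = two_sample_t(6.93, 1.71, 173, 7.11, 1.95, 190)

# Normal approximation to the two-sided p-value (reasonable for large df).
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
print(f"s.e. = {se:.3f}, t = {t:.2f}, approximate p-value = {p_approx:.2f}")
```

The approximate p-value is far above 0.05, in line with the decision not to reject the null hypothesis.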
13.78
Step 1: H0: $\mu_d = 0$ vs Ha: $\mu_d > 0$
$\mu_d$ = mean “2 day - 4 day” cholesterol difference in population of heart attack patients
Step 2: Assume the sample represents a random sample from a larger population of heart attack patients. The sample size
does not meet the arbitrary standard of n = 30 for a “large” sample size. So, we must use the data set to check that there are
no outliers and that the difference data are reasonably symmetric. A boxplot of the differences shows that the necessary
conditions are met.
Null standard error is $\dfrac{s_d}{\sqrt{n}} = \dfrac{38.28}{\sqrt{28}} = 7.234$
Test statistic is $t = \dfrac{\text{Sample statistic} - \text{Null value}}{\text{Null standard error}} = \dfrac{23.29 - 0}{7.234} = 3.22$.
Step 3: p-value < 0.003. It is the area (probability) to the right of t = 3.22 in a t-distribution with df = n - 1 = 28 - 1 = 27.
From Table A.3 it can be determined that the p-value is less than 0.003.
Software or an appropriate calculator can be used to determine more exactly that the p-value is 0.0017.
Steps 4 and 5: Reject the null hypothesis for α = 0.05. The conclusion about the population of heart attack patients
represented by the sample is that, on average, cholesterol levels are higher two days after the attack than they are four days
after the attack.
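The paired t statistic for 13.78 depends only on the mean and standard deviation of the differences. A minimal sketch, with the hypothetical helper name `paired_t` and the summary values quoted in the solution:

```python
import math

def paired_t(dbar, s_d, n, null_value=0.0):
    """Return (t statistic, standard error, df) for a paired t-test from summary values."""
    se = s_d / math.sqrt(n)        # null standard error of the mean difference
    t = (dbar - null_value) / se
    return t, se, n - 1

# Exercise 13.78: mean difference 23.29, s_d = 38.28, n = 28 pairs.
t, se, df = paired_t(23.29, 38.28, 28)
print(f"s.e. = {se:.3f}, t = {t:.2f}, df = {df}")
```

The same helper applies to any paired design once the differences have been summarized; it does not compute the p-value, which still comes from the t-distribution with the returned df.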
13.80
Step 1: H0: $\mu_d \le 0$ vs Ha: $\mu_d > 0$
$\mu_d$ = mean “2 day - 14 day” cholesterol difference in population of heart attack patients
Step 2: Assume the sample represents a random sample from a larger population of heart attack patients. A dotplot (or
a boxplot) shows that the sample of differences is reasonably symmetric and there are no outliers.
Figure for Exercise 13.80
The appropriate test is a t-test for paired data. Minitab output follows.
Output for Exercise 13.80
Paired T for 2-Day - 14-Day
             N   Mean  StDev  SE Mean
2-Day       19  259.5   47.9     11.0
14-Day      19  221.5   43.2      9.9
Difference  19   38.0   50.4     11.6
95% lower bound for mean difference: 18.0
T-Test of mean difference = 0 (vs > 0): T-Value = 3.29 P-Value = 0.002
Step 2 continued and Steps 3, 4, and 5: The test statistic is t = 3.29 (df = 18) and the p-value is 0.002. Reject the null
hypothesis and conclude that in the population of heart attack patients there is a decrease, on average, in cholesterol
level from day 2 to day 14 after the attack. The observed magnitude of the decrease is $\bar{d} = 38$ points.
13.82
Step 1: H0: $p_1 - p_2 = 0$, or equivalently, $p_1 = p_2$
Ha: $p_1 - p_2 \neq 0$, or equivalently, $p_1 \neq p_2$
p1 = proportion favoring legalization of marijuana in the U.S. population of men, and
p2 = proportion favoring legalization of marijuana in the U.S. population of women
Step 2: The sample represents a random sample from the U.S. population and the sample size is sufficiently large so
that observed counts in both categories (legal or not legal) are greater than 10 in both groups (males and females).
Minitab output is given below.
Output for Exercise 13.82
Sample        X    N  Sample p
1 (Male)    118  413  0.285714
2 (Female)  116  591  0.196277
Estimate for p(1) - p(2): 0.0894368
95% CI for p(1) - p(2): (0.0353663, 0.143507)
Test for p(1) - p(2) = 0 (vs not = 0): Z = 3.30 P-Value = 0.001
Step 2 continued and Steps 3, 4, and 5: The test statistic is z = 3.30 and the p-value is 0.001. Reject the null hypothesis
and conclude that in the U.S. population, different proportions of males and females favor legalization of marijuana.
Note that the observed proportion favoring legalization is higher for males ($\hat{p}_1 = .29$) than for females ($\hat{p}_2 = .20$).
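The z statistic in the output for 13.82 can be approximated from the counts alone. The sketch below assumes the null standard error is built from the pooled sample proportion, which is the usual choice when testing $p_1 = p_2$ and appears consistent with the reported Z = 3.30; the function name `two_proportion_z_test` is hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for p1 = p2 using the pooled proportion under H0."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)   # combined proportion, assuming H0 is true
    se0 = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Counts from the output: 118 of 413 males, 116 of 591 females.
z, p_value = two_proportion_z_test(118, 413, 116, 591)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
```

The confidence interval in the same output uses the unpooled standard error instead, because the interval does not assume the two proportions are equal.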