Chapter 12 Study Guide Solutions

Study Guide
12.1a
12.1b
Ideal Response
2
No, the conditions for performing inference are not met. The variance of the residuals increases
as the laboratory measurement increases.
4
Linear: The residual plot is reasonably centered around 0. This means that the scatterplot is
approximately linear. Independent: This was a randomized experiment. Due to the random
assignment, the observations can be viewed as independent. Normal: The histogram is mound
shaped and approximately symmetric so the residuals could follow a Normal distribution.
Equal Variance: The residual plot shows roughly equal scatter for all x values. Random: This
was a randomized experiment. The conditions are met.
6
α is the y-intercept. In this case it would measure the BAC level if no beers had been drunk.
We would expect this to be 0, and the estimate is close to 0 with a value of -0.0127. β is the
slope. It tells us how much the BAC increases, on average, with the drinking of each additional
beer. The estimate for this parameter is 0.018. In other words, we expect the BAC level to
increase by 0.018 with each additional beer. Finally, σ measures the standard deviation of
BAC values about the population regression line. In this case the estimate is 0.0204. This means
that the actual BAC level will vary from the estimated value by about 0.0204 on average.
8
(a) If we repeated the experiment many times, the slope of the sample regression line would
typically vary by about 0.0024 from the true slope of the population regression line for
predicting BAC level from the number of beers consumed.
(b) Since there are 16 observations, the appropriate t distribution has 14 degrees of freedom
and t* = 2.977. This leads to the confidence interval of
0.018 ± 2.977(0.0024) = 0.018 ± 0.007 = (0.011, 0.025).
(c) We are 99% confident that the interval from 0.011 to 0.025 captures the true slope of the
population regression line for predicting BAC level from the number of beers consumed.
(d) If we were to repeat the experiment many times and compute confidence intervals for the
regression slope in each case, about 99% of the resulting intervals would contain the slope
of the population regression line.
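The arithmetic in part (b) can be checked with a short script. This is a minimal sketch and not part of the original solution: the function name `slope_interval` is hypothetical, and the critical value t* = 2.977 is taken from the solution rather than computed.

```python
def slope_interval(b, se_b, t_star):
    """Return (lower, upper) for the t interval b ± t* · SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin


# Values from part (b): slope estimate, its standard error, and t* for df = 14.
lower, upper = slope_interval(b=0.018, se_b=0.0024, t_star=2.977)
print(f"({lower:.3f}, {upper:.3f})")  # prints (0.011, 0.025)
```

Passing t* in as an argument keeps the sketch free of any statistics library; a table or calculator supplies the critical value.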
10
State:
We want to construct a 90% confidence interval for the true slope β of the
population regression line relating heights and arm spans of students in a large
school.
Plan:
If the conditions are met, we will use a t-interval for the slope to estimate β. We
are assuming that the conditions are met here.
Do:
This leads to the confidence interval of:
0.8404 ± 1.746(0.0809) = 0.8404 ± 0.1413 = (0.6991, 0.9817)
Conclude:
We are 90% confident that the interval from 0.6991 to 0.9817 captures the true
slope of the population regression line predicting height from arm span.
12
(a) ŷ = -1.286 + 11.894(5) = 58.184 clusters
(b) σ measures the typical amount of error between the actual values and the predicted values.
In this case our estimate for σ is 6.419, so we would expect our prediction of clusters of
beetles to be off by about that much on average.
12.c
14
(a) The scatterplot suggests that there is a moderately strong negative linear relationship
between the amount of time spent at the table and the calories consumed for young children.
(b) The equation for the line is ŷ = 560.65 - 3.0771x where ŷ is the calories consumed and x is
the time spent at the table.
(c) The y-intercept says that if there is no time spent at the table, we would predict the average
number of calories consumed to be 560.65. In this case that is extrapolation, as the smallest
amount of time measured was 20 minutes. Also, clearly, if the children spend no time at the
table, they cannot consume 560 calories. The slope says that for each additional minute at
the table we can expect the average caloric consumption to decrease by 3.0771 calories.
(d) State:
We want to perform a test of H0: β = 0 versus Ha: β < 0 where β is the true
slope of the population regression line relating time at the table to caloric
consumption. We will use a significance level α = 0.01.
Plan:
If the conditions are met, we will do a t-test for the slope β. The residual plot
and the histogram of the residuals are given below. Linear: The scatterplot is
approximately linear. Independent: There were 20 toddlers observed. This is
clearly less than 10% of all possible toddlers. Normal: The histogram of the
residuals is mound shaped and approximately symmetric so the residuals could
follow a Normal distribution. Equal Variance: The residual plot shows roughly
equal scatter for all x values. Random: The data come from a random sample.
The conditions are met.
Do:
According to the output, the test statistic is -3.62 and the one-sided P-value using
df = 18 is 0.001.
Conclude:
Since the P-value is less than 0.01 we reject the null hypothesis and conclude that
there is convincing evidence of a negative linear relationship between time at the
table and caloric consumption.
16
(a) State:
We want to construct a 98% t-interval for β, the true slope of the population
regression line relating time at the table and caloric consumption.
Plan:
If the conditions are met, we will construct a t-interval for the slope β. The
conditions were checked in Exercise 12.14.
Do:
The sample size is 20 so the appropriate distribution has 18 degrees of freedom
and t* = 2.552. This leads to the confidence interval of
-3.0771 ± 2.552(0.8498) = -3.0771 ± 2.1687 = (-5.2458, -0.9084).
Conclude:
We are 98% confident that the interval from -5.2458 to -0.9084 contains the true
slope of the population regression line for predicting calorie consumption from
time at the table. In Exercise 12.14 we rejected the null hypothesis that the true
slope was 0. The interval does not contain 0, so the conclusions are the same.
(b) (i) The typical error when using the regression line to predict calorie consumption is about
23.4 calories.
(ii) Approximately 42.1% of the variation in calorie consumption can be explained by the
linear relationship with the time spent at the table.
(iii) If samples like this were observed many times, the estimated slope would differ from
the slope of the true regression line for predicting calorie consumption from time at the
table by an average of 0.8498.
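The test statistic in Exercise 14 and the interval in Exercise 16 both come from the same slope estimate and standard error, so they can be reproduced together. The following is a rough sketch under stated assumptions: the helper name `slope_t_statistic` is hypothetical, the values -3.0771, 0.8498, and t* = 2.552 are copied from the solutions, and the P-value is not recomputed here.

```python
def slope_t_statistic(b, se_b, beta_0=0.0):
    """Standardized slope: t = (b - beta_0) / SE_b."""
    return (b - beta_0) / se_b


n = 20                      # toddlers observed
df = n - 2                  # degrees of freedom for inference about the slope
b, se_b = -3.0771, 0.8498   # slope estimate and its standard error

t_stat = slope_t_statistic(b, se_b)

t_star = 2.552              # critical value for 98% confidence, df = 18 (from the solution)
margin = t_star * se_b
interval = (b - margin, b + margin)

print(df, round(t_stat, 2), tuple(round(v, 4) for v in interval))
```

The interval lies entirely below 0, which matches the decision to reject H0: β = 0 in favor of a negative slope.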
20
(a) State:
We want to perform a test of H0: β = 0 versus Ha: β < 0 where β is the
true slope of the population regression line relating swim time to pulse rate.
We will use a significance level α = 0.05.
Plan:
If the conditions are met, we will do a t-test for the slope β. The scatterplot,
residual plot, and the histogram of the residuals are given below. Linear: The
scatterplot is approximately linear. Independent: There were 23 swim times
observed. This is clearly less than 10% of all possible swim times that could
have been measured. Normal: The histogram of residuals is mound shaped and
approximately symmetric so the residuals could follow a Normal distribution.
Equal Variance: The residual plot shows roughly equal scatter for all x values.
Random: The data come from a random sample. The conditions are met.
Do:
The output is given below. According to the output, the test statistic is -5.13
and the P-value given is 0.000. This P-value is for the two-sided test and we
are conducting a one-sided test, but dividing by 2 still gives 0.000 to three
decimal places.
The regression equation is
pulse = 480 - 9.69 time

Predictor   Coef      SE Coef   T       P
Constant    479.93    66.23     7.25    0.000
time        -9.695    1.889     -5.13   0.000

S = 6.45505   R-Sq = 55.6%   R-Sq(adj) = 53.5%
Conclude:
Since the P-value is less than 0.05 we reject the null hypothesis and conclude
that there is convincing evidence of a negative linear relationship between
swim time and pulse rate.
(b) State:
We want to construct a 95% t interval for β, the true slope of the population
regression line relating swim time to pulse rate.
Plan:
If the conditions are met, we will construct a t interval for the slope β. The
conditions were checked in (a).
Do:
The sample size is 23 so the appropriate distribution has 21 degrees of freedom
and t* = 2.08. This leads to the confidence interval of
-9.695 ± 2.08(1.889) = -9.695 ± 3.929 = (-13.624, -5.766).
Conclude:
We are 95% confident that the interval from -13.624 to -5.766 contains the true
slope of the population regression line for predicting pulse rate from swim time.
12.1d
18
(a) In computing a 95% confidence interval we use a t distribution with 19 degrees of freedom
and t* = 2.093. The interval is computed as
11,630.6 ± 2.093(1,249) = 11,630.6 ± 2,614.16 = (9,016.44, 14,244.76)
(b) The vehicle's age is measured in years and mileage in miles. Since the automotive group claims
that people drive 15,000 miles per year, that says that for every increase of 1 year, the
mileage would increase by 15,000 miles. This translates into a slope of 15,000.
(c) Since the interval in part (a) does not include the value 15,000, it suggests that the slope
could not plausibly be 15,000. That is, we would reject the null hypothesis that the slope is
15,000.
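Part (c) uses the interval as a stand-in for a two-sided test: a hypothesized slope outside the 95% interval would be rejected at the 5% level. A small sketch of that check follows; the helper name `contains` is hypothetical, and the estimate, standard error, and t* are copied from part (a).

```python
def contains(interval, value):
    """True when value lies inside the closed interval (lower, upper)."""
    lower, upper = interval
    return lower <= value <= upper


b, se_b, t_star = 11630.6, 1249.0, 2.093   # from part (a), df = 19
margin = t_star * se_b
interval = (b - margin, b + margin)

claimed_slope = 15000                       # miles per year claimed by the automotive group
print([round(v, 2) for v in interval], contains(interval, claimed_slope))
```

Because 15,000 falls above the upper endpoint, the check returns False, in line with the conclusion in part (c).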
12.1 MC
21. C
22. D
23. C
24. A
25. B
26. B
12.2b
34
(a) The scatterplot is given below.
The scatterplot shows a strong, negative, curved relationship between volume and pressure.
(b) In this case the explanatory variable is the reciprocal of the volume and the response variable
is the pressure.
(c) Here the explanatory variable is the volume and the response variable is the reciprocal of the
pressure.
36
(a) For transformation 1: ŷ = 0.3677 + 15.8994(1/x) where y is the pressure and x is the volume.
For transformation 2: 1/ŷ = 0.1002 + 0.0398x where y is the pressure and x is the volume.
(b) For transformation 1: ŷ = 0.3677 + 15.8994(1/17) = 1.303 atmospheres.
For transformation 2: 1/ŷ = 0.1002 + 0.0398(17) = 0.7768 so ŷ = 1/0.7768 ≈ 1.287
atmospheres.
(c) For transformation 1: The typical distance that a predicted value of the pressure will be from
the actual value is about 0.044 atmospheres.
For transformation 2: The typical distance that the reciprocal of the predicted value of
pressure will be from the reciprocal of the actual value is about 0.00355 (atmospheres)^-1.
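The two predictions in part (b) differ in where the reciprocal is taken, which a short script makes explicit. This is only a sketch: the function names are hypothetical, and the coefficients are read from the solution with both terms added, which is consistent with the stated results 1.303 and 0.7768.

```python
def predict_transformation_1(volume):
    """Pressure from the line fitted to (1/volume, pressure)."""
    return 0.3677 + 15.8994 * (1 / volume)


def predict_transformation_2(volume):
    """Pressure from the line fitted to (volume, 1/pressure)."""
    reciprocal_pressure = 0.1002 + 0.0398 * volume
    return 1 / reciprocal_pressure   # undo the reciprocal to return to atmospheres


p1 = predict_transformation_1(17)
p2 = predict_transformation_2(17)
print(round(p1, 3), round(p2, 3))
```

In the second model the fitted value is 1/pressure, so it must be inverted before it can be reported in atmospheres.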
12.2c
37
(a) The scatterplot is below
The relationship is strong, negative, and slightly curved with one potential outlier in the top
left hand corner.
(b) Since the graph of the explanatory variable against the natural log of the response is
fairly linear, an exponential model would be reasonable.
(c) ln ŷ = 5.973 - 0.218x where y is the count of surviving bacteria and x is time in
minutes.
(d) ln ŷ = 5.973 - 0.218(17) = 2.267 so ŷ = e^2.267 = 9.65 or 965 bacteria. Since the residual
plot shows a random scatter around the value of 0, we would expect this prediction
to be about right.
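The prediction in part (d) is a two-step calculation: evaluate the line on the log scale, then exponentiate. A minimal sketch, assuming the fitted line ln ŷ = 5.973 - 0.218x (negative slope, matching the decreasing relationship) and using a hypothetical helper name:

```python
import math


def predict_exponential(x, intercept, slope):
    """Back-transform a line fitted to (x, ln y): y_hat = exp(intercept + slope * x)."""
    ln_y_hat = intercept + slope * x
    return math.exp(ln_y_hat)


y_hat = predict_exponential(17, intercept=5.973, slope=-0.218)
print(round(y_hat, 2))
```

The result is on the original scale of the response, so it is read the same way as the 9.65 in the solution.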
38
(a) The scatterplot is below
The relationship is strong, negative, and slightly curved with no outliers.
(b) Since the graph of the explanatory variable against the natural log of the response is
fairly linear, an exponential model would be reasonable.
(c) ln ŷ = 6.789 - 0.333x where y is the light intensity and x is the depth.
(d) ln ŷ = 6.789 - 0.333(12) = 2.793 so ŷ = e^2.793 = 16.33 lumens. This gives a residual of
2.785 - 2.793 = -0.008. The value of s for this model is 0.00006, so the residual for
this point is quite large.
12.2d
40
The equation for the regression line is ln ŷ = -2.00 + 2.42 ln x where x is the diameter at
breast height in cm, and y is the aboveground biomass in kg. If a tree is 30 cm in
diameter, then ln ŷ = -2.00 + 2.42 ln(30) = 6.231. This means that ŷ = e^6.231 = 508.263 kg is
the total aboveground biomass of the tree.
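When both variables are logged, back-transforming gives a power model, ŷ = e^a · x^b. The sketch below assumes the fitted line ln ŷ = -2.00 + 2.42 ln x (the signs that reproduce 6.231) and uses a hypothetical helper name. It carries full precision, so the result differs slightly from the hand calculation, which rounds the exponent to 6.231 first.

```python
import math


def predict_power(x, intercept, slope):
    """Back-transform a line fitted to (ln x, ln y): y_hat = exp(intercept + slope * ln x)."""
    ln_y_hat = intercept + slope * math.log(x)
    return math.exp(ln_y_hat)


biomass = predict_power(30, intercept=-2.00, slope=2.42)

# Equivalent power-model form: y_hat = e^intercept * x^slope
same_biomass = math.exp(-2.00) * 30 ** 2.42

print(round(biomass), round(same_biomass))
```

Both forms give about 508 kg, in agreement with the solution.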
42
(a) The exponential model would work better because the graph with only the response
variable transformed is linear whereas the graph with both variables transformed has
curvature to it.
(b) log ŷ = 0.4537 - 0.1172x where y is the height in feet and x is the bounce number.
(c) log ŷ = 0.4537 - 0.1172(7) = -0.3667 so ŷ = 10^(-0.3667) = 0.4298 feet.
(d) The trend in the residual plot suggests that the residual would be positive which
means that our prediction would be too low.
44
(a) The scatterplot is below.
(b) Two scatterplots are given below.
(c) The MiniTab output is given below.
The regression equation is
ln(weight) = -0.314 + 3.14 ln(length)

Predictor    Coef      SE Coef   T       P
Constant     -0.3140   0.1958    -1.60   0.170
ln(length)   3.1387    0.1151    27.27   0.000

S = 0.353543   R-Sq = 99.3%   R-Sq(adj) = 99.2%
So the equation is ln ŷ = -0.314 + 3.1387 ln x where y is the weight of the heart and x is the
length of the cavity of the left ventricle.
(d) ln ŷ = -0.314 + 3.1387 ln(6.8) = 5.703 so ŷ = e^5.703 = 299.77 grams.
12.2 MC
45. C
46. E
47. E
48. C