Clicker_chapter24

Inference for Regression
BPS chapter 24
© 2006 W.H. Freeman and Company
Linear regression
Which point represents “a” in our least-squares regression equation?
a) Point Q
b) Point S
c) Point R
d) Point T
Linear regression (answer)
Which point represents “a” in our least-squares regression equation?
a) Point Q
b) Point S
c) Point R
d) Point T
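In the least-squares equation ŷ = a + bx, "a" is the intercept: the predicted value when x = 0, which is where the fitted line crosses the vertical axis. A minimal sketch of that idea follows. The data points are hypothetical and are not taken from the slide's figure.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Hypothetical data, for illustration only.
xs = [0, 1, 2, 3, 4]
ys = [1.0, 3.1, 4.9, 7.2, 8.8]

a, b = least_squares(xs, ys)
predicted_at_zero = a + b * 0   # the line's height at x = 0 is exactly a
print(f"a = {a:.2f}, b = {b:.2f}, prediction at x = 0: {predicted_at_zero:.2f}")
```

On a plot, the point representing "a" is therefore the one on the fitted line directly above x = 0.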
Correlation
If two quantitative variables, X and Y, have a correlation coefficient r =
0.80, which graph could be a scatterplot of the two variables?
a) Plot A
b) Plot B
c) Plot C
Correlation (answer)
If two quantitative variables, X and Y, have a correlation coefficient r =
0.80, which graph could be a scatterplot of the two variables?
a) Plot A
b) Plot B
c) Plot C
Correlation
Which of the following statements is true?
a) $r_{\text{Plot A}} > r_{\text{Plot B}}$
b) $r_{\text{Plot C}} > r_{\text{Plot A}}$
c) $r_{\text{Plot C}} > r_{\text{Plot B}}$
d) The correlation coefficient is the same in all plots.
Correlation (answer)
Which of the following statements is true?
a) $r_{\text{Plot A}} > r_{\text{Plot B}}$
b) $r_{\text{Plot C}} > r_{\text{Plot A}}$
c) $r_{\text{Plot C}} > r_{\text{Plot B}}$
d) The correlation coefficient is the same in all plots.
Residual
The following scatterplot shows the number of gold medals countries
earned in 1992 versus the number they earned in 1996. Which of the
points would have the smallest residual?
a) Point A
b) Point B
c) Point C
d) Point D
Residual (answer)
The following scatterplot shows the number of gold medals countries
earned in 1992 versus the number they earned in 1996. Which of the
points would have the smallest residual?
a) Point A
b) Point B
c) Point C
d) Point D
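A residual is the observed value minus the value predicted by the regression line, so the point with the smallest residual (in absolute value) is the one lying closest to the line in the vertical direction. The sketch below uses hypothetical points, not the medal counts from the scatterplot, to show the calculation.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical data, for illustration only.
xs = [0, 1, 2, 3, 4]
ys = [1.0, 3.1, 4.9, 7.2, 8.8]

a, b = least_squares(xs, ys)

# residual = observed y - predicted y
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Index of the point whose residual is smallest in absolute value.
closest = min(range(len(xs)), key=lambda i: abs(residuals[i]))

print([round(r, 2) for r in residuals])
print("point closest to the line:", (xs[closest], ys[closest]))
```

Least-squares residuals always sum to zero (up to rounding), which is a quick sanity check on the fit.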
Regression line
In the previous question about gold medals, the least-squares
regression equation is:
Where x is the number of medals earned in 1992 and $\hat{y}$ is the
predicted number of medals earned in 1996. What is the best
interpretation of b in this example?
a) Countries that earned ten medals in the 1992 Olympics are predicted to earn an average of nine medals in 1996.
b) For all countries participating in the 1992 Olympics, 89% earned medals in 1996.
c) If a country earned zero medals in 1992, it would have an 89% chance of earning one in 1996.
d) All countries that earned medals in 1992 had an 89% probability of earning a medal in 1996.
Regression line (answer)
In the previous question about gold medals, the least-squares
regression equation is:
Where x is the number of medals earned in 1992 and $\hat{y}$ is the
predicted number of medals earned in 1996. What is the best
interpretation of b in this example?
a) Countries that earned ten medals in the 1992 Olympics are predicted to earn an average of nine medals in 1996.
b) For all countries participating in the 1992 Olympics, 89% earned medals in 1996.
c) If a country earned zero medals in 1992, it would have an 89% chance of earning one in 1996.
d) All countries that earned medals in 1992 had an 89% probability of earning a medal in 1996.
Appropriate analysis
Edwin Hubble collected data on the distance a galaxy is from the earth
and the velocity with which it appears to be receding. If he wanted
to investigate if there was a linear relationship between the distance
and the velocity, what type of analysis did he perform?
a) Two-sample t-test on means
b) $\chi^2$ analysis on proportions
c) Linear regression analysis
d) Matched pairs experiment
Appropriate analysis (answer)
Edwin Hubble collected data on the distance a galaxy is from the earth
and the velocity with which it appears to be receding. If he wanted
to investigate if there was a linear relationship between the distance
and the velocity, what type of analysis did he perform?
a) Two-sample t-test on means
b) $\chi^2$ analysis on proportions
c) Linear regression analysis
d) Matched pairs experiment
Linear regression
Edwin Hubble collected data on the distance a galaxy is from the earth
and the velocity with which it appears to be receding. He used the
following model: $\mu_y = \alpha + \beta x$, where x represents the distance the
galaxy is from the earth (in megaparsecs) and $\mu_y$ represents the
mean velocity (in km/sec) for all galaxies at that distance. What
does $\beta$ represent in this problem?
a) The average velocity for a galaxy that is extremely close to earth.
b) The average change in velocity for a one-megaparsec increase in distance for those galaxies in the sample.
c) The average velocity for all galaxies in the universe.
d) The average change in velocity for a one-megaparsec increase in distance of all galaxies.
Linear regression (answer)
Edwin Hubble collected data on the distance a galaxy is from the earth
and the velocity with which it appears to be receding. He used the
following model: $\mu_y = \alpha + \beta x$, where x represents the distance the
galaxy is from the earth (in megaparsecs) and $\mu_y$ represents the
mean velocity (in km/sec) for all galaxies at that distance. What
does $\beta$ represent in this problem?
a) The average velocity for a galaxy that is extremely close to earth.
b) The average change in velocity for a one-megaparsec increase in distance for those galaxies in the sample.
c) The average velocity for all galaxies in the universe.
d) The average change in velocity for a one-megaparsec increase in distance of all galaxies.
Linear regression
Edwin Hubble collected data on the distance a galaxy is from the earth
and the velocity with which it appears to be receding. Summarizing
his data with a scatterplot and generating the least-squares
regression line gave the following table:
Based on the information in the table, what is the correct equation for
the least-squares regression line?
a)
b)
c)
d)
e)
Linear regression (answer)
Edwin Hubble collected data on the distance a galaxy is from the earth
and the velocity with which it appears to be receding. Summarizing
his data with a scatterplot and generating the least-squares
regression line gave the following table:
Based on the information in the table, what is the correct equation for
the least-squares regression line?
a)
b)
c)
d)
e)
Residuals
Edwin Hubble collected data on the distance a galaxy is from the earth and the
velocity with which it appears to be receding. By looking at the following
residual plot and histogram of the residuals, what conclusion should be
made about the conditions for performing the linear regression?
a) Because the residual plot shows no pattern and the histogram is approximately bell-shaped, the conditions are met.
b) The residual plot implies that the data violate the assumption of normality.
c) The histogram of the residuals shows that the data are extremely right-skewed.
d) Neither plot tells us anything about the assumptions for doing inference for regression.
e) The residual plot implies that the data violate the assumption of linearity.
Residuals (answer)
Edwin Hubble collected data on the distance a galaxy is from the earth and the
velocity with which it appears to be receding. By looking at the following
residual plot and histogram of the residuals, what conclusion should be
made about the conditions for performing the linear regression?
a) Because the residual plot shows no pattern and the histogram is approximately bell-shaped, the conditions are met.
b) The residual plot implies that the data violate the assumption of normality.
c) The histogram of the residuals shows that the data are extremely right-skewed.
d) Neither plot tells us anything about the assumptions for doing inference for regression.
e) The residual plot implies that the data violate the assumption of linearity.
Linear relationship
Edwin Hubble collected data on the distance a galaxy is from the earth
and the velocity with which it appears to be receding. If the
researchers want to test whether there is a positive linear
relationship between the distance and velocity, what hypotheses
could be used?
a)
b)
c)
d)
Linear relationship (answer)
Edwin Hubble collected data on the distance a galaxy is from the earth
and the velocity with which it appears to be receding. If the
researchers want to test whether there is a positive linear
relationship between the distance and velocity, what hypotheses
could be used?
a)
b)
c)
d)
Linear regression
Edwin Hubble collected data on the distance a galaxy is from the earth
and the velocity with which it appears to be receding.
For a confidence interval for $\beta$ we use the general form for a confidence
interval: estimate ± (table value) × (SE of the estimate).
According to the printout above, what value should we use for the
standard error of the estimate?
a) 83.4389
b) 75.2371
c) -40.7836
d) 454.2584
Linear regression (answer)
Edwin Hubble collected data on the distance a galaxy is from the earth
and the velocity with which it appears to be receding.
For a confidence interval for $\beta$ we use the general form for a confidence
interval: estimate ± (table value) × (SE of the estimate).
According to the printout above, what value should we use for the
standard error of the estimate?
a) 83.4389
b) 75.2371
c) -40.7836
d) 454.2584
Confidence interval
Edwin Hubble collected data on the distance a galaxy is from the earth
and the velocity with which it appears to be receding. If a 95%
confidence interval for $\beta$ is (298.12, 610.20), what conclusion could
be made about $\beta$ at a significance level of $\alpha = 0.05$?
a) We have sufficient evidence to conclude that there is no linear relationship between velocity and distance.
b) We have sufficient evidence to conclude that there is a linear relationship between velocity and distance.
c) There is insufficient evidence to conclude that there is a linear relationship between velocity and distance.
d) The confidence interval does not give us enough information to answer this question.
Confidence interval (answer)
Edwin Hubble collected data on the distance a galaxy is from the earth
and the velocity with which it appears to be receding. If a 95%
confidence interval for $\beta$ is (298.12, 610.20), what conclusion could
be made about $\beta$ at a significance level of $\alpha = 0.05$?
a) We have sufficient evidence to conclude that there is no linear relationship between velocity and distance.
b) We have sufficient evidence to conclude that there is a linear relationship between velocity and distance.
c) There is insufficient evidence to conclude that there is a linear relationship between velocity and distance.
d) The confidence interval does not give us enough information to answer this question.
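Because a confidence interval has the form estimate ± (table value) × (SE of the estimate), the quoted interval can be taken apart into its center and margin, and a 95% interval that excludes 0 corresponds to rejecting a slope of 0 in a two-sided test at the 0.05 level. The sketch below does that arithmetic. The standard error of 75.2371 is borrowed from the answer choices of the earlier question, and treating it as the slope's standard error is an assumption, since the printout is not reproduced here.

```python
# 95% confidence interval for the slope, as quoted in the question.
lower, upper = 298.12, 610.20

estimate = (lower + upper) / 2   # center of the interval, about 454.16
margin = (upper - lower) / 2     # half-width, about 156.04

# Assumption: 75.2371 (one of the earlier answer choices) is the slope's SE.
assumed_se = 75.2371
implied_multiplier = margin / assumed_se   # plays the role of the table value

# A two-sided test of slope = 0 at the 0.05 level rejects exactly when
# 0 falls outside the 95% confidence interval.
rejects_zero_slope = not (lower <= 0 <= upper)

print(f"estimate = {estimate:.2f}, margin = {margin:.2f}")
print(f"implied table value = {implied_multiplier:.3f}")
print("reject slope = 0 at the 0.05 level:", rejects_zero_slope)
```

Under that assumption the implied table value is a little above 2.07, which would be consistent with a t critical value for about 22 degrees of freedom, although the sample size is not stated in this section.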
Prediction intervals
For a house of size 1500 ft², the 95% prediction interval for its selling
price will be _________ the 95% confidence interval for the average
selling price of all homes that are 1500 ft².
a) Wider than
b) The same as
c) Narrower than
d) Not comparable with
Prediction intervals (answer)
For a house of size 1500 ft², the 95% prediction interval for its selling
price will be _________ the 95% confidence interval for the average
selling price of all homes that are 1500 ft².
a) Wider than
b) The same as
c) Narrower than
d) Not comparable with
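The comparison can be read off the usual textbook standard errors for simple linear regression. The block below states them as a sketch in standard notation, none of which appears on the slide: $s$ is the regression standard error, $n$ the sample size, $x^*$ the chosen value of x (here 1500 ft²), and $t^*$ the critical value with $n - 2$ degrees of freedom.

```latex
% Confidence interval for the mean response at x = x^*
\hat{y} \pm t^* \, SE_{\hat{\mu}}, \qquad
SE_{\hat{\mu}} = s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}

% Prediction interval for a single new observation at x = x^*
\hat{y} \pm t^* \, SE_{\hat{y}}, \qquad
SE_{\hat{y}} = s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}
```

The extra 1 under the square root accounts for the variation of an individual home around the mean, so $SE_{\hat{y}} > SE_{\hat{\mu}}$. Both intervals are built around the same $\hat{y}$.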
Prediction intervals
True or false: If we give a prediction interval for one home whose size
is 1500 ft², this interval estimates the mean selling price for all
homes whose size is 1500 ft².
a) True
b) False
Prediction intervals (answer)
True or false: If we give a prediction interval for one home whose size
is 1500 ft², this interval estimates the mean selling price for all
homes whose size is 1500 ft².
a) True
b) False
Prediction intervals
True or false: If we compute a 95% prediction interval for one home whose
size is 1100 ft² and a 95% confidence interval for the mean selling
price of all homes whose size is 1100 ft², the centers of the
intervals will be the same.
a) True
b) False
Prediction intervals (answer)
True or false: If we compute a 95% prediction interval for one home whose
size is 1100 ft² and a 95% confidence interval for the mean selling
price of all homes whose size is 1100 ft², the centers of the
intervals will be the same.
a) True
b) False
Hypothesis tests
Researchers at The Ohio State University wanted to know if they could
use the number of beers consumed by a student to predict the
student’s blood alcohol content (BAC). The following scatterplot
shows the data. In order to know if the number of beers consumed
was a good predictor of BAC, they tested $H_0: \beta = 0$, $H_a: \beta \neq 0$.
From the following table, what is the test statistic for performing this
test?
a) 0.0126
b) 0.0180
c) 0.3320
d) 7.4796
e) -1.050
Hypothesis tests (answer)
Researchers at The Ohio State University wanted to know if they could
use the number of beers consumed by a student to predict the
student’s blood alcohol content (BAC). The following scatterplot
shows the data. In order to know if the number of beers consumed
was a good predictor of BAC, they tested $H_0: \beta = 0$, $H_a: \beta \neq 0$.
From the following table, what is the test statistic for performing this
test?
a) 0.0126
b) 0.0180
c) 0.3320
d) 7.4796
e) -1.050
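For reference, the test statistic for a hypothesis about the slope is the estimated slope divided by its standard error. The notation below is the standard textbook form and is an assumption about how the table is labeled, since the table itself is not reproduced here: $b$ is the least-squares slope, $SE_b$ its standard error, and $n$ the number of students.

```latex
t = \frac{b}{SE_b}, \qquad \text{degrees of freedom} = n - 2
```

In regression output this value usually appears in the row for the explanatory variable (here, the number of beers), not in the row for the intercept.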
Hypothesis tests
Researchers at The Ohio State University wanted to know if they could
use the number of beers consumed by a student to predict the
student’s blood alcohol content (BAC). In order to know if the
number of beers consumed was a good predictor of BAC, they
tested $H_0: \beta = 0$, $H_a: \beta \neq 0$. What can we conclude from the
following table?
a) Because the P-value is 0.3320, there is a significant linear relationship between the number of beers consumed and BAC.
b) Because the P-value is 0.0000, there is a significant linear relationship between the number of beers consumed and BAC.
c) Because the P-value is 0.3320, there is no significant linear relationship between the number of beers consumed and BAC.
d) Because the P-value is 0.0000, there is no significant linear relationship between the number of beers consumed and BAC.
Hypothesis tests (answer)
Researchers at The Ohio State University wanted to know if they could
use the number of beers consumed by a student to predict the
student’s blood alcohol content (BAC). In order to know if the
number of beers consumed was a good predictor of BAC, they
tested $H_0: \beta = 0$, $H_a: \beta \neq 0$. What can we conclude from the
following table?
a) Because the P-value is 0.3320, there is a significant linear relationship between the number of beers consumed and BAC.
b) Because the P-value is 0.0000, there is a significant linear relationship between the number of beers consumed and BAC.
c) Because the P-value is 0.3320, there is no significant linear relationship between the number of beers consumed and BAC.
d) Because the P-value is 0.0000, there is no significant linear relationship between the number of beers consumed and BAC.
Prediction
Researchers at The Ohio State University wanted to know if they could
use the number of beers consumed by a student to predict the
student’s blood alcohol content (BAC). We want to predict the
mean BAC for students who have had seven beers. Should we use
the 95% confidence interval for $\mu_y$, which is (0.0976, 0.1290), or the
95% prediction interval for $Y$ when $X = x^*$, which is (0.0667, 0.1599)?
a) Confidence interval
b) Prediction interval
Prediction (answer)
Researchers at The Ohio State University wanted to know if they could
use the number of beers consumed by a student to predict the
student’s blood alcohol content (BAC). We want to predict the
mean BAC for students who have had seven beers. Should we use
the 95% confidence interval for $\mu_y$, which is (0.0976, 0.1290), or the
95% prediction interval for $Y$ when $X = x^*$, which is (0.0667, 0.1599)?
a) Confidence interval
b) Prediction interval
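The two intervals quoted in this question make the earlier points about prediction intervals concrete. The short sketch below compares their centers and widths using only the endpoints given above. The helper names `center` and `width` are introduced here for illustration.

```python
# Interval endpoints as quoted in the question (BAC after seven beers).
confidence_interval = (0.0976, 0.1290)   # for the mean BAC
prediction_interval = (0.0667, 0.1599)   # for one student's BAC


def center(interval):
    """Midpoint of an interval given as (lower, upper)."""
    return (interval[0] + interval[1]) / 2


def width(interval):
    """Length of an interval given as (lower, upper)."""
    return interval[1] - interval[0]


print(f"CI center = {center(confidence_interval):.4f}, "
      f"width = {width(confidence_interval):.4f}")
print(f"PI center = {center(prediction_interval):.4f}, "
      f"width = {width(prediction_interval):.4f}")
```

Both intervals are centered at the same predicted value, and the prediction interval is roughly three times as wide. The confidence interval describes the mean BAC of all students who have had seven beers, while the prediction interval describes a single student.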
Conclusions
The following scatterplot shows a linear regression analysis of the relationship
between the time (in seconds), y, to run a marathon versus the year the
marathon was run, x. A statistics student used the regression equation y =
337,047 – 165.6809x to predict how fast the marathon would be run in
2004. She got an answer of 5022 seconds, or about 1 hour and 24
minutes. This conclusion is:
a) Believable because the results came from the regression equation.
b) Believable because looking at the graph you can see that the time to run a marathon is indeed decreasing.
c) Unbelievable because no one will ever be able to run a marathon that quickly.
d) Unbelievable because using 2004 to predict the running time would be considered extrapolation.
Conclusions (answer)
The following scatterplot shows a linear regression analysis of the relationship
between the time (in seconds), y, to run a marathon versus the year the
marathon was run, x. A statistics student used the regression equation y =
337,047 – 165.6809x to predict how fast the marathon would be run in
2004. She got an answer of 5022 seconds, or about 1 hour and 24
minutes. This conclusion is:
a) Believable because the results came from the regression equation.
b) Believable because looking at the graph you can see that the time to run a marathon is indeed decreasing.
c) Unbelievable because no one will ever be able to run a marathon that quickly.
d) Unbelievable because using 2004 to predict the running time would be considered extrapolation.
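The student's arithmetic can be reproduced directly from the regression equation quoted in the question. The sketch below also asks where the same line would predict a time of zero, as one way to see why using the line far outside the observed years is risky. The range of years actually shown in the scatterplot is not given here, so the zero-crossing is only an illustration.

```python
# Regression equation from the question: y = 337,047 - 165.6809x
INTERCEPT = 337047.0
SLOPE = -165.6809


def predicted_seconds(year):
    """Predicted marathon time in seconds for a given year."""
    return INTERCEPT + SLOPE * year


pred_2004 = predicted_seconds(2004)

# Convert the rounded prediction to hours, minutes, seconds.
hours, remainder = divmod(round(pred_2004), 3600)
minutes, seconds = divmod(remainder, 60)

# Year at which the same straight line would predict a time of zero.
zero_year = -INTERCEPT / SLOPE

print(f"2004 prediction: {pred_2004:.1f} s = {hours} h {minutes} min {seconds} s")
print(f"line reaches 0 seconds around year {zero_year:.1f}")
```

A line that reaches zero seconds a few decades later cannot describe marathon times indefinitely, so predictions outside the range of the data should not be trusted.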
Conclusions
An article in a newspaper said that students who major in subjects that
have higher expected incomes after graduation are more likely to be
married. This conclusion is:
a) Correct because the data were collected in a scientific way.
b) Incorrect because the results are likely biased due to lurking variables.
c) Not reliable because it does not sound plausible.
Conclusions (answer)
An article in a newspaper said that students who major in subjects that
have higher expected incomes after graduation are more likely to be
married. This conclusion is:
a) Correct because the data were collected in a scientific way.
b) Incorrect because the results are likely biased due to lurking variables.
c) Not reliable because it does not sound plausible.
Relationships
The following plot shows a person’s score on a sobriety test versus
their blood alcohol content. Which statement is NOT true about this
plot?
a) An outlier is present in the dataset.
b) A relationship exists between BAC and the test score.
c) The relationship could be modeled with a straight line.
d) There is a positive relationship between the two variables.
Relationships (answer)
The following plot shows a person’s score on a sobriety test versus
their blood alcohol content. Which statement is NOT true about this
plot?
a) An outlier is present in the dataset.
b) A relationship exists between BAC and the test score.
c) The relationship could be modeled with a straight line.
d) There is a positive relationship between the two variables.
Conclusions
The average height of people in the United States has been increasing
for decades. Similarly there is evidence that the number of plant
species is decreasing over these decades. An appropriate
conclusion to draw from these observations would be that
a) Even though they appear to be associated, we could not conclude association.
b) Growing adults are causing the number of plant species to decrease.
c) There is a positive relationship between the two variables.
Conclusions (answer)
The average height of people in the United States has been increasing
for decades. Similarly there is evidence that the number of plant
species is decreasing over these decades. An appropriate
conclusion to draw from these observations would be that
a) Even though they appear to be associated, we could not conclude association.
b) Growing adults are causing the number of plant species to decrease.
c) There is a positive relationship between the two variables.