Correlation and Linear Regression Review Problems
*Make sure to also review and know the “Properties of Correlation” that are on your 3.1 notes
1. If data set A of (x, y) data has correlation coefficient r = 0.65, and a second data set B has
correlation r = -0.65, then
a) the points in A exhibit a stronger linear association than B.
b) the points in B exhibit a stronger linear association than A.
c) neither A nor B has a stronger linear association.
d) you can’t tell which data set has a stronger linear association without seeing the data or seeing
the scatterplots.
e) a mistake has been made—r cannot be negative.
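As a review aid for question 1, here is a minimal Python sketch of how the sign of r relates to the strength of a linear association. The data points and the `pearson_r` helper are made up for illustration; they are not data sets A and B from the problem.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]
flipped = [-y for y in ys]  # same scatter, reflected top to bottom

r_up = pearson_r(xs, ys)
r_down = pearson_r(xs, flipped)
print(round(r_up, 3), round(r_down, 3))  # prints 0.829 -0.829
```

Reflecting the scatterplot changes only the sign of r; the points lie just as close to a line in both cases.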
2. Leonardo da Vinci, the renowned painter, speculated that an ideal human would have an arm span
(distance from outstretched fingertip of left hand to outstretched fingertip of right hand) that was
equal to his height. The following computer regression printout shows the results of a least-squares
regression on height and arm span, in inches, for a sample of 18 high school students.
Which of the following statements is false?
a) This least-squares regression model would make a prediction that is 1.63 inches higher than
da Vinci projected for a 62-inch tall student.
b) One of the students in the sample had a height of 70.5 inches and an arm span of 68 inches. The
residual for this student is 1.83 inches.
c) Da Vinci’s projection is lower than the prediction that this least-squares line will make for any
height.
d) For every one-inch increase in arm span, the regression model predicts about a 0.84-inch
increase in height.
e) For a student 66 inches tall, our model would predict an arm span of about 67 inches.
3. A set of data relates the amount of annual salary raise and the performance rating. The least
squares regression equation is ŷ = 1400 + 2000x where y is the raise amount and x is the
performance rating. Which of statements (a) to (d) is not correct?
a) For each increase of one point in performance rating, the raise will increase on average by
$2000.
b) This equation produces predicted raises with an average error of 0.
c) A rating of 0 will yield a predicted raise of $1400.
d) The correlation between salary raise and performance rating is positive.
e) All of the above are true.
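To check statements about the intercept and slope in question 3, the stated equation can be evaluated directly. This is a small sketch; the function name `predicted_raise` is an illustrative choice and the ratings tried are arbitrary.

```python
def predicted_raise(rating):
    """Predicted raise in dollars from the stated line: 1400 + 2000 * rating."""
    return 1400 + 2000 * rating

# Predicted raise at a few arbitrary performance ratings.
for rating in (0, 1, 2, 3):
    print(rating, predicted_raise(rating))

# Change in the prediction when the rating goes up by one point.
step = predicted_raise(3) - predicted_raise(2)
print(step)
```

The value at a rating of 0 is the intercept, and the change per one-point increase is the slope.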
4. You are interested in predicting the cost of heating houses on the basis of how many rooms the
house has. A scatterplot of 25 houses reveals a strong linear relationship between these variables,
so you calculate a least-squares regression line. “Least-squares” refers to
a) Minimizing the sum of the squares of the 25 houses’ heating costs.
b) Minimizing the sum of the squares of the number of rooms in each of the 25 houses.
c) Minimizing the sum of the products of each house’s actual heating costs and the predicted
heating cost based on the regression equation.
d) Minimizing the sum of the squares of the differences between each house’s heating cost and
number of rooms.
e) Minimizing the sum of the squares of the residuals.
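For reference when reviewing question 4, the sketch below fits a least-squares line to a small made-up data set (5 houses, not the 25 from the problem) and computes the residuals, taken as actual minus predicted. The helper names and all numbers are assumptions chosen for illustration.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data: number of rooms and heating cost in dollars.
rooms = [4, 5, 6, 7, 8]
cost = [410, 480, 610, 640, 760]

intercept, slope = least_squares_line(rooms, cost)
predicted = [intercept + slope * x for x in rooms]
residuals = [y - y_hat for y, y_hat in zip(cost, predicted)]  # actual minus predicted
ssr = sum(e ** 2 for e in residuals)  # sum of squared residuals

def ssr_for(a, b):
    """Sum of squared residuals for any candidate line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(rooms, cost))

print(intercept, slope, ssr)
print(sum(residuals))                  # residuals of the fitted line sum to about 0
print(ssr_for(intercept, slope + 5))   # a steeper line gives a larger sum
print(ssr_for(intercept + 10, slope))  # a shifted line gives a larger sum
```

Trying other slopes or intercepts in `ssr_for` is a quick way to see that the fitted line gives the smallest sum of squared residuals for this data.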
Use the following to answer questions 5 and 6.
One concern about the depletion of the ozone layer is that the increase in ultraviolet (UV) light will
decrease crop yields. An experiment was conducted in a greenhouse where soybean plants were
exposed to varying levels of UV, measured in Dobson units. At the end of the experiment the yield (kg)
was measured. A regression analysis was performed with the following results:
5. The least-squares regression line is the line that
a) minimizes the sum of the distances between the actual UV values and the predicted UV values.
b) minimizes the sum of the squared residuals between the actual yield and the predicted yield.
c) minimizes the sum of the distances between the actual yield and the predicted UV.
d) minimizes the sum of the squared residuals between the actual UV reading and the predicted
UV values.
e) minimizes the perpendicular distance between the regression line and each data point.
6. Which of the following is correct?
a) If the UV value increases by 1 Dobson unit, the yield is expected to increase by 0.0463 kg.
b) If the yield increased by 1 kg, the UV value is expected to decrease by 0.0463 Dobson units.
c) If the UV increases by 1 Dobson unit, the yield is expected to decrease by 0.0463 kg.
d) The predicted yield is 4.3 kg when the UV value is 20 Dobson units.
e) None of the above is correct.
7. Which statements below about least-squares regression are correct?
I. Switching the explanatory and response variables will not change the least-squares
regression line.
II. The slope of the line is very sensitive to outliers with large residuals.
III. A value of 𝑟² close to 1 does not guarantee that the relationship between the
variables is linear.
a) Only I is correct.
b) Only II is correct.
c) Only III is correct.
d) Both II and III are correct.
e) All three statements – I, II and III – are correct.
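One way to explore statement I in question 7 is to fit the line both ways on the same data and compare. The sketch below uses a tiny made-up data set and a hypothetical `least_squares_line` helper.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a_yx, b_yx = least_squares_line(x, y)  # predict y from x
a_xy, b_xy = least_squares_line(y, x)  # predict x from y

# Rewrite the second line, x-hat = a_xy + b_xy * y, in the form y = ... so both
# can be drawn on the same axes: its slope there is 1 / b_xy.
print(round(b_yx, 3), round(1 / b_xy, 3))  # prints 0.6 1.0
```

For this data the two fits are different lines, because one minimizes squared vertical distances and the other squared horizontal distances.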
8. A least-squares regression line for predicting weights of basketball players on the basis of their
heights produced the residual plot below.
What does the residual plot tell you about the linear model?
a) A residual plot is not an appropriate means for evaluating a linear model.
b) The curved pattern in the residual plot suggests that there is no association between the
weight and height of basketball players.
c) The curved pattern in the residual plot suggests that the linear model is not appropriate.
d) There are not enough data points to draw any conclusions from the residual plot.
e) The linear model is appropriate, because there are approximately the same number of points
above and below the horizontal line in the residual plot.
9. If the correlation coefficient 𝑟 is equal to 1 then:
a) There is a positive relationship between the two variables.
b) There is a negative relationship between the two variables.
c) There is no relationship between the two variables.
d) There is a perfect positive relationship between the two variables.
e) There is a perfect negative relationship between the two variables.
10. A student produces a correlation of +1.3. This is
a) a high positive correlation
b) a significant correlation
c) an impossible correlation
d) only possible if n is large
11. What sort of correlation would be expected between a company’s expenditure on health and
safety and the number of work-related accidents?
a) Positive
b) Negative
c) Random
d) None
12. If Ali scored the top mark in the apprentices’ test on computing and the correlation between that
test and the test on finger dexterity was +1.0, what position did Ali get in the second test?
a) Middle
b) Bottom
c) Top
d) Cannot say from the information given
13. All correlation coefficients share in common the property that they range between
a) +1 and 0
b) +1.00 and -1.00
c) +0.1 and -0.1
d) +1.96 and -1.96
14. When 𝑟 is negative, as one variable increases in value,
a) the other increases
b) the other increases at a greater rate
c) the other variable decreases in value
d) there is no change in the other variable
e) all of the above
15. The Co-efficient of Determination expresses the amount of the variation in the dependent variable
that can be explained by –
a) the proportion of Y
b) the variation in the independent variable Y
c) the variation in the independent variable X
d) all of the above
e) none of the above
16. To obtain the Co-efficient of Determination, it is necessary to –
a) √𝑟
b) 𝑟𝑥
c) 𝑟²
d) 𝑟 × 𝑟 × 𝑟
e) ∛𝑟
17. If r = 0.7, then the Co-efficient of Determination is equal to
a) 7.0
b) 0.7
c) 0.049
d) 0.49
e) 4.9%
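The arithmetic behind the coefficient of determination for r = 0.7 can be checked in a couple of lines. This is only a sketch; rounding is applied because floating-point squaring is not exact.

```python
r = 0.7
r_squared = r ** 2  # coefficient of determination is the square of r

print(round(r_squared, 2))  # as a proportion
print(f"{r_squared:.0%}")   # the same value written as a percentage
```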
18. If two variables are absolutely independent of each other the correlation between them must be,
a) -1
b) 0
c) +1
d) +0.1
Review problems for Chapter 3 start on page 251. Work through a few of the odd problems, especially if
you are still feeling shaky on calculating regressions or residuals. Definitely understand 3.77. Answers
for these will get posted to the website, hopefully before Monday.