Chapter 3 Review-KEY
The midterm and final test grades of a sample of 11 statistics students were recorded. Use this data to answer
questions 1 – 8.
Graph the data so that you can predict final exam grade from midterm grade.
Student Number   Midterm Score   Final Exam Score
1                77              81
2                90              96
3                65              72
4                86              91
5                59              82
6                92              93
7                97              95
8                72              69
9                79              89
10               76              74
11               50              42
1. Identify the explanatory and response variables. Justify your choices.
The explanatory variable is the midterm score and the response variable is the final exam score because we
are predicting final exam score from midterm score.
2. Describe the form, strength, and direction of the scatterplot. Are there any distinguishing characteristics or
influential points?
The scatterplot shows a strong, positive, linear relationship. The point at (59, 82) is an influential point because it
appears to be an outlier in the y-direction: its final exam score is much higher than the overall pattern suggests.
3. Using your calculator, determine the equation of the regression line. Plot that line on your graph.
Predicted final exam score = 8.546 + 0.937 (midterm score)
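The calculator result can be checked by hand. The sketch below is one way to do that in plain Python, using the 11 pairs of scores from the table; the function name least_squares is an illustrative choice, not something from the worksheet.

```python
# Midterm (x) and final exam (y) scores for the 11 students in the table.
midterm = [77, 90, 65, 86, 59, 92, 97, 72, 79, 76, 50]
final = [81, 96, 72, 91, 82, 93, 95, 69, 89, 74, 42]


def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = least_squares(midterm, final)
print(f"Predicted final exam score = {intercept:.3f} + {slope:.3f} (midterm score)")
```

Rounded to three decimal places, the coefficients should agree with the equation given in the answer.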
4. Identify the slope and y-intercept. Interpret them in context.
Slope: 0.937; the predicted final exam score increases by 0.937 points for every 1-point increase in
midterm score.
Y-intercept: 8.546; this is the predicted final exam score for a student whose midterm score is 0 points.
5. The correlation for this data is 0.8544. How would removing the point (59, 82) change the correlation?
Removing point (59, 82) would increase the correlation because that point increases the spread of the data.
Removing it would make all the points closer to the line.
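The claim can be checked numerically. The sketch below computes the correlation for all 11 students and again with (59, 82) left out; the helper name correlation is an illustrative choice. The full-data value it prints should land close to the figure quoted in the question, and the point of the comparison is the direction of the change.

```python
from math import sqrt

# Midterm (x) and final exam (y) scores for the 11 students in the table.
midterm = [77, 90, 65, 86, 59, 92, 97, 72, 79, 76, 50]
final = [81, 96, 72, 91, 82, 93, 95, 69, 89, 74, 42]


def correlation(x, y):
    """Pearson correlation coefficient r for paired lists x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


r_all = correlation(midterm, final)

# Drop the student with midterm 59 and final 82, then recompute.
kept = [(x, y) for x, y in zip(midterm, final) if (x, y) != (59, 82)]
r_without = correlation([x for x, _ in kept], [y for _, y in kept])

print(f"r with all 11 points:  {r_all:.4f}")
print(f"r without (59, 82):    {r_without:.4f}")
```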
6. Which point appears to have the largest negative residual?
The point at (50, 42) appears to have the largest negative residual.
7. Calculate the residual for that point and interpret it in context.
The predicted value needs to be calculated before the residual can be calculated. Using the equation of the
line from question 3:
Predicted value = 8.546 + 0.937(midterm grade)
Predicted value = 8.546 + 0.937 (50) = 55.4 points
Residual = actual value – predicted value
Residual = 42 points - 55.4 points = -13.4 points; this student scored 13.4 points lower than the regression line predicted.
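The same arithmetic can be repeated for every student to confirm which point has the largest negative residual. This is a minimal sketch that assumes the rounded coefficients from question 3 (8.546 and 0.937); the variable names are illustrative.

```python
# Midterm (x) and final exam (y) scores for the 11 students in the table.
midterm = [77, 90, 65, 86, 59, 92, 97, 72, 79, 76, 50]
final = [81, 96, 72, 91, 82, 93, 95, 69, 89, 74, 42]

# Rounded coefficients of the regression line from question 3.
INTERCEPT = 8.546
SLOPE = 0.937

# Residual = actual value - predicted value, for each student.
residuals = [
    (x, y, y - (INTERCEPT + SLOPE * x))
    for x, y in zip(midterm, final)
]

for x, y, res in residuals:
    print(f"({x}, {y}): residual = {res:+.1f}")

# The most negative residual is the minimum of the third field.
x_min, y_min, res_min = min(residuals, key=lambda t: t[2])
print(f"Largest negative residual: ({x_min}, {y_min}) with {res_min:.1f} points")
```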
8. What does the following residual plot tell about the form of the data?
Because the residual plot shows no obvious pattern and the points are
randomly scattered, a linear model is appropriate for
these data.
Use this regression analysis for diameter (in inches) versus age (in years) for a sample of 25 oak trees to answer
questions 9 - 12.
9. What is the equation of the line?
Predicted diameter = 1.1755 + 0.16476 (age)
10. What does a coefficient of determination of 80.89% represent in the context of this problem?
A coefficient of determination (r²) of 80.89% means that 80.89% of the variation in diameter is explained by the
linear relationship with age. The other 19.11% of variation could be attributed to such things as soil conditions, growing
season, and annual rainfall / droughts.
11. Calculate the correlation. What does this tell you about the data?
Correlation (r) is calculated by taking the square root of the coefficient of determination. For r² = 0.8089,
r = ±√0.8089 ≈ ±0.90. Because the slope is positive, the correlation is also positive. A correlation of 0.90
indicates a strong relationship.
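The sign rule can be written as a short calculation: take the square root of the coefficient of determination and give it the sign of the slope. The sketch below uses the values from questions 9 and 10; the variable names are illustrative.

```python
from math import copysign, sqrt

r_squared = 0.8089  # coefficient of determination from question 10
slope = 0.16476     # slope of the regression line from question 9

# The correlation has the magnitude sqrt(r^2) and the same sign as the slope.
r = copysign(sqrt(r_squared), slope)
print(f"r = {r:.2f}")
```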
12. What would happen to the value of the correlation if the diameter were measured in centimeters rather than
inches?
Correlation would not change with a change in units because it is not dependent on units.
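This can be illustrated numerically. The oak tree measurements are not reproduced in this review, so the sketch below is only a stand-in: it reuses the midterm and final scores from the first table and multiplies one variable by 2.54, the inches-to-centimeters conversion factor, to show that rescaling leaves r unchanged. The helper name correlation is an illustrative choice.

```python
from math import sqrt

# Stand-in data (the test scores from the first table), NOT the oak tree data.
x = [77, 90, 65, 86, 59, 92, 97, 72, 79, 76, 50]
y = [81, 96, 72, 91, 82, 93, 95, 69, 89, 74, 42]


def correlation(x, y):
    """Pearson correlation coefficient r for paired lists x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Rescale y as if converting inches to centimeters.
y_scaled = [2.54 * yi for yi in y]

r_original = correlation(x, y)
r_scaled = correlation(x, y_scaled)
print(f"r before rescaling: {r_original:.6f}")
print(f"r after rescaling:  {r_scaled:.6f}")
```

The two printed values should match up to floating-point rounding, because the conversion factor cancels out of the numerator and denominator of r.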
An ecologist studying breeding habits of the common crossbill in different years finds that there is a linear relationship
between the number of breeding pairs of crossbills and the abundance of the spruce cones. Below are statistics on eight
years of measurements, where x = average number of cones per tree and y = number of breeding pairs of crossbills in a
certain forest. The correlation between x and y is r = 0.968. Use this information to answer questions 13 – 15.
                     x = mean number of cones/tree   y = number of crossbill pairs
Mean                 23.0                            18.0
Standard deviation   16.2                            15.1
13. Determine the equation of the least-squares regression line (with y as the response variable).
Slope:
b = r (sy / sx)
b = 0.968 (15.1 / 16.2)
b = 0.9023
Y-intercept:
a = ȳ - b x̄, where x̄ and ȳ are the means of x and y
a = 18.0 - (0.9023)(23.0) = -2.753
Therefore the equation of the line is:
predicted number of crossbill pairs = -2.753 + 0.9023 (number of cones per tree)
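The line can be rebuilt from the summary statistics alone, using r = 0.968 from the problem statement. The sketch below rounds the slope to four decimal places before computing the intercept, which is an assumption about how the key's arithmetic was carried out; keeping full precision in the slope gives an intercept of about -2.752 instead. Variable names are illustrative.

```python
# Summary statistics from the problem statement.
r = 0.968                 # correlation between x and y
x_bar, s_x = 23.0, 16.2   # mean and standard deviation of cones per tree
y_bar, s_y = 18.0, 15.1   # mean and standard deviation of breeding pairs

# Slope of the least-squares line: b = r * (s_y / s_x),
# rounded to four decimal places (assumed rounding step).
b = round(r * s_y / s_x, 4)

# The line passes through (x_bar, y_bar), so a = y_bar - b * x_bar.
a = y_bar - b * x_bar

print(f"predicted number of crossbill pairs = {a:.3f} + {b:.4f} (number of cones per tree)")
```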
14. What percentage of the variation in numbers of breeding pairs of crossbills can be accounted for by this
regression?
The percentage of the variation is the coefficient of determination (r²).
r = 0.968
r² = (0.968)² = 0.937 or 93.7%
93.7% of the variation in the number of breeding pairs can be accounted for by the number of cones per tree.
15. Based on these data, can we conclude that the abundance of spruce cones is responsible for the number of
breeding pairs of crossbills? Explain.
No, correlation does not imply causation. Just because there is a strong, linear relationship does not
necessarily mean that one variable caused the other.