Stat 401B Solutions to Practice Problem for Exam 2

One way to prepare for the second exam is to practice on problems similar to the ones you may see on that
exam. Below is a problem on High and Low temperatures for cities in the United States. Associated output
can be found on the course web site. This problem is like a problem you might see on the second exam.
Other types of problems will be asked and so you must study all of the material we have covered leading up
to the exam. Solutions will be made available on the course web site on Monday before the second exam.
1. [50 pts] Data on the average low and average high temperature (degrees Fahrenheit) for 20 cities in
the U.S. are collected. The latitude (degrees north of the equator) for each city is also noted. Below are
the data. Refer to the JMP output entitled Predicting Average High Temperature.
Latitude  High  Low   Latitude  High  Low   Latitude  High  Low   Latitude  High  Low
      30    81   59         43    60   39         45    64   41         35    75   51
      41    63   43         41    66   40         44    52   34         38    78   50
      39    66   44         30    81   57         45    58   40         40    66   37
      44    63   36         44    65   39         33    88   61         47    58   31
      41    66   40         38    68   45         43    62   39         40    65   48
(a) [3] What is the prediction equation and the value of R^2 for the simple linear regression of average
High temperature on average Low temperature?
Pred High = 23.54 + 1.000*Low with R^2 = 0.848
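The JMP output is not reproduced here, so the following is only a minimal sketch of how the fit in (a) could be checked by hand from the data table. The function name simple_fit and the variable names are illustrative choices, not anything taken from JMP.

```python
# Data transcribed from the table in the problem statement (20 cities).
low = [59, 43, 44, 36, 40, 39, 40, 57, 39, 45,
       41, 34, 40, 61, 39, 51, 50, 37, 31, 48]
high = [81, 63, 66, 63, 66, 60, 66, 81, 65, 68,
        64, 52, 58, 88, 62, 75, 78, 66, 58, 65]


def simple_fit(x, y):
    """Return (intercept, slope, R^2) for the least-squares line of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope, sxy ** 2 / (sxx * syy)


b0, b1, r2 = simple_fit(low, high)
print(f"Pred High = {b0:.2f} + {b1:.3f}*Low, R^2 = {r2:.3f}")
```

The slope is the centered cross-product sum divided by the centered sum of squares of Low, and R^2 is the squared correlation between Low and High.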
(b) [3] What is the value of the adjusted R^2 for the simple linear regression of average High temperature on average Low temperature?
adjusted R^2 = 1 - MSError/MSTotal = 1 - 12.94/80.62 = 1 - 0.161 = 0.839
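As a quick cross-check of (b), the sketch below computes the adjusted R^2 two ways: from the mean squares quoted above, and from the equivalent form 1 - (1 - R^2)(n - 1)/(n - p - 1). The variable names are illustrative. Because the inputs are rounded values from the solution, the two results agree only to within rounding.

```python
n, p = 20, 1                        # 20 cities, one predictor (average Low)
ms_error, ms_total = 12.94, 80.62   # mean squares quoted in the solution to (b)
r2 = 0.848                          # R^2 quoted in the solution to (a)

adj_from_ms = 1 - ms_error / ms_total
adj_from_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"from mean squares: {adj_from_ms:.4f}")
print(f"from R^2:          {adj_from_r2:.4f}")
```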
(c) [4] Is the model that uses average Low temperature to predict average High temperature statistically significant? Support your answer.
Yes. The F-ratio for the model is 100.36 with associated P-value less than 0.0001.
Since the P-value is so small, the model is statistically significant.
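The F-ratio in (c) can be approximately rebuilt from the mean squares quoted in (b). This is a sketch under the assumption that those rounded values are the only inputs available, so the result differs slightly from the 100.36 reported by JMP. The variable names are illustrative.

```python
n, p = 20, 1                        # 20 cities, one predictor (average Low)
ms_error, ms_total = 12.94, 80.62   # rounded mean squares from the solution to (b)

ss_total = ms_total * (n - 1)       # total degrees of freedom: n - 1
ss_error = ms_error * (n - p - 1)   # error degrees of freedom: n - p - 1
ms_model = (ss_total - ss_error) / p
f_ratio = ms_model / ms_error

print(f"F = {f_ratio:.2f} on {p} and {n - p - 1} degrees of freedom")
```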
(d) [3] What is the prediction equation and the value of R^2 for the multiple regression of average
High temperature on average Low temperature and Latitude?
Pred High = 55.59 + 0.727*Low - 0.503*Latitude with R^2 = 0.859
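For readers without JMP at hand, the two-predictor fit in (d) can be sketched by solving the normal equations in centered form. The helper centered_sum and the variable names are illustrative assumptions. This is one way to do the arithmetic, not a description of how JMP computes it.

```python
# Data transcribed from the table in the problem statement (20 cities).
latitude = [30, 41, 39, 44, 41, 43, 41, 30, 44, 38,
            45, 44, 45, 33, 43, 35, 38, 40, 47, 40]
low = [59, 43, 44, 36, 40, 39, 40, 57, 39, 45,
       41, 34, 40, 61, 39, 51, 50, 37, 31, 48]
high = [81, 63, 66, 63, 66, 60, 66, 81, 65, 68,
        64, 52, 58, 88, 62, 75, 78, 66, 58, 65]


def centered_sum(u, v):
    """Sum of (u_i - mean(u)) * (v_i - mean(v))."""
    ubar, vbar = sum(u) / len(u), sum(v) / len(v)
    return sum((a - ubar) * (b - vbar) for a, b in zip(u, v))


s11 = centered_sum(low, low)
s22 = centered_sum(latitude, latitude)
s12 = centered_sum(low, latitude)
s1y = centered_sum(low, high)
s2y = centered_sum(latitude, high)
syy = centered_sum(high, high)

# Solve the 2x2 system for the two slopes by Cramer's rule.
det = s11 * s22 - s12 ** 2
b_low = (s1y * s22 - s12 * s2y) / det
b_lat = (s11 * s2y - s12 * s1y) / det

n = len(high)
b0 = sum(high) / n - b_low * sum(low) / n - b_lat * sum(latitude) / n
r2 = (b_low * s1y + b_lat * s2y) / syy   # model sum of squares / total sum of squares

print(f"Pred High = {b0:.2f} + {b_low:.3f}*Low + ({b_lat:.3f})*Latitude, R^2 = {r2:.3f}")
```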
(e) [4] Does Latitude add significantly to the explanatory ability of the model with just average Low
temperature? Support your answer.
No. The F-ratio for Latitude added to average Low temperature is 1.36 (t-Ratio of
-1.17) with associated P-value of 0.2592. This P-value is not small and so we cannot
reject the hypothesis that the slope is zero. Latitude is not adding significantly to
the model.
(f) [4] Is there a statistically significant interaction between average Low temperature and Latitude?
Support your answer.
No. The F-ratio for Latitude*Low is 0.19 (t-Ratio of -0.43) with an associated P-value of 0.6716. Since the P-value is not small, we cannot reject the null hypothesis
of a zero slope. The interaction between Latitude and average Low temperature is
not statistically significant.
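The F-ratios in (e) and (f) both compare a model with the same model plus one added term: F = (drop in error sum of squares) / (mean square error of the larger model). The sketch below redoes that arithmetic from the data table. The helper names sse and center are illustrative, the small Gauss-Jordan solver stands in for whatever JMP does internally, and centering the predictors before forming the product is a numerical convenience chosen here that leaves both F-ratios unchanged.

```python
# Data transcribed from the table in the problem statement (20 cities).
latitude = [30, 41, 39, 44, 41, 43, 41, 30, 44, 38,
            45, 44, 45, 33, 43, 35, 38, 40, 47, 40]
low = [59, 43, 44, 36, 40, 39, 40, 57, 39, 45,
       41, 34, 40, 61, 39, 51, 50, 37, 31, 48]
high = [81, 63, 66, 63, 66, 60, 66, 81, 65, 68,
        64, 52, 58, 88, 62, 75, 78, 66, 58, 65]


def sse(columns, y):
    """Residual sum of squares for y regressed on an intercept plus `columns`."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[j] * r[k] for r in rows) for k in range(p)]
         + [sum(r[j] * yi for r, yi in zip(rows, y))] for j in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(a[r][k]))
        a[k], a[pivot] = a[pivot], a[k]
        for r in range(p):
            if r != k:
                factor = a[r][k] / a[k][k]
                a[r] = [u - factor * v for u, v in zip(a[r], a[k])]
    beta = [a[j][p] / a[j][j] for j in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def center(v):
    """Subtract the mean from every value."""
    m = sum(v) / len(v)
    return [x - m for x in v]


low_c, lat_c = center(low), center(latitude)
product = [l * a for l, a in zip(low_c, lat_c)]   # centered Low*Latitude term

n = len(high)
sse_low = sse([low_c], high)                      # Low only
sse_both = sse([low_c, lat_c], high)              # Low and Latitude
sse_inter = sse([low_c, lat_c, product], high)    # Low, Latitude, Low*Latitude

f_latitude = (sse_low - sse_both) / (sse_both / (n - 3))
f_interaction = (sse_both - sse_inter) / (sse_inter / (n - 4))

print(f"F for Latitude added to Low: {f_latitude:.2f}")
print(f"F for Low*Latitude added:    {f_interaction:.2f}")
```

Because each test adds a single term, each F-ratio is the square of the corresponding t-Ratio, up to rounding.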
(g) [4] If there were a significant interaction between average Low temperature and Latitude (not necessarily the case), what would that say about the relationship between average High temperature
and average Low temperature?
If there were a significant interaction between Latitude and average Low temperature,
then the relationship between average High temperature and average Low temperature would be different for different Latitudes.
(h) [3] How would you check to see if average Low temperature and Latitude are multicollinear?
You could calculate the correlation between average Low temperature and Latitude
or you could regress average Low temperature on Latitude and examine the value of
R^2. If the correlation coefficient (or, equivalently, R^2) is zero, then there is no
collinearity. The closer the correlation is to +1 or -1 (R^2 close to 1), the stronger the collinearity.
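The check described in (h) can be sketched directly from the data table. The function name correlation is an illustrative choice. For a regression with one predictor, R^2 is the square of this correlation, so the two checks carry the same information.

```python
# Data transcribed from the table in the problem statement (20 cities).
latitude = [30, 41, 39, 44, 41, 43, 41, 30, 44, 38,
            45, 44, 45, 33, 43, 35, 38, 40, 47, 40]
low = [59, 43, 44, 36, 40, 39, 40, 57, 39, 45,
       41, 34, 40, 61, 39, 51, 50, 37, 31, 48]


def correlation(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


r = correlation(low, latitude)
print(f"r = {r:.3f}, R^2 = {r * r:.3f}")
```

With the table as transcribed here, r comes out close to -0.92, which is far from zero and points to substantial collinearity between average Low temperature and Latitude.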
(i) [3] Give the prediction equation and value of R^2 for the simple linear regression of average High
temperature on Latitude.
Pred High = 132.90 - 1.64*Latitude with R^2 = 0.791.
(j) [4] Give an interpretation of the estimated slope coefficient for Latitude.
For every one degree increase in Latitude, the average High temperature decreases
by 1.64 degrees Fahrenheit, on average.
(k) [4] Why is it not a good idea to use the prediction equation relating Latitude to average High
temperature to predict the average High temperature in Caracas, Venezuela (10.5 degrees north
latitude)?
A Latitude of 10.5 degrees is outside the range of the data that were used to construct
the prediction equation. Extrapolating beyond the range of the data is dangerous.
Additionally, the predicted value, 115.68 degrees Fahrenheit, does not make sense
for an average High temperature.
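The arithmetic behind (k) is short enough to sketch: compare the Caracas latitude with the range of latitudes in the data, then plug it into the prediction equation from (i). The function name predict_high is illustrative.

```python
# Latitudes transcribed from the table in the problem statement (20 cities).
latitude = [30, 41, 39, 44, 41, 43, 41, 30, 44, 38,
            45, 44, 45, 33, 43, 35, 38, 40, 47, 40]


def predict_high(lat):
    """Prediction equation from part (i): Pred High = 132.90 - 1.64*Latitude."""
    return 132.90 - 1.64 * lat


caracas = 10.5
in_range = min(latitude) <= caracas <= max(latitude)

print(f"Latitudes in the data run from {min(latitude)} to {max(latitude)} degrees")
print(f"Caracas inside that range? {in_range}")
print(f"Extrapolated prediction: {predict_high(caracas):.2f} degrees Fahrenheit")
```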
(l) [4] Would you suggest adding a Latitude^2 term to the model with just Latitude in it? Support
your decision based on the information provided in the JMP output.
No. There is no pattern in the plot of residuals versus Latitude values. This means
that we are doing as well as can be expected and a squared term will probably not
add significantly to the model.
(m) [4] Describe the distribution of residuals. What does this indicate about the condition of normality?
The histogram of residuals is skewed to the right with one low value that could be
an outlier. The box plot is fairly symmetric and no potential outliers are flagged.
The normal quantile plot has the one low value and a pattern that supports the idea
of skew to the right. All of these put the condition of normality in doubt.
(n) [3] Which model would you use to predict average High temperature?
• average Low temperature alone
• Latitude alone
• average Low temperature and Latitude
• average Low temperature, Latitude and Low*Latitude
Explain your choice briefly.
The equation that uses average Low temperature alone. This model is statistically significant and has the highest R^2 value among models where all terms are statistically
significant.