Stat 401B Solutions to Practice Problems for Exam 2 Fall 2003

One way to prepare for the second exam is to practice on problems similar to the ones you may see on November 4. Below are two problems and associated output. These are representative of the type of problems that can be asked on a second exam. Other types of problems may be asked, so you must study all of the material we have covered leading up to the exam. Your answers to these questions should be turned in by 5 pm on Friday, October 31. Solutions will be handed out in class on Monday, November 3.

1. [50 pts] Data on the average low and average high temperature (degrees Fahrenheit) for 20 cities in the U.S. were collected. The latitude (degrees north of the equator) for each city is also noted. Below are the data. Refer to the JMP output entitled Predicting Average High Temperature.

Latitude   30  41  39  44  41
High       81  63  66  63  66
Low        59  43  44  36  40

Latitude   43  41  30  44  38
High       60  66  81  65  68
Low        39  40  57  39  45

Latitude   45  44  45  33  43
High       64  52  58  88  62
Low        41  34  40  61  39

Latitude   35  38  40  47  40
High       75  78  66  58  65
Low        51  50  37  31  48

(a) [3] What is the prediction equation and the value of R^2 for the simple linear regression of average High temperature on average Low temperature?
Pred High = 23.54 + 1.000*Low with R^2 = 0.848

(b) [3] What is the value of the adjusted R^2 for the simple linear regression of average High temperature on average Low temperature?
adjusted R^2 = 1 - MS_Error/MS_Total = 1 - 12.94/80.62 = 1 - 0.161 = 0.839

(c) [4] Is the model that uses average Low temperature to predict average High temperature statistically significant? Support your answer.
Yes. The F-ratio for the model is 100.36 with associated P-value less than 0.0001. Since the P-value is so small, the model is statistically significant.

(d) [3] What is the prediction equation and the value of R^2 for the multiple regression of average High temperature on average Low temperature and Latitude?
Pred High = 55.59 + 0.727*Low - 0.503*Latitude with R^2 = 0.859

(e) [4] Does Latitude add significantly to the explanatory ability of the model with just average Low temperature? Support your answer.
No. The F-ratio for Latitude added to average Low temperature is 1.36 (t-Ratio of -1.17) with associated P-value of 0.2592. This P-value is not small, so we cannot reject the hypothesis that the slope for Latitude is zero. Latitude is not adding significantly to the model.

(f) [4] Is there a statistically significant interaction between average Low temperature and Latitude? Support your answer.
No. The F-ratio for Latitude*Low is 0.19 (t-Ratio of -0.43) with an associated P-value of 0.6716. Since the P-value is not small, we cannot reject the null hypothesis of a zero slope for the interaction term. The interaction between Latitude and average Low temperature is not statistically significant.

(g) [4] If there were a significant interaction between average Low temperature and Latitude (not necessarily the case), what would that say about the relationship between average High temperature and average Low temperature?
If there were a significant interaction between Latitude and average Low temperature, then the relationship between average High temperature and average Low temperature would be different for different Latitudes.

(h) [3] How would you check to see if average Low temperature and Latitude are multicollinear?
You could calculate the correlation between average Low temperature and Latitude, or you could regress average Low temperature on Latitude and examine the value of R^2. If the correlation coefficient (equivalently, R^2) is zero, then there is no collinearity; the closer the correlation is to +1 or -1 (R^2 to 1), the stronger the collinearity.

(i) [3] Give the prediction equation and value of R^2 for the simple linear regression of average High temperature on Latitude.
Pred High = 132.90 - 1.64*Latitude with R^2 = 0.791.

(j) [4] Give an interpretation of the estimated slope coefficient for Latitude.
For every one degree increase in Latitude, the average High temperature decreases, on average, by 1.64 degrees Fahrenheit.

(k) [4] Why is it not a good idea to use the prediction equation relating Latitude to average High temperature to predict the average High temperature in Caracas, Venezuela (10.5 degrees north latitude)?
A Latitude of 10.5 degrees is outside the range of the data (30 to 47 degrees) that were used to construct the prediction equation. Extrapolating beyond the range of the data is dangerous. Additionally, the predicted value, 132.90 - 1.64*10.5 = 115.68 degrees Fahrenheit, does not make sense for an average High temperature.

(l) [4] Would you suggest adding a Latitude^2 term to the model with just Latitude in it? Support your decision based on the information provided in the JMP output.
No. There is no pattern in the plot of residuals versus Latitude values. This means that the linear model is doing as well as can be expected, and a squared term will probably not add significantly to the model.

(m) [4] Describe the distribution of residuals. What does this indicate about the condition of normality?
The histogram of residuals is skewed to the right with one low value that could be an outlier. The box plot is fairly symmetric and no potential outliers are flagged. The normal quantile plot shows the one low value and a pattern that supports the idea of skew to the right. All of these put the condition of normality in doubt.

(n) [3] Which model would you use to predict average High temperature?
• average Low temperature alone
• Latitude alone
• average Low temperature and Latitude
• average Low temperature, Latitude and Low*Latitude
Explain your choice briefly.
The model that uses average Low temperature alone. This model is statistically significant and has the highest R^2 value among models where all terms are statistically significant.
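
For readers who want to check the arithmetic in parts (a) through (c) and (k) without JMP, the following is a minimal sketch in plain Python. It is not part of the original JMP output. The helper name simple_ols and the variable names are illustrative assumptions, and the data lists are transcribed from the table in problem 1.

```python
# Data transcribed from the table in problem 1 (20 U.S. cities).
low = [59, 43, 44, 36, 40, 39, 40, 57, 39, 45,
       41, 34, 40, 61, 39, 51, 50, 37, 31, 48]
high = [81, 63, 66, 63, 66, 60, 66, 81, 65, 68,
        64, 52, 58, 88, 62, 75, 78, 66, 58, 65]


def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x (hypothetical helper, not a JMP routine)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_error = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ms_error = ss_error / (n - 2)   # error df = n - 2 for one predictor
    ms_total = ss_total / (n - 1)   # total df = n - 1
    return {
        "intercept": intercept,
        "slope": slope,
        "r2": 1 - ss_error / ss_total,
        "adj_r2": 1 - ms_error / ms_total,
        "f_ratio": (ss_total - ss_error) / ms_error,  # model df = 1
    }


# Parts (a)-(c): regression of High on Low.
fit = simple_ols(low, high)
for name, value in fit.items():
    print(f"{name}: {value:.3f}")

# Part (k): plugging Caracas (10.5 degrees north) into the rounded
# equation from part (i). This is an extrapolation far outside the data.
caracas_pred = 132.90 - 1.64 * 10.5
print(f"Predicted High at latitude 10.5: {caracas_pred:.2f}")
```

The printed values should agree, up to rounding, with the figures quoted in the solutions: intercept 23.54, slope 1.000, R^2 0.848, adjusted R^2 0.839, and F-ratio 100.36, with a Caracas prediction of 115.68 degrees Fahrenheit.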