EXAMPLE EXAM QUESTIONS ON SIMPLE LINEAR REGRESSION

Questions 1-7 refer to the following situation:

Stock prices, Y, are assumed to be affected by the annual rate of dividend of the stock, X. A simple linear regression analysis was performed on 20 observations and the results were:

Regression Equation Section

Independent   Regression     Standard       T-Value      Prob
Variable      Coefficient    Error          (Ho: B=0)    Level
INTERCEPT     -7.964633      3.11101359     -2.560       0.0166
X1            12.548580      1.27081204      9.874       0.0001

1. What statistical conclusion should you make about the effect of the dividend on average stock price?
A. Since 11.30869 > table value, reject the null hypothesis.
B. Since 12.54858 > table value, reject the null hypothesis.
C. Since 9.874 < table value, reject the null hypothesis.
D. Since 9.874 > table value, reject the null hypothesis.
E. Since 0.7895 < table value, fail to reject the null hypothesis.

2. Consider the 95% confidence interval for a value of Y given an X value of 2.36. You are given that the standard error of this estimate is 3.351. The interval is interpreted as: I am 95% confident that
A. the stock price for a stock with a dividend rate of 2.36% falls between $14.61 and $28.69.
B. the mean stock price for all stocks with a dividend rate of 2.36% falls between $14.61 and $28.69.
C. the variance in stock price for all stocks falls between $14.61 and $28.69.
D. the dividend rate for all stocks falls between $14.61 and $28.69.
E. for each one point increase in dividend rate, the stock price will increase from $14.61 and $28.69.

3. Which one of the following assumptions is incorrectly stated?
A. The stock price is normally distributed for any dividend rate.
B. The stock price has the same variability for any dividend rate.
C. The stock price for any dividend rate is a linear function of dividend rate.
D. The difference between the stock price and the expected stock price given the dividend rate is independent from company to company.

4. The interpretation of 0.7895, the value of R-square (the coefficient of determination) is:
A. 78.95% of the sample stock prices (around the mean stock price) can be attributed to a linear relationship with the dividend rate in the population.
B. the mean stock price will be estimated to increase $97.50 for each point increase in the rate.
C. the mean stock price will increase $78.95 for each point increase in the rate.
D. the stock price will increase $78.95 for each point increase in the rate.
E. 78.95% of the sample variability in stock price (around the mean stock price) can be attributed to a linear relationship with the dividend rate.

5. What is the estimate of the change in expected stock prices when the dividend rate increases by one point?
A. 97.50
B. -7.964633
C. This is a parameter not a statistic.
D. 12.54858
E. 5.36546

6. The estimate of the slope will vary from sample to sample. The estimate of the standard deviation of beta-hat is:
A. 3.36284
B. 3.14983
C. 0.39274
D. 12.54858
E. 1.27081

7. A 95% confidence interval for the average stock price given the rate of return will use the following t value:
A. 9.874
B. -2.560
C. 2.101
D. 2.045
E. 2.153

Answers to 1-7

1. D  From the computer printout, use the t-test value across from X1.
2. A  this is a confidence interval for a conditional mean
3. C  the mean stock price falls on the line
4. E  r-square is % of sample variation of y explained by x
5. D  This is beta-hat; see the computer printout to the right of X1.
6. E  This is the standard error of beta-hat, to the right of X1.
7. C  All t-values in simple linear regression have n-2 d.f.

Questions 8-17 are concerned with the following situation:

A fire insurance company wants to relate the amount of fire damage (y) in major residential fires to the distance between the residence and the nearest fire station (x). The study is to be conducted in a large suburb of a major city; a sample of 15 recent fires in this suburb is selected.
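The fire damage situation described above can be set up as an ordinary least-squares fit. The following is a minimal sketch, assuming the 15 complete (x, y) pairs listed in the printout for this situation; the helper name `fit_line` and the rounded table value t = 2.160 (13 d.f.) are illustrative assumptions, not part of the original printout.

```python
import math

# Distance from the fire station (miles) and fire damage ($1000), observations 1-15.
x = [3.4, 1.8, 4.6, 2.3, 3.1, 5.5, 0.7, 3.0, 2.6, 4.3, 2.1, 1.1, 6.1, 4.8, 3.8]
y = [26.2, 17.8, 31.3, 23.1, 27.5, 36.0, 14.1, 22.3, 19.6, 31.3, 24.0, 17.3, 43.2, 36.4, 26.1]

def fit_line(x, y):
    """Least-squares fit of y = b0 + b1*x (hypothetical helper for illustration)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx                    # slope estimate (beta-hat)
    b0 = y_bar - b1 * x_bar           # intercept estimate
    sse = syy - b1 * sxy              # error sum of squares
    return {
        "n": n, "x_bar": x_bar, "sxx": sxx, "b0": b0, "b1": b1,
        "root_mse": math.sqrt(sse / (n - 2)),   # estimate of sigma of y given x
        "r_square": 1 - sse / syy,
    }

fit = fit_line(x, y)

# 95% confidence interval for the MEAN damage at x0 = 3.5 miles (observation 16).
x0 = 3.5
t_table = 2.160  # assumed rounded table value for 13 d.f., two-sided alpha = 0.05
pred = fit["b0"] + fit["b1"] * x0
se_mean = fit["root_mse"] * math.sqrt(1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"])
lcl, ucl = pred - t_table * se_mean, pred + t_table * se_mean

print(round(fit["b0"], 4), round(fit["b1"], 4), round(fit["root_mse"], 4))
print(round(pred, 4), round(lcl, 2), round(ucl, 2))
```

The interval for an individual house at the same distance would use `math.sqrt(1 + 1/n + ...)` in place of the `se_mean` factor, which is why the "Individual" limits in a printout are wider than the "Mean" limits.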
The 15 values and the printout follow:

OBS     X       Y
  1    3.4    26.2
  2    1.8    17.8
  3    4.6    31.3
  4    2.3    23.1
  5    3.1    27.5
  6    5.5    36.0
  7    0.7    14.1
  8    3.0    22.3
  9    2.6    19.6
 10    4.3    31.3
 11    2.1    24.0
 12    1.1    17.3
 13    6.1    43.2
 14    4.8    36.4
 15    3.8    26.1
 16    3.5     .

Dependent Variable: Y   $1000 fire damage

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Prob
Model               1    841.76636         841.76636      156.886    0.0001
Error              13     69.75098           5.36546
Total(Adjusted)    14    911.51733

Root MSE     2.31635     R-square    0.9235
Dep Mean    26.41333     Adj R-sq    0.9176
C.V.         8.76961

Parameter Estimates

             Parameter     Standard       T for H0:
Variable     Estimate      Error          Parameter=0    Prob > |T|
INTERCEPT    10.277929     1.42027781      7.237         0.0001
X             4.919331     0.39274775     12.525         0.0001

       Dep        Predicted    95% LCL    95% UCL    95% LCL       95% UCL
Obs    Actual Y   Value        Mean       Mean       Individual    Individual
16     .          27.4956      26.1901    28.8011    22.3239       32.66

8. Which one of the following assumptions is incorrect?
(A) The difference between the fire damage and the expected fire damage given the distance is independent from house to house.
(B) The fire damage is normally distributed for any distance.
(C) The mean fire damage has the same variability for any distance.
(D) The mean fire damage for any distance is a linear function of distance.

9. You will find the value 4.919331 in the printout under Parameter Estimates. This is interpreted as:
(A) The mean fire damage will increase $4,919.33 for each mile from the fire station.
(B) The mean fire damage will be estimated to increase $4,919.33 for each mile from the fire station.
(C) The fire damage will increase $4,919.33 for each mile from the fire station.
(D) The mean fire damage will be $4,919.33 given the distance.
(E) The estimated mean fire damage will be $4,919.33 given the distance.

10. The estimate of the standard deviation of fire damage for all homes the same distance from the fire station is (in thousands of dollars)
(A) 0.392744775
(B) 2.31635
(C) no information available.
(D) 69.75098
(E) 5.36546

11. The interpretation of 0.9235, the value of R-square (the coefficient of determination) is:
(A) 92.35% of the variability in fire damage (around the mean fire damage) can be attributed to a linear relationship with the distance to the fire station in the population.
(B) the mean fire damage will be estimated to increase $923.50 for each mile from the fire station.
(C) the mean fire damage will increase $923.50 for each mile from the fire station.
(D) the fire damage will increase $923.50 for each mile from the fire station.
(E) 92.35% of the sample variability in fire damage (around the mean fire damage) can be attributed to a linear relationship with the distance to the fire station.

12. To test the null hypothesis that the parameter of the slope is zero, the test statistic value is:
(A) 0.9235
(B) 0.9176
(C) 0.39274775
(D) 12.525
(E) 7.237

13. For testing that the slope is zero versus the alternative that the slope is not zero (use alpha of 0.05), the rejection region is: Reject the null hypothesis if
(A) t > 2.160 or t < -2.160
(B) | t | < 12.525
(C) t > 1.771
(D) t > 12.525
(E) t > 2.160

14. The 95% confidence interval for the mean fire damage for all houses 3.5 miles from the fire station is: (in thousands of dollars)
(A) 15.3442 to 25.8279
(B) 4.070 to 5.768
(C) 10.1999 to 21.1785
(D) 13.4329 to 17.9455
(E) 26.1901 to 28.8011

15. The 95% confidence interval for the mean (25.7076 to 28.2997) for the first house (OBS 1) in the sample is interpreted as: I am 95% confident that
(A) the fire damage for a house 3.4 miles from the fire station falls between $25,707.60 and $28,299.70.
(B) the fire damage for all houses 3.4 miles from the fire station falls between $25,707.60 and $28,299.70.
(C) the variance in fire damage for all houses 3.4 miles from the fire station falls between $25,707.60 and $28,299.70.
(D) the average fire damage for all houses 3.4 miles from the fire station falls between $25,707.60 and $28,299.70.
(E) for each one mile from the fire station, the mean fire damage will increase from $25,707.60 and $28,299.70.

16. In this sample, for each one standard deviation that a house is from the fire station, the mean fire damage will be estimated to increase 0.96 standard deviations. This is
(A) the coefficient of correlation, r
(B) the sample standard deviation, s
(C) the test statistic value, t
(D) coefficient of determination, r-square
(E) the least squares coefficient, beta hat

17. The difference between the actual value of y and the predicted value of y (y-yhat) is called
(A) a standard deviation
(B) a slope
(C) a residual
(D) a sample standard deviation
(E) an error

ANSWERS for 8-17

8. C  fire damage has the same variance given distance for any distance
9. B  this is the beta hat
10. B  this is an estimate of sigma of y given x, the square root of MSE
11. E  r-square is % of sample variation of y explained by x
12. D  use the t value from the computer printout across from X
13. A  use a t with n-2 degrees of freedom
14. E  see the 16th observation under the "mean" columns
15. D  this is a confidence interval for a conditional mean
16. A  this is the definition of Pearson's r from class notes
17. C  this is the definition of the residual

QUESTIONS 18-27 DEAL WITH THE FOLLOWING SITUATION:

The expected sales of a product in a city are assumed to be affected by the per capita discretionary income and the population of the city. Per capita discretionary income will be referred to as PCDI in all the questions. In Questions 18-27, examine only the effect of per capita discretionary income on the mean sales. Thus the following model is hypothesized:

E(Y) = B0 + B1 X1

where
Y = Sales (in thousands of dollars)
X1 = Per Capita Discretionary Income (in dollars)

A sample of 15 cities, along with their sales, per capita discretionary income, and the population of the city (in thousands) is given in the attached printout.
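The hypothesized model E(Y) = B0 + B1 X1 can be fitted directly from the 15 complete city records (INCOME and SALES, observations 1-15) in the printout for this situation. The following is a sketch under that assumption, with illustrative variable names; it shows how the slope estimate, its standard error, and the t statistic for H0: B1 = 0 arise from the raw data.

```python
import math

# PCDI (dollars) and sales ($1000) for the 15 cities with observed sales.
income = [2450, 3254, 3802, 2838, 2347, 3782, 3008, 2450,
          2137, 2560, 4020, 4427, 2660, 2088, 2605]
sales = [162, 120, 223, 131, 67, 169, 81, 192,
         116, 55, 252, 232, 144, 103, 212]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(sales) / n

# Corrected sums of squares and cross-products.
sxx = sum((xi - x_bar) ** 2 for xi in income)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(income, sales))
syy = sum((yi - y_bar) ** 2 for yi in sales)

b1 = sxy / sxx                      # estimated slope
b0 = y_bar - b1 * x_bar             # estimated intercept
mse = (syy - b1 * sxy) / (n - 2)    # mean square error, n-2 d.f.
se_b1 = math.sqrt(mse / sxx)        # estimated standard error of the slope
t_slope = b1 / se_b1                # test statistic for H0: B1 = 0
r_square = b1 * sxy / syy           # share of sample variability explained

print(round(b0, 3), round(b1, 4), round(se_b1, 4))
print(round(t_slope, 3), round(r_square, 4), round(math.sqrt(mse), 3))
```

The t statistic is simply the slope estimate divided by its estimated standard error, so it counts how many standard errors the estimate lies from zero.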
The 15 values and a printout follow:

OBS    INCOME    SALES
  1     2450      162
  2     3254      120
  3     3802      223
  4     2838      131
  5     2347       67
  6     3782      169
  7     3008       81
  8     2450      192
  9     2137      116
 10     2560       55
 11     4020      252
 12     4427      232
 13     2660      144
 14     2088      103
 15     2605      212
 16     2500       .
 17     3500       .

Root MSE     49.51434     R-square    0.4087
Dep Mean    150.60000     Adj R-sq    0.3632

Parameter Estimates

            Coefficient    Standard    T for H0:
Variable    Estimate       Error       B=0          Prob
INTERCEP    -10.207        55.147      -0.185       0.8560
INCOME        0.054         0.018       2.998       0.0103

       Dep                    95% LCL    95% UCL    95% LCL       95% UCL
Obs    Actual    Predicted    Mean       Mean       Individual    Individual
16     .         125.5         92.5      158.5       13.5         237.5
17     .         179.8        145.1      214.5       67.3         292.3

18. The 95% confidence interval for the mean sales of all cities with PCDI = 2500 is
A. 92.5 to 158.5
B. can not be calculated because of missing values
C. 3500
D. 88.6 to 156.9
E. 13.5 to 237.5

19. When testing the null hypothesis that the slope equals zero versus the alternative hypothesis that the slope does not equal zero, the rejection region would be: reject the null if
A. t > t(14, 0.025) or t < -t(14, 0.025)
B. t > t(13, 0.05)
C. F < F(1, 13, 0.05)
D. |t| > t(13, 0.025)
E. p-value > alpha

20. What distribution would you use to infer about the variation of sales among all cities with the same PCDI?
A. the Chi-square distribution
B. the t distribution
C. the F distribution
D. a t with no interaction and an F with interaction

21. Given the p-value of the F-test is 0.0103, we can interpret this as
A. Given the null is true, there is a 1.03% chance of finding this value of the test statistic or something more extreme.
B. The percent of sample variability of Y explained by the independent variable is 1.03%
C. There is a 98.97% probability that the null hypothesis is right.
D. There is a 98.97% probability that the null hypothesis is wrong.
E. The probability of a type I error is 0.0103.

22. Does the PCDI help predict the sales of the product?
A. Yes, because 2.998 > the table value
B. No, because .8560 is greater than alpha
C. Yes, because 8.986 < the table value
D. Yes, because of MSE = 2451.66959
E. No, because 0.018 is less than the table value

23. What is the interpretation of the coefficient of determination?
A. Don't know and don't care (Hint, this is a wrong answer and best left unspoken within hearing of instructor).
B. 40.87 probability that sales is linearly related to PCDI.
C. 40.87 percent of the sample variability of sales can be attributed to changes in PCDI.
D. 40.87 percent of the variability of PCDI can be attributed to a linear relationship between mean PCDI and sales.
E. 40.87 percent of the sample variability of PCDI can be attributed to a linear relationship between mean PCDI and sales.

24. What table value would you use in the calculation of a 90% confidence interval for a value of Y given a value of X?
A. 1.645
B. 3.140
C. 1.771
D. 2.650
E. 2.998

25. How many estimated standard errors is the point estimate of the slope away from zero? Slope is the change in the mean sales for each dollar increase in PCDI.
A. 0.054
B. 0.4087
C. -10.207
D. 2.998
E. 0.018

26. You know that most cities have small PCDI and only a few have large PCDI. Is this a violation of any assumption?
A. Yes, because the variation of PCDI would then be unequal.
B. No, because sales has to be normally distributed but PCDI does not have to be.
C. Yes, this would violate the linear relationship between the mean sales and PCDI.
D. No, because the variance of sales has nothing to do with the problem.
E. Yes, a violation of normality.

27. What would be the change in the estimated mean sales for each one standard deviation increase in PCDI?
A. 0.3632 standard deviations
B. can not be calculated.
C. 0.4087 squared dollars
D. 0.6393 (square root of 0.4087) standard deviations
E. 0.0540 dollars

Answers to 18-27

18. A  see observation number 16
19. D  use a t with n-2 degrees of freedom
20. A  variance is related to chi-squared, see Table 3 in class notes
21. A  see definition of p-value in text book
22. A  use the F test here
23. C  see the definition of r-squared
24. C  use t with n-2 d.f.
25. D  definition of t-test value
26. B  assumptions apply to y|x or to e but not to x
27. D  this is the definition of r in class notes
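
Several of the values quoted in Questions 18-27 are tied together by standard simple-regression identities. The sketch below, with illustrative variable names and using only the rounded figures from the printout, checks three of them: r is the square root of R-square (with the sign of the slope), adjusted R-square follows from R-square and n, and the F statistic equals the square of the slope's t statistic.

```python
import math

# Rounded values taken from the sales printout.
n = 15             # cities with observed sales
r_square = 0.4087  # coefficient of determination
t_slope = 2.998    # t statistic for the INCOME coefficient

# Correlation coefficient: positive root because the slope estimate is positive.
r = math.sqrt(r_square)

# Adjusted R-square for one predictor: 1 - (1 - R^2)(n - 1)/(n - 2).
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - 2)

# With a single predictor, the ANOVA F statistic is the square of the slope t.
f_value = t_slope ** 2

print(round(r, 4), round(adj_r_square, 4), round(f_value, 3))
```

Because the inputs are rounded, the squared t statistic agrees with the 8.986 quoted in Question 22 only to about two decimal places.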