Exam 3 (100pts)
Stat 3500 (W07)
Instructor: Lin, Xiaoyan

Name: ________________    Section: _________

Note: please show all of your work to get partial credit. Good luck!
Note: keep at least 3 decimal digits in your intermediate calculations.

1. (42pts) Suppose we want to predict intelligence level (y) based on brain size (x1), the person's height (x2, in inches), and the person's weight (x3, in lbs). Use the output below (when necessary) to answer the following questions.

Predictor         Coef        SE Coef     T        P
Constant          14.63       44.30       0.33     0.743
Brain Si (x1)     1.4409      0.5706      2.53     0.016
Height (x2)       -0.01625    0.03356     -0.48    0.631
Weight (x3)       -0.2130     0.1797      -1.18    0.244

S = 21.5728    R-Sq = 18.9%    R-Sq(adj) = 11.7%

Source            DF    SS         MS        F
Regression        3     3571.4     1190.5    ____
Residual Error    34    15823.1    465.4
Total             37    18894.6

Predicted Values for New Observations (x1=90, x2=72, x3=180)
Fit       SE Fit    95% CI              95% PI
104.81    6.46      (91.68, 117.93)     (_____, _____)

(a) (5pts) Write the proposed model.

(b) (5pts) Report the least squares prediction equation.

(c) (6pts) Show that weight (x3) is not useful, using either a 95% confidence interval or a hypothesis test with significance level 0.05. If you use a CI, be sure to explain briefly why weight is not useful.

(d) (8pts) Test the overall usefulness of the model. Use a significance level of 0.10.
Ho:
Ha:
Test Statistic:
Rejection Region:
Conclusion:

(e) (6pts) Calculate a 95% interval for the intelligence level of a student who has a brain size of 90, a height of 72 inches, and a weight of 180 lbs. Hint: the interval of interest is the blank one in the output!

(f) (6pts) If you could remove one predictor (x-variable) from the model, which one would it be and why?

(g) (6pts) Suppose we add another predictor to the model. What happens to $R^2$?

2. (20pts) Consider a fuel consumption problem in which a natural gas company wishes to predict weekly fuel consumption (y) for its city.
We wish to predict y on the basis of average hourly temperature (x1) and the chill index (x2). Data were collected for eight weeks and two models were proposed.

Model 1: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$

Source            DF    SS         MS         F        P
Regression        5     25.1889    5.03778    28.00    0.025
Residual Error    2     0.3598     .1799
Total             7     25.5488

Model 2: $E(y) = \beta_0 + \beta_1 x_1 + \beta_4 x_1^2$

Source            DF    SS        MS        F        P
Regression        2     24.875    12.438    92.30    0.000
Residual Error    5     0.674     0.135
Total             7     25.549

(a) (4pts) Set up the null and alternative hypotheses for testing which model is better.
Ho: ___________________
Ha: _____________________

(b) (6pts) Perform the test corresponding to your hypotheses from part (a) using α = 0.10.
Test Statistic:
Rejection Region:
Conclusion & Interpretation:

(c) (10pts) Use the $R_a^2$ criterion to decide which model is better. (Hint: use the outputs to calculate $R_a^2$ for each model first.)

3. (20pts) Suppose a golfer has decided to keep a log of his scores from various courses in Columbia. This golfer is interested in building a regression model to estimate his average score (y) based on which golf course (L.A. Nickell, Lake of the Woods, A.L. Gustin) he is playing. Use the following output to answer the questions given below. NOTE: Each round was played on a different day (the scores are independent of each other).

         Lake of the Woods    L.A. Nickell    A.L. Gustin
         75                   79              80
         74                   82              82
         79                   77              84
         75                   78              80
Avg.     75.75                79.00           81.50

(a) (8pts) Propose a model for estimating the golfer's score based on the course he is playing. Be sure to define any indicator variables.
E(y) =

(b) (8pts) Calculate the least squares line (prediction equation) by hand.

(c) (4pts) Set up the null hypothesis and alternative hypothesis for testing whether there are significant differences among the courses.

4. (12pts) Consider the second-order interaction model

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_1 x_2 + \beta_5 x_1^2 x_2$,

where x1 is a quantitative variable and

$x_2 = \begin{cases} 1, & \text{level 1} \\ 0, & \text{level 2} \end{cases}$

The resulting least squares prediction equation is

$\hat{y} = 48.8 \; [\pm] \; 3.4 x_1 \; [\pm] \; .07 x_1^2 \; [\pm] \; 2.4 x_2 \; [\pm] \; 3.7 x_1 x_2 \; [\pm] \; .02 x_1^2 x_2$

(each $[\pm]$ marks a sign that is missing from this copy).

(a) (8pts) Write down the separate prediction equations for each level.

(b) (4pts) Suppose x1 = 1. What is the predicted y value for level 2?

5. (6pts) Propose an appropriate model according to the following plot of the response variable y against the predictor x.

[Figure: plot of y vs. x; x-axis ticks from 2 to 10, y-axis ticks from 0 to 20]
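
The ANOVA-based quantities that Problems 1 and 2 rely on (the global F statistic, the nested-model F statistic, and adjusted R-squared) can be checked numerically. The following is a minimal sketch, assuming the standard textbook definitions of these statistics; the function names are hypothetical and are not part of the exam or of any statistics package. The usage line only reproduces a value already printed in the Model 1 output.

```python
def overall_f(ss_reg, df_reg, ss_err, df_err):
    """Global F statistic: MS(Regression) / MS(Residual Error)."""
    return (ss_reg / df_reg) / (ss_err / df_err)


def nested_f(sse_reduced, sse_complete, df_err_complete, n_extra_terms):
    """F statistic for comparing a reduced model with a complete model.

    n_extra_terms is the number of coefficients that appear in the
    complete model but not in the reduced one.
    """
    drop_in_sse = sse_reduced - sse_complete
    return (drop_in_sse / n_extra_terms) / (sse_complete / df_err_complete)


def adjusted_r2(ss_err, df_err, ss_total, df_total):
    """Adjusted R-squared: 1 - MSE / (SS(Total) / df(Total))."""
    return 1.0 - (ss_err / df_err) / (ss_total / df_total)


# Check against the Model 1 table in Problem 2, where F is printed as 28.00.
f_model1 = overall_f(ss_reg=25.1889, df_reg=5, ss_err=0.3598, df_err=2)
print(round(f_model1, 2))
```

The printed value should agree with the tabled F for Model 1 up to rounding. Critical values for the rejection regions still come from an F table (or software) at the stated significance level.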