Stat479 Assignment #6 Solution Key Fall 2013

Problem 1

(a)
Source             d.f.   SS            MS            F       p-value
Regression           1    2059.78145    2059.78145    29.59   <.0001
Error               14     974.65605      69.61829
Corrected Total     15    3034.43750

(b) β̂0 = 148.05068, s.e.(β̂0) = 11.56292; β̂1 = −1.02359, s.e.(β̂1) = 0.18818. The fitted line is ŷ = 148.05068 − 1.02359x.

(c) The expected loss in mean muscle mass, E(y), for a 1-year increase in age is 1.02359. Thus the expected loss in mean muscle mass for a 5-year increase in age is 5 × 1.02359 = 5.11795.

(d) R² = .6788 = 67.88%. This means that 67.88% of the variability in muscle mass is explained by the linear regression model using age as the explanatory variable.

(e) 95% C.I. for β1: (−1.42720, −0.61998). We are 95% confident that the expected change in mean muscle mass, E(y), for a 1-year increase in age lies inside this interval; that is, a loss of between 0.61998 and 1.42720 per year of age.

(f) A t-test of H0: β1 = 0 against H1: β1 ≠ 0 gives t = −5.44, for which the p-value is <.0001. Thus we reject the null hypothesis at α = .05.

(g) From the SAS output, the point estimate of E(y) at x = 60, i.e. μ̂(60), is 86.6353. A 95% confidence interval for the mean muscle mass, E(y), at x = 60 is (82.1579, 91.1127).

(h)
I. See SAS output attached.
II. See attached plot: the assumption of constant variance appears to be satisfied, as the residuals are evenly spread around the zero line as x increases.
III. See attached plots: the above is also true of the plot of residuals against the predicted values. The normal probability plot of the studentized residuals does not show a pattern to indicate that the distribution of the errors deviates from a normal distribution.
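The arithmetic behind parts (a) through (g) of Problem 1 can be checked by hand from the reported sums of squares and estimates. The following is a minimal Python sketch of that check, not part of the original key. The function name `problem1_summary` is hypothetical, the critical value 2.14479 is an assumed table value for the two-sided 95% t quantile with 14 d.f., and the standard error 2.0876 for the mean at x = 60 is read from observation 17 of the SAS output.

```python
def problem1_summary():
    """Recompute the Problem 1 summary quantities from the reported SAS values."""
    n = 16                                   # observations used in the fit
    ss_reg, ss_err = 2059.78145, 974.65605   # regression and error sums of squares
    b0, b1, se_b1 = 148.05068, -1.02359, 0.18818
    se_mean_60 = 2.0876                      # Std Error Mean Predict at x = 60 (obs 17)
    t_crit = 2.14479                         # assumed t table value: 0.975 quantile, 14 d.f.

    df_err = n - 2
    mse = ss_err / df_err
    mu_60 = b0 + 60 * b1
    return {
        "mse": mse,                              # part (a)
        "f": ss_reg / mse,                       # part (a)
        "loss_5yr": -5 * b1,                     # part (c)
        "r2": ss_reg / (ss_reg + ss_err),        # part (d)
        "ci_b1": (b1 - t_crit * se_b1, b1 + t_crit * se_b1),   # part (e)
        "t_b1": b1 / se_b1,                      # part (f)
        "mu_60": mu_60,                          # part (g)
        "ci_mu_60": (mu_60 - t_crit * se_mean_60, mu_60 + t_crit * se_mean_60),
    }


if __name__ == "__main__":
    for name, value in problem1_summary().items():
        print(name, value)
```

The results agree with the key up to rounding in the last reported digit.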
05:47 Monday, December 02, 2013
Simple Linear Regression of Horsepower on Speed
The REG Procedure
Model: MODEL1
Dependent Variable: y Muscle Mass

Number of Observations Read                  17
Number of Observations Used                  16
Number of Observations with Missing Values    1

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1       2059.78145    2059.78145     29.59   <.0001
Error              14        974.65605      69.61829
Corrected Total    15       3034.43750

Root MSE           8.34376    R-Square   0.6788
Dependent Mean    86.18750    Adj R-Sq   0.6559
Coeff Var          9.68094

Parameter Estimates
Variable    Label       DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   95% Confidence Limits
Intercept   Intercept    1            148.05068         11.56292     12.80     <.0001   123.25067   172.85068
x           Age          1             -1.02359          0.18818     -5.44     <.0001    -1.42720    -0.61998

Output Statistics
       Dependent   Predicted   Std Error                                      Std Error   Student
Obs     Variable       Value   Mean Predict      95% CL Mean       Residual   Residual    Residual    -2-1 0 1 2        Cook's D
  1      82.0000     75.3758         2.8813    69.1960    81.5556     6.6242      7.830       0.846   |      |*     |      0.048
  2      91.0000     82.5410         2.1910    77.8417    87.2402     8.4590      8.051       1.051   |      |**    |      0.041
  3     100.0000    104.0363         3.8883    95.6968   112.3759    -4.0363      7.382      -0.547   |     *|      |      0.041
  4      68.0000     79.4702         2.4241    74.2710    84.6694   -11.4702      7.984      -1.437   |    **|      |      0.095
  5      87.0000     90.7297         2.2469    85.9106    95.5488    -3.7297      8.036      -0.464   |      |      |      0.008
  6      73.0000     73.3287         3.1527    66.5667    80.0906    -0.3287      7.725     -0.0425   |      |      |      0.000
  7      78.0000     78.4466         2.5252    73.0307    83.8625    -0.4466      7.952     -0.0562   |      |      |      0.000
  8      80.0000     90.7297         2.2469    85.9106    95.5488   -10.7297      8.036      -1.335   |    **|      |      0.070
  9      65.0000     70.2579         3.5955    62.5463    77.9695    -5.2579      7.529      -0.698   |     *|      |      0.056
 10      84.0000     81.5174         2.2557    76.6793    86.3554     2.4826      8.033       0.309   |      |      |      0.004
 11     116.0000    101.9892         3.5764    94.3186   109.6597    14.0108      7.538       1.859   |      |***   |      0.389
 12      76.0000     88.6825         2.1358    84.1017    93.2633   -12.6825      8.066      -1.572   |   ***|      |      0.087
 13      97.0000    101.9892         3.5764    94.3186   109.6597    -4.9892      7.538      -0.662   |     *|      |      0.049
 14     100.0000     93.8004         2.5120    88.4128    99.1881     6.1996      7.957       0.779   |      |*     |      0.030
 15     105.0000     97.8948         2.9973    91.4663   104.3233     7.1052      7.787       0.912   |      |*     |      0.062
 16      77.0000     68.2107         3.9082    59.8285    76.5929     8.7893      7.372       1.192   |      |**    |      0.200
 17            .     86.6353         2.0876    82.1579    91.1127          .          .           .                            .

Sum of Residuals                          0
Sum of Squared Residuals          974.65605
Predicted Residual SS (PRESS)    1277.86656

(Pages 3 to 6 of this SAS output hold the diagnostic plots referred to in Problem 1(h).)

Problem 2

The case statistics and the plots (see attached SAS outputs for this part) show clearly that:

(a) Car O is an x-outlier. The Hat Diag for this case is 0.27, which is markedly larger than the other hat values and also larger than the cutoff 4/16 = .25. It stands well away from the other cars in the x-direction in the plot of MPG vs. Weight. Clearly, several plots shown in the diagnostics panel are affected by this case.

(b) Car A is a possible y-outlier. Its observed value is much smaller than the value predicted by the fitted line. The RStudent for case A is −3.91, whose absolute value 3.91 is larger than the 5% critical value of 3.62 from Table B.10.

(c) The two largest Cook's D values belong to cases A and O above. For Car O, this statistic is large primarily because it is a high-leverage case (i.e. the Hat Diag is large) and not because it is a y-outlier. Thus it fits the model well but has very high influence. For Car A, Cook's D is large because it is a y-outlier, and therefore it does not fit the model well at all.
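The diagnostics quoted in parts (a) to (c) of Problem 2 are linked by standard identities, so they can be cross-checked from the studentized residual and hat value alone. The sketch below is illustrative only and assumes the usual textbook formulas: the leverage cutoff 2p/n, Cook's D = (r²/p)·h/(1 − h), and RStudent = r·sqrt((n − p − 1)/(n − p − r²)), where r is the internally studentized residual, h the hat diagonal, n = 16 and p = 2. The function names are hypothetical; the inputs for cars A and O are copied from the attached SAS output.

```python
import math

N_OBS = 16     # cars in the full data set
N_PARAMS = 2   # intercept and slope

# (internally studentized residual, hat diagonal) as printed in the SAS output
CASES = {"A": (-2.752, 0.0780), "O": (1.102, 0.2730)}


def leverage_cutoff(n, p):
    """Rule-of-thumb cutoff 2p/n for flagging high-leverage cases."""
    return 2 * p / n


def cooks_d(r, h, p):
    """Cook's distance from the studentized residual r and hat value h."""
    return (r ** 2 / p) * h / (1 - h)


def rstudent(r, n, p):
    """Externally studentized (deleted) residual from the internal one."""
    return r * math.sqrt((n - p - 1) / (n - p - r ** 2))


if __name__ == "__main__":
    cutoff = leverage_cutoff(N_OBS, N_PARAMS)
    for car, (r, h) in CASES.items():
        print(car,
              "high leverage:", h > cutoff,
              "Cook's D:", round(cooks_d(r, h, N_PARAMS), 3),
              "RStudent:", round(rstudent(r, N_OBS, N_PARAMS), 3))
```

Because the inputs are rounded to three or four digits, the recomputed values match the SAS columns only to about the third decimal place.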
(d) The following is a summary of statistics resulting from fitting the model to three different data sets:

Model        Estimated β0   Estimated β1   MSE    R²
All data     41.57          -.00681        8.57   .69
A deleted    43.38          -.00725        4.24   .84
O deleted    39.21          -.00608        8.43   .60

• Fitting the model with Car A deleted improves the fit substantially: MSE falls from 8.57 to 4.24 and R² rises from .69 to .84. The case statistics indicate that Car A is still influential but not a y-outlier, and the plots also support this.
• Fitting the model with Car O deleted does not give a better-fitting model overall.

(e) Clearly, the case statistics in parts (a), (b) and (c) for the model fitted to the complete data set anticipated the outcome of part (d). That is, removing a case that is highly influential affects the fit of the model. If the influential case is a y-outlier, the model fit is expected to "improve." Thus, instead of refitting models with cases deleted, the user can use the case statistics from the original fit to draw similar conclusions. This is the way these statistics are meant to be used. Other statistics such as DFFITS and DFBETAS can also be used to determine how each of the suspected cases affects the overall model fit.

05:51 Monday, December 02, 2013
The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: y MPG

Number of Observations Read   16
Number of Observations Used   16

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1        262.30984     262.30984     30.60   <.0001
Error              14        120.01453       8.57247
Corrected Total    15        382.32437

Root MSE           2.92788    R-Square   0.6861
Dependent Mean    21.71875    Adj R-Sq   0.6637
Coeff Var         13.48087

Parameter Estimates
Variable    Label          DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   Intercept       1             41.57248          3.66300     11.35     <.0001
x           Weight(lbs.)    1             -0.00681          0.00123     -5.53     <.0001

Output Statistics
            Dependent   Predicted   Std Error                  Std Error   Student
Obs   Car    Variable       Value   Mean Predict   Residual    Residual    Residual    -2-1 0 1 2        Cook's D   RStudent
  1   A       16.0000     23.7375         0.8179    -7.7375       2.811      -2.752   | *****|      |      0.321    -3.9150
  2   B       21.0000     22.0017         0.7338    -1.0017       2.834      -0.353   |      |      |      0.004    -0.3421
  3   C       22.8000     25.7797         1.0367    -2.9797       2.738      -1.088   |    **|      |      0.085    -1.0960
  4   D       21.4000     19.6872         0.8189     1.7128       2.811       0.609   |      |*     |      0.016     0.5951
  5   E       18.7000     18.1556         0.9750     0.5444       2.761       0.197   |      |      |      0.002     0.1903
  6   F       19.1000     17.6791         1.0340     1.4209       2.739       0.519   |      |*     |      0.019     0.5047
  7   G       14.3000     17.2706         1.0874    -2.9706       2.718      -1.093   |    **|      |      0.096    -1.1010
  8   H       24.4000     22.5803         0.7484     1.8197       2.831       0.643   |      |*     |      0.014     0.6288
  9   I       22.8000     20.1297         0.7863     2.6703       2.820       0.947   |      |*     |      0.035     0.9431
 10   J       19.2000     19.5170         0.8332    -0.3170       2.807      -0.113   |      |      |      0.001    -0.1089
 11   K       16.4000     16.5899         1.1813    -0.1899       2.679     -0.0709   |      |      |      0.000    -0.0683
 12   L       17.3000     16.1815         1.2401     1.1185       2.652       0.422   |      |      |      0.019     0.4090
 13   M       30.4000     26.5966         1.1460     3.8034       2.694       1.412   |      |**    |      0.180     1.4689
 14   N       25.5000     24.7926         0.9190     0.7074       2.780       0.254   |      |      |      0.004     0.2458
 15   O       31.9000     29.1493         1.5298     2.7507       2.496       1.102   |      |**    |      0.228     1.1110
 16   P       26.3000     27.6517         1.2985    -1.3517       2.624      -0.515   |     *|      |      0.032    -0.5011

Output Statistics
            Hat Diag      Cov                      DFBETAS
Obs   Car          H    Ratio     DFFITS   Intercept         x
  1   A       0.0780   0.2649    -1.1390     -0.7017    0.5082
  2   B       0.0628   1.2155    -0.0886     -0.0237    0.0062
  3   C       0.1254   1.1112    -0.4149     -0.3465    0.2938
  4   D       0.0782   1.1924     0.1734     -0.0452    0.0777
  5   E       0.1109   1.2972     0.0672     -0.0334    0.0444
  6   F       0.1247   1.2746     0.1905     -0.1049    0.1346
  7   G       0.1379   1.1256    -0.4404      0.2599   -0.3257
  8   H       0.0653   1.1686     0.1662      0.0664   -0.0346
  9   I       0.0721   1.0950     0.2629     -0.0452    0.0961
 10   J       0.0810   1.2597    -0.0323      0.0095   -0.0154
 11   K       0.1628   1.3843    -0.0301      0.0194   -0.0236
 12   L       0.1794   1.3776     0.1912     -0.1287    0.1544
 13   M       0.1532   1.0074     0.6248      0.5508   -0.4807
 14   N       0.0985   1.2746     0.0812      0.0611   -0.0491
 15   O       0.2730   1.3306     0.6808      0.6509   -0.5978
 16   P       0.1967   1.3895    -0.2480     -0.2286    0.2048

(Pages 3 and 4 of this SAS output hold the plots referred to in Problem 2.)

Model fitted with Car A deleted:

02:44 Wednesday, November 06, 2013
The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: y MPG

Number of Observations Read   15
Number of Observations Used   15

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1        292.36215     292.36215     69.01   <.0001
Error              13         55.07785       4.23676
Corrected Total    14        347.44000

Root MSE           2.05834    R-Square   0.8415
Dependent Mean    22.10000    Adj R-Sq   0.8293
Coeff Var          9.31375

Parameter Estimates
Variable    Label          DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   Intercept       1             43.37935          2.61617     16.58     <.0001
x           Weight(lbs.)    1             -0.00725       0.00087239     -8.31     <.0001

Model fitted with Car O deleted:

02:46 Wednesday, November 06, 2013
The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: y MPG

Number of Observations Read   15
Number of Observations Used   15

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1        162.14910     162.14910     19.23   0.0007
Error              13        109.60690       8.43130
Corrected Total    14        271.75600

Root MSE           2.90367    R-Square   0.5967
Dependent Mean    21.04000    Adj R-Sq   0.5656
Coeff Var         13.80071

Parameter Estimates
Variable    Label          DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   Intercept       1             39.20810          4.21015      9.31     <.0001
x           Weight(lbs.)    1             -0.00608          0.00139     -4.39     0.0007
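Problem 2(e) mentions DFFITS as a further influence measure. As a rough illustration, the DFFITS column of the full-data output can be reproduced from RStudent and the hat diagonal using the standard identity DFFITS = RStudent·sqrt(h/(1 − h)). The sketch below assumes that identity and, as a separate assumption not stated in the key, the common rule-of-thumb cutoff 2·sqrt(p/n). The function names are hypothetical; the inputs for cars A, M and O are copied from the SAS listing.

```python
import math

N_OBS = 16     # cars in the full data set
N_PARAMS = 2   # intercept and slope

# car: (RStudent, hat diagonal) as printed in the full-data SAS output
CASES = {"A": (-3.9150, 0.0780), "M": (1.4689, 0.1532), "O": (1.1110, 0.2730)}


def dffits(rstud, h):
    """DFFITS from the externally studentized residual and the hat value."""
    return rstud * math.sqrt(h / (1 - h))


def dffits_cutoff(n, p):
    """Assumed rule-of-thumb cutoff 2*sqrt(p/n); not taken from the key."""
    return 2 * math.sqrt(p / n)


def flagged_cars(cases, n, p):
    """Cars whose |DFFITS| exceeds the rule-of-thumb cutoff."""
    cutoff = dffits_cutoff(n, p)
    return [car for car, (t, h) in cases.items() if abs(dffits(t, h)) > cutoff]


if __name__ == "__main__":
    print("cutoff:", round(dffits_cutoff(N_OBS, N_PARAMS), 4))
    for car, (t, h) in CASES.items():
        print(car, round(dffits(t, h), 4))
    print("flagged:", flagged_cars(CASES, N_OBS, N_PARAMS))
```

Under this particular cutoff only Car A is flagged, while Car O falls just below it; other cutoffs are in use, so the flag is a screening aid rather than a test.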