STAT 401D LAB #8 ANSWER KEY Spring 2016

Question 1

1a) A straight-line model appears to be a good fit, although the relationship shows some curvature.

1b)
\[
\hat\beta_1 = \frac{\sum xy - (\sum x)(\sum y)/n}{\sum x^2 - (\sum x)^2/n}
            = \frac{1098.628 - (11.35)(1415.2)/13}{15.6183 - (11.35)^2/13} = -23.9890373
\]
\[
\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} = 1415.2/13 - (-23.9890373)(11.35/13) = 129.805813
\]
The prediction equation is \hat{y} = 129.80581 − 23.98904x.

1c) Analysis of Variance
You can use Excel or the formulas
\[
SS_{Tot} = S_{yy}, \qquad SS_{Reg} = \frac{S_{xy}^2}{S_{xx}}, \qquad SSE = S_{yy} - \frac{S_{xy}^2}{S_{xx}}
\]
Using Excel to calculate the sums of squares, we have:

Analysis of Variance
Source            df   SS           MS           F-ratio   p-value
Regression         1   3285.30973   3285.30973   355.9     <.0001
Error             11    101.54104      9.23100
Corrected Total   12   3386.85077

Reject H0: β1 = 0 at α = .05 since the p-value < .05. Note also the very high R-squared of 97%.

1d) Lack of Fit Test

x      y                     ȳ       y − ȳ             Σ(y − ȳ)²   df
.01    127.6, 130.2, 128.0   128.6   -1.0, 1.6, -.6     3.92        2
.48    124.0, 122.0          123.0    1.0, -1.0         2.0         1
.71    110.8, 113.2          112.0   -1.2, 1.2          2.88        1
.95    103.9                 103.9    0.0               0.0         0
1.19   101.5                 101.5    0.0               0.0         0
1.44   92.3, 91.7             92.0    0.3, -0.3         0.18        1
1.96   83.7, 86.3             85.0   -1.3, 1.3          3.38        1
Totals                                                 12.36        6

Analysis of Variance for Lack-of-fit Test
Source        df   SS       MS       F-ratio
Lack of fit    5    89.18   17.836   8.66
Pure Error     6    12.36    2.06
Total Error   11   101.54

F(.05,5,6) = 4.39. Reject H0. Conclude that there is significant lack of fit for the linear regression model.

1e) JMP output (see attached). There is an obvious curved pattern in both residual plots, indicating that we need to fit a higher-order regression (quadratic or cubic) here. If one ignores the horizontal parts (due to replicated values), the normal probability plot is OK.
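The hand calculations in 1b) through 1d) can be checked with a short script. The sketch below is not part of the original key; it rebuilds the 13 observations from the replicated data listed in 1d) and uses only the sums-of-squares formulas, with variable names chosen here for illustration.

```python
# Replicated data from part 1d): key = x (Fe %), values = y (weight loss).
groups = {
    0.01: [127.6, 130.2, 128.0],
    0.48: [124.0, 122.0],
    0.71: [110.8, 113.2],
    0.95: [103.9],
    1.19: [101.5],
    1.44: [92.3, 91.7],
    1.96: [83.7, 86.3],
}

# Flatten the groups into paired x and y lists (one entry per observation).
xs = [x for x, obs in groups.items() for _ in obs]
ys = [y for obs in groups.values() for y in obs]
n = len(xs)

# Corrected sums of squares and cross products.
sum_x, sum_y = sum(xs), sum(ys)
s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
s_yy = sum(y * y for y in ys) - sum_y ** 2 / n

# 1b) least-squares slope and intercept.
b1 = s_xy / s_xx
b0 = sum_y / n - b1 * sum_x / n

# 1c) regression and error sums of squares.
ss_reg = s_xy ** 2 / s_xx
sse = s_yy - ss_reg

# 1d) pure error: squared deviations of each y from its own group mean.
ss_pe = sum(
    sum((y - sum(obs) / len(obs)) ** 2 for y in obs) for obs in groups.values()
)
df_pe = sum(len(obs) - 1 for obs in groups.values())

# Lack of fit is whatever error is left after removing pure error.
ss_lof = sse - ss_pe
df_lof = (n - 2) - df_pe
f_lof = (ss_lof / df_lof) / (ss_pe / df_pe)

print(f"b1 = {b1:.5f}, b0 = {b0:.5f}")
print(f"SSReg = {ss_reg:.3f}, SSE = {sse:.3f}")
print(f"SSPE = {ss_pe:.2f} (df {df_pe}), SSLF = {ss_lof:.2f} (df {df_lof})")
print(f"Lack-of-fit F = {f_lof:.2f}")
```

The lack-of-fit F ratio is then compared with the tabled F(.05,5,6) = 4.39 quoted in 1d).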
Bivariate Fit of y, Weightloss(mg/dm) By x, Fe %

[Figure: scatterplot of y, Weightloss(mg/dm) versus x, Fe %, with the linear fit]

Linear Fit
y, Weightloss(mg/dm) = 129.80581 - 23.989037*x, Fe %

Summary of Fit
RSquare                       0.970019
RSquare Adj                   0.967293
Root Mean Square Error        3.038257
Mean of Response              108.8615
Observations (or Sum Wgts)    13

Lack Of Fit
Source        DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Lack Of Fit    5    89.18104        17.8362       8.6584    0.0103*
Pure Error     6    12.36000         2.0600
Total Error   11   101.54104

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio    Prob > F
Model       1   3285.3097        3285.31       355.8995   <.0001*
Error      11    101.5410           9.23
C. Total   12   3386.8508

Parameter Estimates
Term        Estimate    Std Error   t Ratio   Prob>|t|   Lower 95%   Upper 95%
Intercept   129.80581   1.39378      93.13    <.0001*    126.73812   132.8735
x, Fe %     -23.98904   1.271596    -18.87    <.0001*    -26.7878    -21.19027

Diagnostics Plots

[Figure: Residual by Predicted Plot, y, Weightloss(mg/dm) Residual versus y, Weightloss(mg/dm) Predicted]
[Figure: Residual by X Plot, y, Weightloss(mg/dm) Residual versus x, Fe %]
[Figure: Residual Normal Quantile Plot]

Stat 401D Lab#8 Problem#2 Part I (extracted from Excel Sheet)

Design matrix X (columns 1, x1, x2, x3) and fitted values yhat:

Row   1   x1      x2       x3     yhat
 1    1   10.2     89       9.3   24.01271538
 2    1   13.72    90.07   12.1   45.77283166
 3    1   15.43    95.08   13.3   58.41253987
 4    1   14.37    98.03   13.4   59.38660176
 5    1   15       99      13.5   61.5196175
 6    1   15.02    91.05   12.8   52.29995553
 7    1   15.12   105.6    14     68.91279002
 8    1   15.24   100.8    13.5   63.0647878
 9    1   15.24    94      14     61.39330733
10    1   15.28    93.09   13.8   59.71240177
11    1   13.78    89      12.6   47.99021322
12    1   15.67   102      14     67.32930736
13    1   15.67    99      13.7   63.63556842
14    1   15.98    89.02   13.9   58.60446359
15    1   16.5     95.09   14.9   69.07064441
16    1   16.87    95.02   14.9   69.56550732
17    1   17.26    91.02   14.3   64.07531815
18    1   17.28    98.06   14.3   68.77467014
19    1   17.87    96.01   16.9   83.04282569
20    1   19.13   101      17.3   90.46788351

(X'X)^-1 =
  21.78275595   -0.439044462   -0.228898883    0.503209019
  -0.439044462   0.162046029    0.00404678    -0.178259035
  -0.228898883   0.00404678     0.002927964   -0.008225088
   0.503209019  -0.178259035   -0.008225088    0.220709129

X'y =                 (X'X)^-1 X'y =
    1237.03             -102.7620126
   19659.1047              1.462968881
  118970.1884              0.663365427
   17516.935               5.678808862

y'y            = 80256.5195
beta_hat' X'y  = 80037.00147
SSE            = 219.5180305
MSE            = 13.71987691

s.e.(beta1_hat) = 1.4910572
s.e.(beta2_hat) = 0.2004278
s.e.(beta3_hat) = 1.740144269

Stat 401D Lab#8 Problem#2 Part II

[Figure: scatterplot matrix of x1, x2, x3, and y, annotated with pairwise correlations]

Pairwise correlations shown in the scatterplot matrix:
       x1       x2       x3       y
x1     1        0.3663   0.9484   0.9060
x2     0.3663   1        0.4285   0.5895
x3     0.9484   0.4285   1        0.9450
y      0.9060   0.5895   0.9450   1

Summary of Fit for Full Model (y,x1,x2,x3)
RSquare                       0.941138
RSquare Adj                   0.930102
Root Mean Square Error        3.711464
Mean of Response              61.8515
Observations (or Sum Wgts)    20

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model       3   3523.9590        1174.65       85.2745   <.0001*
Error      16    220.3995          13.77
C. Total   19   3744.3585

Parameter Estimates
Term        Estimate    Std Error   t Ratio   Prob>|t|   Lower 95%   Upper 95%   VIF
Intercept   -102.762    17.32215    -5.93     <.0001*    -139.4833   -66.0407    .
x1          1.4629701   1.494048     0.98     0.3421     -1.70427    4.6302099   10.453922
x2          0.6633643   0.20083      3.30     0.0045*    0.2376242   1.0891045   1.3022167
x3          5.6787649   1.743634     3.26     0.0049*    1.982425    9.3751047   11.273271

[Figure: Residual by Predicted Plot, y Residual versus y Predicted]
[Figure: Normal Probability Plot of Studentized Residuals versus Normal Quantile]

Full Model Residual and Diagnostic Statistics
(Row 21 has no observed y; it is a prediction at x1 = 15.5, x2 = 90, x3 = 14.1.)

Obs   x1      x2       x3     y       Predicted   Residual   Lower 95% Mean   Upper 95% Mean
 1    10.2     89       9.3   25.93   24.0122      1.91778   18.4769          29.5476
 2    13.72    90.07   12.1   45.87   45.7722      0.09779   42.8583          48.6861
 3    15.43    95.08   13.3   56.2    58.4119     -2.21187   56.0450          60.7787
 4    14.37    98.03   13.4   58.6    59.3859     -0.78592   56.3945          62.3774
 5    15       99      13.5   63.36   61.5189      1.84107   59.0808          63.9570
 6    15.02    91.05   12.8   46.35   52.2993     -5.94931   49.3621          55.2365
 7    15.12   105.6    14     68.99   68.9121      0.07793   64.3017          73.5224
 8    15.24   100.8    13.5   62.91   63.0641     -0.15410   60.0190          66.1092
 9    15.24    94      14     58.13   61.3926     -3.26260   58.8574          63.9278
10    15.28    93.09   13.8   59.79   59.7117      0.07829   57.4889          61.9345
11    13.78    89      12.6   56.2    47.9896      8.21043   44.4731          51.5061
12    15.67   102      14     66.16   67.3286     -1.16860   64.1637          70.4935
13    15.67    99      13.7   62.18   63.6349     -1.45487   61.0419          66.2278
14    15.98    89.02   13.9   57.01   58.6038     -1.59377   55.2570          61.9506
15    16.5     95.09   14.9   65.62   69.0699     -3.44990   66.6967          71.4431
16    16.87    95.02   14.9   65.03   69.5648     -4.53476   67.2466          71.8829
17    17.26    91.02   14.3   66.74   64.0746      2.66539   59.6011          68.5481
18    17.28    98.06   14.3   73.38   68.7740      4.60605   64.2301          73.3178
19    17.87    96.01   16.9   82.87   83.0420     -0.17200   77.7911          88.2929
20    19.13   101      17.3   95.71   90.4670      5.24297   86.2333          94.7008
21    15.5     90      14.1   .       59.6874      .         56.2892          63.0856

Obs   Lower 95% Indiv   Upper 95% Indiv   Studentized Resid   hats    Cook's D Influence
 1    14.3922           33.6322            0.72709            0.495   0.130
 2    37.3820           54.1624            0.02836            0.137   0.000
 3    50.1956           66.6281           -0.62490            0.090   0.010
 4    50.9685           67.8034           -0.22895            0.145   0.002
 5    53.2819           69.7560            0.52173            0.096   0.007
 6    43.9010           60.6976           -1.72787            0.139   0.121
 7    59.7929           78.0313            0.02591            0.343   0.000
 8    54.6274           71.5008           -0.04503            0.150   0.000
 9    53.1263           69.6589           -0.92859            0.104   0.025
10    51.5358           67.8876            0.02199            0.080   0.000
11    39.3715           56.6076            2.47292            0.200   0.382
12    58.8480           75.8092           -0.34391            0.162   0.006
13    55.3507           71.9191           -0.41519            0.109   0.005
14    50.0536           67.1540           -0.47449            0.181   0.012
15    60.8518           77.2880           -0.97493            0.091   0.024
16    61.3624           77.7671           -1.27858            0.087   0.039
17    55.0238           73.1254            0.87299            0.323   0.091
18    59.6882           77.8597            1.52017            0.334   0.289
19    73.5828           92.5012           -0.06223            0.445   0.001
20    81.5323           99.4017            1.67596            0.290   0.286
21    51.1170           68.2578            .                  0.187   .

Summary of Fit for the Reduced Model (y,x2,x3)
RSquare                       0.937611
RSquare Adj                   0.930271
Root Mean Square Error        3.706967
Mean of Response              61.8515
Observations (or Sum Wgts)    20

Analysis of Variance
Source     DF   Sum of Squares   Mean Square   F Ratio    Prob > F
Model       2   3510.7511        1755.38       127.7416   <.0001*
Error      17    233.6073          13.74
C. Total   19   3744.3585

Parameter Estimates
Term        Estimate    Std Error   t Ratio   Prob>|t|   Lower 95%   Upper 95%   VIF
Intercept   -98.79827   16.82213    -5.87     <.0001*    -134.2899   -63.30669   .
x2          0.6268295   0.197094     3.18     0.0055*    0.2109967   1.0426623   1.2572699
x3          7.2881078   0.581592    12.53     <.0001*    6.0610565   8.5151591   1.2572699

[Figure: Residual by Predicted Plot for the reduced model, y Residual versus y Predicted]

Test of H0: β1 = 0 vs. Ha: β1 ≠ 0
\[
F = \frac{\{SS_{Reg}(\text{Full}) - SS_{Reg}(\text{Reduced})\}/(k-g)}{MSE(\text{Full})}
  = \frac{(3523.959 - 3510.7511)/(3-2)}{13.77} = \frac{13.208}{13.77} = .96
\]
The denominator degrees of freedom are those of MSE(Full), which has 16 error df, so the critical value is F(.05,1,16) = 4.49. Thus the F-statistic is not in the rejection region (RR). We fail to reject H0: β1 = 0 in the full model.
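As a quick check on the full-versus-reduced comparison, the partial F statistic can be recomputed from the JMP summary values. This is a minimal sketch, not part of the original key: the variable names are illustrative, and the critical value is hard-coded from an F table as an assumption rather than computed. The denominator degrees of freedom follow MSE(Full), which has n − k − 1 = 16 error df.

```python
# Summary values copied from the JMP output for the two fits.
ss_reg_full = 3523.9590      # Model SS, full model (x1, x2, x3)
ss_reg_reduced = 3510.7511   # Model SS, reduced model (x2, x3)
mse_full = 13.77             # Error mean square, full model

n = 20        # observations
k, g = 3, 2   # predictors in the full and reduced models

df_num = k - g       # number of coefficients being tested
df_den = n - k - 1   # error df of the full model

# Partial F: extra regression SS per tested coefficient, over MSE(Full).
f_stat = (ss_reg_full - ss_reg_reduced) / df_num / mse_full

# Assumption: tabled upper 5% point of F with (1, 16) df.
F_CRIT_05_1_16 = 4.49
reject_h0 = f_stat > F_CRIT_05_1_16

# With one coefficient tested, F should match the square of x1's t ratio
# from the full-model Parameter Estimates (t = 0.98), up to rounding.
t_x1 = 0.98

print(f"F = {f_stat:.2f} on ({df_num}, {df_den}) df; reject H0: {reject_h0}")
print(f"t^2 for x1 = {t_x1 ** 2:.2f}")
```

Because only one coefficient is dropped, the partial F test and the two-sided t test on x1 in the full model are equivalent, which is why the F value agrees with the squared t ratio.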