252solnK1 12/02/03 (Open this document in 'Page Layout' view!)

K. REGRESSION EXTENSIONS

1. Residual Analysis
   Text 13.23, 13.24, 13.26, 14.18 [13.20, 13.21, 13.22, 14.9] (13.20, 13.21, 13.22, 14.9)
2. Dummy Variables
   14.38-14.39, 14.41 [14.33 – 14.35] (15.6 – 15.8)
3. Nonlinear regression
   15.1, 15.6, 15.7 [15.1, 15.6, 15.7] (15.1, 15.13, 15.14)
4. Runs test
   K1
5. Durbin-Watson test
   13.32-13.34 [13.28, 13.29, 13.30] (13.28, 13.29, 13.30)

This document includes sections 1-3
---------------------------------------------------------------------------------

Residual Analysis

Most answers below are from the Instructor’s Solution Manual.

Exercise 13.23 [13.20 in 9th]: You have to look at the problem in the book for 20 and 21. A residual analysis of the data indicates no apparent pattern. The assumptions of regression appear to be met.

Exercise 13.24 [13.21 in 9th]: A residual analysis of the data indicates a pattern, with sizeable clusters of consecutive residuals that are either all positive or all negative. This appears to violate the assumption of independence of errors.

Exercise 13.26 [13.22 in 9th]: (a)-(b) The Instructor’s Solution Manual says “Based on a residual analysis, the model appears to be adequate.” I’m not so sure. I ran the data and got the following output.

----- 12/2/2003 4:01:58 PM -----

Welcome to Minitab, press F1 for help.
MTB > Retrieve "C:\Documents and Settings\RBOVE.WCUPANET\My Documents\Drive D\MINITAB\petfood.MTW".
Retrieving worksheet from file: C:\Documents and Settings\RBOVE.WCUPANET\My Documents\Drive D\MINITAB\petfood.MTW
# Worksheet was saved on Wed Nov 12 2003

Results for: petfood.MTW

MTB > Name c7 = 'RESI1' c8 = 'SRES1'
MTB > Regress c1 1 c2;
SUBC> Residuals 'RESI1';
SUBC> SResiduals 'SRES1';
SUBC> Constant;
SUBC> Brief 3.
Regression Analysis: Sales versus Space

The regression equation is
Sales = 1.45 + 0.0740 Space

Predictor      Coef   SE Coef      T      P
Constant     1.4500    0.2178   6.66  0.000
Space       0.07400   0.01591   4.65  0.001

S = 0.3081   R-Sq = 68.4%   R-Sq(adj) = 65.2%

Analysis of Variance
Source           DF      SS      MS      F      P
Regression        1  2.0535  2.0535  21.64  0.001
Residual Error   10  0.9490  0.0949
Total            11  3.0025

Obs  Space   Sales     Fit  SE Fit  Residual  St Resid
  1    5.0  1.6000  1.8200  0.1488   -0.2200     -0.82
  2    5.0  2.2000  1.8200  0.1488    0.3800      1.41
  3    5.0  1.4000  1.8200  0.1488   -0.4200     -1.56
  4   10.0  1.9000  2.1900  0.0974   -0.2900     -0.99
  5   10.0  2.4000  2.1900  0.0974    0.2100      0.72
  6   10.0  2.6000  2.1900  0.0974    0.4100      1.40
  7   15.0  2.3000  2.5600  0.0974   -0.2600     -0.89
  8   15.0  2.7000  2.5600  0.0974    0.1400      0.48
  9   15.0  2.8000  2.5600  0.0974    0.2400      0.82
 10   20.0  2.6000  2.9300  0.1488   -0.3300     -1.22
 11   20.0  2.9000  2.9300  0.1488   -0.0300     -0.11
 12   20.0  3.1000  2.9300  0.1488    0.1700      0.63

MTB > %Fitline c1 c2;
SUBC> Confidence 95.0.
Executing from file: W:\wminitab13\MACROS\Fitline.MAC
Macro is running ... please wait

Regression Analysis: Sales versus Space

The regression equation is
Sales = 1.45 + 0.074 Space

S = 0.308058   R-Sq = 68.4 %   R-Sq(adj) = 65.2 %

Analysis of Variance
Source       DF      SS      MS        F      P
Regression    1  2.0535  2.0535  21.6386  0.001
Error        10  0.9490  0.0949
Total        11  3.0025

Fitted Line Plot: Sales versus Space

MTB > %Resplots c7 c6;
SUBC> Title "Residuals vs Fits".
Executing from file: W:\wminitab13\MACROS\Resplots.MAC
Macro is running ... please wait

Residual Plots: RESI1 vs FITS1

MTB > Plot c7*c1;
SUBC> Symbol;
SUBC> ScFrame;
SUBC> ScAnnotation.

Plot RESI1 * Sales

MTB > Plot c7*c2;
SUBC> Symbol;
SUBC> ScFrame;
SUBC> ScAnnotation.

Plot RESI1 * Space

[Figure: Regression Plot. Sales = 1.45 + 0.074 Space; S = 0.308058, R-Sq = 68.4 %, R-Sq(adj) = 65.2 %. Sales (vertical axis) against Space (horizontal axis).]

The plot above looks pretty random.
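The numbers in the Minitab listing can be checked by hand. The following is only a minimal sketch in Python (not something used in the original solution), assuming the twelve Space and Sales values are exactly those printed in the observation table. The function name simple_ols and the dictionary keys are made up for this illustration.

```python
# Petfood data copied from the observation table in the Minitab listing.
space = [5, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20]
sales = [1.6, 2.2, 1.4, 1.9, 2.4, 2.6, 2.3, 2.7, 2.8, 2.6, 2.9, 3.1]


def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x (hypothetical helper for this sketch)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fits = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fits)]
    sse = sum(e ** 2 for e in residuals)        # Residual Error SS
    sst = sum((yi - y_bar) ** 2 for yi in y)    # Total SS
    return {
        "slope": slope,
        "intercept": intercept,
        "fits": fits,
        "residuals": residuals,
        "sse": sse,
        "sst": sst,
        "r_sq": 1 - sse / sst,
        "s": (sse / (n - 2)) ** 0.5,            # standard error of the estimate
    }


fit = simple_ols(space, sales)
print("Sales = %.2f + %.4f Space" % (fit["intercept"], fit["slope"]))
print("S = %.4f  R-Sq = %.1f%%" % (fit["s"], 100 * fit["r_sq"]))
print("Residuals:", [round(e, 2) for e in fit["residuals"]])
```

The printed equation, S, R-Sq and residuals should agree with the Minitab columns up to rounding.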
[Figure: four-in-one residual plot titled "Residuals vs Fits". Panels: Normal Plot of Residuals (Residual against Normal Score); I Chart of Residuals (Residual against Observation Number, with UCL=1.081, Mean=1.11E-16, LCL=-1.081); Histogram of Residuals (Frequency against Residual); Residuals vs. Fits (Residual against Fit).]

These are the plots that bother me. Normal plots are described on page 227 of the text. The Normal plot looks a lot like the one for the rectangular distribution shown there, though rectangular distributions are not as bad as skewed distributions. The histogram gives us a bimodal distribution.

[Figure: RESI1 (vertical axis) plotted against Space (horizontal axis).]

This plot is probably the most important and it shows very little in the way of a pattern. This is nice.

Exercise 14.18 [14.9 in 9th]: The Instructor’s Solution Manual says the following.
(a) Based upon a residual analysis the model appears adequate.
(b) There is no evidence of a pattern in the residuals versus time.
(c) D = 2.26
(d) D = 2.26 > 1.55. There is no evidence of positive autocorrelation in the residuals.
For an explanation see the printout.

To run this regression I used the Statistics pull-down menu and then picked Regression twice. I had put headings on my columns (the data is in the text and on your CD) but, since I’m lazy, I identified the columns as C1, C2 and C3. So C1 was my response (dependent - Y) variable and C2 and C3 were my predictor (independent – X) variables. There are just too many subcommands here to use the session window to drive Minitab. On the Regression menu I went into Graphs and checked all the residual plots. Under Options I picked Variance Inflation Factors and Durbin-Watson. Under Results I took the last and most complete option, though this can also be done by using the session command ‘Brief 3’ before you start.
Under Storage I picked both of the residuals, though that seems to have been unnecessary unless I wanted to do some extra plotting. When this regression was finished and I had copied all the graphs into a Word document, I ran Stepwise from the Regression menu using C1, C2 and C3.

MTB > Retrieve "C:\Documents and Settings\RBOVE.WCUPANET\My Documents\Drive D\MINITAB\Warecost.MTW".
Retrieving worksheet from file: C:\Documents and Settings\RBOVE.WCUPANET\My Documents\Drive D\MINITAB\Warecost.MTW
# Worksheet was saved on Thu Nov 20 2003

Results for: Warecost.MTW

MTB > Name c4 = 'RESI1' c5 = 'SRES1'
MTB > Regress c1 2 c2 c3;
SUBC> Residuals 'RESI1';
SUBC> SResiduals 'SRES1';
SUBC> GHistogram;
SUBC> GNormalplot;
SUBC> GFits;
SUBC> GOrder;
SUBC> GVars c2 c3;
SUBC> RType 1;
SUBC> Constant;
SUBC> VIF;
SUBC> DW;
SUBC> Brief 3.

Regression Analysis: DistCost versus Sales, Orders

The regression equation is
DistCost = - 2.73 + 0.0471 Sales + 0.0119 Orders

Predictor       Coef   SE Coef      T      P  VIF
Constant      -2.728     6.158  -0.44  0.662
Sales        0.04711   0.02033   2.32  0.031  2.8
Orders      0.011947  0.002249   5.31  0.000  2.8

S = 4.766   R-Sq = 87.6%   R-Sq(adj) = 86.4%

Analysis of Variance
Source           DF      SS      MS      F      P
Regression        2  3368.1  1684.0  74.13  0.000
Residual Error   21   477.0    22.7
Total            23  3845.1

Source  DF  Seq SS
Sales    1  2726.8
Orders   1   641.3

Comment: The large p-value (0.662) tells us that the constant is not significant, but the coefficients of Sales and Orders are significant at the 5% level. The VIF will be discussed later (check the end of this document), but a value below 5 is usually fine. The low p-value for the ANOVA tells us that the 2 independent variables explain a lot of the variation in DistCost. Their sequential contributions to the Regression sum of squares are shown below the ANOVA. This makes Orders look like a fairly feeble, if significant, contributor to the regression. Actually we will find that this is not true.
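The summary figures in this printout all follow from the three sums of squares and their degrees of freedom. As a rough check, here is a short Python sketch (an illustration only, not part of the Minitab run) that assumes the rounded SS values printed in the ANOVA table, so the results can differ from Minitab's in the last digit. All variable names are invented for this example.

```python
import math

# Rounded values copied from the Analysis of Variance table.
ss_regression, df_regression = 3368.1, 2
ss_error, df_error = 477.0, 21
ss_total, df_total = 3845.1, 23

# Sequential sums of squares: Sales entered first, then Orders.
seq_ss_sales, seq_ss_orders = 2726.8, 641.3

ms_regression = ss_regression / df_regression   # mean square = SS / DF
ms_error = ss_error / df_error
f_stat = ms_regression / ms_error               # F = MSR / MSE
r_sq = ss_regression / ss_total
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / df_total)
s = math.sqrt(ms_error)                         # standard error of the estimate

# Share of the regression SS credited to Orders when it is entered second.
orders_share = seq_ss_orders / ss_regression

print("F = %.2f, R-Sq = %.1f%%, R-Sq(adj) = %.1f%%, S = %.3f"
      % (f_stat, 100 * r_sq, 100 * r_sq_adj, s))
print("Orders accounts for %.0f%% of the regression SS when entered after Sales"
      % (100 * orders_share))
```

The sequential share depends on the order in which the variables enter, which is why a small Seq SS for Orders does not by itself show that Orders is a weak predictor.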
Obs  Sales  DistCost     Fit  SE Fit  Residual  St Resid
  1    386    52.950  63.425   1.332   -10.475     -2.29R
  2    446    71.660  63.755   1.511     7.905      1.75
  3    512    85.580  84.820   1.656     0.760      0.17
  4    401    63.690  67.082   1.332    -3.392     -0.74
  5    457    72.810  70.127   0.999     2.683      0.58
  6    458    68.440  67.796   1.193     0.644      0.14
  7    301    52.460  49.839   2.134     2.621      0.62
  8    484    70.770  77.528   1.139    -6.758     -1.46
  9    517    82.030  84.196   1.525    -2.166     -0.48
 10    503    74.390  77.503   1.126    -3.113     -0.67
 11    535    70.840  75.199   1.838    -4.359     -0.99
 12    353    54.080  48.800   2.277     5.280      1.26
 13    372    62.980  62.311   1.483     0.669      0.15
 14    328    72.300  65.626   2.847     6.674      1.75
 15    408    58.990  63.852   1.152    -4.862     -1.05
 16    491    79.380  75.145   1.069     4.235      0.91
 17    527    94.440  88.789   2.004     5.651      1.31
 18    444    59.740  59.407   2.155     0.333      0.08
 19    623    90.500  87.302   2.535     3.198      0.79
 20    596    93.240  93.867   2.097    -0.627     -0.15
 21    463    69.330  70.087   1.049    -0.757     -0.16
 22    389    53.710  59.898   1.349    -6.188     -1.35
 23    547    89.180  87.401   1.657     1.779      0.40
 24    415    66.800  66.535   1.107     0.265      0.06

R denotes an observation with a large standardized residual

Durbin-Watson statistic = 2.26

Comment: The Durbin-Watson statistic checks for autocorrelation, which will be explained later. It’s enough to say that values close to 2 indicate that autocorrelation is not a problem. The dependent variable (Y) and the first independent variable are printed out above, followed by Fit, which means the predicted value of Y; SE Fit, which with the appropriate value of t will give us a confidence interval for the mean value of Y; and Residual, which is the actual value of Y minus the predicted value. The standardized residual is the residual divided by an estimate of its own standard deviation. No mean needs to be subtracted first, since the residuals of a regression with a constant average zero.
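Both statistics can be approximately reproduced from the printed columns alone. The sketch below is an illustration in Python, not the method used in the original solution. It assumes that the Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals, and that the standard deviation of each residual is the square root of (S squared minus SE Fit squared), with S = 4.766 from the printout. Because the inputs are rounded to three decimals, the results only match Minitab up to rounding. The function names are made up.

```python
import math

s = 4.766  # standard error of the estimate, from the regression printout

# Residual and SE Fit columns copied from the observation table (Obs 1-24).
residuals = [-10.475, 7.905, 0.760, -3.392, 2.683, 0.644, 2.621, -6.758,
             -2.166, -3.113, -4.359, 5.280, 0.669, 6.674, -4.862, 4.235,
             5.651, 0.333, 3.198, -0.627, -0.757, -6.188, 1.779, 0.265]
se_fit = [1.332, 1.511, 1.656, 1.332, 0.999, 1.193, 2.134, 1.139,
          1.525, 1.126, 1.838, 2.277, 1.483, 2.847, 1.152, 1.069,
          2.004, 2.155, 2.535, 2.097, 1.049, 1.349, 1.657, 1.107]


def durbin_watson(e):
    """Sum of squared successive differences over the sum of squares."""
    numerator = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    denominator = sum(et ** 2 for et in e)
    return numerator / denominator


def standardized_residual(e_i, se_fit_i, s):
    """Residual divided by its estimated standard deviation.

    Assumes Var(residual) = s^2 - (SE Fit)^2, i.e. s^2 * (1 - leverage).
    """
    return e_i / math.sqrt(s ** 2 - se_fit_i ** 2)


dw = durbin_watson(residuals)
st_resid = [standardized_residual(e, se, s) for e, se in zip(residuals, se_fit)]

print("Durbin-Watson = %.2f" % dw)
print("St Resid:", [round(r, 2) for r in st_resid])
```

Comparing the printed St Resid column with the output shows why observation 1 is flagged with R: its residual is more than two estimated standard deviations from zero.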
Residual Histogram for DistCost
Normplot of Residuals for DistCost
Residuals vs Fits for DistCost
Residuals vs Order for DistCost
Residuals from DistCost vs Sales
Residuals from DistCost vs Orders

[Figure: Histogram of the Residuals (response is DistCost). Frequency (vertical axis) against Residual (horizontal axis, about -10 to 8).]

Comment: The histogram seems to indicate a fairly symmetrical distribution with a peak in the middle.

This problem is continued in 252solnK1b