K. REGRESSION EXTENSIONS 252solnK1 12/02/03

1. Residual Analysis
Text 13.23, 13.24, 13.26, 14.18 [13.20, 13.21, 13.22, 14.9] (13.20, 13.21, 13.22, 14.9)
2. Dummy Variables
14.38-14.39, 14.41 [14.33 – 14.35] (15.6 – 15.8)
3. Nonlinear regression
15.1, 15.6, 15.7 [15.1, 15.6, 15.7] (15.1, 15.13, 15.14)
4. Runs test
K1
5. Durbin-Watson test
13.32-13.34 [13.28, 13.29, 13.30] (13.28, 13.29, 13.30)
This document includes sections 1-3.
---------------------------------------------------------------------------------------------------------------------------------
Residual Analysis
Most answers below are from the Instructor’s Solution Manual.
Exercise 13.23 [13.20 in 9th]: For this exercise and the next one (13.20 and 13.21 in the 9th edition), you have to
look at the problems in the book. A residual analysis of the data indicates no apparent pattern. The assumptions
of regression appear to be met.
Exercise 13.24 [13.21 in 9th]: A residual analysis of the data indicates a pattern, with sizeable clusters of
consecutive residuals that are either all positive or all negative. This appears to violate the
assumption of independence of errors.
Exercise 13.26 [13.22 in 9th]: (a)-(b)
The Instructor’s Solution Manual says “Based on a residual analysis, the model appears to be adequate.”
I’m not so sure. I ran the data in Minitab and got the output below.
————— 12/2/2003 4:01:58 PM ————————————————————
Welcome to Minitab, press F1 for help.
MTB > Retrieve "C:\Documents and Settings\RBOVE.WCUPANET\My Documents\Drive D\MINITAB\petfood.MTW".
Retrieving worksheet from file: C:\Documents and Settings\RBOVE.WCUPANET\My Documents\Drive D\MINITAB\petfood.MTW
# Worksheet was saved on Wed Nov 12 2003
Results for: petfood.MTW
MTB > Name c7 = 'RESI1' c8 = 'SRES1'
MTB > Regress c1 1 c2;
SUBC>   Residuals 'RESI1';
SUBC>   SResiduals 'SRES1';
SUBC>   Constant;
SUBC>   Brief 3.
Regression Analysis: Sales versus Space
The regression equation is
Sales = 1.45 + 0.0740 Space
Predictor    Coef  SE Coef     T      P
Constant   1.4500   0.2178  6.66  0.000
Space     0.07400  0.01591  4.65  0.001
S = 0.3081      R-Sq = 68.4%      R-Sq(adj) = 65.2%
Analysis of Variance
Source          DF      SS      MS      F      P
Regression       1  2.0535  2.0535  21.64  0.001
Residual Error  10  0.9490  0.0949
Total           11  3.0025
Obs  Space   Sales     Fit  SE Fit  Residual  St Resid
  1    5.0  1.6000  1.8200  0.1488   -0.2200     -0.82
  2    5.0  2.2000  1.8200  0.1488    0.3800      1.41
  3    5.0  1.4000  1.8200  0.1488   -0.4200     -1.56
  4   10.0  1.9000  2.1900  0.0974   -0.2900     -0.99
  5   10.0  2.4000  2.1900  0.0974    0.2100      0.72
  6   10.0  2.6000  2.1900  0.0974    0.4100      1.40
  7   15.0  2.3000  2.5600  0.0974   -0.2600     -0.89
  8   15.0  2.7000  2.5600  0.0974    0.1400      0.48
  9   15.0  2.8000  2.5600  0.0974    0.2400      0.82
 10   20.0  2.6000  2.9300  0.1488   -0.3300     -1.22
 11   20.0  2.9000  2.9300  0.1488   -0.0300     -0.11
 12   20.0  3.1000  2.9300  0.1488    0.1700      0.63
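As a check on the printout, the fit can be reproduced from the twelve (Space, Sales) pairs listed above. The sketch below uses only the Python standard library and the textbook least-squares formulas; the function name `simple_regression` is hypothetical and this is not how Minitab computes things internally, just a way to confirm the numbers.

```python
# Sketch: reproduce "Regress c1 1 c2" for the pet food data (Space = x, Sales = y).
from math import sqrt

space = [5, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20]
sales = [1.6, 2.2, 1.4, 1.9, 2.4, 2.6, 2.3, 2.7, 2.8, 2.6, 2.9, 3.1]

def simple_regression(x, y):
    """Least-squares fit y = b0 + b1*x plus the usual ANOVA quantities."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                       # slope
    b0 = y_bar - b1 * x_bar              # intercept (Constant)
    fits = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fits)]  # actual minus fitted
    sst = sum((yi - y_bar) ** 2 for yi in y)      # Total SS
    sse = sum(e ** 2 for e in resid)              # Residual Error SS
    ssr = sst - sse                               # Regression SS
    mse = sse / (n - 2)                           # Residual Error MS, n - 2 df
    return {"b0": b0, "b1": b1, "fits": fits, "resid": resid,
            "SSR": ssr, "SSE": sse, "SST": sst,
            "S": sqrt(mse), "R2": ssr / sst, "F": ssr / mse}

fit = simple_regression(space, sales)
print(f"Sales = {fit['b0']:.2f} + {fit['b1']:.4f} Space")
print(f"S = {fit['S']:.4f}  R-Sq = {100 * fit['R2']:.1f}%  F = {fit['F']:.2f}")
print(f"SS: Regression {fit['SSR']:.4f}, Error {fit['SSE']:.4f}, Total {fit['SST']:.4f}")
```

Running it should print Sales = 1.45 + 0.0740 Space, S = 0.3081, R-Sq = 68.4% and F = 21.64, in agreement with the Minitab output.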
MTB > %Fitline c1 c2;
SUBC>   Confidence 95.0.
Executing from file: W:\wminitab13\MACROS\Fitline.MAC
Macro is running ... please wait
Regression Analysis: Sales versus Space
The regression equation is
Sales = 1.45 + 0.074 Space
S = 0.308058      R-Sq = 68.4 %      R-Sq(adj) = 65.2 %
Analysis of Variance
Source      DF      SS      MS        F      P
Regression   1  2.0535  2.0535  21.6386  0.001
Error       10  0.9490  0.0949
Total       11  3.0025
Fitted Line Plot: Sales versus Space
MTB > %Resplots c7 c6;
SUBC>   Title "Residuals vs Fits".
Executing from file: W:\wminitab13\MACROS\Resplots.MAC
Macro is running ... please wait
Residual Plots: RESI1 vs FITS1
MTB > Plot c7*c1;
SUBC>   Symbol;
SUBC>   ScFrame;
SUBC>   ScAnnotation.
Plot RESI1 * Sales
MTB > Plot c7*c2;
SUBC>   Symbol;
SUBC>   ScFrame;
SUBC>   ScAnnotation.
Plot RESI1 * Space
(Figure: Regression Plot, Sales = 1.45 + 0.074 Space, S = 0.308058, R-Sq = 68.4 %, R-Sq(adj) = 65.2 %; Sales plotted against Space with the fitted line.)
The points in the plot above are scattered pretty randomly about the fitted line.
(Figure: Residual plots titled "Residuals vs Fits", four panels: Normal Plot of Residuals (Residual versus Normal Score); I Chart of Residuals (Residual versus Observation Number; Mean = 1.11E-16, UCL = 1.081, LCL = -1.081); Histogram of Residuals (Frequency versus Residual); Residuals vs. Fits (Residual versus Fit).)
These are the plots that bother me. Normal probability plots are described on page 227 of the text. The normal
plot here looks a lot like the rectangular (uniform) distribution shown there, though a rectangular distribution is
not as serious a problem as a skewed one. The histogram of the residuals looks bimodal.
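To see what the normal plot is built from, the sketch below pairs each sorted residual with an approximate expected normal score; a straight-line pattern in these pairs would suggest normal residuals. The choice of Blom's plotting positions, (i - 3/8)/(n + 1/4), is an assumption: Minitab's own plot may use a slightly different formula. The sketch also counts residuals within 0.1 of zero, which bears on the bimodal look of the histogram.

```python
from statistics import NormalDist

# Residuals (RESI1) copied from the pet food printout, in observation order.
resid = [-0.22, 0.38, -0.42, -0.29, 0.21, 0.41,
         -0.26, 0.14, 0.24, -0.33, -0.03, 0.17]

def normal_scores(values):
    """Pair each sorted value with an approximate expected normal score.

    Assumes Blom's plotting positions (i - 3/8) / (n + 1/4).
    """
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(v, std_normal.inv_cdf((i - 0.375) / (n + 0.25)))
            for i, v in enumerate(sorted(values), start=1)]

pairs = normal_scores(resid)
for r, score in pairs:
    print(f"{r:6.2f}  {score:6.2f}")

# How many residuals sit close to zero? Few values in the middle means a dip there.
near_zero = sum(1 for e in resid if abs(e) < 0.1)
print("residuals within 0.1 of zero:", near_zero)
```

Only one of the twelve residuals (-0.03) lies within 0.1 of zero, which is consistent with the two-humped histogram.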
(Figure: Plot of RESI1 versus Space.)
This plot of the residuals against Space (the independent variable) is probably the most important one, and it shows very little in the way of a pattern. This is nice.
Exercise 14.18 [14.9 in 9th]: The Instructor’s Solution Manual says the following.
(a) Based upon a residual analysis, the model appears adequate.
(b) There is no evidence of a pattern in the residuals versus time.
(c) D = 2.26
(d) D = 2.26 > d_U = 1.55. There is no evidence of positive autocorrelation in the residuals.
For an explanation see the printout.
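The Durbin-Watson statistic can be recomputed from the 24 residuals in the printout that follows, assuming the observations are listed in time order. This is a minimal sketch: the function name `durbin_watson` is hypothetical, and the only critical value used is the 1.55 quoted in the answer to (d). Because the printed residuals are rounded to three decimals, the result agrees with Minitab's 2.26 only to two decimals.

```python
# Residuals for observations 1-24 as printed (rounded to 3 decimals) in the
# Warecost regression output; assumed to be in time order.
resid = [-10.475, 7.905, 0.760, -3.392, 2.683, 0.644, 2.621, -6.758,
         -2.166, -3.113, -4.359, 5.280, 0.669, 6.674, -4.862, 4.235,
         5.651, 0.333, 3.198, -0.627, -0.757, -6.188, 1.779, 0.265]

def durbin_watson(e):
    """D = sum over t=2..n of (e_t - e_(t-1))^2, divided by sum over t=1..n of e_t^2."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(x ** 2 for x in e)
    return num / den

D = durbin_watson(resid)
D_U = 1.55   # upper critical value quoted in the answer to (d)
print(f"D = {D:.2f}")
if D > D_U:
    print("D > d_U: no evidence of positive autocorrelation")
if 4 - D > D_U:
    print("4 - D > d_U: no evidence of negative autocorrelation")
```

The denominator is just the Residual Error sum of squares (about 477.0), so D compares the squared changes between neighboring residuals to the squared residuals themselves; values near 2 mean neighbors are roughly unrelated.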
To run this regression I used the Statistics pull-down menu and then picked Regression twice. I had put
headings on my columns (the data is in the text and on your CD), but, since I’m lazy, I identified the
columns as C1, C2 and C3. So C1 was my response (dependent, Y) variable and C2 and C3 were my
predictor (independent, X) variables. There are just too many subcommands here to drive Minitab from the
session window. On the Regression menu I went into Graphs and checked all the residual plots.
Under Options I picked Variance Inflation Factors and Durbin-Watson. Under Results I took the last and
most complete option, though this can also be done by using the session command ‘Brief 3’ before you
start. Under Storage I picked both kinds of residuals, though that seems to have been unnecessary unless I
wanted to do some extra plotting. When this regression was finished and I had copied all the graphs into a
Word document, I ran Stepwise from the Regression menu using C1, C2 and C3.
MTB > Retrieve "C:\Documents and Settings\RBOVE.WCUPANET\My Documents\Drive D\MINITAB\Warecost.MTW".
Retrieving worksheet from file: C:\Documents and Settings\RBOVE.WCUPANET\My Documents\Drive D\MINITAB\Warecost.MTW
# Worksheet was saved on Thu Nov 20 2003
Results for: Warecost.MTW
MTB > Name c4 = 'RESI1' c5 = 'SRES1'
MTB > Regress c1 2 c2 c3;
SUBC>   Residuals 'RESI1';
SUBC>   SResiduals 'SRES1';
SUBC>   GHistogram;
SUBC>   GNormalplot;
SUBC>   GFits;
SUBC>   GOrder;
SUBC>   GVars c2 c3;
SUBC>   RType 1;
SUBC>   Constant;
SUBC>   VIF;
SUBC>   DW;
SUBC>   Brief 3.
Regression Analysis: DistCost versus Sales, Orders
The regression equation is
DistCost = - 2.73 + 0.0471 Sales + 0.0119 Orders
Predictor      Coef   SE Coef      T      P  VIF
Constant     -2.728     6.158  -0.44  0.662
Sales       0.04711   0.02033   2.32  0.031  2.8
Orders     0.011947  0.002249   5.31  0.000  2.8
S = 4.766      R-Sq = 87.6%      R-Sq(adj) = 86.4%
Analysis of Variance
Source          DF      SS      MS      F      P
Regression       2  3368.1  1684.0  74.13  0.000
Residual Error  21   477.0    22.7
Total           23  3845.1
Source  DF  Seq SS
Sales    1  2726.8
Orders   1   641.3
Comment: The large p-value (0.662) for the constant tells us that the constant is not significant, but the
coefficients of Sales and Orders are significant at the 5% level. The VIF will be discussed later (check the end
of this document), but a value below 5 is usually fine. The low p-value for the ANOVA F test tells us that the
two independent variables together explain a significant share of the variation in DistCost (R-Sq = 87.6%).
Their sequential contributions to the Regression sum of squares are shown below the ANOVA table. These make
Orders look like a fairly feeble, if significant, contributor to the regression. Actually, we will find that this is not true.
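Two pieces of arithmetic from this printout can be checked directly. With exactly two predictors, each VIF equals 1/(1 - r^2), where r is the correlation between Sales and Orders, so the printed VIF tells us how strongly the predictors are related. And because Orders enters last, its Seq SS is its extra contribution given Sales, and the partial F for adding it should equal (up to rounding) the square of its t statistic. The sketch below only re-uses numbers from the output; nothing is re-estimated.

```python
from math import sqrt

# Values copied from the DistCost regression printout.
vif = 2.8                          # VIF printed for both Sales and Orders (rounded)
seq_ss = {"Sales": 2726.8, "Orders": 641.3}
sse, df_error = 477.0, 21          # Residual Error SS and DF
t_orders = 5.31                    # t statistic for Orders

# Two predictors: VIF = 1 / (1 - r^2), so r^2 = 1 - 1/VIF.
r_squared = 1 - 1 / vif
print(f"r^2 between Sales and Orders is about {r_squared:.3f} (|r| about {sqrt(r_squared):.2f})")

# The sequential SS add up to the Regression SS (3368.1) in the ANOVA table.
print(f"Sum of Seq SS = {sum(seq_ss.values()):.1f}")

# Orders entered last: partial F = (its Seq SS) / MSE, which should match t^2.
mse = sse / df_error
partial_f = seq_ss["Orders"] / mse
print(f"partial F for Orders = {partial_f:.1f}, t^2 = {t_orders ** 2:.1f}")
```

The rounded VIF only pins r^2 down to about 0.64, and the small gap between the partial F and t^2 comes from rounding in the printed values.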
Obs  Sales  DistCost     Fit  SE Fit  Residual  St Resid
  1    386    52.950  63.425   1.332   -10.475     -2.29R
  2    446    71.660  63.755   1.511     7.905      1.75
  3    512    85.580  84.820   1.656     0.760      0.17
  4    401    63.690  67.082   1.332    -3.392     -0.74
  5    457    72.810  70.127   0.999     2.683      0.58
  6    458    68.440  67.796   1.193     0.644      0.14
  7    301    52.460  49.839   2.134     2.621      0.62
  8    484    70.770  77.528   1.139    -6.758     -1.46
  9    517    82.030  84.196   1.525    -2.166     -0.48
 10    503    74.390  77.503   1.126    -3.113     -0.67
 11    535    70.840  75.199   1.838    -4.359     -0.99
 12    353    54.080  48.800   2.277     5.280      1.26
 13    372    62.980  62.311   1.483     0.669      0.15
 14    328    72.300  65.626   2.847     6.674      1.75
 15    408    58.990  63.852   1.152    -4.862     -1.05
 16    491    79.380  75.145   1.069     4.235      0.91
 17    527    94.440  88.789   2.004     5.651      1.31
 18    444    59.740  59.407   2.155     0.333      0.08
 19    623    90.500  87.302   2.535     3.198      0.79
 20    596    93.240  93.867   2.097    -0.627     -0.15
 21    463    69.330  70.087   1.049    -0.757     -0.16
 22    389    53.710  59.898   1.349    -6.188     -1.35
 23    547    89.180  87.401   1.657     1.779      0.40
 24    415    66.800  66.535   1.107     0.265      0.06
R denotes an observation with a large standardized residual
Durbin-Watson statistic = 2.26
Comment: The Durbin-Watson statistic checks for autocorrelation, which will be explained later. It’s enough
to say that values close to 2 indicate that autocorrelation is not a problem. The dependent variable (Y) and the
first independent variable are printed out above, followed by Fit, which means the predicted value of Y; SE Fit,
which, multiplied by the appropriate value of t, gives the margin of a confidence interval for the mean value of
Y at those values of the independent variables; and Residual, which is the actual value of Y minus the predicted
value. The standardized residual (St Resid) is the residual divided by its own estimated standard deviation,
S√(1 − h), where h is the leverage of that observation. The residuals already have a mean of zero, so nothing
needs to be subtracted first, but each residual gets its own divisor.
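For a fitted observation, SE Fit = S√h, so the leverage h can be recovered from the printout as (SE Fit / S)^2, and St Resid follows from the formula in the comment above. The sketch below checks this on observations 1, 2 and 14 and also forms the 95% confidence interval for the mean DistCost, Fit ± t·SE Fit. The t value 2.080 (21 error degrees of freedom, 2.5% in each tail) is taken from a t table; the function name `standardized_residual` is hypothetical.

```python
from math import sqrt

S = 4.766       # S from the DistCost printout
T_21 = 2.080    # t value for a 95% interval with 21 error df (from a t table)

def standardized_residual(resid, se_fit, s):
    """Residual divided by its estimated standard deviation s*sqrt(1 - h).

    Recovers the leverage h from SE Fit = s*sqrt(h).
    """
    h = (se_fit / s) ** 2
    return resid / (s * sqrt(1 - h))

# (Fit, SE Fit, Residual) for a few observations, copied from the printout.
rows = {1: (63.425, 1.332, -10.475),
        2: (63.755, 1.511, 7.905),
        14: (65.626, 2.847, 6.674)}

for obs, (fit, se_fit, resid) in rows.items():
    st = standardized_residual(resid, se_fit, S)
    lower, upper = fit - T_21 * se_fit, fit + T_21 * se_fit
    print(f"obs {obs}: St Resid = {st:.2f}, "
          f"95% CI for mean DistCost = ({lower:.2f}, {upper:.2f})")
```

The recomputed St Resid values, -2.29, 1.75 and 1.75, match the printout. Observation 14 has the largest SE Fit, hence the largest leverage and the smallest divisor.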
Graphs requested: Residual Histogram, Normplot of Residuals, Residuals vs Fits, Residuals vs Order, Residuals from DistCost vs Sales, and Residuals from DistCost vs Orders.
(Figure: Histogram of the Residuals (response is DistCost); Frequency versus Residual.)
Comment: The histogram seems to indicate a fairly symmetrical distribution with a
peak in the middle.
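A text version of the histogram can be built from the 24 printed residuals. The bins here are an assumption: width 2, centered on the even numbers from -10 to 8 that appear on the horizontal axis. Minitab's actual cutpoints may differ, so the bar heights are only indicative.

```python
from collections import Counter
from math import floor

# Residuals for observations 1-24, copied from the printout.
resid = [-10.475, 7.905, 0.760, -3.392, 2.683, 0.644, 2.621, -6.758,
         -2.166, -3.113, -4.359, 5.280, 0.669, 6.674, -4.862, 4.235,
         5.651, 0.333, 3.198, -0.627, -0.757, -6.188, 1.779, 0.265]

def bin_center(e, width=2):
    """Center c of the assumed bin [c - width/2, c + width/2) that contains e."""
    return width * floor((e + width / 2) / width)

counts = Counter(bin_center(e) for e in resid)
for c in range(-10, 10, 2):
    print(f"{c:4d} | {'*' * counts[c]}")
```

With these bins the middle bin (centered at 0) holds 7 of the 24 residuals, and the counts fall off on both sides, which fits the comment above.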
This problem is continued in 252solnK1b