252solnK1 12/02/03
Problem 14.41 [14.35] (15.8) continues.
[Figure: Histogram of the Residuals (response is Sales); horizontal axis: Residual; vertical axis: Frequency]
Comment: This graph doesn’t really look bad. Given the relatively small sample size, the residuals could well come from a symmetrical distribution.
[Figure: Normal Probability Plot of the Residuals (response is Sales); horizontal axis: Residual; vertical axis: Normal Score]
Comment: The points lie fairly close to a straight line, indicating that the residuals have a distribution that is close to Normal.
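The two residual plots can also be checked numerically. The sketch below is a minimal pure-Python illustration, assuming the plots come from the two-variable model Sales = 1.30 + 0.074 Space + 0.45 Locatn. Space and Sales are copied from the observation table in the printout; the Locatn codes (1 = front of aisle) are not printed anywhere and are an assumption inferred from the Fit column of the interaction regression (rows 2, 6, 9 and 12). The normal scores use Blom's plotting positions (i - 3/8)/(n + 1/4), which may not be exactly Minitab's rule, and `normal_scores` and `pearson` are hypothetical helper names. A correlation near 1 between the sorted residuals and the normal scores corresponds to a nearly straight normal probability plot.

```python
from statistics import NormalDist

# Space and Sales from the printout; Locatn (1 = front) is ASSUMED, inferred from the Fit column.
space = [5, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20]
locatn = [0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1]
sales = [1.6, 2.2, 1.4, 1.9, 2.4, 2.6, 2.3, 2.7, 2.8, 2.6, 2.9, 3.1]

# Residuals from the fitted two-variable model
residuals = [y - (1.30 + 0.074 * x1 + 0.45 * x2) for x1, x2, y in zip(space, locatn, sales)]

def normal_scores(n):
    """Blom-type normal scores; Minitab's exact plotting positions may differ."""
    return [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def pearson(a, b):
    """Ordinary Pearson correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

ordered = sorted(residuals)
r = pearson(ordered, normal_scores(len(ordered)))
print(f"SSE = {sum(e * e for e in residuals):.3f}, normal-plot correlation = {r:.3f}")
```

The residuals sum to zero and their sum of squares equals the SSE of the two-variable model (0.409), which is a quick consistency check on the assumed Locatn codes.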
Comment: These look fairly random, given that shelves seem to come only in lengths that are multiples of 5 and that Location is a dummy variable.
MTB > Regress c2 3 c1 c3 c4;
SUBC>   RType 1;
SUBC>   Constant;
SUBC>   VIF;
SUBC>   Brief 3.
Regression Analysis: Sales versus Space, Locatn, Inter

The regression equation is
Sales = 1.20 + 0.0820 Space + 0.750 Locatn - 0.0240 Inter

Predictor       Coef    SE Coef       T       P    VIF
Constant      1.2000     0.1840    6.52   0.000
Space        0.08200    0.01344    6.10   0.000    1.5
Locatn        0.7500     0.3186    2.35   0.046    6.0
Inter       -0.02400    0.02327   -1.03   0.333    6.5

S = 0.2124   R-Sq = 88.0%   R-Sq(adj) = 83.5%

Analysis of Variance

Source           DF        SS        MS       F       P
Regression        3   2.64150   0.88050   19.51   0.000
Residual Error    8   0.36100   0.04513
Total            11   3.00250

Source   DF    Seq SS
Space     1   2.05350
Locatn    1   0.54000
Inter     1   0.04800
Comment: Things don’t look so good in this regression. The VIFs for the last two variables (6.0 and 6.5) are high, indicating collinearity. The p-value for the coefficient of the interaction term (0.333) is very high, indicating that it is not significant at the 1%, 5% or 10% level.
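To see where the p-value for Inter comes from, the sketch below recomputes t = Coef / SE Coef and a two-sided p-value with the 8 Residual Error degrees of freedom. It is a minimal pure-Python illustration that integrates the Student t density numerically with Simpson's rule instead of calling a statistics library; `t_pdf` and `two_sided_p` are hypothetical helper names. Because the inputs are the rounded printout values, the result should match Minitab's 0.333 only to about two or three decimals.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """P(|T| > |t|): integrate the density over [0, |t|] by Simpson's rule (steps even)."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

# Inter row and Residual Error DF from the printout above
coef_inter, se_inter, df_error = -0.02400, 0.02327, 8
t_inter = coef_inter / se_inter
p_inter = two_sided_p(t_inter, df_error)
for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f}: reject H0? {p_inter < alpha}")
print(f"t = {t_inter:.2f}, p = {p_inter:.3f}")
```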
Obs   Space    Sales      Fit   SE Fit   Residual   St Resid
  1     5.0   1.6000   1.6100   0.1257    -0.0100      -0.06
  2     5.0   2.2000   2.2400   0.1777    -0.0400      -0.34
  3     5.0   1.4000   1.6100   0.1257    -0.2100      -1.23
  4    10.0   1.9000   2.0200   0.0823    -0.1200      -0.61
  5    10.0   2.4000   2.0200   0.0823     0.3800       1.94
  6    10.0   2.6000   2.5300   0.1164     0.0700       0.39
  7    15.0   2.3000   2.4300   0.0823    -0.1300      -0.66
  8    15.0   2.7000   2.4300   0.0823     0.2700       1.38
  9    15.0   2.8000   2.8200   0.1164    -0.0200      -0.11
 10    20.0   2.6000   2.8400   0.1257    -0.2400      -1.40
 11    20.0   2.9000   2.8400   0.1257     0.0600       0.35
 12    20.0   3.1000   3.1100   0.1777    -0.0100      -0.09
MTB > Stepwise c2 c1 c3 c4;
SUBC>   AEnter 0.15;
SUBC>   ARemove 0.15;
SUBC>   Constant.
Stepwise Regression: Sales versus Space, Locatn, Inter

Alpha-to-Enter: 0.15   Alpha-to-Remove: 0.15

Response is Sales on 3 predictors, with N = 12

Step             1        2
Constant     1.450    1.300

Space        0.074    0.074
T-Value       4.65     6.72
P-Value      0.001    0.000

Locatn                 0.45
T-Value                3.45
P-Value               0.007

S            0.308    0.213
R-Sq         68.39    86.38
R-Sq(adj)    65.23    83.35
C-p           13.0      3.1

More? (Yes, No, Subcommand, or Help)
SUBC> yes
No variables entered or removed
More? (Yes, No, Subcommand, or Help)
SUBC> no
MTB >
Comment: The stepwise regression confirms the results above. Minitab decides that ‘Space’ is the best single predictor and essentially redoes our first regression. It then brings in ‘Location’ and gets our second regression. But when it is told to go ahead and add a third independent variable, it doesn’t find enough explanatory power in ‘Inter’ to make it worth adding. Note that C-p (3.1) is close to k + 1 = 3 in the second regression, which suggests that the two-variable model has little bias compared with the full model.
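The C-p column is Mallows' Cp, C_p = SSE_p / MSE_full - (n - 2p), where p counts the parameters including the constant and MSE_full comes from the model with all three predictors. A minimal sketch, assuming Minitab uses this standard formula, that rebuilds both printed values from the ANOVA and Seq SS numbers of the three-variable printout; `mallows_cp` is a hypothetical helper name.

```python
n = 12
sst = 3.00250
seq_ss = {"Space": 2.05350, "Locatn": 0.54000, "Inter": 0.04800}
mse_full = 0.36100 / 8            # Residual Error MS with Space, Locatn and Inter

def mallows_cp(sse_p, p):
    """Mallows' Cp for a subset model with p parameters (constant included)."""
    return sse_p / mse_full - (n - 2 * p)

sse_step1 = sst - seq_ss["Space"]                       # Space only
sse_step2 = sst - seq_ss["Space"] - seq_ss["Locatn"]    # Space and Locatn
cp1 = mallows_cp(sse_step1, p=2)
cp2 = mallows_cp(sse_step2, p=3)
print(f"Step 1: C-p = {cp1:.1f}   Step 2: C-p = {cp2:.1f}")  # Step 1: C-p = 13.0   Step 2: C-p = 3.1
```

A Cp near p (here k + 1 = 3) is what the comment above means by "close to k + 1".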
The following material is an edited version of the solution in the Instructor’s Solution Manual.
14.35
(a) $\hat{Y} = 1.30 + 0.074X_1 + 0.45X_2$, where $X_1$ = shelf space (feet) and $X_2$ = aisle location (1 = front, 0 = back).
(b) Holding constant the effect of aisle location, for each additional foot of shelf space, sales are expected to increase on average by 0.074 hundreds of dollars, or \$7.40. For a given amount of shelf space, a front-of-aisle location is expected to increase sales on average by 0.45 hundreds of dollars, or \$45.
(c) $\hat{Y} = 1.30 + 0.074(8) + 0.45(0) = 1.892$ hundreds of dollars, or \$189.20.
These intervals are most easily done using Minitab or another regression program, and they appear in the printout.
Prediction interval for an individual store: $1.3684 \le Y_{X=X_i} \le 2.4156$
Confidence interval for the mean: $1.6880 \le \mu_{Y|X=X_i} \le 2.0960$
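For readers without Minitab, the fit in (a), the point prediction in (c) and both intervals can be reproduced with a small pure-Python sketch that solves the normal equations (X'X)b = X'y. `inverse` is a hypothetical Gauss-Jordan helper, not a library function. As above, the Locatn codes are an assumption inferred from the Fit column of the interaction printout, and 2.262 is the t value with 9 degrees of freedom quoted in (f) and (g). The confidence interval uses the standard error of the fitted mean, s·sqrt(h); the prediction interval adds the error variance, s·sqrt(1 + h), where h = x0'(X'X)^(-1)x0.

```python
# Space and Sales from the printout; Locatn (1 = front) is ASSUMED, inferred from the Fit column.
space = [5, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20]
locatn = [0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1]
sales = [1.6, 2.2, 1.4, 1.9, 2.4, 2.6, 2.3, 2.7, 2.8, 2.6, 2.9, 3.1]
T_CRIT = 2.262  # t with 9 degrees of freedom, alpha/2 = .025

def inverse(m):
    """Gauss-Jordan inverse of a small square matrix, with partial pivoting."""
    size = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(size)] for i, row in enumerate(m)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(size):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[size:] for row in a]

X = [[1.0, x1, x2] for x1, x2 in zip(space, locatn)]   # constant, Space, Locatn
n, p = len(X), len(X[0])                               # p = k + 1 parameters
xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
xty = [sum(r[i] * y for r, y in zip(X, sales)) for i in range(p)]
xtx_inv = inverse(xtx)
b = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]

sse = sum((y - sum(bi * xi for bi, xi in zip(b, r))) ** 2 for r, y in zip(X, sales))
mse = sse / (n - p)                                    # n - k - 1 = 9 degrees of freedom

x0 = [1.0, 8.0, 0.0]                                   # 8 feet of shelf space, back of aisle
y_hat = sum(bi * xi for bi, xi in zip(b, x0))
h = sum(x0[i] * xtx_inv[i][j] * x0[j] for i in range(p) for j in range(p))
half_ci = T_CRIT * (mse * h) ** 0.5                    # mean sales of all such stores
half_pred = T_CRIT * (mse * (1 + h)) ** 0.5            # sales of one such store
ci = (y_hat - half_ci, y_hat + half_ci)
pred = (y_hat - half_pred, y_hat + half_pred)
print("b =", [round(v, 3) for v in b], "Y-hat =", round(y_hat, 3))
print("CI:", [round(v, 4) for v in ci], "PI:", [round(v, 4) for v in pred])
```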
(d) Based on a residual analysis, the model appears adequate.
(e) $F = 28.53 > F_{.05}^{2,9} = 4.26$. Reject $H_0$ and say that there is evidence of a relationship between sales and the two independent variables.
(f) For $X_1$: $t = 6.72 > t_{.025}^{9} = 2.262$. Reject $H_0$ and say that shelf space makes a significant contribution and should be included in the model.
For $X_2$: $t = 3.45 > t_{.025}^{9} = 2.262$. Reject $H_0$ and say that aisle location makes a significant contribution and should be included in the model.
Both variables should be kept in the model.
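The test statistics in (e) and (f) follow directly from printout numbers. In this sketch the regression sum of squares for the two-variable model is taken from the Seq SS column of the three-variable printout (Space entered first, then Locatn, so those two sequential sums of squares are exactly the two-variable SSR); the coefficients and standard errors come from the two-variable printout, and the critical values 4.26 and 2.262 are the table values quoted above.

```python
n, k = 12, 2
ssr = 2.05350 + 0.54000          # Seq SS for Space, then Locatn
sse = 3.00250 - ssr              # Total SS minus SSR
f_stat = (ssr / k) / (sse / (n - k - 1))
F_CRIT = 4.26                    # F(.05; 2, 9)
T_CRIT = 2.262                   # t(.025; 9)

# (coefficient, standard error) from the two-variable printout
coefs = {"Space": (0.07400, 0.01101), "Locatn": (0.4500, 0.1305)}
t_stats = {name: b / se for name, (b, se) in coefs.items()}

print(f"F = {f_stat:.2f}; reject H0: {f_stat > F_CRIT}")
for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}; significant at 5%: {abs(t) > T_CRIT}")
```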
(g) Remember that our results were

The regression equation is
Sales = 1.30 + 0.0740 Space + 0.450 Locatn

Predictor      Coef   SE Coef      T       P    VIF
Constant     1.3000    0.1569   8.29   0.000
Space       0.07400   0.01101   6.72   0.000    1.0
Locatn       0.4500    0.1305   3.45   0.007    1.0

Using the coefficient of space, $b_1 = 0.074$, $t_{\alpha/2}^{n-k-1} = t_{.025}^{9} = 2.262$, and $s_{b_1} = 0.01101$, you should get $\beta_1 = 0.074 \pm 2.262(0.01101) = 0.074 \pm 0.0249$, or $0.049 \le \beta_1 \le 0.099$. By the same method, $\beta_2 = 0.45 \pm 2.262(0.1305) = 0.45 \pm 0.2952$, or $0.155 \le \beta_2 \le 0.745$ for location.
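Each interval in (g) is $b_j \pm t_{.025}^{9} s_{b_j}$. A short sketch using the coefficients and standard errors from the two-variable printout:

```python
T_CRIT = 2.262   # t with n - k - 1 = 9 degrees of freedom, alpha/2 = .025

# (b_j, s_bj) from the two-variable printout
slopes = {"beta1 (Space)": (0.074, 0.01101), "beta2 (Locatn)": (0.45, 0.1305)}
intervals = {name: (b - T_CRIT * se, b + T_CRIT * se) for name, (b, se) in slopes.items()}
for name, (low, high) in intervals.items():
    print(f"{low:.3f} <= {name} <= {high:.3f}")
```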
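(h) The slope here takes into account the effect of the other predictor variable, placement (aisle location), while the solution for Problem 13.3 did not. In these data the two slopes happen to be equal (0.074 in both steps of the stepwise output) because shelf space and aisle location are uncorrelated, which is also why both VIFs are 1.0.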
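Why the simple and multiple slopes coincide here: with two predictors, the multiple-regression slope for $X_1$ is $(S_{22}S_{1Y} - S_{12}S_{2Y})/(S_{11}S_{22} - S_{12}^2)$, which reduces to the simple slope $S_{1Y}/S_{11}$ when $S_{12} = 0$. A sketch checking this on the data; the Locatn codes are an assumption inferred from the Fit column of the interaction printout, and `s_cross` is a hypothetical helper name.

```python
# Space and Sales from the printout; Locatn (1 = front) is ASSUMED, inferred from the Fit column.
space = [5, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20]
locatn = [0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1]
sales = [1.6, 2.2, 1.4, 1.9, 2.4, 2.6, 2.3, 2.7, 2.8, 2.6, 2.9, 3.1]

def s_cross(a, b):
    """Corrected sum of cross-products: sum of (a - mean a)(b - mean b)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b))

s11, s22, s12 = s_cross(space, space), s_cross(locatn, locatn), s_cross(space, locatn)
s1y, s2y = s_cross(space, sales), s_cross(locatn, sales)

simple_slope = s1y / s11                                            # Sales on Space alone
multiple_slope = (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)   # Space, Locatn held constant
corr_space_locatn = s12 / (s11 * s22) ** 0.5
print(round(simple_slope, 4), round(multiple_slope, 4), round(corr_space_locatn, 4))
```

Each shelf-space group (5, 10, 15 and 20 feet) contains exactly one front-of-aisle store, so the cross-product $S_{12}$ is zero.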
(i) $r^2_{Y.12} = 0.864$. So 86.4% of the variation in sales can be explained by variation in shelf space and variation in aisle location.
(j) $r^2_{adj} = 0.834$
(k) $r^2_{Y.12} = 0.864$, while about six pages back, for the simple regression, $r^2_{Y.1} = 0.684$. The inclusion of the aisle-location variable has resulted in the increase.
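Parts (i) to (k) reduce to arithmetic on sums of squares, with $r^2_{adj} = 1 - (1 - r^2)(n - 1)/(n - k - 1)$. A sketch using the Total SS and the Seq SS values from the three-variable printout (Space entered first, so its Seq SS is also the SSR of the simple regression):

```python
n, k = 12, 2
sst = 3.00250                            # Total SS
ss_space, ss_locatn = 2.05350, 0.54000   # Seq SS: Space first, then Locatn

r2 = (ss_space + ss_locatn) / sst                 # (i) r^2_{Y.12}
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)     # (j)
r2_simple = ss_space / sst                        # (k) r^2_{Y.1}, Space alone
print(f"r2 = {r2:.3f}, r2_adj = {r2_adj:.3f}, simple r2 = {r2_simple:.3f}")
```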
(l) I’m a little bit too lazy to compute these things the way our author wants you to do it. There’s a formula at the end of 252corr that gets you these results much faster.
$r^2_{Y1.2} = \frac{t_1^2}{t_1^2 + df} = \frac{6.72^2}{6.72^2 + 9} = \frac{45.1584}{54.1584} = .8338$. Holding constant the effect of aisle location, 83.4% of the variation in sales can be explained by variation in shelf space.
$r^2_{Y2.1} = \frac{t_2^2}{t_2^2 + df} = \frac{3.45^2}{3.45^2 + 9} = \frac{11.9025}{20.9025} = .5694$, so $r^2_{Y2.1} = 0.569$. Holding constant the effect of shelf space, 56.9% of the variation in sales can be explained by variation in aisle location.
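The shortcut in (l) works because $t_j^2 = SSR(X_j \mid \text{other } X) / MSE$, so $t_j^2/(t_j^2 + df) = SSR(X_j \mid \text{other } X)/(SSR(X_j \mid \text{other } X) + SSE)$. A sketch computing it both ways; `partial_r2_from_t` is a hypothetical helper. The extra sums of squares 2.0535 and 0.54 are the Seq SS values from the three-variable printout, and using them for either ordering is valid here only because Space and Locatn are uncorrelated in these data (VIF = 1.0). The small differences come from rounding of t.

```python
df = 9                                   # n - k - 1
t_space, t_locatn = 6.72, 3.45           # t values from the two-variable printout

def partial_r2_from_t(t, df):
    """Coefficient of partial determination from a coefficient's t statistic."""
    return t ** 2 / (t ** 2 + df)

sse = 3.00250 - 2.05350 - 0.54000        # SSE of the two-variable model
# Extra SS of each variable given the other; order-free only because the predictors are uncorrelated.
extra_space, extra_locatn = 2.05350, 0.54000

r2_y1_2 = partial_r2_from_t(t_space, df)
r2_y2_1 = partial_r2_from_t(t_locatn, df)
r2_y1_2_ss = extra_space / (extra_space + sse)
r2_y2_1_ss = extra_locatn / (extra_locatn + sse)
print(f"{r2_y1_2:.4f} vs {r2_y1_2_ss:.4f};  {r2_y2_1:.4f} vs {r2_y2_1_ss:.4f}")
```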
(m) The slope of sales with shelf space is the same regardless of whether the aisle location is front or back.
(n) From the last regression in the printout, $\hat{Y} = 1.20 + 0.082X_1 + 0.75X_2 - 0.024X_1X_2$. For the interaction term $t = -1.03$ with a p-value of 0.333 > .05, so do not reject $H_0$. There is no evidence that the interaction term makes a contribution to the model.
(o) The two-variable model in (a) should be used.
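An equivalent way to run the test in (n) is the partial F test: the extra sum of squares from adding Inter (its Seq SS, 0.048) divided by the MSE of the model that includes it. With one added variable this F equals the square of the t statistic. In the sketch, the critical value 2.306 (t with 8 degrees of freedom, alpha/2 = .025) is a standard table value that does not appear in this document.

```python
seq_ss_inter = 0.04800          # Seq SS for Inter, added after Space and Locatn
mse_full = 0.36100 / 8          # Residual Error MS of the model with Inter
t_inter = -0.02400 / 0.02327    # Coef / SE Coef for Inter

f_partial = seq_ss_inter / mse_full
F_CRIT = 2.306 ** 2             # F(.05; 1, 8) = t(.025; 8)^2, from a t table
print(f"partial F = {f_partial:.3f}, t^2 = {t_inter ** 2:.3f}, reject H0: {f_partial > F_CRIT}")
```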
252solnK1 12/01/03
Nonlinear regression
Exercise 15.1: The equation given is $\hat{Y} = 5 + 3X_1 + 1.5X_1^2$. $n = 25$, so $n - k - 1 = 25 - 2 - 1 = 22$.
(a) $\hat{Y} = 5 + 3X + 1.5X^2 = 5 + 3(2) + 1.5(2^2) = 17$
(b) $t = 2.35 > t_{.025}^{22} = 2.0739$ with 22 degrees of freedom. Reject $H_0$. The quadratic term is significant.
(c) $t = 1.17 < t_{.025}^{22} = 2.0739$ with 22 degrees of freedom. Do not reject $H_0$. The quadratic term is not significant.
(d) $\hat{Y} = 5 - 3X + 1.5X^2 = 5 - 3(2) + 1.5(2^2) = 5$
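Exercise 15.1 is plug-in arithmetic. The sketch reproduces the degrees of freedom and the predictions in (a) and (d), assuming (as the arithmetic gives 5) that part (d) changes the linear coefficient to -3; `quadratic` is a hypothetical helper name.

```python
def quadratic(x, b0, b1, b2):
    """Evaluate the quadratic regression equation b0 + b1*x + b2*x**2."""
    return b0 + b1 * x + b2 * x ** 2

n, k = 25, 2                         # X and X^2 count as two independent variables
df_error = n - k - 1
part_a = quadratic(2, 5, 3, 1.5)     # Y-hat = 5 + 3X + 1.5X^2 at X = 2
part_d = quadratic(2, 5, -3, 1.5)    # linear coefficient -3 (assumed for part d)
print(df_error, part_a, part_d)      # 22 17.0 5.0
```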
Exercise 15.6 (15.13 in 8th edition): The given equation is $\ln\hat{Y} = 3.07 + 0.9\ln X_1 + 1.41\ln X_2$.
(a) $\ln\hat{Y} = 3.07 + 0.9\ln(8.5) + 1.41\ln(5.2) = 7.32067$, so $\hat{Y} = e^{7.32067} = 1511.22$
(b) Holding constant the effects of $X_2$, for each additional unit of the natural logarithm of $X_1$ the natural logarithm of $Y$ is expected to increase on average by 0.9. Holding constant the effects of $X_1$, for each additional unit of the natural logarithm of $X_2$ the natural logarithm of $Y$ is expected to increase on average by 1.41.
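In part (a) the prediction has to be back-transformed with $e^x$. A sketch; `log_log_prediction` is a hypothetical helper. It also illustrates the usual reading of a log-log slope as an elasticity: raising $X_1$ by 1% multiplies $\hat{Y}$ by $1.01^{0.9}$, roughly a 0.9% increase, which restates (b). Using the unrounded exponent can shift the last digit of $\hat{Y}$ slightly.

```python
import math

def log_log_prediction(x1, x2):
    """Y-hat from ln(Y-hat) = 3.07 + 0.9 ln X1 + 1.41 ln X2."""
    return math.exp(3.07 + 0.9 * math.log(x1) + 1.41 * math.log(x2))

ln_y = 3.07 + 0.9 * math.log(8.5) + 1.41 * math.log(5.2)
y_hat = log_log_prediction(8.5, 5.2)
# Elasticity: a 1% rise in X1 multiplies Y-hat by 1.01 ** 0.9
ratio = log_log_prediction(8.5 * 1.01, 5.2) / y_hat
print(f"ln Y-hat = {ln_y:.5f}, Y-hat = {y_hat:.2f}, ratio = {ratio:.5f}")
```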
Exercise 15.7 (15.14 in 8th edition): The given equation is $\ln\hat{Y} = 4.62 + 0.5X_1 + 0.7X_2$.
(a) $\ln\hat{Y} = 4.62 + 0.5(8.5) + 0.7(5.2) = 12.51$, so $\hat{Y} = e^{12.51} = 271{,}034.12$
(b) Holding constant the effects of $X_2$, for each additional unit of $X_1$ the natural logarithm of $Y$ is expected to increase on average by 0.5. Holding constant the effects of $X_1$, for each additional unit of $X_2$ the natural logarithm of $Y$ is expected to increase on average by 0.7.
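Here only $Y$ is logged, so a one-unit increase in $X_1$ adds 0.5 to $\ln\hat{Y}$, which multiplies $\hat{Y}$ by $e^{0.5}$ (and by $e^{0.7}$ for $X_2$). A short sketch of part (a) and of these multiplicative factors:

```python
import math

# ln(Y-hat) = 4.62 + 0.5 X1 + 0.7 X2, evaluated at X1 = 8.5, X2 = 5.2
ln_y = 4.62 + 0.5 * 8.5 + 0.7 * 5.2
y_hat = math.exp(ln_y)
factor_x1 = math.exp(0.5)   # Y-hat multiplier per extra unit of X1
factor_x2 = math.exp(0.7)   # Y-hat multiplier per extra unit of X2
print(f"ln Y-hat = {ln_y:.2f}, Y-hat = {y_hat:,.2f}, x{factor_x1:.4f} per X1, x{factor_x2:.4f} per X2")
```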
Note: To deal with the VIF, the following material has been added to 252corr together with an example.
A relatively recent method to check for collinearity is to use the Variance Inflation Factor $\mathrm{VIF}_j = \frac{1}{1 - R_j^2}$. Here $R_j^2$ is the coefficient of multiple determination gotten by regressing the independent variable $X_j$ against all the other independent variables ($X$s). The rule of thumb seems to be that we should be suspicious if any $\mathrm{VIF}_j > 5$ and positively horrified if $\mathrm{VIF}_j > 10$. If you get results like this, drop a variable or change your model.
Note that, if you use a correlation matrix for your independent variables and see a large correlation between two of them, putting the square of that correlation into the VIF formula gives you a low estimate of the VIF, since the $R^2$ that you get from a regression against all the other independent variables will be at least as high.
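A sketch of the VIF definition applied to the interaction regression earlier in this handout: each predictor is regressed on the other two and $\mathrm{VIF}_j = 1/(1 - R_j^2)$ is formed; it should reproduce the printed 1.5, 6.0 and 6.5. The Locatn codes are an assumption inferred from the Fit column of that printout, and `solve`, `centered`, `dot` and `r_squared` are hypothetical pure-Python helpers, not library calls. The last lines illustrate the note above: the squared simple correlation of Locatn with Inter alone gives a VIF of about 4.33, below the true 6.0.

```python
# Locatn (1 = front) is ASSUMED, inferred from the Fit column of the interaction printout.
space = [5, 5, 5, 10, 10, 10, 15, 15, 15, 20, 20, 20]
locatn = [0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1]
inter = [x1 * x2 for x1, x2 in zip(space, locatn)]

def solve(a, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    k = len(a)
    m = [list(row) + [v] for row, v in zip(a, rhs)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, k):
            f = m[r][c] / m[c][c]
            m[r] = [u - f * w for u, w in zip(m[r], m[c])]
    x = [0.0] * k
    for i in reversed(range(k)):
        x[i] = (m[i][k] - sum(m[i][j] * x[j] for j in range(i + 1, k))) / m[i][i]
    return x

def centered(v):
    mv = sum(v) / len(v)
    return [u - mv for u in v]

def dot(p, q):
    return sum(a * b for a, b in zip(p, q))

def r_squared(y, xs):
    """R^2 from regressing y on the columns in xs, with a constant (handled by centering)."""
    yc = centered(y)
    cs = [centered(c) for c in xs]
    coef = solve([[dot(ci, cj) for cj in cs] for ci in cs], [dot(ci, yc) for ci in cs])
    fitted = [sum(b * c[i] for b, c in zip(coef, cs)) for i in range(len(y))]
    sse = sum((u - f) ** 2 for u, f in zip(yc, fitted))
    return 1 - sse / dot(yc, yc)

predictors = {"Space": space, "Locatn": locatn, "Inter": inter}
vif = {}
for name, col in predictors.items():
    others = [c for other, c in predictors.items() if other != name]
    vif[name] = 1 / (1 - r_squared(col, others))      # VIF_j = 1 / (1 - R_j^2)

# Using only the simple correlation of Locatn with Inter understates Locatn's VIF
vif_from_pair = 1 / (1 - r_squared(locatn, [inter]))
print({name: round(v, 2) for name, v in vif.items()}, round(vif_from_pair, 2))
```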