Stat 328 Final Exam

Summer 2000
Prof. Vardeman
1. Attached to this exam is a yellow JMP-IN printout useful in the analysis of some data taken from Statistics
for Business and Economics by McClave, Benson and Sincich. The data concern gasoline consumption from
1970 through 1993. For these years, in the JMP data table there are the year (numbered 1 through 24, 1 being
1970) and:
y = Auto Fuel Consumption (billion gallons)
y' = Per Capita Auto Fuel Consumption (1000 gallons)
x1 = US Population Size (millions)
x2 = Average Gross Weekly Earnings (1982 dollars)
x3 = Price of Regular ($ per gallon)
x4 = Relative Price of Auto Fuel (x3 / CPI)
x5 = Available Dollars = x2 · x1
Initially consider trying to model y in terms of the x's. Some relevant correlations and scatterplots are on
page 2 of the printout.
(a) Which single variable (of x1 through x5) would produce the largest R² in a SLR analysis of y? What sign
would be on its fitted regression coefficient, b? Explain both.
variable:
explanations:
sign on its b:
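A minimal sketch of the reasoning behind (a), assuming the usual SLR fact that R² is the squared sample correlation between the response and the single predictor, and that the fitted slope shares the sign of that correlation. The numbers are the y row of the correlation matrix on page 2 of the printout; the variable names are illustrative only.

```python
# Correlations of y with each candidate predictor, copied from page 2 of the printout.
corr_with_y = {"x1": -0.2337, "x2": 0.4484, "x3": -0.3940, "x4": -0.2701, "x5": 0.7521}

# In SLR, R^2 equals the squared correlation between y and the predictor.
r_squared = {name: r ** 2 for name, r in corr_with_y.items()}

# The predictor with the largest squared correlation gives the largest SLR R^2,
# and its fitted slope has the same sign as its correlation with y.
best = max(r_squared, key=r_squared.get)
slope_sign = "+" if corr_with_y[best] > 0 else "-"
print(best, round(r_squared[best], 4), slope_sign)
```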
Now consider analysis of y' (not y). Page 3 of the printout concerns a SLR analysis of y' as a function of x2,
the single predictor of y' with the highest R² in SLR.
(b) There are, on the plot on page 3 of the printout, two clusters of points. The one on the right comes from the
first 10 data points and the one on the left from the last 14. These 2 groups correspond to years before and after
the "Mideast oil embargo." A sensible thing to do in this context would be to ask whether there should be 2
SLR's instead of just one here. Defining a dummy variable
d = 1 for years 1-10
d = 0 for years 11-24
and fitting the model y' = β0 + β1 d + β2 x2 + β3 d·x2 + ε to these data produces SSE = .00238427. Give
some quantitative assessment of whether single SLR is adequate here, or whether 2 are needed. (Explain what
you calculate, why you do so and what it indicates. A full blown test isn't required.)
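One way the comparison in (b) might be quantified is a partial F statistic that contrasts the single-line fit with the two-line fit. The sketch below assumes the reduced-model SSE and C Total are the ones on page 3 of the printout, and that the full model has 4 coefficients; it is an illustration of the arithmetic, not a required answer format.

```python
# Reduced model: single SLR of y' on x2 (page 3 of the printout).
sse_reduced, df_reduced = 0.00389794, 22
# Full model: separate lines before/after the embargo (SSE quoted in part (b)).
sse_full = 0.00238427
n, p_full = 24, 4                      # 24 years; coefficients on 1, d, x2, d*x2
df_full = n - p_full                   # 20 error degrees of freedom

extra_params = df_reduced - df_full    # 2 coefficients (on d and d*x2) set to zero
f_stat = ((sse_reduced - sse_full) / extra_params) / (sse_full / df_full)

# R^2 of the full model, using C Total from page 3, for comparison with 0.8357.
r2_full = 1 - sse_full / 0.02372439
print(round(f_stat, 2), round(r2_full, 4))
```

The statistic would be compared with an F distribution on 2 and 20 degrees of freedom.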
Regardless of your answer in (b), now consider a single SLR analysis of these (x2, y') data.
(c) Using the simple linear regression model, give 95% confidence limits for the increase in mean yearly per
capita fuel consumption that seems to accompany a $1 increase in weekly earnings. (No need to simplify.)
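A sketch of the interval arithmetic for (c), assuming the standard form "estimate ± t × standard error" with 22 error degrees of freedom. The slope and its standard error are read from page 3 of the printout; the value 2.074 is the tabled upper 2.5% point of the t distribution with 22 degrees of freedom.

```python
b2 = 0.0013324        # fitted slope on x2 (page 3 of the printout)
se_b2 = 0.000126      # its standard error (page 3 of the printout)
t_crit = 2.074        # upper 2.5% point of t with 22 df, from a standard t table

half_width = t_crit * se_b2
lower, upper = b2 - half_width, b2 + half_width
# Units: 1000 gallons per person per (1982) dollar of weekly earnings.
print(lower, upper)
```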
(d) Suppose your job is to make an early guess (before figures on fuel use are available) at 2000 per capita auto
fuel consumption, based on reports that say x2 = 270 for this year. Give 95% prediction limits. (You may read
these from the printout instead of doing calculations.)
Vardeman used JMP-IN's stepwise regression facility and found that the k = 2 variable regression equation
with the largest R² (that can be built from these predictors) is
y' = β0 + β1 x1 + β5 x5 + ε   ( = β0 + β1 x1 + β5 x2·x1 + ε)   (*)
Pages 4 through 8 of the printout concern the fitting and use of this model.
(e) What is a p-value for testing whether population size (x1) could be safely dropped from this model (leaving
x5)? Where did you get it and what does it indicate here?
p-value:
origin and interpretation:
(f) What "L" is the difference in mean fuel consumptions for two years with (respectively) x1 of 270 (million)
and 273 (million) and x2 of 300 and 310? (Give the linear combination of β's of interest here.)
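The linear combination in (f) can be sketched by differencing the multipliers of β0, β1 and β5 in the model (*) mean at the two sets of conditions. The helper `mean_terms` is a hypothetical name introduced only for this illustration, and the direction (second year minus first) is an assumption, since the question does not fix it.

```python
def mean_terms(x1, x2):
    """Multipliers of (beta0, beta1, beta5) in the model (*) mean at (x1, x2)."""
    return (1.0, x1, x1 * x2)          # x5 = x2 * x1 (available dollars)

first = mean_terms(270, 300)
second = mean_terms(273, 310)

# Assumed direction: second year minus first year.
L_coeffs = tuple(b - a for a, b in zip(first, second))
print(L_coeffs)                        # multipliers of beta0, beta1, beta5 in L
```

Under that assumption the intercept cancels and L involves only β1 and β5.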
(g) What, for x1 = 270, are 95% confidence limits for the increase in mean per capita fuel consumption this
model says accompanies a $1 increase in average gross weekly earnings? (No need to simplify.)
(h) What do you see in the plot of residuals for this model against ŷ' on page 5 of the printout? (If it makes any
difference, the 10 points on the right are the 10 "pre-embargo" points.)
(i) From looking at the JMP data table on page 7 (to which some things have been added subsequent to fitting
the model (*)) identify a case that you think is probably highly important in the fitting of model (*). What
draws your attention to this case?
(j) Page 8 of the printout is a plot of e_i versus e_(i-1) for this data set, where i is both the case number and the
year number (i = 1 being 1970). What does this plot indicate about the MLR regression analysis under model
(*)? Note that 1993 (the i = 24 case in the data set) had e_24 ≈ .01. If I told you the values for x1 and x5 for
1994 (for example, say they are x1 = 261 and x5 = 66,000) and asked you to adjust your prediction of y_25
derived from the fitted version of model (*), how (in qualitative terms) would you make use of this information
about e_24? (Would you adjust up or down, and why?)
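The quantity behind the plot in (j) is the lag-1 sample correlation of the residuals. A small sketch, assuming the residuals are the "Residual y'" column of the data table on page 7 of the printout; the helper `pearson` is written out by hand so that nothing beyond the standard library is needed.

```python
import math

# Residuals e_1..e_24 from the "Residual y'" column on page 7 of the printout.
residuals = [
    -0.00002, -0.00766, -0.0215, -0.00625, -0.00426, 0.019622,
    0.01596, 0.011303, 0.01718, 0.013063, 0.00543, 0.005449,
    0.00419, -0.01184, -0.02105, -0.01503, -0.01147, -0.01266,
    -0.00568, -0.00134, 0.002449, 0.001798, 0.010716, 0.01161,
]

def pearson(u, v):
    """Sample correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

current = residuals[1:]     # e_i for i = 2..24
previous = residuals[:-1]   # e_(i-1) for i = 2..24 (case 1 has no predecessor)
r_lag1 = pearson(current, previous)
print(round(r_lag1, 3))
```

A strongly positive value means a positive residual in one year tends to be followed by a positive residual in the next.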
2. The pink printout attached to this exam is from a JMP analysis of some data again from Statistics for
Business and Economics by McClave, Benson and Sincich, this time fashioned after a study from the Journal of
Marketing Research meant to evaluate the effectiveness of short-run supermarket strategies. 3 different Display
levels (1 = "normal space," 2 = "normal plus end-of-aisle" and 3 = "twice normal") and 3 different Price
levels (1 = "regular," 2 = "reduced" and 3 = "cost to store") were used. The response variable
y = weekly sales ($100)
was observed for a single product 3 different times for each of the 3 × 3 different combinations of levels of the
two factors.
(a) On page 2 of the printout is a plot of y versus a "cell number" variable that names the combinations of
Display and Price: 1 = (1,1), 2 = (1,2), 3 = (1,3), 4 = (2,1), …, 9 = (3,3). Do you see anything in that
plot to cause you to worry about the constant variance assumption? There is on pages 3&4 a JMP-IN "Fit
Model" MLR analysis using "cell" as a nominal variable. On it is a pooled estimate of σ. What is that
estimate, and what in the context of this problem is it estimating?
comment on plot:
s_Pooled = _____________
σ here is:
(b) Give and interpret a p-value for testing whether there is any difference at all among the 9 different cell
means (9 different short run marketing strategies) in terms of mean sales.
p-value:
origin and interpretation:
(c) Give a 95% prediction interval for the next sales figure for the "twice normal space"&"cost to store"
marketing strategy for this item. (No need to simplify.)
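A sketch of the prediction-limit arithmetic for (c), assuming the usual one-way form: cell mean ± t × s_Pooled × sqrt(1 + 1/n) with n = 3 observations per cell and 18 error degrees of freedom. The cell mean and root mean square error are read from pages 3 and 4 of the printout; 2.101 is the tabled upper 2.5% point of the t distribution with 18 degrees of freedom.

```python
import math

cell_mean = 18.28666667   # least squares mean for cell 9 = (Display 3, Price 3)
s_pooled = 0.222428       # Root Mean Square Error from the cell-means fit
n_cell = 3                # observations per Display/Price combination
t_crit = 2.101            # upper 2.5% point of t with 18 df, from a standard t table

# A new observation adds its own variance to that of the estimated cell mean.
half_width = t_crit * s_pooled * math.sqrt(1 + 1 / n_cell)
lower, upper = cell_mean - half_width, cell_mean + half_width
print(round(lower, 3), round(upper, 3))   # units: $100 of weekly sales
```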
Pages 5 through 10 of the printout refer to a JMP analysis of these data using "nominal" variables
Display and Price. First there is a "no-interactions" analysis. Then there is a "full model"/"with
interactions" analysis.
(d) Does it look from the printout that one really needs to include interactions in a two-factor analysis of these
data? Explain. (Offer some quantitative support for your position.)
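One form of quantitative support for (d) is the F ratio for interactions, which the no-interactions printout reports as "Lack of Fit". The sketch below recomputes it from the sums of squares on page 5 of the printout and also contrasts the two root mean square errors; the variable names are illustrative.

```python
# "Lack of Fit" line of the no-interactions fit = interaction sum of squares.
ss_interaction, df_interaction = 51.070481, 4
# "Pure Error" line = within-cell (error) sum of squares of the full model.
ss_error, df_error = 0.890533, 18

f_interaction = (ss_interaction / df_interaction) / (ss_error / df_error)

# Root mean square errors without and with interactions (pages 5 and 8).
s_no_interaction = 1.536836
s_with_interaction = 0.222428
print(round(f_interaction, 2), round(s_no_interaction / s_with_interaction, 1))
```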
Regardless of your answer to (d), for the rest of this exam, consider a "with interactions" analysis.
(e) Give 95% two-sided confidence limits for the main effect of the "normal space" level of Display. What is
being measured here in terms of the 9 different mean sales figures? (No need to simplify the limits.)
limits:
interpretation:
(f) Below is a 3 × 3 table. Write in it estimated interactions for the 9 different combinations of levels of
Display and Price. (Some of these can be gotten directly from the printout, others you'll have to do a bit of
arithmetic to get.)
                      Price
                1        2        3
Display   1   ______   ______   ______
          2   ______   ______   ______
          3   ______   ______   ______
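The "bit of arithmetic" in (f) can be sketched from the nine cell means alone, assuming the usual balanced-data definition: interaction = cell mean − row mean − column mean + grand mean. The cell means are the Display*Price least squares means on page 10 of the printout; rows are Display levels 1 to 3 and columns are Price levels 1 to 3.

```python
# Cell means (rows: Display 1..3, columns: Price 1..3) from page 10 of the printout.
cell_means = [
    [10.14666667, 12.02666667, 15.78000000],
    [12.15000000, 18.98666667, 25.10000000],
    [12.02666667, 15.05000000, 18.28666667],
]

grand = sum(sum(row) for row in cell_means) / 9
display_means = [sum(row) / 3 for row in cell_means]
price_means = [sum(cell_means[i][j] for i in range(3)) / 3 for j in range(3)]

# Estimated interaction for each (Display i, Price j) combination.
interactions = [
    [cell_means[i][j] - display_means[i] - price_means[j] + grand for j in range(3)]
    for i in range(3)
]
for row in interactions:
    print([round(v, 4) for v in row])
```

Each row and each column of the resulting table sums to zero, which is why four of the nine entries (those printed on page 8) determine the rest.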
(g) Suppose that in fact the MLR model (with dummies) is a good one and that estimates of the model
parameters here are in fact exactly the true values (that's too much to hope for, but pretend they are perfect).
What do you assess as the fraction of weeks that the "normal space"&"regular price" marketing strategy will
produce sales of at least $1000?
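A sketch of the calculation (g) seems to call for, assuming weekly sales for a given strategy are normal with mean equal to the fitted cell mean and standard deviation equal to the root mean square error. Since y is recorded in units of $100, sales of $1000 correspond to y = 10. The helper `normal_cdf` is built from the standard library's `math.erf`.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean_sales = 10.14666667   # cell (Display 1, Price 1) mean, units of $100
sigma = 0.222428           # Root Mean Square Error, treated here as the true sigma
threshold = 10.0           # $1000 expressed in units of $100

z = (threshold - mean_sales) / sigma
fraction = 1 - normal_cdf(z)   # P(sales >= $1000) under the assumed normal model
print(round(z, 3), round(fraction, 3))
```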
AutoFuel data table (yellow printout, page 1)
Rows  Year  y      y'        x1     x2      x3    x4     x5
1     1     67.8   0.33057   205.1  308.84  0.36  0.689  63343.08
2     2     69.51  0.335635  207.1  314.35  0.36  0.672  65101.89
3     3     73.5   0.350167  209.9  327.51  0.37  0.641  68744.35
4     4     78     0.368098  211.9  327.45  0.4   0.633  69386.65
5     5     74.2   0.346891  213.9  313.91  0.53  0.786  67145.35
6     6     76.5   0.354167  216    303.96  0.57  0.794  65655.36
7     7     78.8   0.361468  218    308.35  0.59  0.775  67220.3
8     8     80.7   0.366485  220.2  311.88  0.62  0.763  68675.98
9     9     83.8   0.37646   222.6  312.42  0.63  0.715  69544.69
10    10    80.2   0.356286  225.1  302.91  0.86  0.852  68185.04
11    11    71.9   0.315766  227.7  285.32  1.19  1.074  64967.36
12    12    71     0.308696  230    280.75  1.31  1.125  64572.5
13    13    70.1   0.301765  232.3  276.95  1.22  1.033  64335.49
14    14    69.9   0.298081  234.5  281.83  1.24  0.998  66089.14
15    15    68.7   0.290732  236.3  281.87  1.21  0.942  66605.88
16    16    69.3   0.290566  238.5  277.96  1.2   0.917  66293.46
17    17    71.4   0.296635  240.7  278.15  0.93  0.703  66950.7
18    18    70.6   0.290774  242.8  275.09  0.95  0.706  66791.85
19    19    71.9   0.293469  245    272.21  0.95  0.683  66691.45
20    20    72.7   0.293975  247.3  269.55  1.02  0.713  66659.72
21    21    72     0.288115  249.9  264.23  1.16  0.774  66031.08
22    22    70.7   0.279889  252.6  259.9   1.14  0.729  65650.74
23    23    73.9   0.28935   255.4  259.17  1.12  0.705  66192.02
24    24    75.1   0.290972  258.1  258.57  1.1   0.678  66736.92
Correlations (yellow printout, page 2)
Variable  y        x1       x2       x3       x4       x5
y         1.0000   -0.2337  0.4484   -0.3940  -0.2701  0.7521
x1        -0.2337  1.0000   -0.9476  0.7992   0.0487   -0.1168
x2        0.4484   -0.9476  1.0000   -0.8619  -0.2193  0.4236
x3        -0.3940  0.7992   -0.8619  1.0000   0.6298   -0.3560
x4        -0.2701  0.0487   -0.2193  0.6298   1.0000   -0.4383
x5        0.7521   -0.1168  0.4236   -0.3560  -0.4383  1.0000
Scatterplot Matrix
[Scatterplot matrix of y, x1, x2, x3, x4 and x5; plot not reproduced]
y' By x2 (yellow printout, page 3)
[Plot: y' (0.26 to 0.38) versus x2 (250 to 330) with Linear Fit; plot not reproduced]
Linear Fit
y' = -0.0662 + 0.00133 x2
Summary of Fit
RSquare                     0.835699
RSquare Adj                 0.828231
Root Mean Square Error      0.013311
Mean of Response            0.319792
Observations (or Sum Wgts)  24
Analysis of Variance
Source   DF  Sum of Squares  Mean Square  F Ratio
Model    1   0.01982645      0.019826     111.9007
Error    22  0.00389794      0.000177     Prob>F
C Total  23  0.02372439                   <.0001
Parameter Estimates
Term       Estimate   Std Error  t Ratio  Prob>|t|
Intercept  -0.066219  0.036592   -1.81    0.0840
x2         0.0013324  0.000126   10.58    <.0001
Response: y' (yellow printout, page 4)
Summary of Fit
RSquare                     0.86164
RSquare Adj                 0.848463
Root Mean Square Error      0.012502
Mean of Response            0.319792
Observations (or Sum Wgts)  24
Parameter Estimates
Term       Estimate   Std Error  t Ratio  Prob>|t|
Intercept  0.07819    0.122224   0.64     0.5293
x1         -0.001541  0.000164   -9.42    <.0001
x5         0.000009   0.000002   5.34     <.0001
Effect Test
Source  Nparm  DF  Sum of Squares  F Ratio  Prob>F
x1      1      1   0.01386884      88.7268  <.0001
x5      1      1   0.00445715      28.5149  <.0001
Whole-Model Test (yellow printout, page 5)
[Plot: y' versus y' Predicted, both axes 0.26 to 0.38; plot not reproduced]
Analysis of Variance
Source   DF  Sum of Squares  Mean Square  F Ratio
Model    2   0.02044189      0.010221     65.3892
Error    21  0.00328250      0.000156     Prob>F
C Total  23  0.02372439                   <.0001
[Plot: Residual (-0.03 to 0.02) versus y' Predicted (0.26 to 0.38); plot not reproduced]
x1 (yellow printout, page 6)
[Leverage plot: y' (0.26 to 0.38) versus x1 Leverage (200 to 260); plot not reproduced]
Effect Test
Sum of Squares  F Ratio  DF  Prob>F
0.01386884      88.7268  1   <.0001
x5
[Leverage plot: y' (0.26 to 0.38) versus x5 Leverage (62000 to 70000); plot not reproduced]
Effect Test
Sum of Squares  F Ratio  DF  Prob>F
0.00445715      28.5149  1   <.0001
AutoFuel data table with saved columns (yellow printout, page 7)
Rows  Year  y'        x1     x5        Predicted y'  Residual y'  h y'      Cook's D Influence y'
1     1     0.33057   205.1  63343.08  0.330591      -0.00002     0.377233  8.865e-7
2     2     0.335635  207.1  65101.89  0.343294      -0.00766     0.191471  0.036642
3     3     0.350167  209.9  68744.35  0.371671      -0.0215      0.184017  0.272538
4     4     0.368098  211.9  69386.65  0.374353      -0.00625     0.22516   0.031289
5     5     0.346891  213.9  67145.35  0.351154      -0.00426     0.093032  0.004383
6     6     0.354167  216    65655.36  0.334544      0.019622     0.10008   0.101469
7     7     0.361468  218    67220.3   0.345507      0.01596      0.074328  0.047122
8     8     0.366485  220.2  68675.98  0.355182      0.011303     0.132417  0.047932
9     9     0.37646   222.6  69544.69  0.35928       0.01718      0.203671  0.202166
10    10    0.356286  225.1  68185.04  0.343223      0.013063     0.090936  0.040043
11    11    0.315766  227.7  64967.36  0.310336      0.00543      0.09163   0.006983
12    12    0.308696  230    64572.5   0.303247      0.005449     0.114295  0.009224
13    13    0.301765  232.3  64335.49  0.297575      0.00419      0.130581  0.006468
14    14    0.298081  234.5  66089.14  0.309923      -0.01184     0.047313  0.01559
15    15    0.290732  236.3  66605.88  0.311787      -0.02105     0.046835  0.048735
16    16    0.290566  238.5  66293.46  0.305592      -0.01503     0.052121  0.027931
17    17    0.296635  240.7  66950.7   0.3081        -0.01147     0.062455  0.019918
18    18    0.290774  242.8  66791.85  0.303438      -0.01266     0.068079  0.026807
19    19    0.293469  245    66691.45  0.299146      -0.00568     0.076881  0.006199
20    20    0.293975  247.3  66659.72  0.295316      -0.00134     0.088698  0.00041
21    21    0.288115  249.9  66031.08  0.285666      0.002449     0.104674  0.00167
22    22    0.279889  252.6  65650.74  0.278091      0.001798     0.129494  0.001178
23    23    0.28935   255.4  66192.02  0.278634      0.010716     0.143484  0.047898
24    24    0.290972  258.1  66736.92  0.279363      0.01161      0.171117  0.071588
Correlations (yellow printout, page 8)
Variable   esubi   esub(i-1)
esubi      1.0000  0.7274
esub(i-1)  0.7274  1.0000
1 rows not used due to missing values.
Scatterplot Matrix
[Scatterplot matrix of esubi and esub(i-1), both axes -0.02 to 0.02; plot not reproduced]
supermarket data table (pink printout, page 1)
Rows  Combination  Display  Price  Sales
1     1            1        1      9.89
2     1            1        1      10.25
3     1            1        1      10.3
4     2            1        2      12.11
5     2            1        2      12.15
6     2            1        2      11.82
7     3            1        3      15.77
8     3            1        3      15.59
9     3            1        3      15.98
10    4            2        1      11.91
11    4            2        1      12.33
12    4            2        1      12.21
13    5            2        2      18.6
14    5            2        2      19.1
15    5            2        2      19.26
16    6            2        3      24.92
17    6            2        3      25.27
18    6            2        3      25.11
19    7            3        1      12.26
20    7            3        1      12.02
21    7            3        1      11.8
22    8            3        2      15.16
23    8            3        2      15.01
24    8            3        2      14.98
25    9            3        3      18.01
26    9            3        3      18.33
27    9            3        3      18.52
Plot (pink printout, page 2)
[Plot: Sales (10 to 25) versus Combination (0 to 10); plot not reproduced]
Response: Sales, Combination as a nominal variable (pink printout, page 3)
Summary of Fit
RSquare                     0.99832
RSquare Adj                 0.997573
Root Mean Square Error      0.222428
Mean of Response            15.50593
Observations (or Sum Wgts)  27
Parameter Estimates
Term           Estimate   Std Error  t Ratio  Prob>|t|
Intercept      15.505926  0.042806   362.24   <.0001
Combinat[1-9]  -5.359259  0.121074   -44.26   <.0001
Combinat[2-9]  -3.479259  0.121074   -28.74   <.0001
Combinat[3-9]  0.2740741  0.121074   2.26     0.0362
Combinat[4-9]  -3.355926  0.121074   -27.72   <.0001
Combinat[5-9]  3.4807407  0.121074   28.75    <.0001
Combinat[6-9]  9.5940741  0.121074   79.24    <.0001
Combinat[7-9]  -3.479259  0.121074   -28.74   <.0001
Combinat[8-9]  -0.455926  0.121074   -3.77    0.0014
Effect Test
Source       Nparm  DF  Sum of Squares  F Ratio   Prob>F
Combination  8      8   529.11512       1336.849  <.0001
Whole-Model Test (pink printout, page 4)
[Plot: Sales versus Sales Predicted, both axes 10 to 25; plot not reproduced]
Analysis of Variance
Source   DF  Sum of Squares  Mean Square  F Ratio
Model    8   529.11512       66.1394      1336.849
Error    18  0.89053         0.0495       Prob>F
C Total  26  530.00565                    <.0001
[Plot: Residual (-0.4 to 0.3) versus Sales Predicted; plot not reproduced]
Combination
[Leverage plot: Sales (10 to 25) versus Combination Leverage (10 to 25); plot not reproduced]
Effect Test
Sum of Squares  F Ratio   DF  Prob>F
529.11512       1336.849  8   <.0001
Least Squares Means
Level  Least Sq Mean  Std Error     Mean
1      10.14666667    0.1284186825  10.1467
2      12.02666667    0.1284186825  12.0267
3      15.78000000    0.1284186825  15.7800
4      12.15000000    0.1284186825  12.1500
5      18.98666667    0.1284186825  18.9867
6      25.10000000    0.1284186825  25.1000
7      12.02666667    0.1284186825  12.0267
8      15.05000000    0.1284186825  15.0500
9      18.28666667    0.1284186825  18.2867
Response: Sales, no-interactions analysis (pink printout, page 5)
Summary of Fit
RSquare                     0.901961
RSquare Adj                 0.884136
Root Mean Square Error      1.536836
Mean of Response            15.50593
Observations (or Sum Wgts)  27
Lack of Fit
Source       DF  Sum of Squares  Mean Square  F Ratio
Lack of Fit  4   51.070481       12.7676      258.0669
Pure Error   18  0.890533        0.0495       Prob>F
Total Error  22  51.961015                    <.0001
Max RSq  0.9983
Parameter Estimates
Term          Estimate   Std Error  t Ratio  Prob>|t|
Intercept     15.505926  0.295764   52.43    <.0001
Display[1-3]  -2.854815  0.418274   -6.83    <.0001
Display[2-3]  3.2396296  0.418274   7.75     <.0001
Price[1-3]    -4.064815  0.418274   -9.72    <.0001
Price[2-3]    -0.151481  0.418274   -0.36    0.7207
Effect Test
Source   Nparm  DF  Sum of Squares  F Ratio  Prob>F
Display  2      2   169.13925       35.8063  <.0001
Price    2      2   308.90539       65.3944  <.0001
Whole-Model Test, no-interactions analysis (pink printout, page 6)
[Plot: Sales versus Sales Predicted, both axes 10 to 25; plot not reproduced]
Analysis of Variance
Source   DF  Sum of Squares  Mean Square  F Ratio
Model    4   478.04464       119.511      50.6003
Error    22  51.96101        2.362        Prob>F
C Total  26  530.00565                    <.0001
Display
[Leverage plot: Sales (10 to 25) versus Display Leverage (12 to 19); plot not reproduced]
Effect Test
Sum of Squares  F Ratio  DF  Prob>F
169.13925       35.8063  2   <.0001
Least Squares Means
Level  Least Sq Mean  Std Error     Mean
1      12.65111111    0.5122786036  12.6511
2      18.74555556    0.5122786036  18.7456
3      15.12111111    0.5122786036  15.1211
Price, no-interactions analysis (pink printout, page 7)
[Leverage plot: Sales (10 to 25) versus Price Leverage (10 to 20); plot not reproduced]
Effect Test
Sum of Squares  F Ratio  DF  Prob>F
308.90539       65.3944  2   <.0001
Least Squares Means
Level  Least Sq Mean  Std Error     Mean
1      11.44111111    0.5122786036  11.4411
2      15.35444444    0.5122786036  15.3544
3      19.72222222    0.5122786036  19.7222
Response: Sales, with-interactions analysis (pink printout, page 8)
Summary of Fit
RSquare                     0.99832
RSquare Adj                 0.997573
Root Mean Square Error      0.222428
Mean of Response            15.50593
Observations (or Sum Wgts)  27
Parameter Estimates
Term                     Estimate   Std Error  t Ratio  Prob>|t|
Intercept                15.505926  0.042806   362.24   <.0001
Display[1-3]             -2.854815  0.060537   -47.16   <.0001
Display[2-3]             3.2396296  0.060537   53.51    <.0001
Price[1-3]               -4.064815  0.060537   -67.15   <.0001
Price[2-3]               -0.151481  0.060537   -2.50    0.0222
Display[1-3]*Price[1-3]  1.5603704  0.085612   18.23    <.0001
Display[1-3]*Price[2-3]  -0.472963  0.085612   -5.52    <.0001
Display[2-3]*Price[1-3]  -2.530741  0.085612   -29.56   <.0001
Display[2-3]*Price[2-3]  0.3925926  0.085612   4.59     0.0002
Effect Test
Source         Nparm  DF  Sum of Squares  F Ratio   Prob>F
Display        2      2   169.13925       1709.373  <.0001
Price          2      2   308.90539       3121.892  <.0001
Display*Price  4      4   51.07048        258.0669  <.0001
Whole-Model Test, with-interactions analysis (pink printout, page 9)
[Plot: Sales versus Sales Predicted, both axes 10 to 25; plot not reproduced]
Analysis of Variance
Source   DF  Sum of Squares  Mean Square  F Ratio
Model    8   529.11512       66.1394      1336.849
Error    18  0.89053         0.0495       Prob>F
C Total  26  530.00565                    <.0001
Press  2.0037
Display
[Leverage plot: Sales (10 to 25) versus Display Leverage (12 to 19); plot not reproduced]
Effect Test
Sum of Squares  F Ratio   DF  Prob>F
169.13925       1709.373  2   <.0001
Least Squares Means
Level  Least Sq Mean  Std Error     Mean
1      12.65111111    0.0741425609  12.6511
2      18.74555556    0.0741425609  18.7456
3      15.12111111    0.0741425609  15.1211
Price, with-interactions analysis (pink printout, page 10)
[Leverage plot: Sales (10 to 25) versus Price Leverage (10 to 20); plot not reproduced]
Effect Test
Sum of Squares  F Ratio   DF  Prob>F
308.90539       3121.892  2   <.0001
Least Squares Means
Level  Least Sq Mean  Std Error     Mean
1      11.44111111    0.0741425609  11.4411
2      15.35444444    0.0741425609  15.3544
3      19.72222222    0.0741425609  19.7222
Display*Price
[Profile Plot: Sales LS Means (10 to 25) versus Price (1, 2, 3), one trace per Display level 1, 2, 3; plot not reproduced]
[Leverage plot: Sales (10 to 25) versus Display*Price Leverage (10 to 25); plot not reproduced]
Effect Test
Sum of Squares  F Ratio   DF  Prob>F
51.070481       258.0669  4   <.0001
Least Squares Means
Level  Least Sq Mean  Std Error
1,1    10.14666667    0.1284186825
1,2    12.02666667    0.1284186825
1,3    15.78000000    0.1284186825
2,1    12.15000000    0.1284186825
2,2    18.98666667    0.1284186825
2,3    25.10000000    0.1284186825
3,1    12.02666667    0.1284186825
3,2    15.05000000    0.1284186825
3,3    18.28666667    0.1284186825