Stat 328 Exam II

Summer 2003
Prof. Vardeman
This exam concerns the analysis of a set of home sale price data obtained from the Ames City
Assessor’s Office. Data on sales from May 2002 through June 2003 of 1½ and 2 story homes built in 1945 or
earlier, with (above grade) size of 2500 sq ft or less and lot size of 20,000 sq ft or less, located in Low- and
Medium-Density Residential zoning areas, are summarized on the JMP reports attached to this
exam. n = 88 different homes fitting this description were sold in Ames during this period. (2 were
actually sold twice, but only the second sale prices of these were included in our data set.) For each
home, the value of the response variable
Price = recorded sales price of the home
and the values of 14 potential explanatory variables were obtained. These variables are:
Size, the floor area of the home above grade in sq ft
Land, the area of the lot the home occupies in sq ft
Bed Rooms, a count of the number of bedrooms in the home
Central Air, a dummy variable that is 1 if the home has central air conditioning and 0 if it does not
Fireplace, a count of the number of fireplaces in the home
Full Bath, a count of the number of full bathrooms above grade
Half Bath, a count of the number of half bathrooms above grade
Basement (Total), the floor area of the home’s basement (including both finished and unfinished parts) in sq ft
Finished Bsmt, the area of any finished part of the home’s basement in sq ft
Bsmt Bath, a dummy variable that is 1 if there is a bathroom of any sort (full or half) in the home’s basement and 0 otherwise
Garage, a dummy variable that is 1 if the home has a garage of any sort and 0 otherwise
Multiple Car, a dummy variable that is 1 if the home has a garage that holds more than one vehicle and 0 otherwise
Style (2 Story), a dummy variable that is 1 if the home is a 2 story (or a 2½ story) home and 0 otherwise
Zone (Town Center), a dummy variable that is 1 if the home is in an area zoned as “Urban Core Medium Density” and 0 otherwise
The last two pages of the JMP report provide a small part of the data table. Remember that in total
there are n = 88 cases/rows in the full table. Only a few rows are given on the printout.
Write all answers you want Vardeman to read on this exam, not on the printout.
a) The first JMP report gives some correlations and a set of scatterplots. Of the predictors represented
on this report, which one is the best single predictor of Price? Which is the 2nd best single predictor
of Price?
Best Single Predictor:
2nd Best Single Predictor:
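Because the squared correlation between Price and a single predictor equals the R² of the corresponding simple linear regression, ranking the predictors by |r| answers part a). A minimal Python sketch, with the correlations copied by hand from the first JMP report (the variable names are illustrative, not part of the exam):

```python
# Correlations with Price, copied from the JMP "Correlations" report
corr_with_price = {
    "Size": 0.6649,
    "Bed Rooms": 0.2974,
    "Fireplace": 0.6346,
    "Full Bath": 0.4112,
    "Basement (Total)": 0.3597,
    "Land": 0.4353,
}

# In SLR, R^2 equals the squared correlation, so rank predictors by |r|
ranked = sorted(corr_with_price, key=lambda name: abs(corr_with_price[name]), reverse=True)
best, second = ranked[0], ranked[1]
print(best, second)
```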
b) What in this initial report warns us to be careful in interpreting these data because of
multicollinearity?
Next is a simple linear regression (SLR) report for predicting Price in terms of Size. Use it to
answer questions until further notice. (It was made using default JMP confidence levels.)
c) Give a single-number estimate of the standard deviation of home price for any fixed home size
under the simple linear regression model.
d) Give 95% confidence limits for the increase in mean price that is associated with a 100 sq ft
increase in size for homes of this type under the SLR model. (Plug in, but there is no need to
simplify.)
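Part d) rests on the fact that a 100 sq ft increase in Size changes mean Price by 100β₁, so the limits are 100 times the usual slope limits. A minimal sketch using the SLR report values; the t quantile is an assumed, approximate table value for 86 error df rather than anything printed on the report:

```python
# Values from the Parameter Estimates table of the SLR report
b1 = 75.101965    # estimated slope, dollars per sq ft
se_b1 = 9.097088  # standard error of the slope

# Assumed: approximate 0.975 quantile of the t distribution with 86 error df (t table)
t_975 = 1.988

# A 100 sq ft increase in Size changes mean Price by 100 * beta_1
estimate = 100 * b1
margin = t_975 * 100 * se_b1
lower, upper = estimate - margin, estimate + margin
print(f"{estimate:.0f} +/- {margin:.0f}: {lower:.0f} to {upper:.0f}")
```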
e) Give 95% prediction limits for the selling price of an additional 1500 sq ft home of this type.
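Prediction limits need x̄ and Sxx, which the report does not print directly. A sketch, assuming they are backed out from two standard SLR facts: the least squares line passes through (x̄, ȳ), and SE(b₁) = s/√Sxx. The t quantile is again an assumed table value:

```python
import math

# Values from the SLR report
b0, b1 = 15050.977, 75.101965
s = 28061.1        # Root Mean Square Error (sqrt of the error mean square)
se_b1 = 9.097088
y_bar = 123976.1   # Mean of Response
n = 88
t_975 = 1.988      # assumed: approximate t quantile for n - 2 = 86 df

# The least squares line passes through (x_bar, y_bar), and SE(b1) = s / sqrt(Sxx)
x_bar = (y_bar - b0) / b1
sxx = (s / se_b1) ** 2

x0 = 1500
y_hat = b0 + b1 * x0
# Standard error for predicting a single new observation at x0
se_pred = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
lower, upper = y_hat - t_975 * se_pred, y_hat + t_975 * se_pred
print(f"{lower:.0f} to {upper:.0f}")
```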
f) The figure on the SLR report has n = 88 points plotted on it. Most of those lie outside the inner
(narrower) set of limits drawn around the least squares line. Is this evidence of a problem with our
analysis? Explain. Also, 6 of the 88 points on the figure are outside the wider limits. Is this about what
you expect? Explain.
Narrower Limits:
Wider Limits:
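The narrower limits on the SLR figure are confidence limits for mean Price, so they are not meant to contain individual sale prices. For the wider (prediction) limits, a rough check treats each of the 88 points as an independent trial with about a 5% chance of landing outside; this is an approximation, since the plotted points were also used to fit the line:

```python
import math

n, p = 88, 0.05  # 88 plotted points; each roughly 5% likely to fall outside 95% prediction limits

expected = n * p
sd = math.sqrt(n * p * (1 - p))
# P(6 or more of the 88 points fall outside), treating the points as independent trials
p_at_least_6 = sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(6, n + 1))
print(round(expected, 2), round(sd, 2), round(p_at_least_6, 3))
```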
There are 3 different JMP MLR reports following the SLR report. These are for progressively
smaller/simpler models (involving progressively fewer predictor variables). Use these to answer the
following questions.
g) Give the value of an F statistic and the associated degrees of freedom for testing whether all 14
predictors may simultaneously be dropped from an MLR model for Price.
F = __________
df = _____ , _____
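The overall F statistic printed in the ANOVA table of the largest model can be recomputed from its R², which shows where the degrees of freedom come from. A minimal sketch using the report values:

```python
# Summary values from the largest (14-predictor) MLR report
r2 = 0.751644
n, k = 88, 14

# Overall F statistic for H0: all 14 slope coefficients are 0
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
df = (k, n - k - 1)
print(round(f_stat, 2), df)
```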
h) Find the value of a partial F statistic and the associated degrees of freedom for testing whether the
increase in R² seen going from the smallest of the 3 MLR models to the largest is “statistically
significant.”
F = __________
df = _____ , _____
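A sketch of the partial F computation from the R² values of the largest (14-predictor) and smallest (8-predictor) reports, using the error df of the larger model in the denominator; the same statistic is also computed from the two error sums of squares as a cross-check:

```python
n = 88
k_full, r2_full = 14, 0.751644      # largest MLR model
k_reduced, r2_reduced = 8, 0.728601  # smallest MLR model

# Partial F for H0: the 6 extra predictors in the large model all have coefficient 0
num_df = k_full - k_reduced
den_df = n - k_full - 1
f_partial = ((r2_full - r2_reduced) / num_df) / ((1 - r2_full) / den_df)

# Same statistic from the error sums of squares and the large model's error mean square
f_partial_sse = ((3.29439e10 - 3.01468e10) / num_df) / 412970397
print(round(f_partial, 3), round(f_partial_sse, 3), (num_df, den_df))
```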
i) Taking account of the 3 MLR reports, fill in the table below. Then say what the values indicate
about which of the 3 models is initially most attractive.
Model     k     R²     s     PRESS
Large
Medium
Small
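A sketch that collects the table entries from the three MLR reports and flags the model with the smallest s and the smallest PRESS; sqrt(PRESS/n) is added only as a convenience for putting PRESS on the same dollar scale as s:

```python
import math

# (k, R^2, s, PRESS) read from the three MLR reports
models = {
    "Large": (14, 0.751644, 20321.67, 44944965456),
    "Medium": (11, 0.749935, 19984.97, 41681819873),
    "Small": (8, 0.728601, 20420.86, 41804612766),
}
n = 88

for name, (k, r2, s, press) in models.items():
    # sqrt(PRESS / n) is on the same dollar scale as s
    print(f"{name:<7} k={k:<3} R2={r2:.3f} s={s:,.0f} PRESS={press:.4g} "
          f"sqrt(PRESS/n)={math.sqrt(press / n):,.0f}")

best_by_s = min(models, key=lambda m: models[m][2])
best_by_press = min(models, key=lambda m: models[m][3])
print(best_by_s, best_by_press)
```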
j) In those cases where it is safe/sensible to interpret individual regression coefficients
(the b_j's), what is that interpretation? (For example, how would one interpret b_Fireplace if that were
sensible to do?)
k) Looking at the 3 MLR reports, one can see that the b's for Fireplace are around $13,000-$14,000
per fireplace. Notice that the 1st case in the data set is a fairly small home with 0 fireplaces that sold
for a modest price. As a practical matter, would you advise the owner of that home to install 3
fireplaces in it as a means of increasing the value of the home by about $40,000? (If not, why not, and
how do you reconcile the values of b_Fireplace with the fact that these models do a decent job of predicting
Price? You don’t have the whole data set to “look at,” but how likely do you think it is that there was
a home in the data set comparable to a “fireplaces-added version of home #1”?)
l) If you thought it desirable to consider a MLR model with one less predictor than the smallest of the
3 MLR models summarized on the JMP reports, which predictor would you consider dropping from
the smallest model? Explain the basis of your choice. (Give some quantitative rationale.)
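For a single predictor, the partial F statistic for dropping it from a model equals the square of its t ratio, so the predictor with the smallest |t| is the one whose removal costs the least in SSE (and R²). A sketch using the t ratios from the smallest MLR report:

```python
# t ratios from the Parameter Estimates table of the smallest (8-predictor) MLR report
t_ratios = {
    "Size": 3.73,
    "Land": 2.68,
    "Central Air": 1.78,
    "Fireplace": 4.11,
    "Full Bath": 2.37,
    "Half Bath": 2.24,
    "Basement (Total)": 1.59,
    "Bsmt Bath": 2.90,
}

# Partial F for dropping one predictor = t^2, so the smallest |t| gives the smallest partial F
candidate = min(t_ratios, key=lambda name: abs(t_ratios[name]))
print(candidate, round(t_ratios[candidate] ** 2, 2))
```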
The last few columns in the data table refer to predictions and other summaries made using the 2nd of
the 3 MLR models.
m) If another home essentially matching the characteristics of home #1 is sold tomorrow, what would
you use as 95% prediction limits for its sale price? (Plug in numbers, but you don’t need to simplify.)
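For a single new sale, the relevant standard error combines the model's s with the standard error of the fitted mean, which is what the StdErr Indiv Price column reports. A sketch for home #1 under the 2nd MLR model (76 error df); the t quantile is an assumed, approximate table value:

```python
import math

# Row 1 of the partial data table, fitted with the 2nd MLR model (76 error df)
y_hat = 77225.562   # Predicted Price
se_fit = 7945.5874  # StdErr Pred Price
s = 19984.97        # Root Mean Square Error of the 2nd MLR model
t_975 = 1.9917      # assumed: approximate t quantile for 76 df, from a t table

# StdErr Indiv Price combines new-observation variability with uncertainty in the fitted mean
se_indiv = math.sqrt(s**2 + se_fit**2)
lower, upper = y_hat - t_975 * se_indiv, y_hat + t_975 * se_indiv
print(f"se_indiv={se_indiv:.1f}, limits {lower:.0f} to {upper:.0f}")
```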
n) Of the cases listed on the partial data table, which case has the “most unusual/extreme set of
predictor values” (as measured by some appropriate summary statistic)? Explain. Which case is most
influential in the fitting of the model, if one considers both the values of the predictors and the selling
price for that case? Explain.
Most unusual set of predictors:
Most influential case:
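Leverage (h Price) depends only on the predictor values, while Cook's D combines leverage with the size of the residual, so each question can be answered by picking the largest value in the corresponding column. A sketch over the 24 listed rows; the 2(k + 1)/n leverage cutoff is one common rule of thumb, included as an assumption rather than a course standard:

```python
# Listed rows of the partial data table: row -> (h Price, Cook's D Influence Price)
diagnostics = {
    1: (0.15806834, 0.00025163), 5: (0.28157951, 0.01412974),
    10: (0.20624743, 0.00407618), 13: (0.11579853, 0.03250411),
    18: (0.08176744, 0.00017826), 19: (0.09960694, 0.01193469),
    25: (0.21149931, 0.00475998), 37: (0.05693783, 0.00512003),
    41: (0.13242286, 0.00000415), 42: (0.14238004, 0.00023777),
    44: (0.1164042, 0.01066997), 51: (0.12633407, 0.00314911),
    52: (0.12836421, 0.00194947), 56: (0.09855423, 0.00352402),
    57: (0.11155013, 0.00642646), 58: (0.0820772, 0.0003304),
    62: (0.12183818, 0.01103767), 64: (0.13568505, 0.01993704),
    69: (0.11778833, 0.00249202), 70: (0.0855566, 0.00054592),
    72: (0.13323473, 0.00650631), 77: (0.11616629, 0.0003185),
    81: (0.09364859, 0.00684206), 86: (0.05338686, 0.00040034),
}

# Largest leverage = most unusual predictor values; largest Cook's D = most influential
most_unusual = max(diagnostics, key=lambda row: diagnostics[row][0])
most_influential = max(diagnostics, key=lambda row: diagnostics[row][1])

# Assumed rule of thumb: flag leverage above 2(k + 1)/n for the 11-predictor model
k, n = 11, 88
cutoff = 2 * (k + 1) / n
flagged = [row for row, (h, _) in diagnostics.items() if h > cutoff]
print(most_unusual, most_influential, round(cutoff, 4), flagged)
```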
Multivariate
Correlations
                  Price   Size    Bed Rooms  Fireplace  Full Bath  Basement (Total)  Land
Price             1.0000  0.6649  0.2974     0.6346     0.4112     0.3597            0.4353
Size              0.6649  1.0000  0.4647     0.3945     0.4878     0.4028            0.1975
Bed Rooms         0.2974  0.4647  1.0000     0.1489     0.1024     0.1794            -0.0240
Fireplace         0.6346  0.3945  0.1489     1.0000     0.1872     0.2019            0.3592
Full Bath         0.4112  0.4878  0.1024     0.1872     1.0000     0.2749            0.0838
Basement (Total)  0.3597  0.4028  0.1794     0.2019     0.2749     1.0000            -0.0157
Land              0.4353  0.1975  -0.0240    0.3592     0.0838     -0.0157           1.0000
Scatterplot Matrix
[Figure: pairwise scatterplots of Price, Size, Bed Rooms, Fireplace, Full Bath, Basement (Total), and Land]
Bivariate Fit of Price By Size
[Figure: scatterplot of Price versus Size showing the least squares line with narrower and wider sets of limits]
Linear Fit
Price = 15050.977 + 75.101965 Size
Summary of Fit
RSquare                     0.44212
RSquare Adj                 0.435633
Root Mean Square Error      28061.1
Mean of Response            123976.1
Observations (or Sum Wgts)  88
Analysis of Variance
Source    DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model     1    5.3667e10        5.3667e10     68.1550   <.0001
Error     86   6.77186e10       787425229
C. Total  87   1.21386e11
Parameter Estimates
Term                Estimate    Std Error  t Ratio  Prob>|t|
Intercept           15050.977   13528.93   1.11     0.2690
Size                75.101965   9.097088   8.26     <.0001
Response Price
Whole Model
Actual by Predicted Plot
[Figure: Price Actual versus Price Predicted; P<.0001, RSq=0.75, RMSE=20322]
Summary of Fit
RSquare                     0.751644
RSquare Adj                 0.704014
Root Mean Square Error      20321.67
Mean of Response            123976.1
Observations (or Sum Wgts)  88
Analysis of Variance
Source    DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model     14   9.12387e10       6.51705e9     15.7809   <.0001
Error     73   3.01468e10       412970397
C. Total  87   1.21386e11
Parameter Estimates
Term                Estimate    Std Error  t Ratio  Prob>|t|
Intercept           -20521.8    18329.7    -1.12    0.2666
Size                22.551802   11.52026   1.96     0.0541
Land                2.155705    0.803363   2.68     0.0090
Bed Rooms           1854.4436   3530.225   0.53     0.6010
Central Air         9419.1945   5581.791   1.69     0.0958
Fireplace           13301.722   3812.67    3.49     0.0008
Full Bath           16469.802   6304.273   2.61     0.0109
Half Bath           16345.405   5998.914   2.72     0.0080
Basement (Total)    36.4518     15.21141   2.40     0.0191
Finished Bsmt       -14.11978   10.93261   -1.29    0.2006
Bsmt Bath           20148.503   7285.129   2.77     0.0072
Garage              18008.237   10364.73   1.74     0.0865
Multiple Car        658.69293   5394.207   0.12     0.9031
Style (2 Story)     8086.7239   5657.079   1.43     0.1571
Zone (Town Center)  -2506.852   5065.468   -0.49    0.6222
Residual by Predicted Plot
[Figure: Price Residual versus Price Predicted]
Press  44944965456
Response Price
Whole Model
Actual by Predicted Plot
[Figure: Price Actual versus Price Predicted; P<.0001, RSq=0.75, RMSE=19985]
Summary of Fit
RSquare                     0.749935
RSquare Adj                 0.713741
Root Mean Square Error      19984.97
Mean of Response            123976.1
Observations (or Sum Wgts)  88
Analysis of Variance
Source    DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model     11   9.10312e10       8.27556e9     20.7200   <.0001
Error     76   3.03543e10       399399145
C. Total  87   1.21386e11
Parameter Estimates
Term                Estimate    Std Error  t Ratio  Prob>|t|
Intercept           -17925.64   16187.89   -1.11    0.2716
Size                24.398734   10.51736   2.32     0.0230
Land                2.185702    0.75733    2.89     0.0051
Central Air         10016.114   5247.736   1.91     0.0601
Fireplace           13878.892   3547.244   3.91     0.0002
Full Bath           16417.647   5924.338   2.77     0.0070
Half Bath           15989.478   5768.836   2.77     0.0070
Basement (Total)    36.05159    14.88656   2.42     0.0178
Finished Bsmt       -14.41274   10.68338   -1.35    0.1813
Bsmt Bath           20252.121   7137.606   2.84     0.0058
Garage              16531.943   9773.037   1.69     0.0948
Style (2 Story)     8593.22     5516.095   1.56     0.1234
Residual by Predicted Plot
[Figure: Price Residual versus Price Predicted]
Press  41681819873
Response Price
Whole Model
Actual by Predicted Plot
[Figure: Price Actual versus Price Predicted; P<.0001, RSq=0.73, RMSE=20421]
Summary of Fit
RSquare                     0.728601
RSquare Adj                 0.701118
Root Mean Square Error      20420.86
Mean of Response            123976.1
Observations (or Sum Wgts)  88
Analysis of Variance
Source    DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model     8    8.84416e10       1.1055e10     26.5105   <.0001
Error     79   3.29439e10       417011544
C. Total  87   1.21386e11
Parameter Estimates
Term                Estimate    Std Error  t Ratio  Prob>|t|
Intercept           577.43584   13120.86   0.04     0.9650
Size                34.079126   9.126617   3.73     0.0004
Land                2.0549333   0.766345   2.68     0.0089
Central Air         9439.654    5314.799   1.78     0.0796
Fireplace           14221.474   3464.123   4.11     <.0001
Full Bath           13714.481   5789.375   2.37     0.0203
Half Bath           12875.709   5743.457   2.24     0.0278
Basement (Total)    21.830393   13.75504   1.59     0.1165
Bsmt Bath           17474.389   6029.468   2.90     0.0049
Residual by Predicted Plot
[Figure: Price Residual versus Price Predicted]
Press  41804612766
328 Final 03 Data B
Rows  Price   Size  Garage  Multiple  Bed    Central  Fireplace  Full  Half  Basement  Finished  Bsmt  Land   Style (2  Zone           Predicted   StdErr Pred
                            Car       Rooms  Air                 Bath  Bath  (Total)   Bsmt      Bath         Story)    (Town Center)  Price       Price
1     74900   906   1       0         2      1        0          1     0     348       120       0     4882   1         0              77225.562   7945.5874
5     130000  1281  1       1         1      1        0          2     0     1151      1096      1     10284  0         0              141141.39   10604.8393
10    123500  1306  1       1         3      1        0          1     1     240       0         0     15660  0         1              115774.766  9076.07007
13    86500   1750  1       0         3      1        0          1     0     636       0         0     9000   1         1              118931.2    6800.72314
18    124000  1456  1       1         4      1        0          2     0     869       0         0     5820   0         1              121031.888  5714.70427
19    130000  1595  1       1         3      1        1          1     1     745       340       0     12445  1         0              151576.802  6307.37083
25    141900  1762  1       1         3      1        0          1     1     596       0         1     7200   1         0              150089.257  9190.89994
37    105000  1220  1       0         4      0        0          1     0     756       0         0     6120   0         1              85421.9067  4768.74425
41    119000  1344  1       0         3      1        0          1     0     672       0         0     12816  1         1              118663.811  7272.52215
42    126500  1506  1       0         3      0        0          1     1     780       0         0     12900  0         1              124073.72   7540.98589
44    111000  1609  1       0         3      0        0          1     0     796       0         0     4347   0         1              92479.828   6818.48493
51    112000  1168  1       0         3      0        0          1     0     808       478       1     7520   0         0              102450.67   7103.3598
52    144000  1302  1       0         2      1        2          1     1     680       258       0     6960   0         0              136563.561  7160.20639
56    160000  1493  1       0         3      1        1          2     0     983       454       0     12600  0         0              148199.09   6273.95219
57    117000  1157  1       0         3      1        1          2     0     767       0         0     9400   0         0              131763.109  6674.80528
58    104500  1305  1       0         2      0        0          1     0     920       0         0     9350   0         0              100468.077  5725.51852
62    126000  1356  1       1         3      1        2          2     0     808       381       0     8400   0         0              144298.508  6975.81994
64    97000   865   1       0         2      1        1          1     0     660       422       1     10042  0         0              119936.678  7361.55511
69    150000  1502  1       0         3      1        2          1     1     602       0         0     10593  1         0              158883.646  6858.90346
70    104000  1224  1       0         3      1        0          1     0     936       0         0     4710   0         0              98943.0621  5845.61655
72    127000  1298  1       0         2      0        1          2     0     651       113       0     6821   0         0              113739.667  7294.78158
77    164000  1567  1       1         2      1        2          1     1     840       0         0     13680  0         0              167203.884  6811.51347
81    170000  1560  1       0         3      1        2          1     1     783       293       0     10150  0         0              153039.691  6115.81276
86    86400   1123  1       0         3      1        0          1     0     732       0         0     6060   0         1              92074.9633  4617.64729
Rows  StdErr Indiv  h Price      Cook's D
      Price                      Influence Price
1     21506.5456    0.15806834   0.00025163
5     22624.3621    0.28157951   0.01412974
10    21949.3552    0.20624743   0.00407618
13    21110.3998    0.11579853   0.03250411
18    20785.9806    0.08176744   0.00017826
19    20956.6713    0.09960694   0.01193469
25    21997.0859    0.21149931   0.00475998
37    20546.0475    0.05693783   0.00512003
41    21267.0808    0.13242286   0.00000415
42    21360.3748    0.14238004   0.00023777
44    21116.1285    0.1164042    0.01066997
51    21209.8295    0.12633407   0.00314911
52    21228.9355    0.12836421   0.00194947
56    20946.6375    0.09855423   0.00352402
57    21070.1725    0.11155013   0.00642646
58    20788.9564    0.0820772    0.0003304
62    21167.4564    0.12183818   0.01103767
64    21297.6909    0.13568505   0.01993704
69    21129.2144    0.11778833   0.00249202
70    20822.3529    0.0855566    0.00054592
72    21274.7029    0.13323473   0.00650631
77    21113.8784    0.11616629   0.0003185
81    20899.8161    0.09364859   0.00684206
86    20511.5044    0.05338686   0.00040034