Stat 328 Exam II
Summer 2003
Prof. Vardeman

This exam concerns the analysis of a set of home sale price data obtained from the Ames City Assessor’s Office. Data on sales May 2002 through June 2003 of 1½ and 2 story homes built 1945 and before, with (above grade) size of 2500 sq ft or less and lot size 20,000 sq ft or less, located in Low- and Medium-Density Residential zoning areas are summarized on the JMP reports attached to this exam. n = 88 different homes fitting this description were sold in Ames during this period. (2 were actually sold twice, but only the second sales prices of these were included in our data set.) For each home, the value of the response variable

    Price = recorded sales price of the home

and the values of 14 potential explanatory variables were obtained. These variables are

    Size, the floor area of the home above grade in sq ft
    Land, the area of the lot the home occupies in sq ft
    Bed Rooms, a count of the number in the home
    Central Air, a dummy variable that is 1 if the home has central air conditioning and is 0 if it does not
    Fireplace, a count of the number in the home
    Full Bath, a count of the number of full bathrooms above grade
    Half Bath, a count of the number of half bathrooms above grade
    Basement, the floor area of the home’s basement (including both finished and unfinished parts) in sq ft
    Finished Bsmt, the area of any finished part of the home’s basement in sq ft
    Bsmt Bath, a dummy variable that is 1 if there is a bathroom of any sort (full or half) in the home’s basement and is 0 otherwise
    Garage, a dummy variable that is 1 if the home has a garage of any sort and is 0 otherwise
    Multiple Car, a dummy variable that is 1 if the home has a garage that holds more than one vehicle and is 0 otherwise
    Style (2 Story), a dummy variable that is 1 if the home is a 2 story (or a 2½ story) home and is 0 otherwise
    Zone (Town Center), a dummy variable that is 1 if the home is in an area zoned as “Urban Core Medium Density” and 0 otherwise

The last two pages of the JMP report provide a small part of the data table. Remember that in total there are n = 88 cases/rows in the full table. Only a few rows are given on the printout.

Write all answers you want Vardeman to read on this exam, not on the printout.

a) The first JMP report gives some correlations and a set of scatterplots. Of the predictors represented on this report, which one is the best single predictor of Price? Which is the 2nd best single predictor of Price?

    Best Single Predictor: __________
    2nd Best Single Predictor: __________

b) What about this initial report alerts us to be careful in our interpreting of these data, in view of the existence of multicollinearity?

There is next a simple linear regression report for predicting Price in terms of Size. Use this to answer questions until further notice. (It was made using default JMP confidence levels.)

c) Give a single-number estimate of the standard deviation of home price for any fixed home size under the simple linear regression model.

d) Give 95% confidence limits for the increase in mean price that is associated with a 100 sq ft increase in size for homes of this type under the SLR model. (Plug in, but there is no need to simplify.)

e) Give 95% prediction limits for the selling price of an additional 1500 sq ft home of this type.

f) The figure on the SLR report has n = 88 points plotted on it. Most of those are outside the inside set of limits drawn around the least squares line. Is this evidence of a problem with our analysis? Explain. Also, 6 of the 88 points on the figure on the SLR report are outside the wider limits on the figure. Is this about what you expect? Explain.

    Narrower Limits:

    Wider Limits:

There are 3 different JMP MLR reports following the SLR report. These are for progressively smaller/simpler models (involving progressively fewer predictor variables). Use these to answer the following questions.

g) Give the value of an F statistic and associated degrees of freedom for testing whether all 14 predictors may simultaneously be dropped from a MLR model for Price.

    F = __________    df = _____ , _____

h) Find the value of a partial F statistic and the associated degrees of freedom for testing whether the increase in R² seen going from the smallest of the 3 MLR models to the largest is “statistically significant.”

    F = __________    df = _____ , _____

i) Taking account of the 3 MLR reports, fill in the table below. Then say what the values indicate about which of the 3 models is initially most attractive.

    Model     k       R²      s       PRESS
    Large
    Medium
    Small

j) In those cases where it is safe/sensible to make interpretations of individual regression coefficients (the b_j's), what is that interpretation? (For example, how would one interpret b_Fireplace if that were sensible to do?)

k) Looking at the 3 MLR reports, one can see that the b's for Fireplace are around $13,000-$14,000 per fireplace. Notice that the 1st case in the data set is a fairly small home with 0 fireplaces, that sold for a modest price. As a practical matter, would you advise the owner of that home to install 3 fireplaces in it as a means of increasing the value of the home by about $40,000? (If not, why not, and how do you reconcile the values of b_Fireplace and the fact that these models do a decent job of predicting Price? You don’t have the whole data set to “look at,” but how likely do you think it is that there was a home in the data set comparable to a “fireplaces-added version of home #1”?)

l) If you thought it desirable to consider a MLR model with one less predictor than the smallest of the 3 MLR models summarized on the JMP reports, which predictor would you consider dropping from the smallest model? Explain the basis of your choice. (Give some quantitative rationale.)

The last few columns in the data table refer to predictions and other summaries made using the 2nd of the 3 MLR models.

m) If another home essentially matching the characteristics of home #1 is sold tomorrow, what would you use for 95% prediction limits for the sale price? (Plug in numbers, but you don’t need to simplify.)

n) Of the cases listed on the partial data table, which case has the “most unusual/extreme set of predictor values” (as measured by some appropriate summary statistic)? Explain. Which case is most influential in the fitting of the model, if one considers both the values of the predictors and the selling price for that case? Explain.

    Most unusual set of predictors:

    Most influential case:


Multivariate Correlations

                     Price     Size  Bed Rooms  Fireplace  Full Bath  Basement (Total)     Land
Price               1.0000   0.6649     0.2974     0.6346     0.4112            0.3597   0.4353
Size                0.6649   1.0000     0.4647     0.3945     0.4878            0.4028   0.1975
Bed Rooms           0.2974   0.4647     1.0000     0.1489     0.1024            0.1794  -0.0240
Fireplace           0.6346   0.3945     0.1489     1.0000     0.1872            0.2019   0.3592
Full Bath           0.4112   0.4878     0.1024     0.1872     1.0000            0.2749   0.0838
Basement (Total)    0.3597   0.4028     0.1794     0.2019     0.2749            1.0000  -0.0157
Land                0.4353   0.1975    -0.0240     0.3592     0.0838           -0.0157   1.0000

Scatterplot Matrix

[Scatterplot matrix of Price, Size, Bed Rooms, Fireplace, Full Bath, Basement (Total), and Land]


Bivariate Fit of Price By Size

[Plot: Price (vertical axis, 50000 to 250000) versus Size (horizontal axis, 500 to 2000), with the Linear Fit]

Linear Fit

Price = 15050.977 + 75.101965 Size

Summary of Fit

RSquare                       0.44212
RSquare Adj                   0.435633
Root Mean Square Error        28061.1
Mean of Response              123976.1
Observations (or Sum Wgts)    88

Analysis of Variance

Source      DF   Sum of Squares   Mean Square   F Ratio    Prob > F
Model        1       5.3667e+10     5.3667e10   68.1550      <.0001
Error       86       6.77186e10     787425229
C. Total    87       1.21386e11

Parameter Estimates

Term          Estimate   Std Error   t Ratio   Prob>|t|
Intercept    15050.977    13528.93      1.11     0.2690
Size         75.101965    9.097088      8.26     <.0001


Response Price
Whole Model

Actual by Predicted Plot

[Plot: Price Actual versus Price Predicted, both axes 50000 to 250000; P<.0001 RSq=0.75 RMSE=20322]

Summary of Fit

RSquare                       0.751644
RSquare Adj                   0.704014
Root Mean Square Error        20321.67
Mean of Response              123976.1
Observations (or Sum Wgts)    88

Analysis of Variance

Source      DF   Sum of Squares   Mean Square   F Ratio    Prob > F
Model       14       9.12387e10     6.51705e9   15.7809      <.0001
Error       73       3.01468e10     412970397
C. Total    87       1.21386e11

Parameter Estimates

Term                   Estimate   Std Error   t Ratio   Prob>|t|
Intercept              -20521.8     18329.7     -1.12     0.2666
Size                  22.551802    11.52026      1.96     0.0541
Land                   2.155705    0.803363      2.68     0.0090
Bed Rooms             1854.4436    3530.225      0.53     0.6010
Central Air           9419.1945    5581.791      1.69     0.0958
Fireplace             13301.722     3812.67      3.49     0.0008
Full Bath             16469.802    6304.273      2.61     0.0109
Half Bath             16345.405    5998.914      2.72     0.0080
Basement (Total)        36.4518    15.21141      2.40     0.0191
Finished Bsmt         -14.11978    10.93261     -1.29     0.2006
Bsmt Bath             20148.503    7285.129      2.77     0.0072
Garage                18008.237    10364.73      1.74     0.0865
Mutiple Car           658.69293    5394.207      0.12     0.9031
Style (2 Story)       8086.7239    5657.079      1.43     0.1571
Zone (Town Center)    -2506.852    5065.468     -0.49     0.6222

Residual by Predicted Plot

[Plot: Price Residual (-40000 to 60000) versus Price Predicted (50000 to 250000)]

Press
44944965456


Response Price
Whole Model

Actual by Predicted Plot

[Plot: Price Actual versus Price Predicted, both axes 50000 to 250000; P<.0001 RSq=0.75 RMSE=19985]

Summary of Fit

RSquare                       0.749935
RSquare Adj                   0.713741
Root Mean Square Error        19984.97
Mean of Response              123976.1
Observations (or Sum Wgts)    88

Analysis of Variance

Source      DF   Sum of Squares   Mean Square   F Ratio    Prob > F
Model       11       9.10312e10     8.27556e9   20.7200      <.0001
Error       76       3.03543e10     399399145
C. Total    87       1.21386e11

Parameter Estimates

Term                   Estimate   Std Error   t Ratio   Prob>|t|
Intercept             -17925.64    16187.89     -1.11     0.2716
Size                  24.398734    10.51736      2.32     0.0230
Land                   2.185702     0.75733      2.89     0.0051
Central Air           10016.114    5247.736      1.91     0.0601
Fireplace             13878.892    3547.244      3.91     0.0002
Full Bath             16417.647    5924.338      2.77     0.0070
Half Bath             15989.478    5768.836      2.77     0.0070
Basement (Total)       36.05159    14.88656      2.42     0.0178
Finished Bsmt         -14.41274    10.68338     -1.35     0.1813
Bsmt Bath             20252.121    7137.606      2.84     0.0058
Garage                16531.943    9773.037      1.69     0.0948
Style (2 Story)         8593.22    5516.095      1.56     0.1234

Residual by Predicted Plot

[Plot: Price Residual (-40000 to 60000) versus Price Predicted (50000 to 250000)]

Press
41681819873


Response Price
Whole Model

Actual by Predicted Plot

[Plot: Price Actual versus Price Predicted, both axes 50000 to 250000; P<.0001 RSq=0.73 RMSE=20421]

Summary of Fit

RSquare                       0.728601
RSquare Adj                   0.701118
Root Mean Square Error        20420.86
Mean of Response              123976.1
Observations (or Sum Wgts)    88

Analysis of Variance

Source      DF   Sum of Squares   Mean Square   F Ratio    Prob > F
Model        8       8.84416e10     1.1055e10   26.5105      <.0001
Error       79       3.29439e10     417011544
C. Total    87       1.21386e11

Parameter Estimates

Term                   Estimate   Std Error   t Ratio   Prob>|t|
Intercept             577.43584    13120.86      0.04     0.9650
Size                  34.079126    9.126617      3.73     0.0004
Land                  2.0549333    0.766345      2.68     0.0089
Central Air            9439.654    5314.799      1.78     0.0796
Fireplace             14221.474    3464.123      4.11     <.0001
Full Bath             13714.481    5789.375      2.37     0.0203
Half Bath             12875.709    5743.457      2.24     0.0278
Basement (Total)      21.830393    13.75504      1.59     0.1165
Bsmt Bath             17474.389    6029.468      2.90     0.0049

Residual by Predicted Plot

[Plot: Price Residual (-60000 to 60000) versus Price Predicted (50000 to 250000)]

Press
41804612766


328 Final 03 Data B

Rows    Price   Size  Garage  Mutiple Car  Bed Rooms  Central Air  Fireplace  Full Bath  Half Bath
   1    74900    906       1            0          2            1          0          1          0
   5   130000   1281       1            1          1            1          0          2          0
  10   123500   1306       1            1          3            1          0          1          1
  13    86500   1750       1            0          3            1          0          1          0
  18   124000   1456       1            1          4            1          0          2          0
  19   130000   1595       1            1          3            1          1          1          1
  25   141900   1762       1            1          3            1          0          1          1
  37   105000   1220       1            0          4            0          0          1          0
  41   119000   1344       1            0          3            1          0          1          0
  42   126500   1506       1            0          3            0          0          1          1
  44   111000   1609       1            0          3            0          0          1          0
  51   112000   1168       1            0          3            0          0          1          0
  52   144000   1302       1            0          2            1          2          1          1
  56   160000   1493       1            0          3            1          1          2          0
  57   117000   1157       1            0          3            1          1          2          0
  58   104500   1305       1            0          2            0          0          1          0
  62   126000   1356       1            1          3            1          2          2          0
  64    97000    865       1            0          2            1          1          1          0
  69   150000   1502       1            0          3            1          2          1          1
  70   104000   1224       1            0          3            1          0          1          0
  72   127000   1298       1            0          2            0          1          2          0
  77   164000   1567       1            1          2            1          2          1          1
  81   170000   1560       1            0          3            1          2          1          1
  86    86400   1123       1            0          3            1          0          1          0

Rows  Basement (Total)  Finished Bsmt  Bsmt Bath   Land  Style (2 Story)  Zone (Town Center)
   1               348            120          0   4882                1                   0
   5              1151           1096          1  10284                0                   0
  10               240              0          0  15660                0                   1
  13               636              0          0   9000                1                   1
  18               869              0          0   5820                0                   1
  19               745            340          0  12445                1                   0
  25               596              0          1   7200                1                   0
  37               756              0          0   6120                0                   1
  41               672              0          0  12816                1                   1
  42               780              0          0  12900                0                   1
  44               796              0          0   4347                0                   1
  51               808            478          1   7520                0                   0
  52               680            258          0   6960                0                   0
  56               983            454          0  12600                0                   0
  57               767              0          0   9400                0                   0
  58               920              0          0   9350                0                   0
  62               808            381          0   8400                0                   0
  64               660            422          1  10042                0                   0
  69               602              0          0  10593                1                   0
  70               936              0          0   4710                0                   0
  72               651            113          0   6821                0                   0
  77               840              0          0  13680                0                   0
  81               783            293          0  10150                0                   0
  86               732              0          0   6060                0                   1

Rows  Predicted Price  StdErr Pred Price  StdErr Indiv Price     h Price  Cook's D Influence Price
   1        77225.562          7945.5874          21506.5456  0.15806834                0.00025163
   5        141141.39         10604.8393          22624.3621  0.28157951                0.01412974
  10       115774.766         9076.07007          21949.3552  0.20624743                0.00407618
  13         118931.2         6800.72314          21110.3998  0.11579853                0.03250411
  18       121031.888         5714.70427          20785.9806  0.08176744                0.00017826
  19       151576.802         6307.37083          20956.6713  0.09960694                0.01193469
  25       150089.257         9190.89994          21997.0859  0.21149931                0.00475998
  37       85421.9067         4768.74425          20546.0475  0.05693783                0.00512003
  41       118663.811         7272.52215          21267.0808  0.13242286                0.00000415
  42        124073.72         7540.98589          21360.3748  0.14238004                0.00023777
  44        92479.828         6818.48493          21116.1285   0.1164042                0.01066997
  51        102450.67          7103.3598          21209.8295  0.12633407                0.00314911
  52       136563.561         7160.20639          21228.9355  0.12836421                0.00194947
  56        148199.09         6273.95219          20946.6375  0.09855423                0.00352402
  57       131763.109         6674.80528          21070.1725  0.11155013                0.00642646
  58       100468.077         5725.51852          20788.9564   0.0820772                 0.0003304
  62       144298.508         6975.81994          21167.4564  0.12183818                0.01103767
  64       119936.678         7361.55511          21297.6909  0.13568505                0.01993704
  69       158883.646         6858.90346          21129.2144  0.11778833                0.00249202
  70       98943.0621         5845.61655          20822.3529   0.0855566                0.00054592
  72       113739.667         7294.78158          21274.7029  0.13323473                0.00650631
  77       167203.884         6811.51347          21113.8784  0.11616629                 0.0003185
  81       153039.691         6115.81276          20899.8161  0.09364859                0.00684206
  86       92074.9633         4617.64729          20511.5044  0.05338686                0.00040034
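
Question h) asks for a partial F statistic comparing the smallest and largest of the 3 MLR models. As a minimal sketch of how that arithmetic could be set up, the Python below uses only the predictor counts, error sums of squares, and RSquare values printed on those two reports. The function and dictionary names are illustrative assumptions, not anything produced by JMP, and no critical value or p-value is computed here.

```python
# Values copied from the printed JMP reports (n = 88 homes).
N_HOMES = 88

# Number of predictors k, error sum of squares, and RSquare for two MLR fits.
LARGE = {"k": 14, "sse": 3.01468e10, "r2": 0.751644}
SMALL = {"k": 8, "sse": 3.29439e10, "r2": 0.728601}


def partial_f_from_sse(full, reduced, n):
    """Partial F statistic and its degrees of freedom, from error sums of squares."""
    df_num = full["k"] - reduced["k"]   # number of predictors being dropped
    df_den = n - full["k"] - 1          # error df of the full model
    f_stat = ((reduced["sse"] - full["sse"]) / df_num) / (full["sse"] / df_den)
    return f_stat, df_num, df_den


def partial_f_from_r2(full, reduced, n):
    """The same statistic written in terms of the two RSquare values."""
    df_num = full["k"] - reduced["k"]
    df_den = n - full["k"] - 1
    f_stat = ((full["r2"] - reduced["r2"]) / df_num) / ((1.0 - full["r2"]) / df_den)
    return f_stat, df_num, df_den


f_sse, df1, df2 = partial_f_from_sse(LARGE, SMALL, N_HOMES)
f_r2, _, _ = partial_f_from_r2(LARGE, SMALL, N_HOMES)
print(f"partial F (from SSE) = {f_sse:.3f} on ({df1}, {df2}) df")
print(f"partial F (from R^2) = {f_r2:.3f}")
```

The two forms agree up to rounding in the printed report values. The numerator degrees of freedom are the number of predictors dropped, and the denominator degrees of freedom are the error degrees of freedom of the larger model.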