Stat 301 – Exam 2
November 5, 2013
Name: ________________________

INSTRUCTIONS: Read the questions carefully and completely. Answer each question and show work in the space provided. Partial credit will not be given if work is not shown. Use the JMP output. It is not necessary to calculate something by hand that JMP has already calculated for you. When asked to explain, describe, or comment, do so within the context of the problem and support statements with statistical summaries. Be sure to include units of measurement when discussing quantitative variables.

For the first project in Stat 301 you looked at predicting the sale price for a random sample of 50 homes selected from the 2930 homes that were sold in Ames between 2006 and 2010. In this exam we will further examine the price of homes based on a random sample of 50 homes.

1. [15 pts] Below are summaries of the sale price of homes for the random sample of 50.

[Figure: histogram (vertical axis: Count) and box plot of Sale Price ($1000)]

Summary statistics
  Mean              170.450
  Std Dev            73.567
  Std Err Mean       10.404
  Upper 95% Mean    191.357
  Lower 95% Mean    149.542
  N                  50
  Median            151.000
  Range             437.567
  IQR                66.375

a) [5] Describe the distribution of sale price for the sample of 50 homes. Use information from the histogram, box plot, and summary statistics.

b) [5] Could the mean sale price of all 2930 homes sold in Ames between 2006 and 2010 be $160,000? Support your answer statistically.

c) [5] Is the condition that the errors are normally distributed satisfied for these data? Explain briefly.

2. [17 pts] One variable that can be used to predict sale price is the total amount of living area in square feet. Below is JMP output for the simple linear regression of sale price on living area.

Analysis of Variance
  Source      DF   Sum of Squares   Mean Square   F Ratio
  Model        1         94650.56       94650.6   26.6403
  Error       48        170539.79        3552.9   Prob > F
  C. Total    49        265190.35                 <.0001*

Parameter Estimates
  Term                  Estimate     Std Error   t Ratio   Prob>|t|
  Intercept             19.525494    30.4316     0.64      0.5242
  Living Area (sqft)    0.1050199    0.020347    5.16      <.0001*

a) [3] Give the prediction equation for predicting sale price from living area.

b) [3] What is the predicted sale price of a home that has 2000 square feet of living area?

c) [4] What is the average price per square foot of living area?

d) [3] How much of the variation in sale price can be explained by the linear relationship with living area?

[Figure: residuals of Sale Price ($1000) versus Living Area (sqft)]

e) [4] What does the plot of residuals versus living area indicate about the predictions using the simple linear regression of sale price on living area? Support your answer by referring to the plot of residuals.

3. [21 pts] Another variable that may be helpful in predicting the sale price of a home is the age of the home. Below is JMP output for the multiple linear regression of sale price on living area and age.

Parameter Estimates
  Term                  Estimate     Std Error   t Ratio   Prob>|t|
  Intercept             84.913905    30.54042    2.78      0.0078*
  Living Area (sqft)    0.0883434    0.017995    4.91      <.0001*
  Age                   –1.063208    0.253925    –4.19     0.0001*

Effect Tests
  Source                DF   Sum of Squares   F Ratio   Prob > F
  Living Area (sqft)     1         63696.21   24.1025   <.0001*
  Age                    1         46331.73   17.5318   0.0001*

a) [3] Give the prediction equation for predicting sale price from living area and age.

b) [4] Give an interpretation of the slope estimate for living area within the context of the problem.

c) [4] Give an interpretation of the slope estimate for age within the context of the problem.

d) [5] Does age add significantly to the model that already contains living area? Support your answer statistically.

e) [5] Fill in the analysis of variance table for the simple linear regression of sale price on Age.
  Source      DF   Sum of Squares   Mean Square   F Ratio
  Model
  Error
  C. Total

4. [23 pts] Below is the JMP output for a model that has living area, age, and living area*age as explanatory variables, fit using Fit Model with the Center Polynomials option.

Summary of Fit
  RSquare                       0.631945
  RSquare Adj                   0.607942
  Root Mean Square Error        46.06342
  Mean of Response              170.4496
  Observations (or Sum Wgts)    50

Analysis of Variance
  Source      DF   Sum of Squares   Mean Square   F Ratio
  Model        3        167585.78       55861.9   26.3271
  Error       46         97604.56        2121.8   Prob > F
  C. Total    49        265190.34                 <.0001*

Parameter Estimates
  Term                                           Estimate     Std Error   t Ratio   Prob>|t|
  Intercept                                      60.843078    28.19732    2.16      0.0362*
  Living Area (sqft)                             0.0919971    0.016157    5.69      <.0001*
  Age                                            –0.744125    0.244723    –3.04     0.0039*
  (Living Area (sqft) – 1437.1)*(Age – 38.96)    –0.002373    0.00067     –3.54     0.0009*

a) [4] What is the predicted sale price of a home that has the average living area and average age for this sample of 50 homes?

b) [5] Is there a statistically significant interaction between living area and age? Support your answer statistically.

c) [4] What does your result in b) indicate about the average price per square foot of living area?

[Figure: residuals of Sale Price ($1000) versus Age]

d) [4] Describe the plot of residuals versus Age. What does this indicate about what can be done to improve the prediction of sale price?

e) [6] If the model with living area, age and living area*age is fit using JMP with the Center Polynomials option turned off, what would be the numerical value for …
  • RSquare?
  • RMSE?
  • Model Utility F-Ratio?
  • Estimate of the slope coefficient for Living Area:
  • Estimate of the slope coefficient for Age:
  • Estimate of the slope coefficient for Living Area*Age:

5. [24 pts] A home with a large basement area may be desirable.
Consider the dummy variable X that takes on the value 1 if a house has a basement area of 2000 or more square feet and 0 if a house has a basement area of less than 2000 square feet. Below is JMP output for predicting sale price using living area, age, age*age, lot area, and X. The Center Polynomials option is used.

Consider a house with 2000 square feet of living area that is 20 years old, has less than 2000 square feet of basement area, and is on a lot that is 20,000 square feet.

Summary of Fit
  RSquare                       0.82884
  RSquare Adj                   0.80939
  Root Mean Square Error        32.11839
  Mean of Response              170.4496
  Observations (or Sum Wgts)    50

Analysis of Variance
  Source      DF   Sum of Squares   Mean Square   F Ratio
  Model        5        219800.34       43960.1   42.6139
  Error       44         45390.00        1031.6   Prob > F
  C. Total    49        265190.34                 <.0001*

Parameter Estimates
  Term                       Estimate     Std Error   t Ratio   Prob>|t|
  Intercept                  96.321197    21.01299    4.58      <.0001*
  Living Area (sqft)         0.0400267    0.012878    3.11      0.0033*
  Age                        –1.222449    0.181058    –6.75     <.0001*
  (Age-38.96)*(Age-38.96)    0.0194635    0.005597    3.48      0.0012*
  Lot Area (sqft)            0.0046082    0.001153    4.00      0.0002*
  X                          203.42296    35.13514    5.79      <.0001*

Effect Tests
  Source                DF   Sum of Squares   F Ratio   Prob > F
  Living Area (sqft)     1         9965.484    9.6603   0.0033*
  Age                    1        47025.759   45.5857   <.0001*
  Age*Age                1        12472.815   12.0909   0.0012*
  Lot Area (sqft)        1        16484.874   15.9800   0.0002*
  X                      1        34579.922   33.5210   <.0001*

a) [3] For a similar home, how much, on average, would having a basement with 2000 or more square feet change the sale price of the home?

b) [3] For a similar home, how much, on average, would the sale price change if the lot size is changed to 10,000 square feet?

c) [4] If lot area is removed from this model, what would the RSquare for the reduced model be?

d) [5] Would the change in RSquare be statistically significant if lot area is removed from this model? Explain statistically.
e) [4] What do the plots of residuals versus explanatory variables indicate about the equal standard deviation condition? Be sure to support your answer by referring to the plots.

[Figure: Residual Sale Price ($1000) versus Living Area (sqft)]
[Figure: Residual Sale Price ($1000) versus Age]

f) [5] Describe the distribution of residuals. Be sure to comment on all three of the graphs. What does the distribution of residuals indicate about the condition of normally distributed errors?

[Figure: Normal Quantile Plot and histogram (vertical axis: Count) of Residual Sale Price ($1000)]
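
The instructions note that JMP has already calculated most quantities. For readers who want to see how a few of the printed numbers relate to one another, the following is a minimal sketch in Python. It is not part of the exam or of JMP. It recomputes the standard error and 95% confidence limits from the Question 1 summary, and the RSquare and F Ratio from the Question 2 Analysis of Variance table. All variable names are hypothetical, and the t quantile for 49 degrees of freedom is hard-coded as an assumption rather than looked up.

```python
import math

# Values copied from the Question 1 summary (Sale Price in $1000).
mean, std_dev, n = 170.450, 73.567, 50

# Assumption: 2.0096 approximates the 0.975 quantile of a t distribution
# with n - 1 = 49 degrees of freedom.
T_CRIT_49 = 2.0096

# Standard error of the mean and the 95% confidence limits for the mean.
std_err = std_dev / math.sqrt(n)
lower = mean - T_CRIT_49 * std_err
upper = mean + T_CRIT_49 * std_err

# Values copied from the Question 2 Analysis of Variance table.
ss_model, ss_error, ss_total = 94650.56, 170539.79, 265190.35
df_model, df_error = 1, 48

# RSquare is the share of total variation attributed to the model;
# the F Ratio is the model mean square over the error mean square.
r_square = ss_model / ss_total
f_ratio = (ss_model / df_model) / (ss_error / df_error)

print(f"Std Err Mean:   {std_err:.3f}")
print(f"Lower 95% Mean: {lower:.3f}")
print(f"Upper 95% Mean: {upper:.3f}")
print(f"RSquare:        {r_square:.4f}")
print(f"F Ratio:        {f_ratio:.4f}")
```

The same pattern (sum of squares divided by degrees of freedom, then a ratio of mean squares) applies to the Analysis of Variance tables in Questions 4 and 5, with the values swapped in from those tables.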