Stat 301 – Exam 2 Name: ________________________ November 5, 2013

INSTRUCTIONS: Read the questions carefully and completely. Answer each question
and show work in the space provided. Partial credit will not be given if work is not
shown. Use the JMP output. It is not necessary to calculate something by hand that JMP
has already calculated for you. When asked to explain, describe, or comment, do so
within the context of the problem and support statements with statistical summaries. Be
sure to include units of measurement when discussing quantitative variables.
For the first project in Stat 301 you looked at predicting the sale price for a random
sample of 50 homes selected from the 2930 homes that were sold in Ames between 2006
and 2010. In this exam we will examine further the price of homes based on a random
sample of 50 homes.
1. [15 pts] Below are summaries of the sale price of homes for the random sample of 50.
Sale Price ($1000)
Mean              170.450
Std Dev            73.567
Std Err Mean       10.404
Upper 95% Mean    191.357
Lower 95% Mean    149.542
N                      50
Median            151.000
Range             437.567
IQR                66.375
a) [5] Describe the distribution of sale price for the sample of 50 homes. Use
information from the histogram, box plot, and summary statistics.
b) [5] Could the mean sale price of all 2930 homes sold in Ames between 2006 and
2010 be $160,000? Support your answer statistically.
c) [5] Is the condition that the errors are normally distributed satisfied for these data?
Explain briefly.
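A minimal sketch of one way to approach part b), assuming the summary values reported above: a hypothesized mean is plausible at the 5% level when it falls inside the 95% confidence interval for the mean, and the same comparison can be expressed as a t statistic. The variable names are illustrative and are not part of the JMP output.

```python
# Summary statistics copied from the JMP output (sale price in $1000).
mean = 170.450
se_mean = 10.404
lower_95, upper_95 = 149.542, 191.357

# Hypothesized population mean: $160,000, expressed in $1000.
hypothesized = 160.0

# Check whether the hypothesized value lies inside the 95% interval.
ci_contains = lower_95 <= hypothesized <= upper_95

# Equivalent one-sample t statistic (48 or 49 df table lookups are left to the reader).
t_stat = (mean - hypothesized) / se_mean

print("160 inside 95% CI:", ci_contains)
print("t statistic:", round(t_stat, 3))
```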
2. [17 pts] One variable that can be used to predict sale price is the total amount of
living area in square feet. Below is JMP output for the simple linear regression of
sale price on living area.
Analysis of Variance
Source      DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model        1         94650.56       94650.6   26.6403    <.0001*
Error       48        170539.79        3552.9
C. Total    49        265190.35

Parameter Estimates
Term                  Estimate   Std Error   t Ratio   Prob>|t|
Intercept            19.525494     30.4316      0.64     0.5242
Living Area (sqft)   0.1050199    0.020347      5.16    <.0001*
a) [3] Give the prediction equation for predicting sale price from living area.
b) [3] What is the predicted sale price of a home that has 2000 square feet of living
area?
c) [4] What is the average price per square foot of living area?
d) [3] How much of the variation in sale price can be explained by the linear
relationship with living area?
e) [4] What does the plot of residuals versus living area indicate about the
predictions using the simple linear regression of sale price on living area?
Support your answer by referring to the plot of residuals.
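The calculations in parts a), b), and d) can be sketched as follows, assuming the estimates and sums of squares in the JMP output above. The function name `predict_price` is hypothetical, chosen only for illustration.

```python
# Parameter estimates from the simple linear regression (sale price in $1000).
intercept = 19.525494
slope_living_area = 0.1050199  # $1000 per square foot of living area


def predict_price(living_area_sqft):
    """Predicted sale price ($1000) from living area (sqft)."""
    return intercept + slope_living_area * living_area_sqft


# Part b): prediction for a home with 2000 square feet of living area.
pred_2000 = predict_price(2000)

# Part d): R-squared is the Model sum of squares over the C. Total sum of squares.
ss_model = 94650.56
ss_total = 265190.35
r_squared = ss_model / ss_total

print("Predicted price at 2000 sqft ($1000):", round(pred_2000, 3))
print("R-squared:", round(r_squared, 4))
```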
3. [21] Another variable that may be helpful in predicting the sale price of a home is the
age of the home. Below is JMP output for the multiple linear regression of sale price
on living area and age.
Parameter Estimates
Term                   Estimate   Std Error   t Ratio   Prob>|t|
Intercept             84.913905    30.54042      2.78    0.0078*
Living Area (sqft)    0.0883434    0.017995      4.91    <.0001*
Age                   –1.063208    0.253925     –4.19    0.0001*

Effect Tests
Source               DF   Sum of Squares   F Ratio   Prob > F
Living Area (sqft)    1         63696.21   24.1025    <.0001*
Age                   1         46331.73   17.5318    0.0001*
a) [3] Give the prediction equation for predicting sale price from living area and age.
b) [4] Give an interpretation of the slope estimate for living area within the context
of the problem.
c) [4] Give an interpretation of the slope estimate for age within the context of the
problem.
d) [5] Does age add significantly to the model that already contains living area?
Support your answer statistically.
e) [5] Fill in the analysis of variance table for the simple linear regression of sale
price on Age.
Source      DF   Sum of Squares   Mean Square   F Ratio
Model
Error
C. Total
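One possible route to parts d) and e), sketched under the assumption that the Effect Tests sums of squares are the usual "added last" sums of squares: the error sum of squares for the age-only model can be recovered by bookkeeping with the sums of squares already reported. Variable names are illustrative.

```python
# Values copied from the JMP output above (sale price in $1000).
n = 50
ss_total = 265190.35             # C. Total SS (living-area-only ANOVA table)
sse_living_only = 170539.79      # Error SS, model with living area only
ss_age_given_living = 46331.73   # Effect Tests SS for Age
ss_living_given_age = 63696.21   # Effect Tests SS for Living Area

# Part d): the effect-test F for age is the square of its t ratio.
t_age = -1.063208 / 0.253925
f_age = t_age ** 2

# Adding age to the living-area model lowers the error SS by the Age effect SS.
sse_full = sse_living_only - ss_age_given_living

# Dropping living area from the full model adds its effect SS back to error.
sse_age_only = sse_full + ss_living_given_age
ssm_age_only = ss_total - sse_age_only

# Part e): ANOVA entries for the simple linear regression on age.
df_model, df_error = 1, n - 2
ms_model = ssm_age_only / df_model
ms_error = sse_age_only / df_error
f_ratio = ms_model / ms_error

print("F for age given living area:", round(f_age, 4))
print("Model:", df_model, round(ssm_age_only, 2), round(ms_model, 2), round(f_ratio, 4))
print("Error:", df_error, round(sse_age_only, 2), round(ms_error, 2))
print("C. Total:", n - 1, ss_total)
```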
4. [23] Below is the JMP output for a model that has living area, age, and living
area*age as explanatory variables. The model is fit using Fit Model with the Center
Polynomials option turned off.
Summary of Fit
RSquare                       0.631945
RSquare Adj                   0.607942
Root Mean Square Error        46.06342
Mean of Response              170.4496
Observations (or Sum Wgts)          50

Analysis of Variance
Source      DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model        3        167585.78       55861.9   26.3271    <.0001*
Error       46         97604.56        2121.8
C. Total    49        265190.34

Parameter Estimates
Term                       Estimate   Std Error   t Ratio   Prob>|t|
Intercept                 –70.02298    52.08886     –1.38     0.1734
Living Area (sqft)          0.18445     0.03157      5.84    <.0001*
Age                         2.66619     1.07753      2.47    0.0171*
Living Area (sqft)*Age    –0.002373     0.00067     –3.54    0.0009*
a) [4] What is the predicted sale price of a home that has 1600 square feet of
living area and is 10 years old?
b) [5] Is there a statistically significant interaction between living area and age?
Support your answer statistically.
c) [4] What does your result in b) indicate about the average price per square foot of
living area?
d) [4] Describe the plot of residuals versus Age. What does this indicate about what
can be done to improve the prediction of sale price?
e) [6] Suppose the model with living area, age, and living area*age is fit using JMP
with the Center Polynomials option turned on. For each of the following, indicate
whether or not the numerical value would change.
• RSquare?
• RMSE?
• Model Utility F-Ratio?
• Estimate of the slope coefficient for Living Area:
• Estimate of the slope coefficient for Age:
• Estimate of the slope coefficient for Living Area*Age:
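For parts a) and c), a minimal sketch assuming the uncentered estimates in the Parameter Estimates table above: with an interaction term in the model, the slope for living area is no longer a single number but depends on age. The function names `predict_price` and `area_slope` are hypothetical.

```python
# Uncentered parameter estimates from the interaction model (sale price in $1000).
b0 = -70.02298
b_area = 0.18445
b_age = 2.66619
b_inter = -0.002373


def predict_price(area_sqft, age_years):
    """Predicted sale price ($1000) from living area (sqft) and age (years)."""
    return (b0 + b_area * area_sqft + b_age * age_years
            + b_inter * area_sqft * age_years)


def area_slope(age_years):
    """Change in predicted price ($1000) per extra square foot at a given age."""
    return b_area + b_inter * age_years


# Part a): a 1600 sqft home that is 10 years old.
pred = predict_price(1600, 10)

print("Predicted price ($1000):", round(pred, 3))
for age in (0, 10, 50):
    print("Age", age, "-> slope for living area:", round(area_slope(age), 5))
```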
5. [24 pts] A home with a large basement area may be desirable. Consider the dummy
variable X that takes on the value 1 if a house has a basement area of 2000 or more
square feet and 0 if a house has a basement area of less than 2000 square feet. Below
is JMP output for predicting sale price using living area, age, age*age, lot area, and
X. The Center Polynomials option is turned off for the analysis.
Summary of Fit
RSquare                        0.82884
RSquare Adj                    0.80939
Root Mean Square Error        32.11839
Mean of Response              170.4496
Observations (or Sum Wgts)          50

Analysis of Variance
Source      DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model        5        219800.34       43960.1   42.6139    <.0001*
Error       44         45390.00        1031.6
C. Total    49        265190.34

Parameter Estimates
Term                   Estimate   Std Error   t Ratio   Prob>|t|
Intercept              125.8645    24.39866      5.16    <.0001*
Living Area (sqft)    0.0400267    0.012878      3.11    0.0033*
Age                   –1.739046    0.544707     –5.03    <.0001*
Age*Age               0.0194635    0.005597      3.48    0.0012*
Lot Area (sqft)       0.0046082    0.001153      4.00    0.0002*
X                     203.42296    35.13514      5.79    <.0001*

Effect Tests
Source               DF   Sum of Squares   F Ratio   Prob > F
Living Area (sqft)    1         9965.484    9.6603    0.0033*
Age                   1        47025.759   45.5857    <.0001*
Age*Age               1        12472.815   12.0909    0.0012*
Lot Area (sqft)       1        16484.874   15.9800    0.0002*
X                     1        34579.922   33.5210    <.0001*
Consider a house with 2000 square feet of living area that is 20 years old with less than
2000 square feet of basement area and is on a lot that is 20,000 square feet.
a) [3] For a similar home how much, on average, would having a basement with
2000 or more square feet change the sale price of the home?
b) [3] For a similar home how much, on average, would the sale price change if the
lot size is changed to 10,000 square feet?
c) [4] If lot area is removed from this model what would the RSquare for the
reduced model be?
d) [5] Would the change in RSquare be statistically significant if lot area is removed
from this model? Explain statistically.
e) [4] What do the plots of residuals versus explanatory variables indicate about the
equal standard deviation condition? Be sure to support your answer by referring
to the plots.
f) [5] Describe the distribution of residuals. Be sure to comment on all three of the
graphs. What does the distribution of residuals indicate about the condition of
normally distributed errors?
[Figure: three graphs of the distribution of Residual Sale Price ($1000)]
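The arithmetic behind parts a) through d) can be sketched as follows, assuming the estimates and sums of squares in the JMP output for this question and holding the other explanatory variables fixed. Variable names are illustrative only.

```python
# Estimates and sums of squares copied from the JMP output (sale price in $1000).
b_lot = 0.0046082        # $1000 per square foot of lot area
b_x = 203.42296          # shift for a basement of 2000 or more square feet
ss_model = 219800.34
ss_error = 45390.00
ss_total = 265190.34
ss_lot = 16484.874       # Effect Tests SS for Lot Area
df_error = 44

# Part a): changing X from 0 to 1, with everything else held fixed.
change_basement = b_x * (1 - 0)

# Part b): changing lot area from 20,000 to 10,000 square feet.
change_lot = b_lot * (10_000 - 20_000)

# Part c): removing lot area moves its effect SS from Model to Error.
r2_full = ss_model / ss_total
r2_reduced = (ss_model - ss_lot) / ss_total

# Part d): the partial F for lot area compares its effect SS to the full-model MSE.
mse_full = ss_error / df_error
f_lot = ss_lot / mse_full

print("Change for large basement ($1000):", round(change_basement, 3))
print("Change for smaller lot ($1000):", round(change_lot, 3))
print("R-squared full / reduced:", round(r2_full, 5), round(r2_reduced, 5))
print("Partial F for lot area:", round(f_lot, 4))
```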