Assignment
Problem: For the homeprice dataset,
1. What does a half bathroom do for the sale price?
2. How do the coefficients change if you force the intercept, b0, to be 0? (Use a - 1 in the model formula notation.) Does it make any sense for this model to have no intercept term?
3. What is the effect of neighbourhood on the difference between sale price and list price?
4. Do nicer neighbourhoods mean it is more likely to have a house go over the asking price?
5. Is there a relationship between houses which sell for more than predicted (a positive residual) and houses which sell for more than asking? (If so, then perhaps the real estate agents aren't pricing the home correctly.)
Ans:
Description
The homeprice data frame has 29 rows and 7 columns, representing sale prices of homes in New Jersey in the year 2001. This dataset is a random sample of the homes sold in Maplewood, NJ during the year 2001. Of course the prices will seem either incredibly high or fantastically cheap depending on where you live, and on whether you have recently purchased a home.
This data frame contains the following columns:
list: list price of home (in thousands)
sale: actual sale price
full: number of full bathrooms
half: number of half bathrooms
bedrooms: number of bedrooms
rooms: total number of rooms
neighborhood: subjective assessment of neighbourhood on a scale of 1-5
1. Half bathroom and the Sale price
> plot(sale~half, data=homeprice)
> hlm <- lm(sale~half , data=homeprice)
> abline(hlm)
> summary(hlm)
Call:
lm(formula = sale ~ half, data = homeprice)
Residuals:
    Min      1Q  Median      3Q     Max
-180.27  -75.27  -22.34   72.66  246.58
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   228.27      28.78   7.932 1.59e-08 ***
half           69.08      31.00   2.229   0.0344 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 109.8 on 27 degrees of freedom
Multiple R-squared: 0.1554, Adjusted R-squared: 0.1241
F-statistic: 4.966 on 1 and 27 DF, p-value: 0.03436
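The fitted line slopes upward: each additional half bathroom is associated with an increase of about 69.08 in sale price (roughly $69,000, assuming sale is recorded in thousands like list). The slope is significant at the 5% level (p = 0.034), but the number of half bathrooms alone explains only about 16% of the variation in sale price (R-squared = 0.1554).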
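To make the size of the effect concrete, the reported coefficients can be turned into predicted prices and an approximate 95% confidence interval for the slope. This is only a sketch: the numbers are copied by hand from the summary output above rather than read from the fitted object, and the variable names are illustrative.

```r
# Values copied from the summary(hlm) output above (not recomputed from data)
b0       <- 228.27   # intercept
b1       <- 69.08    # slope for half
se_b1    <- 31.00    # standard error of the slope
df_resid <- 27       # residual degrees of freedom

# Predicted sale price for 0, 1 and 2 half bathrooms
half <- 0:2
pred <- b0 + b1 * half
names(pred) <- paste(half, "half baths")
print(pred)

# Approximate 95% confidence interval for the slope: estimate +/- t * SE
t_crit <- qt(0.975, df_resid)
ci <- b1 + c(-1, 1) * t_crit * se_b1
print(ci)
```

With the fitted model available, confint(hlm) gives the same kind of interval directly.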
2. Forcing the intercept b0 to be 0
> hlm <- lm(sale~half-1 , data=homeprice)
> abline(hlm, col="red") # red line: fit forced through the origin
> summary(hlm)
Call:
lm(formula = sale ~ half - 1, data = homeprice)
Residuals:
    Min      1Q  Median      3Q     Max
-222.62    6.44  117.70  215.00  450.00
Coefficients:
     Estimate Std. Error t value Pr(>|t|)
half   242.56      39.36   6.163 1.18e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 196.8 on 28 degrees of freedom
Multiple R-squared: 0.5757, Adjusted R-squared: 0.5605
F-statistic: 37.98 on 1 and 28 DF, p-value: 1.181e-06
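Forcing the intercept to 0 changes the fit substantially. With an intercept, the estimates are 228.27 (intercept) and 69.08 (slope for half); without it, the intercept is fixed at 0 and the slope for half rises to 242.56, because the slope alone must now account for the overall price level.
Dropping the intercept does not make sense for this model: it implies that a house with no half bathrooms sells for 0, and the residual standard error grows from 109.8 to 196.8. In the plot, the line forced through the origin (red) lies well away from the original regression line (black), because it ignores the observed prices of houses with zero half bathrooms.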
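One caution when comparing the two summaries: for a model without an intercept, R's summary() computes R-squared against the uncentered sum of squares of the response rather than the variation around the mean, so 0.5757 is not comparable with 0.1554. The sketch below shows this on a small made-up data frame (toy numbers chosen for illustration, not the homeprice data).

```r
# Toy data, invented for illustration only (NOT the homeprice data)
toy <- data.frame(half = c(0, 0, 1, 1, 2),
                  sale = c(200, 240, 300, 280, 380))

# Regression through the origin
fit0 <- lm(sale ~ half - 1, data = toy)
rss  <- sum(resid(fit0)^2)

# R-squared as reported by summary() for the no-intercept fit
r2_reported <- summary(fit0)$r.squared

# Same residuals, two different baselines
r2_uncentered <- 1 - rss / sum(toy$sale^2)                     # baseline: zero
r2_centered   <- 1 - rss / sum((toy$sale - mean(toy$sale))^2)  # baseline: mean

print(c(reported = r2_reported,
        uncentered = r2_uncentered,
        centered = r2_centered))
```

The reported value matches the uncentered version, which is why a no-intercept fit can show a high R-squared while fitting worse than the model with an intercept.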
3. Effect of neighbourhood on the difference between sale price and list price
> library(lattice) # provides xyplot
> xyplot(sale ~ list | neighborhood,panel=panel.lm,data=homeprice)
> nbd <- as.numeric(cut(homeprice$neighborhood,c(0,2,3,5),labels=c(1,2,3)))
> table(nbd) # check that we partitioned well
nbd
 1  2  3
10 12  7
> xyplot(sale ~ list | nbd, panel=panel.lm,layout=c(3,1),data=homeprice)
Within each neighbourhood group, the sale price is approximately linear in the list price.
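The grouping step relies on how cut() assigns values to intervals: with breaks c(0,2,3,5), the intervals are (0,2], (2,3] and (3,5]. A minimal sketch of the mapping for the five possible ratings (the variable names here are illustrative):

```r
# The five possible neighbourhood ratings
ratings <- 1:5

# Same breaks and labels as in the transcript above;
# as.numeric() on the resulting factor returns the group codes 1, 2, 3
groups <- as.numeric(cut(ratings, breaks = c(0, 2, 3, 5), labels = c(1, 2, 3)))

print(data.frame(rating = ratings, group = groups))
```

So ratings 1-2 fall in group 1, rating 3 in group 2, and ratings 4-5 in group 3.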
> a <- with(homeprice, list - sale) # list price minus sale price
> plot(a~neighborhood, data=homeprice)
> hlm <- lm(a~neighborhood , data=homeprice)
> abline(hlm)
> summary(hlm)
Call:
lm(formula = a ~ neighborhood, data = homeprice)
Residuals:
   Min     1Q Median     3Q    Max
-33.05  -5.80   0.85   7.50  30.05
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    -7.800      7.435  -1.049    0.303
neighborhood    3.150      2.428   1.298    0.205
Residual standard error: 13 on 27 degrees of freedom
Multiple R-squared: 0.0587, Adjusted R-squared: 0.02383
F-statistic: 1.684 on 1 and 27 DF, p-value: 0.2054
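The fitted slope is positive (3.150) for a = list - sale, so the fitted gap by which the list price exceeds the sale price increases with the neighbourhood rating. The effect is weak, however: the slope is not statistically significant (p = 0.205) and the model explains only about 6% of the variance (R-squared = 0.0587).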
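Using the coefficients reported in the summary above, the fitted value of a = list - sale at each neighbourhood rating can be worked out directly. This is a sketch with the coefficients copied by hand; the variable names are illustrative.

```r
# Coefficients copied from the summary output above
b0 <- -7.800   # intercept
b1 <-  3.150   # slope for neighborhood

# Fitted list-minus-sale gap for ratings 1 to 5
rating     <- 1:5
fitted_gap <- b0 + b1 * rating

print(data.frame(rating, fitted_gap))
```

A negative fitted gap means the fitted sale price is above the list price; a positive one means it is below.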
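4. Not on this evidence. Because a = list - sale, a house sells over the asking price when a is negative. The fitted a is negative only for the lowest neighbourhood ratings and increases with the rating, so nicer neighbourhoods are, if anything, less likely to see sales over asking. The trend is not statistically significant (p = 0.205), so no firm conclusion can be drawn.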
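Question 4 can also be checked directly by computing, for each neighbourhood rating, the share of houses whose sale price exceeded the list price. The following is a minimal sketch, assuming a data frame with columns list, sale and neighborhood as described above; the helper name over_asking_rate and the toy data frame are hypothetical and used only for illustration.

```r
# Hypothetical helper: proportion of houses sold above list price,
# computed separately for each neighbourhood rating
over_asking_rate <- function(df) {
  with(df, tapply(sale > list, neighborhood, mean))
}

# Toy data, invented for illustration only (NOT the homeprice data)
toy <- data.frame(
  list         = c(100, 120, 200, 210, 300, 320),
  sale         = c(105, 125, 205, 200, 295, 310),
  neighborhood = c(1, 1, 3, 3, 5, 5)
)

rates <- over_asking_rate(toy)
print(rates)
```

Applied to the real data, the call would be over_asking_rate(homeprice); with only 29 houses, the per-rating proportions rest on very few observations each.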