Uploaded by satish.samayams

Multiple Linear Regression in R


Assignment

Problem: For the homeprice dataset,

1. What does a half bathroom do for the sale price?

2. How do the coefficients change if you force the intercept, $b_0$, to be 0? (Use a - 1 in the model formula notation.) Does it make any sense for this model to have no intercept term?

3. What is the effect of neighbourhood on the difference between sale price and list price?

4. Do nicer neighbourhoods mean it is more likely to have a house go over the asking price?

5. Is there a relationship between houses which sell for more than predicted (a positive residual) and houses which sell for more than asking? (If so, then perhaps the real estate agents aren't pricing the home correctly.)

Ans:

Description

The homeprice data frame has 29 rows and 7 columns, representing the sale prices of homes in New Jersey in the year 2001. This dataset is a random sample of the homes sold in Maplewood, NJ during 2001. Of course, the prices will seem either incredibly high or fantastically cheap depending on where you live and whether you have recently purchased a home.

This data frame contains the following columns:

list: list price of home (in thousands)
sale: actual sale price
full: number of full bathrooms
half: number of half bathrooms
bedrooms: number of bedrooms
rooms: total number of rooms
neighborhood: subjective assessment of the neighbourhood on a scale of 1-5

1. Half bathrooms and the sale price

> library(UsingR)  # provides the homeprice data frame
> plot(sale ~ half, data = homeprice)
> hlm <- lm(sale ~ half, data = homeprice)
> abline(hlm)  # black line: model with an intercept
> summary(hlm)

Call:
lm(formula = sale ~ half, data = homeprice)

Residuals:
    Min      1Q  Median      3Q     Max
-180.27  -75.27  -22.34   72.66  246.58

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   228.27      28.78   7.932 1.59e-08 ***
half           69.08      31.00   2.229   0.0344 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 109.8 on 27 degrees of freedom
Multiple R-squared:  0.1554,  Adjusted R-squared:  0.1241
F-statistic: 4.966 on 1 and 27 DF,  p-value: 0.03436

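The slope for half is 69.08 (p = 0.0344): each additional half bathroom is associated with a sale price about 69.08 higher, and the plot shows the same upward trend. The relationship is significant at the 5% level but weak, since the number of half bathrooms alone explains only about 16% of the variation in sale price (R-squared = 0.1554). Because this is a simple regression, the 69.08 also absorbs anything that goes along with having half bathrooms, such as a larger house, so it should not be read as the value a half bathroom adds on its own.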
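To read the slope as "what a half bathroom does", it helps to compare predictions directly and to see how the estimate changes once the other house features are held fixed, which is what a multiple regression does. The R sketch below assumes the UsingR package supplies homeprice and only touches the real data if that package is installed; the helper names half_bath_effect() and predicted_by_half() are hypothetical.

```r
# Hypothetical helpers (names are illustrative, not from any package).

# Coefficient of `half` without adjustment (simple regression) and adjusted
# for the other house features (multiple regression).
half_bath_effect <- function(df) {
  simple   <- lm(sale ~ half, data = df)
  multiple <- lm(sale ~ half + full + bedrooms + rooms + neighborhood, data = df)
  c(simple   = unname(coef(simple)["half"]),
    adjusted = unname(coef(multiple)["half"]))
}

# Predicted sale price at 0, 1 and 2 half bathrooms from the simple model;
# consecutive predictions differ by exactly the fitted slope.
predicted_by_half <- function(df) {
  fit <- lm(sale ~ half, data = df)
  predict(fit, newdata = data.frame(half = 0:2))
}

# Run on the real data only if the UsingR package is installed.
if (requireNamespace("UsingR", quietly = TRUE)) {
  data(homeprice, package = "UsingR")
  print(half_bath_effect(homeprice))
  print(diff(predicted_by_half(homeprice)))
  print(confint(lm(sale ~ half, data = homeprice)))  # 95% intervals
}
```

If the adjusted coefficient differs a lot from 69.08, part of the simple slope was really reflecting the other features.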

2. Forcing the intercept $b_0$ to be 0

> hlm0 <- lm(sale ~ half - 1, data = homeprice)  # "- 1" removes the intercept
> abline(hlm0, col = "red")  # red line: forced through the origin
> summary(hlm0)

Call:
lm(formula = sale ~ half - 1, data = homeprice)

Residuals:
    Min      1Q  Median      3Q     Max
-222.62    6.44  117.70  215.00  450.00

Coefficients:
     Estimate Std. Error t value Pr(>|t|)
half   242.56      39.36   6.163 1.18e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 196.8 on 28 degrees of freedom
Multiple R-squared:  0.5757,  Adjusted R-squared:  0.5605
F-statistic: 37.98 on 1 and 28 DF,  p-value: 1.181e-06

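The coefficients change substantially. With an intercept, the fitted model is sale = 228.27 + 69.08 * half. Forcing $b_0 = 0$ gives sale = 242.56 * half: the intercept is fixed at 0 and the slope jumps from 69.08 to 242.56, because the single slope now has to carry the whole price level instead of only the change per half bathroom.

The higher R-squared (0.5757 versus 0.1554) does not indicate a better fit. Without an intercept, R measures R-squared against 0 rather than against the mean sale price, so the two values are not comparable. The residual standard error, which is comparable, rises from 109.8 to 196.8, and the residuals are no longer centred on 0 (median 117.70).

It does not make sense for this model to have no intercept. The red line (no intercept) is forced through the origin, so it predicts a sale price of 0 for a house with no half bathrooms, whereas the black line (original model) predicts 228.27 for such a house.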
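The comparison can be reproduced in one function. This R sketch (compare_intercept() is a hypothetical name) fits both models, optionally draws the black and red lines, and checks the point about R-squared: for a model without an intercept, summary() computes R-squared as 1 - RSS / sum(y^2), which is why it cannot be compared with the usual centred value.

```r
# Hypothetical helper: fit sale ~ half with and without an intercept.
compare_intercept <- function(df, plot_it = interactive()) {
  with_b0 <- lm(sale ~ half, data = df)
  no_b0   <- lm(sale ~ half - 1, data = df)
  if (plot_it) {
    plot(sale ~ half, data = df)
    abline(with_b0)             # black line: with intercept
    abline(no_b0, col = "red")  # one coefficient: slope through the origin
  }
  y <- df$sale
  rss0 <- sum(residuals(no_b0)^2)
  list(
    coef_with    = coef(with_b0),
    coef_without = coef(no_b0),
    sigma        = c(with = summary(with_b0)$sigma,
                     without = summary(no_b0)$sigma),
    # What summary() reports for the no-intercept model ...
    r2_reported  = summary(no_b0)$r.squared,
    # ... equals the uncentred formula, measured against 0,
    r2_uncentred = 1 - rss0 / sum(y^2),
    # not the usual centred formula, measured against mean(y).
    r2_centred   = 1 - rss0 / sum((y - mean(y))^2)
  )
}

if (requireNamespace("UsingR", quietly = TRUE)) {
  data(homeprice, package = "UsingR")
  str(compare_intercept(homeprice))
}
```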

3. Effect of neighbourhood on the difference between sale price and list price

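> library(lattice)  # xyplot(), panel.xyplot(), panel.lmline()
> panel.lm <- function(x, y, ...) {
+   panel.xyplot(x, y, ...)  # the points
+   panel.lmline(x, y, ...)  # least-squares line for this panel
+ }
> xyplot(sale ~ list | neighborhood, panel = panel.lm, data = homeprice)
> nbd <- as.numeric(cut(homeprice$neighborhood, c(0, 2, 3, 5), labels = c(1, 2, 3)))
> table(nbd)  # check that we partitioned well
nbd
 1  2  3
10 12  7
> xyplot(sale ~ list | factor(nbd), panel = panel.lm, layout = c(3, 1), data = homeprice)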

Within each neighbourhood group there is a clear linear relationship between the sale price and the list price.
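The panels can also be summarised numerically by fitting sale ~ list separately within each group. The helper slopes_by_group() below is hypothetical; it accepts any grouping vector, such as the three-level grouping produced by cut() in the transcript.

```r
# Hypothetical helper: intercept and slope of sale ~ list within each group,
# one row per group, mirroring the panels of the xyplot.
slopes_by_group <- function(df, groups) {
  fits <- lapply(split(df, groups), function(d) lm(sale ~ list, data = d))
  t(sapply(fits, coef))
}

if (requireNamespace("UsingR", quietly = TRUE)) {
  data(homeprice, package = "UsingR")
  nbd <- cut(homeprice$neighborhood, c(0, 2, 3, 5), labels = c(1, 2, 3))
  print(slopes_by_group(homeprice, nbd))
}
```

Slopes close to 1 in every group would mean sale price tracks list price about one-for-one regardless of neighbourhood.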

> a <- homeprice$list - homeprice$sale  # positive when a house sells below its list price
> plot(a ~ neighborhood, data = homeprice)
> hlm <- lm(a ~ neighborhood, data = homeprice)
> abline(hlm)
> summary(hlm)

Call:
lm(formula = a ~ neighborhood, data = homeprice)

Residuals:
   Min     1Q Median     3Q    Max
-33.05  -5.80   0.85   7.50  30.05

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    -7.800      7.435  -1.049    0.303
neighborhood    3.150      2.428   1.298    0.205

Residual standard error: 13 on 27 degrees of freedom
Multiple R-squared:  0.0587,  Adjusted R-squared:  0.02383
F-statistic: 1.684 on 1 and 27 DF,  p-value: 0.2054

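The fitted slope for neighbourhood is positive (3.15), so the difference a = list - sale tends to grow as the neighbourhood rating increases. Because a is list minus sale, a larger a means a house sold further below its list price. The slope is not statistically significant, however (p = 0.205), and neighbourhood explains only about 6% of the variation in a (R-squared = 0.0587), so these data show no clear effect of neighbourhood on the difference between sale price and list price.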
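Because the sign convention decides how the slope reads, it can help to fit the difference both ways. In this R sketch (diff_models() is a hypothetical helper), the fits for list - sale and sale - list have exactly opposite coefficients and the same p-value, so only the interpretation changes.

```r
# Hypothetical helper: regress the price gap on neighbourhood both ways.
diff_models <- function(df) {
  below <- lm(I(list - sale) ~ neighborhood, data = df)  # > 0: sold below asking
  above <- lm(I(sale - list) ~ neighborhood, data = df)  # > 0: sold above asking
  list(below   = coef(below),
       above   = coef(above),
       p_value = summary(above)$coefficients["neighborhood", "Pr(>|t|)"])
}

if (requireNamespace("UsingR", quietly = TRUE)) {
  data(homeprice, package = "UsingR")
  print(diff_models(homeprice))
}
```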

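4. Not on the evidence of this regression. With a computed as list - sale, the trend runs toward houses in nicer neighbourhoods selling further below, not above, the list price, and the slope is not statistically significant (p = 0.205). These data therefore do not show that houses in nicer neighbourhoods are more likely to go over the asking price.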
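The regression on a answers question 4 only indirectly. A more direct check, sketched below in R, treats "sold over asking" (sale > list) as a yes/no outcome and relates it to the neighbourhood rating; the same indicator can be cross-tabulated against the sign of the residuals from a price model to address question 5. The helper name over_asking_summary() and the choice of predictors for the price model are assumptions of this sketch.

```r
# Hypothetical helper for questions 4 and 5.
over_asking_summary <- function(df) {
  df$over <- as.numeric(df$sale > df$list)  # 1 = sold above the list price

  # Question 4: share of houses sold over asking at each neighbourhood rating,
  # plus a logistic regression of P(over asking) on the rating.
  share_by_nbhd <- tapply(df$over, df$neighborhood, mean)
  logit <- glm(over ~ neighborhood, family = binomial, data = df)

  # Question 5: did houses that sold above the model's prediction (positive
  # residual) also tend to sell over asking? The predictors used for the
  # "prediction" are an assumption of this sketch.
  fit <- lm(sale ~ full + half + bedrooms + rooms + neighborhood, data = df)
  tab <- table(above_prediction = residuals(fit) > 0,
               over_asking = df$over == 1)
  # Fisher's exact test needs both outcomes present on both margins.
  fisher_p <- if (all(dim(tab) == 2)) fisher.test(tab)$p.value else NA_real_

  list(share_by_nbhd = share_by_nbhd,
       logit_coef    = coef(logit),
       table         = tab,
       fisher_p      = fisher_p)
}

if (requireNamespace("UsingR", quietly = TRUE)) {
  data(homeprice, package = "UsingR")
  print(over_asking_summary(homeprice))
}
```

A positive neighbourhood coefficient in the logistic fit would support a "yes" to question 4; a table concentrated on the diagonal would suggest a link between selling above prediction and selling over asking.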
