Lab_Activity_7 - Sites at Penn State

Lab Activity 7 (07/25/2013)
Part 1: Recall that we collected a data set of shoe sizes at the beginning of this course. Previously, we
included only height as a predictor of the response, shoe size. Now that we have learned about indicator
variables, we can include gender as another predictor. The data can be found in the Minitab file called
“Shoessize”.
a. (5pts) How will you code the variable gender?
Code Male = 1 and Female = 0.
b. (5pts) Code the variable gender as you described (maybe name it “Gender_M” if you coded
Male as 1, or “Gender_F” if you coded Female as 1). Then fit a multiple linear regression with both
height and Gender_M/Gender_F as predictors. What is your estimated equation?
(Minitab: Calc>Make indicator variables, and select variable “Gender”. Minitab will generate two
columns, and you can decide which one to use according to your coding scheme in a. In other words,
you have to drop one. Think about why.)
Fitted shoe size = - 5.44 + 0.188 Height(inch) + 2.51 Gender_M
I dropped the Female column because I coded Male as 1.
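A minimal sketch of why one indicator column has to be dropped, assuming a small made-up list of gender labels (the real values are in the Minitab worksheet). The two indicator columns always add up to 1, which duplicates the constant term, so only one of them can enter the model together with the intercept.

```python
# Hypothetical gender labels, for illustration only.
genders = ["M", "F", "F", "M", "F"]

# One indicator column per category, similar to what
# Calc>Make indicator variables generates.
gender_m = [1 if g == "M" else 0 for g in genders]
gender_f = [1 if g == "F" else 0 for g in genders]

# Every row sums to 1, so the two columns together carry the same
# information as the intercept column; one of them is redundant.
row_sums = [m + f for m, f in zip(gender_m, gender_f)]

print(gender_m)
print(gender_f)
print(row_sums)
```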
c. (5pts) Based on the estimated equation you found in b, how do you interpret the slope for the
variable Gender_M/Gender_F?
When height is held constant, the estimated average shoe size for males is 2.51 higher than for
females in this sample. This is the vertical distance between the two fitted regression lines.
d. (10pts)
With a one-inch increase in a male’s height, the estimated average shoe size will
__increase by 0.188 shoe-size units.__
With a one-inch increase in a female’s height, the estimated average shoe size will
__increase by 0.188 shoe-size units.__
So the effect of height on shoe size __does not__ depend on gender.
e. (5pts) Are all slopes and the intercept significant? If only some are significant, how do you
interpret this significance?
Predictor        Coef   SE Coef      T      P
Constant       -5.444     6.834  -0.80  0.444
Height(inch)   0.1879    0.1073   1.75  0.111
Gender_M       2.5125    0.9078   2.77  0.020
No. The p-values of both height and the intercept are greater than 0.05, so only Gender_M is
significant (p = 0.020 < 0.05). This means that the vertical distance between the two population
regression lines is not equal to 0.
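As a quick check on the output in part e, each T value is the coefficient divided by its standard error. The sketch below only reproduces that arithmetic from the numbers reported above; it does not recompute the p-values, which depend on the error degrees of freedom that are not shown here.

```python
# Coefficients and standard errors copied from the Minitab output in part e.
estimates = {
    "Constant": (-5.444, 6.834),
    "Height(inch)": (0.1879, 0.1073),
    "Gender_M": (2.5125, 0.9078),
}

# T = Coef / SE Coef, rounded to two decimals as in the output.
t_values = {name: round(coef / se, 2) for name, (coef, se) in estimates.items()}

for name, t in t_values.items():
    print(name, t)
```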
f. (5pts) The fitted regression actually represents two separate regression lines. What are they for?
What is the relation between these two lines, and what is their vertical distance?
They give the estimated shoe size for males and for females.
Male (Gender_M = 1): Fitted shoe size = - 2.93 + 0.188 Height(inch)
Female (Gender_M = 0): Fitted shoe size = - 5.44 + 0.188 Height(inch)
They are parallel because both slopes are 0.188, and the estimated vertical distance between them is 2.51.
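The parallel-lines claim can be checked numerically with a short sketch, assuming the fitted equation from part b. The function name and the heights are made up for illustration. At every height the male prediction minus the female prediction is the Gender_M coefficient, and a one-inch step changes either prediction by the height coefficient.

```python
def fitted_shoe_size(height_in, gender_m):
    """Fitted equation from part b; gender_m is 1 for male, 0 for female."""
    return -5.44 + 0.188 * height_in + 2.51 * gender_m

# Hypothetical heights (inches), chosen only to illustrate the pattern.
for height in (60, 65, 70):
    male = fitted_shoe_size(height, 1)
    female = fitted_shoe_size(height, 0)
    # The last column is the vertical distance between the two lines.
    print(height, round(male, 2), round(female, 2), round(male - female, 2))
```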
Part 2: This problem is a modification of problem 8.24 in the text. The variables are Price (selling
price of the house, in thousands), Value (assessed valuation of house for tax purposes, in thousands),
and Lot which = 0 for non-corner lots and =1 for corner lots. Use dataset “Valuation” on Angel.
a. (10pts) Plot Price versus Value using different symbols for the two locations
(GraphScatterplotWith Groups). Comment on the relationship between each predictor and the
response. Do you think there is an interactive effect between the two predictors?
[Figure: Scatterplot of Price vs Value, with separate symbols for Lot = 0 and Lot = 1. Horizontal axis: Value; vertical axis: Price.]
Both groups of points show an increasing relationship between Price and Value, but the slope for
non-corner lots is steeper than the slope for corner lots. Because the two trends are not parallel, I
think there is an interaction effect between the two predictors.
b. (5pts) Fit the regression equation for predicting Price that includes Value, Lot, and the interaction
between Value and Lot as predictors. (Note: Here, you need to create a new column of data called
interaction by choosing Calc>Calculator…) Write the estimated regression equation.
Fitted Price = - 127 + 2.78 Value + 76.0 Lot - 1.11 Interaction
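The interaction column is the row-by-row product of Value and Lot. A minimal sketch of that calculation, using a few made-up rows rather than the actual “Valuation” data:

```python
# Hypothetical rows, for illustration only.
value = [70.5, 72.0, 75.5, 78.0]
lot = [0, 1, 0, 1]

# Interaction = Value * Lot: zero for non-corner lots,
# equal to Value for corner lots.
interaction = [v * l for v, l in zip(value, lot)]

print(interaction)
```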
c. (10pts) State the hypotheses testing the significance of the interaction term, report the test statistic
and the p-value. Is the interaction term statistically significant?
Ho: β3=0 VS Ha: β3≠0
Because the p-value of the interaction term is less than 0.05, we reject Ho and conclude that β3 is
not equal to 0. According to the t-test, the interaction term is statistically significant.
d. (10pts) Refer back to part b where you wrote the estimated regression equation. What is the
estimated equation that relates Price to Value...
For corner lots?
Fitted Price = - 127 + 2.78 Value + 76.0 - 1.11 Value (Lot = 1)
For noncorner lots?
Fitted Price = - 127 + 2.78 Value (Lot = 0)
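Written out step by step, the two equations come from substituting Lot = 1 and Lot = 0 into the fitted model from part b, with Interaction = Value × Lot. For corner lots the Lot coefficient shifts the intercept and the interaction coefficient shifts the slope:

```latex
\begin{align*}
\widehat{\text{Price}} &= -127 + 2.78\,\text{Value} + 76.0\,\text{Lot} - 1.11\,(\text{Value}\times\text{Lot}) \\
\text{Lot}=1:\quad \widehat{\text{Price}} &= (-127 + 76.0) + (2.78 - 1.11)\,\text{Value} = -51 + 1.67\,\text{Value} \\
\text{Lot}=0:\quad \widehat{\text{Price}} &= -127 + 2.78\,\text{Value}
\end{align*}
```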
g. (10pts)
With a one-thousand increase in the assessed valuation, the estimated average selling price for a house
will ____increase by 1.67 thousand____ for corner lots.
With a one-thousand increase in the assessed valuation, the estimated average selling price for a house
will ____increase by 2.78 thousand____ for non-corner lots.
So the effect of valuation on the selling price of a house __does__ depend on whether the house is on a
corner lot or not.
e. (5pts) Compare this to d in Part 1. What conclusion do you reach when comparing the model with
an interaction term and the model without one?
With an interaction term, the effect of the quantitative predictor on the response (the slope) depends
on the level of the categorical predictor. Without an interaction term, the slope is the same for every
level, and the categorical predictor only shifts the intercept.
g. (10pts) Suppose a corner lot and a noncorner lot both have an assessed valuation = 70. What is the
difference between predicted sale prices? How about if the two lots (corner and noncorner) both
have an assessed valuation = 80. What is the estimated difference between predicted sale prices?
Value = 70
Corner lots: Price = - 51 + 1.67(70) = 65.9
Non-corner lots: Price = - 127 + 2.78(70) = 67.6
The estimated difference = 67.6 - 65.9 = 1.7
Value = 80
Corner lots: Price = - 51 + 1.67(80) = 82.6
Non-corner lots: Price = - 127 + 2.78(80) = 95.4
The estimated difference = 95.4 - 82.6 = 12.8
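The same arithmetic can be reproduced with a short sketch that evaluates the fitted equation from part b at both valuations. The function name is made up for illustration; the coefficients are the ones reported above.

```python
def fitted_price(value, lot):
    """Fitted equation from part b; lot is 1 for corner lots, 0 otherwise."""
    return -127 + 2.78 * value + 76.0 * lot - 1.11 * value * lot

for value in (70, 80):
    corner = fitted_price(value, 1)
    noncorner = fitted_price(value, 0)
    # Columns: Value, corner price, non-corner price, difference (thousands).
    print(value, round(corner, 1), round(noncorner, 1), round(noncorner - corner, 1))
```

Subtracting the two lines gives a gap of 1.11 Value - 76, so the difference grows by 1.11 for every one-thousand increase in Value.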
f. (5pts) Refer to the previous part. Explain how the answer to that part demonstrates the effect of
interaction. Can you explain the reason for getting different values in part (d) and (g) by referring to
the graphical feature of the two individual regression lines for homes with corner lots and without
corner lots based on the fitted model?
The difference between non-corner lots and corner lots is 1.7 when Value = 70 and 12.8 when
Value = 80. As Value increases, the two lines spread away from each other, so the vertical distance
between them increases. If the two lines were parallel, the vertical distance would stay the same at
every Value, and the difference due to the categorical variable would not change with Value.