Lab Activity 7 (07/25/2013)

Part 1: Recall that we collected a data set of shoe sizes at the beginning of this course. Previously, we included only height as a predictor of the response, shoe size. Now that we have learned about indicator variables, we can include gender as another predictor. The data are in the Minitab file called “Shoessize”.

a. (5pts) How will you code the variable gender?
Code Male = 1 and Female = 0.

b. (5pts) Code the variable gender as you described (name it “Gender_M” if you coded Male as 1, or “Gender_F” if you coded Female as 1). Then fit a multiple linear regression with both height and Gender_M/Gender_F as predictors. What is your estimated equation? (Minitab: Calc > Make Indicator Variables, and select the variable “Gender”. Minitab will generate two columns, and you can decide which one to use according to your coding scheme in a. In other words, you have to drop one. Think about why.)
Fitted shoe size = -5.44 + 0.188 Height(inch) + 2.51 Gender_M
I drop the Female column because I coded Male as 1. Gender_M and Gender_F always add up to 1, so including both together with the intercept would make the predictors perfectly collinear, and the coefficients could not be estimated uniquely.

c. (5pts) Based on the estimated equation you found in b, how do you interpret the slope for the variable Gender_M/Gender_F?
Holding height constant, the estimated average shoe size for males is 2.51 larger than for females in this sample. This is the vertical distance between the two fitted regression lines.

d. (10pts) With a one-inch increase in a male’s height, the estimated average shoe size will increase by 0.188 (in shoe-size units).
With a one-inch increase in a female’s height, the estimated average shoe size will increase by 0.188 (in shoe-size units).
So the effect of height on shoe size does not depend on gender.

e. (5pts) Do you get all slopes and intercepts significant? If some are significant, how do you interpret this significance?

Predictor        Coef   SE Coef      T      P
Constant       -5.444     6.834  -0.80  0.444
Height(inch)   0.1879    0.1073   1.75  0.111
Gender_M       2.5125    0.9078   2.77  0.020

No. The p-values for both the intercept (0.444) and height (0.111) are greater than 0.05.
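The numbers in the table and in parts b through d can be checked with a few lines of arithmetic. The sketch below is a minimal illustration, assuming only the coefficients and standard errors reported in the table above; `fitted_shoe_size` and `t_ratio` are hypothetical helper names, and the heights 60, 66, and 72 are illustrative values, not data. It shows that Minitab's T column is Coef divided by SE Coef, and that, because the model has no interaction term, the male-minus-female gap is the same 2.5125 at every height.

```python
# Part 1 fit: values copied from the Minitab table in part e.
COEF = {"Constant": -5.444, "Height(inch)": 0.1879, "Gender_M": 2.5125}
SE_COEF = {"Constant": 6.834, "Height(inch)": 0.1073, "Gender_M": 0.9078}
P_VALUE = {"Constant": 0.444, "Height(inch)": 0.111, "Gender_M": 0.020}


def fitted_shoe_size(height_in: float, gender_m: int) -> float:
    """Fitted shoe size; gender_m is the indicator (1 = male, 0 = female)."""
    return (COEF["Constant"]
            + COEF["Height(inch)"] * height_in
            + COEF["Gender_M"] * gender_m)


def t_ratio(term: str) -> float:
    """Minitab's T column: the coefficient divided by its standard error."""
    return COEF[term] / SE_COEF[term]


if __name__ == "__main__":
    for term in COEF:
        verdict = "significant" if P_VALUE[term] < 0.05 else "not significant"
        print(f"{term:<13} T = {t_ratio(term):6.2f}  p = {P_VALUE[term]:.3f}  ({verdict} at 0.05)")

    # No interaction term: the male - female gap is identical at every height
    # (hypothetical heights, for illustration only).
    for height in (60, 66, 72):
        gap = fitted_shoe_size(height, 1) - fitted_shoe_size(height, 0)
        print(f"height {height} in: male - female = {gap:.4f}")
```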
Only Gender_M is significant, since its p-value (0.020) is less than 0.05. This means that, at a given height, the population mean shoe size differs between males and females: the vertical distance between the two population regression lines is not 0.

f. (5pts) The fitted regression actually represents two separate regression lines. What are they for? What is the relation between these two lines, and what is their vertical distance?
One line gives the estimated shoe size for males and the other for females:
Males (Gender_M = 1): Fitted shoe size = -2.93 + 0.188 Height(inch)
Females (Gender_M = 0): Fitted shoe size = -5.44 + 0.188 Height(inch)
(Equivalently, with Gender_F coding: Fitted shoe size = -2.93 + 0.188 Height(inch) - 2.51 Gender_F.)
The two lines are parallel because both slopes are 0.188, and their estimated vertical distance is 2.51.

Part 2: This problem is a modification of problem 8.24 in the text. The variables are Price (selling price of the house, in thousands), Value (assessed valuation of the house for tax purposes, in thousands), and Lot (= 0 for non-corner lots and = 1 for corner lots). Use the dataset “Valuation” on Angel.

a. (10pts) Plot Price versus Value using different symbols for the two locations (Graph > Scatterplot > With Groups). Comment on the relationship between each predictor and the response. Do you think there is an interaction effect between the two predictors?
[Figure: Scatterplot of Price vs Value, grouped by Lot (0, 1); x-axis Value, y-axis Price]
Both groups show an increasing relationship between Price and Value, but the slope for non-corner lots is steeper than the slope for corner lots. Because the two patterns are not parallel, I think there is an interaction effect between the two predictors.

b. (5pts) Fit the regression equation for predicting Price that includes Value, Lot, and the interaction between Value and Lot as predictors. (Note: Here, you need to create a new column of data called Interaction = Value × Lot by choosing Calc > Calculator…) Write the estimated regression equation.
Fitted Price = -127 + 2.78 Value + 76.0 Lot - 1.11 Interaction

c.
(10pts) State the hypotheses for testing the significance of the interaction term, and report the test statistic and the p-value. Is the interaction term statistically significant?
H0: β3 = 0 vs. Ha: β3 ≠ 0, where β3 is the coefficient of Interaction.
Because the t-test p-value for the interaction term is less than 0.05, we reject H0 and conclude that β3 ≠ 0. The interaction term is statistically significant.

d. (10pts) Refer back to part b where you wrote the estimated regression equation. What is the estimated equation that relates Price to Value...
For corner lots (Lot = 1)? Fitted Price = -127 + 2.78 Value + 76.0 - 1.11 Value = -51.0 + 1.67 Value
For non-corner lots (Lot = 0)? Fitted Price = -127 + 2.78 Value

g. (10pts) With a one-thousand increase in the assessed valuation, the estimated average selling price of a house will increase by 1.67 thousand for corner lots.
With a one-thousand increase in the assessed valuation, the estimated average selling price of a house will increase by 2.78 thousand for non-corner lots.
So the effect of valuation on the selling price of a house does depend on whether the house is on a corner lot.

e. (5pts) Compare this to d in Part 1. What conclusion do you reach when comparing the model with an interaction term to the model without one?
Without an interaction term (Part 1), the model forces the two lines to have the same slope, so the effect of the quantitative predictor (height) on the response does not depend on the categorical predictor (gender); gender only shifts the line up or down. With an interaction term (Part 2), the slopes can differ, so the effect of Value on Price depends on Lot.

g. (10pts) Suppose a corner lot and a non-corner lot both have an assessed valuation = 70. What is the difference between predicted sale prices? How about if the two lots (corner and non-corner) both have an assessed valuation = 80? What is the estimated difference between predicted sale prices?
Value = 70:
Corner lots: Price = -51 + 1.67(70) = 65.9
Non-corner lots: Price = -127 + 2.78(70) = 67.6
Estimated difference = 67.6 - 65.9 = 1.7
Value = 80:
Corner lots: Price = -51 + 1.67(80) = 82.6
Non-corner lots: Price = -127 + 2.78(80) = 95.4
Estimated difference = 95.4 - 82.6 = 12.8

f.
(5pts) Refer to the previous part. Explain how the answer to that part demonstrates the effect of interaction. Can you explain the reason for getting different values in part (d) and (g) by referring to the graphical feature of the two individual regression lines for homes with corner lots and without corner lots based on the fitted model?
The difference between non-corner and corner lots is 1.7 when Value = 70 but 12.8 when Value = 80. Because the non-corner line is steeper (slope 2.78 vs. 1.67), the two lines spread apart as Value increases, so the vertical distance between them grows by 2.78 - 1.67 = 1.11 for each one-thousand increase in Value. If the two lines were parallel (no interaction), the vertical distance would stay the same at every Value, and the difference between the two lot types would not depend on Value.
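The arithmetic in parts d, g, and f of Part 2 can be reproduced from the part b equation. Below is a minimal sketch, assuming the rounded coefficients from part b and Interaction = Value × Lot; `fitted_price` and `line_for` are hypothetical helper names, not Minitab functions. It derives the separate Price-vs-Value line for each Lot group and the predicted non-corner minus corner gap at Value = 70 and 80.

```python
# Part 2 fit from part b (rounded coefficients):
# Price = -127 + 2.78 Value + 76.0 Lot - 1.11 Interaction, Interaction = Value * Lot.
B0, B_VALUE, B_LOT, B_INTER = -127.0, 2.78, 76.0, -1.11


def fitted_price(value: float, lot: int) -> float:
    """Predicted Price (thousands); lot is 1 for corner lots, 0 otherwise."""
    interaction = value * lot
    return B0 + B_VALUE * value + B_LOT * lot + B_INTER * interaction


def line_for(lot: int) -> tuple[float, float]:
    """Intercept and slope of the Price-vs-Value line for one Lot group."""
    return B0 + B_LOT * lot, B_VALUE + B_INTER * lot


if __name__ == "__main__":
    for lot, label in ((1, "corner"), (0, "non-corner")):
        intercept, slope = line_for(lot)
        print(f"{label:<10}: Price = {intercept:.1f} + {slope:.2f} Value")

    # With an interaction term, the gap between the groups changes with Value.
    for value in (70, 80):
        gap = fitted_price(value, 0) - fitted_price(value, 1)
        print(f"Value = {value}: non-corner - corner = {gap:.1f}")
```

Written out, the gap is (-127 + 2.78 Value) - (-51 + 1.67 Value) = 1.11 Value - 76.0, so it grows by 1.11 for each one-thousand increase in Value, which is the non-parallel-lines picture described in f.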