Chapter 7 Multiple linear regression
• This chapter considers multiple linear regression: modeling the relationship between a dependent variable, Y, and two or more predictor variables X1, X2, X3, etc.

5.1 Polynomial regression
• The predictors are a single predictor, x, and its polynomial powers ($x^2$, $x^3$, etc.).
• In polynomial regression, we can display the result of our multiple regression on a single two-dimensional graph.

Example: modeling salary from years of experience
• We want to develop a regression equation to model the relationship between Y, salary (in thousands of dollars), and x, the number of years of experience, and to find a 95% prediction interval for Y when x = 10.

Example: modeling salary from years of experience, scatter plot and residual plot if a simple linear regression is fit.

Example: modeling salary from years of experience, after fitting a polynomial regression model. The random pattern in the standardized residuals indicates that the model is valid.

Example: modeling salary from years of experience, leverage point

Example: modeling salary from years of experience, diagnostic plots

Example: modeling salary from years of experience, 95% prediction interval for 10 years of experience.
> m2 <- lm(Salary~Experience + I(Experience^2))
> predict(m2,newdata=data.frame(Experience=c(10)),interval="prediction",level=0.95)
       fit      lwr      upr
1 58.11164 52.50481 63.71847

5.2 Estimation and inference in multiple linear regression
• The response random variable Y is predicted from p predictor (explanatory) variables X1, X2, …, Xp, and the relationship between Y and X1, X2, …, Xp is linear in the parameters β0, β1, β2, …, βp. The $e_i$'s are random errors.
$$E(Y \mid X_1 = x_1, X_2 = x_2, \dots, X_p = x_p) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$$
$$Y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_p x_{pi} + e_i$$

Least squares estimates
• The least squares estimates of β0, β1, β2, …, βp are the values of b0, b1, b2, …, bp for which the sum of the squared residuals,
$$RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_{1i} - \dots - b_p x_{pi})^2,$$
is minimal.
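A minimal sketch of the least squares definition above, assuming simulated data: the sample size, the two predictors, the true coefficients, and the noise level are illustrative assumptions, not data from this chapter. It solves the normal equations directly and checks that lm() gives the same estimates and the same RSS.

```r
# Simulated data: all values below are assumptions chosen for illustration only
set.seed(1)
n  <- 50
x1 <- runif(n)
x2 <- runif(n)
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n, sd = 0.5)

# Design matrix with a leading column of ones for the intercept b0
X <- cbind(1, x1, x2)

# The least squares estimates solve the normal equations (X'X) b = X'y
b <- solve(crossprod(X), crossprod(X, y))

# RSS evaluated at the least squares solution
rss <- sum((y - X %*% b)^2)

# lm() should return the same estimates and the same RSS
fit <- lm(y ~ x1 + x2)
cbind(normal_equations = drop(b), lm = coef(fit))
c(rss_manual = rss, rss_lm = sum(resid(fit)^2))
```

Any other choice of b0, b1, b2 plugged into the RSS formula gives a value at least as large as the one printed here.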
Notation: $x_i = (x_{1i}, x_{2i}, \dots, x_{pi})$ is the vector of predictor values for observation i.

Residual sum of squares
$$RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{1i} - \dots - \hat{\beta}_p x_{pi})^2$$

Testing whether there is a linear association between Y and a subset/all of the predictors
• H0 : β1 = β2 = … = βp = 0
• HA : at least one of the βi ≠ 0 (i = 1, …, p).
• Total corrected sum of squares, SST or SYY:
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
• The residual sum of squares, RSS:
$$RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
• The regression sum of squares, SSreg:
$$SSreg = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$
• SST = SSreg + RSS. If there is a linear relationship, i.e., if H0 is false, then SSreg should be close to SST and RSS should be small.

F test
$$F = \frac{SSreg / p}{RSS / (n - p - 1)}$$
Reject H0 if $F > F_{\alpha,\, p,\, n-p-1}$, or equivalently if the p-value < α.
• The F test is always used first, to test for the existence of a linear association between Y and any of the p x-variables.
• If the F test is significant, then a natural question to ask is: for which of the p x-variables is there evidence of a linear association with Y?
• H0 : β1 = β2 = … = βk = 0, where k < p, i.e., the reduced model
$$Y_i = \beta_0 + \beta_{k+1} x_{k+1,i} + \dots + \beta_p x_{pi} + e_i$$
• HA : H0 is not true, i.e., the full model
$$Y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_p x_{pi} + e_i$$

F test: reduced model versus full model
$$F = \frac{(RSS(\text{reduced}) - RSS(\text{full})) / k}{RSS(\text{full}) / (n - p - 1)}$$

5.3 Analysis of Covariance
• Consider the situation in which we want to model a response variable, Y, based on a continuous predictor, x, and a dummy variable, d.
• There are four possibilities:
• Coincident regression lines: whether d = 0 or d = 1,
$$Y = \beta_0 + \beta_1 x + e$$
• Parallel regression lines:
$$d = 0: \quad Y = \beta_0 + \beta_1 x + e$$
$$d = 1: \quad Y = \beta_0 + \beta_2 + \beta_1 x + e$$
• Same intercepts but different slopes:
$$d = 0: \quad Y = \beta_0 + \beta_1 x + e$$
$$d = 1: \quad Y = \beta_0 + (\beta_1 + \beta_3) x + e$$
• Different intercepts and different slopes, called unrelated regression lines.
$$d = 0: \quad Y = \beta_0 + \beta_1 x + e$$
$$d = 1: \quad Y = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) x + e$$
Here β2 is the additive change in Y due to the dummy variable, and β3 is the change in the size of the effect of x on Y due to the dummy variable.

Example: amount spent on travel
• Data are available on 925 customers: 466 purchased an adventure tour and 459 purchased a cultural tour.
• Y, amount of money spent in the last twelve months
• X, age
• C, dummy variable, 1 if the customer purchased a cultural tour.

Example: amount spent on travel, the unrelated regression lines
$$Y = \beta_0 + \beta_2 C + (\beta_1 + \beta_3 C) x + e$$
$$C = 0: \quad Y = \beta_0 + \beta_1 x + e$$
$$C = 1: \quad Y = \beta_0 + \beta_2 + (\beta_1 + \beta_3) x + e$$

Example: amount spent on travel
> mfull <- lm(Amount~Age+C+C:Age)
> summary(mfull)

Call:
lm(formula = Amount ~ Age + C + C:Age)

Residuals:
       Min         1Q     Median         3Q        Max
-143.29750  -30.54140   -0.03431   31.10816  130.74317

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1814.5445     8.6011   211.0   <2e-16 ***
Age           -20.3175     0.1878  -108.2   <2e-16 ***
C           -1821.2337    12.5736  -144.8   <2e-16 ***
Age:C          40.4461     0.2724   148.5   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 47.63 on 921 degrees of freedom
Multiple R-squared:  0.9601,  Adjusted R-squared:  0.9599
F-statistic:  7379 on 3 and 921 DF,  p-value: < 2.2e-16

• H0 : β2 = β3 = 0, i.e., the reduced model
$$Y = \beta_0 + \beta_1 x + e$$
• HA : H0 is not true, i.e., the full model
$$Y = \beta_0 + \beta_2 C + (\beta_1 + \beta_3 C) x + e$$

> mreduced <- lm(Amount~Age)
> summary(mreduced)

Call:
lm(formula = Amount ~ Age)

Residuals:
     Min       1Q   Median       3Q      Max
-545.059 -199.033    6.336  198.739  497.389

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 957.9103    31.3056  30.599   <2e-16 ***
Age          -1.1140     0.6784  -1.642    0.101
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 237.7 on 923 degrees of freedom
Multiple R-squared:  0.002913,  Adjusted R-squared:  0.001833
F-statistic: 2.697 on 1 and 923 DF,  p-value: 0.1009

Example: amount spent on travel, analysis of variance, pick the model with the best fit
> anova(mreduced,mfull)
Analysis of Variance Table

Model 1: Amount ~ Age
Model 2: Amount ~ Age + C + C:Age
  Res.Df      RSS Df Sum of Sq     F    Pr(>F)
1    923 52158945
2    921  2089377  2  50069568 11035 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

There is very strong evidence against the reduced model in favor of the full model. Thus we prefer the unrelated regression lines model to the coincident lines model.

Exercises: Menu pricing in a new Italian restaurant in New York
• Actual data from 168 Italian restaurants in the target area. Develop a regression model to predict the price of a dinner.
• Y, Price: price of dinner
• X1, Food: customer rating of food
• X2, Décor: customer rating of décor
• X3, Service: customer rating of service
• X4, East: dummy variable, 1 if the restaurant is east of Fifth Avenue
• The full model:
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 East + \beta_5 x_1 East + \beta_6 x_2 East + \beta_7 x_3 East + e$$
• H0 : β3 = β5 = β6 = β7 = 0, i.e., the reduced model
• HA : H0 is not true, i.e., the full model
1. Use the F test to pick the best model.
2. If the aim is to choose the location of the restaurant so that the price achieved for dinner is maximized, should the new restaurant be east or west of Fifth Avenue?
3. Does it seem possible to achieve a price premium for “setting a new standard for high-quality service in Manhattan” for Italian restaurants?
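The F test for comparing a reduced model with a full model, used in the travel example and called for in exercise 1, can be recomputed by hand from the two residual sums of squares. The sketch below uses only the numbers printed in the anova(mreduced, mfull) table for the travel example; the variable names are illustrative choices, not part of the original analysis.

```r
# Values copied from the anova(mreduced, mfull) table in the travel example
rss_reduced <- 52158945
rss_full    <- 2089377
df_reduced  <- 923   # residual degrees of freedom of the reduced model
df_full     <- 921   # residual degrees of freedom of the full model, n - p - 1

# Number of coefficients set to zero under H0 (here beta2 and beta3)
k <- df_reduced - df_full

# F = ((RSS(reduced) - RSS(full)) / k) / (RSS(full) / (n - p - 1))
f_stat <- ((rss_reduced - rss_full) / k) / (rss_full / df_full)

# Upper-tail probability of the F distribution with (k, n - p - 1) degrees of freedom
p_value <- pf(f_stat, df1 = k, df2 = df_full, lower.tail = FALSE)

f_stat
p_value
```

The computed statistic rounds to the value of 11035 reported by anova(). For the restaurant exercise, the same calculation applies once both models have been fitted with lm(), with k = 4 for the stated null hypothesis.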