Effects plots, collinearity, log-transformed response variable
Linear Regression Models
Statistics 4205/5205, Fall 2020
October 12, 2020

Agenda (Chapter 4)

1. Effects plots
2. Collinearity
3. Response variable in log scale
4. Dropping regressors
5. Lurking variables
6. Multivariate normality, R²

Effects plots

An effects plot is a graphical summary of the effect of X_j in the multiple linear regression model

    E(Y \mid X_j = x_j, X_{(j)} = x_{(j)}) = \beta_0 + \beta_j x_j + \beta_{(j)}' x_{(j)},

where the subscript j indicates the jth regressor, and the subscript (j) indicates all the other regressors. In an effects plot we set x_{(j)} equal to the sample mean values and let the value of x_j vary. Compute

    \hat{E}(Y \mid X_j = x_j, X_{(j)} = \bar{x}_{(j)}) = \hat\beta_0 + \hat\beta_j x_j + \hat\beta_{(j)}' \bar{x}_{(j)}

and plot it versus x_j.

Add the 95% pointwise confidence limits

    \hat{E} \pm t(.975, n - p - 1) \, \mathrm{se.fit}(\hat{y} \mid x_*)

for x_* = (x_j, \bar{x}_{(j)}), again letting x_j vary.

It is interesting to do this for x_j in the model

    E(Y \mid X_j = x_j, X_{(j)} = x_{(j)}) = \beta_0 + \beta_j \log(x_j) + \beta_{(j)}' x_{(j)}:

• the effects plot for log x_j is linear;
• the effects plot for x_j is a curve.

Ex: Courseworks → Files → Examples → Example10a EffectsPlots

Linear transformations of regressors

Example: Berkeley Guidance Study. The response variable is BMI18; the predictor variables are WT2, WT9, and WT18.

    m1 <- lm(BMI18 ~ WT2 + WT9 + WT18)

Now define two new regressors, DW9 = WT9 − WT2 and DW18 = WT18 − WT9, and fit

    m2 <- lm(BMI18 ~ WT2 + DW9 + DW18)

Coefficients change in predictable ways

Denote the coefficients in m2 by γ_j. It must be that

    \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 = \gamma_0 + \gamma_1 x_1 + \gamma_2 (x_2 - x_1) + \gamma_3 (x_3 - x_2)

for all x. Matching coefficients,

    \beta_0 = \gamma_0, \quad \beta_1 = \gamma_1 - \gamma_2, \quad \beta_2 = \gamma_2 - \gamma_3, \quad \beta_3 = \gamma_3,

or, equivalently,

    \gamma_0 = \beta_0, \quad \gamma_1 = \beta_1 + \beta_2 + \beta_3, \quad \gamma_2 = \beta_2 + \beta_3, \quad \gamma_3 = \beta_3,

and these relationships hold equally for the estimates β̂_j and γ̂_j.

Fitted values, residuals, σ̂ and R² don't change at all

Since the reparameterization requires that β'x = γ'z, where z = Ax, it must also be the case that β̂'x = γ̂'z for all z = Ax, and thus the fitted values

    \hat{y}_i = \hat\beta' x_i = \hat\gamma' z_i

do not change under this reparameterization. If the fitted values ŷ are unchanged, then so are the residuals ê = y − ŷ, and so is the residual sum of squares RSS = (y − ŷ)'(y − ŷ). If the residual sum of squares is unchanged, then so are the estimated variance σ̂² and the coefficient of determination R².

Ex: Courseworks → Files → Examples → Example10b Aliasing

Collinearity

Consider the multiple linear regression model in matrix form, Y = Xβ + e, where E(e) = 0 and Var(e) = σ²I. The least squares estimate of β is given by

    \hat\beta = (X'X)^{-1} X'y.

If there exists a nonzero (p + 1)-vector a such that Xa = 0, then
• (X'X)^{-1} does not exist, and thus
• there is no unique solution β̂;
• the model is overparametrized;
• at least one column of X is redundant.

This is sometimes called aliasing, and it results from being careless in defining our regressors.

Ex: Courseworks → Files → Examples → Example10b Aliasing

This situation is easily resolved: just get rid of the redundant regressor(s) and move on (though there are necessarily multiple ways this can be accomplished). A situation more commonly encountered in practice is the following.

Suppose there exists a nonzero (p + 1)-vector a such that Xa ≈ 0. This is called collinearity.

Example: MinnWater data. See Courseworks → Files → Examples → Example10c Collinearity. Consider the regression

    lm(log(muniUse) ~ year + muniPrecip + log(muniPop))

The regressors year and log(muniPop) are highly correlated.

Why collinearity is a concern

If Xa = 0, then (X'X)^{-1} doesn't exist; it is like trying to divide by zero.
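As a minimal sketch of how one might check this numerically, assuming the MinnWater data frame from the alr4 package (variable names here follow the slides and should be checked against the packaged data):

    # Fit the regression from the slides and diagnose the collinearity
    library(alr4)   # provides MinnWater and loads car
    m <- lm(log(muniUse) ~ year + muniPrecip + log(muniPop), data = MinnWater)
    # The two suspect regressors are nearly a linear function of each other
    cor(MinnWater$year, log(MinnWater$muniPop))
    # Variance inflation factors: large values flag inflated Var(beta-hat)
    car::vif(m)
    # Note the correspondingly large standard errors on year and log(muniPop)
    summary(m)

Large variance inflation factors for year and log(muniPop) quantify how much the collinearity inflates the variances of those coefficient estimates, which is the concern developed next.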
If Xa ≈ 0, then (X'X)^{-1} blows up; we can think of this as dividing by a number very close to zero. In this case Var(β̂) = σ²(X'X)^{-1} likewise blows up.

Ex: Courseworks → Files → Examples → Example10c Collinearity

The regressors year and log(muniPop) are highly correlated, so the estimates of both coefficients have very high standard errors.

Response variable in logarithmic scale

Suppose

    E[\log(Y) \mid X_j = x_j, X_{(j)} = x_{(j)}] = \beta_0 + \beta_j x_j + \beta_{(j)}' x_{(j)}.

Then

    E(Y \mid X_j = x_j, X_{(j)} = x_{(j)}) \approx \exp\{E[\log(Y) \mid X = x]\} = \exp\{\beta_j x_j\} \exp\{\beta_0 + \beta_{(j)}' x_{(j)}\}

and

    E(Y \mid X_j = x_j + 1, X_{(j)} = x_{(j)}) \approx \exp\{\beta_j (x_j + 1)\} \exp\{\beta_0 + \beta_{(j)}' x_{(j)}\} \approx e^{\beta_j} E(Y \mid X_j = x_j, X_{(j)} = x_{(j)}).

Thus a one-unit increase in x_j, holding the other variables fixed, implies an e^{β_j} multiplicative change in E(Y).

Example: UN11 data

    \widehat{\log(\text{fertility})} = 3.507 - .065 \log(\text{ppgdp}) - .028 \, \text{lifeExpF}

Given two countries with the same ppgdp, where the second has a lifeExpF higher by one year, we expect the second country to have a fertility rate lower by a factor of e^{−.028} = .972, i.e., lower by 2.8%.

Regressor and response in log scale

Now consider

    E[\log(Y) \mid X_j = x_j, X_{(j)} = x_{(j)}] = \beta_0 + \beta_j \log(x_j) + \beta_{(j)}' x_{(j)}.

Then

    E(Y \mid X_j = x_j, X_{(j)} = x_{(j)}) \approx \exp\{E[\log(Y) \mid X = x]\} = x_j^{\beta_j} \exp\{\beta_0 + \beta_{(j)}' x_{(j)}\}

and

    E(Y \mid X_j = c x_j, X_{(j)} = x_{(j)}) \approx (c x_j)^{\beta_j} \exp\{\beta_0 + \beta_{(j)}' x_{(j)}\} \approx c^{\beta_j} E(Y \mid X_j = x_j, X_{(j)} = x_{(j)}).

Thus, changing x_j by a factor of c, holding the other variables fixed, implies a c^{β_j} multiplicative change in E(Y).

Example: UN11 data

    \widehat{\log(\text{fertility})} = 3.507 - .065 \log(\text{ppgdp}) - .028 \, \text{lifeExpF}

Given two countries with the same lifeExpF, where the second has double the ppgdp of the first, we expect the second country to have a fertility rate lower by a factor of 2^{−.065} = .956, i.e., lower by 4.4%.
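As a minimal sketch tying the two interpretations together, assuming the UN11 data frame from the alr4 package (the coefficient values quoted above come from the slides):

    # Fit the log-scale fertility model from the slides
    library(alr4)   # provides UN11
    m <- lm(log(fertility) ~ log(ppgdp) + lifeExpF, data = UN11)
    coef(m)
    # One extra year of lifeExpF multiplies expected fertility by e^beta
    exp(coef(m)["lifeExpF"])      # about .972, i.e., 2.8% lower
    # Doubling ppgdp multiplies expected fertility by 2^beta
    2^coef(m)["log(ppgdp)"]       # about .956, i.e., 4.4% lower

Both multiplicative effects come straight from the fitted coefficients; the only difference is whether the regressor itself is on the log scale.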