Objectives of Multiple Regression • Establish the linear equation that best predicts values of a dependent variable Y using more than one explanatory variable from a large set of potential predictors {x1, x2, ... xk}. • Find that subset of all possible predictor variables that explains a significant and appreciable proportion of the variance of Y, trading off adequacy of prediction against the cost of measuring more predictor variables. 15-1 Expanding Simple Linear Regression • Quadratic model. Y = b0 + b1x1 + b2x1 + e 2 • General polynomial model. Y = b0 + b1 x 1 + b2 x 1 2 + b3x13 + ... + bkx1k + e Adding one or more polynomial terms to the model. Any independent variable, xi, which appears in the polynomial regression model as xik is called a kth-degree term. 15-2 Polynomial model shapes. 10 20 30 x Adding one more terms to the model significantly improves the model fit. 40 1.5 2.0 2.5yn 3.0 3.5 1.5 2.0 2.5y 3.0 3.5 Linear 10 Quadratic 20 30 40 xn 15-3 Incorporating Additional Predictors Simple additive multiple regression model y = b0 + b1x1 + b2x2 + b3x3 + ... + bkxk + e Additive (Effect) Assumption - The expected change in y per unit increment in xj is constant and does not depend on the value of any other predictor. This change in y is equal to bj. 15-4 Additive regression models: For two independent variables, the response is modeled as a surface. 15-5 Interpreting Parameter Values (Model Coefficients) • “Intercept” - value of y when all predictors are 0. b0 • “Partial slopes” b1, b2, b3, ... bk bj - describes the expected change in y per unit increment in xj when all other predictors in the model are held at a constant value. 15-6 Graphical depiction of bj. b1 - slope in direction of x1. b2 - slope in direction of x2. 15-7 Multiple Regression with Interaction Terms Y = b0 + b1x1 + b2x2 + b3x3 + ... + bkxk + b12x1x2 + b13x1x3 + cross-product terms quantify the interaction among predictors. ... + b1kx1xk + ... + bk-1,kxk-1xk + e Interactive (Effect) Assumption: The effect of one predictor, xi, on the response, y, will depend on the value of one or more of the other predictors. 15-8 Interpreting Interaction Interaction Model y i b 0 b 1 x i1 b 2 x i 2 b 12 x i1 x i 2 e i or Define: x i 3 x i1 x i 2 No difference y i b 0 b 1 x i1 b 2 x i 2 b 12 x i 3 e i b1 – No longer the expected change in Y per unit increment in X1! b12 – No easy interpretation! The effect on y of a unit increment in X1, now depends on X2. 15-9 x2=2 no-interaction y }b } x2=1 x2=0 2 b2 b1 b0 x1 y b1 b 0+ 2 b 2 b0+b2 b0 x2=0 interaction x2=1 b1+2b12 x2=2 x1 15-10 Multiple Regression models with interaction: Lines move apart Lines come together 15-11 Effect of the Interaction Term in Multiple Regression Surface is twisted. 15-12 A Protocol for Multiple Regression Identify all possible predictors. Establish a method for estimating model parameters and their standard errors. Develop tests to determine if a parameter is equal to zero (i.e. no evidence of association). Reduce number of predictors appropriately. Develop predictions and associated standard error. 15-13 Estimating Model Parameters Least Squares Estimation Assuming a random sample of n observations (yi, xi1,xi2,...,xik), i=1,2,...,n. The estimates of the parameters for the best predicting equation: yˆ i bˆ 0 bˆ 1 x i1 bˆ 2 x i 2 bˆ k x ik Is found by choosing the values: bˆ 0 , bˆ 1 , , bˆ k which minimize the expression: n SSE ( y i yˆ i ) 2 i1 n ( y i b 0 b 1x i1 b 2 x i 2 b k x ik ) 2 i 1 15-14 Normal Equations Take the partial derivatives of the SSE function with respect to b0, b1,…, bk, and equate each equation to 0. Solve this system of k+1 equations in k+1 unknowns to obtain the equations for the parameter estimates. n 1 y i n bˆ 0 i1 n x i1 i1 x i1bˆ 0 i1 bˆ 1 i1 n x i1bˆ 1 2 x ik bˆ k x i1 x ik bˆ k i1 n i1 i1 n x ik x n i1 n x i1 y i n i1 n x ik y i i1 x ik bˆ 0 n i1 x ik x i1bˆ 1 n x ik bˆ k 2 i1 15-15 An Overall Measure of How Well the Full Model Performs Coefficient of Multiple Determination • Denoted as R2. • Defined as the proportion of the variability in the dependent variable y that is accounted for by the independent variables, x1, x2, ..., xk, through the regression model. • With only one independent variable (k=1), R2 = r2, the square of the simple correlation coefficient. 15-16 Computing the Coefficient of Determination R 2 y x1 x 2 x k SSR S yy SSE S yy , 0 R 1 2 S yy n S yy ( y i y ) TSS 2 i 1 n SSE 2 ( y i yˆ i ) y bˆ 0 y i bˆ1 x1 i y i bˆ k x ik y i i 1 n i 1 n n n i 1 i 1 i 1 2 i 15-17 Multicollinearity A further assumption in multiple regression (absent in SLR), is that the predictors (x1, x2, ... xk) are statistically uncorrelated. That is, the predictors do not co-vary. When the predictors are significantly correlated (correlation greater than about 0.6) then the multiple regression model is said to suffer from problems of multicollinearity. r=0 r = 0.8 r = 0.6 5 5 6 4 3 x2 4 x2 x2 3 1 2 1 0 2 1 1 2 3 x 1 4 5 1 0 1 2 3 x 1 4 5 6 0 2 4 6 x 1 15-18 Effect of Multicollinearity on the Fitted Surface Extreme collinearity y x x x xx x x 2 x x x1 15-19 Multicollinearity leads to • Numerical instability in the estimates of the regression parameters – wild fluctuations in these estimates if a few observations are added or removed. • No longer have simple interpretations for the regression coefficients in the additive model. Ways to detect multicollinearity • Scatterplots of the predictor variables. • Correlation matrix for the predictor variables – the higher these correlations the worse the problem. • Variance Inflation Factors (VIFs) reported by software packages. Values larger than 10 usually signal a substantial amount of collinearity. What can be done about multicollinearity • Regression estimates are still OK, but the resulting confidence/prediction intervals are very wide. • Choose explanatory variables wisely! (E.g. consider omitting one of two highly correlated variables.) • More advanced solutions: principal components analysis; ridge regression. 15-20 Testing in Multiple Regression • Testing individual parameters in the model. • Computing predicted values and associated standard errors. Y b 0 b1 X 1 b k X k e, e ~ N (0, 2 ) Overall AOV F-test H0: None of the explanatory variables is a significant predictor of Y F Reject if: SSR / k SSE /( n k 1) MSR MSE F F k , n k 1 , 15-21 Standard Error for Partial Slope Estimate The estimated standard error for: s bˆ ˆ e j and bˆ j 1 n ( k 1) where S x j x j (1 R x j x1 x 2 x j 1 x j 1 x k ) 2 2 R x j x1 x 2 x j 1 x j 1 x k n S xjxj 0 s bˆ j ij xj) 2 i 1 If there is high dependency? R x j x 1 x 2 x j 1 x j 1 x k 1 2 2 x j x1 x 2 x j 1 x j 1 x k (x is the coefficient of determination for the model with xj as the dependent variable and all other x variables as predictors. What happens if all the predictors are truly independent of each other? R SSE ˆ e ˆ e s bˆ j S xjxj 15-22 Confidence Interval 100(1-)% Confidence Interval for bˆ j bˆ j t n ( k 1), s bˆ 2 j Reflects the number of data points minus the number of parameters that have to be estimated. df for SSE 15-23 Testing whether a partial slope coefficient is equal to zero. H0 bj 0 Rejection Region: Alternatives: Ha bj 0 t t n ( k 1), bj 0 t t n ( k 1), bj 0 | t | t n ( k 1), Test Statistic: t 2 bˆ j s bˆ j 15-24 Predicting Y • We use the least squares fitted value, yˆ , as our predictor of a single value of y at a particular value of the explanatory variables (x1, x2, ..., xk). • The corresponding interval about the predicted value of y is called a prediction interval. • The least squares fitted value also provides the best predictor of E(y), the mean value of y, at a particular value of (x1, x2, ..., xk). The corresponding interval for the mean prediction is called a confidence interval. • Formulas for these intervals are much more complicated than in the case of SLR; they cannot be calculated by hand (see the book). 15-25 Minimum R2 for a “Significant” Regression Since we have formulas for R2 and F, in terms of n, k, SSE and TSS, we can relate these two quantities. We can then ask the question: what is the min R2 which will ensure the regression model will be declared significant, as measured by the appropriate quantile from the F distribution? The answer (below), shows that this depends on n, k, and SSE/TSS. R 2 min k SSE n k 1 TSS F k , n k 1 , 15-26 Minimum R2 for Simple Linear Regression (k=1) 15-27