Multiple Regression I KNNL – Chapter 6 Models with Multiple Predictors • Most Practical Problems have more than one potential predictor variable • Goal is to determine effects (if any) of each predictor, controlling for others • Can include polynomial terms to allow for nonlinear relations • Can include product terms to allow for interactions when effect of one variable depends on level of another variable • Can include “dummy” variables for categorical predictors First-Order Model with 2 Numeric Predictors Yi 0 1 X i1 2 X i 2 i E i 0 E Y 0 1 X1 2 X 2 Plane in 3-dimensions E(Y)=100-5X1+10X2 200 E(Y) 150 100 150-200 50 100-150 01 4 7 10 X1 13 16 19 50-100 0-50 Interpretation of Regression Coefficients • Additive: E{Y} = 0 1X1 + 2X2 ≡ Mean of Y @ X1, X2 • 0 ≡ Intercept, Mean of Y when X1=X2=0 • 1 ≡ Slope with Respect to X1 (effect of increasing X1 by 1 unit, while holding X2 constant) • 2 ≡ Slope with Respect to X2 (effect of increasing X2 by 1 unit, while holding X1 constant) • These can also be obtained by taking the partial derivatives of E{Y} with respect to X1 and X2, respectively • Interaction Model: E{Y} = 0 1X1 + 2X2 + 3X1X2 • When X2 = 0: Effect of increasing X1 by 1: 1(1)+3(1)(0)= 1 • When X2 = 1: Effect of increasing X1 by 1: 1(1)+3(1)(1)= 1+3 • The effect of increasing X1 depends on level of X2, and vice versa General Linear Regression Model Yi 0 1 X i1 2 X i 2 ... p 1 X i , p 1 i p 1 Yi 0 k X ik i k 1 p 1 Yi k X ik i k 0 where: X i 0 1 E i 0 E Y 0 1 X 1 2 X 2 ... p 1 X p 1 (Hyperplane in p -dimensions) p 1 1 Simple linear regression Normality, independence, and constant variance for errors: i ~ NID 0, 2 Yi ~ N 0 1 X i1 2 X i 2 ... p 1 X i , p 1 , 2 Yi , Y j 0 i j Special Types of Variables/Models - I • p-1 distinct numeric predictors (attributes) Y = Sales, X1=Advertising, X2=Price • Categorical Predictors – Indicator (Dummy) variables, representing m-1 levels of a m level categorical variable Y = Salary, X1=Experience, X2=1 if College Grad, 0 if Not • Polynomial Terms – Allow for bends in the Regression Y=MPG, X1=Speed, X2=Speed2 • Transformed Variables – Transformed Y variable to achieve linearity Y’=ln(Y) Y’=1/Y Special Types of Variables/Models - II • Interaction Effects – Effect of one predictor depends on levels of other predictors Y = Salary, X1=Experience, X2=1 if Coll Grad, 0 if Not, X3=X1X2 E(Y) = 0 1X1 2X2 3X1X2 Non-College Grads (X2 =0): E(Y) = 0 1X1 2(0) 3X1(0)= 0 1X1 College Grads (X2 =1): E(Y) = 0 1X1 2(1) 3X1(1)= 0 21 3 X1 • Response Surface Models E(Y) = 0 1X1 2X12 3X2 4X22 5X1X2 • Note: Although the Response Surface Model has polynomial terms, it is linear wrt Regression parameters Matrix Form of Regression Model Yi 0 1 X i1 2 X i 2 ... p 1 X i , p 1 i i 1,..., n Matrix Form: Y1 Y Y 2 n1 Yn 1 X 11 1 X 21 X n p 1 X n1 0 1 β p1 p 1 1 ε 2 n1 n Y X β ε n1 n p p1 n1 X 12 X 22 X n2 X 1, p 1 X 2, p 1 X n , p 1 0 0 E ε n×1 0 E Y = EX β + ε X β n×1 n×p p×1 n×1 n p p1 2 0 2 0 2 σ ε n1 0 0 σ 2 Y 2I n1 0 0 2I 2 Least Squares Estimation of Regression Coefficients Goal: Minimize: Q Yi 0 1 X i1 ... p 1 X i , p 1 n i 1 n 2 i 2 i 1 Obtain Estimates of 0 , 1 ,..., p 1 that minimize Q b0 , b1 ,..., bp 1 b0 b -1 1 Normal Equations: X ' X b X ' Y b X'X X'Y p p p1 p1 p1 b p 1 Maximum Likelihood also leads to the same estimator b : L β, 2 2 2 n /2 1 exp 2 2 Y X ... X i 0 1 i1 p 1 i , p 1 i 1 n 2 Y n since maximizing L involves minimizing i 1 i 0 1 X i1 ... p 1 X i , p 1 2 Fitted Values and Residuals ^ Y 1 ^ ^ Fitted Values: Y Y 2 n1 Y^ n ^ e1 e Residuals: e 2 n1 en Y X b X X'X X'Y = HY H = X X'X X' H = H' = HH -1 n1 -1 n p p1 ^ e = Y Y Y X b Y X X'X X'Y = I H Y n1 -1 n1 n1 2 ^ n1 n p p1 σ Y = σ HY = Hσ Y H' = σ H 2 2 2 I H I H ' I H I H ^ s Y MSE H 2 σ 2 e = σ 2 I H Y = I H σ 2 Y I H ' = σ 2 I H s 2 e MSE I H Analysis of Variance – Sums of Squares ^ Y 1 ^ ^ Y Y 2 Xb HY n1 Y^ n Y1 Y Y 2 n1 Yn n SSTO Yi Y i 1 Y 1 Y 1 Y n1 n 1 Y Y1 1 Y2 1 JY n 1 Yn Y Y ' Y Y Y' I 1n J Y 2 2 ^ ^ ^ SSE Yi Y i Y Y ' Y Y Y' I H Y Y'Y - b'X'Y i 1 n MSE 1 1 ^ ^ ^ SSR Y i Y Y Y ' Y Y Y' H J Y b'X'Y Y' JY n n i 1 SSE n p 2 n MSR E MSE 2 p 1 p 1 E MSR SS kk k k ' SS kk ' 2 k 1 E MSR E MSE 2 k k 1 k ' k n SS kk ' X ik X k i 1 X E MSR E MSE 1 ... p 1 0 ik ' X k' SSR p 1 ANOVA Table, F-test, and R2 Analysis of Variance (ANOVA) Table Source df Sum of Squares Mean Square ------------------------------------------------------------------------------------------------------------------------------------Regression p 1 1 SSR b'X'Y - Y' JY n Error n p SSE Y'Y - b'X'Y Total n 1 1 SSTO Y'Y - Y' JY n Test of H 0 : 1 ... p 1 0 Test Statistic: F * MSR MSE E Y MSR MSE SSR p 1 SSE n p H A : Not all k 0 0 Rejection Region: F * F 1 ; p 1, n p P-value= Pr F p 1, n p F * Coefficient of Multiple Determination: R 2 SSR SSE 1 SSTO SSTO SSE n p 1 n 1 SSE 2 Adjusted-R 1 SSTO n p SSTO n 1 Correlation: R R 2 places a "penalty" on models with extra predictors Inferences Regarding Regression Parameters E ε = 0 Y = Xβ + ε σ 2 ε = 2I E Y = Xβ σ 2 Y = 2I E b = E X'X X'Y = X'X X'E Y = X'X X'Xβ = β -1 -1 -1 b0 , bp 1 2 b0 b0 , b1 b1 , b0 2 b1 2 σ b bp 1 , b0 bp 1 , b1 σ 2 b = σ 2 X'X -1 s 2 b = MSE X'X bk k ~ tn p s bk b1 , bp 1 2 bp 1 s 2 b0 s b0 , b1 s b1 , b0 s 2 b1 2 s b s bp 1 , b0 s bp 1 , b1 X'Y X'X X'σ 2 Y X'X X' ' 2 X'X X'X X'X 2 X'X -1 -1 -1 -1 1 100% CI for k bk t 1 Test of H 0 : k 0 H A : k 0 Test Statistic: t* Rejection Region: t * t 1 ; n p 2 ; n p s bk 2 bk s bk P-value=2Pr t n p t * Simultaneous 1 100% CI s for g p s : bk t 1 ; n p s bk 2g -1 s b0 , b p 1 s b1 , b p 1 s 2 bp 1 -1 Estimating Mean Response at Specific X-levels Given set of levels of X 1 ,..., X p 1 : X h1 ,..., X h , p 1 1 X h1 Xh p1 X h , p 1 ^ E Yh X β ' h ^ E Yh X β Y h X'h b ' h ^ Y h X σ b X h X X'X X h 2 1 100% CI for E ^ Yh ' h 2 2 -1 ' h ^ s Y h MSE X 'h X'X X h 2 -1 ^ : Y h t 1 ; n p s Y h 2 ^ 1 100% Confidence Region for Regression Surface: ^ ^ ^ 1 100% CI for several (g ) E Y h : Y h Bs Y h ^ ^ Y h Ws Y h W 2 pF 1 ; p, n p B t 1 ;n p 2g Predicting New Response(s) at Specific X-levels Given set of levels of X 1 ,..., X p 1 : X h1 ,..., X h , p 1 1 X h1 Xh p1 X h , p 1 E Yh X β ' h s 2 pred MSE 1 X'h X'X X h -1 ^ Y h X'h b -1 1 mean of m observations (at same X-levels): s 2 predmean MSE X'h X'X X h m 1 100% CI for Yh (new) : Y h t 1 ; n p s pred 2 ^ ^ Scheffe: 1 100% CI for several (g ) Yh (new) : Y h Ss pred ^ Bonferroni: 1 100% CI for several (g ) Yh (new) : Y h Bs pred S 2 gF 1 ; g , n p B t 1 ;n p 2g