Chapter 3 Multiple Linear Regression
Ray-Bing Chen
Institute of Statistics, National University of Kaohsiung

3.1 Multiple Regression Models
• A multiple regression model involves more than one regressor variable.
• Example: the yield in pounds of conversion depends on the temperature and the catalyst concentration, e.g.
  E(y) = 50 + 10 x1 + 7 x2
• The response y may be related to k regressor or predictor variables through the multiple linear regression model:
  y = β0 + β1 x1 + β2 x2 + … + βk xk + ε
• The parameter βj represents the expected change in the response y per unit change in xj when all of the remaining regressor variables xi (i ≠ j) are held constant.
• Multiple linear regression models are often used as empirical models or approximating functions, since the true functional relationship is unknown.
• The cubic model: y = β0 + β1 x + β2 x² + β3 x³ + ε
• The model with interaction effects: y = β0 + β1 x1 + β2 x2 + β12 x1 x2 + ε
• Any regression model that is linear in the parameters is a linear regression model, regardless of the shape of the surface that it generates.
• The second-order model with interaction: y = β0 + β1 x1 + β2 x2 + β11 x1² + β22 x2² + β12 x1 x2 + ε

3.2 Estimation of the Model Parameters

3.2.1 Least-Squares Estimation of the Regression Coefficients
• There are n observations (n > k).
• Assume:
  – The error term ε has E(ε) = 0 and Var(ε) = σ².
  – The errors are uncorrelated.
  – The regressor variables x1, …, xk are fixed.
• The sample regression model:
  yi = β0 + β1 xi1 + β2 xi2 + … + βk xik + εi,  i = 1, …, n
• The least-squares function:
  S(β0, β1, …, βk) = Σi εi² = Σi (yi − β0 − Σj βj xij)²
• The normal equations are obtained by setting the partial derivatives ∂S/∂βj to zero, j = 0, 1, …, k.
• Matrix notation: y = Xβ + ε, where y is n×1, X is n×p, β is p×1, and ε is n×1.
• The least-squares function: S(β) = (y − Xβ)'(y − Xβ). Minimizing it gives the normal equations X'Xβ̂ = X'y, so the least-squares estimator is β̂ = (X'X)⁻¹X'y.
• The fitted model corresponding to the levels of the regressor variables x:
  ŷ = Xβ̂ = X(X'X)⁻¹X'y = Hy
• The hat matrix H = X(X'X)⁻¹X' is idempotent and symmetric, i.e. H² = H and H' = H.
• H is an orthogonal projection matrix: it projects y onto the column space of X.
• Residuals: e = y − ŷ = (I − H)y

• Example 3.1 The Delivery Time Data
  – y: the delivery time
  – x1: the number of cases of product stocked
  – x2: the distance walked by the route driver
  – Consider the model y = β0 + β1 x1 + β2 x2 + ε

3.2.2 A Geometrical Interpretation of Least Squares
• y = (y1, …, yn)' is the vector of observations.
• X contains p (p = k + 1) column vectors, each n×1, i.e. X = (1, x1, …, xk).
• The column space of X is called the estimation space.
• Any point in the estimation space is of the form Xβ.
• Least squares minimizes the squared distance S(β) = (y − Xβ)'(y − Xβ).
• Normal equations: X'(y − Xβ̂) = 0, i.e. the residual vector is orthogonal to the estimation space.

3.2.3 Properties of the Least-Squares Estimators
• Unbiasedness: E(β̂) = E((X'X)⁻¹X'y) = E((X'X)⁻¹X'(Xβ + ε)) = β
• Covariance matrix: Cov(β̂) = σ²(X'X)⁻¹
• Let C = (X'X)⁻¹; then Var(β̂j) = σ²Cjj and Cov(β̂i, β̂j) = σ²Cij.
• The least-squares estimator is the best linear unbiased estimator (Gauss–Markov theorem).
• The LSE coincides with the MLE under the normality assumption.

3.2.4 Estimation of σ²
• Residual sum of squares:
  SSRes = e'e = (y − Xβ̂)'(y − Xβ̂) = y'y − 2β̂'X'y + β̂'X'Xβ̂ = y'y − β̂'X'y
• The degrees of freedom: n − p.
• The unbiased estimator of σ² is the residual mean square:
  MSRes = SSRes/(n − p)

• Example 3.2 The Delivery Time Data
  – Both estimates of σ² are in a sense correct, but they depend heavily on the choice of model.
  – The model with the smaller residual variance would generally be preferred.

3.2.5 Inadequacy of Scatter Diagrams in Multiple Regression
• In simple linear regression, the scatter diagram is an important tool for analyzing the relationship between y and x.
• However, it may not be useful in multiple regression. Consider y = 8 − 5 x1 + 12 x2:
  – The plot of y versus x1 does not exhibit any apparent relationship between y and x1.
  – The plot of y versus x2 indicates a linear relationship with a slope of about 8.
• In this case, constructing scatter diagrams of y versus xj (j = 1, 2, …, k) can be misleading.
• A matrix of scatterplots is most useful when there is only one (or a few) dominant regressor, or when the regressors operate nearly independently; a small numerical sketch of the misleading case is given below.
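The following sketch is not from the text; it is a minimal numpy illustration of Sections 3.2.1 and 3.2.5 under assumed synthetic data generated from y = 8 − 5 x1 + 12 x2 with correlated regressors. It computes β̂ = (X'X)⁻¹X'y, checks the hat-matrix properties, forms MSRes, and shows how the marginal y-versus-x1 slope can hide the true partial coefficient −5.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): y = 8 - 5*x1 + 12*x2 + error,
# with x1 and x2 positively correlated so the y-vs-x1 scatter is misleading.
n = 30
x1 = rng.uniform(0, 10, n)
x2 = 0.8 * x1 + rng.normal(0, 1.0, n)          # x2 depends on x1
y = 8 - 5 * x1 + 12 * x2 + rng.normal(0, 2.0, n)

# Model matrix X = (1, x1, x2), with p = k + 1 columns
X = np.column_stack([np.ones(n), x1, x2])
n_obs, p = X.shape

# Least-squares estimate: beta_hat = (X'X)^{-1} X'y (solve rather than explicit inverse)
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)

# Hat matrix H = X (X'X)^{-1} X'; fitted values and residuals
H = X @ np.linalg.solve(XtX, X.T)
y_hat = H @ y
e = y - y_hat

# Projection properties H^2 = H and H' = H (up to rounding error)
assert np.allclose(H @ H, H) and np.allclose(H, H.T)

# Residual mean square: unbiased estimate of sigma^2 with n - p degrees of freedom
SS_res = e @ e
MS_res = SS_res / (n_obs - p)

# Marginal (simple regression) slope of y on x1 alone, for comparison
slope_y_x1 = np.polyfit(x1, y, 1)[0]

print("beta_hat (intercept, x1, x2):", np.round(beta_hat, 2))
print("MS_res:", round(MS_res, 2))
print("slope of y vs x1 alone:", round(slope_y_x1, 2), "(true partial effect is -5)")
```

Because x2 moves together with x1, the marginal slope of y on x1 is positive even though the partial coefficient is −5, which is exactly the point of Section 3.2.5.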
3.2.6 Maximum-Likelihood Estimation
• The model is y = Xβ + ε, where ε ~ N(0, σ²I).
• The likelihood function and log-likelihood function:
  L(β, σ²) = (2πσ²)^(−n/2) exp(−(y − Xβ)'(y − Xβ)/(2σ²))
  ln L(β, σ²) = −(n/2) ln(2π) − (n/2) ln(σ²) − (y − Xβ)'(y − Xβ)/(2σ²)
• The MLE of β is the least-squares estimator β̂, and the MLE of σ² is
  σ̃² = (y − Xβ̂)'(y − Xβ̂)/n

3.3 Hypothesis Testing in Multiple Linear Regression
• Questions:
  – What is the overall adequacy of the model?
  – Which specific regressors seem important?
• Assume the errors are independent and follow a normal distribution with mean 0 and variance σ².

3.3.1 Test for Significance of Regression
• Determine whether there is a linear relationship between y and any of the regressors xj, j = 1, 2, …, k.
• The hypotheses are
  H0: β1 = β2 = … = βk = 0
  H1: βj ≠ 0 for at least one j
• The test is based on the ANOVA identity SST = SSR + SSRes.
• Under H0, SSR/σ² ~ χ²(k), SSRes/σ² ~ χ²(n − k − 1), and SSR and SSRes are independent, so
  F0 = (SSR/k) / (SSRes/(n − k − 1)) = MSR/MSRes ~ F(k, n − k − 1)
• Expected mean squares:
  E(MSRes) = σ²
  E(MSR) = σ² + β*'Xc'Xcβ*/k
  where β* = (β1, …, βk)' and Xc is the n×k matrix of centered regressors with (i, j) element xij − x̄j.
• Under H1, F0 follows a noncentral F distribution with k and n − k − 1 degrees of freedom and noncentrality parameter
  λ = β*'Xc'Xcβ*/σ²
• The ANOVA table for significance of regression:
  Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F0
  Regression            SSR              k                    MSR           MSR/MSRes
  Residual              SSRes            n − k − 1            MSRes
  Total                 SST              n − 1

• Example 3.3 The Delivery Time Data

• R² and adjusted R²:
  – R² always increases when a regressor is added to the model, regardless of the value of the contribution of that variable.
  – The adjusted R²:
    R²adj = 1 − [SSRes/(n − p)] / [SST/(n − 1)]
  – The adjusted R² increases when a variable is added to the model only if the addition of that variable reduces the residual mean square.

3.3.2 Tests on Individual Regression Coefficients
• For an individual regression coefficient:
  – H0: βj = 0 versus H1: βj ≠ 0.
  – Let Cjj be the j-th diagonal element of (X'X)⁻¹. The test statistic is
    t0 = β̂j/se(β̂j) = β̂j/√(σ̂²Cjj), which follows t(n − k − 1) under H0.
  – This is a partial or marginal test, because the estimate β̂j depends on all of the other regressors in the model.
  – It is therefore a test of the contribution of xj given the other regressors in the model.

• Example 3.4 The Delivery Time Data

• Testing a subset of regressors: partition the coefficient vector as β = (β1', β2')', where β1 is (p − r)×1 and β2 is r×1, and partition X accordingly as X = (X1, X2), so that y = Xβ + ε = X1β1 + X2β2 + ε. The hypotheses are H0: β2 = 0 versus H1: β2 ≠ 0.
• For the full model, the regression sum of squares is
  SSR(β) = β̂'X'y (p degrees of freedom)
• Under the null hypothesis, the reduced model is y = X1β1 + ε, with regression sum of squares
  SSR(β1) = β̂1'X1'y, where β̂1 = (X1'X1)⁻¹X1'y
• The degrees of freedom are p − r for the reduced model.
• The regression sum of squares due to β2 given that β1 is already in the model is
  SSR(β2|β1) = SSR(β) − SSR(β1)
• This is called the extra sum of squares due to β2, and it has p − (p − r) = r degrees of freedom.
• The test statistic is
  F0 = [SSR(β2|β1)/r] / MSRes, which follows F(r, n − p) under H0.
• If β2 ≠ 0, F0 follows a noncentral F distribution with noncentrality parameter
  λ = (1/σ²) β2'X2'[I − X1(X1'X1)⁻¹X1']X2β2
• Multicollinearity: if the columns of X2 are nearly linear combinations of the columns of X1, the noncentrality parameter is close to zero and this test has essentially no power.
• The test has maximal power when the columns of X1 and X2 are orthogonal to one another.
• Partial F test: given the regressors in X1, it measures the contribution of the regressors in X2.

• Consider y = β0 + β1 x1 + β2 x2 + β3 x3 + ε. Then SSR(β1|β0, β2, β3), SSR(β2|β0, β1, β3), and SSR(β3|β0, β1, β2) are single-degree-of-freedom sums of squares.
• SSR(βj|β0, …, βj−1, βj+1, …, βk) measures the contribution of xj as if it were the last variable added to the model.
• This partial F test is equivalent to the t test of Section 3.3.2 (t0² = F0).
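As a sketch only (not from the text), the following numpy/scipy code computes the overall F statistic of Section 3.3.1, the marginal t statistics of Section 3.3.2, and an extra-sum-of-squares partial F test for the last regressor. The data generation, seed, and variable names are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Assumed synthetic data with k = 2 regressors (illustration only)
n, k = 25, 2
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 30, n)
y = 2.3 + 1.6 * x1 + 0.01 * x2 + rng.normal(0, 3.0, n)

X = np.column_stack([np.ones(n), x1, x2])   # full model matrix, p = k + 1
p = k + 1

def fit(Xm, y):
    """Return beta_hat and the residual sum of squares for model matrix Xm."""
    beta = np.linalg.solve(Xm.T @ Xm, Xm.T @ y)
    e = y - Xm @ beta
    return beta, e @ e

beta_hat, SS_res = fit(X, y)
MS_res = SS_res / (n - p)

# Test for significance of regression, using SST = SSR + SSRes (corrected for the mean)
SS_T = np.sum((y - y.mean()) ** 2)
SS_R = SS_T - SS_res
F0 = (SS_R / k) / MS_res
p_overall = stats.f.sf(F0, k, n - k - 1)

# Marginal t tests: t0 = beta_j / sqrt(MS_res * C_jj), with C = (X'X)^{-1}
C = np.linalg.inv(X.T @ X)
t0 = beta_hat / np.sqrt(MS_res * np.diag(C))
p_t = 2 * stats.t.sf(np.abs(t0), n - p)

# Partial F test for beta_2 given (beta_0, beta_1): extra sum of squares, r = 1
_, SS_res_reduced = fit(X[:, :2], y)        # reduced model drops x2
SS_R_extra = SS_res_reduced - SS_res        # SSR(beta_2 | beta_1, beta_0)
F_partial = (SS_R_extra / 1) / MS_res
p_partial = stats.f.sf(F_partial, 1, n - p)

print("F0 =", round(F0, 3), " p =", round(p_overall, 4))
print("t statistics:", np.round(t0, 3), " p-values:", np.round(p_t, 4))
print("partial F for x2:", round(F_partial, 3), "= t_2^2 =", round(t0[2] ** 2, 3))
```

The last line illustrates the equivalence noted above: the single-degree-of-freedom partial F statistic equals the square of the corresponding t statistic.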
• SST = SSR(β1, β2, β3|β0) + SSRes
• SSR(β1, β2, β3|β0) = SSR(β1|β0) + SSR(β2|β1, β0) + SSR(β3|β1, β2, β0)

• Example 3.5 Delivery Time Data

3.3.3 Special Case of Orthogonal Columns in X
• Model: y = Xβ + ε = X1β1 + X2β2 + ε
• The columns are orthogonal if X1'X2 = 0.
• Since the normal equations are (X'X)β̂ = X'y, orthogonality makes X'X block diagonal:
  [X1'X1    0   ] [β̂1]   [X1'y]
  [  0    X2'X2 ] [β̂2] = [X2'y]
• Hence β̂1 = (X1'X1)⁻¹X1'y and β̂2 = (X2'X2)⁻¹X2'y, so each set of estimates is unaffected by the presence of the other set of regressors.

3.3.4 Testing the General Linear Hypothesis
• Let T be an m×p matrix of constants with rank(T) = r, and consider H0: Tβ = 0.
• Full model: y = Xβ + ε, with
  SSRes(FM) = y'y − β̂'X'y (n − p degrees of freedom)
• Reduced model: y = Zγ + ε, where Z is an n×(p − r) matrix and γ is a (p − r)×1 vector. Then
  γ̂ = (Z'Z)⁻¹Z'y
  SSRes(RM) = y'y − γ̂'Z'y (n − p + r degrees of freedom)
• The difference SSH = SSRes(RM) − SSRes(FM) has r degrees of freedom; SSH is called the sum of squares due to the hypothesis H0: Tβ = 0.
• The test statistic:
  F0 = [SSH/r] / [SSRes(FM)/(n − p)], which follows F(r, n − p) under H0.
• Another form:
  F0 = [β̂'T'[T(X'X)⁻¹T']⁻¹Tβ̂ / r] / [SSRes(FM)/(n − p)]
• For H0: Tβ = c versus H1: Tβ ≠ c, the statistic becomes
  F0 = [(Tβ̂ − c)'[T(X'X)⁻¹T']⁻¹(Tβ̂ − c) / r] / [SSRes(FM)/(n − p)], which follows F(r, n − p) under H0.
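A minimal sketch (not from the text) of the general linear hypothesis test of Section 3.3.4, assuming synthetic data and the hypothesis Tβ = c with T = [0 1 −1] and c = 0, i.e. β1 = β2. It uses the second form of F0, which requires only the full-model fit.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Assumed synthetic data: two regressors with nearly equal true coefficients
n = 40
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 1, n)
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(0, 1.0, n)

X = np.column_stack([np.ones(n), x1, x2])
n_obs, p = X.shape

# Full-model least-squares fit
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat
SS_res_FM = e @ e                       # n - p degrees of freedom

# General linear hypothesis H0: T beta = c with T = [0 1 -1], c = 0 (beta_1 = beta_2)
T = np.array([[0.0, 1.0, -1.0]])
c = np.array([0.0])
r = np.linalg.matrix_rank(T)

# F0 = (T b - c)' [T (X'X)^{-1} T']^{-1} (T b - c) / r  divided by  SSRes(FM)/(n - p)
diff = T @ beta_hat - c
middle = np.linalg.inv(T @ XtX_inv @ T.T)
F0 = (diff @ middle @ diff) / r / (SS_res_FM / (n_obs - p))
p_value = stats.f.sf(F0, r, n_obs - p)

print("F0 =", round(float(F0), 3), " p-value =", round(float(p_value), 4))
```

With the assumed coefficients 2.0 and 2.0, the hypothesis β1 = β2 is true, so F0 should usually be small and the p-value large.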