FURTHER TOPICS IN REGRESSION ANALYSIS (MULTIPLE LINEAR REGRESSION MODEL)

Dr. E. N. Aidoo
Department of Statistics and Actuarial Science
en.aidoo@yahoo.com
0202901980
September, 2020

Dr Eric Nimako Aidoo, General Linear Regression Models, September 2020

OUTLINE

1 INTRODUCTION
2 LEAST SQUARES ESTIMATION OF THE PARAMETERS
3 MATRIX APPROACH TO MULTIPLE LINEAR REGRESSION
4 REAL DATA LAB SESSION
5 INFERENCE ON THE PARAMETERS

INTRODUCTION

Many applications of regression analysis involve situations in which there is more than one regressor variable. A regression model that contains more than one regressor variable is called a multiple regression model.

For example, suppose that the effective life of a cutting tool depends on the cutting speed and the tool angle. A possible multiple regression model could be

    Y = β0 + β1 x1 + β2 x2 + ε

where:
    Y  - tool life (the dependent variable)
    x1 - cutting speed
    x2 - tool angle
x1 and x2 denote the independent variables.

β1 and β2 are partial regression coefficients:
β1 measures the expected change in Y per unit change in x1 when x2 is held constant, and β2 measures the expected change in Y per unit change in x2 when x1 is held constant.
ε denotes the error term (the errors are assumed to be normally distributed with mean 0 and constant variance σ²).

LEAST SQUARES ESTIMATION OF THE PARAMETERS

The least squares function is

    L = Σ_{i=1}^n εi² = Σ_{i=1}^n ( yi − β0 − Σ_{j=1}^k βj xij )²

The least squares estimates β̂0, β̂1, ..., β̂k must satisfy

    ∂L/∂β0 = −2 Σ_{i=1}^n ( yi − β̂0 − Σ_{j=1}^k β̂j xij ) = 0                    (1)

and

    ∂L/∂βj = −2 Σ_{i=1}^n ( yi − β̂0 − Σ_{j=1}^k β̂j xij ) xij = 0,  j = 1, 2, ..., k   (2)

Simplifying Equations (1)-(2), we obtain the least squares normal equations:

    n β̂0      + β̂1 Σ xi1      + β̂2 Σ xi2      + ··· + β̂k Σ xik      = Σ yi
    β̂0 Σ xi1  + β̂1 Σ xi1²     + β̂2 Σ xi1 xi2  + ··· + β̂k Σ xi1 xik  = Σ xi1 yi
       ⋮             ⋮                ⋮                      ⋮              ⋮
    β̂0 Σ xik  + β̂1 Σ xik xi1  + β̂2 Σ xik xi2  + ··· + β̂k Σ xik²     = Σ xik yi

(all sums run over i = 1, ..., n). The solution to the normal equations gives the least squares estimators of the regression coefficients.

MATRIX APPROACH TO MULTIPLE LINEAR REGRESSION

Suppose the model relating the regressors to the response is

    yi = β0 + β1 xi1 + β2 xi2 + ... + βk xik + εi,   i = 1, 2, ..., n

In matrix notation this model can be written as

    y = X β + ε

where

        | y1 |        | 1  x11  x12  ···  x1k |        | β0 |        | ε1 |
    y = | y2 |,   X = | 1  x21  x22  ···  x2k |,   β = | β1 |,   ε = | ε2 |
        | ⋮  |        | ⋮   ⋮    ⋮         ⋮  |        | ⋮  |        | ⋮  |
        | yn |        | 1  xn1  xn2  ···  xnk |        | βk |        | εn |

The least squares function:

    S(β) = Σ_{i=1}^n εi² = ε′ε = (y − Xβ)′(y − Xβ) = y′y − 2β′X′y + β′X′Xβ

Setting the derivative to zero at β̂:

    ∂S/∂β |_{β̂} = −2X′y + 2X′X β̂ = 0

    X′X β̂ = X′y        (normal equations)

    β̂ = (X′X)⁻¹ X′y

The fitted model corresponding to the observed levels of the regressor variables:

    ŷ = X β̂ = X (X′X)⁻¹ X′ y = H y

H is the hat matrix. It is idempotent and symmetric, i.e. H² = H and H′ = H, so H is an orthogonal projection matrix.
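The closed form β̂ = (X′X)⁻¹X′y and the stated properties of the hat matrix are easy to check numerically. A minimal sketch in Python with NumPy (the data and all variable names here are made up for illustration; they are not from the slides):

```python
import numpy as np

# Hypothetical data: n = 6 observations, k = 2 regressors (illustration only)
X = np.column_stack([
    np.ones(6),                       # intercept column of ones
    [2.0, 3.0, 5.0, 6.0, 8.0, 9.0],   # x1
    [1.0, 4.0, 2.0, 5.0, 7.0, 6.0],   # x2
])
y = np.array([3.0, 4.0, 5.0, 7.0, 9.0, 10.0])

# Normal equations X'X beta = X'y, solved for beta_hat
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Hat matrix H = X (X'X)^(-1) X', fitted values, residuals
H = X @ np.linalg.inv(X.T @ X) @ X.T
y_fitted = H @ y              # same as X @ beta_hat
residuals = y - y_fitted      # e = (I - H) y

# H is symmetric and idempotent, i.e. an orthogonal projection
assert np.allclose(H, H.T)
assert np.allclose(H @ H, H)
```

In practice one would call `np.linalg.lstsq` (or fit with `lm` in R, as the slides do) rather than explicitly inverting X′X, which is numerically less stable.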
Residuals:

    e = y − ŷ = y − Hy = (I − H) y

Estimating σ²

An unbiased estimator of σ² is

    σ̂² = Σ_{i=1}^n ei² / (n − p) = SSE / (n − p)

where:
    e represents the estimated residuals from the model
    p represents the number of regression coefficients
    n represents the number of observations used

Example

Fit a multiple regression to the data below using the matrix approach.

    y   x1  x2
    3   2   1
    2   3   5
    4   5   3
    5   7   6
    8   8   7

The model y = Xβ + ε in matrix form:

    | 3 |   | 1  2  1 |          | ε1 |
    | 2 |   | 1  3  5 |   | β0 |  | ε2 |
    | 4 | = | 1  5  3 | · | β1 | + | ε3 |
    | 5 |   | 1  7  6 |   | β2 |  | ε4 |
    | 8 |   | 1  8  7 |          | ε5 |

Then

          |  5   25   22 |   |  n     Σx1     Σx2   |
    X′X = | 25  151  130 | = | Σx1    Σx1²   Σx1x2  |
          | 22  130  120 |   | Σx2   Σx2x1    Σx2²  |

and its inverse is

               |  1.201  −0.138  −0.071 |
    (X′X)⁻¹ =  | −0.138   0.114  −0.098 |
               | −0.071  −0.098   0.128 |

Also

          | 22  |   |  Σy   |
    X′y = | 131 | = | Σx1 y |
          | 111 |   | Σx2 y |

so that

    β̂ = (X′X)⁻¹ X′y:

    | β̂0 |   |  1.201  −0.138  −0.071 |   | 22  |   |  0.50 |
    | β̂1 | = | −0.138   0.114  −0.098 | · | 131 | = |  1.00 |
    | β̂2 |   | −0.071  −0.098   0.128 |   | 111 |   | −0.25 |

    ŷi = 0.50 + 1.00 xi1 − 0.25 xi2

R codes

> y = c(3,2,4,5,8)
> x1 = c(2,3,5,7,8)
> x2 = c(1,5,3,6,7)
> ExpA = data.frame(y,x1,x2)
> ExpA
  y x1 x2
1 3  2  1
2 2  3  5
3 4  5  3
4 5  7  6
5 8  8  7

R output

> cor(ExpA)
       y    x1    x2
y  1.000 0.894 0.640
x1 0.894 1.000 0.814
x2 0.640 0.814 1.000
> modelA = lm(y ~ x1 + x2, data = ExpA)
> summary(modelA)

[The summary(modelA) output was shown as a screenshot on the slide.]

Interpretation of model parameters

The fitted equation is: ŷ = 0.5 + 1.0 x1 − 0.25 x2
- ŷ is expected to increase by 1 for a unit increase in x1 whilst keeping x2 constant
- ŷ is expected to decrease by 0.25 for a unit increase in x2 whilst keeping x1 constant
- ŷ is expected to be 0.5 on average when x1 and x2 are zero

REAL DATA LAB SESSION

Example: Actuaries Salary Survey

An insurance firm collected data for a sample of 20 actuaries. A suggestion was made that regression analysis could be used to determine if salary was related to the years of experience and the score on the firm's aptitude test. The years of experience, score on the aptitude test, and corresponding annual salary (GHc 1,000) for the sample of 20 actuaries are shown below.

    Exper.  Score  Salary  |  Exper.  Score  Salary
    4       78     24.0    |  9       88     38.0
    7       100    43.0    |  2       73     26.6
    1       86     23.7    |  10      75     36.2
    5       82     34.3    |  5       81     31.6
    8       86     35.8    |  6       74     29.0
    10      84     38.0    |  8       87     34.0
    0       75     22.2    |  4       79     30.1
    1       80     23.1    |  6       94     33.9
    6       83     30.0    |  3       70     28.2
    6       91     33.0    |  3       89     30.0

Suppose we believe that salary (y) is related to the years of experience (x1) and the score on the actuaries' aptitude test (x2) by the following regression model:

    y = β0 + β1 x1 + β2 x2 + ε

where
    y  = annual salary (GHc 1,000)
    x1 = years of experience
    x2 = score on aptitude test

R code

> cor(salary1)
       Salary Exper Score
Salary  1.000 0.855 0.589
Exper   0.855 1.000 0.336
Score   0.589 0.336 1.000
> modelc = lm(Salary ~ Exper + Score, data = salary1)
> summary(modelc)

[The summary(modelc) output was shown as a screenshot on the slide.]

Interpreting the Coefficients

Model: Salary = 3.174 + 1.404 (Exper) + 0.251 (Score)

Note: predicted salary will be in thousands of cedis.
- Salary is expected to increase by GHc 1,404 for each additional year of experience (when the score on the aptitude test is held constant).
- Salary is expected to increase by GHc 251 for each additional point scored on the aptitude test (when the years of experience is held constant).

Multiple Linear Regression with three predictors

    y1 = β̂0 + β̂1 X11 + β̂2 X12 + β̂3 X13 + ε1
    y2 = β̂0 + β̂1 X21 + β̂2 X22 + β̂3 X23 + ε2
    y3 = β̂0 + β̂1 X31 + β̂2 X32 + β̂3 X33 + ε3
     ⋮
    yn = β̂0 + β̂1 Xn1 + β̂2 Xn2 + β̂3 Xn3 + εn
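The salary fit reported above can be reproduced outside R as well. A minimal sketch in Python with NumPy, using the 20 observations from the salary table (the variable names are mine, not from the slides):

```python
import numpy as np

# Actuaries salary data, copied from the lab-session table
exper = np.array([4, 7, 1, 5, 8, 10, 0, 1, 6, 6,
                  9, 2, 10, 5, 6, 8, 4, 6, 3, 3], dtype=float)
score = np.array([78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
                  88, 73, 75, 81, 74, 87, 79, 94, 70, 89], dtype=float)
salary = np.array([24.0, 43.0, 23.7, 34.3, 35.8, 38.0, 22.2, 23.1, 30.0, 33.0,
                   38.0, 26.6, 36.2, 31.6, 29.0, 34.0, 30.1, 33.9, 28.2, 30.0])

# Design matrix with an intercept column, then beta_hat = (X'X)^(-1) X'y
X = np.column_stack([np.ones(len(salary)), exper, score])
beta_hat = np.linalg.solve(X.T @ X, X.T @ salary)

# beta_hat ≈ [3.174, 1.404, 0.251], matching the fitted model on the slide
```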
In matrix notation this is briefly expressed as y = X β̂ + ε:

    | y1 |   | 1  X11  X12  X13 |          | ε1 |
    | y2 |   | 1  X21  X22  X23 |   | β̂0 |  | ε2 |
    | y3 | = | 1  X31  X32  X33 | · | β̂1 | + | ε3 |
    | ⋮  |   | ⋮   ⋮    ⋮    ⋮  |   | β̂2 |  | ⋮  |
    | yn |   | 1  Xn1  Xn2  Xn3 |   | β̂3 |  | εn |

i.e. row by row,

    yi = β̂0 + β̂1 Xi1 + β̂2 Xi2 + β̂3 Xi3 + εi

Try it yourself: use R to fit a multiple regression model to this data.

ANOVA IN MULTIPLE REGRESSION

Sum of Squares

The least squares method allows us to verify the following equality:

    ε′ε = (y − X β̂)′(y − X β̂)
        = y′y − 2β̂′X′y + β̂′X′X β̂
        = y′y − 2β̂′X′y + β̂′X′X (X′X)⁻¹ X′y
        = y′y − 2β̂′X′y + β̂′X′y
        = y′y − β̂′X′y

Partition of Sum of Squares

Since in general ε′ε = y′y − β̂′X′y, it is possible to show that the sum of squares of the deviations of y from its average decomposes into the sum of squares due to regression and the sum of squares due to error:

    y′y − nȳ² = (β̂′X′y − nȳ²) + ε′ε
    y′y − nȳ² = (β̂′X′y − nȳ²) + (y′y − β̂′X′y)

In summary:

    SSreg = β̂′X′y − (Σy)²/n
    SSres = y′y − β̂′X′y
    SStot = y′y − nȳ²

    SStot = SSres + SSreg

ANOVA Table for Salary Example

[The ANOVA table was shown as a screenshot on the slide.]

Coefficient of Determination R²

    R² = SSR / SST
    R² = 500.3285 / 599.7855 = 0.83418

For the salary data, we find that R² = 0.83. Thus, the model accounts for about 83% of the variability in the salary response.

Adjusted Coefficient of Determination Ra²

Because the coefficient of determination depends on both the number of observations (n) and the number of independent variables (p), it is convenient to correct it for the degrees of freedom; hence the use of the adjusted coefficient of determination:

    Ra² = 1 − (1 − R²) · (n − 1) / (n − p − 1)

    Ra² = 1 − (1 − 0.834179) · (20 − 1) / (20 − 2 − 1) = 0.814671

R² and Ra²

[The R² and adjusted R² values from the R software output were shown as a screenshot on the slide.]
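The sum-of-squares partition and both coefficients of determination for the salary example can be checked numerically. A sketch in Python with NumPy (the data repeat the salary table; the variable names are mine):

```python
import numpy as np

# Actuaries salary data (from the lab-session table)
exper = np.array([4, 7, 1, 5, 8, 10, 0, 1, 6, 6,
                  9, 2, 10, 5, 6, 8, 4, 6, 3, 3], dtype=float)
score = np.array([78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
                  88, 73, 75, 81, 74, 87, 79, 94, 70, 89], dtype=float)
y = np.array([24.0, 43.0, 23.7, 34.3, 35.8, 38.0, 22.2, 23.1, 30.0, 33.0,
              38.0, 26.6, 36.2, 31.6, 29.0, 34.0, 30.1, 33.9, 28.2, 30.0])

n, p = len(y), 2                               # 20 observations, 2 regressors
X = np.column_stack([np.ones(n), exper, score])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Sum-of-squares partition: SStot = SSreg + SSres
ybar = y.mean()
ss_tot = y @ y - n * ybar**2                   # y'y - n*ybar^2
ss_res = y @ y - beta_hat @ (X.T @ y)          # y'y - beta'X'y
ss_reg = beta_hat @ (X.T @ y) - n * ybar**2    # beta'X'y - n*ybar^2

r2 = ss_reg / ss_tot                           # coefficient of determination
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)  # adjusted for degrees of freedom
```

This reproduces the slide's values: SSreg ≈ 500.33, SStot ≈ 599.79, R² ≈ 0.834 and Ra² ≈ 0.815.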