Multiple regression: OLS method (Mostly from Maddala)

The Ordinary Least Squares method of estimation can easily be extended to models involving two or more explanatory variables, though the algebra becomes progressively more complex. In fact, when dealing with the general regression problem with a large number of variables, we use matrix algebra, but that is beyond the scope of this course. We illustrate the case of two explanatory variables, $X_1$ and $X_2$, with $Y$ the dependent variable. We therefore have the model

$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i,$$

where $u_i \sim N(0, \sigma^2)$. We look for estimators $\hat\alpha$, $\hat\beta_1$, $\hat\beta_2$ so as to minimise the sum of squared errors,

$$S = \sum_{i=1}^{n} (Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i})^2.$$

Differentiating, and setting the partial derivatives to zero, we get

$$\frac{\partial S}{\partial \hat\alpha} = \sum_{i=1}^{n} 2(Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i})(-1) = 0 \quad (1)$$

$$\frac{\partial S}{\partial \hat\beta_1} = \sum_{i=1}^{n} 2(Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i})(-X_{1i}) = 0 \quad (2)$$

$$\frac{\partial S}{\partial \hat\beta_2} = \sum_{i=1}^{n} 2(Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i})(-X_{2i}) = 0 \quad (3)$$

These three equations are called the "normal equations". They can be simplified as follows. Equation (1) can be written as

$$\sum_{i=1}^{n} Y_i = n\hat\alpha + \hat\beta_1 \sum_{i=1}^{n} X_{1i} + \hat\beta_2 \sum_{i=1}^{n} X_{2i} \quad \text{or} \quad \bar Y = \hat\alpha + \hat\beta_1 \bar X_1 + \hat\beta_2 \bar X_2, \quad (4)$$

where the bar over $Y$, $X_1$ and $X_2$ indicates the sample mean. Equation (2) can be written as

$$\sum_{i=1}^{n} X_{1i} Y_i = \hat\alpha \sum_{i=1}^{n} X_{1i} + \hat\beta_1 \sum_{i=1}^{n} X_{1i}^2 + \hat\beta_2 \sum_{i=1}^{n} X_{1i} X_{2i}.$$

Substituting in the value of $\hat\alpha$ from (4), we get

$$\sum X_{1i} Y_i = n \bar X_1 (\bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2) + \hat\beta_1 \sum X_{1i}^2 + \hat\beta_2 \sum X_{1i} X_{2i}. \quad (5)$$

A similar equation results from (3) and (4). We can simplify these equations using the following notation. Let us define:

$$S_{11} = \sum X_{1i}^2 - n\bar X_1^2 \qquad S_{1Y} = \sum X_{1i} Y_i - n\bar X_1 \bar Y$$
$$S_{12} = \sum X_{1i} X_{2i} - n\bar X_1 \bar X_2 \qquad S_{22} = \sum X_{2i}^2 - n\bar X_2^2$$
$$S_{2Y} = \sum X_{2i} Y_i - n\bar X_2 \bar Y \qquad S_{YY} = \sum Y_i^2 - n\bar Y^2$$

Equation (5) can then be written

$$S_{1Y} = \hat\beta_1 S_{11} + \hat\beta_2 S_{12}. \quad (6)$$

Similarly, equation (3) becomes

$$S_{2Y} = \hat\beta_1 S_{12} + \hat\beta_2 S_{22}. \quad (7)$$

We can solve these two equations to get

$$\hat\beta_1 = \frac{S_{22} S_{1Y} - S_{12} S_{2Y}}{\Delta} \qquad \text{and} \qquad \hat\beta_2 = \frac{S_{11} S_{2Y} - S_{12} S_{1Y}}{\Delta},$$

where $\Delta = S_{11} S_{22} - S_{12}^2$. We may then obtain $\hat\alpha$ from equation (4).

We can calculate the RSS, ESS and TSS from these estimators in the same way as for simple regression, that is

$$\text{RSS} = \sum (Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i})^2, \qquad \text{ESS} = \sum (\hat Y_i - \bar Y)^2, \qquad \text{TSS} = \sum (Y_i - \bar Y)^2,$$

and the coefficient of multiple determination is

$$R^2 = \text{ESS}/\text{TSS}.$$

That is, $R^2$ is the proportion of the variation in $Y$ explained by the regression.

The variances of our estimators are given by

$$\operatorname{Var}(\hat\beta_1) = \frac{\sigma^2}{S_{11}(1 - r_{12}^2)} \qquad \text{and} \qquad \operatorname{Var}(\hat\beta_2) = \frac{\sigma^2}{S_{22}(1 - r_{12}^2)},$$

where $r_{12}^2$ is the squared correlation coefficient between $X_1$ and $X_2$. Thus, the greater the correlation between the two explanatory variables, the greater the variance of the estimators, i.e. the harder it is to get significant results. We similarly obtain an estimate of $\sigma^2$ (here $\hat\sigma^2 = \text{RSS}/(n-3)$, since three parameters are estimated) and therefore the standard errors of the parameter estimates, and thus the t-ratios.
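To make the derivation concrete, here is a minimal numerical sketch in Python (using NumPy) that implements the formulas above directly: the $S$-quantities, the solutions to the normal equations (6) and (7), $\hat\alpha$ from (4), $R^2$, and the standard errors from the variance formulas. The function name `ols_two_regressors` and the generated data are purely illustrative, not from the source.

```python
import numpy as np

def ols_two_regressors(y, x1, x2):
    """Two-regressor OLS via the S-notation formulas in the notes above."""
    n = len(y)
    y_bar, x1_bar, x2_bar = y.mean(), x1.mean(), x2.mean()

    # Corrected sums of squares and cross-products
    S11 = (x1 ** 2).sum() - n * x1_bar ** 2
    S22 = (x2 ** 2).sum() - n * x2_bar ** 2
    S12 = (x1 * x2).sum() - n * x1_bar * x2_bar
    S1Y = (x1 * y).sum() - n * x1_bar * y_bar
    S2Y = (x2 * y).sum() - n * x2_bar * y_bar
    SYY = (y ** 2).sum() - n * y_bar ** 2

    # Solve the two normal equations (6) and (7)
    delta = S11 * S22 - S12 ** 2
    b1 = (S22 * S1Y - S12 * S2Y) / delta
    b2 = (S11 * S2Y - S12 * S1Y) / delta
    a = y_bar - b1 * x1_bar - b2 * x2_bar           # intercept, from equation (4)

    # Goodness of fit and standard errors
    resid = y - a - b1 * x1 - b2 * x2
    rss = (resid ** 2).sum()
    r2 = 1 - rss / SYY                              # ESS/TSS = 1 - RSS/TSS, since SYY = TSS
    sigma2 = rss / (n - 3)                          # three parameters estimated
    r12_sq = S12 ** 2 / (S11 * S22)                 # squared correlation between X1 and X2
    se_b1 = np.sqrt(sigma2 / (S11 * (1 - r12_sq)))
    se_b2 = np.sqrt(sigma2 / (S22 * (1 - r12_sq)))
    return a, b1, b2, r2, se_b1, se_b2

# Hypothetical data for illustration: X2 is deliberately correlated with X1,
# so the (1 - r12^2) term visibly inflates the standard errors.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 0.5 * x1 + rng.normal(size=50)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=50)

a, b1, b2, r2, se1, se2 = ols_two_regressors(y, x1, x2)
print(f"alpha={a:.3f}, beta1={b1:.3f} (se {se1:.3f}), "
      f"beta2={b2:.3f} (se {se2:.3f}), R2={r2:.3f}")
```

The t-ratio for each slope is then simply the estimate divided by its standard error, e.g. `b1 / se1`. Increasing the correlation between `x1` and `x2` in the data-generating line (say, `0.9 * x1` instead of `0.5 * x1`) shrinks $(1 - r_{12}^2)$ and inflates both standard errors, which is exactly the multicollinearity effect described in the final paragraph.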