Multiple regression: OLS method

(Mostly from Maddala)
The Ordinary Least Squares method of estimation can easily be extended
to models involving two or more explanatory variables, though the
algebra becomes progressively more complex. In fact, when dealing with
the general regression problem with a large number of variables, we use
matrix algebra, but that is beyond the scope of this course.
We illustrate the case of two explanatory variables, X1 and X2, with Y the dependent variable. We therefore have the model

Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i

where u_i \sim N(0, \sigma^2).
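As a purely illustrative sketch (not part of the original notes), data from such a model can be simulated in Python; the parameter values below are arbitrary choices made only for the example, and the arrays Y, X1, X2 are reused in the later snippets.

import numpy as np

# Arbitrary, purely illustrative parameter values
alpha, beta1, beta2, sigma = 1.0, 2.0, -0.5, 1.0
n = 50

rng = np.random.default_rng(0)
X1 = rng.uniform(0.0, 10.0, n)   # first explanatory variable
X2 = rng.uniform(0.0, 10.0, n)   # second explanatory variable
u = rng.normal(0.0, sigma, n)    # disturbances u_i ~ N(0, sigma^2)
Y = alpha + beta1 * X1 + beta2 * X2 + u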
We look for estimators \hat\alpha, \hat\beta_1, \hat\beta_2 so as to minimise the sum of squared errors,

S = \sum_{i=1}^{n} (Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i})^2
Differentiating and setting the partial derivatives to zero, we get
\frac{\partial S}{\partial \hat\alpha} = \sum_{i=1}^{n} 2(Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i})(-1) = 0          (1)

\frac{\partial S}{\partial \hat\beta_1} = \sum_{i=1}^{n} 2(Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i})(-X_{1i}) = 0          (2)

\frac{\partial S}{\partial \hat\beta_2} = \sum_{i=1}^{n} 2(Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i})(-X_{2i}) = 0          (3)
These three equations are called the “normal equations”. They can be
simplified as follows: Equation (1) can be written as
\sum_{i=1}^{n} Y_i = n\hat\alpha + \hat\beta_1 \sum_{i=1}^{n} X_{1i} + \hat\beta_2 \sum_{i=1}^{n} X_{2i}

or

\bar{Y} = \hat\alpha + \hat\beta_1 \bar{X}_1 + \hat\beta_2 \bar{X}_2          (4)
where the bar over Y, X1 and X2 indicates the sample mean. Equation (2) can be written as

\sum_{i=1}^{n} X_{1i} Y_i = \hat\alpha \sum_{i=1}^{n} X_{1i} + \hat\beta_1 \sum_{i=1}^{n} X_{1i}^2 + \hat\beta_2 \sum_{i=1}^{n} X_{1i} X_{2i}
Substituting in the value of \hat\alpha from (4), we get

\sum X_{1i} Y_i = n\bar{X}_1 (\bar{Y} - \hat\beta_1 \bar{X}_1 - \hat\beta_2 \bar{X}_2) + \hat\beta_1 \sum X_{1i}^2 + \hat\beta_2 \sum X_{1i} X_{2i}          (5)
A similar equation results from (3) and (4). We can simplify this equation
using the following notation. Let us define:
S_{11} = \sum X_{1i}^2 - n\bar{X}_1^2
S_{1Y} = \sum X_{1i} Y_i - n\bar{X}_1 \bar{Y}
S_{12} = \sum X_{1i} X_{2i} - n\bar{X}_1 \bar{X}_2
S_{22} = \sum X_{2i}^2 - n\bar{X}_2^2
S_{2Y} = \sum X_{2i} Y_i - n\bar{X}_2 \bar{Y}
S_{YY} = \sum Y_i^2 - n\bar{Y}^2
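These sums are easy to compute directly; the helper below is a minimal sketch (the name s_moments is just an illustrative choice), again assuming NumPy arrays Y, X1, X2 as above.

import numpy as np

def s_moments(Y, X1, X2):
    """Centred sums of squares and cross-products used in the normal equations."""
    n = len(Y)
    S11 = np.sum(X1 ** 2) - n * X1.mean() ** 2
    S22 = np.sum(X2 ** 2) - n * X2.mean() ** 2
    S12 = np.sum(X1 * X2) - n * X1.mean() * X2.mean()
    S1Y = np.sum(X1 * Y) - n * X1.mean() * Y.mean()
    S2Y = np.sum(X2 * Y) - n * X2.mean() * Y.mean()
    SYY = np.sum(Y ** 2) - n * Y.mean() ** 2
    return S11, S22, S12, S1Y, S2Y, SYY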
Equation (5) can then be written
S_{1Y} = \hat\beta_1 S_{11} + \hat\beta_2 S_{12}          (6)
Similarly, equation (3) becomes
S_{2Y} = \hat\beta_1 S_{12} + \hat\beta_2 S_{22}          (7)
We can solve these two equations to get:

\hat\beta_1 = \frac{S_{22} S_{1Y} - S_{12} S_{2Y}}{\Delta}

and

\hat\beta_2 = \frac{S_{11} S_{2Y} - S_{12} S_{1Y}}{\Delta}

where \Delta = S_{11} S_{22} - S_{12}^2. We may therefore obtain \hat\alpha from equation (4).
We can calculate the RSS, ESS and TSS from these estimators in the
same way as for simple regression, that is
RSS = \sum (Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i})^2
ESS = \sum (\hat{Y}_i - \bar{Y})^2
TSS = \sum (Y_i - \bar{Y})^2
The coefficient of multiple determination is then

R^2 = ESS/TSS

That is, R^2 is the proportion of the variation in Y explained by the regression.
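A corresponding sketch, assuming the data and the ols_two_regressors helper defined above:

import numpy as np

def fit_summary(Y, X1, X2):
    """RSS, ESS, TSS and R^2 for the two-regressor OLS fit."""
    a_hat, b1_hat, b2_hat = ols_two_regressors(Y, X1, X2)
    Y_fit = a_hat + b1_hat * X1 + b2_hat * X2
    RSS = np.sum((Y - Y_fit) ** 2)            # residual sum of squares
    ESS = np.sum((Y_fit - Y.mean()) ** 2)     # explained sum of squares
    TSS = np.sum((Y - Y.mean()) ** 2)         # total sum of squares (= ESS + RSS at the OLS fit)
    return RSS, ESS, TSS, ESS / TSS           # last element is R^2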
The variances of our estimators are given by
\mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{S_{11}(1 - r_{12}^2)}

and

\mathrm{Var}(\hat\beta_2) = \frac{\sigma^2}{S_{22}(1 - r_{12}^2)}

where r_{12}^2 is the squared correlation coefficient between X1 and X2.
Thus, the greater the correlation between the two explanatory variables, the larger the variances of the estimators, i.e. the harder it is to obtain significant results.
We similarly obtain an estimate of \sigma^2 (here \hat\sigma^2 = RSS/(n-3), since three parameters are estimated), and therefore the standard errors of the parameter estimates, and thus the t-ratios.
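Finally, a sketch of the standard errors and t-ratios under the same assumptions, using \hat\sigma^2 = RSS/(n-3) and the hypothetical helpers defined earlier:

import numpy as np

def t_ratios(Y, X1, X2):
    """Estimated sigma^2, standard errors and t-ratios for beta_1 and beta_2."""
    n = len(Y)
    a_hat, b1_hat, b2_hat = ols_two_regressors(Y, X1, X2)
    RSS = np.sum((Y - a_hat - b1_hat * X1 - b2_hat * X2) ** 2)
    sigma2_hat = RSS / (n - 3)                 # three estimated parameters
    S11, S22, S12, _, _, _ = s_moments(Y, X1, X2)
    r12_sq = S12 ** 2 / (S11 * S22)            # squared correlation between X1 and X2
    se_b1 = np.sqrt(sigma2_hat / (S11 * (1 - r12_sq)))
    se_b2 = np.sqrt(sigma2_hat / (S22 * (1 - r12_sq)))
    return sigma2_hat, se_b1, se_b2, b1_hat / se_b1, b2_hat / se_b2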