Lecture 2: Multiple Regression

Digression
Differentiation involving vectors and matrices
Consider the vectors $a$, $x$ and the matrix $A$ defined as

$$a = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \qquad A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$

Then

$$L = a'x = a_1x_1 + a_2x_2 + a_3x_3$$

is linear in the $x$'s, and

$$Q = x'Ax = a_{11}x_1^2 + a_{12}x_1x_2 + a_{13}x_1x_3 + a_{21}x_1x_2 + a_{22}x_2^2 + a_{23}x_2x_3 + a_{31}x_1x_3 + a_{32}x_2x_3 + a_{33}x_3^2$$

where $Q$ is called a quadratic form in the $x$'s.

Note that

$$\frac{\partial L}{\partial x_1} = a_1, \qquad \frac{\partial L}{\partial x_2} = a_2, \qquad \frac{\partial L}{\partial x_3} = a_3$$

Denote

$$\frac{\partial L}{\partial x} = \begin{pmatrix} \partial L/\partial x_1 \\ \partial L/\partial x_2 \\ \partial L/\partial x_3 \end{pmatrix}$$
G26/27/28: Core Econometrics 2
Rules of differentiation involving vectors and matrices

R1) $\dfrac{\partial L}{\partial x} = a$

R2) $\dfrac{\partial^2 L}{\partial x\,\partial x'} = 0$, where $0$ is the $(n \times n)$ null matrix

Now

$$\frac{\partial Q}{\partial x_1} = (a_{11}x_1 + a_{12}x_2 + a_{13}x_3) + (a_{11}x_1 + a_{21}x_2 + a_{31}x_3)$$

with similar expressions for $\partial Q/\partial x_2$ and $\partial Q/\partial x_3$. Hence

R3) $\dfrac{\partial Q}{\partial x} = Ax + A'x$ and $\dfrac{\partial^2 Q}{\partial x\,\partial x'} = A + A'$

R4) If $A$ is symmetric, i.e. $A = A'$, then $\dfrac{\partial Q}{\partial x} = 2Ax$ and $\dfrac{\partial^2 Q}{\partial x\,\partial x'} = 2A$
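These rules are easy to verify numerically. A minimal sketch (all names and values below are my own illustration) comparing the analytic gradients of $L = a'x$ and $Q = x'Ax$ with central finite differences:

```python
import numpy as np

# Check R1, R3 and R4 by comparing analytic gradients with central differences
rng = np.random.default_rng(0)
a = rng.normal(size=3)
A = rng.normal(size=(3, 3))        # a general (non-symmetric) matrix
x = rng.normal(size=3)

L = lambda v: a @ v                # linear form  L = a'x
Q = lambda v: v @ A @ v            # quadratic form  Q = x'Ax

def num_grad(f, x, h=1e-6):
    """Central-difference approximation to the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = h
        g[i] = (f(x + step) - f(x - step)) / (2 * h)
    return g

assert np.allclose(num_grad(L, x), a, atol=1e-5)              # R1: dL/dx = a
assert np.allclose(num_grad(Q, x), (A + A.T) @ x, atol=1e-4)  # R3: dQ/dx = (A+A')x
S = (A + A.T) / 2                                             # a symmetric matrix
assert np.allclose(num_grad(lambda v: v @ S @ v, x), 2 * S @ x, atol=1e-4)  # R4
```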
The Multiple Regression Model in Matrix Notation
Consider the multiple regression model with k explanatory variables

$$Y_i = \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + e_i, \qquad i = 1, 2, \ldots, n$$

which can be written as

$$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} X_{11} & X_{21} & \cdots & X_{k1} \\ X_{12} & X_{22} & \cdots & X_{k2} \\ \vdots & \vdots & & \vdots \\ X_{1n} & X_{2n} & \cdots & X_{kn} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} + \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}$$

or $Y = X\beta + e$, where

Y = n×1 vector of observations on the explained variable
X = n×k matrix of observations on the explanatory variables
e = n×1 vector of errors
β = k×1 vector of parameters to be estimated
Assumptions

1) $e \sim IID(0, \sigma^2 I)$, where $I$ is the identity matrix: the errors are independently and identically distributed with mean 0 and variance matrix $\sigma^2 I$.
2) The X's are non-stochastic and hence are independent of the e's.
3) The X's are linearly independent. Hence rank$(X'X)$ = rank$(X)$ = k, which implies that $(X'X)^{-1}$ exists.
Under these assumptions the best (minimum variance) linear unbiased estimator of $\beta$ is obtained by minimising the error sum of squares

$$Q = \sum_{i=1}^{n} e_i^2 = e'e = (Y - X\beta)'(Y - X\beta)$$

This result is known as the Gauss-Markov theorem.

Derivation

Multiplying out the brackets gives

$$Q = Y'Y - 2\beta'X'Y + \beta'X'X\beta$$

so, as $X'X$ is symmetric,

$$\frac{\partial Q}{\partial \beta} = -2X'Y + 2X'X\beta$$

Setting $\partial Q/\partial \beta = 0$:

$$-2X'Y + 2X'X\hat{\beta} = 0 \;\Rightarrow\; \hat{\beta} = (X'X)^{-1}X'Y$$
Consider the following data relating to real investment.
The columns of X are:
constant, time, Real GDP, interest rate and inflation.
The fifteen observations are:

Obs     Y       Constant   Time   Real GDP   Interest   Inflation
 1    0.161        1         1      1.058       5.16       4.40
 2    0.172        1         2      1.088       5.87       5.15
 3    0.158        1         3      1.086       5.95       5.37
 4    0.173        1         4      1.122       4.88       4.99
 5    0.195        1         5      1.186       4.50       4.16
 6    0.217        1         6      1.254       6.44       5.75
 7    0.199        1         7      1.246       7.83       8.82
 8    0.163        1         8      1.232       6.25       9.31
 9    0.195        1         9      1.298       5.50       5.21
10    0.231        1        10      1.370       5.46       5.83
11    0.257        1        11      1.439       7.46       7.40
12    0.259        1        12      1.479      10.28       8.64
13    0.225        1        13      1.474      11.77       9.31
14    0.241        1        14      1.503      13.42       9.44
15    0.204        1        15      1.475      11.02       5.99

$Y$ is the 15×1 vector of observations on real investment; $X$ is the 15×5 matrix formed from the last five columns.
The aim is to model real investment by estimating the equation $Y = X\beta + e$, where $\hat{\beta}$ minimises the sum of squared residuals $e'e$.
$$X'X = \begin{pmatrix} 15.00 & 120.00 & 19.31 & 111.79 & 99.77 \\ 120.00 & 1240.00 & 164.30 & 1035.90 & 875.60 \\ 19.31 & 164.30 & 25.218 & 148.98 & 131.22 \\ 111.79 & 1035.90 & 148.98 & 943.86 & 799.02 \\ 99.77 & 875.60 & 131.22 & 799.02 & 716.67 \end{pmatrix}$$

$$(X'X)^{-1} = \begin{pmatrix} 67.41 & 2.270 & -66.77 & 0.1242 & -0.0711 \\ 2.270 & 0.08624 & -2.257 & -0.0064 & -0.0009 \\ -66.77 & -2.257 & 67.09 & -0.1614 & -0.0506 \\ 0.1242 & -0.0064 & -0.1614 & 0.03295 & -0.01665 \\ -0.0711 & -0.0009 & -0.0506 & -0.01665 & 0.040428 \end{pmatrix}$$

$$X'Y = \begin{pmatrix} 3.050 \\ 26.004 \\ 3.993 \\ 23.521 \\ 20.732 \end{pmatrix}$$

$$\hat{\beta} = (X'X)^{-1}X'Y = \begin{pmatrix} -0.5090 \\ -0.0166 \\ 0.6704 \\ -0.0023 \\ -0.0001 \end{pmatrix}$$
Since (XX)-1X is a matrix of constants the elements of  are linear function
of Y, which implies that  is a linear estimator.
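The whole calculation can be reproduced in a few lines. A sketch in NumPy (variable names are mine) using the fifteen observations tabulated above; the coefficients should match the reported $\hat{\beta}$ to rounding:

```python
import numpy as np

# The fifteen observations tabulated above (Y = real investment)
Y = np.array([0.161, 0.172, 0.158, 0.173, 0.195, 0.217, 0.199, 0.163,
              0.195, 0.231, 0.257, 0.259, 0.225, 0.241, 0.204])
gdp  = np.array([1.058, 1.088, 1.086, 1.122, 1.186, 1.254, 1.246, 1.232,
                 1.298, 1.370, 1.439, 1.479, 1.474, 1.503, 1.475])
rate = np.array([5.16, 5.87, 5.95, 4.88, 4.50, 6.44, 7.83, 6.25,
                 5.50, 5.46, 7.46, 10.28, 11.77, 13.42, 11.02])
infl = np.array([4.40, 5.15, 5.37, 4.99, 4.16, 5.75, 8.82, 9.31,
                 5.21, 5.83, 7.40, 8.64, 9.31, 9.44, 5.99])
n = len(Y)
X = np.column_stack([np.ones(n), np.arange(1, n + 1), gdp, rate, infl])

beta = np.linalg.solve(X.T @ X, X.T @ Y)   # beta-hat = (X'X)^{-1} X'Y
print(beta.round(4))   # reported in the notes as (-0.509, -0.017, 0.670, -0.002, -0.000)

resid = Y - X @ beta
print(resid @ resid)   # residual sum of squares e'e, roughly 4.5e-4
```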
Recall that

$$Y = X\beta + e$$

and substitute into $\hat{\beta} = (X'X)^{-1}X'Y$ to give

$$\hat{\beta} = (X'X)^{-1}X'(X\beta + e) = \beta + (X'X)^{-1}X'e$$

$$E(\hat{\beta}) = \beta \quad \text{as } E(e) = 0$$

so $\hat{\beta}$ is an unbiased estimator.

$$V(\hat{\beta}) = E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'] = (X'X)^{-1}X'E(ee')X(X'X)^{-1} = (X'X)^{-1}\sigma^2$$

since $E(ee') = I\sigma^2$.
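Unbiasedness and the variance formula can be illustrated by simulation. A sketch (toy design of my own) holding $X$ fixed and redrawing $e$ many times:

```python
import numpy as np

# Monte Carlo sketch of E(beta-hat) = beta and V(beta-hat) = sigma^2 (X'X)^{-1}
rng = np.random.default_rng(1)
n, sigma = 50, 0.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # fixed regressors
beta = np.array([1.0, 2.0, -0.5])
XtX_inv = np.linalg.inv(X.T @ X)

# Redraw the error vector many times and re-estimate each time
draws = np.array([XtX_inv @ X.T @ (X @ beta + rng.normal(scale=sigma, size=n))
                  for _ in range(20000)])

print(draws.mean(axis=0))    # close to (1.0, 2.0, -0.5): unbiasedness
emp_cov = np.cov(draws.T)    # empirical covariance of the estimates
print(np.abs(emp_cov - sigma**2 * XtX_inv).max())   # small
```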
To show that the least squares estimator has minimum variance, consider any other linear estimator

$$\beta^* = \hat{\beta} + CY$$

Then

$$\beta^* = \beta + CX\beta + [(X'X)^{-1}X' + C]e$$

$$E(\beta^*) = \beta + CX\beta$$

so $CX = 0$ is required for $\beta^*$ to be unbiased.
$$V(\beta^*) = E[(\beta^* - \beta)(\beta^* - \beta)'] = [(X'X)^{-1}X' + C]E(ee')[(X'X)^{-1}X' + C]'$$

Since $E(ee') = I\sigma^2$ and $CX = 0$,

$$V(\beta^*) = (X'X)^{-1}\sigma^2 + CC'\sigma^2$$

Hence

$$V(\beta^*) \geq V(\hat{\beta})$$
Example

Calculating the variance-covariance matrix corresponding to the real investment function:

$$V(\hat{\beta}) = \hat{\sigma}^2 (X'X)^{-1}, \qquad \hat{\sigma}^2 = \frac{e'e}{n-k}$$

$$e'e = (Y - X\hat{\beta})'(Y - X\hat{\beta}) = Y'Y - \hat{\beta}'X'Y = 0.0004507$$

$$\Rightarrow \hat{\sigma}^2 = \frac{0.0004507}{15 - 5} = 0.00004507$$
$$V(\hat{\beta}) = \begin{pmatrix} 0.00304 & 0.000102 & -0.00301 & 0.0000056 & -0.0000032 \\ 0.000102 & 0.0000039 & -0.000102 & -0.0000003 & -0.00000004 \\ -0.00301 & -0.000102 & 0.00302 & -0.000007 & -0.0000022 \\ 0.0000056 & -0.0000003 & -0.000007 & 0.0000015 & -8{\times}10^{-7} \\ -0.0000032 & -0.00000004 & -0.0000022 & -8{\times}10^{-7} & 1.8{\times}10^{-6} \end{pmatrix}$$

Variable        Coefficient   Standard Error
Constant         -0.50907        0.0551
Time             -0.01658        0.001972
Real GDP          0.67038        0.05499
Interest rate    -0.00232        0.001219
Inflation        -0.00009        0.001347
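Each standard error in the table above is the square root of the corresponding diagonal element of $\hat{\sigma}^2(X'X)^{-1}$. A quick check using the reported diagonal of $(X'X)^{-1}$:

```python
import math

s2 = 0.00004507                                   # sigma-hat squared from above
diag_XtX_inv = [67.41, 0.08624, 67.09, 0.03295, 0.040428]
se = [math.sqrt(s2 * d) for d in diag_XtX_inv]
print([round(v, 6) for v in se])   # matches the Standard Error column to rounding
```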
Hypotheses Tests and Analysis of Variance

The test for $r$ restrictions of the multiple regression model with $k$ explanatory variables

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + e_i$$

is given by

$$F_{(r,\,n-k-1)} = \frac{(RRSS - URSS)/r}{URSS/(n-k-1)}$$

where

URSS = unrestricted residual sum of squares
RRSS = restricted residual sum of squares, obtained by imposing the restrictions of the hypothesis
Example

Consider the restriction

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$$

Now

$$R^2 = \frac{ESS}{S_{yy}} = \frac{S_{yy} - RSS}{S_{yy}} = 1 - \frac{RSS}{S_{yy}}, \qquad 1 - R^2 = \frac{RSS}{S_{yy}}$$

so that

$$URSS = (1 - R^2)S_{yy}, \qquad RRSS = S_{yy}$$

Hence

$$F_{(k,\,n-k-1)} = \frac{[S_{yy} - S_{yy}(1 - R^2)]/k}{S_{yy}(1 - R^2)/(n-k-1)} = \frac{R^2}{1 - R^2} \cdot \frac{n-k-1}{k}$$
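The two forms of the F statistic above are algebraically identical, which is easy to confirm numerically (toy numbers of my own):

```python
# Toy numbers for Syy, RSS and the dimensions
Syy, RSS, k, n = 10.0, 2.5, 4, 30
R2 = 1 - RSS / Syy

F_from_rss = ((Syy - RSS) / k) / (RSS / (n - k - 1))
F_from_r2 = (R2 / (1 - R2)) * (n - k - 1) / k
print(F_from_rss, F_from_r2)   # identical: 18.75 in both cases
```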
Analysis of Variance for the Multiple Regression Model

Source of variation   Sum of squares   Degrees of freedom   Mean square                      F-test
Regression            R²·Syy           k                    MS1 = R²·Syy/k                   F = MS1/MS2
Residual              (1-R²)·Syy       n-k-1                MS2 = (1-R²)·Syy/(n-k-1)
Total                 Syy              n-1
Measuring the goodness of fit
The question is how best to measure the goodness of fit of a multiple
regression equation.
The problem with

$$R^2 = \frac{ESS}{TSS}$$

is that as more explanatory variables are added to the regression equation, $R^2$ will at worst remain constant and will generally increase.
Consequently, consideration must be given to the number of explanatory
variables when assessing the goodness of fit of a multiple regression
equation.
a) Adjusted $R^2$, $\bar{R}^2$

$$\bar{R}^2 = 1 - \frac{n-1}{n-k-1}(1 - R^2) = 1 - \frac{n-1}{n-k-1} \cdot \frac{RSS}{TSS}$$

$\bar{R}^2$ adjusts $R^2$ to take into account the loss of degrees of freedom when adding more explanatory variables.
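A short sketch (numbers are my own) showing that $\bar{R}^2$ can fall when an added regressor raises $R^2$ only marginally:

```python
def r2_adj(R2, n, k):
    """Adjusted R-squared: 1 - (n-1)/(n-k-1) * (1 - R2)."""
    return 1 - (n - 1) / (n - k - 1) * (1 - R2)

# A fifth regressor lifts R2 from 0.750 to 0.755, yet adjusted R2 falls:
print(r2_adj(0.750, 30, 4))   # 0.71
print(r2_adj(0.755, 30, 5))   # about 0.704, lower than before
```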
b) Standard error of the regression (SER)

$$SER = \hat{\sigma} = \sqrt{\frac{RSS}{n-k-1}}$$

where RSS = residual sum of squares. As the number of explanatory variables increases, RSS will tend to decline, but there is a corresponding decrease in the degrees of freedom, n-k-1.
Information criteria are used as a guide in model selection, especially for autoregressive models. In general, the information contained in a model is its distance from the "true" model, measured by the log-likelihood function. Information criteria provide a measure that balances goodness of fit against a parsimonious specification of the model.
c) Akaike information criterion (AIC)

$$AIC = -\frac{2\ln L_{max}}{n} + \frac{2k}{n}$$

where

$L_{max}$ = maximum value of the likelihood function
n = number of observations
k = number of estimated parameters
d) Schwarz information criterion (SC)

$$SC = -\frac{2\ln L_{max}}{n} + \frac{k\ln n}{n}$$
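A sketch of both criteria in their per-observation forms; I use the standard $k\ln n$ penalty for SC, which is an assumption here since conventions for the scaling differ across texts:

```python
import math

def aic(lnL, n, k):
    """Per-observation AIC: -2 ln(Lmax)/n + 2k/n."""
    return -2 * lnL / n + 2 * k / n

def sc(lnL, n, k):
    """Per-observation Schwarz criterion: -2 ln(Lmax)/n + k ln(n)/n."""
    return -2 * lnL / n + k * math.log(n) / n

lnL, n = 120.0, 100        # illustrative values (my own)
print(aic(lnL, n, 3), sc(lnL, n, 3))
# Adding 3 parameters costs 2*3/n under AIC but 3*ln(n)/n under SC,
# so SC penalises extra parameters more heavily once ln(n) > 2 (n >= 8):
print(aic(lnL, n, 6) - aic(lnL, n, 3))   # 0.06
print(sc(lnL, n, 6) - sc(lnL, n, 3))     # about 0.138
```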
Dummy variables in the regression model
A dummy variable (also described as an indicator variable or binary
variable) takes the value 1 if a particular event occurs and 0 otherwise.
Consider
$$X_1 = \begin{cases} 1 & \text{if over 35} \\ 0 & \text{otherwise} \end{cases}$$
in a consumption function
Ci = 0.73 + 0.21X1i + 0.83Ii
The inclusion of a dummy variable shifts the intercept upwards but keeps
the marginal propensity to consume the same for all ages. The size of the
increase of the intercept is the coefficient on the dummy variable.
[Figure: consumption $C$ plotted against income $I$; the fitted line $C = 0.73 + 0.21X + 0.83I$ for $X = 1$ lies parallel to, and above, the line for $X = 0$.]
The introduction of the dummy variable means that the X matrix has been altered to

$$X = \begin{pmatrix} 1 & 0 & I_1 \\ 1 & 0 & I_2 \\ 1 & 1 & I_3 \\ 1 & 0 & I_4 \\ \vdots & \vdots & \vdots \\ 1 & 1 & I_n \end{pmatrix}$$

where $I_i$ denotes the income level of the ith individual.
Hence the OLS estimate of the coefficient on the dummy variable is obtained from the second element of the vector

$$\hat{\beta} = (X'X)^{-1}X'Y$$
Dummy variables can be used in time series analysis to control for unusual observations, for example strikes or stock market crashes, and to proxy policy changes, especially changes in taxation, that cannot be quantified. Dummy variables are most frequently used in multiple regression models to remove the seasonal pattern from the data.
$$Q_1 = \begin{cases} 1 & \text{if Spring} \\ 0 & \text{otherwise} \end{cases} \qquad Q_2 = \begin{cases} 1 & \text{if Summer} \\ 0 & \text{otherwise} \end{cases} \qquad Q_3 = \begin{cases} 1 & \text{if Autumn} \\ 0 & \text{otherwise} \end{cases}$$
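Because only three seasonal dummies enter alongside the constant, the season with no dummy (Winter here) acts as the base category. A sketch (my own construction) of the dummy columns for quarterly data:

```python
import numpy as np

# Quarterly dummies Q1-Q3 for n observations starting in Spring;
# Winter (the fourth quarter) is the omitted base category.
n = 8
quarter = (np.arange(n) % 4) + 1                 # 1,2,3,4,1,2,3,4,...
Q = np.column_stack([(quarter == q).astype(int) for q in (1, 2, 3)])
print(Q)   # rows cycle through (1,0,0), (0,1,0), (0,0,1), (0,0,0)
```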
Hence the X matrix is transformed to

$$X = \begin{pmatrix} 1 & X_{11} & X_{21} & 1 & 0 & 0 \\ 1 & X_{12} & X_{22} & 0 & 1 & 0 \\ 1 & X_{13} & X_{23} & 0 & 0 & 1 \\ 1 & X_{14} & X_{24} & 0 & 0 & 0 \\ 1 & X_{15} & X_{25} & 1 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & X_{1n} & X_{2n} & 0 & 0 & 1 \end{pmatrix}$$
Omission of Relevant Variables

Consider

True model: $Y_i = \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$

Estimated model: $Y_i = \beta_1 X_{1i} + v_i$

The estimate of $\beta_1$ is

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_{1i}Y_i}{\sum_{i=1}^{n} X_{1i}^2}$$

Substituting for $Y_i$ from the true model gives

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_{1i}(\beta_1 X_{1i} + \beta_2 X_{2i} + u_i)}{\sum_{i=1}^{n} X_{1i}^2} = \beta_1 + \frac{\beta_2 \sum_{i=1}^{n} X_{1i}X_{2i}}{\sum_{i=1}^{n} X_{1i}^2} + \frac{\sum_{i=1}^{n} X_{1i}u_i}{\sum_{i=1}^{n} X_{1i}^2}$$

$$E(\hat{\beta}_1) = \beta_1 + b_{21}\beta_2 \quad \text{as } E\!\left(\sum_{i=1}^{n} X_{1i}u_i\right) = 0 \text{ and } b_{21} = \frac{\sum_{i=1}^{n} X_{1i}X_{2i}}{\sum_{i=1}^{n} X_{1i}^2}$$

so $\hat{\beta}_1$ is biased.
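The bias formula $E(\hat{\beta}_1) = \beta_1 + b_{21}\beta_2$ can be checked by simulation. A sketch with a toy design of my own:

```python
import numpy as np

# Monte Carlo sketch of omitted-variable bias
rng = np.random.default_rng(2)
n, b1, b2 = 200, 1.0, 0.5
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)    # correlated with x1
b21 = (x1 @ x2) / (x1 @ x1)                      # auxiliary regression slope

# Regress Y on x1 alone, many times, omitting the relevant x2
est = [(x1 @ (b1 * x1 + b2 * x2 + rng.normal(size=n))) / (x1 @ x1)
       for _ in range(5000)]

print(np.mean(est))       # centres on b1 + b21*b2, not on b1
print(b1 + b21 * b2)
```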
Inclusion of Irrelevant Variables

Consider

True model: $Y = \beta_1 X_1 + u$

Estimated model: $Y = \beta_1 X_1 + \beta_2 X_2 + u$

$$\tilde{\beta}_1 = \frac{S_{22}S_{1y} - S_{12}S_{2y}}{S_{11}S_{22} - S_{12}^2}, \qquad \tilde{\beta}_2 = \frac{S_{11}S_{2y} - S_{12}S_{1y}}{S_{11}S_{22} - S_{12}^2}$$

Now

$$S_{1y} = \sum X_{1i}Y_i = \sum X_{1i}(\beta_1 X_{1i} + u_i) \;\Rightarrow\; E(S_{1y}) = \beta_1 S_{11}$$

Likewise $E(S_{2y}) = \beta_1 S_{12}$, so

$$E(\tilde{\beta}_1) = \beta_1, \qquad E(\tilde{\beta}_2) = 0$$

The variance from the correct model is

$$\text{var}(\hat{\beta}_1) = \frac{\sigma^2}{S_{11}}$$

compared to

$$\text{var}(\tilde{\beta}_1) = \frac{\sigma^2}{(1 - r_{12}^2)S_{11}}$$

$$\Rightarrow \text{var}(\tilde{\beta}_1) > \text{var}(\hat{\beta}_1)$$
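The variance inflation from including the irrelevant $X_2$ is exactly $1/(1 - r_{12}^2)$. A small numerical illustration (my own numbers):

```python
import numpy as np

# Variance inflation from adding the irrelevant X2
rng = np.random.default_rng(3)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.4, size=n)    # highly correlated with x1
r12 = (x1 @ x2) / np.sqrt((x1 @ x1) * (x2 @ x2))
sigma2 = 1.0

var_correct = sigma2 / (x1 @ x1)                 # sigma^2 / S11
var_overfit = sigma2 / ((1 - r12**2) * (x1 @ x1))
print(var_overfit / var_correct)   # equals 1/(1 - r12^2), well above 1 here
```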
Tests for parameter stability
Suppose there are two independent sets of data with sample sizes n1 and
n2 respectively. The equations are
First data set: $Y_{1t} = \beta_{10} + \beta_{11}X_{1t} + \beta_{12}X_{2t} + \beta_{13}X_{3t} + \cdots + \beta_{1k}X_{kt} + \varepsilon_{1t}$
Second data set: $Y_{2t} = \beta_{20} + \beta_{21}X_{1t} + \beta_{22}X_{2t} + \beta_{23}X_{3t} + \cdots + \beta_{2k}X_{kt} + \varepsilon_{2t}$
A test for stability of the parameters between the populations that
generate the two data sets is a test of the following hypothesis:
$$H_0: \beta_{10} = \beta_{20},\; \beta_{11} = \beta_{21},\; \beta_{12} = \beta_{22},\; \ldots,\; \beta_{1k} = \beta_{2k}$$
If this hypothesis is true, a single equation can be estimated for the data set obtained by pooling the two data sets.
Let
RSS1 = residual sum of squares for the first data set.
RSS2 = residual sum of squares for the second data set.
RRSS = Restricted residual sum of squares.
a) Chow test

$$F_{(k+1,\;n_1+n_2-2k-2)} = \frac{(RRSS - URSS)/(k+1)}{URSS/(n_1 + n_2 - 2k - 2)}$$

where URSS = RSS1 + RSS2, and n1 and n2 are the respective sample sizes.
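A sketch of the Chow test on simulated data (my own design), fitting the two subsamples separately and then pooled:

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

# Simulated example: one regressor plus intercept, so k = 1
rng = np.random.default_rng(4)
n1 = n2 = 40
x1, x2 = rng.normal(size=n1), rng.normal(size=n2)
y1 = 1.0 + 2.0 * x1 + rng.normal(size=n1)   # first regime
y2 = 1.5 + 1.0 * x2 + rng.normal(size=n2)   # different parameters

X1 = np.column_stack([np.ones(n1), x1])
X2 = np.column_stack([np.ones(n2), x2])

URSS = rss(y1, X1) + rss(y2, X2)                      # RSS1 + RSS2
RRSS = rss(np.concatenate([y1, y2]),
           np.vstack([X1, X2]))                       # pooled (restricted) fit
k = 1
F = ((RRSS - URSS) / (k + 1)) / (URSS / (n1 + n2 - 2 * k - 2))
print(F)   # large here, so parameter stability is rejected
```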
b) Predictive stability test

$$F_{(n_2,\;n_1-k-1)} = \frac{(RRSS - RSS_1)/n_2}{RSS_1/(n_1 - k - 1)}$$