Notes to slides to Lecture 1.
Residuals and multiple correlation
RNy, ECON4160, Spring 2009
The "residual maker" $M = I - X(X'X)^{-1}X'$ plays a central role in many derivations. The following properties are worth noting:
$M = M'$, symmetric matrix
$M^2 = M$, idempotent matrix
$MX = 0$, regression of X on X gives a perfect fit
$P = X(X'X)^{-1}X'$ is the prediction or projection matrix. It is also symmetric and idempotent.
M and P are orthogonal: $MP = PM = 0$.
From
\[
y = \hat{y} + e = Py + My
\]
and
\[
y'y = (y'P + y'M)(Py + My) = \hat{y}'\hat{y} + e'e,
\]
which we also often write as
\[
\underbrace{\sum_{i=1}^{n}(y_i - \bar{y})^2}_{SST}
= \underbrace{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}_{SSR}
+ \underbrace{\sum_{i=1}^{n} e_i^2}_{SSE},
\]
where the decomposition in deviations from the mean requires that X contains a constant term, so that the residuals sum to zero.
Alternative ways of writing SSE:
\[
e'e = (y'M')My = y'My = y'e = e'y
\]
\[
e'e = y'y - y'P'Py
    = y'y - \left[ y'X(X'X)^{-1}X' \right]\left[ X(X'X)^{-1}X'y \right]
    = y'y - y'X(X'X)^{-1}X'y
    = y'y - y'Xb = y'y - b'X'y.
\]
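These identities are easy to verify numerically. A minimal sketch in Python/NumPy with made-up data (the data-generating process and all variable names below are purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
n, K = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])  # regressors, incl. a constant
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)         # made-up dependent variable

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection matrix
M = np.eye(n) - P                      # residual maker

# properties of M and P
print(np.allclose(M, M.T), np.allclose(M @ M, M))    # symmetric, idempotent
print(np.allclose(M @ X, 0), np.allclose(M @ P, 0))  # MX = 0, MP = PM = 0

# OLS, residuals and the SSE identities
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
print(np.allclose(e @ e, y @ M @ y), np.allclose(e @ e, y @ y - b @ (X.T @ y)))

# SST = SSR + SSE (X contains a constant, so the residuals sum to zero)
SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((X @ b - y.mean()) ** 2)
print(np.allclose(SST, SSR + e @ e))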
Inference in the regression model
The Wald approach: Testing H0 based on unrestricted estimation
Study first carefully the general representation of linear hypotheses in terms of
\[
H_0 : R\beta = q
\]
(Greene Ch. 5.3, p. 83). Note that the number of restrictions is J, so R is $J \times K$ and q is $J \times 1$.
To develop a so-called Wald test statistic of H0 we need (only) to estimate the model without imposing H0 (also known as unrestricted estimation). We start from the discrepancy vector
\[
Rb - q = m.
\]
If b is close to the value of $\beta$ hypothesized in H0, then m is small. To derive a formal test we need to know the distribution of m under the null. Since m is a linear combination of the normally distributed b, m is also normal, with
\[
E\left[ m \mid X \right] = 0
\]
and
\begin{align*}
Var\left[ m \mid X \right] &= Var\left[ Rb \mid X \right] \\
&= E\left[ (R(b-\beta))(R(b-\beta))' \mid X \right] \\
&= R\, E\left[ (b-\beta)(b-\beta)' \mid X \right] R' \\
&= R\, Var\left[ b \mid X \right] R' \\
&= \sigma^2 R(X'X)^{-1} R'.
\end{align*}
The so-called Wald criterion is a quadratic form in the J deviations located in the m vector. It is defined by
\[
W = m' \left\{ Var\left[ Rb \mid X \right] \right\}^{-1} m.
\]
We know that a sum of J squared independent standard normal variables (i.e. each $N(0,1)$) is $\chi^2(J)$ distributed. This generalizes to W, which therefore becomes
\[
W \sim \chi^2(J) \mid X, H_0.
\]
Since $\sigma^2$ is unknown it must be estimated; we use
\[
s^2 = \frac{1}{n-K} \sum_{i=1}^{n} e_i^2 = \frac{e'e}{n-K}.
\]
Since MX = 0, we have
\[
e = M(X\beta + \varepsilon) = M\varepsilon
\]
and
\[
s^2 = \frac{1}{n-K} \sum_{i=1}^{n} e_i^2 = \frac{1}{n-K}\, \varepsilon' M \varepsilon.
\]
The statistic
\[
\frac{(n-K)\, s^2}{\sigma^2} = \frac{\varepsilon' M \varepsilon}{\sigma^2}
\]
is a quadratic form in the standard normal vector $\varepsilon/\sigma$. However, this chi-square does not have n degrees of freedom but $n-K$. This is due to M being symmetric and idempotent, which means that the characteristic roots of M are either 0 or 1. There are $n-K$ roots that are equal to 1, and K roots that are zero. As a result, $\varepsilon' M \varepsilon/\sigma^2$ is a sum of $n-K$ independent squared standard normal variables, i.e. it is $\chi^2(n-K)$ distributed.
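The degrees-of-freedom argument can be made explicit with the (standard, not spelled out on the slides) spectral decomposition of M: write $M = C\Lambda C'$ with C orthogonal and $\Lambda$ diagonal with $n-K$ ones (ordered first) and K zeros, and let $v = C'\varepsilon/\sigma \sim N(0, I)$. Then
\[
\frac{\varepsilon' M \varepsilon}{\sigma^2}
= \frac{\varepsilon'}{\sigma}\, C \Lambda C'\, \frac{\varepsilon}{\sigma}
= v' \Lambda v
= \sum_{i=1}^{n-K} v_i^2 \sim \chi^2(n-K).
\]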
The F statistic is obtained as the ratio of the two chi-square variables, each divided by its degrees of freedom (W and $s^2$ are independent, since b and e are):
\[
F = \frac{W/J}{\dfrac{(n-K)\, s^2/\sigma^2}{n-K}}
  = \frac{W/J}{\dfrac{\varepsilon' M \varepsilon/\sigma^2}{n-K}}
  = \frac{W}{J}\, \frac{\sigma^2}{s^2}.
\]
Using the definition of W we see that the unknown $\sigma^2$ disappears from the expression for F:
\[
F = \frac{m' \left[ \sigma^2 R(X'X)^{-1} R' \right]^{-1} m \,/\, J}{\dfrac{(n-K)\, s^2/\sigma^2}{n-K}}
  = \frac{m' \left[ s^2 R(X'X)^{-1} R' \right]^{-1} m}{J}
  \sim \frac{\chi^2(J)/J}{\chi^2(n-K \mid X)/(n-K)}
  = F(J,\, n-K \mid X, H_0), \tag{1}
\]
see Greene p. 85.
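A hedged numerical sketch of the Wald F test in (1), in Python/NumPy (the data, the choice of R and q, and all variable names are made up for illustration):

import numpy as np

rng = np.random.default_rng(1)
n, K, J = 50, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)    # unrestricted OLS
e = y - X @ b
s2 = e @ e / (n - K)                     # s^2
XtX_inv = np.linalg.inv(X.T @ X)

# H0: beta_2 = 0 and beta_3 = 0, i.e. R beta = q with
R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
q = np.zeros(J)

m = R @ b - q                            # discrepancy vector
V_m = s2 * R @ XtX_inv @ R.T             # estimated Var[Rb | X]
F = m @ np.linalg.solve(V_m, m) / J      # equation (1)
print(F)                                 # compare with an F(J, n-K) critical value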
In the simplest case there is only one restriction on one parameter. Let the hypothesis be $H_0: \beta_2 = 0$. This implies that R is $(1 \times K)$ and $q = 0$, so that
\[
m = b_2 - 0,
\]
and
\[
Var[m \mid X] = Var[b_2 \mid X] = \sigma^2 \left\{ (X'X)^{-1} \right\}_{22},
\]
where $\left\{ (X'X)^{-1} \right\}_{22}$ denotes the second element along the diagonal of $(X'X)^{-1}$.
The Wald criterion for this case becomes
\[
W = (b_2 - 0) \left\{ Var[b_2 \mid X] \right\}^{-1} (b_2 - 0)
  = \frac{(b_2 - 0)^2}{\sigma^2 \left\{ (X'X)^{-1} \right\}_{22}},
\]
and the F statistic that results from substituting $\sigma^2$ by $s^2$ is:
\[
F = \frac{(b_2 - 0)^2}{s^2 \left\{ (X'X)^{-1} \right\}_{22}}
  = \frac{(b_2 - 0)^2}{\sigma^2 \left\{ (X'X)^{-1} \right\}_{22}} \Big/ \frac{s^2}{\sigma^2}
  = \frac{(b_2 - 0)^2 \big/ \left( \sigma^2 \left\{ (X'X)^{-1} \right\}_{22} \right)}{\dfrac{(n-K)\, s^2/\sigma^2}{n-K}}
  \sim F(1,\, n-K).
\]
As you know from before, in this special case with a single parameter restriction:
\[
F = t^2,
\]
where
\[
t = \frac{b_2 - 0}{\sqrt{\mathrm{Est.Var}[b_2]}}
  = \frac{b_2 - 0}{\sqrt{s^2 \left\{ (X'X)^{-1} \right\}_{22}}},
\]
i.e. the usual t-statistic, which is Student-t distributed under the null hypothesis: $t(n-K) \mid H_0$.
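The relation F = t^2 for a single restriction is easy to confirm numerically; a minimal sketch with simulated data (all names illustrative):

import numpy as np

rng = np.random.default_rng(2)
n, K = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.0, 0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = e @ e / (n - K)
XtX_inv = np.linalg.inv(X.T @ X)

# H0: beta_2 = 0; {(X'X)^{-1}}_{22} is the second diagonal element (index 1 in Python)
t = (b[1] - 0.0) / np.sqrt(s2 * XtX_inv[1, 1])
F = (b[1] - 0.0) ** 2 / (s2 * XtX_inv[1, 1])
print(np.isclose(F, t ** 2))   # True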
The Lagrange multiplier approach: Test based on both restricted and unrestricted estimation
If we impose the J restrictions on the model and then do OLS on the restricted model, we obtain restricted estimates of the remaining $K-J$ parameters; see Greene p. 88 for the restricted least squares estimator.
If the restrictions hold exactly, so that m = 0, then the restricted estimates are the same as the unrestricted ones (see Greene p. 88) and the Lagrange multiplier of the constrained minimization problem is zero.
If the restrictions do not hold exactly, then the Lagrange multiplier is different from zero, and the inference problem is whether this is due to a difference in population parameters or to sampling variability.
In practice we typically compute the Lagrange multiplier (LM) test by estimating both the restricted and the unrestricted model. The sum of squared residuals from the regression with the J restrictions imposed is denoted $SSE_*$. If these two sums ($SSE_*$ and the SSE from unrestricted estimation) are equal, then m = 0 (and the associated Lagrange multiplier is also zero).
Moreover, we can construct $\chi^2$ statistics from $SSE_*$ and SSE, with degrees of freedom $n-(K-J)$ and $n-K$ respectively:
\[
\frac{\varepsilon_R' M_R \varepsilon_R}{\sigma^2}
\quad \text{and} \quad
\frac{\varepsilon' M \varepsilon}{\sigma^2}
\]
(the subscript R refers to the restricted regression), since the two statistics are $\chi^2(n-(K-J))$ and $\chi^2(n-K)$ respectively. Therefore:
\[
F = \frac{(SSE_* - SSE)/J}{SSE/(n-K)}
  = \frac{\left( \varepsilon_R' M_R \varepsilon_R/\sigma^2 - \varepsilon' M \varepsilon/\sigma^2 \right)/J}
         {\left( \varepsilon' M \varepsilon/\sigma^2 \right)/(n-K)}
  \sim \frac{\chi^2(J)/J}{\chi^2(n-K)/(n-K)}
  = F(J,\, n-K) \mid H_0. \tag{2}
\]
Hence the Wald and LM approaches lead to test statistics that have the same F distribution under the null hypothesis.
The two test statistics are also numerically identical, see Greene p. 90. Intuitively, this is because in the Wald case we do in fact impose exactly the same restrictions as in the LM case, but not on the model: they are imposed after estimation of the unrestricted model, inside the Wald test statistic itself.
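The numerical identity can be checked with a short sketch (made-up data; here the restriction is that the last J coefficients are zero, so the restricted regression simply drops the last J columns of X):

import numpy as np

rng = np.random.default_rng(3)
n, K, J = 60, 4, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

def ols_sse(Z, y):
    # OLS coefficients and sum of squared residuals for regressors Z
    b = np.linalg.solve(Z.T @ Z, Z.T @ y)
    e = y - Z @ b
    return b, e @ e

b, SSE_u = ols_sse(X, y)                  # unrestricted
_, SSE_r = ols_sse(X[:, :K - J], y)       # restricted: last J coefficients set to zero

F_lm = ((SSE_r - SSE_u) / J) / (SSE_u / (n - K))    # equation (2)

s2 = SSE_u / (n - K)
R = np.hstack([np.zeros((J, K - J)), np.eye(J)])    # R beta = 0 picks out the last J coefficients
m = R @ b
F_wald = m @ np.linalg.solve(s2 * R @ np.linalg.inv(X.T @ X) @ R.T, m) / J   # equation (1)

print(np.isclose(F_lm, F_wald))           # numerically identical, as in Greene p. 90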