9. Maximum Likelihood Estimation

I. Ordinary Least Squares Estimation

For a linear model
$$Y_j = \beta_0 + \beta_1 X_{1j} + \cdots + \beta_r X_{rj} + \epsilon_j, \qquad j = 1, \ldots, n,$$
the OLS estimator for $\beta$ is any
$$b = \begin{bmatrix} b_0 \\ \vdots \\ b_r \end{bmatrix}$$
that minimizes the sum of squared residuals
$$Q(b) = \sum_{j=1}^{n} \left( Y_j - b_0 - b_1 X_{1j} - \cdots - b_r X_{rj} \right)^2 .$$

The estimating equations (normal equations) are
$$\frac{\partial Q(b)}{\partial b_0} = -2 \sum_{j=1}^{n} \left( Y_j - b_0 - b_1 X_{1j} - \cdots - b_r X_{rj} \right) = 0$$
and
$$\frac{\partial Q(b)}{\partial b_i} = -2 \sum_{j=1}^{n} X_{ij} \left( Y_j - b_0 - b_1 X_{1j} - \cdots - b_r X_{rj} \right) = 0 \qquad \text{for } i = 1, 2, \ldots, r.$$

The matrix form of these equations is
$$(X^T X)\, b = X^T Y$$
and a solution is
$$b = (X^T X)^{-} X^T Y .$$
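As a quick numerical illustration (not part of the notes; the data and dimensions below are made up), the normal equations can be solved with NumPy, where `pinv` supplies a generalized inverse $(X^T X)^{-}$:

    import numpy as np

    rng = np.random.default_rng(0)
    n, r = 50, 2
    X = np.column_stack([np.ones(n), rng.normal(size=(n, r))])  # model matrix with intercept
    Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

    # b = (X'X)^- X'Y; the Moore-Penrose pseudoinverse also handles a singular X'X,
    # in which case this is just one of many solutions to the normal equations.
    b = np.linalg.pinv(X.T @ X) @ (X.T @ Y)

    # Numerically preferable equivalent: least squares directly on X.
    b_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)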
The OLS estimator for an estimable function $C^T \beta$ is
$$C^T b = C^T (X^T X)^{-} X^T Y$$
for any solution to the normal equations, with
$$E(C^T b) = C^T \beta, \qquad Var(C^T b) = C^T (X^T X)^{-} X^T \Sigma X \left[ (X^T X)^{-} \right]^T C,$$
where $\Sigma = Var(Y)$. The distribution of $Y$ is not completely specified.

For a Gauss-Markov model with
$$E(Y) = X \beta \quad \text{and} \quad Var(Y) = \sigma^2 I,$$
the OLS estimator of an estimable function $C^T \beta$ is the unique best linear unbiased estimator (b.l.u.e.) of $C^T \beta$: it is unbiased,
$$E(C^T b) = C^T \beta,$$
and its variance
$$Var(C^T b) = \sigma^2 C^T (X^T X)^{-} C$$
is smaller than the variance of any other linear unbiased estimator for $C^T \beta$. The distribution of $Y$ is not completely specified.
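A small sketch of these formulas (the helper name and the assumption that the supplied contrast is estimable are mine, not from the notes): the point estimate $C^T b$ and its estimated variance $\hat{\sigma}^2 C^T (X^T X)^{-} C$ under the Gauss-Markov model.

    import numpy as np

    def estimable_fn(X, Y, C):
        """Return C'b and an estimate of Var(C'b) = sigma^2 C'(X'X)^- C,
        assuming C'beta is estimable (not checked here)."""
        XtX_ginv = np.linalg.pinv(X.T @ X)
        b = XtX_ginv @ X.T @ Y
        resid = Y - X @ b
        sigma2_hat = resid @ resid / (len(Y) - np.linalg.matrix_rank(X))
        return C @ b, sigma2_hat * (C @ XtX_ginv @ C)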
II. Generalized Least Squares Estimation

Consider the Aitken model
$$E(Y) = X \beta \quad \text{and} \quad Var(Y) = \sigma^2 V,$$
where $V$ is a positive definite symmetric matrix of known constants and $\sigma^2$ is an unknown variance parameter.

A GLS estimator for $\beta$ is any $b$ that minimizes
$$Q(b) = (Y - X b)^T V^{-1} (Y - X b)$$
(from Definition 3.8 with $\Sigma = \sigma^2 V$).

The estimating equations are
$$(X^T V^{-1} X)\, b = X^T V^{-1} Y .$$
A solution is
$$b_{GLS} = (X^T V^{-1} X)^{-} X^T V^{-1} Y .$$

For any estimable function $C^T \beta$ the unique b.l.u.e. is
$$C^T b_{GLS} = C^T (X^T V^{-1} X)^{-} X^T V^{-1} Y$$
for any solution to the normal equations, with
$$E(C^T b_{GLS}) = C^T \beta \quad \text{and} \quad Var(C^T b_{GLS}) = \sigma^2 C^T (X^T V^{-1} X)^{-} C .$$
The distribution of $Y$ is not completely specified.

An unbiased estimator for $\sigma^2$ in the Aitken model is
$$\hat{\sigma}^2_{GLS} = \frac{Y^T \left[ V^{-1} - V^{-1} X (X^T V^{-1} X)^{-} X^T V^{-1} \right] Y}{n - rank(X)} = \frac{(Y - X b_{GLS})^T V^{-1} (Y - X b_{GLS})}{n - rank(X)} .$$

In practice, $V$ may not be known. Then $b_{GLS}$ and $\hat{\sigma}^2_{GLS}$ can be approximated by replacing $V$ with a consistent estimator:
- The estimator for $C^T \beta$ is not b.l.u.e.
- The estimator for $\sigma^2$ is not unbiased.
- Both estimators are consistent.
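A minimal sketch of the GLS formulas above (function and variable names are my own; $V$ is taken as known and positive definite):

    import numpy as np

    def gls_fit(X, Y, V):
        """b_GLS = (X'V^{-1}X)^- X'V^{-1}Y and the unbiased estimator of
        sigma^2 for the Aitken model."""
        Vinv = np.linalg.inv(V)
        b_gls = np.linalg.pinv(X.T @ Vinv @ X) @ (X.T @ Vinv @ Y)
        resid = Y - X @ b_gls
        dof = len(Y) - np.linalg.matrix_rank(X)
        sigma2_gls = resid @ Vinv @ resid / dof
        return b_gls, sigma2_gls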
III. Maximum Likelihood Estimation

Find the parameter values that maximize the "likelihood" of the observed data. The model must include a specification of the joint distribution of the observations.

Example: Normal theory Gauss-Markov model:
$$Y_j = \beta_0 + \beta_1 X_{1j} + \cdots + \beta_r X_{rj} + \epsilon_j, \qquad \epsilon_j \sim NID(0, \sigma^2), \qquad j = 1, \ldots, n,$$
or
$$Y = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix} \sim N(X \beta, \, \sigma^2 I).$$

For the normal-theory Gauss-Markov model, the likelihood function is
$$L(\beta, \sigma^2; Y_1, \ldots, Y_n) = \frac{1}{(2\pi)^{n/2} \, \sigma^n} \, e^{-\frac{1}{2\sigma^2} (Y - X\beta)^T (Y - X\beta)} .$$
Find values of $\beta$ and $\sigma^2$ that maximize this likelihood function.
This is equivalent to finding values of $\beta$ and $\sigma^2$ that maximize the log-likelihood
$$\begin{aligned}
\ell(\beta, \sigma^2; Y_1, \ldots, Y_n) &= \log L(\beta, \sigma^2; Y_1, \ldots, Y_n) \\
&= -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} (Y - X\beta)^T (Y - X\beta) \\
&= -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} \sum_{j=1}^{n} \left( Y_j - \beta_0 - \cdots - \beta_r X_{rj} \right)^2 ,
\end{aligned}$$
where the final sum of squares is minimized by an OLS estimator for $\beta$ regardless of the value of $\sigma^2$.

Solve the likelihood equations:
$$0 = \frac{\partial \ell(\beta, \sigma^2; Y)}{\partial \beta_0} = \frac{1}{\sigma^2} \sum_{j=1}^{n} \left( Y_j - \beta_0 - \cdots - \beta_r X_{rj} \right),$$
$$0 = \frac{\partial \ell(\beta, \sigma^2; Y)}{\partial \beta_i} = \frac{1}{\sigma^2} \sum_{j=1}^{n} X_{ij} \left( Y_j - \beta_0 - \cdots - \beta_r X_{rj} \right) \qquad \text{for } i = 1, 2, \ldots, r,$$
and
$$\frac{\partial \ell(\beta, \sigma^2; Y)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2} \sum_{j=1}^{n} \left( Y_j - \beta_0 - \cdots - \beta_r X_{rj} \right)^2 .$$
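A quick numerical check (simulated data and setup are my own, not from the notes) that these equations are satisfied when $\beta$ is the OLS estimator and $\sigma^2 = SSE/n$:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 40
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    Y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

    b = np.linalg.pinv(X.T @ X) @ (X.T @ Y)
    resid = Y - X @ b
    sigma2 = resid @ resid / n                        # SSE / n

    score_beta = X.T @ resid / sigma2                 # dl/d(beta_0), ..., dl/d(beta_r)
    score_sigma2 = -n / (2 * sigma2) + (resid @ resid) / (2 * sigma2**2)
    print(np.abs(score_beta).max(), score_sigma2)     # both are numerically zero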
Solution:
$$\hat{\beta} = b_{OLS} = (X^T X)^{-} X^T Y$$
and
$$\hat{\sigma}^2 = \frac{1}{n} \sum_{j=1}^{n} \left( Y_j - \hat{\beta}_0 - \cdots - \hat{\beta}_r X_{rj} \right)^2 = \frac{1}{n} Y^T (I - P_X) Y = \frac{1}{n} SSE .$$
This is a biased estimator for $\sigma^2$; $\frac{1}{n - rank(X)} SSE$ is an unbiased estimator for $\sigma^2$. The estimators $\frac{1}{n} SSE$ and $\frac{1}{n - rank(X)} SSE$ are asymptotically equivalent.

Normal-theory Aitken model:
$$Y = X \beta + \epsilon, \qquad \text{where } \epsilon \sim N(0, \sigma^2 V)$$
and $V$ is a known positive definite matrix. The multivariate normal likelihood function is
$$L(\beta, \sigma^2; Y) = \frac{1}{(2\pi\sigma^2)^{n/2} \, |V|^{1/2}} \, e^{-\frac{1}{2\sigma^2} (Y - X\beta)^T V^{-1} (Y - X\beta)} .$$
The log-likelihood function is
$$\ell(\beta, \sigma^2; Y) = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \log(|V|) - \frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} (Y - X\beta)^T V^{-1} (Y - X\beta) .$$
For any value of $\sigma^2$, the log-likelihood is maximized by finding a $\beta$ that minimizes
$$(Y - X\beta)^T V^{-1} (Y - X\beta) .$$
The estimating equations are
$$(X^T V^{-1} X)\, \beta = X^T V^{-1} Y .$$
Solutions are of the form
$$\hat{\beta} = b_{GLS} = (X^T V^{-1} X)^{-} X^T V^{-1} Y .$$
When $V$ is known, the MLE for $\beta$ is also the generalized least squares estimator.
The additional estimating equation corresponding to $\sigma^2$ is
$$0 = \frac{\partial \ell(\beta, \sigma^2; Y)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2} (Y - X\beta)^T V^{-1} (Y - X\beta) .$$
Substituting the solution to the other estimating equations for $\beta$, the solution is
$$\hat{\sigma}^2 = \frac{1}{n} (Y - X b_{GLS})^T V^{-1} (Y - X b_{GLS}) .$$
This is a biased estimator for $\sigma^2$.
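A simulated check of that bias (the sample size, design, and form of $V$ below are made up): averaged over replications, the ML estimator is close to $\sigma^2 (n - rank(X))/n$ rather than $\sigma^2$.

    import numpy as np

    rng = np.random.default_rng(2)
    n, sigma2 = 30, 4.0
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    V = np.diag(rng.uniform(0.5, 2.0, size=n))        # a known positive definite V
    Vinv = np.linalg.inv(V)
    L = np.linalg.cholesky(sigma2 * V)                # so that Var(Y) = sigma^2 V

    estimates = []
    for _ in range(2000):
        Y = X @ np.array([1.0, 2.0]) + L @ rng.normal(size=n)
        b_gls = np.linalg.pinv(X.T @ Vinv @ X) @ (X.T @ Vinv @ Y)
        r = Y - X @ b_gls
        estimates.append(r @ Vinv @ r / n)
    print(np.mean(estimates))   # close to sigma2 * (n - 2) / n, not sigma2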
When $V$ contains unknown parameters, you could maximize the log-likelihood
$$\ell(\beta, \sigma^2; Y) = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \log(|V|) - \frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} (Y - X\beta)^T V^{-1} (Y - X\beta)$$
with respect to $\beta$, $\sigma^2$, and the parameters in $V$.
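A sketch of how such a maximization might be carried out numerically (the compound-symmetry form of $V(\rho)$, the profiling of $\beta$ and $\sigma^2$, and all names are my own choices, and SciPy is assumed available):

    import numpy as np
    from scipy.optimize import minimize_scalar

    def neg_profile_loglik(rho, X, Y):
        """-2 * profile log-likelihood (constants dropped) for a
        compound-symmetry V(rho); beta and sigma^2 are profiled out."""
        n = len(Y)
        V = (1 - rho) * np.eye(n) + rho * np.ones((n, n))
        Vinv = np.linalg.inv(V)
        b = np.linalg.pinv(X.T @ Vinv @ X) @ (X.T @ Vinv @ Y)   # GLS for this rho
        r = Y - X @ b
        sigma2 = r @ Vinv @ r / n                               # ML of sigma^2 given rho
        return n * np.log(sigma2) + np.linalg.slogdet(V)[1]

    rng = np.random.default_rng(5)
    n = 25
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    Y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)
    res = minimize_scalar(neg_profile_loglik, bounds=(0.0, 0.9),
                          args=(X, Y), method="bounded")
    rho_hat = res.x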
There may be no algebraic formulas for the solutions to the joint likelihood equations. The MLEs for $\sigma^2$ and the parameters in $V$ are usually biased (too small). REML estimates are often used.

General Properties of MLEs

Regularity Conditions:

(i) The parameter space has finite dimension, is closed and compact, and the true parameter vector is in the interior of the parameter space.

(ii) Probability distributions defined by any two different values of the parameter vector are distinct (an identifiability condition).
(iii) The first three partial derivatives of the log-likelihood function, with respect to the parameters, (a) exist and (b) are bounded by a function with a finite expectation.

(iv) The expectation of the negative of the matrix of second partial derivatives of the log-likelihood is (a) finite and (b) positive definite in a neighborhood of the true value of the parameter vector. This matrix is called the Fisher information matrix.
Suppose $Y_1, \ldots, Y_n$ are independent vectors of observations, with
$$Y_j = \begin{bmatrix} Y_{1j} \\ \vdots \\ Y_{pj} \end{bmatrix},$$
and the density function (or probability function) is $f(Y_j; \theta)$. Then, the joint likelihood function is
$$L(\theta; Y_1, \ldots, Y_n) = \prod_{j=1}^{n} f(Y_j; \theta),$$
and the log-likelihood function is
$$\ell(\theta; Y_1, \ldots, Y_n) = \log \left( L(\theta; Y_1, \ldots, Y_n) \right) = \sum_{j=1}^{n} \log \left( f(Y_j; \theta) \right) .$$
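For instance (a sketch with a hypothetical univariate normal model, not taken from the notes), the log-likelihood is simply a sum of log densities:

    import numpy as np
    from scipy.stats import norm

    def loglik(theta, y):
        """Log-likelihood for independent N(mu, sigma^2) observations:
        sum over j of log f(y_j; theta), with theta = (mu, sigma)."""
        mu, sigma = theta
        return np.sum(norm.logpdf(y, loc=mu, scale=sigma))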
The score function
$$u(\theta) = \begin{bmatrix} u_1(\theta) \\ \vdots \\ u_r(\theta) \end{bmatrix} = \begin{bmatrix} \dfrac{\partial \ell(\theta; Y_1, \ldots, Y_n)}{\partial \theta_1} \\ \vdots \\ \dfrac{\partial \ell(\theta; Y_1, \ldots, Y_n)}{\partial \theta_r} \end{bmatrix}$$
is the vector of first partial derivatives of the log-likelihood function with respect to the elements of
$$\theta = \begin{bmatrix} \theta_1 \\ \vdots \\ \theta_r \end{bmatrix} .$$

The likelihood equations are
$$u(\theta; Y_1, \ldots, Y_n) = 0 .$$
The maximum likelihood estimator (MLE)
$$\hat{\theta} = \begin{bmatrix} \hat{\theta}_1 \\ \vdots \\ \hat{\theta}_r \end{bmatrix}$$
is a solution to the likelihood equations that maximizes the log-likelihood function.
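A scalar sketch of solving the likelihood equation iteratively (the Exponential-rate example and the Fisher-scoring update are my own illustration, not part of the notes): for $Y_j$ Exponential with rate $\lambda$, $u(\lambda) = n/\lambda - \sum_j Y_j$ and the Fisher information is $n/\lambda^2$.

    import numpy as np

    rng = np.random.default_rng(3)
    y = rng.exponential(scale=1 / 2.5, size=200)   # true rate lambda = 2.5
    n = len(y)

    lam = 1.0                                      # starting value
    for _ in range(25):
        score = n / lam - y.sum()                  # u(lambda)
        info = n / lam ** 2                        # Fisher information i(lambda)
        lam += score / info                        # Fisher-scoring step
    print(lam, 1 / y.mean())                       # iterate agrees with the closed-form MLE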
Fisher information matrix:
$$i(\theta) = Var\left( u(\theta; Y_1, \ldots, Y_n) \right) = E\left( u(\theta; Y_1, \ldots, Y_n) \left[ u(\theta; Y_1, \ldots, Y_n) \right]^T \right) = -E\left( \left[ \frac{\partial^2 \ell(\theta; Y_1, \ldots, Y_n)}{\partial \theta_r \, \partial \theta_k} \right] \right) .$$

Let $\theta$ denote the parameter vector, $i(\theta)$ denote the Fisher information matrix, and $\hat{\theta}$ denote the MLE for $\theta$. Then, if the Regularity Conditions are satisfied, we have the following results:
Result 9.1: $\hat{\theta}$ is a consistent estimator:
$$Pr\left\{ (\hat{\theta} - \theta)^T (\hat{\theta} - \theta) > \delta \right\} \rightarrow 0 \quad \text{as } n \rightarrow \infty, \text{ for any } \delta > 0.$$

Result 9.2: Asymptotic normality
$$\sqrt{n}\, (\hat{\theta} - \theta) \xrightarrow{dist} N\left( 0, \; \lim_{n \rightarrow \infty} n \left[ i(\theta) \right]^{-1} \right) \quad \text{as } n \rightarrow \infty.$$
With a slight abuse of notation we may express this as
$$\hat{\theta} \sim N\left( \theta, \left[ i(\theta) \right]^{-1} \right)$$
for "large" sample sizes.
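Continuing the Exponential-rate illustration (again my own example), Result 9.2 justifies treating $\hat{\lambda}$ as approximately $N(\lambda, [i(\lambda)]^{-1})$ with $[i(\hat{\lambda})]^{-1} = \hat{\lambda}^2/n$, which gives a Wald-type interval:

    import numpy as np

    rng = np.random.default_rng(4)
    y = rng.exponential(scale=1 / 2.5, size=200)
    n = len(y)

    lam_hat = 1 / y.mean()                 # MLE of the rate
    se = lam_hat / np.sqrt(n)              # sqrt of [i(lam_hat)]^{-1} = lam_hat^2 / n
    ci = (lam_hat - 1.96 * se, lam_hat + 1.96 * se)   # approximate 95% interval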
Result 9.3: If $\hat{\theta}$ is the MLE for $\theta$, then the MLE for $g(\theta)$ is $g(\hat{\theta})$ for any function $g(\cdot)$.