9. Maximum Likelihood Estimation

I. Ordinary Least Squares Estimation:

Only requires a model for the conditional mean of the response variable.

For a linear model

    $Y_j = \beta_0 + \beta_1 X_{1j} + \cdots + \beta_r X_{rj} + \epsilon_j,$

the OLS estimator for $\beta = (\beta_0, \beta_1, \ldots, \beta_r)^T$ is any $b = (b_0, b_1, \ldots, b_r)^T$ that minimizes the sum of squared residuals

    $Q(b) = \sum_{j=1}^{n} (Y_j - b_0 - b_1 X_{1j} - \cdots - b_r X_{rj})^2.$

The estimating equations (normal equations) are

    $\frac{\partial Q(b)}{\partial b_0} = -2 \sum_{j=1}^{n} (Y_j - b_0 - b_1 X_{1j} - \cdots - b_r X_{rj}) = 0$

and

    $\frac{\partial Q(b)}{\partial b_i} = -2 \sum_{j=1}^{n} X_{ij} (Y_j - b_0 - b_1 X_{1j} - \cdots - b_r X_{rj}) = 0 \quad \text{for } i = 1, 2, \ldots, r.$

The matrix form of these equations is

    $(X^T X) b = X^T Y$

and a solution is

    $b = (X^T X)^{-} X^T Y.$
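As a minimal numerical sketch (not part of the original notes), the normal equations can be solved directly with numpy; the Moore-Penrose pseudoinverse plays the role of a generalized inverse $(X^T X)^{-}$, and the design matrix and responses here are simulated purely for illustration.

import numpy as np

rng = np.random.default_rng(0)
n, r = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, r))])   # intercept plus r covariates
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(scale=0.3, size=n)

b = np.linalg.pinv(X.T @ X) @ (X.T @ Y)    # solves (X^T X) b = X^T Y
Q = np.sum((Y - X @ b) ** 2)               # sum of squared residuals at the solution
print(b, Q)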
The OLS estimator for an estimable function $C^T \beta$ is

    $C^T b = C^T (X^T X)^{-} X^T Y$

for any solution $b$ to the normal equations, with

    $E(C^T b) = C^T \beta$

and

    $Var(C^T b) = C^T (X^T X)^{-} X^T \Sigma X (X^T X)^{-} C,$

where $\Sigma = Var(Y)$. The distribution of $Y$ is not completely specified.

For a Gauss-Markov model with

    $E(Y) = X\beta$ and $Var(Y) = \sigma^2 I,$

the OLS estimator of an estimable function $C^T \beta$ is the unique best linear unbiased estimator (b.l.u.e.) of $C^T \beta$:

    $E(C^T b) = C^T \beta$

and

    $Var(C^T b) = \sigma^2 C^T (X^T X)^{-} C$

is smaller than the variance of any other linear unbiased estimator for $C^T \beta$. The distribution of $Y$ is not completely specified.
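Continuing the simulated example above (a sketch, not from the notes): with the hypothetical estimable function $C^T\beta = \beta_1 - \beta_2$ and the simulation's error variance treated as known, the OLS estimate and its Gauss-Markov variance can be computed as follows.

C = np.array([0.0, 1.0, -1.0])            # hypothetical contrast: beta_1 - beta_2
XtX_ginv = np.linalg.pinv(X.T @ X)        # generalized inverse of X^T X
Ctb = C @ XtX_ginv @ (X.T @ Y)            # C^T b = C^T (X^T X)^- X^T Y
sigma2 = 0.3 ** 2                         # error variance used in the simulation
var_Ctb = sigma2 * C @ XtX_ginv @ C       # Var(C^T b) = sigma^2 C^T (X^T X)^- C
print(Ctb, var_Ctb)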
II. Generalized Least Squares Estimation

Consider the Aitken model

    $E(Y) = X\beta$ and $Var(Y) = \sigma^2 V,$

where $V$ is a positive definite symmetric matrix of known constants and $\sigma^2$ is an unknown variance parameter. The distribution of $Y$ is not completely specified.

A GLS estimator for $\beta$ is any $b$ that minimizes

    $Q(b) = (Y - Xb)^T V^{-1} (Y - Xb)$

(from Definition 3.8 with $\Sigma = \sigma^2 V$).

The estimating equations are

    $(X^T V^{-1} X) b = X^T V^{-1} Y.$

A solution is

    $b_{GLS} = (X^T V^{-1} X)^{-} X^T V^{-1} Y.$

For any estimable function $C^T \beta$, the unique b.l.u.e. is

    $C^T b_{GLS} = C^T (X^T V^{-1} X)^{-} X^T V^{-1} Y$

for any solution to the normal equations, with

    $E(C^T b) = C^T \beta$ and $Var(C^T b) = \sigma^2 C^T (X^T V^{-1} X)^{-} C.$
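A hedged numerical sketch of GLS (not from the notes), reusing $n$, $X$, `beta_true`, and `rng` from the OLS snippet; the known matrix $V$ is taken to be an AR(1)-type correlation matrix purely for illustration.

rho = 0.5
V = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))   # known, positive definite V
L = np.linalg.cholesky(V)
Y_corr = X @ beta_true + 0.3 * (L @ rng.normal(size=n))            # responses with Var(Y) = sigma^2 V

V_inv = np.linalg.inv(V)
b_gls = np.linalg.pinv(X.T @ V_inv @ X) @ (X.T @ V_inv @ Y_corr)   # (X^T V^-1 X)^- X^T V^-1 Y
print(b_gls)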
An unbiased estimator for $\sigma^2$ in the Aitken model is

    $\hat{\sigma}^2_{GLS} = \frac{1}{n - rank(X)} \, Y^T \left[ V^{-1} - V^{-1} X (X^T V^{-1} X)^{-} X^T V^{-1} \right] Y.$

In practice, $V$ may not be known. Then $b_{GLS}$ and $\hat{\sigma}^2_{GLS}$ can be approximated by replacing $V$ with a consistent estimator for $V$:

    - The estimator for $C^T \beta$ is not b.l.u.e.
    - The estimator for $\sigma^2$ is not unbiased.
    - Both estimators are consistent.
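Continuing that GLS sketch, the unbiased estimator of $\sigma^2$ in the Aitken model can be computed directly from the formula above (again an illustration, not part of the notes).

M = V_inv - V_inv @ X @ np.linalg.pinv(X.T @ V_inv @ X) @ X.T @ V_inv
sigma2_gls = (Y_corr @ M @ Y_corr) / (n - np.linalg.matrix_rank(X))
print(sigma2_gls)       # should be close to the 0.3**2 used to generate Y_corr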
III. Maximum Likelihood Estimation

The model must include a specification of the joint distribution of the observations.

Example: Normal-theory Gauss-Markov model:

    $Y_j = \beta_0 + \beta_1 X_{1j} + \cdots + \beta_r X_{rj} + \epsilon_j$

where $\epsilon_j \sim NID(0, \sigma^2)$, $j = 1, \ldots, n$, or

    $Y = (Y_1, \ldots, Y_n)^T \sim N(X\beta, \sigma^2 I).$
Find the parameter values that maximize the "likelihood" of the observed data.

For the normal-theory Gauss-Markov model, the likelihood function is

    $L(\beta, \sigma^2; Y_1, \ldots, Y_n) = \frac{1}{(2\pi)^{n/2} \sigma^{n}} \, e^{-\frac{1}{2\sigma^2} (Y - X\beta)^T (Y - X\beta)}.$

Find values of $\beta$ and $\sigma^2$ that maximize this likelihood function.

This is equivalent to finding values of $\beta$ and $\sigma^2$ that maximize the log-likelihood

    $\ell(\beta, \sigma^2; Y_1, \ldots, Y_n) = \log L(\beta, \sigma^2; Y_1, \ldots, Y_n)$

    $= -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} (Y - X\beta)^T (Y - X\beta)$

    $= -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} \sum_{j=1}^{n} (Y_j - \beta_0 - \cdots - \beta_r X_{rj})^2,$

where the final sum of squares is minimized by an OLS estimator for $\beta$, regardless of the value of $\sigma^2$.
Solve the likelihood equations:

    $\frac{\partial \ell(\beta, \sigma^2; Y)}{\partial \beta_0} = \frac{1}{\sigma^2} \sum_{j=1}^{n} (Y_j - \beta_0 - \cdots - \beta_r X_{rj}) = 0,$

    $\frac{\partial \ell(\beta, \sigma^2; Y)}{\partial \beta_i} = \frac{1}{\sigma^2} \sum_{j=1}^{n} X_{ij} (Y_j - \beta_0 - \cdots - \beta_r X_{rj}) = 0 \quad \text{for } i = 1, 2, \ldots, r,$

and

    $\frac{\partial \ell(\beta, \sigma^2; Y)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{j=1}^{n} (Y_j - \beta_0 - \cdots - \beta_r X_{rj})^2 = 0.$

Solution:

    $\hat{\beta} = b_{OLS} = (X^T X)^{-} X^T Y$

and

    $\hat{\sigma}^2 = \frac{1}{n} \sum_{j=1}^{n} (Y_j - \hat{\beta}_0 - \cdots - \hat{\beta}_r X_{rj})^2 = \frac{1}{n} Y^T (I - P_X) Y = \frac{1}{n} SSE.$

This estimator for $\sigma^2$ is biased. $\frac{1}{n - rank(X)} SSE$ is an unbiased estimator for $\sigma^2$, and $\frac{1}{n} SSE$ and $\frac{1}{n - rank(X)} SSE$ are asymptotically equivalent.
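As a sketch (continuing the OLS simulation, not part of the notes), the ML estimates and the bias-corrected variance estimate can be compared numerically; `np.linalg.matrix_rank` supplies $rank(X)$.

beta_hat = np.linalg.pinv(X.T @ X) @ (X.T @ Y)     # MLE of beta = OLS estimate
SSE = np.sum((Y - X @ beta_hat) ** 2)
sigma2_mle = SSE / n                               # biased ML estimate of sigma^2
sigma2_unbiased = SSE / (n - np.linalg.matrix_rank(X))
print(sigma2_mle, sigma2_unbiased)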
General normal-theory linear model:

    $Y = X\beta + \epsilon$, where $\epsilon \sim N(0, \Sigma)$ and $\Sigma$ is known.

The multivariate normal likelihood function is

    $L(\beta; Y) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \, e^{-\frac{1}{2} (Y - X\beta)^T \Sigma^{-1} (Y - X\beta)}.$

The log-likelihood function is

    $\ell(\beta; Y) = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \log(|\Sigma|) - \frac{1}{2} (Y - X\beta)^T \Sigma^{-1} (Y - X\beta).$

Maximizing the log-likelihood when $\Sigma$ is known is equivalent to finding a $\beta$ that minimizes

    $(Y - X\beta)^T \Sigma^{-1} (Y - X\beta).$

The estimating equations are

    $(X^T \Sigma^{-1} X) \beta = X^T \Sigma^{-1} Y,$

and solutions are of the form

    $\hat{\beta} = b_{GLS} = (X^T \Sigma^{-1} X)^{-} X^T \Sigma^{-1} Y.$

For the general normal-theory linear model, when $\Sigma$ is known, maximum likelihood estimation is the same as generalized least squares estimation.
When $\Sigma$ contains unknown parameters:

You could maximize the log-likelihood

    $\ell(\beta, \Sigma; Y) = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \log(|\Sigma|) - \frac{1}{2} (Y - X\beta)^T \Sigma^{-1} (Y - X\beta)$

with respect to both $\beta$ and $\Sigma$. There may be no algebraic formulas for solutions to the joint likelihood equations, say $\hat{\beta}$ and $\hat{\Sigma}$. The MLE for $\Sigma$ is usually biased.

Similarly, generalized least squares estimation and maximum likelihood estimation are equivalent for $\beta$ in the Aitken model

    $Y \sim N(X\beta, \sigma^2 V)$

when $V$ is known. Substitute $\Sigma = \sigma^2 V$ into the previous discussion. Then, the log-likelihood is

    $\ell(\beta, \sigma^2; Y) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(\sigma^2) - \frac{1}{2} \log(|V|) - \frac{1}{2\sigma^2} (Y - X\beta)^T V^{-1} (Y - X\beta).$
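A small sketch (continuing the GLS example with $V$ known, and not part of the notes): the Aitken log-likelihood can be evaluated directly, and for a fixed $\sigma^2$ it is largest at the GLS solution, consistent with the equivalence described above.

def aitken_loglik(beta, sigma2):
    resid = Y_corr - X @ beta
    _, logdetV = np.linalg.slogdet(V)
    return (-0.5 * n * np.log(2 * np.pi) - 0.5 * n * np.log(sigma2)
            - 0.5 * logdetV - 0.5 * (resid @ V_inv @ resid) / sigma2)

print(aitken_loglik(b_gls, sigma2_gls))                  # at the GLS estimates
print(aitken_loglik(np.zeros(X.shape[1]), sigma2_gls))   # smaller at an arbitrary beta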
The "likelihood equations" are

    $(X^T V^{-1} X) \beta = X^T V^{-1} Y$

and

    $\sigma^2 = \frac{1}{n} (Y - X\beta)^T V^{-1} (Y - X\beta).$

Solutions have the form

    $\hat{\beta} = b_{GLS} = (X^T V^{-1} X)^{-} X^T V^{-1} Y$

and

    $\hat{\sigma}^2 = \frac{1}{n} (Y - X\hat{\beta})^T V^{-1} (Y - X\hat{\beta}).$

The likelihood equations are more complicated when $V$ contains unknown parameters.

General Properties of MLE's

Regularity Conditions:

(i) The parameter space has finite dimension, is closed and compact, and the true parameter vector is in the interior of the parameter space.

(ii) Probability distributions defined by any two different values of the parameter vector are distinct (an identifiability condition).
(iii) First three partial derivatives of the log-likelihood function, with respect to the parameters,
    (a) exist
    (b) are bounded by a function with a finite expectation.

(iv) The expectation of the negative of the matrix of second partial derivatives of the log-likelihood is
    (a) finite
    (b) positive definite
in a neighborhood of the true value of the parameter vector. This is called the Fisher information matrix.

Suppose $Y_1, \ldots, Y_n$ are independent vectors of observations, with

    $Y_j = (Y_{1j}, \ldots, Y_{pj})^T,$

and the density function (or probability function) is $f(Y_j; \theta)$. Then, the joint likelihood function is

    $L(\theta; Y_1, \ldots, Y_n) = \prod_{j=1}^{n} f(Y_j; \theta)$

and the log-likelihood function is

    $\ell(\theta; Y_1, \ldots, Y_n) = \log\left( L(\theta; Y_1, \ldots, Y_n) \right) = \sum_{j=1}^{n} \log\left( f(Y_j; \theta) \right).$
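As an illustration (not from the notes), the log-likelihood of the normal-theory example can be written as this sum of log densities; the sketch continues the earlier simulation, uses `scipy.stats.norm`, and takes $\theta = (\beta, \sigma^2)$.

from scipy.stats import norm

def loglik(theta):
    beta, sigma2 = theta[:-1], theta[-1]
    return np.sum(norm.logpdf(Y, loc=X @ beta, scale=np.sqrt(sigma2)))   # sum of log f(Y_j; theta)

theta_hat = np.append(beta_hat, sigma2_mle)   # ML estimates from the earlier snippet
print(loglik(theta_hat))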
The score function

    $u(\theta) = \begin{bmatrix} u_1(\theta) \\ \vdots \\ u_r(\theta) \end{bmatrix} = \begin{bmatrix} \frac{\partial \ell(\theta; Y_1, \ldots, Y_n)}{\partial \theta_1} \\ \vdots \\ \frac{\partial \ell(\theta; Y_1, \ldots, Y_n)}{\partial \theta_r} \end{bmatrix}$

is the vector of first partial derivatives of the log-likelihood function with respect to the elements of

    $\theta = (\theta_1, \ldots, \theta_r)^T.$

The likelihood equations are

    $u(\theta; Y_1, \ldots, Y_n) = 0.$

The maximum likelihood estimator (MLE)

    $\hat{\theta} = (\hat{\theta}_1, \ldots, \hat{\theta}_r)^T$

is a solution to the likelihood equations that maximizes the log-likelihood function.

Fisher information matrix:

    $i(\theta) = Var(u(\theta; Y_1, \ldots, Y_n)) = E\left( u(\theta; Y_1, \ldots, Y_n) [u(\theta; Y_1, \ldots, Y_n)]^T \right) = E\left( -\left[ \frac{\partial^2 \ell(\theta; Y_1, \ldots, Y_n)}{\partial \theta_r \, \partial \theta_k} \right] \right).$
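Continuing that sketch, the score vector can be approximated by finite differences of the log-likelihood; at the MLE it should be approximately the zero vector (again an illustration under the stated assumptions).

def score(theta, eps=1e-6):
    u = np.zeros(len(theta))
    for k in range(len(theta)):
        step = np.zeros(len(theta))
        step[k] = eps
        u[k] = (loglik(theta + step) - loglik(theta - step)) / (2 * eps)   # d ell / d theta_k
    return u

print(score(theta_hat))   # approximately zero at the MLE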
Let

    $\theta$ denote the parameter vector,
    $i(\theta)$ denote the Fisher information matrix,
    $\hat{\theta}$ denote the MLE for $\theta$.

Then, if the Regularity Conditions are satisfied, we have the following results:

Result 9.1: $\hat{\theta}$ is a consistent estimator:

    $Pr\left\{ (\hat{\theta} - \theta)^T (\hat{\theta} - \theta) > \delta \right\} \to 0$

as $n \to \infty$, for any $\delta > 0$.

Result 9.2: Asymptotic normality

    $\sqrt{n}\,(\hat{\theta} - \theta) \xrightarrow{dist.} N\left( 0, \lim_{n \to \infty} n [i(\theta)]^{-1} \right)$

as $n \to \infty$. With a slight abuse of notation we may express this as

    $\hat{\theta} \sim N\left( \theta, [i(\theta)]^{-1} \right)$

for "large" sample sizes.
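As a final sketch (not from the notes), the large-sample normal approximation can be used for Wald-type intervals. For the normal linear model with the simulated full-rank $X$, the information for $\beta$ is $X^T X / \sigma^2$, so $[i(\beta)]^{-1} = \sigma^2 (X^T X)^{-1}$; this choice, and evaluation at the ML estimates, are assumptions of the illustration.

info_beta = (X.T @ X) / sigma2_mle                      # Fisher information for beta at the MLE
se_beta = np.sqrt(np.diag(np.linalg.inv(info_beta)))    # approximate standard errors
wald_ci = np.column_stack([beta_hat - 1.96 * se_beta, beta_hat + 1.96 * se_beta])
print(wald_ci)                                          # approximate 95% Wald intervals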
Result 9.3: If $\hat{\theta}$ is the MLE for $\theta$, then the MLE for $g(\theta)$ is $g(\hat{\theta})$ for any function $g(\theta)$.