9. Maximum Likelihood Estimation

I. Ordinary Least Squares Estimation:

Only requires a model for the conditional mean of the response variable.

For a linear model
$$Y_j = \beta_0 + \beta_1 X_{1j} + \cdots + \beta_r X_{rj} + \epsilon_j,$$
the OLS estimator for
$$\beta = \begin{bmatrix} \beta_0 \\ \vdots \\ \beta_r \end{bmatrix}$$
is any
$$b = \begin{bmatrix} b_0 \\ \vdots \\ b_r \end{bmatrix}$$
that minimizes the sum of squared residuals
$$Q(b) = \sum_{j=1}^{n} (Y_j - b_0 - b_1 X_{1j} - \cdots - b_r X_{rj})^2.$$

The estimating equations (normal equations) are
$$\frac{\partial Q(b)}{\partial b_0} = -2 \sum_{j=1}^{n} (Y_j - b_0 - b_1 X_{1j} - \cdots - b_r X_{rj}) = 0$$
and
$$\frac{\partial Q(b)}{\partial b_i} = -2 \sum_{j=1}^{n} X_{ij}\,(Y_j - b_0 - b_1 X_{1j} - \cdots - b_r X_{rj}) = 0 \quad \text{for } i = 1, 2, \ldots, r.$$

The matrix form of these equations is
$$(X^T X)\,b = X^T Y$$
and a solution is
$$b = (X^T X)^{-} X^T Y.$$

For a Gauss-Markov model with
$$E(Y) = X\beta \quad \text{and} \quad Var(Y) = \sigma^2 I,$$
the OLS estimator of an estimable function $C^T\beta$ is the unique best linear unbiased estimator (b.l.u.e.) of $C^T\beta$. The OLS estimator is
$$C^T b = C^T (X^T X)^{-} X^T Y$$
for any solution $b$ to the normal equations, with
$$E(C^T b) = C^T \beta \quad \text{and} \quad Var(C^T b) = \sigma^2\, C^T (X^T X)^{-} C.$$

The distribution of $Y$ is not completely specified.
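As a quick illustration of the normal equations and the b.l.u.e. of an estimable function, here is a minimal NumPy sketch; the simulated data, the design matrix, the vector C, and all numerical values are assumptions for the example, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated illustration: n observations, intercept plus r = 2 regressors.
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(scale=1.5, size=n)

# Normal equations (X'X) b = X'Y; a solution uses a generalized inverse of X'X.
XtX = X.T @ X
b = np.linalg.pinv(XtX) @ (X.T @ Y)          # b = (X'X)^- X'Y

# b.l.u.e. of an estimable function C'beta, e.g. C' = [0, 1, -1]
C = np.array([0.0, 1.0, -1.0])
Ctb = C @ b

# Estimated Var(C'b) = sigma^2 C'(X'X)^- C, with sigma^2 replaced by SSE/(n - rank(X))
resid = Y - X @ b
s2 = (resid @ resid) / (n - np.linalg.matrix_rank(X))
var_Ctb = s2 * (C @ np.linalg.pinv(XtX) @ C)

print("b =", b)
print("C'b =", Ctb, " est. Var(C'b) =", var_Ctb)
```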
II. Generalized Least Squares Estimation

Consider the Aitken model
$$E(Y) = X\beta \quad \text{and} \quad Var(Y) = \sigma^2 V,$$
where $V$ is a positive definite symmetric matrix of known constants and $\sigma^2$ is an unknown variance parameter.

A GLS estimator for $\beta$ is any $b$ that minimizes
$$Q(b) = (Y - Xb)^T V^{-1} (Y - Xb)$$
(from Definition 3.8 with $\Sigma = \sigma^2 V$).

The estimating equations are
$$(X^T V^{-1} X)\,b = X^T V^{-1} Y.$$
A solution is
$$b_{GLS} = (X^T V^{-1} X)^{-} X^T V^{-1} Y.$$

For any estimable function $C^T\beta$ the unique b.l.u.e. is
$$C^T b_{GLS} = C^T (X^T V^{-1} X)^{-} X^T V^{-1} Y$$
for any solution to the normal equations, with
$$E(C^T b_{GLS}) = C^T \beta \quad \text{and} \quad Var(C^T b_{GLS}) = \sigma^2\, C^T (X^T V^{-1} X)^{-} C.$$

An unbiased estimator for $\sigma^2$ in the Aitken model is
$$\hat{\sigma}^2_{GLS} = \frac{1}{n - \mathrm{rank}(X)}\; Y^T \left[ V^{-1} - V^{-1} X (X^T V^{-1} X)^{-} X^T V^{-1} \right] Y.$$

The distribution of $Y$ is not completely specified.

In practice, $V$ may not be known. Then $b_{GLS}$ and $\hat{\sigma}^2_{GLS}$ can be approximated by replacing $V$ with a consistent estimator for $V$:
- The estimator for $C^T\beta$ is not b.l.u.e.
- The estimator for $\sigma^2$ is not unbiased.
- Both estimators are consistent.
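A similar sketch for GLS under the Aitken model, assuming a known AR(1)-type correlation matrix V purely for illustration; the structure and all numerical values are assumptions, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative Aitken model: Var(Y) = sigma^2 V with a known AR(1)-type V
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([2.0, 1.0])
rho, sigma2 = 0.6, 2.0
V = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

# Draw Y with mean X beta and covariance sigma^2 V via a Cholesky factor of V
L = np.linalg.cholesky(V)
Y = X @ beta_true + np.sqrt(sigma2) * (L @ rng.normal(size=n))

Vinv = np.linalg.inv(V)
XtVinvX = X.T @ Vinv @ X
b_gls = np.linalg.pinv(XtVinvX) @ X.T @ Vinv @ Y      # b_GLS = (X'V^-1 X)^- X'V^-1 Y

# Unbiased estimator of sigma^2 in the Aitken model
M = Vinv - Vinv @ X @ np.linalg.pinv(XtVinvX) @ X.T @ Vinv
sigma2_gls = (Y @ M @ Y) / (n - np.linalg.matrix_rank(X))

print("b_GLS =", b_gls)
print("sigma^2_GLS =", sigma2_gls)
```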
III. Maximum Likelihood Estimation

The model must include a complete specification of the joint distribution of the observations. Find the parameter values that maximize the "likelihood" of the observed data.

Example: Normal theory Gauss-Markov model:
$$Y_j = \beta_0 + \beta_1 X_{1j} + \cdots + \beta_r X_{rj} + \epsilon_j, \quad j = 1, \ldots, n,$$
where $\epsilon_j \sim NID(0, \sigma^2)$, or equivalently
$$Y = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix} \sim N(X\beta, \sigma^2 I).$$

For the normal-theory Gauss-Markov model, the likelihood function is
$$L(\beta, \sigma^2; Y_1, \ldots, Y_n) = \frac{1}{(2\pi)^{n/2}\,\sigma^n}\; e^{-\frac{1}{2\sigma^2}(Y - X\beta)^T (Y - X\beta)}.$$

Find values of $\beta$ and $\sigma^2$ that maximize this likelihood function.
This is equivalent to finding values of $\beta$ and $\sigma^2$ that maximize the log-likelihood
$$\ell(\beta, \sigma^2; Y_1, \ldots, Y_n) = \log L(\beta, \sigma^2; Y_1, \ldots, Y_n)$$
$$= -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} (Y - X\beta)^T (Y - X\beta)$$
$$= -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} \sum_{j=1}^{n} (Y_j - \beta_0 - \cdots - \beta_r X_{rj})^2.$$

The last sum is minimized by an OLS estimator for $\beta$, regardless of the value of $\sigma^2$.

Solve the likelihood equations:
$$\frac{\partial \ell(\beta, \sigma^2; Y)}{\partial \beta_0} = \frac{1}{\sigma^2} \sum_{j=1}^{n} (Y_j - \beta_0 - \cdots - \beta_r X_{rj}) = 0$$
$$\frac{\partial \ell(\beta, \sigma^2; Y)}{\partial \beta_i} = \frac{1}{\sigma^2} \sum_{j=1}^{n} X_{ij}\,(Y_j - \beta_0 - \cdots - \beta_r X_{rj}) = 0 \quad \text{for } i = 1, 2, \ldots, r$$
$$\frac{\partial \ell(\beta, \sigma^2; Y)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{j=1}^{n} (Y_j - \beta_0 - \cdots - \beta_r X_{rj})^2 = 0.$$
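The likelihood equations above have a closed-form solution, but it can also be checked numerically. A hedged sketch, assuming simulated data and SciPy's general-purpose optimizer (all names and values are illustrative), that maximizes the normal log-likelihood and compares the result with the closed form:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Simulated normal-theory Gauss-Markov data (illustrative values)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, sigma2_true = np.array([1.0, -2.0]), 1.5
Y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2_true), size=n)

def negloglik(theta):
    """Negative log-likelihood; theta = (beta_0, beta_1, log sigma^2)."""
    beta, log_s2 = theta[:-1], theta[-1]
    s2 = np.exp(log_s2)                      # keeps sigma^2 > 0 during optimization
    r = Y - X @ beta
    return 0.5 * (n * np.log(2 * np.pi) + n * log_s2 + r @ r / s2)

fit = minimize(negloglik, x0=np.zeros(3), method="BFGS")
beta_mle, sigma2_mle = fit.x[:-1], np.exp(fit.x[-1])

# Closed-form solution from the likelihood equations
b_ols = np.linalg.pinv(X.T @ X) @ X.T @ Y
sse = np.sum((Y - X @ b_ols) ** 2)
print("numerical MLE:", beta_mle, sigma2_mle)
print("closed form:  ", b_ols, sse / n)      # MLE of sigma^2 is SSE/n
```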
Solution:
$$\hat{\beta} = b_{OLS} = (X^T X)^{-} X^T Y$$
and
$$\hat{\sigma}^2 = \frac{1}{n} \sum_{j=1}^{n} (Y_j - \hat{\beta}_0 - \cdots - \hat{\beta}_r X_{rj})^2 = \frac{1}{n}\, Y^T (I - P_X)\, Y = \frac{1}{n}\, SSE.$$

This estimator for $\sigma^2$ is biased; $\frac{1}{n - \mathrm{rank}(X)}\, SSE$ is an unbiased estimator for $\sigma^2$. The two estimators, $\frac{1}{n}\, SSE$ and $\frac{1}{n - \mathrm{rank}(X)}\, SSE$, are asymptotically equivalent.
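A small Monte Carlo sketch of the bias statement above, with simulated data and illustrative values: SSE/n systematically underestimates $\sigma^2$, while SSE/(n - rank(X)) does not.

```python
import numpy as np

rng = np.random.default_rng(3)

# Monte Carlo check (illustrative): bias of SSE/n versus SSE/(n - rank(X))
n, n_sims = 20, 5000
X = np.column_stack([np.ones(n), np.linspace(-1, 1, n), np.linspace(-1, 1, n) ** 2])
beta, sigma2 = np.array([1.0, 0.5, -0.3]), 4.0
P = X @ np.linalg.pinv(X.T @ X) @ X.T              # projection onto the column space of X
rank_X = np.linalg.matrix_rank(X)

mle, unbiased = np.empty(n_sims), np.empty(n_sims)
for s in range(n_sims):
    Y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    sse = Y @ (np.eye(n) - P) @ Y
    mle[s] = sse / n
    unbiased[s] = sse / (n - rank_X)

print("true sigma^2:", sigma2)
print("mean of SSE/n:          ", mle.mean())        # noticeably below sigma^2
print("mean of SSE/(n-rank(X)):", unbiased.mean())   # close to sigma^2
```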
General normal-theory linear model:
$$Y = X\beta + \epsilon, \quad \text{where } \epsilon \sim N(0, \Sigma) \text{ and } \Sigma \text{ is known.}$$

The multivariate normal likelihood function is
$$L(\beta; Y) = \frac{1}{(2\pi)^{n/2}\, |\Sigma|^{1/2}}\; e^{-\frac{1}{2}(Y - X\beta)^T \Sigma^{-1} (Y - X\beta)}.$$

The log-likelihood function is
$$\ell(\beta; Y) = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \log(|\Sigma|) - \frac{1}{2} (Y - X\beta)^T \Sigma^{-1} (Y - X\beta).$$
Maximizing the log-likelihood when $\Sigma$ is known is equivalent to finding a $\beta$ that minimizes
$$(Y - X\beta)^T \Sigma^{-1} (Y - X\beta).$$
The estimating equations are
$$(X^T \Sigma^{-1} X)\,\beta = X^T \Sigma^{-1} Y$$
and solutions are of the form
$$\hat{\beta} = b_{GLS} = (X^T \Sigma^{-1} X)^{-} X^T \Sigma^{-1} Y.$$

For the general normal-theory linear model, when $\Sigma$ is known, maximum likelihood estimation is the same as generalized least squares estimation.

When $\Sigma$ contains unknown parameters: you could maximize the log-likelihood
$$\ell(\beta, \Sigma; Y) = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \log(|\Sigma|) - \frac{1}{2} (Y - X\beta)^T \Sigma^{-1} (Y - X\beta)$$
with respect to both $\beta$ and $\Sigma$. There may be no algebraic formulas for solutions to the joint likelihood equations, say $\hat{\beta}$ and $\hat{\Sigma}$. The MLE for $\Sigma$ is usually biased.
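When $\Sigma$ contains unknown parameters, one option is direct numerical maximization of $\ell(\beta, \Sigma; Y)$. A sketch under an assumed AR(1) structure $\Sigma = \sigma^2 V(\rho)$; the structure, parameterization, and numerical values are illustrative assumptions, not from the notes.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)

# Illustrative case with unknown covariance parameters: Sigma = sigma^2 V(rho),
# where V(rho)_{jk} = rho^{|j-k|} (an assumed AR(1) correlation structure).
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, sigma2_true, rho_true = np.array([1.0, 0.8]), 2.0, 0.5

def ar1_V(rho, n):
    idx = np.arange(n)
    return rho ** np.abs(np.subtract.outer(idx, idx))

L_chol = np.linalg.cholesky(sigma2_true * ar1_V(rho_true, n))
Y = X @ beta_true + L_chol @ rng.normal(size=n)

def negloglik(theta):
    """theta = (beta_0, beta_1, log sigma^2, arctanh rho); transforms keep the parameters valid."""
    beta = theta[:2]
    s2 = np.exp(theta[2])
    rho = np.tanh(theta[3])
    Sigma = s2 * ar1_V(rho, n)
    r = Y - X @ beta
    _, logdet = np.linalg.slogdet(Sigma)
    return 0.5 * (n * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(Sigma, r))

fit = minimize(negloglik, x0=np.zeros(4), method="Nelder-Mead",
               options={"maxiter": 20000, "xatol": 1e-8, "fatol": 1e-8})
beta_hat = fit.x[:2]
sigma2_hat, rho_hat = np.exp(fit.x[2]), np.tanh(fit.x[3])
print("beta_hat =", beta_hat, " sigma2_hat =", sigma2_hat, " rho_hat =", rho_hat)
```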
Similarly, generalized least squares estimation and maximum likelihood estimation are equivalent for $\beta$ in the Aitken model
$$Y \sim N(X\beta, \sigma^2 V)$$
when $V$ is known. Substitute $\Sigma = \sigma^2 V$ into the previous discussion. Then, the log-likelihood is
$$\ell(\beta, \sigma^2; Y) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(\sigma^2) - \frac{1}{2} \log(|V|) - \frac{1}{2\sigma^2} (Y - X\beta)^T V^{-1} (Y - X\beta).$$

The "likelihood equations" are
$$(X^T V^{-1} X)\,\beta = X^T V^{-1} Y$$
and
$$\sigma^2 = \frac{1}{n} (Y - X\beta)^T V^{-1} (Y - X\beta).$$

Solutions have the form
$$\hat{\beta} = b_{GLS} = (X^T V^{-1} X)^{-} X^T V^{-1} Y$$
and
$$\hat{\sigma}^2 = \frac{1}{n} (Y - X\hat{\beta})^T V^{-1} (Y - X\hat{\beta}).$$

The likelihood equations are more complicated when $V$ contains unknown parameters.
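A short sketch contrasting the MLE of $\sigma^2$ (divisor $n$) with the unbiased Aitken-model estimator (divisor $n - \mathrm{rank}(X)$) for the same fitted model, again assuming an AR(1)-type known V and illustrative values.

```python
import numpy as np

rng = np.random.default_rng(5)

# Aitken model with a known V (illustrative AR(1)-type structure, as before)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, sigma2_true, rho = np.array([2.0, 1.0]), 2.0, 0.6
V = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Y = X @ beta_true + np.sqrt(sigma2_true) * (np.linalg.cholesky(V) @ rng.normal(size=n))

Vinv = np.linalg.inv(V)
beta_hat = np.linalg.pinv(X.T @ Vinv @ X) @ X.T @ Vinv @ Y
r = Y - X @ beta_hat
quad = r @ Vinv @ r

sigma2_mle = quad / n                                   # MLE: divide by n
sigma2_unb = quad / (n - np.linalg.matrix_rank(X))      # unbiased: divide by n - rank(X)
print("sigma^2 MLE:", sigma2_mle, " unbiased:", sigma2_unb)
```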
General Properties of MLE's:

Regularity Conditions 9.1

(i) The parameter space has finite dimension, is closed and compact, and the true parameter vector is in the interior of the parameter space.

(ii) Probability distributions defined by any two different values of the parameter vector are distinct (an identifiability condition).

(iii) The first three partial derivatives of the log-likelihood function, with respect to the parameters,
  (a) exist
  (b) are bounded by a function with a finite expectation.

(iv) The expectation of the negative of the matrix of second partial derivatives of the log-likelihood is
  (a) finite
  (b) positive definite
in a neighborhood of the true value of the parameter vector. This is called the Fisher information matrix.
Suppose $Y_1, \ldots, Y_n$ are independent vectors of observations, with
$$Y_j = \begin{bmatrix} Y_{1j} \\ \vdots \\ Y_{pj} \end{bmatrix},$$
and the density function (or probability function) is $f(Y_j; \theta)$.

Then, the joint likelihood function is
$$L(\theta; Y_1, \ldots, Y_n) = \prod_{j=1}^{n} f(Y_j; \theta)$$
and the log-likelihood function is
$$\ell(\theta; Y_1, \ldots, Y_n) = \log\left(L(\theta; Y_1, \ldots, Y_n)\right) = \sum_{j=1}^{n} \log f(Y_j; \theta).$$

The score function
$$u(\theta) = \begin{bmatrix} u_1(\theta) \\ \vdots \\ u_r(\theta) \end{bmatrix} = \begin{bmatrix} \dfrac{\partial \ell(\theta; Y_1, \ldots, Y_n)}{\partial \theta_1} \\ \vdots \\ \dfrac{\partial \ell(\theta; Y_1, \ldots, Y_n)}{\partial \theta_r} \end{bmatrix}$$
is the vector of first partial derivatives of the log-likelihood function with respect to the elements of
$$\theta = \begin{bmatrix} \theta_1 \\ \vdots \\ \theta_r \end{bmatrix}.$$

The likelihood equations are
$$u(\theta; Y_1, \ldots, Y_n) = 0.$$
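A minimal scalar sketch of the score function and likelihood equations, using an exponential(rate $\lambda$) sample as an illustrative model (not from the notes) and Newton's method to solve $u(\lambda) = 0$.

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative scalar example: Y_1,...,Y_n iid Exponential with rate lambda.
# log f(y; lambda) = log(lambda) - lambda*y, so
#   l(lambda)   = n*log(lambda) - lambda*sum(y)
#   u(lambda)   = n/lambda - sum(y)          (score)
#   l''(lambda) = -n/lambda**2
lam_true = 2.5
y = rng.exponential(scale=1.0 / lam_true, size=200)
n, total = y.size, y.sum()

def score(lam):
    return n / lam - total

def d_score(lam):
    return -n / lam ** 2

# Solve the likelihood equation u(lambda) = 0 with Newton's method
lam = 1.0
for _ in range(50):
    step = score(lam) / d_score(lam)
    lam -= step
    if abs(step) < 1e-12:
        break

print("Newton solution:   ", lam)
print("closed form 1/ybar:", n / total)     # the two agree
```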
The maximum likelihood estimator (MLE)
$$\hat{\theta} = \begin{bmatrix} \hat{\theta}_1 \\ \vdots \\ \hat{\theta}_r \end{bmatrix}$$
is a solution to the likelihood equations that maximizes the log-likelihood function.

Fisher information matrix:
$$i(\theta) = Var\left(u(\theta; Y_1, \ldots, Y_n)\right) = E\left(u(\theta; Y_1, \ldots, Y_n)\,[u(\theta; Y_1, \ldots, Y_n)]^T\right) = -E\left[\frac{\partial^2 \ell(\theta; Y_1, \ldots, Y_n)}{\partial \theta_j \,\partial \theta_k}\right]_{j,k = 1, \ldots, r}.$$

Let
$\theta$ denote the parameter vector,
$i(\theta)$ denote the Fisher information matrix, and
$\hat{\theta}$ denote the MLE for $\theta$.

Then, if Regularity Conditions 9.1 are satisfied, we have the following results:

Result 9.1: $\hat{\theta}$ is a consistent estimator:
$$\Pr\left\{ (\hat{\theta} - \theta)^T (\hat{\theta} - \theta) > \delta \right\} \to 0 \quad \text{as } n \to \infty, \text{ for any } \delta > 0.$$
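A simulation sketch of Result 9.1 for the same illustrative exponential-rate model: the probability that the squared error of the MLE exceeds a fixed $\delta$ shrinks toward zero as $n$ grows. All numerical values are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(8)

# Illustration of Result 9.1 (consistency): the MLE 1/ybar of the exponential
# rate concentrates around the true lambda as n grows.
lam, delta = 2.5, 0.05
for n in (20, 200, 2000, 20000):
    mles = np.array([1.0 / rng.exponential(scale=1.0 / lam, size=n).mean()
                     for _ in range(2000)])
    prob = np.mean((mles - lam) ** 2 > delta)   # fraction of squared errors above delta
    print(f"n={n:6d}  Pr[(lam_hat - lam)^2 > {delta}] ~ {prob:.3f}")
```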
Result 9.2: Asymptotic normality:
$$\sqrt{n}\,(\hat{\theta} - \theta) \;\xrightarrow{\;dist\;}\; N\left(0, \; \lim_{n \to \infty} n\,[i(\theta)]^{-1}\right) \quad \text{as } n \to \infty.$$
With a slight abuse of notation we may express this as
$$\hat{\theta} \sim N\left(\theta, \; [i(\theta)]^{-1}\right)$$
for "large" sample sizes.
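A simulation sketch of Result 9.2 for the illustrative exponential-rate model, where the total Fisher information is $i(\lambda) = n/\lambda^2$, so the large-sample standard deviation of the MLE is approximately $\lambda/\sqrt{n}$. All numerical values are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(7)

# Monte Carlo check of Result 9.2: for Y_i iid Exponential(rate lambda),
# the MLE is 1/ybar and i(lambda) = n / lambda^2, so Var(lambda_hat) ~ lambda^2 / n.
lam, n, n_sims = 2.5, 200, 10000
mles = np.empty(n_sims)
for s in range(n_sims):
    y = rng.exponential(scale=1.0 / lam, size=n)
    mles[s] = 1.0 / y.mean()

asymptotic_sd = np.sqrt(lam ** 2 / n)        # sqrt of [i(lambda)]^{-1}
print("mean of MLEs: ", mles.mean(), " (true lambda =", lam, ")")
print("sd of MLEs:   ", mles.std(ddof=1))
print("asymptotic sd:", asymptotic_sd)
```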