Methods of Estimation
We have discussed the properties of a
good estimator. How do you obtain a good
estimator for an unknown parameter?
(1) Method of Moments (Page 472-475)
The method of moments is a very simple procedure for finding an estimator for one or more population parameters.
Recall that the sample mean $\bar{X}$ is an unbiased and consistent estimator for the population mean $E(X) = \mu$.
$\mu$ is the first population moment; $\bar{X}$ is the first sample moment.
The estimator for $\mu$ can be obtained by setting the first population moment equal to the first sample moment, $\mu = \bar{X}$; then we have
$$\hat{\mu}_{\text{m.o.m.}} = \bar{X}.$$
Recall:
the $k$th population moment
$$\mu_k' = E(X^k) = \sum_i x_i^k \, p(x_i) \ \text{(discrete)}; \quad \text{or} \quad \mu_k' = \int_{-\infty}^{\infty} x^k f(x)\, dx \ \text{(continuous)};$$
the $k$th sample moment
$$m_k' = \frac{1}{n}\sum_{i=1}^{n} x_i^k.$$
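As a quick illustration (my addition, not in the original notes), the sample moments are easy to compute with NumPy:

```python
import numpy as np

def sample_moment(x, k):
    """k-th sample moment: (1/n) * sum of x_i**k."""
    return np.mean(x ** k)

x = np.array([1.2, 0.7, 2.3, 1.9, 0.4])
print(sample_moment(x, 1))  # first sample moment = the sample mean
print(sample_moment(x, 2))  # second sample moment
```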
The method of moments is to choose the estimators so that the population moments are matched by the corresponding sample moments up to the $l$th moment, where
$l$ = the number of parameters being estimated,
or the number of equations that you need to solve for all the parameters you want to estimate. For instance, $N(\mu, \sigma^2)$ with a given $\mu$ but $\sigma^2$ still to be estimated requires only one equation.
Let
$$\mu_1' = \bar{X}, \quad \mu_2' = m_2', \quad \ldots, \quad \mu_l' = m_l'.$$
Here we have $l$ equations and $l$ unknowns, so that our $l$ unknown parameters can be estimated by solving these equations.
Previous Exercise: Assume $X_i \overset{\text{i.i.d.}}{\sim} U(\theta, \theta+1)$, $i = 1, 2, \ldots, n$ with $n = 2k$.
How do we estimate $\theta$? How do we find $\hat{\theta}$?
(Hint: for uniform $[\theta_1, \theta_2]$, $E(X_i) = \frac{\theta_1 + \theta_2}{2}$ and $\text{Var}(X_i) = \frac{(\theta_2 - \theta_1)^2}{12}$.)
The number of population parameters is 1, so $l = 1$.
Let $E(X) = \mu_1' = \bar{X}$. Here $E(X) = \theta + 0.5$.
By the method of moments: setting $\theta + 0.5 = \bar{X}$ and solving for $\theta$,
$\hat{\theta} = \bar{X} - 0.5$ is the method of moments estimator for $\theta$.
Denote: $\hat{\theta}_{\text{m.o.m.}} = \bar{X} - 0.5$.
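A small simulation check of this estimator (a sketch I am adding, assuming NumPy is available):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 3.0
x = rng.uniform(theta, theta + 1, size=10_000)  # X_i ~ U(theta, theta+1)

theta_mom = x.mean() - 0.5  # the method of moments estimator derived above
print(theta_mom)            # should be close to 3.0
```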
Example 5. Using a sample $X_i \sim N(\mu, \sigma^2)$, $i = 1, \ldots, n$, find the method of moments estimators for $\mu$ and $\sigma^2$.
HW: Show that
$$\frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2.$$
Example 6. Using a sample $X_i \sim \text{Gamma}(\alpha, \beta)$, $i = 1, \ldots, n$, find the method of moments estimators for $\alpha$ and $\beta$. Here $E(X_i) = \alpha\beta$ and $\text{Var}(X_i) = \alpha\beta^2$.
Answer: $\hat{\alpha}_{\text{m.o.m.}} = \dfrac{n\bar{X}^2}{(n-1)S^2}$, and $\hat{\beta}_{\text{m.o.m.}} = \dfrac{(n-1)S^2}{n\bar{X}}$.
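A simulation check of these formulas (my sketch, assuming NumPy; shape $\alpha$ and scale $\beta$ parametrization as in the notes):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta = 2.0, 3.0
x = rng.gamma(shape=alpha, scale=beta, size=50_000)  # Gamma(alpha, beta)

n = len(x)
xbar = x.mean()
s2 = x.var(ddof=1)  # S^2, the usual sample variance with n - 1

alpha_mom = n * xbar**2 / ((n - 1) * s2)
beta_mom = (n - 1) * s2 / (n * xbar)
print(alpha_mom, beta_mom)  # close to 2 and 3
```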
(2) The Method of Maximum Likelihood (MLE) (Page 476-481)
Suppose $X_1, X_2, \ldots, X_n$ is a randomly selected sample from a population with probability function $P(x|\theta_1, \theta_2, \ldots, \theta_k)$ (for the discrete case) or density function $f(x|\theta_1, \theta_2, \ldots, \theta_k)$ (for the continuous case). We choose the estimates of the parameters $\theta_1, \theta_2, \ldots, \theta_k$ so that the likelihood function $L$ is maximized.
What is a likelihood function?
A likelihood function is the joint probability
function or joint density function of the
observed sample.
Example 1 (a discrete example):
Suppose $X_1, X_2, \ldots, X_n$ is a random sample from a Binomial experiment with $n$ trials and success probability $p$.
Let $X_i = 1$ if the $i$th trial was a success, and $X_i = 0$ if the $i$th trial was a failure.
The likelihood function is:
$$L(p|x_1, x_2, \ldots, x_n) = P(x_1, x_2, \ldots, x_n|p) \overset{\text{independence}}{=} P(x_1|p)P(x_2|p)\cdots P(x_n|p) = \prod_{i=1}^{n} P(x_i|p).$$
Let $Y$ = the number of successes, i.e. $Y = \sum_{i=1}^{n} X_i$; then the likelihood function is:
$$L(p|x_1, x_2, \ldots, x_n) = p^y(1-p)^{n-y}.$$
Now we try to estimate $p$ by maximizing $L$ over $p$.
Note that:
maximizing $L$ $\Leftrightarrow$ maximizing $\ln L$.
$$\ln L = y\ln p + (n-y)\ln(1-p),$$
$$\frac{\partial \ln L}{\partial p} = \frac{y}{p} - \frac{n-y}{1-p}.$$
Let $\frac{\partial \ln L}{\partial p} = 0$:
$$\frac{y}{p} = \frac{n-y}{1-p}, \qquad y(1-p) = (n-y)p, \qquad np = y.$$
Solve for $p$:
$$\hat{p}_{\text{MLE}} = \frac{y}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}.$$
Example 2 (a continuous example):
A random sample $X_i \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$, $i = 1, \ldots, n$. Find the maximum likelihood estimators (MLEs) for both $\mu$ and $\sigma^2$.
First we derive the likelihood function:
$$L(\mu, \sigma^2|x_1, x_2, \ldots, x_n) = f(x_1, x_2, \ldots, x_n|\mu, \sigma^2) \overset{\text{independence}}{=} f(x_1|\mu, \sigma^2)f(x_2|\mu, \sigma^2)\cdots f(x_n|\mu, \sigma^2)$$
$$= \prod_{i=1}^{n} f(x_i|\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} e^{-\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^2}}.$$
Taking a log makes things easier for maximization.
$$\ln L = -n\ln\!\left(\sqrt{2\pi}\,\sigma\right) - \frac{\sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^2},$$
$$\frac{\partial \ln L}{\partial \mu} = -\frac{\sum_{i=1}^{n} 2(x_i-\mu)(-1)}{2\sigma^2} = \frac{\sum_{i=1}^{n}(x_i-\mu)}{\sigma^2},$$
$$\frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2}\cdot\frac{1}{\sigma^2} - \sum_{i=1}^{n}(x_i-\mu)^2 \cdot \frac{-1}{2\sigma^4}.$$
Let
$$\frac{\partial \ln L}{\partial \mu} = 0, \qquad \frac{\partial \ln L}{\partial \sigma^2} = 0.$$
We have
$$\sum_{i=1}^{n}(x_i-\mu) = 0, \qquad -\frac{n}{2}\cdot\frac{1}{\sigma^2} + \frac{\sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^4} = 0.$$
Solve for $\mu$ and $\sigma^2$:
$$\hat{\mu}_{\text{MLE}} = \bar{x}, \qquad \hat{\sigma}^2_{\text{MLE}} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n}.$$
The steps to follow in finding MLEs:
Step 1: Write the likelihood function $L$.
Step 2: Take the log: if $L$ is an exponential type of function, taking the log normally makes the maximization easier, so you may choose to maximize $\ln L$ instead of $L$. If $L$ is a simple function or its maximum can be found easily and directly from $L$, then skip Step 2.
Step 3: Take the derivative with respect to each unknown parameter that you are estimating and set these derivatives equal to $0$.
Step 4: Solve for these parameters. The solutions are the MLEs.
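When Step 4 has no closed-form solution, Steps 3-4 can also be carried out numerically. A minimal sketch (my addition, assuming NumPy and SciPy), using an exponential model with rate $\lambda$ chosen purely for illustration:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(5)
x = rng.exponential(scale=2.0, size=1_000)  # Exp with rate lambda = 1/2

# Steps 1-2: log-likelihood is ln L = n ln(lambda) - lambda * sum(x_i)
def neg_log_lik(lam):
    return -(len(x) * np.log(lam) - lam * x.sum())

# Steps 3-4 done numerically instead of by hand
res = minimize_scalar(neg_log_lik, bounds=(1e-6, 100.0), method="bounded")
print(res.x, 1 / x.mean())  # the MLE of the rate is 1/xbar
```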
Another example:
A random sample $X_i \overset{\text{i.i.d.}}{\sim} \text{Uniform}(0, \theta)$, $i = 1, \ldots, n$. Find the maximum likelihood estimator (MLE) for $\theta$.
The probability density function for $X_i$, $i = 1, 2, \ldots, n$, is
$$f(x_i|\theta) = \begin{cases} \frac{1}{\theta}, & 0 \le x_i \le \theta; \\ 0, & \text{otherwise.} \end{cases}$$
First we derive the likelihood function:
$$L(\theta|x_1, x_2, \ldots, x_n) = f(x_1, x_2, \ldots, x_n|\theta) \overset{\text{independence}}{=} f(x_1|\theta)f(x_2|\theta)\cdots f(x_n|\theta) = \prod_{i=1}^{n} f(x_i|\theta)$$
$$= \begin{cases} \left(\frac{1}{\theta}\right)^{n}, & 0 \le \text{all } x_i \le \theta \text{ for } i = 1, 2, \ldots, n; \\ 0, & \text{otherwise.} \end{cases}$$
Note that $L$ is not maximized where $L = 0$, so the maximizing $\theta$ must be no less than all the $x_i$'s. Namely, in order to make $L \ne 0$, we need $\theta \ge \max(x_1, x_2, \ldots, x_n) \overset{\Delta}{=} x_{(n)}$.
On the other hand, when $\theta \ge x_{(n)}$, $L = \frac{1}{\theta^n}$, which is a monotonically decreasing function of $\theta$; its maximum is reached at the smallest possible value, $\theta = x_{(n)}$.
Therefore, $\hat{\theta}_{\text{MLE}} = x_{(n)}$. See the graph.
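Since the graph itself is not reproduced here, a short numerical sketch (my addition, assuming NumPy) shows the same behavior: $L(\theta)$ is zero below $x_{(n)}$ and decreasing above it:

```python
import numpy as np

rng = np.random.default_rng(6)
theta = 5.0
x = rng.uniform(0.0, theta, size=100)
x_max = x.max()  # the largest order statistic x_(n)

def likelihood(t):
    # L(t) = (1/t)^n when all x_i <= t, and 0 otherwise
    return t ** (-len(x)) if x_max <= t else 0.0

for t in np.linspace(0.5 * x_max, 2.0 * x_max, 7):
    print(round(t, 3), likelihood(t))  # zero below x_(n), then decreasing
```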
This example doesn't require taking a log of $L$. When the domain of the density or probability function involves the unknown parameter, pay attention to the piecewise nature of the likelihood function.
A very good and important property that makes the MLE particularly attractive is its invariance property:
Suppose $\theta$ is the parameter associated with a distribution and we are interested in estimating some function of $\theta$, say $t(\theta)$, rather than $\theta$ itself. If $\hat{\theta}_{\text{MLE}}$ is the MLE for $\theta$, then for any function $t$, $t(\hat{\theta}_{\text{MLE}})$ is the MLE for $t(\theta)$.
From Example 2: $\hat{\sigma}^2_{\text{MLE}} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n}$; according to the invariance property of the MLE,
$$\hat{\sigma}_{\text{MLE}} = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n}}.$$
From Example 1: What is the MLE of the variance of $Y$?
$Y \sim b(n, p)$, so $\text{Var}(Y) = np(1-p)$.
Since $\hat{p}_{\text{MLE}} = \frac{y}{n} = \bar{x}$,
$$\widehat{\text{Var}(Y)}_{\text{MLE}} = n\hat{p}_{\text{MLE}}(1-\hat{p}_{\text{MLE}}) = y\left(1-\frac{y}{n}\right) = n\bar{x}(1-\bar{x}).$$
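A small numerical illustration of this invariance calculation (my sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.binomial(1, 0.4, size=500)   # Bernoulli(p=0.4) indicators
n, y = len(x), x.sum()

p_mle = y / n                        # = xbar, from Example 1
var_y_mle = n * p_mle * (1 - p_mle)  # MLE of Var(Y) by invariance
print(p_mle, var_y_mle)
```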
Some remarks for finding MLEs:
(1) Ask yourself: is it necessary to take the log before taking derivatives? Which is easier to solve, $\frac{\partial \ln L}{\partial \theta} = 0$ or $\frac{\partial L}{\partial \theta} = 0$?
(2) Is the likelihood function a piecewise function (a function defined differently over more than one interval) with an interval that involves the unknown parameter?
(3) Remember to use the MLE's invariance property whenever you can.
(4) Some MLEs are not easy to solve for; see the link given on our course webpage for the MLEs of the parameters of a Gamma distribution:
http://spartan.ac.brocku.ca/~xxu/minka-gamma
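The linked note derives an iterative scheme for the Gamma MLEs. As an alternative sketch (my addition; `scipy.stats.gamma.fit` is SciPy's generic numerical fitter, not the method in the link), SciPy can maximize the Gamma likelihood directly:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
x = rng.gamma(shape=2.0, scale=3.0, size=10_000)

# No closed-form MLE for the Gamma shape: scipy maximizes numerically.
a_hat, loc, scale_hat = stats.gamma.fit(x, floc=0)  # fix location at 0
print(a_hat, scale_hat)  # close to alpha = 2 and beta = 3
```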