N07-estimation

BIOINF 2118
Inference
“Inference” means drawing conclusions from data.
The two most common types of conclusions:
 Estimation (point estimation, interval estimation)
 Testing (hypothesis testing, significance testing)
Principles of frequentist (classical) estimation
The two main frequentist ways to estimate:
 Maximum Likelihood Estimation
 Moment Estimation
Maximum Likelihood Estimation:
For a model family $f(x \mid \theta)$ and an observation $x_{\rm obs}$, the likelihood function is defined by
$$L(\theta) = f(x_{\rm obs} \mid \theta).$$
A maximum likelihood estimator $\hat\theta$ satisfies
$$L(\hat\theta) = \max_\theta L(\theta).$$
Often $\hat\theta$ is unique.
Example
If $X \sim \mathrm{binom}(n, p)$, and we observe $X = x_{\rm obs}$, then
$$L(p) = \binom{n}{x_{\rm obs}} p^{x_{\rm obs}} (1-p)^{n-x_{\rm obs}},$$
and to maximize, we can differentiate the log-likelihood and set the slope to zero:
$$\frac{d}{dp}\log L(p) = \frac{x_{\rm obs}}{p} - \frac{n-x_{\rm obs}}{1-p}.$$
This is zero when $p = x_{\rm obs}/n$. So
$$\hat p = x_{\rm obs}/n.$$
We can check this numerically in R (for $n = 10$, $x_{\rm obs} = 2$):
## Plot the likelihood L(p) = dbinom(2, 10, p), then search for its maximum,
## marking each point that optimize() evaluates along the way.
curve(dbinom(2, 10, x), from = 0, to = 1, xlab = "p", ylab = "likelihood")
optimize(
  function(arg) {
    result <- dbinom(x = 2, size = 10, prob = arg)
    points(arg, result)
    return(result)
  },
  lower = 1e-10,      # search just inside (0, 1)
  upper = 1 - 1e-10,
  maximum = TRUE
)
## $maximum is about 0.2 = 2/10, agreeing with the algebra.
(When the parameter has dimension > 1, you can do maximum likelihood estimation on the vector parameter, either with algebra (set all the partial derivatives equal to zero and solve) or by searching the higher-dimensional space where the parameter lives; a sketch follows.)
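Here is a minimal sketch of the search approach in R, using optim() on a two-parameter gamma model (my choice of illustration; the gamma MLE has no closed form, so a numerical search is natural). The simulated data and starting values are arbitrary:
## Sketch: numerical MLE for a 2-parameter model, here Gamma(shape, rate).
set.seed(1)
x <- rgamma(100, shape = 3, rate = 2)    # simulated observations

negloglik <- function(par) {
  # negative log-likelihood at par = c(shape, rate)
  -sum(dgamma(x, shape = par[1], rate = par[2], log = TRUE))
}

## optim() searches the 2-dimensional parameter space; minimizing the
## negative log-likelihood is the same as maximizing the likelihood.
optim(par = c(1, 1), fn = negloglik, method = "L-BFGS-B",
      lower = c(1e-6, 1e-6))$par
## returns estimates near the true values (3, 2)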
Moment estimators
A moment estimator is obtained by setting an observed value to its expected value
(as a function of the parameter) and solving for the parameter.
Example
For the binomial, $E(X) = np$. So we solve
$$x_{\rm obs} = n\hat p,$$
to get $\hat p = x_{\rm obs}/n$.
So for the binomial, the MLE and the moment estimator are the same.
But that's not always the case.
Example: the normal distribution.
Suppose we know that the mean of a normal distribution is zero, but don't know the variance. We observe i.i.d. data $(x_1, \ldots, x_n)$. Goal: to estimate $\sigma^2$.
First, let's try the MLE, the maximum likelihood estimator:
$$L(\sigma^2) = \prod_{i=1}^n (2\pi\sigma^2)^{-1/2} \exp\!\left(-\frac{x_i^2}{2\sigma^2}\right).$$
Maximizing the likelihood is the same as maximizing the log-likelihood:
$$\log L(\sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{\sum x_i^2}{2\sigma^2}.$$
Differentiate the log-likelihood to find the maximizer:
$$\frac{d}{d\sigma^2}\log L(\sigma^2) = -\frac{n}{2\sigma^2} + \frac{\sum x_i^2}{2\sigma^4}.$$
Setting this equal to zero, we get the MLE
$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n x_i^2.$$
That's logical!!
Now, how about the moment estimator?
$\sum X_i^2 / \sigma^2$ has a "chi-square distribution on n degrees of freedom":
$$\frac{\sum_{i=1}^n X_i^2}{\sigma^2} \sim \chi^2_n.$$
(Aside: $\chi^2_n$ is the same as a gamma distribution $G(n/2, 1/2)$.)
The mean of $\chi^2_n$ equals $n$. So the moment estimator comes from setting
$$\frac{\sum x_i^2}{\hat\sigma^2} = n,$$
which gives $\hat\sigma^2 = \frac{1}{n}\sum x_i^2$, just like the MLE.
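A quick numerical check of this coincidence, in the style of the earlier optimize() call (the simulated data are illustrative):
## For zero-mean normal data, the likelihood in sigma^2 peaks at mean(x^2).
set.seed(1)
x <- rnorm(50, mean = 0, sd = 2)    # true sigma^2 = 4

loglik <- function(v) sum(dnorm(x, mean = 0, sd = sqrt(v), log = TRUE))
optimize(loglik, lower = 1e-6, upper = 100, maximum = TRUE)$maximum
mean(x^2)    # the algebraic MLE / moment estimator; the two agree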
But wait! What if we don't know the mean either? Suppose $X_i \sim N(\mu, \sigma^2)$ and we don't know either parameter.
How about trying MLE? Start with
$$L(\mu, \sigma^2) = \prod_{i=1}^n (2\pi\sigma^2)^{-1/2} \exp\!\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right).$$
Maximize the likelihood over $\mu$ for fixed $\sigma^2$: this gives $\hat\mu = \bar x$, the sample mean. Then maximizing over $\sigma^2$ gives
$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2,$$
like before, but we replace the unknown mean $\mu$ by its estimate $\bar x$.
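As a sketch, the same kind of numerical check with both parameters unknown, using optim() as above (simulated data, arbitrary starting values):
## The 2-parameter normal MLE is (xbar, mean((x - xbar)^2)).
set.seed(2)
x <- rnorm(50, mean = 10, sd = 2)

negloglik <- function(par) {  # par = c(mu, sigma2)
  -sum(dnorm(x, mean = par[1], sd = sqrt(par[2]), log = TRUE))
}
optim(c(0, 1), negloglik, method = "L-BFGS-B",
      lower = c(-Inf, 1e-6))$par
c(mean(x), mean((x - mean(x))^2))   # matches the numerical answer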
How about the method of moments? Things get different. $\sum (X_i - \bar X)^2 / \sigma^2$ has a chi-square distribution, but this time on $n - 1$ degrees of freedom:
$$\frac{\sum_{i=1}^n (X_i - \bar X)^2}{\sigma^2} \sim \chi^2_{n-1}.$$
You can think of it as losing a degree of freedom (or a chunk of information) because we have to use some of the information to estimate $\mu$.
So the moment estimator comes from setting
$$\frac{\sum (x_i - \bar x)^2}{\hat\sigma^2} = n - 1,$$
to get
$$\hat\sigma^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x)^2.$$
We say that the MLE in this case is biased.
The bias of an estimator $\hat\theta$ in estimating $\theta$ is defined as
$$\mathrm{Bias} = E(\hat\theta) - \theta.$$
Here $E(\hat\sigma^2_{\rm MLE}) = \frac{n-1}{n}\sigma^2$, so the bias is $-\sigma^2/n$.
This is overfitting: because we can tinker with a free parameter, we can be fooled into thinking the noise (variance) is less than it is, and thinking that the parameter estimates are more accurate than they are.
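A small simulation makes the bias visible (sample size and number of replications are arbitrary choices):
## Downward bias of the MLE (divide by n) versus the moment estimator
## (divide by n - 1) when mu is also estimated.
set.seed(3)
n      <- 5    # small n makes the bias visible
sigma2 <- 4    # true variance
mle <- replicate(10000, { x <- rnorm(n, 0, sqrt(sigma2))
                          mean((x - mean(x))^2) })
mean(mle)                 # about (n - 1)/n * sigma2 = 3.2: biased low
mean(mle * n / (n - 1))   # about 4: rescaling by n/(n - 1) removes the bias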
Notice that the bias goes to zero as $n$ goes to $\infty$.
Definition: An estimator is consistent if its bias goes to zero as $n \to \infty$.
Criteria for a good estimator
A good estimator should have low bias and low variance. We can combine these two criteria into one: the mean squared error,
$$\mathrm{MSE} = E\big[(\hat\theta - \theta)^2\big] = \mathrm{Var} + \mathrm{Bias}^2.$$
This is a very important result.
MSE is an example of expected loss (in this case, loss = squared error). Neither variance nor bias alone can be interpreted as an expected loss.
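Why the decomposition holds: add and subtract $E(\hat\theta)$ inside the square; the cross term vanishes:
\begin{align*}
\mathrm{MSE}(\hat\theta)
  &= E\big[(\hat\theta - \theta)^2\big]
   = E\big[(\hat\theta - E\hat\theta + E\hat\theta - \theta)^2\big] \\
  &= E\big[(\hat\theta - E\hat\theta)^2\big]
     + 2\,\underbrace{E\big[\hat\theta - E\hat\theta\big]}_{=\,0}(E\hat\theta - \theta)
     + (E\hat\theta - \theta)^2 \\
  &= \mathrm{Var}(\hat\theta) + \mathrm{Bias}(\hat\theta)^2 .
\end{align*}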
Properties of the MLE
If $\hat\theta$ is the MLE for $\theta$, and we reparametrize with a monotonic transformation $g$, so that the new parameter is $\phi = g(\theta)$, then the MLE of $\phi$ is
$$\hat\phi = g(\hat\theta).$$
Look at the likelihood graph; changing $\theta$ to $\phi$ means stretching and/or squeezing the horizontal axis. Where the mode is won't change.
The MLE is almost always consistent. As we've seen, it can be biased.
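A sketch of this invariance using the earlier binomial example, maximizing the same likelihood on the odds scale (the choice $g(p) = p/(1-p)$ is mine, for illustration):
## Invariance: maximize the binomial likelihood over the odds phi = p/(1 - p).
lik_odds <- function(phi) dbinom(2, size = 10, prob = phi / (1 + phi))
optimize(lik_odds, lower = 1e-10, upper = 100, maximum = TRUE)$maximum
(2/10) / (1 - 2/10)    # g(phat) = 0.25: the mode maps to the same point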
Properties of the moment estimator
Recall: the moment estimator finds the value of the parameter which makes the expected value of a statistic $S$ equal to the observed value:
$$E(S \mid \theta = \hat\theta) = s_{\rm obs}.$$
Due to Jensen's inequality (see below)…
(1) When estimating $\phi = g(\theta)$, the moment estimator transforms just as with the MLEs, $\hat\phi = g(\hat\theta)$ -- if you use the SAME statistic $S$ -- because
$$E(S \mid \theta = \hat\theta) = E(S \mid g(\theta) = g(\hat\theta)) = E(S \mid \phi = \hat\phi).$$
Therefore, if $\hat\theta$ is unbiased for $\theta$, generally $\hat\phi$ will NOT be unbiased for $\phi$:
$$E(\hat\phi) = E(g(\hat\theta)) \ne g(E(\hat\theta)) = g(\theta) = \phi.$$
(2) Likewise, if you choose a DIFFERENT statistic to match, $T = h(S)$, you get a different moment estimator. This is because
$$E(h(S)) \ne h(E(S)).$$
Jensen's inequality:
If $h$ has positive curvature (it smiles!), like "exp", then
$$E(h(S)) > h(E(S)).$$
[Figure: $T = h(S) = \exp(S)$ with $\Pr(S=1) = \Pr(S=3) = 1/2$; the chord over the curve shows $E(h(S))$ bigger, $h(E(S))$ smaller.]
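Plugging in the figure's two-point example:
## Jensen with h = exp and S in {1, 3}, each with probability 1/2:
S <- c(1, 3)
mean(exp(S))    # E(h(S)) = (e + e^3)/2, about 11.40
exp(mean(S))    # h(E(S)) = e^2,        about  7.39  -- smaller, as claimed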
For example, in the normal case with known mean = 0, although
$$S = \frac{1}{n}(X_1^2 + \cdots + X_n^2)$$
is unbiased for $\sigma^2$, $T = h(S) = \sqrt{S}$ is NOT unbiased for $\sigma = \sqrt{\sigma^2}$. Now $h$ has negative curvature (frowns). The bias is negative:
$$E(\sqrt{S}) - \sigma < \sqrt{E(S)} - \sigma = \sqrt{\sigma^2} - \sigma = 0.$$
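A small simulation of this negative bias (illustrative choices: known mean 0, true $\sigma = 2$, $n = 5$):
## sqrt(S) underestimates sigma on average: Jensen with a concave h.
set.seed(4)
S <- replicate(10000, mean(rnorm(5, 0, 2)^2))
mean(S)          # about sigma^2 = 4: S is unbiased for sigma^2
mean(sqrt(S))    # noticeably below sigma = 2: sqrt(S) is biased low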