N07-estimation

BIOINF 2118
2013-02-05
Estimation
Inference
“Inference” means drawing conclusions from data.
The two most common types of conclusions:
- Estimation (point estimation, interval estimation)
- Testing (hypothesis testing, significance testing)
Principles of frequentist (classical) estimation
The two main frequentist ways to estimate:
- Maximum Likelihood Estimation
- Moment Estimation
Maximum Likelihood Estimation:
For a model family $f(x \mid \theta)$ and an observation $x_{obs}$, the likelihood function is defined by
$L(\theta) = f(x_{obs} \mid \theta)$.
A maximum likelihood estimator $\hat{\theta}$ satisfies
$L(\hat{\theta}) = \max_{\theta} L(\theta)$.
Often $\hat{\theta}$ is unique.
Example
If $X \sim \mathrm{binom}(n, p)$, and we observe $X = x_{obs}$, then
$L(p) = \binom{n}{x_{obs}} \, p^{x_{obs}} (1-p)^{n - x_{obs}}$,
and to maximize, we can differentiate the log-likelihood and set the slope to zero:
$\frac{d}{dp} \log L(p) = \frac{x_{obs}}{p} - \frac{n - x_{obs}}{1 - p}$.
This is zero when $x_{obs}(1-p) = (n - x_{obs})\,p$.
So $\hat{p} = x_{obs}/n$.
# Numerically maximize the binomial likelihood for x = 2 successes in n = 10 trials.
# An empty plot is set up first so points() can mark each likelihood evaluation.
plot(NULL, xlim = c(0, 1), ylim = c(0, 0.35), xlab = "p", ylab = "likelihood")
optimize(
  function(arg) {
    result <- dbinom(x = 2, size = 10, prob = arg)
    points(arg, result)          # mark where the search evaluated the likelihood
    return(result)
  },
  lower = 1e-10,
  upper = 1 - 1e-10,
  maximum = TRUE
)
# The maximum is found at p = 0.2 = x/n, matching the analytic MLE.
Maximum likelihood estimation can also be performed on a vector parameter, either
analytically (set derivative = 0 (vector) and solve) or by searching.
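Here is a minimal R sketch of the "searching" approach for a vector parameter (the simulated data, starting values, and the log-sigma parametrization are illustrative choices, not part of the notes): find the joint MLE of $(\mu, \sigma)$ for i.i.d. normal data with optim().

set.seed(1)
x <- rnorm(25, mean = 3, sd = 2)         # made-up data for illustration

negloglik <- function(par) {             # par = c(mu, log(sigma))
  mu    <- par[1]
  sigma <- exp(par[2])                   # searching over log(sigma) keeps sigma positive
  -sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}

fit <- optim(par = c(0, 0), fn = negloglik)   # Nelder-Mead search
c(mu.hat = fit$par[1], sigma.hat = exp(fit$par[2]))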
Moment estimators
A moment estimator is obtained by setting an observed value to its expected value
(as a function of the parameter) and solving for the parameter.
Example
For the binomial, $E(X) = np$. So solve
$x_{obs} = n\hat{p}$
to get $\hat{p} = x_{obs}/n$.
For the binomial, the MLE and the moment estimator are the same, but that's not always the case.
Example: the normal distribution.
Suppose we know that the mean of a normal distribution is zero, but don't
know the variance. We observe i.i.d. data $(x_1, \ldots, x_n)$. Goal: to estimate $\sigma^2$.
First, let's try the MLE, the maximum likelihood estimator.
Maximizing the likelihood is the same as maximizing the log-likelihood:
$\log L(\sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} x_i^2$.
Differentiate the log-likelihood to find the maximizer:
$\frac{d}{d\sigma^2}\log L(\sigma^2) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n} x_i^2$.
Setting this equal to zero, we get $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2$. That's logical!!
Now, how about the moment estimator?
$\sum_{i=1}^{n} x_i^2 / \sigma^2$ has a "chi-square distribution on n degrees of freedom", $\chi^2_n$.
(That's the same as a gamma distribution $G(n/2, 1/2)$.)
The mean of $\chi^2_n$ equals n.
So the moment estimator comes from setting
$\sum_{i=1}^{n} x_i^2 / \sigma^2 = n$:
$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2$,
just like before.
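A quick R check (the simulated data are made up for illustration): with the mean known to be zero, mean(x^2) matches a numerical maximization of the log-likelihood over $\sigma^2$.

set.seed(2)
x <- rnorm(50, mean = 0, sd = 1.5)       # made-up data with known mean 0

mean(x^2)                                # analytic MLE / moment estimator of sigma^2

optimize(function(s2) sum(dnorm(x, mean = 0, sd = sqrt(s2), log = TRUE)),
         lower = 1e-6, upper = 100, maximum = TRUE)$maximum   # numerical MLE, same value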
But wait! What if we don't know the mean of the normal distribution,
so that we don't know either parameter?
How about trying MLE? Start with
$L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)$.
Maximize the likelihood over $\mu$ for fixed $\sigma^2$: this gives $\hat{\mu} = \bar{x}$, the sample mean.
Then maximizing over $\sigma^2$ gives
$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$,
like before, but we replace the unknown mean by its estimate $\bar{x}$.
How about the Method of Moments? Things get different.
$\sum_{i=1}^{n} (x_i - \bar{x})^2 / \sigma^2$ has a chi-square distribution, but this time on n - 1 degrees of freedom, $\chi^2_{n-1}$.
You can think of it as losing a degree of freedom (or a chunk of
information) because we have to use some of the information to estimate $\mu$.
So the moment estimator comes from setting
$\sum_{i=1}^{n} (x_i - \bar{x})^2 / \sigma^2 = n - 1$, to get
$\tilde{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = s^2$, the sample variance.
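In R, the two estimates differ only in the denominator; var() already divides by n - 1 (the data below are simulated just for illustration).

set.seed(3)
x <- rnorm(10, mean = 5, sd = 2)
n <- length(x)

sigma2.mle <- mean((x - mean(x))^2)      # MLE: divide by n
sigma2.mom <- var(x)                     # moment estimator s^2: divide by n - 1
c(MLE = sigma2.mle, moment = sigma2.mom,
  ratio = sigma2.mle / sigma2.mom)       # ratio = (n - 1)/n = 0.9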
We say that the MLE in this case is biased. The bias of an estimator $\hat{\theta}$
in estimating $\theta$ is defined as
Bias = $E(\hat{\theta}) - \theta$.
This is overfitting; because we can tinker with a free parameter, we can
be fooled into thinking the noise (variance) is less than it is, and thinking
that the parameter estimates are more accurate than they are.
Notice that the bias goes to zero as n goes to $\infty$.
Definition: An estimator is consistent if its bias goes to zero as $n \to \infty$.
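A small simulation sketch (the true $\sigma^2$, the sample sizes, and the replication count are arbitrary choices): the average of the MLE $\hat{\sigma}^2$ over many samples falls short of $\sigma^2$ by about $\sigma^2/n$, and the shortfall shrinks as n grows.

set.seed(4)
sigma2 <- 4
for (n in c(5, 20, 100)) {
  est <- replicate(20000, {
    x <- rnorm(n, mean = 0, sd = sqrt(sigma2))
    mean((x - mean(x))^2)                # the MLE of sigma^2
  })
  cat("n =", n, " estimated bias =", round(mean(est) - sigma2, 3),
      " exact bias =", -sigma2 / n, "\n")
}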
Criteria for a good estimator
A good estimator should have low bias and low variance. We can
combine these two criteria into one: the mean squared error.
This is a very important result: MSE $= E\!\left[(\hat{\theta} - \theta)^2\right] = \mathrm{var} + \mathrm{bias}^2$.
MSE is an example of expected loss (in this case, loss = squared error).
Neither variance nor bias alone can be interpreted as an expected loss.
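A simulation sketch of the decomposition (the true value, sample size, and replication count are arbitrary): estimate $\sigma^2$ by the MLE many times and check that the Monte Carlo MSE matches variance + bias^2.

set.seed(5)
sigma2 <- 4; n <- 10
est <- replicate(50000, {
  x <- rnorm(n, mean = 0, sd = sqrt(sigma2))
  mean((x - mean(x))^2)                  # the (biased) MLE of sigma^2
})
mse   <- mean((est - sigma2)^2)
bias  <- mean(est) - sigma2
vrnce <- mean((est - mean(est))^2)
c(MSE = mse, var.plus.bias.squared = vrnce + bias^2)   # the two values agree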
Properties of the MLE
If $\hat{\theta}$ is the MLE for $\theta$, and we reparametrize with a monotonic
transformation g, so that the new parameter is $\phi = g(\theta)$, then the MLE of
$\phi$ is $\hat{\phi} = g(\hat{\theta})$. Look at the likelihood graph; changing $\theta$ to $\phi$ means stretching
and/or squeezing the horizontal axis. Where the mode is won't change.
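A small R illustration of this invariance, reusing the earlier binomial example (x = 2 successes in n = 10 trials); the log-odds transformation is just one convenient choice of g: maximizing over $\phi = \log(p/(1-p))$ and transforming back gives the same $\hat{p}$.

loglik.p   <- function(p)   dbinom(2, size = 10, prob = p, log = TRUE)
loglik.phi <- function(phi) loglik.p(1 / (1 + exp(-phi)))   # phi = log-odds of p

p.hat   <- optimize(loglik.p,   lower = 1e-6, upper = 1 - 1e-6, maximum = TRUE)$maximum
phi.hat <- optimize(loglik.phi, lower = -10,  upper = 10,       maximum = TRUE)$maximum

c(p.hat = p.hat, back.transformed = 1 / (1 + exp(-phi.hat)))  # both are about 0.2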
The MLE is almost always consistent. As we’ve seen, it can be biased.
Properties of the moment estimator
Recall: the moment estimator finds the value of the parameter which
makes the expected value equal to the observed value:
$E(S \mid \theta = \hat{\theta}) = S_{obs}$.
When estimating $\phi = g(\theta)$, just as with the MLEs, $\hat{\phi} = g(\hat{\theta})$ -- if you use the
SAME statistic S, because $E(S \mid \theta = \hat{\theta}) = E(S \mid g(\theta) = g(\hat{\theta})) = E(S \mid \phi = \hat{\phi})$.
If you choose a DIFFERENT statistic to match, T = h(S), you get a
different moment estimator. This is because, in general,
$E(h(S)) \neq h(E(S))$.
Jensen's inequality:
If h has positive curvature (it smiles!), like "exp", then $E(h(S)) \geq h(E(S))$.
[Figure: T = h(S) = exp(S), with Pr(S = 1) = Pr(S = 3) = 1/2. E(h(S)), on the chord between S = 1 and S = 3, is bigger than h(E(S)), on the curve at E(S) = 2.]
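The figure's two-point example can be checked directly in R:

s <- c(1, 3); prob <- c(0.5, 0.5)
E.hS <- sum(prob * exp(s))    # E(h(S)) = (e^1 + e^3)/2, about 11.4
h.ES <- exp(sum(prob * s))    # h(E(S)) = e^2, about 7.4
c(E.hS = E.hS, h.ES = h.ES)   # E(h(S)) > h(E(S)), as Jensen's inequality says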
For example, although $s^2$ is unbiased for $\sigma^2$, $T = h(s^2) = \sqrt{s^2} = s$ is NOT
unbiased for $\sigma = \sqrt{\sigma^2}$. Now h has negative curvature. So
$E(s) = E\!\left(\sqrt{s^2}\right) < \sqrt{E(s^2)} = \sqrt{\sigma^2} = \sigma$.
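And a simulation sketch of this last point ($\sigma$, n, and the replication count are arbitrary): $s^2$ averages out to $\sigma^2$, but s averages out to less than $\sigma$.

set.seed(6)
sigma <- 2; n <- 5
s2 <- replicate(50000, var(rnorm(n, mean = 0, sd = sigma)))
c(mean.of.s2 = mean(s2),         # close to sigma^2 = 4 (s^2 is unbiased)
  mean.of.s  = mean(sqrt(s2)),   # noticeably below sigma = 2 (s is biased low)
  sigma      = sigma)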