BIOINF 2118    Estimation    2013-02-05

Inference

"Inference" means drawing conclusions from data. The two most common types of conclusions are:

  Estimation (point estimation, interval estimation)
  Testing (hypothesis testing, significance testing)

Principles of frequentist (classical) estimation

The two main frequentist ways to estimate are:

  Maximum likelihood estimation
  Moment estimation

Maximum Likelihood Estimation

For a model family f(x | θ) and an observation x_obs, the likelihood is defined by

  L(θ) = f(x_obs | θ).

A maximum likelihood estimator (MLE) θ̂ satisfies

  L(θ̂) = max over θ of L(θ).

Often θ̂ is unique.

Example

If X ~ binom(n, p) and we observe X = x_obs, then

  L(p) = choose(n, x_obs) p^x_obs (1 − p)^(n − x_obs),

and to maximize, we can differentiate the log-likelihood and set the slope to zero:

  d/dp [ x_obs log p + (n − x_obs) log(1 − p) ] = x_obs / p − (n − x_obs) / (1 − p).

This is zero when x_obs (1 − p) = (n − x_obs) p, that is, when p = x_obs / n. So

  p̂ = x_obs / n.

We can also find the maximum numerically in R (here with x_obs = 2 and n = 10). The points() call marks each value the optimizer tries, so first draw the likelihood curve:

  curve(dbinom(x = 2, size = 10, prob = p), xname = "p",
        from = 0, to = 1, ylab = "likelihood")
  optimize(
    function(arg) {
      result <- dbinom(x = 2, size = 10, prob = arg)
      points(arg, result)   # mark each value the optimizer tries
      return(result)
    },
    lower = 1e-10, upper = 1 - 1e-10, maximum = TRUE
  )

Maximum likelihood estimation can also be performed on a vector parameter, either analytically (set the derivative equal to the zero vector and solve) or by searching.

Moment estimators

A moment estimator is obtained by setting an observed value equal to its expected value (as a function of the parameter) and solving for the parameter.

Example

For the binomial, E(X | p) = np. So solve x_obs = np to get p̂ = x_obs / n.

For the binomial, the MLE and the moment estimator are the same, but that's not always the case.

Example: the normal distribution

Suppose we know that the mean of a normal distribution is zero, but don't know the variance. We observe i.i.d. data (x_1, ..., x_n). Goal: to estimate σ².

First, let's try the MLE, the maximum likelihood estimator. The likelihood is

  L(σ²) = (2πσ²)^(−n/2) exp( − Σ x_i² / (2σ²) ).

Maximizing the likelihood is the same as maximizing the log-likelihood. Differentiate the log-likelihood to find the maximizer:

  d/dσ² [ −(n/2) log(2πσ²) − Σ x_i² / (2σ²) ] = −n / (2σ²) + Σ x_i² / (2σ⁴).

Setting this equal to zero, we get

  σ̂² = (1/n) Σ x_i².

That's logical!!

Now, how about the moment estimator? The statistic Σ x_i² / σ² has a "chi-square distribution on n degrees of freedom", χ²_n. (That's the same as a gamma distribution G(n/2, 1/2).) The mean of χ²_n equals n. So the moment estimator comes from setting Σ x_obs,i² / σ² = n:

  σ̂² = (1/n) Σ x_i²,

just like before.

But wait! What if we don't know the mean of the normal distribution, so we don't know either parameter?

How about trying MLE? Start with

  L(μ, σ²) = (2πσ²)^(−n/2) exp( − Σ (x_i − μ)² / (2σ²) ).

Maximizing the likelihood over μ for fixed σ² gives μ̂ = x̄, the sample mean. Then maximizing over σ² gives

  σ̂² = (1/n) Σ (x_i − x̄)²,

like before, but replacing the unknown mean μ by its estimate x̄.

How about the method of moments? Things get different. The statistic Σ (x_i − x̄)² / σ² has a chi-square distribution, but this time on n − 1 degrees of freedom: χ²_(n−1), with mean n − 1. You can think of it as losing a degree of freedom (or a chunk of information) because we have to use up some of the information to estimate μ. So the moment estimator comes from setting Σ (x_i − x̄)² / σ² = n − 1, to get

  σ̂² = (1/(n − 1)) Σ (x_i − x̄)² = s².

We say that the MLE in this case is biased. The bias of an estimator θ̂ in estimating θ is defined as

  Bias = E(θ̂) − θ.

This is overfitting: because we can tinker with a free parameter, we can be fooled into thinking the noise (variance) is less than it is, and thinking that the parameter estimates are more accurate than they are.

Notice that the bias goes to zero as n goes to ∞.

Definition: An estimator is consistent if it converges to the true parameter value as n → ∞. Here both the bias and the variance of σ̂² go to zero, so it is consistent.

Criteria for a good estimator

A good estimator should have low bias and low variance. We can combine these two criteria into one: the mean squared error,

  MSE = E( (θ̂ − θ)² ).

This is a very important result:

  MSE = var + bias².

MSE is an example of expected loss (in this case, loss = squared error).
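As a concrete illustration (not part of the original notes), the short R simulation below compares the two variance estimators from the normal example, the MLE (divide by n) and the moment estimator s² (divide by n − 1), and checks numerically that MSE ≈ variance + bias² for each. The sample size, true variance, and number of replicates are arbitrary choices for this sketch.

  # Simulation sketch: bias, variance, and MSE of the two variance estimators
  # for i.i.d. normal data with unknown mean.  Settings below are illustrative.
  set.seed(1)
  n      <- 10       # sample size
  sigma2 <- 4        # true variance
  nrep   <- 50000    # Monte Carlo replicates

  mle    <- numeric(nrep)   # (1/n)       * sum((x - xbar)^2)
  moment <- numeric(nrep)   # (1/(n - 1)) * sum((x - xbar)^2), i.e. s^2
  for (i in 1:nrep) {
    x  <- rnorm(n, mean = 0, sd = sqrt(sigma2))
    ss <- sum((x - mean(x))^2)
    mle[i]    <- ss / n
    moment[i] <- ss / (n - 1)
  }

  summarize <- function(est) {
    bias <- mean(est) - sigma2
    c(bias           = bias,
      variance       = var(est),
      mse            = mean((est - sigma2)^2),
      var.plus.bias2 = var(est) + bias^2)   # should match mse up to Monte Carlo error
  }
  rbind(MLE = summarize(mle), moment = summarize(moment))

With these settings the MLE shows a negative bias near −σ²/n while s² is essentially unbiased, and in both rows the MSE column agrees with var + bias² up to Monte Carlo error.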
Neither variance nor bias alone can be interpreted as an expected loss.

Properties of the MLE

If θ̂ is the MLE for θ, and we reparametrize with a monotonic transformation g, so that the new parameter is φ = g(θ), then the MLE of φ is φ̂ = g(θ̂). Look at the likelihood graph: changing θ to φ means stretching and/or squeezing the horizontal axis. Where the mode is won't change.

The MLE is almost always consistent. As we've seen, it can be biased.

Properties of the moment estimator

Recall: the moment estimator finds the value of the parameter which makes the expected value of a statistic S equal to its observed value:

  E(S | θ = θ̂) = s_obs.

When estimating φ = g(θ), just as with the MLE, φ̂ = g(θ̂) -- if you use the SAME statistic S, because

  E(S | θ = θ̂) = E(S | g(θ) = g(θ̂)) = E(S | φ = φ̂).

If you choose a DIFFERENT statistic to match, T = h(S), you get a different moment estimator. This is because, in general,

  E(h(S)) ≠ h(E(S)).

Jensen's inequality: If h has positive curvature (it smiles!), like "exp", then E(h(S)) ≥ h(E(S)).

[Figure: convex curve h(S) = exp(S), with Pr(S = 1) = Pr(S = 3) = 1/2; the chord average E(h(S)) is bigger, while the curve value h(E(S)) at E(S) = 2 is smaller.]

For example, although s² is unbiased for σ², T = h(s²) = √(s²) = s is NOT unbiased for σ = √(σ²). Now h has negative curvature. So

  E(s) = E( √(s²) ) < √( E(s²) ) = √(σ²) = σ.
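A small simulation (again illustrative, not from the notes) makes this last point concrete: averaging sample variances recovers σ², but averaging their square roots falls short of σ. The sample size, mean, and σ below are arbitrary.

  # Sketch: s^2 is (on average) on target for sigma^2, but s = sqrt(s^2)
  # systematically underestimates sigma, since h = sqrt has negative curvature
  # (Jensen's inequality).  Settings are illustrative.
  set.seed(2)
  n     <- 5
  sigma <- 3
  nrep  <- 50000

  s2 <- replicate(nrep, var(rnorm(n, mean = 10, sd = sigma)))  # sample variances
  c(mean.s2 = mean(s2),        # close to sigma^2 = 9  (unbiased)
    mean.s  = mean(sqrt(s2)),  # noticeably below sigma = 3 (biased downward)
    sigma   = sigma)

With n = 5 the downward bias in s is easy to see; it shrinks as n grows.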