BIOINF 2118   N07: Estimation

Inference

"Inference" means drawing conclusions from data. The two most common types of conclusions are:

Estimation (point estimation, interval estimation)
Testing (hypothesis testing, significance testing)

Principles of frequentist (classical) estimation

The two main frequentist ways to estimate a parameter are:

Maximum likelihood estimation
Moment estimation

Maximum Likelihood Estimation

For a model family $f(x \mid \theta)$ and an observation $x_{obs}$, the likelihood is defined by
$L(\theta) = f(x_{obs} \mid \theta)$.
A maximum likelihood estimator (MLE) $\hat\theta$ satisfies
$L(\hat\theta) = \max_\theta L(\theta)$.
Often $\hat\theta$ is unique.

Example. If $X \sim \mathrm{binom}(n, p)$ and we observe $X = x_{obs}$, then
$L(p) = \binom{n}{x_{obs}} p^{x_{obs}} (1-p)^{n - x_{obs}}$,
and to maximize, we can differentiate and set the slope to zero:
$\frac{dL}{dp} = \binom{n}{x_{obs}} \, p^{x_{obs}-1} (1-p)^{n-x_{obs}-1} \bigl[ x_{obs}(1-p) - (n-x_{obs})\,p \bigr]$.
This is zero when $x_{obs}(1-p) = (n - x_{obs})\,p$, so
$\hat p = x_{obs}/n$.

In R, we can find the same maximum numerically for n = 10 and x_obs = 2. (The likelihood is plotted first, so that the points() call inside the objective function can show where optimize() looks.)

  # Plot the likelihood L(p) = dbinom(2, 10, p), then maximize it numerically.
  p <- seq(0, 1, length = 200)
  plot(p, dbinom(x = 2, size = 10, prob = p), type = "l", ylab = "likelihood")
  optimize( function(arg) {
              result <- dbinom(x = 2, size = 10, prob = arg)
              points(arg, result)     # mark each value of p that optimize() tries
              return(result)
            },
            lower = 1e-10, upper = 1 - 1e-10, maximum = TRUE )

(When the parameter has dimension > 1, you can do maximum likelihood estimation on the vector parameter either with algebra (set all the partial derivatives equal to zero and solve) or by searching numerically in the higher-dimensional space where the parameter lives.)

Moment estimators

A moment estimator is obtained by setting an observed value equal to its expected value (as a function of the parameter) and solving for the parameter.

Example. For the binomial, $E(X) = np$. So we solve $x_{obs} = n \hat p$ to get $\hat p = x_{obs}/n$. So for the binomial the MLE and the moment estimator are the same, but that's not always the case.

Example: the normal distribution

Suppose we know that the mean of a normal distribution is zero, but we don't know the variance. We observe i.i.d. data $(x_1, \dots, x_n)$, with $X_i \sim N(0, \sigma^2)$. Goal: to estimate $\sigma^2$.

First, let's try the MLE. Maximizing the likelihood is the same as maximizing the log-likelihood,
$\ell(\sigma^2) = -\tfrac{n}{2}\log(2\pi\sigma^2) - \tfrac{1}{2\sigma^2}\sum_{i=1}^n x_i^2$.
Differentiate the log-likelihood with respect to $\sigma^2$ to find the maximizer:
$\frac{d\ell}{d\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n x_i^2$.
Setting this equal to zero, the MLE is
$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n x_i^2$.
That's logical!!

Now, how about the moment estimator? $\sum_{i=1}^n X_i^2 / \sigma^2$ has a "chi-square distribution on n degrees of freedom", $\chi^2_n$. (Aside: $\chi^2_n$ is the same as a gamma distribution $G(n/2, 1/2)$.) The mean of $\chi^2_n$ equals n. So the moment estimator comes from setting
$\sum_{i=1}^n x_i^2 / \hat\sigma^2 = n$, giving $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n x_i^2$,
just like the MLE.

But wait! What if we don't know the mean either? Suppose $X_i \sim N(\mu, \sigma^2)$ and we don't know either parameter.

How about trying MLE? Start with the likelihood
$L(\mu, \sigma^2) = \prod_{i=1}^n (2\pi\sigma^2)^{-1/2} \exp\!\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)$.
Maximizing the likelihood over $\mu$ for fixed $\sigma^2$ gives $\hat\mu = \bar x$, the sample mean. Then maximizing over $\sigma^2$ gives
$\hat\sigma^2_{MLE} = \frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2$,
like before, but we replace the unknown mean $\mu$ by its estimate $\bar x$.

How about the method of moments? Things get different. $\sum_{i=1}^n (X_i - \bar X)^2 / \sigma^2$ has a chi-square distribution, but this time on n - 1 degrees of freedom:
$\sum_{i=1}^n (X_i - \bar X)^2 / \sigma^2 \sim \chi^2_{n-1}$.
You can think of it as losing a degree of freedom (a chunk of information) because we have to use some of the information in the data to estimate $\mu$. So the moment estimator comes from setting
$\sum_{i=1}^n (x_i - \bar x)^2 / \hat\sigma^2 = n - 1$, to get $\hat\sigma^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x)^2$.

We say that the MLE in this case is biased. The bias of an estimator $\hat\theta$ in estimating $\theta$ is defined as
$\mathrm{Bias} = E(\hat\theta) - \theta$.
This is overfitting: because we can tinker with a free parameter ($\mu$), we can be fooled into thinking the noise (variance) is less than it is, and so into thinking that the parameter estimates are more accurate than they are.

Notice that the bias of the MLE, $E(\hat\sigma^2_{MLE}) - \sigma^2 = -\sigma^2/n$, goes to zero as n goes to $\infty$.

Definition: An estimator is consistent if it converges to the true parameter value as $n \to \infty$; in particular, its bias goes to zero as $n \to \infty$.

Criteria for a good estimator

A good estimator should have low bias and low variance. We can combine these two criteria into one, the mean squared error:
$\mathrm{MSE} = E\bigl[(\hat\theta - \theta)^2\bigr] = \mathrm{var}(\hat\theta) + \mathrm{Bias}^2$.
This is a very important result. MSE is an example of an expected loss (in this case, loss = squared error). Neither variance nor bias alone can be interpreted as an expected loss.
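As a quick numerical check of the bias and of the MSE decomposition, here is a small R simulation. It is a sketch, not part of the original notes; the sample size, true variance, and number of replications are arbitrary choices.

  # Simulate the bias of the MLE of sigma^2 when the mean is unknown,
  # and check that MSE = variance + bias^2.
  # (Illustrative sketch; n, sigma2, and nreps are arbitrary choices.)
  set.seed(2118)
  n      <- 10
  sigma2 <- 4                       # true variance
  nreps  <- 50000

  mle    <- numeric(nreps)          # divides by n     (the MLE)
  moment <- numeric(nreps)          # divides by n - 1 (the moment estimator)
  for (r in 1:nreps) {
    x         <- rnorm(n, mean = 5, sd = sqrt(sigma2))
    mle[r]    <- sum((x - mean(x))^2) / n
    moment[r] <- sum((x - mean(x))^2) / (n - 1)
  }

  bias.mle <- mean(mle)    - sigma2   # close to -sigma2/n = -0.4
  bias.mom <- mean(moment) - sigma2   # close to 0
  mean((mle - sigma2)^2)              # the MSE of the MLE ...
  var(mle) + bias.mle^2               # ... matches var + bias^2, up to simulation noise

The estimated bias of the MLE should come out near $-\sigma^2/n = -0.4$, while the n - 1 version is nearly unbiased, and the two MSE lines should agree up to simulation error.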
Properties of the MLE

If $\hat\theta$ is the MLE for $\theta$, and we reparametrize with a monotonic transformation g, so that the new parameter is $\phi = g(\theta)$, then the MLE of $\phi$ is $\hat\phi = g(\hat\theta)$. Look at the likelihood graph: changing $\theta$ to $\phi$ just stretches and/or squeezes the horizontal axis, so where the mode is won't change.

The MLE is almost always consistent. As we've seen, it can be biased.

Properties of the moment estimator

Recall: the moment estimator finds the value of the parameter which makes the expected value of a statistic S equal to its observed value:
$E(S \mid \theta = \hat\theta) = s_{obs}$.

(1) When estimating $\phi = g(\theta)$ using the SAME statistic S, because
$E(S \mid \theta = \hat\theta) = E(S \mid g(\theta) = g(\hat\theta)) = E(S \mid \phi = \hat\phi)$,
the moment estimator transforms just as the MLE does: $\hat\phi = g(\hat\theta)$. Therefore, due to Jensen's inequality (see below), if $\hat\theta$ is unbiased for $\theta$, generally $\hat\phi$ will NOT be unbiased for $\phi$: $E(\hat\phi) \ne \phi$.

(2) Likewise, if you choose a DIFFERENT statistic to match, T = h(S), you get a different moment estimator. This is because, in general, $E(h(S) \mid \theta) \ne h(E(S \mid \theta))$.

Jensen's inequality: If h has positive curvature (it smiles!), like "exp", then
$E(h(S)) > h(E(S))$.

[Figure: Jensen's inequality for T = h(S) = exp(S), with Pr(S = 1) = Pr(S = 3) = 1/2. The chord lies above the curve, so E(h(S)) = (e + e^3)/2 is bigger than h(E(S)) = e^2.]

For example, in the normal case with known mean = 0, although $S = n^{-1}(X_1^2 + \dots + X_n^2)$ is unbiased for $\sigma^2$, $T = h(S) = \sqrt{S}$ is NOT unbiased for $\sigma = \sqrt{\sigma^2}$. Now h has negative curvature (it frowns), so the bias is negative:
$E(\sqrt{S}) - \sigma < \sqrt{E(S)} - \sigma = \sqrt{\sigma^2} - \sigma = 0$.
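A short R sketch of this last point (again not from the original notes; n, sigma, and nreps are arbitrary choices): with known mean 0, the average of the squared observations is unbiased for $\sigma^2$, but its square root underestimates $\sigma$ on average.

  # Known mean 0: S = mean(X_i^2) is unbiased for sigma^2,
  # but sqrt(S) is biased downward for sigma, by Jensen's inequality.
  # (Illustrative sketch; n, sigma, and nreps are arbitrary choices.)
  set.seed(2118)
  n     <- 5
  sigma <- 2
  nreps <- 50000

  S <- replicate(nreps, mean(rnorm(n, mean = 0, sd = sigma)^2))

  mean(S)       - sigma^2    # approximately 0:  S is unbiased for sigma^2
  mean(sqrt(S)) - sigma      # clearly negative: sqrt(S) is biased for sigma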