BIOINF 2118 N08: Confidence Intervals

Confidence intervals: Introduction: Normal approximations

If X1, ..., Xn are i.i.d. and the variance exists (population standard deviation = σ), the sample standard deviation is

    s = √( Σ (Xi − X̄)² / (n − 1) ).

IMPORTANT FACT: s²/n is unbiased for var(X̄) = σ²/n. The standard error of the mean is s/√n. (But although s² is unbiased for σ², s is not unbiased for σ: Jensen's inequality.)

Then an "approximate 95% confidence interval for the mean" is

    X̄ ± 1.96 · s/√n,

where α = 1 − 0.95 = 0.05 and 1.96 ≈ z(α/2), the value that a standard normal exceeds with probability α/2 = 0.025.

So, where does this come from? By the central limit theorem,

    X̄ ≈ N(μ, σ²/n).

(We read "≈" here as "is approximately distributed as".) Then

    Pr( |X̄ − μ| ≤ 1.96 σ/√n ) ≈ 0.95.

But we don't know σ, so (for NOW) we use the "plug-in principle"—replace σ by s.

The event |X̄ − μ| ≤ 1.96 s/√n is the same as the event −1.96 s/√n ≤ X̄ − μ ≤ 1.96 s/√n, which is the same statement as X̄ − 1.96 s/√n ≤ μ ≤ X̄ + 1.96 s/√n. (Subtract X̄, multiply by −1, reverse sides, and notice that the event itself is unchanged at each step.) The interval X̄ ± 1.96 s/√n is the confidence interval, and we say "μ is in this interval with 95% confidence."

Example: sample = {3, 5, 5, 7, 9, 12, 12, 15, 15, 20}. Here X̄ = 10.3 and s = 5.44. The confidence interval for the mean, μ, is 10.3 ± 1.96 × 5.44/√10 ≈ (6.9, 13.7), and we say "μ is in (6.9, 13.7) with 95% confidence." (A computational check appears in the sketches at the end of these notes.)

But—be careful of the interpretation! It does NOT mean that there is a 95% probability that μ is in the interval. (That would be a Bayesian, not a frequentist, statement.) It DOES mean this: "If we repeated this experiment many times, and created intervals this way each time, then the interval would cover the true value of μ in 95% of the experiments, on average."

NOTE: A confidence interval method is a recipe. The statement is about the recipe, not the particular interval the recipe generated on this data set. (The coverage simulation sketched at the end of these notes illustrates this.)

NOTE: Of course, this "coverage" statement is also true if we randomly picked the interval (−∞, ∞) for 95% of the experiments and the empty set for the other 5%. So coverage by itself does not guarantee a useful interval. Some criteria for choosing among recipes: the recipe generating the shortest interval; the recipe generating a symmetric interval; the recipe where every interval reaches to −∞ (or to +∞; called "one-sided").

Normal approximation to a binomial parameter

The variance of X ~ binom(n, p) is np(1 − p), which is n times the variance of each Bernoulli Zi in the sum that makes up X. Therefore the variance of the observed proportion p̂ = (Z1 + ... + Zn)/n = X/n is 1/n² times the variance of X: var(p̂) = p(1 − p)/n. But we don't know p, so we have to "plug in" the estimate: the estimated variance is p̂(1 − p̂)/n.
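
Code sketches (illustrative)

A minimal check of the normal-approximation interval for the example data above, in plain Python. Nothing here comes from the original notes; the variable names are just for illustration, and 1.96 is the usual approximate z(0.025).

    import math

    x = [3, 5, 5, 7, 9, 12, 12, 15, 15, 20]
    n = len(x)
    xbar = sum(x) / n                                           # sample mean
    s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))  # sample standard deviation
    se = s / math.sqrt(n)                                       # standard error of the mean
    lo, hi = xbar - 1.96 * se, xbar + 1.96 * se                 # approximate 95% CI
    print(f"xbar = {xbar:.2f}, s = {s:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")

The printed interval should land close to the (6.9, 13.7) quoted above; small differences are rounding.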
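
To illustrate the frequentist "coverage" interpretation, here is a small simulation sketch: draw many samples from an assumed normal population with a known mean, build the interval the same way each time, and count how often it covers the true mean. The population parameters, sample size, repetition count, and seed below are arbitrary choices made for the illustration, not values from the notes.

    import math
    import random

    random.seed(1)
    mu, sigma, n, reps = 10.0, 5.0, 10, 10_000   # assumed "truth" and simulation settings
    covered = 0
    for _ in range(reps):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        s = math.sqrt(sum((xi - xbar) ** 2 for xi in sample) / (n - 1))
        half = 1.96 * s / math.sqrt(n)           # half-width of the interval
        covered += (xbar - half) <= mu <= (xbar + half)
    print(covered / reps)   # close to 0.95, typically a bit below it at n = 10 because s is plugged in for sigma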
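
The binomial section above ends with the plug-in variance p̂(1 − p̂)/n; applying the same construction as for the mean gives the approximate interval p̂ ± 1.96 √(p̂(1 − p̂)/n). A sketch, with made-up counts (37 successes in 100 trials) purely for illustration:

    import math

    successes, n = 37, 100                        # hypothetical data, not from the notes
    phat = successes / n                          # observed proportion
    se = math.sqrt(phat * (1 - phat) / n)         # plug-in standard error
    lo, hi = phat - 1.96 * se, phat + 1.96 * se   # approximate 95% CI for p
    print(f"phat = {phat:.3f}, approximate 95% CI = ({lo:.3f}, {hi:.3f})")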