N08-confidence intervals

BIOINF 2118
N08 confidence Intervals
Confidence intervals: Introduction: Normal approximations.
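If $X_1, \dots, X_n$ are i.i.d. with mean $\mu$ and the variance exists (population standard deviation $= \sigma$):
The sample standard deviation is $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2}$.
IMPORTANT FACT:
$s^2/n$ is unbiased for $\operatorname{var}(\bar{X}) = \sigma^2/n$.
The standard error of the mean is $s/\sqrt{n}$.
(But although $s^2$ is unbiased for $\sigma^2$, $s$ is not unbiased for $\sigma$: because $\sqrt{\cdot}$ is concave, Jensen’s inequality gives $E[s] \le \sqrt{E[s^2]} = \sigma$.)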
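A quick Monte Carlo check of the two facts above, as a sketch only. It assumes a normal population with illustrative values $\mu = 0$, $\sigma = 2$, $n = 5$ (none of these come from the notes), and the function name `sample_sd_moments` is hypothetical. Averaging over many simulated samples, the mean of $s^2$ should land near $\sigma^2 = 4$, while the mean of $s$ should fall noticeably below $\sigma = 2$.

```python
import math
import random


def sample_sd_moments(mu=0.0, sigma=2.0, n=5, reps=20_000, seed=1):
    """Average s^2 and s over `reps` simulated samples of size n.

    Assumes a normal population N(mu, sigma^2); the parameters are illustrative.
    """
    rng = random.Random(seed)
    total_s2 = 0.0
    total_s = 0.0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        # Sample variance with the n - 1 divisor, as in the definition of s.
        s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
        total_s2 += s2
        total_s += math.sqrt(s2)
    return total_s2 / reps, total_s / reps


mean_s2, mean_s = sample_sd_moments()
print(f"average s^2 = {mean_s2:.3f}  (sigma^2 = 4)")
print(f"average s   = {mean_s:.3f}  (sigma   = 2)")
```

For normal data with $n = 5$, $E[s] \approx 0.94\,\sigma$, so the average of $s$ should come out near 1.88 rather than 2.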
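Then an “approximate 95% confidence interval for the mean” is:
$\left(\bar{X} - z_{\alpha/2}\,\frac{s}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\,\frac{s}{\sqrt{n}}\right)$,
where $\alpha = 1 - 0.95 = 0.05$, and $z_{\alpha/2}$ is the upper $\alpha/2$ point of the standard normal distribution, $\Phi(z_{\alpha/2}) = 1 - \alpha/2$, so $z_{0.025} \approx 1.96$.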
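The critical value $z_{\alpha/2}$ is just a standard normal quantile. A minimal sketch using Python’s standard-library `statistics.NormalDist` (the helper name `z_crit` is hypothetical) shows where 1.96 comes from, along with the values for two other common confidence levels:

```python
from statistics import NormalDist


def z_crit(alpha):
    """Upper alpha/2 point of N(0, 1), i.e. the z with Phi(z) = 1 - alpha/2."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    alpha = 1 - level
    print(f"{level:.0%} interval: alpha = {alpha:.2f}, z = {z_crit(alpha):.3f}")
```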
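So, where does this come from?
By the central limit theorem,
$\bar{X} \mathrel{\dot\sim} N(\mu, \sigma^2/n)$.
(We read “$\dot\sim$” as “is approximately distributed as”.) Then
$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \mathrel{\dot\sim} N(0, 1)$.
But we don’t know $\sigma$, so (for NOW) we use the “plug-in principle”: replace $\sigma$ by $s$.
The event
$-z_{\alpha/2} < \frac{\bar{X} - \mu}{s/\sqrt{n}} < z_{\alpha/2}$, which has probability approximately $1 - \alpha$,
is the same as the event
$-z_{\alpha/2}\,\frac{s}{\sqrt{n}} < \bar{X} - \mu < z_{\alpha/2}\,\frac{s}{\sqrt{n}}$,
which is the same statement as
$\bar{X} - z_{\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\,\frac{s}{\sqrt{n}}$.
(Subtract $\bar{X}$, multiply by $-1$, reverse sides, and notice that the bounds $\pm z_{\alpha/2}\, s/\sqrt{n}$ are symmetric, so they simply trade places.)
Example: sample = {3, 5, 5, 7, 9, 12, 12, 15, 15, 20}.
Here $n = 10$, $\bar{X} = 10.3$, $s = 5.44$.
The confidence interval for the mean, $\mu$, is
$10.3 \pm 1.96 \times 5.44/\sqrt{10} = 10.3 \pm 3.37 = (6.93,\ 13.67)$,
and we say
“$\mu \in (6.93,\ 13.67)$ with 95% confidence”.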
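The worked example can be reproduced with a short sketch using only the standard library. The function name `approx_mean_ci` is hypothetical; it uses the exact normal quantile from `statistics.NormalDist` instead of the rounded 1.96, which does not change the interval at two decimals.

```python
import math
from statistics import NormalDist, mean, stdev


def approx_mean_ci(data, level=0.95):
    """Approximate (CLT-based) interval for the mean, with s plugged in for sigma."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)                  # sample standard deviation, n - 1 divisor
    se = s / math.sqrt(n)            # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return xbar, s, (xbar - z * se, xbar + z * se)


sample = [3, 5, 5, 7, 9, 12, 12, 15, 15, 20]
xbar, s, (lo, hi) = approx_mean_ci(sample)
print(f"xbar = {xbar:.2f}, s = {s:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
# prints: xbar = 10.30, s = 5.44, 95% CI = (6.93, 13.67)
```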
But---- be careful of the interpretation! It does NOT mean that there’s a 95% probability
that $\mu$ is in the interval. (That would be a Bayesian, not a frequentist, statement.)
It DOES mean this:
“If we repeated this experiment many times, and created intervals this way
each time, then the interval would cover the true value of $\mu$ in about 95% of
the experiments.”
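This frequentist statement is about repeating the whole experiment, which is easy to simulate. The sketch below assumes a normal population with illustrative values $\mu = 10$, $\sigma = 3$, $n = 50$ (not from the notes; `coverage` is a hypothetical name). It applies the recipe $\bar{X} \pm z_{\alpha/2}\, s/\sqrt{n}$ to many simulated data sets and counts how often the interval contains the true $\mu$.

```python
import math
import random
from statistics import NormalDist


def coverage(mu=10.0, sigma=3.0, n=50, reps=10_000, level=0.95, seed=2):
    """Fraction of simulated experiments whose approximate interval covers mu.

    Assumes a normal population N(mu, sigma^2) with illustrative parameters.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
        half_width = z * s / math.sqrt(n)
        # The recipe "succeeds" in this experiment if its interval covers the true mu.
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps


print(f"estimated coverage: {coverage():.3f}")
```

The estimate should sit a little below 0.95, because plugging in $s$ for $\sigma$ makes the normal approximation only approximate; the shortfall grows as $n$ shrinks.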
NOTE: A confidence interval method is a recipe. The statement is about the recipe,
not the particular interval the recipe generated on this data set.
NOTE: Of course, this “coverage” statement is also true if we randomly picked the
interval $(-\infty, \infty)$ for 95% of the experiments and the empty set for the other 5%. So
coverage by itself does not guarantee a useful interval. Some criteria:
the recipe generating the shortest length interval;
the recipe generating a symmetric interval;
the recipe where every interval reaches to $-\infty$ (or to $+\infty$; called “one-sided”).
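To see why coverage alone is not enough, the data-ignoring recipe from the NOTE can be simulated directly. In this sketch the names `silly_recipe` and `silly_coverage` are hypothetical, and the true value $\mu = 10$ is arbitrary: the recipe never looks at any data, yet its long-run coverage is still about 95%, while its intervals are either infinitely long or empty.

```python
import math
import random


def silly_recipe(rng, p_everything=0.95):
    """Hypothetical 'useless' recipe that ignores the data entirely."""
    if rng.random() < p_everything:
        return (-math.inf, math.inf)   # the whole real line
    return None                        # the empty set


def silly_coverage(mu=10.0, reps=100_000, seed=3):
    """Fraction of repetitions in which the silly interval contains mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        interval = silly_recipe(rng)
        if interval is not None and interval[0] < mu < interval[1]:
            hits += 1
    return hits / reps


print(f"coverage of the silly recipe: {silly_coverage():.3f}")
```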
Normal approximation to a binomial parameter
The variance of $X \sim \text{binom}(n, p)$ is $np(1-p)$, which is $n$ times the variance $p(1-p)$ of each
Bernoulli $Z_i$ in the sum $X = Z_1 + \dots + Z_n$. Therefore the variance of the observed
proportion $\hat{p} = (Z_1 + \dots + Z_n)/n = X/n$ is $1/n^2$ times the variance of $X$: $\operatorname{var}(\hat{p}) = p(1-p)/n$.
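The formula $\operatorname{var}(\hat{p}) = p(1-p)/n$ can be checked by simulation. This sketch assumes illustrative values $n = 40$, $p = 0.3$ (not from the notes; `simulated_var_phat` is a hypothetical name). It builds each $X$ as a sum of $n$ Bernoulli draws, exactly as in the argument above, and compares the simulated variance of $\hat{p}$ with the formula.

```python
import random


def simulated_var_phat(n=40, p=0.3, reps=20_000, seed=4):
    """Monte Carlo variance of p_hat = X / n, with X a sum of n Bernoulli(p) draws."""
    rng = random.Random(seed)
    phats = []
    for _ in range(reps):
        x = sum(1 for _ in range(n) if rng.random() < p)   # X ~ binom(n, p)
        phats.append(x / n)
    m = sum(phats) / reps
    return sum((ph - m) ** 2 for ph in phats) / (reps - 1)


n, p = 40, 0.3
print(f"simulated var(p_hat) = {simulated_var_phat(n, p):.5f}")
print(f"p(1 - p) / n         = {p * (1 - p) / n:.5f}")
```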
But we don’t know $p$, so we have to “plug in” the estimate: $\widehat{\operatorname{var}}(\hat{p}) = \hat{p}(1-\hat{p})/n$.
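Carrying the plug-in variance through the same recipe used for the mean gives the approximate interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, commonly called the Wald interval. The notes stop at the variance here, so treat the interval step as an assumption about where the argument goes next. The counts (18 successes in 60 trials) and the name `wald_interval` are hypothetical.

```python
import math
from statistics import NormalDist


def wald_interval(x, n, level=0.95):
    """Normal-approximation interval for a binomial p, using the plug-in variance."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # plug-in standard error of p_hat
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)


# Hypothetical data: 18 successes in 60 trials.
p_hat, se, (lo, hi) = wald_interval(18, 60)
print(f"p_hat = {p_hat:.3f}, SE = {se:.4f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

When $\hat{p}$ is 0 or 1 the plug-in standard error is 0 and this interval collapses to a single point, one known weakness of the plug-in approach.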