A note on estimating confidence intervals from small data samples.

Given a small sample of n normally-distributed random numbers, with expected value µ and standard deviation σ, how can we use the data to estimate
the expected value and, more importantly, give a reliable confidence interval on
our result? When n is small, the Student distribution is needed. The best
estimate of the expected value is the sample mean, defined in the usual way:
\[
\bar{X} = \frac{1}{n} \sum_{k=1}^{n} X_k ,
\]
so that E[X̄] = E[X] = µ. Now define another random number, the sample
variance, from which an estimate of the variance of the sample mean can be
derived from the data. The random number S_X², defined from
\[
S_X^2 = \frac{1}{n-1} \sum_{k=1}^{n} \left( X_k - \bar{X} \right)^2 ,
\]
is an estimate of the variance of the underlying distribution of X, i.e. E[S_X²] = σ².
Note the factor n − 1 appears rescaling the sum; this is the number of degrees
of freedom that remain after the data has been used to estimate the expected
value of the underlying distribution. An estimate of the variance of the sample
mean is just given by S² = S_X²/n. Defining
\[
S^2 = \frac{1}{n(n-1)} \sum_{k=1}^{n} \left( X_k - \bar{X} \right)^2 ,
\]
then gives E[S²] = E[X̄²] − µ². The Student distribution gives a recipe for
defining confidence intervals, based on the sample mean and sample variance.
The result is
\[
P\left( |\bar{X} - \mu| < \gamma(c, n) \times S \right) = c ,
\]
where c is the desired confidence level (95%, say) and γ(c, n) is the
scaling factor computed from the Student distribution.
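As a rough numerical check of this recipe, the sketch below simulates many small normal samples and counts how often the sample mean lies within γ × S of µ. The values µ = 10, σ = 2, the number of trials and the function names are arbitrary choices made for illustration; the factor 2.78 is the 95% value for four degrees of freedom (n = 5) quoted in the homework example. The estimated fraction should come out close to 0.95.

```python
import math
import random


def mean_and_standard_error(xs):
    """Return (sample mean, S), where S**2 = S_X**2 / n."""
    n = len(xs)
    xbar = sum(xs) / n
    # Sample variance S_X^2, with the n - 1 divisor.
    s_x2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    # S^2 = S_X^2 / n estimates the variance of the sample mean.
    return xbar, math.sqrt(s_x2 / n)


def coverage(mu, sigma, n, gamma, trials=20000, seed=0):
    """Fraction of simulated samples with |mean - mu| < gamma * S."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar, s = mean_and_standard_error(xs)
        if abs(xbar - mu) < gamma * s:
            hits += 1
    return hits / trials


# Illustrative parameters (assumptions): mu = 10, sigma = 2, n = 5.
estimate = coverage(mu=10.0, sigma=2.0, n=5, gamma=2.78)
print(f"estimated coverage: {estimate:.3f}")
```

Replacing 2.78 with 1.96, the factor appropriate when σ is known exactly, should give a noticeably lower fraction for n = 5, which is why the Student distribution is needed for small samples.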
Our homework example asks:
Student’s t-distribution with four degrees of freedom says the 95% confidence interval of a normally distributed random number with unknown
variance σ² has width w = 2.78s where s² is the sample variance. Use this
to find a 95% confidence interval for the mean µ determined from the sample
Y = {13.2, 14.5, 14.8, 15.6, 16.0}
Using the definitions above, we find Ȳ = 14.82 and s = 0.486, where s² is the
estimated variance of the sample mean, S² (remembering the factor of 1/(n − 1),
since four degrees of freedom remain once the sample mean has been used to
estimate the expected value). The half-width of the 95% confidence interval is
then w = 2.78s = 1.35, so the interval is Ȳ ± w = [13.5, 16.2].
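The arithmetic of this example can be reproduced with a few lines of Python. This is only a sketch using the standard library: the variable names are chosen here for illustration, the factor 2.78 is taken from the problem statement rather than computed, and w is treated as the distance from Ȳ to either end of the interval.

```python
import math
import statistics

y = [13.2, 14.5, 14.8, 15.6, 16.0]
n = len(y)

y_bar = statistics.mean(y)      # sample mean
s_x2 = statistics.variance(y)   # sample variance S_X^2 (n - 1 divisor)
s = math.sqrt(s_x2 / n)         # S, with S^2 = S_X^2 / n
gamma = 2.78                    # 95% factor for four degrees of freedom, as given
w = gamma * s
low, high = y_bar - w, y_bar + w

print(f"mean = {y_bar:.2f}, s = {s:.3f}, w = {w:.2f}")
# mean = 14.82, s = 0.486, w = 1.35
print(f"95% interval = [{low:.1f}, {high:.1f}]")
# 95% interval = [13.5, 16.2]
```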