A note on estimating confidence intervals from small data samples.

Given a small sample of $n$ normally distributed random numbers, with expected value $\mu$ and standard deviation $\sigma$, how can we use the data to estimate the expected value and, more importantly, give a reliable confidence interval on our result? When $n$ is small, the Student distribution is needed.

The best estimate of the expected value is the sample mean, defined in the usual way:

$$\bar{X} = \frac{1}{n} \sum_{k=1}^{n} X_k ,$$

so that $E[\bar{X}] = E[X] = \mu$. Now define another random number, the sample variance, which leads to an estimate of the variance of the sample mean derived from the data. The random number $S_X^2$ defined from

$$S_X^2 = \frac{1}{n-1} \sum_{k=1}^{n} \left( X_k - \bar{X} \right)^2$$

is an estimate of the variance of the underlying distribution, $X$, i.e. $E[S_X^2] = \sigma^2$. Note the factor $n - 1$ rescaling the sum; this is the number of degrees of freedom that remain after the data have been used to estimate the expected value of the underlying distribution. An estimate of the variance of the sample mean is just given by $S^2 = S_X^2 / n$. Defining

$$S^2 = \frac{1}{n(n-1)} \sum_{k=1}^{n} \left( X_k - \bar{X} \right)^2$$

then gives $E[S^2] = E[\bar{X}^2] - \mu^2$, which is the variance of $\bar{X}$.

The Student distribution gives a recipe for defining confidence intervals, based on the sample mean and sample variance. The result is

$$P\left( |\bar{X} - \mu| < \gamma(c, n) \times S \right) = c ,$$

where $c$ is the desired confidence level (95% say) and $\gamma(c, n)$ is the scaling factor computed from the Student distribution with $n - 1$ degrees of freedom.

Our homework example asks:

Student's t-distribution with four degrees of freedom says the 95% confidence interval of a normally distributed random number with unknown variance $\sigma^2$ has width $w = 2.78s$ where $s^2$ is the sample variance. Use this to find a 95% confidence interval for the mean $\mu$ determined from the sample

$$Y = 13.2,\ 14.5,\ 14.8,\ 15.6,\ 16.0$$

Using the definitions above, we find $\bar{Y} = 14.82$ and $s = 0.486$, where $s$ is the value of $S$, the estimated standard deviation of the sample mean (remembering the factor of $1/(n - 1)$, since we have four degrees of freedom left when we estimate the expected value from the sample mean).
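As a check, the arithmetic behind these two numbers can be written out from the definitions of the sample mean and of $S^2$, with $n = 5$:

$$
\begin{aligned}
\bar{Y} &= \frac{13.2 + 14.5 + 14.8 + 15.6 + 16.0}{5} = \frac{74.1}{5} = 14.82, \\
\sum_{k=1}^{5} \left( Y_k - \bar{Y} \right)^2 &= 2.6244 + 0.1024 + 0.0004 + 0.6084 + 1.3924 = 4.728, \\
s^2 &= \frac{4.728}{5 \times 4} = 0.2364, \qquad s = \sqrt{0.2364} \approx 0.486 .
\end{aligned}
$$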
This gives $w = 2.78\,s \approx 1.35$. Here $w$ is the half-width of the 95% confidence interval, so the interval $\bar{Y} \pm w$ is $[13.5, 16.2]$.
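The calculation can also be reproduced numerically. The following is a minimal Python sketch using only the standard library; the function name `t_confidence_interval` and its parameter `t_factor` are hypothetical names chosen for illustration, and the scaling factor 2.78 is taken directly from the homework statement rather than computed from the Student distribution.

```python
import math
import statistics


def t_confidence_interval(sample, t_factor):
    """Return (mean, standard error, lower, upper) for a small normal sample.

    t_factor plays the role of gamma(c, n): the Student scaling factor for
    the chosen confidence level and n - 1 degrees of freedom. It is supplied
    by the caller and not computed here.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    # statistics.variance uses the 1/(n - 1) normalisation, i.e. S_X^2.
    var_x = statistics.variance(sample)
    # Estimated standard deviation of the sample mean: S = sqrt(S_X^2 / n).
    std_err = math.sqrt(var_x / n)
    half_width = t_factor * std_err
    return mean, std_err, mean - half_width, mean + half_width


y = [13.2, 14.5, 14.8, 15.6, 16.0]
# 2.78 is the 95% factor for four degrees of freedom quoted in the homework.
mean, std_err, lower, upper = t_confidence_interval(y, t_factor=2.78)
print(f"mean = {mean:.2f}, s = {std_err:.3f}, interval = [{lower:.1f}, {upper:.1f}]")
```

For other sample sizes or confidence levels the factor has to be looked up separately, for example from a table of the Student distribution.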