Estimating µ with Small Samples:

For samples of size 30 or larger we can approximate the population standard
deviation σ by s, the sample standard deviation. Then we can use the central limit
theorem to find bounds on the error of estimate and confidence intervals for µ. However,
there are many practical and important situations where large samples are simply not
available. (text p. 460)
Student’s t Distribution
To avoid the error involved in replacing σ by s when the sample size is small (less
than 30), we introduce a new variable called Student’s t variable. The t variable
and its corresponding distribution, called Student’s t distribution, were discovered in
1908 by W. S. Gosset. He was employed as a statistician by a large Irish brewing
company that frowned on the publication of research by its employees, so Gosset
published his research under the pseudonym Student. Gosset was the first to recognize the
importance of developing statistical methods for obtaining reliable information from
small samples. You may want to call this Gosset’s distribution, but you will never see
this written.
The t variable is defined by the following formula:
t = (x bar - µ) / (s / √n)
Where x bar is the mean of the random sample of n measurements, µ is the population
mean of the x distribution, and s is the sample standard deviation.
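As a minimal sketch of the formula above, the following Python computes t for one sample. The function name t_statistic and the five data values are hypothetical and chosen only for illustration; they are not from the text.

```python
import math
import statistics


def t_statistic(sample, mu):
    """Return t = (x bar - mu) / (s / sqrt(n)) for a list of measurements."""
    n = len(sample)
    x_bar = statistics.mean(sample)   # sample mean
    s = statistics.stdev(sample)      # sample standard deviation (divides by n - 1)
    return (x_bar - mu) / (s / math.sqrt(n))


# Hypothetical data: five measurements and an assumed population mean of 10.
sample = [9.0, 10.0, 11.0, 12.0, 13.0]
t = t_statistic(sample, mu=10.0)
print(round(t, 3))
```

Here x bar = 11 and s is about 1.581, so s / √n is about 0.707 and t comes out near 1.414.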
If many random samples of size n are drawn, then we get many t values from the
above formula. These t values can be organized into a frequency table and a histogram
can be drawn, giving us an idea of the shape of the t distribution (for a given n).
Fortunately for us, this is not necessary since theory says that the shape of the t
distribution depends only on n, provided the basic variable x has a normal distribution.
So, when we use the t distribution, we will assume that the x distribution is normal.
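The idea of drawing many random samples and collecting their t values can be tried directly by simulation. The sketch below assumes a standard normal x distribution, a sample size of n = 5, and a fixed random seed; these choices and the name simulate_t_values are illustrative assumptions, not part of the text. It compares how often |t| exceeds 2 with the corresponding area for the standard normal z distribution.

```python
import math
import random
import statistics


def simulate_t_values(n, trials, mu=0.0, sigma=1.0, seed=0):
    """Draw `trials` normal samples of size n and return the t value of each."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.mean(sample)
        s = statistics.stdev(sample)
        values.append((x_bar - mu) / (s / math.sqrt(n)))
    return values


t_values = simulate_t_values(n=5, trials=10000)

# Fraction of simulated t values farther than 2 from the center.
tail_fraction = sum(abs(t) > 2.0 for t in t_values) / len(t_values)

# Area outside -2 < z < 2 for the standard normal distribution.
normal_tail = 2 * (1 - statistics.NormalDist().cdf(2.0))

print(f"fraction of |t| > 2 with n = 5: {tail_fraction:.3f}")
print(f"area with |z| > 2 (standard normal): {normal_tail:.3f}")
```

With a small n, the simulated fraction should come out noticeably larger than the normal area, which is the reason s cannot simply stand in for σ when samples are small.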
Degrees of Freedom
Table 7 in Appendix II gives values of the variable t corresponding to what we
call the number of degrees of freedom, abbreviated d.f. The formula is as follows:
d.f. = n – 1
Where d.f. is the degrees of freedom and n is the sample size being used.
Each choice for d.f. gives a different t distribution. However, for d.f. larger than
about 30, the t distribution and the standard normal z distribution are almost the same.
The graph of a t distribution is always symmetrical about its mean, which (as for
the z distribution) is 0. The main observable difference between a t distribution and the
standard normal z distribution is that a t distribution has somewhat thicker tails.
***View Figure 8 – 4 (text p. 461) to observe a standard normal distribution and a
Student’s t distribution.
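The thicker tails can also be checked numerically. The sketch below uses the standard density formula for Student’s t distribution, which is not given in these notes and is brought in here as an assumption, and compares its height at t = 3 with the standard normal density. The function names t_pdf and z_pdf are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom.

    Standard textbook formula (an outside assumption, not from these notes).
    math.gamma overflows for very large df, so keep df modest.
    """
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def z_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


# Height of each curve three units from the center.
for df in (2, 5, 30):
    print(f"d.f. = {df:2d}: t density {t_pdf(3.0, df):.5f}, z density {z_pdf(3.0):.5f}")
```

The t density at 3 is larger than the normal density for each d.f. shown, and the gap narrows as d.f. grows, in line with the remark that the two curves are almost the same beyond about 30 degrees of freedom.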
Using Table 7 to Find Critical Values for Confidence Intervals
Table 7 of Appendix II gives various t values for different degrees of freedom d.f.
We will use this table to find critical values tc for a c confidence level. This means
we want to find tc so that an area equal to c under the t distribution for a given number of
degrees of freedom falls between -tc and tc. In the language of probability, we want to
find tc so that:
P(-tc < t < tc) = c
This probability corresponds to the area shaded in Figure 8 – 5 on text p. 462.
*** View Example 3 on text p. 463 to see how to use Table 7 in Appendix II.
*** View Guided Exercise 5 on text p. 463.
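When a table is not at hand, tc can be approximated by solving P(-tc < t < tc) = c numerically. The sketch below is one possible approach under stated assumptions, not the method used in the text: it takes the standard t density formula from outside these notes, integrates it with Simpson's rule, and searches for tc by bisection. All function names are hypothetical.

```python
import math


def t_pdf(t, df):
    """Standard density of Student's t distribution (assumed, not from the notes)."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def central_area(tc, df, steps=2000):
    """Area under the t density between -tc and tc, by Simpson's rule.

    Integrates from 0 to tc and doubles the result, using symmetry about 0.
    `steps` must be even.
    """
    h = tc / steps
    total = t_pdf(0.0, df) + t_pdf(tc, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


def critical_value(c, df):
    """Find tc with central_area(tc, df) = c by bisection."""
    lo, hi = 0.0, 1.0
    while central_area(hi, df) < c:   # widen the bracket until it contains tc
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_area(mid, df) < c:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


tc = critical_value(0.95, df=9)
print(round(tc, 3))
```

For c = 0.95 and d.f. = 9, the result should agree with the commonly tabulated value of 2.262 to three decimal places.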
Maximal Error of Estimate
In section 8.1 we found bounds ±E on the error of estimate for a c confidence
level. Using the same approach, we have
E = tc (s / √n)
This is the maximal error of estimate for a c confidence level with small samples.
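A short sketch of this calculation follows. The inputs are hypothetical: s = 1.5 and n = 10 are made-up values, and tc = 2.262 is the usual table entry for c = 0.95 with d.f. = 9.

```python
import math


def maximal_error(tc, s, n):
    """Return E = tc * s / sqrt(n), the maximal error of estimate."""
    return tc * s / math.sqrt(n)


# Hypothetical inputs: tc for c = 0.95 and d.f. = 9, with s = 1.5 and n = 10.
E = maximal_error(tc=2.262, s=1.5, n=10)
print(round(E, 3))
```

Because n appears under a square root, quadrupling the sample size only halves E when tc and s are held fixed.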
Confidence Interval for µ (Small Samples)
For small samples (n < 30) taken from a normal population where σ is unknown, a
c confidence interval for the population mean µ is as follows:
x bar – E < µ < x bar + E
Where x bar = sample mean
E = tc (s / √n)
s = sample standard deviation
c = confidence level (0 < c < 1)
tc = critical value for confidence level c and degrees of freedom d.f. = n – 1, taken from the t
distribution
n = sample size (n < 30)
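Putting the pieces together, the sketch below builds a confidence interval from raw data. The five measurements are hypothetical, and tc = 2.776 is the usual table value for c = 0.95 with d.f. = 4; the function name confidence_interval is also an illustrative choice.

```python
import math
import statistics


def confidence_interval(sample, tc):
    """Return (x bar - E, x bar + E) with E = tc * s / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.mean(sample)   # sample mean
    s = statistics.stdev(sample)      # sample standard deviation
    E = tc * s / math.sqrt(n)         # maximal error of estimate
    return x_bar - E, x_bar + E


# Hypothetical sample of n = 5 measurements, so d.f. = 4.
sample = [9.0, 10.0, 11.0, 12.0, 13.0]
tc = 2.776  # table value for c = 0.95 and d.f. = 4
low, high = confidence_interval(sample, tc)
print(f"{low:.3f} < mu < {high:.3f}")
```

For these values x bar = 11 and E is about 1.963, so the interval runs from roughly 9.037 to 12.963.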
In our applications of Student’s t distribution we have made the basic assumption
that x has a normal distribution. However, the same methods apply even if x is only
approximately normal. In fact, the main requirement for using the Student’s t distribution
is that the distribution of x values be reasonably symmetrical and mound-shaped. If this
is the case, then the methods we use with the t distribution can be considered valid for
most practical applications.
*** View Guided Exercise 6 on text p. 465.
Complete text p. 466 – 470.