Estimating µ with Small Samples

For samples of size 30 or larger we can approximate the population standard deviation σ by s, the sample standard deviation. Then we can use the central limit theorem to find bounds on the error of estimate and confidence intervals for µ. However, there are many practical and important situations where large samples are simply not available. (text p. 460)

Student’s t Distribution

To avoid the error involved in replacing σ by s when the sample size is small (less than 30), we introduce a new variable called Student’s t variable. The t variable and its corresponding distribution, called Student’s t distribution, were discovered in 1908 by W. S. Gosset. He was employed as a statistician by a large Irish brewing company that frowned on the publication of research by its employees, so Gosset published his research under the pseudonym "Student". Gosset was the first to recognize the importance of developing statistical methods for obtaining reliable information from small samples. You may want to call this Gosset’s distribution, but you will never see this written.

The t variable is defined by the following formula:

t = (x bar - µ) / (s / √n)

where x bar is the mean of the random sample of n measurements, µ is the population mean of the x distribution, and s is the sample standard deviation.

If many random samples of size n are drawn, then we get many t values from the above formula. These t values can be organized into a frequency table and a histogram can be drawn, giving us an idea of the shape of the t distribution (for a given n). Fortunately for us, this is not necessary, since theory says that the shape of the t distribution depends only on n, provided the basic variable x has a normal distribution. So, when we use the t distribution, we will assume that the x distribution is normal.
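The repeated-sampling idea described above can be tried out directly. The following is a minimal sketch, not part of the text: the function names, the seed, and the choice of a standard normal population are all assumptions made for illustration. It computes t from a sample using the formula above, then draws many samples of size n to see how the resulting t values spread out.

```python
import math
import random
import statistics


def t_statistic(sample, mu):
    """Return t = (x bar - mu) / (s / sqrt(n)) for one sample."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return (x_bar - mu) / (s / math.sqrt(n))


def simulate_t_values(n, trials, mu=0.0, sigma=1.0, seed=0):
    """Draw `trials` random samples of size n from a normal population
    (hypothetical mu and sigma) and return the t value of each sample."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        values.append(t_statistic(sample, mu))
    return values


def tail_fraction(values, cutoff=1.96):
    """Fraction of values that fall outside the interval (-cutoff, cutoff)."""
    return sum(abs(t) > cutoff for t in values) / len(values)
```

For example, `tail_fraction(simulate_t_values(5, 5000))` should come out well above 0.05, the share of a standard normal z distribution lying outside ±1.96. That gap reflects the extra uncertainty from using s in place of σ when n is small.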
Degrees of Freedom

Table 7 in Appendix II gives values of the variable t corresponding to what we call the number of degrees of freedom, abbreviated d.f. The formula is as follows:

d.f. = n – 1

where d.f. is the degrees of freedom and n is the sample size being used. Each choice for d.f. gives a different t distribution. However, for d.f. larger than about 30, the t distribution and the standard normal z distribution are almost the same. The graph of a t distribution is always symmetrical about its mean, which (as for the z distribution) is 0. The main observable difference between a t distribution and the standard normal z distribution is that a t distribution has somewhat thicker tails.

*** View Figure 8-4 (text p. 461) to observe a standard normal distribution and a Student’s t distribution.

Using Table 7 to Find Critical Values for Confidence Intervals

Table 7 of Appendix II gives various t values for different degrees of freedom d.f. We will use this table to find critical values tc for a c confidence level. That is, we want to find tc so that an area equal to c under the t distribution for a given number of degrees of freedom falls between -tc and tc. In the language of probability, we want to find tc so that:

P(-tc < t < tc) = c

This probability corresponds to the area shaded in Figure 8-5 on text p. 462.

*** View Example 3 on text p. 463 to see how to use Table 7 in Appendix II.
*** View Guided Exercise 5 on text p. 463.

Maximal Error of Estimate

In section 8.1 we found bounds ±E on the error of estimate for a c confidence level. Using the same approach, we have

E = tc (s / √n)

This is the maximal error of estimate for a c confidence level with small samples.
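When Table 7 is not at hand, a critical value tc can be approximated numerically from the condition P(-tc < t < tc) = c. The sketch below is one possible approach under stated assumptions, not the method the text uses: it integrates the standard t density with Simpson's rule and searches for tc by bisection. All function names and the step count are illustrative choices.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(t * t / df))


def central_area(tc, df, steps=2000):
    """Area under the t curve between -tc and tc.
    Simpson's rule on [0, tc], doubled because the curve is symmetric about 0."""
    h = tc / steps
    total = t_pdf(0.0, df) + t_pdf(tc, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


def critical_value(c, df):
    """Find tc with P(-tc < t < tc) = c by bisection."""
    lo, hi = 0.0, 1.0
    while central_area(hi, df) < c:  # widen the bracket until it contains tc
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_area(mid, df) < c:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def max_error(c, s, n):
    """Maximal error of estimate E = tc * s / sqrt(n), with d.f. = n - 1."""
    return critical_value(c, n - 1) * s / math.sqrt(n)
```

As a check against a printed table, `critical_value(0.95, 9)` should be close to 2.262 and `critical_value(0.90, 4)` close to 2.132. For large d.f. the result approaches the z value 1.96 at c = 0.95, consistent with the remark that the t and z distributions are almost the same beyond about 30 degrees of freedom.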
Confidence Interval for µ (Small Samples)

For small samples (n < 30) taken from a normal population where σ is unknown, a c confidence interval for the population mean µ is as follows:

x bar – E < µ < x bar + E

where
x bar = sample mean
E = tc (s / √n)
s = sample standard deviation
c = confidence level (0 < c < 1)
tc = critical value for confidence level c and degrees of freedom d.f. = n – 1, taken from the t distribution
n = sample size (n < 30)

In our applications of Student’s t distribution we have made the basic assumption that x has a normal distribution. However, the same methods apply even if x is only approximately normal. In fact, the main requirement for using the Student’s t distribution is that the distribution of x values be reasonably symmetrical and mound-shaped. If this is the case, then the methods we use with the t distribution can be considered valid for most practical applications.

*** View Guided Exercise 6 on text p. 465.

Complete text p. 466 – 470.
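The interval formula above can be sketched in a few lines. This is an illustration only: the function name and the sample data are hypothetical, and tc is assumed to have been looked up in a t table for the chosen c and d.f. = n - 1.

```python
import math
import statistics


def confidence_interval(sample, tc):
    """Return (x bar - E, x bar + E) with E = tc * s / sqrt(n).

    tc must be the critical value for the desired confidence level c
    and d.f. = len(sample) - 1, read from a t table."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation
    e = tc * s / math.sqrt(n)     # maximal error of estimate
    return x_bar - e, x_bar + e


# Hypothetical sample of n = 5 measurements, so d.f. = 4.
# For c = 0.90 and d.f. = 4, a t table gives tc = 2.132.
low, high = confidence_interval([1, 2, 3, 4, 5], tc=2.132)
```

Here x bar = 3 and s is about 1.581, so E is about 1.508 and the 90% confidence interval for µ runs from roughly 1.49 to 4.51.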