Math 3680 Lecture #15
Hypothesis Testing: The t Test

• In all of the previous examples, we assumed that we knew the population standard deviation σ.
• In practice, this is an extremely rare situation!
• Instead, the sample standard deviation S is used in lieu of σ.
• This approximation is called the bootstrap estimate.

Example. The manufacturer of a new fiberglass tire claims that its average life will be at least 40,000 miles. To verify this claim, a sample of 12 tires is tested, and the lifetimes (in miles) were found to be

36,100 42,000 36,800 40,200 35,800 37,200
33,800 37,000 33,000 38,500 41,000 36,000

Test the manufacturer's claim using α = 0.05.

Solution.
H0: The average life is at least 40,000 miles.
Ha: The average life is less than 40,000 miles.
We choose α = 0.05. Notice that we have to compute the sample SD this time; it's not given.

Mean = 37283.33, SD = 2731.91

We now can compute the test statistic

$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{37283.33 - 40000}{2731.91/\sqrt{12}} \approx -3.4448.$$

But there's a catch: since we used S instead of σ, the test statistic t does NOT follow the normal curve. Instead, there's a theorem which says that the test statistic t follows the Student t distribution with n - 1 degrees of freedom. (There's a slight catch with this theorem that we'll discuss later.)

For the current problem, instead of using the normal curve to compute the observed significance level, we will use the Student t distribution with 11 degrees of freedom.

For the sake of completeness, here's the pdf of the Student's t-distribution with r degrees of freedom:

$$f(t) = \frac{\Gamma\left(\frac{r+1}{2}\right)}{\sqrt{\pi r}\,\Gamma\left(\frac{r}{2}\right)} \left(1 + \frac{t^2}{r}\right)^{-(r+1)/2}$$

If you find this intimidating, don't worry: we will never use it.
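The arithmetic in the tire example can be checked with a short script. The sketch below is only an illustration, not the method used in the lecture: it recomputes the mean, sample SD, and test statistic, and then approximates the left-tail probability for 11 degrees of freedom by integrating the pdf above with Simpson's rule. The helper names `t_pdf` and `t_lower_tail` are made up for this example, and numerical integration is swapped in here for the table or spreadsheet lookup one would normally use.

```python
import math
import statistics

# Tire lifetimes (miles) from the example; the claimed mean under H0 is 40,000.
lifetimes = [36100, 42000, 36800, 40200, 35800, 37200,
             33800, 37000, 33000, 38500, 41000, 36000]
mu = 40000

n = len(lifetimes)
mean = statistics.mean(lifetimes)
sd = statistics.stdev(lifetimes)          # sample SD (divides by n - 1)
t = (mean - mu) / (sd / math.sqrt(n))     # test statistic


def t_pdf(x, r):
    """Density of the Student t distribution with r degrees of freedom."""
    c = math.gamma((r + 1) / 2) / (math.sqrt(math.pi * r) * math.gamma(r / 2))
    return c * (1 + x * x / r) ** (-(r + 1) / 2)


def t_lower_tail(x, r, steps=20_000):
    """Approximate P(T <= x) for x <= 0.

    By symmetry P(T <= 0) = 0.5, so we subtract the area over [x, 0],
    computed with Simpson's rule (steps must be even).
    """
    h = (0.0 - x) / steps
    total = t_pdf(x, r) + t_pdf(0.0, r)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(x + i * h, r)
    return 0.5 - total * h / 3


p_value = t_lower_tail(t, n - 1)

print(f"mean = {mean:.2f}, SD = {sd:.2f}")
print(f"t = {t:.4f}")
print(f"left-tail probability with {n - 1} df = {p_value:.6f}")
```

The printed mean, SD, and t should agree with 37283.33, 2731.91, and -3.4448 from the example, and the left-tail probability should come out near 0.0027.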
[Figures: Student t distribution with 1, 2, 3, 4, 5, 10, 20, 30, 40, 50, and 100 degrees of freedom (red), each overlaid on the standard normal curve (blue).]

For the t distribution with 11 df, we can compute the critical value for α = 0.05 using Table 4 (p. 510).

[Figure: t distribution with 11 df. The shaded 5% left tail lies below the critical value -1.79588; the observed value t = -3.4448 falls inside it.]

In Excel, be careful: the command is TINV(0.1,11). (The default is for two tails, not one tail, so we enter 0.1 to put 5% in each tail. The command returns 1.79588; the left-tail critical value is its negative.)

Observed Significance Level

[Figure: t distribution with 11 df. The area to the left of t = -3.4448 is about 0.0027.]

In Excel, the command is TDIST(3.4448,11,1). (The third entry specifies the number of tails.)

Conclusion: We reject the null hypothesis.
There is good reason to believe that the average lifespan of the tires is less than 40,000 miles.

Note: It is possible to compute power with the Student's t-distribution, but the computations are much, much more complicated than the normal case (Larsen & Marx, 3rd ed., p. 447). Many statistical software packages are able to compute power for the t-test automatically.

Excel: Use the command =TTEST(A1:A12, B1:B12, 1, 1)
• This is silly, I know, but you have to list the claimed average once for each entry in the list.
• The first 1 stands for a one-tailed test; the second 1 (which selects a paired test) is required.

A       B
36100   40000
40200   40000
33800   40000
38500   40000
42000   40000
35800   40000
37000   40000
41000   40000
36800   40000
37200   40000
33000   40000
36000   40000

Result: 0.00273917

Remember: If the sample is small (n < 30) and the population standard deviation σ is unknown, then we use the t-test and not the z-test. On the other hand, if either σ is known or the sample is sufficiently large (n ≥ 30), then we may safely use the z-test instead.

Also, we must be careful about stating the null and alternative hypotheses so that we correctly choose whether to use a left-tail, a right-tail, or both tails.

Example. Before a substance can be deemed safe for landfilling, its chemical properties must be assessed. In a sample of six replicates of sludge from a New Hampshire wastewater treatment plant, the mean pH was 6.68 with a standard deviation of 0.20. Can we conclude that the mean pH is less than 7.0?

J. Benoit, T. Eighmy and B. Crannell, Journal of Geotechnical and Geoenvironmental Engineering 1999, pp. 877--888.

Example. Certain rectangles appear more pleasing to the eye than others. The ancient Greeks called a rectangle with

$$\text{width} = \frac{\sqrt{5} - 1}{2}\,(\text{length})$$

the golden rectangle, and this ratio was called the golden ratio. The golden ratio has been claimed to be a deliberate design element in various works of art and architecture.
The data below show the width-to-length ratios of beaded rectangles used by the Shoshone Native Americans to decorate their leather goods. Does it appear that the golden rectangle is also an aesthetic standard for the Shoshones?

0.693 0.749 0.654 0.670 0.662
0.672 0.615 0.606 0.690 0.628
0.668 0.611 0.606 0.609 0.601
0.553 0.570 0.844 0.576 0.933

C. Dubois, ed., Lowie's Selected Papers in Anthropology (UC Press, Berkeley, 1960), pp. 137--142

Robustness of the t Test

The t statistic is defined by

$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

If X1, X2, …, Xn follow a normal distribution, then there's a theorem that says that this t statistic follows the Student t-distribution with n - 1 degrees of freedom.

But, in real life, this assumption is almost certainly not true. Models are idealized; real data are, well, real. Now what?

The good news is that the underlying pdf doesn't have to be very close to normal in order for the test statistic to be close to the Student t-distribution.

The following graphs are empirical histograms of the t statistic computed from 10,000 data sets drawn from a "triangular" distribution with pdf

$$f(x) = \begin{cases} x, & 0 \le x \le 1, \\ 2 - x, & 1 \le x \le 2, \end{cases}$$

which is not too far off from bell shaped. Even for very small samples, the t distribution is accurate.

[Figures: the triangular pdf on [0, 2], followed by the empirical histogram of the t statistic from a triangular distribution with a sample of size 4.]

The following graphs are empirical histograms of the t statistic computed from 10,000 data sets drawn from a Uniform(0,1) distribution, which is symmetric but decisively not bell-shaped. Notice that convergence does not occur as quickly.
[Figures: the Uniform(0,1) pdf, followed by empirical histograms of the t statistic from a Uniform(0,1) distribution with samples of size 2, 3, 4, 5, 6, 7, 8, 9, and 10.]

The following graphs are empirical histograms of the t statistic computed from 10,000 data sets drawn from an Exponential(1) distribution, which is neither symmetric nor bell-shaped. This time, the sample has to be of size 40 or so for the t distribution to be accurate… that's mostly due to the Central Limit Theorem.
[Figures: the Exponential(1) pdf, followed by empirical histograms of the t statistic from an Exponential(1) distribution with samples of size 5, 10, 15, 20, 30, 40, 50, and 100.]

Observations

• The distribution of the t statistic is relatively unaffected by the pdf of the Xi, as long as
  – The pdf is not too skewed, and
  – The sample size is not too small.
• As the sample size n increases, the distribution of the t statistic gets closer to the Student t-distribution with n - 1 degrees of freedom.

We succinctly describe this by saying that the t test is robust, meaning that it is not heavily dependent on the underlying assumption of normality.

The practical importance of this robustness is that the t test can be used in real-life situations.

Practical Implications

• If n < 15, the data should be nearly normal. Make a histogram. If there are outliers or strong skewness, do not use the t-test.
• If 15 ≤ n ≤ 40, make a histogram to check that the data are unimodal, free of outliers, and reasonably symmetric.
• If n > 40, the t-test is safe even if the data are skewed.
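The robustness claims can be explored with a small simulation. The sketch below is an illustration under stated assumptions, not a reproduction of the lecture's histograms: it draws 10,000 samples of size n = 12 from a normal, a Uniform(0,1), and an Exponential(1) population, computes the t statistic against the true mean each time, and reports how often it falls below -1.79588, the 5% left-tail critical value for 11 df from the tire example. If the Student t distribution describes the t statistic well, that fraction should be close to 0.05. The function names, the fixed seed, and the choice of n = 12 are all assumptions made for the example.

```python
import math
import random


def t_statistic(sample, mu):
    """One-sample t statistic (mean - mu) / (S / sqrt(n))."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
    return (mean - mu) / math.sqrt(var / n)


def lower_tail_rate(draw, mu, n=12, crit=-1.79588, reps=10_000, seed=0):
    """Fraction of simulated t statistics that fall below crit.

    draw(rng) returns one observation; mu is the true population mean,
    so the null hypothesis is true in every simulated data set.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        if t_statistic(sample, mu) < crit:
            hits += 1
    return hits / reps


# Each entry: (function drawing one observation, true mean of the population)
populations = {
    "normal": (lambda rng: rng.gauss(0.0, 1.0), 0.0),
    "uniform": (lambda rng: rng.random(), 0.5),
    "exponential": (lambda rng: rng.expovariate(1.0), 1.0),
}

rates = {name: lower_tail_rate(draw, mu)
         for name, (draw, mu) in populations.items()}

for name, rate in rates.items():
    print(f"{name:12s} left-tail rejection rate = {rate:.4f} (nominal 0.05)")
```

For the skewed exponential population, the left-tail rate should come out noticeably above the nominal 5%, which is consistent with the advice not to rely on the t-test for small, strongly skewed samples. Changing `n` requires changing `crit` to the matching critical value for n - 1 degrees of freedom.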