SINGLE MEAN HYPOTHESIS TESTING [1]

The major characteristic of this type of hypothesis testing is that you are testing whether a population mean (μ) is equal to some constant “a” (like 3 or 6 or any number; you could even test that it’s equal to e or π!), using the mean of a sample as your evidence. You also have to distinguish between the times that you know the population standard deviation (σ) and the times that you don’t, which require you to use the sample standard deviation (s). If you know σ, you can use the Z-distribution. Otherwise you’re required to use the t-distribution, which is a substitute for the Z.

Okay, so let’s get down to it. Since my wife and I have two dachshunds that we breed from time to time, I’ll use that as my backdrop.

Suppose I want to conduct a hypothesis test (α=0.05) that the average litter of a dachshund is 5 puppies based on a sample of 23 that produced a sample mean of 3.4 and I know that the population standard deviation is 1.2 puppies.

What I’m going to do now is reprint that last paragraph and highlight the important points to note.

“Suppose I want to conduct a hypothesis test (α=0.05) that the average litter of a dachshund is 5 puppies based on a sample of 23 that produced a sample mean of 3.4 and I know that the population standard deviation is 1.2 puppies.”

“Population standard deviation”: The fact that we know the population value means we will use the Z-distribution. Had we only known the sample version, this would indicate that we are using the t-table.

“α=0.05”: This is going to tell you what critical value to use. Since we are using the Z-table, this indicates the following [2].

[Figure: normal curve with the red rejection regions in the two tails and the blue and green areas between them.]

The red area is the “rejection region”, which equals the α that we are using. Since this is a two-sided test, each tail must equal 0.025. This leaves 0.95 in the blue and green areas combined. Our Z-table will give us the value of the green area. This is half of the 0.95, which is 0.475. This 0.475 is the area that we should be looking for in the “guts” of the table.
When we locate 0.475, we can see the corresponding Z-value is 1.96. That is what we’ll use for the critical value in a moment.

[1] This is written for someone who has at least a “skeleton” knowledge of testing.
[2] See Appendix A for more practice on finding a Z-based critical value.

The remaining numbers are used to calculate our Z-statistic. I’ll get to those in a bit when I complete the test.

Since we are doing a hypothesis test, we are required to do all six steps in the procedure. These are:

1. Specify the null and alternative hypotheses
2. Determine the critical value
3. State the decision rule
4. Calculate the test statistic
5. Make decision
6. State conclusion

Here we go.

1. Specify the null and alternative hypotheses

Since we are testing whether or not the average litter is 5 puppies, our null hypothesis is exactly that. The alternative is the complement, so it’s the “opposite” of the null.

H0: μ = 5
HA: μ ≠ 5

2. Determine the critical value

We did this earlier, so it’s now a matter of stating it formally.

Zcrit = ±1.96

3. State the decision rule

All we’re doing here is putting into words the picture from earlier.

If Zstat > 1.96 or Zstat < -1.96, reject H0. Else, fail to reject H0.

4. Calculate the test statistic

This is the equation we learned in class.

$$Z_{stat} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

All we need to do here is plug in the values we know. x̄ is our sample average, or 3.4; μ is 5, as we pull it directly from the null hypothesis; σ is our known population standard deviation, or 1.2; and n is simply our sample size of 23.

$$Z_{stat} = \frac{3.4 - 5}{1.2 / \sqrt{23}} = -6.39$$

5. Make decision

Now we can use our decision rule. Since -6.39 is well below our critical value of -1.96, we are going to reject H0.

Reject H0

6. State conclusion

We try to avoid using statistical terms at this stage; we want to say this so that anyone can understand our result.

It would appear that the average litter of dachshund puppies is not 5.

Try doing this one on your own. The answer follows the problem.
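Before moving on, here is a minimal sketch of the dachshund Z test in Python, in case you want to check the arithmetic by machine. It assumes Python 3.8 or later for `statistics.NormalDist`; the function name `z_test_two_sided` and its argument names are my own and are not part of the class material. Instead of reading the Z-table, it asks the standard normal distribution for the value that leaves α/2 in the upper tail.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_sided(x_bar, mu_0, sigma, n, alpha):
    """Two-sided single-mean Z test (sigma known).

    Hypothetical helper for illustration; returns (z_crit, z_stat, reject).
    """
    # Step 2: critical value that leaves alpha/2 in each tail.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 4: test statistic, (x_bar - mu) / (sigma / sqrt(n)).
    z_stat = (x_bar - mu_0) / (sigma / sqrt(n))
    # Steps 3 and 5: reject H0 if the statistic falls in either tail.
    reject = abs(z_stat) > z_crit
    return z_crit, z_stat, reject


# Dachshund example: n = 23, sample mean 3.4, sigma = 1.2, H0: mu = 5.
z_crit, z_stat, reject = z_test_two_sided(3.4, 5, 1.2, 23, 0.05)
print(f"Zcrit = ±{z_crit:.2f}, Zstat = {z_stat:.2f}, reject H0: {reject}")
# prints: Zcrit = ±1.96, Zstat = -6.39, reject H0: True
```

The same function can be reused to check your answer to the practice problem by swapping in its numbers.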
Based on a sample of 34 that produces a sample mean of 7.4, conduct a hypothesis test (α=0.01) that the population average is 7, assuming that σ = 2.1.

H0: μ = 7
HA: μ ≠ 7

Zcrit = ±2.58

If Zstat > 2.58 or Zstat < -2.58, then reject H0. Else, fail to reject H0.

$$Z_{stat} = \frac{7.4 - 7}{2.1 / \sqrt{34}} = 1.11$$

Fail to reject H0.

It appears that the average is not different than 7.

How did you do?

Now let’s try this one:

“Suppose I want to conduct a hypothesis test (α=0.05) that the average litter of a dachshund is 5 puppies based on a sample of 23 that produced a sample mean of 3.4 and a sample standard deviation of 1.2.”

Look familiar? It’s the same test I walked you through earlier with one twist. See it? Notice that there is no mention of the population standard deviation. It does indicate that we know the sample standard deviation; however, that fact alone changes our method. We can only use the Z-distribution when we know σ. We can use “s” as a substitute; however, it requires that we now use the t-distribution in sample sizes less than 500. Other than that, you’ll see that the test is effectively the same since I simply reproduced the same numbers.

Here we go.

1. Specify the null and alternative hypotheses

H0: μ = 5
HA: μ ≠ 5

2. Determine the critical value

When using the t-table, you have two steps. First, note the three rows at the top. The one you want right now is “Two Tails” since this is a two-sided test. The values listed in that row are alphas, so our column here will be “0.05”, as that is our alpha level. Next, you’ll need to know the degrees of freedom so you can locate the proper row. The degrees of freedom [3] for this test are n - 1. Our test has n - 1 = 22, so the table gives us:

tcrit = ±2.0739

[3] See Appendix B for a brief discussion of degrees of freedom.

3. State the decision rule

This needs to reflect the fact that we’re now using a t-distribution, but it essentially is the same idea.

If tstat > 2.0739 or tstat < -2.0739, reject H0. Else, fail to reject H0.

4. Calculate the test statistic

This is the equation we learned in class.

$$t_{stat} = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

All we need to do here is plug in the values we know. x̄ is our sample average, or 3.4; μ is 5, as we pull it directly from the null hypothesis; s is our sample standard deviation, or 1.2; and n is simply our sample size of 23.

$$t_{stat} = \frac{3.4 - 5}{1.2 / \sqrt{23}} = -6.39$$

5. Make decision

Since -6.39 is well below our critical value of -2.0739, we reject H0.

Reject H0

6. State conclusion

It would appear that the average litter of dachshund puppies is not 5.

Another one for you to try…

Based on a sample of 22 that produces a sample mean of 1.2 and a sample standard deviation of 3.1, conduct a hypothesis test (α=0.02) that the population average is 2.

H0: μ = 2
HA: μ ≠ 2

tcrit (df = 22 - 1 = 21) = ±2.5176

If tstat > 2.5176 or tstat < -2.5176, then reject H0. Else, fail to reject H0.

$$t_{stat} = \frac{1.2 - 2}{3.1 / \sqrt{22}} = -1.21$$

Fail to reject H0.

It appears that the average is not different than 2.

FINAL NOTE: Please pay careful attention to the information you are given. The key feature is whether or not you know the population standard deviation (σ). This will determine which distribution you are using. In fact, this will be important for some time to come.

APPENDIX A

Finding Zcrit

Try finding the following critical values for two-sided tests based on the alpha levels.

1. α=0.20
2. α=0.02
3. α=0.15
4. α=0.50

1. α=0.20
The sum of the tails is 0.20, so each tail is 0.10. This means the table value we want to find in the guts is 0.40. 0.3997 is as close as you can get, so the corresponding Z-value is 1.28.

2. α=0.02
The sum of the tails is 0.02, so each tail is 0.01. This means the table value we want to find in the guts is 0.4900. 0.4901 is as close as you can get, so the corresponding Z-value is 2.33.

3. α=0.15
The sum of the tails is 0.15, so each tail is 0.075. This means the table value we want to find in the guts is 0.425. 0.4251 is as close as you can get, so the corresponding Z-value is 1.44.

4. α=0.50
The sum of the tails is 0.50, so each tail is 0.25.
This means the table value we want to find in the guts is 0.25. 0.2486 is as close as you can get, so the corresponding Z-value is 0.67.

I suggest that you become familiar with the following alpha levels and their corresponding two-tail Z-values. These are very commonly used in general and extremely commonly used by me!

α=0.10, Zcrit = 1.645
α=0.05, Zcrit = 1.96
α=0.01, Zcrit = 2.58

APPENDIX B

Degrees of Freedom

The basic idea behind degrees of freedom is that it measures the number of values that are allowed to vary. In the beginning this is exactly equal to the sample size. However, when you estimate pieces of the equation that are not what you are testing, you lose one degree of freedom per estimated piece.

Examine once again the Zstat equation.

$$Z_{stat} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

Notice that nothing in here is estimated with the exception of x̄. This doesn’t count because that’s what we’re testing. As we know both σ and n, we don’t need to use any of our degrees of freedom to estimate them; thus degrees of freedom are irrelevant in this statistic.

Now, look at the tstat.

$$t_{stat} = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

We have to estimate σ with “s” in this equation. This costs us a degree of freedom. Since it is all we are estimating, we only lose that one, so our degrees of freedom are n - 1.

There’s a nice way to see this. Degrees of freedom are almost always located in the equation. It’s not totally obvious here, but if you recall the equation for s, you’ll see it.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

See the denominator? There it is.
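If you would like to see that n - 1 denominator at work, here is a small Python sketch. The litter sizes are made-up numbers for illustration only (they are not data from my dogs), and `sample_sd` is a hypothetical helper name. It computes s by hand from the formula and compares it with `statistics.stdev` from Python's standard library, which also divides by n - 1.

```python
from math import sqrt
from statistics import stdev


def sample_sd(values):
    """Sample standard deviation with the n - 1 denominator (illustrative helper)."""
    n = len(values)
    x_bar = sum(values) / n
    # Sum of squared deviations from the sample mean.
    ss = sum((x - x_bar) ** 2 for x in values)
    # Dividing by n - 1 is where the lost degree of freedom shows up.
    return sqrt(ss / (n - 1))


# Hypothetical litter sizes, invented for this example.
litters = [3, 4, 2, 5, 3, 4]

print(f"df = {len(litters) - 1}")
print(f"s by hand     = {sample_sd(litters):.4f}")
print(f"s from stdev  = {stdev(litters):.4f}")
```

Both lines should report the same value of s, since the two calculations use the same denominator.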