Slide 1
Hypothesis Testing Part II: Computations

Slide 2
This video is designed to accompany pages 95-116 of the workbook “Making Sense of Uncertainty: Activities for Teaching Statistical Reasoning,” a publication of the Van-Griner Publishing Company.

Slide 3
There is no end to the number of different hypotheses that can be tested using the inferential tools of statistical science. Some of these hypotheses are quite complex, as are the tools to test them. If you are going to be a practicing professional statistician, then you are going to need to know all of this. This video is not designed to provide that kind of professional training. We are more interested in the reasoning that is common to almost all of these applications, even though the mechanics of implementation may vary widely. Our goal in this video is to learn how to test one very simple hypothesis. And by “testing” we mean actually completing the mathematical steps required to produce an estimated false positive rate and then interpreting it. Please keep in mind that this is just one simple hypothesis, really just one form of one simple hypothesis. Still, a command of the mechanics of even this simple task is sure to help with our larger understanding of hypothesis testing.

Slide 4
When we discuss hypotheses informally we typically use words. For example, if we are testing the usefulness of the failed depression drug Flibanserin as a libido aid in women, we may be comparing a null of “Flibanserin is no better than a Placebo” versus an alternative of “Flibanserin is better than a Placebo.” However, if we actually want to do the mathematics required to test this hypothesis, we have to structure the null and alternative more formally. In the Flibanserin case, for instance, we may restate the null as shown, where the Greek letter “mu” refers to a true, unknown mean.
So the null may say that the true, unknown mean number of sexually satisfying events for women who use Flibanserin is the same as the true, unknown mean number of sexually satisfying events for women in the placebo group. And the alternative would then say something about how those means are not the same, perhaps that the mean response in the Flibanserin group is greater than the mean response in the placebo group. Keep in mind that the means being referred to are parameters. They describe much larger groups of subjects than those few that were recruited to participate in the experiment. But these overall averages are what we really want to know something about.

Slide 5
Essentially all of the examples in your workbook have pertained to a proportion, not a mean. There are lots of reasons for this, but the primary one is that we can describe the procedure for testing a hypothesis about a population proportion more easily than testing a hypothesis about a population mean. Even when we focus on a proportion we have choices to make about what to present and what not to present. There are several different types of hypotheses involving proportions, but remember, we are only interested in understanding the mechanics as a way of further cementing the logic behind, and practical worth of, general tests of hypotheses. Therefore, we are only going to test the hypothesis shown. The null will be a statement about some hypothesized value, p-zero, of a population proportion p, and the alternative will claim that p is really not p-zero but larger than p-zero. For some purely technical reasons, we could have a less-than-or-equal-to sign in the null instead of an equal sign, and the steps we are about to describe would be the same. Your instructor may or may not choose to elaborate on this point.

Slide 6
Before we go further, let’s pause to develop an example. You probably won’t be surprised to learn that stress affects the quality of your life.
However, you may be surprised to learn that stress affects college students’ sleep more than alcohol, caffeine, or late-night electronics use, at least according to a 2009 study. The study involved 1,125 college students, and for 68% of those in the sample, stress about school and life was their number one reason for not being able to sleep at night. This study is available online in the Journal of Adolescent Health.

Slide 7
The first thing we have to realize is that the 68% seen in the sleep study is a statistic. That is, 68% of the 1,125 students (765 of them) listed stress as their primary reason for not being able to sleep. The business of formal statistical inference is to make defensible statements about a larger population by using the results from the study. Conclusions about the sample alone are not that useful. So a statistical challenge in this context might be to address whether it is safe to say that more than 65% of all college students feel this way, based on the 68% we saw in this one study. We just made up the 65% for purposes of this illustration. In practice, this number emerges from the context. It may be the threshold beyond which public policy can be enacted, or beyond which corporate funds can be committed, or beyond which a candidate wins the majority of the popular vote. In this case, we just made the number up. So what hypothesis are we testing? If we want to know if it is safe to say that more than 65% of the students feel that stress is their primary reason for not being able to sleep, then that is our alternative hypothesis, our HA. Notice that p is the true proportion of all college students who feel that way. The null is just the opposite: that the true proportion of college students who feel that stress is their primary impediment to sleep is less than or equal to 0.65.
Remember that to “test” this hypothesis means we have to compute an FPR value associated with this dichotomous decision, and if the FPR value is less than 0.05 then we’ll go with HA. Otherwise we won’t.

Slide 8
Our goal is to understand how to get that FPR value. We can’t do exactly what we have done for other screening tests, but the process can be broken down into two easy steps. Before we talk about step 1, however, notice that the hypothesis we have just encountered is an instance of the general form shown here, where p is the true unknown population proportion that you are interested in and p-zero is the hypothesized value of p (0.65 in the motivating example). So what is step 1? In step 1 you compute what we will always call the “standard score.” See the formula shown in the gray box. We will always denote the standard score by the letter “z.” The numerator of z is just the sample proportion, phat, minus the hypothesized value of p, p-zero. To compute the denominator of z you have to compute the square root of p-zero times one minus p-zero, all divided by n, the size of the sample or the set of test subjects. This is a simple computation, and both phat and p-zero will be readily available from the data and the testing context.

Slide 9
Step 2 is even easier. After the standard score is computed you simply take it to the appropriate FPR table and come out with an estimated false positive rate. The table shown here is also available in your workbook. Notice the example at the top. If you had computed z to be 1.73 then you would look up 1.73 by finding the 1.7 along the left-most column and the .03 along the top row. Find where the two intersect (0.04182, bolded in the table for illustration) and you will have the estimated FPR corresponding to that standard score.

Slide 10
Let’s go back to our concrete example about student sleep issues and stress. We had agreed to take p-zero as 0.65 for illustration, though we admitted that it was just made up.
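Written symbolically, with phat as \hat{p} and p-zero as p_0, the standard score described for step 1 is the following. This is a transcription of the spoken formula, since the gray box itself is not reproduced in this transcript.

```latex
z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0\,(1 - p_0)}{n}}}
```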
To compute the standard score you simply take 0.68 (phat) and subtract off 0.65 (p-zero), and then divide that difference by the square root of 0.65 times one minus 0.65, all divided by n, which is 1125 in this case. Make sure you can do this computation and get what you see on the screen. Students sometimes struggle with their calculators at this point. For our example, we have a z of 2.11. Now we need to go to step 2 and get the estimated FPR.

Slide 11
Take 2.11 to the FPR table and you will find (see the red underline) an estimated FPR of 0.01743. Make sure you understand how to get to that number in your FPR chart.

Slide 12
Understanding what this means is essential. Remember, our choice was between a null that said the true value of p was less than or equal to 0.65 (“negative” outcome) and an alternative that claimed p was bigger than 0.65 (“positive” outcome). The Awkward Rule, recall, says that you should reject the null in favor of the alternative, no matter what the data say. The job of the FPR (false positive rate) is to decide whether that is a good idea or not. In this case, the chance of rejecting a true H0 (FALSE positive) is around 0.017, or about 1.7 chances in 100. Since 1.7 chances in 100 is less risky than 5 chances in 100, it is typically assumed to be a pretty safe bet to say that more than 65% of all college students lose sleep because of stress. Recall that when this happens, when the estimated false positive rate (p-value) is smaller than 0.05, we can say the results are “statistically significant.” Of course there is some risk involved in this decision, notably that HA is really not true. But the FPR allows us to get a numerical handle on that risk and make a rational decision. In this example that risk is no greater than 1.7 chances in 100, pretty small as risks go.
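The two steps for the sleep example can also be checked numerically. What follows is a minimal Python sketch, not part of the workbook: the function names are made up for illustration, and it assumes the FPR table lists areas under the standard normal curve to the right of z, which is consistent with the table example of 1.73 giving 0.04182.

```python
import math

def standard_score(phat, p0, n):
    """Step 1: z = (phat - p0) / sqrt(p0 * (1 - p0) / n)."""
    return (phat - p0) / math.sqrt(p0 * (1 - p0) / n)

def estimated_fpr(z):
    """Step 2 stand-in for the table lookup: the area under the
    standard normal curve to the right of z (assumed table meaning)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Sleep example: phat = 0.68, p-zero = 0.65, n = 1125.
# z is rounded to two decimals, as it would be for a table lookup.
z = round(standard_score(0.68, 0.65, 1125), 2)
fpr = estimated_fpr(z)

print(f"z = {z:.2f}, estimated FPR = {fpr:.5f}")
print("go with HA" if fpr < 0.05 else "stay with H0")
```

Run as written, this should reproduce the z of 2.11 and an estimated FPR close to the 0.017 discussed on Slide 12, which is below the 0.05 cutoff.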
Slide 13
It is very fair to ask “what would happen to a test like this if the sample proportion, phat, were smaller than the hypothesized value of p, p-zero?” Look back at the formula for z. It is clear that if phat is smaller than p-zero then the value of z has to be negative. But there are no negative entries in the FPR table, so what would you do? The answer is that you’d look up the positive value of z and then subtract the table entry from 1. Since table entries never exceed 0.50, anytime phat is smaller than p-zero the estimated false positive rate has to be bigger than 50 chances in 100. That’s huge. So if phat is smaller than p-zero then there is no way that HA could be accepted, no matter what phat, p-zero or n happen to be.

Slide 14
This concludes our video on the basic computations associated with one simple test of hypothesis. Remember, testing a simple hypothesis about a proportion is a two-step process involving the computation of a standard score, which is then taken to a table to identify the false positive rate associated with rejecting the null hypothesis.
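The point from Slide 13 about a sample proportion below p-zero can be illustrated with a short Python sketch. This is an illustration under assumptions, not workbook material: the sample proportions are hypothetical, the function name is made up, and the area to the right of z under the standard normal curve is computed directly instead of being read from the FPR table.

```python
import math

def upper_tail_fpr(phat, p0, n):
    """Return the standard score and the standard normal area to its right."""
    z = (phat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z, 0.5 * math.erfc(z / math.sqrt(2))

p0, n = 0.65, 1125  # p-zero and sample size from the sleep example

# Hypothetical sample proportions, all smaller than p-zero.
results = [upper_tail_fpr(phat, p0, n) for phat in (0.64, 0.60, 0.50)]

for z, fpr in results:
    print(f"z = {z:.2f}, estimated FPR = {fpr:.4f}, above 0.50: {fpr > 0.5}")
```

Each of these negative standard scores gives an estimated FPR above 0.50, so none of them could lead to accepting HA at the 0.05 cutoff.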