Transcript - UK College of Arts & Sciences

Slide 1
Hypothesis Testing Part II – Computations
Slide 2
This video is designed to accompany pages 95-116 of the workbook “Making Sense of Uncertainty:
Activities for Teaching Statistical Reasoning,” a publication of the Van-Griner Publishing Company.
Slide 3
There is no end to the number of different hypotheses that can be tested using the inferential tools of
statistical science. Some of these hypotheses are quite complex, as are the tools to test them. If you
are going to be a practicing professional statistician then you are going to need to know all of this.
This video is not designed to provide that kind of professional training. We are more interested in the
reasoning that is common to almost all of these applications, even though the mechanics of
implementation may vary widely.
Our goal in this video is to learn how to test one very simple hypothesis. And by “testing” we mean
actually completing the mathematical steps required to produce an estimated false positive rate and
interpreting it. Please keep in mind that this is just one simple hypothesis, really just one form of one simple
hypothesis. Still, a command of the mechanics of even this simple task is sure to help with our larger
understanding of hypothesis testing.
Slide 4
When we discuss hypotheses informally we typically use words. For example, if we are testing the
usefulness of the failed depression drug Flibanserin as a libido aid in women, we may be comparing a
null of “Flibanserin is no better than a Placebo” versus an alternative of “Flibanserin is better than a
Placebo.”
However, if we actually want to do the mathematics required to test this hypothesis, we have to
structure the null and alternative more formally. In the Flibanserin case, for instance, we may restate
the null as shown, where the Greek letter “mu” refers to a true, unknown mean. So the null may say
that the true unknown mean number of sexually satisfying events for women who use Flibanserin is the same
as the true unknown mean number of sexually satisfying events for women in the placebo group. And the
alternative would then say something about how those means are not the same, perhaps that the mean
response in the Flibanserin group is greater than the mean response in the placebo group.
Keep in mind that the means being referred to are parameters. They describe much larger groups of
subjects than those few that were recruited to participate in the experiment. But these overall averages
are what we really want to know something about.
Slide 5
Essentially all of the examples in your workbook have pertained to a proportion, not a mean. There are
lots of reasons for this, but the primary one is that we can describe the procedure for testing a
hypothesis about a population proportion more easily than testing a hypothesis about a population
mean.
Even when we focus on a proportion we have choices to make about what to present and what not to
present. There are several different types of hypotheses involving proportions, but remember, we are
only interested in understanding the mechanics as a way of further cementing the logic behind and
practical worth of general tests of hypotheses.
Therefore, we are only going to test the hypothesis shown. The null will be a statement about some
hypothesized value, p-zero, of a population proportion p, and the alternative will claim that p is really
not p-zero but larger than p-zero.
For some purely technical reasons, we could have a less than or equal to sign in the null instead of an
equal sign, and the steps we are about to describe would be the same. Your instructor may or may not
choose to elaborate on this point.
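Written out in symbols, and assuming the slide uses the usual notation (p for the true population proportion, p_0 for p-zero), the pair of hypotheses described above would read roughly as follows:

```latex
H_0 : p = p_0 \qquad \text{versus} \qquad H_A : p > p_0
```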
Slide 6
Before we go further, let’s pause to develop an example.
You probably won’t be surprised to learn that stress affects the quality of your life. However, you may
be surprised to learn that, at least according to a 2009 study, stress affects college students’ sleep more
than alcohol, caffeine, or late-night electronics use.
The study involved 1,125 college students and for 68% of those in the sample, stress about school and
life was their number one reason for not being able to sleep at night.
This study is available online in the Journal of Adolescent Health.
Slide 7
The first thing we have to realize is that the 68% seen in the sleep study is a statistic. That is, 68% of the
1,125 students (765 of them) listed stress as their primary reason for not being able to sleep.
The business of formal statistical inference is to make defensible statements about a larger population
by using the results from the study. Conclusions about the sample alone are not that useful.
So a statistical challenge in this context might be to address whether it is safe to say that more than 65%
of all college students feel this way, based on the 68% we saw in this one study. We just made up the
65% for purposes of this illustration. In practice, this number emerges from the context. It may be the
threshold beyond which public policy can be enacted, or beyond which corporate funds can be
committed, or beyond which a candidate wins the majority of the popular vote. In this case, we just
made the number up.
So what hypothesis are we testing? If we want to know if it is safe to say that more than 65% of the
students feel that stress is their primary reason for not being able to sleep, then that is our alternative
hypothesis, our HA. Notice, p is the true proportion of all college students who feel that way. The null
is just the opposite: that the true proportion of college students who feel like stress is their primary
impediment to sleep is less than or equal to 0.65.
Remember that to “test” this hypothesis means we have to compute an FPR value associated with this
dichotomous decision and if the FPR value is less than 0.05 then we’ll go with HA. Otherwise we won’t.
Slide 8
Our goal is to understand how to get that FPR value. We can’t do exactly what we have done for other
screening tests but the process can be broken down into two easy steps.
Before we talk about step 1, however, notice that the form of the hypothesis we have just encountered
is a general form shown here, where p is the true unknown population proportion that you are
interested in and p-zero is the hypothesized value of p – 0.65 in the motivating example.
So what is step 1? In step 1 you compute what we will always call the “standard score.” See the
formula shown in the gray box. We will always denote the standard score by the letter “z.”
The numerator of z is just the sample proportion, phat, minus the hypothesized value of p, p-zero.
To compute the denominator of z you take p-zero times one minus p-zero, divide that product by n
(the size of the sample or the set of test subjects), and then take the square root of the result.
This is a simple computation; and both phat and p-zero will be readily available from the data and the
testing context.
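For anyone who would rather check step 1 on a computer than on a calculator, here is a minimal sketch of the standard score in Python. The function name standard_score and the numbers in the usage line are made up for illustration; they do not come from the workbook.

```python
import math

def standard_score(p_hat, p_zero, n):
    """Standard score z: (phat - p-zero) over sqrt(p-zero * (1 - p-zero) / n)."""
    numerator = p_hat - p_zero
    denominator = math.sqrt(p_zero * (1 - p_zero) / n)
    return numerator / denominator

# Hypothetical numbers, chosen only to illustrate the arithmetic:
# a sample proportion of 0.56, a hypothesized value of 0.50, and 400 subjects.
z_example = standard_score(0.56, 0.50, 400)
print(round(z_example, 2))
```

Note that the square root covers the whole fraction, not just the top of it, which is the step most likely to go wrong on a calculator.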
Slide 9
Step 2 is even easier. After the standard score is computed you simply take it to the appropriate FPR
table and come out with an estimated false positive rate. The table shown here is also available in your
workbook. Notice the example at the top. If you had computed z to be 1.73 then you would look up
1.73 by finding the 1.7 along the left-most column, and the .03 along the top row. Find where the two
intersect – 0.04182, bolded in the table for illustration – and you will have the estimated FPR
corresponding to that standard score.
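If the table is not handy, the lookup can be approximated in code. The sketch below assumes the FPR table is an ordinary upper-tail table for the standard normal curve, which is consistent with the 1.73 example above but is not stated outright in the video. The function name estimated_fpr is ours.

```python
import math

def estimated_fpr(z):
    """Area under the standard normal curve to the right of z (assumed table rule)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Check against the table example: z = 1.73 should land close to 0.04182.
print(round(estimated_fpr(1.73), 5))
```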
Slide 10
Let’s go back to our concrete example about student sleep issues and stress. We had agreed to take p-zero as 0.65 for illustration, though we admitted that it was just made up. To compute the standard
score you simply take 0.68 (phat) and subtract off 0.65 (p-zero); and then divide that difference by the
square root of 0.65 times one minus 0.65, all divided by n, which is 1125 in this case.
Make sure you can do this computation and get what you see on the screen. Students sometimes
struggle with their calculators at this point.
For our example, we have a z of 2.11. Now we need to go to step 2 and get the estimated FPR.
Slide 11
Take 2.11 to the FPR table and you will find – see the red underline – an estimated FPR of 0.01743. Make
sure you understand how to get to that number in your FPR chart.
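As a check on both steps for the sleep example, the short sketch below redoes the arithmetic in Python. It again assumes the FPR table is a standard normal upper-tail table, and the variable names are ours. It should reproduce the z of 2.11 and an estimated FPR of roughly 0.017.

```python
import math

# Values from the sleep study example
p_hat = 0.68    # sample proportion
p_zero = 0.65   # hypothesized value (made up for illustration in the video)
n = 1125        # number of students in the sample

# Step 1: standard score, rounded to two decimals as you would for the table
z = (p_hat - p_zero) / math.sqrt(p_zero * (1 - p_zero) / n)
z_rounded = round(z, 2)

# Step 2: estimated FPR as the upper-tail area beyond z (assumed table rule)
fpr = 0.5 * math.erfc(z_rounded / math.sqrt(2))

print(z_rounded)
print(round(fpr, 5))
print(fpr < 0.05)   # True means the estimated FPR is below the 0.05 cutoff
```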
Slide 12
Understanding what this means is essential. Remember, our choice was between a null that said the
true value of p was less than or equal to 0.65 (“negative” outcome) and an alternative that claimed p
was bigger than 0.65 (“positive” outcome).
The Awkward Rule, recall, says that you should reject the null in favor of the alternative, no matter what
the data say. The job of the FPR – false positive rate – is to decide whether that is a good idea or not.
In this case, the chance of rejecting a true H0 (FALSE positive) is around 0.017, or about 1.7 chances in
100. Since 1.7 chances in 100 is less risky than 5 chances in 100, it is typically assumed to be a pretty
safe bet to say that more than 65% of all college students lose sleep because of stress.
Recall, when this happens, when the estimated false positive rate (p-value) is smaller than 0.05, we can
say the results are “statistically significant.”
Of course there is some risk involved in this decision; notably that HA is really not true. But the FPR
allows us to get a numerical handle on that risk and make a rational decision. In this example that risk is
no greater than 1.7 chances in 100, pretty small as risks go.
Slide 13
It is very fair to ask “what would happen to a test like this if the sample proportion, phat, were smaller
than the hypothesized value of p, p-zero?”
Look back at the formula for z. It is clear that if phat is smaller than p-zero then the value of z has to be
negative. But there are no negative entries in the FPR table so what would you do?
The answer is that you’d look up the positive value of z and then subtract that table value from 1.
Every table value for a positive z is smaller than 0.50, so anytime phat was smaller than p-zero, the
estimated false positive rate would have to be bigger than 50 chances in 100. That’s huge.
So if phat is smaller than p-zero then there is no way that HA could be accepted, no matter what phat, p-zero or n happen to be.
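To see that conclusion with numbers, the sketch below computes the upper-tail area directly for a negative z instead of going through the table. The sample proportion of 0.62 is hypothetical, chosen only so that phat falls below p-zero, and the normal-curve assumption behind estimated_fpr is ours.

```python
import math

def estimated_fpr(z):
    """Area under the standard normal curve to the right of z (assumed table rule)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical case: phat (0.62) is smaller than p-zero (0.65), same n as before.
p_hat, p_zero, n = 0.62, 0.65, 1125
z = (p_hat - p_zero) / math.sqrt(p_zero * (1 - p_zero) / n)
fpr = estimated_fpr(z)

print(z < 0)       # the standard score is negative
print(fpr > 0.50)  # so the estimated FPR is far above the 0.05 cutoff
```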
Slide 14
This concludes our video on the basic computations associated with one simple test of hypothesis.
Remember, testing a simple hypothesis about a proportion is a two-step process involving the
computation of a standard score, which is then taken to a table to identify the false positive rate
associated with rejecting the null hypothesis.