Hypothesis Testing

ECON 309
Lecture 7B: Hypothesis Testing
I. Hypotheses
A hypothesis is a claim about a parameter that you’re interested in. The simplest
hypotheses are about the parameters of a single variable, such as the mean of a
population. But there are more complicated hypotheses, as we’ll see when we get to
regression analysis; these hypotheses are about the parameters that control the
relationship between two or more variables.
Some simple hypotheses:
- The average number of customers in this store per day is greater than 10.
- Condoms from this production line will break less than 1% of the time.
- The average number of years it takes to graduate from CSUN is 6.5.
To do a hypothesis test, you will actually have two hypotheses: the null hypothesis and the
alternative hypothesis, which are stated in such a way that they are mutually exclusive
(you can’t have both hypotheses be true). The null hypothesis is the conclusion that is
considered the default – you will accept this hypothesis if you fail to find sufficient
support for the alternative hypothesis.
This is important: it means you are placing the burden of proof on those who support the
alternative hypothesis. The null hypothesis is essentially “innocent until proven guilty” –
it can be accepted with little support from the evidence, simply because the evidence
doesn’t strongly indicate something else. For this reason, researchers will usually use the
alternative hypothesis to represent their own position – what they wish to prove – in order
to put their claim to the strongest test. But sometimes researchers put their own position
as the null, in which case they’ve made things very easy on themselves.
II. One-Tail versus Two-Tail Hypothesis Tests
What if the CSUN administration claims the average number of years to graduate from
CSUN is 6.5? There are two ways they could be wrong: the average could be lower, or
the average could be higher. If we wanted to test the claim, we could state the null and
alternative hypotheses like so:
H0: μ = 6.5
H1: μ ≠ 6.5
Here we’re taking the administration’s claim as the null; we are giving them the benefit
of a doubt, and will only reject the claim with sufficient evidence to the contrary.
On the other hand, what if the CSUN administration claims the average number of years
to graduate from CSUN is no more than 6.5? Then there is only one way they could be
wrong: if the average is really higher. We could state the null and alternative hypotheses
like so:
H0: μ ≤ 6.5
H1: μ > 6.5
Again, we’re giving the administration the benefit of a doubt by putting their claim as the
null.
These two kinds of test are different. The first is called a two-tail test, because there are
two ways we could reject the null. The second is called a one-tail test, because there is
only one way we could reject the null.
We can see the difference by looking at the distribution of sample means around the
population mean. [Draw the bell curve with mean centered 6.5. Show regions to both
the left and right of 6.5, indicating two ways to reject the hypothesis that the mean really
is 6.5: because the sample mean is especially small or especially large.] [Draw the same
bell curve, but with a somewhat larger region on the right, showing a rejection of the null
because the sample mean is especially large. Have no similar region on the left.]
We could have done a different one-tail test. What if CSUN’s administration claimed the
average graduate time was no less than 6.5? Then we would say:
H0: μ ≥ 6.5
H1: μ < 6.5
Again, this gives the benefit of a doubt to the administration.
III. Significance Levels and Type I and Type II Errors
Remember from the lecture on CI’s that we had to choose a significance level, designated
α. This was the probability that a CI generated from a sample would not include the true
mean.
Now, we’ll use the same significance level, or α, for the probability that a hypothesis test
will reject the null hypothesis even though it’s true. This probability corresponds to the
shaded area in the distributions just examined. If the true mean is 6.5, we could still (by
chance) get a sample far enough from 6.5 that we reject the null hypothesis. In the two-tail
test, this could happen with an especially large or small sample mean. In the one-tail
test, this could happen only with an especially large sample mean (for the claim that
the graduation time is no greater than 6.5).
The level of significance is often set at 0.10, or 10%. For the two-tail test, we need to
split this between the tails, for 0.05 or 5% each. For the one-tail test, we put all the
weight in a single tail.
But we could choose a different significance level. In general, for significance level α,
put α/2 in each tail for a two-tail test, α in the appropriate tail for a one-tail test.
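The tail-splitting rule is easy to check numerically. This is a minimal sketch using Python's standard library; the helper name `z_critical` is my own, not standard terminology.

```python
from statistics import NormalDist

def z_critical(alpha: float, two_tail: bool = True) -> float:
    """Critical z-value for significance level alpha (hypothetical helper)."""
    tail_area = alpha / 2 if two_tail else alpha
    return NormalDist().inv_cdf(1 - tail_area)

# alpha = 0.10: two-tail puts 0.05 in each tail; one-tail puts all 0.10 in one tail
print(round(z_critical(0.10, two_tail=True), 2))   # 1.64
print(round(z_critical(0.10, two_tail=False), 2))  # 1.28
```

These are the same 1.64 and 1.28 values the worked examples below pull from the standard normal table.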
We usually make the significance level relatively small. That’s why I say the null
hypothesis is the default, the claim being given the benefit of a doubt: you are
setting a relatively small chance of rejecting it when it’s true. (Again, the trial court
analogy is apt: you want a relatively small chance of convicting an innocent man.) We
call this kind of error a Type I error. The probability of a Type I error is equal to α.
There is another type of error you could make: accepting a null hypothesis even though
it’s false. And if you think about it, this type of error is going to be fairly common if
you’re setting a small α. If you’re giving the null hypothesis the benefit of a doubt, you’ll
often fail to reject it even though it’s wrong. (Trial court analogy: By requiring a high
standard of proof for guilt, we probably let a large number of guilty people go free.) We
call this kind of error – failing to reject the null even though it’s false – a Type II error,
and we say the probability of a Type II error is β.
There is an inverse relationship between the probabilities of Type I and Type II errors.
The higher is one, the lower is the other.
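A quick simulation can make the Type I error rate concrete: if the null is exactly true (μ really is 6.5), a one-tail test at α = 0.10 should reject about 10% of the time. This is a sketch using only Python's standard library, with the numbers from the CSUN example (σ = 2, n = 49); the variable names are my own.

```python
import random
from statistics import NormalDist, mean

random.seed(42)
mu0, sigma, n, alpha = 6.5, 2.0, 49, 0.10
z_crit = NormalDist().inv_cdf(1 - alpha)  # one-tail critical value, ~1.28

trials = 10_000
rejections = 0
for _ in range(trials):
    # draw a sample from a world where the null is exactly true
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    if z > z_crit:
        rejections += 1

type1_rate = rejections / trials
print(type1_rate)  # close to alpha = 0.10
```

Rerunning with a smaller α would lower the rejection rate under a true null, but (as the text notes) it would also raise β, the chance of failing to reject a false null.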
Why do we usually set such a small probability of Type I error? This is a result of the
fact that statistics has largely been developed in scientific applications. Scientists don’t
like to accept a claim that differs from the existing wisdom, or that asserts the existence
of a relationship, unless they have really strong evidence. Notice that I have continually
referred to “not rejecting the null” instead of “accepting the null.” That’s because
scientists generally wish to remain agnostic without sufficient evidence: they will simply
say we don’t know in a wide range of circumstances.
But that may not be appropriate in non-scientific contexts, including business and policy.
For instance, if you’re thinking of starting a business, you might ask: will I make a
profit? But what you really want to know is: should I open the business or not? Now,
what level of certainty do you need about the conclusion that you will? Do you need to
be 95% certain of that? Put yourself in the position of a loan officer: would you require
95% certainty that the investment will pay off? That’s what you’d be asking for if you
set α = 0.05. You might be willing to accept a substantially higher probability of failure.
(I’ve been told that the wildcatters searching for sites on which to drill for oil will accept
an alpha as large as 0.8, or 80%, on the proposition that a site has oil. Oil wells are so
profitable that you can tolerate a very large number of failed drillings.)
This is why I said, in the CI lecture, that the significance level is not magic. There is
nothing special about 0.10 or 0.05 or 0.01. They are just convenient numbers that
scientists use, but non-scientists may pick different numbers.
IV. Performing the Test
To perform a hypothesis test, you must find a z-score based on the value of the parameter
specified in the null hypothesis.
z = |x̄ − μ_H0| / σ_x̄
Note that in forming this z-score, we are using the standard error of the mean in the
denominator. That’s because your sample mean is distributed normally with that
standard deviation, not the standard deviation of the population as a whole. We can
rewrite the above like so:
z = |x̄ − μ_H0| / (σ / √n)
We will then compare this to a critical value of z from the standard normal table. If it’s
greater than the z-critical, we reject the null and accept the alternative hypothesis.
Otherwise, we do not reject the null, nor do we accept the alternative.
Example: Let’s do the two-tail test on CSUN’s graduation time. Let’s say we know the
standard deviation of the population is 2 years, and we sampled 49 CSUN graduates and
found a sample mean of 6.9. Then we calculate:
z = |x̄ − μ_H0| / (σ / √n) = |6.9 − 6.5| / (2 / √49) = 0.4 / (2/7) = 1.4
We need a z-critical value for a significance level of 0.10. Since this is a two-tail test, we
want 0.05 in each tail, so find the value of z in Table 3 that gives you an area as close to
0.95 as possible. This turns out to be 1.64. Since 1.4 < 1.64, we do not reject the null.
The administration’s claim cannot be rejected.
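The arithmetic of this two-tail test can be verified in a few lines. This is a sketch using Python's standard library; the variable names are mine.

```python
from statistics import NormalDist

xbar, mu0, sigma, n, alpha = 6.9, 6.5, 2.0, 49, 0.10
z = abs(xbar - mu0) / (sigma / n ** 0.5)      # |6.9 - 6.5| / (2/7) = 1.4
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tail: 0.05 per tail, ~1.645
reject = z > z_crit
print(round(z, 2), round(z_crit, 2), reject)  # 1.4 1.64 False -> do not reject
```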
Example: Now let’s do the one-tail test on CSUN’s graduation time. All the calculations
are the same, except now we want the whole 10% in the right tail. That gives us a
z-critical of 1.28. Since 1.4 > 1.28, we reject the null and accept the alternative. We think
the administration has underestimated the true mean graduation time.
NOTE: The test we just did is a right-tail test, because the null hypothesis is rejected
only for a sufficiently high sample mean. But what if the null hypothesis had been that
CSUN’s average graduation time was greater than or equal to 6.5? In that case we would
have done a left-tail test. In addition to the z-value calculated above being greater than
z-critical, you also need to make sure the sample mean is less than the hypothesized mean
(6.5 in this case). Alternatively, just calculate the z-value above without absolute value
signs, and then put a negative sign on your z-critical.
Why did we reject in the one-tail case but fail to reject in the two-tail case? Because in the
two-tail case, some of the weight of α had to go in the left tail, which turned out to be
irrelevant in this case. That meant there was less weight to go in the right tail, and thus
less chance of rejecting the null as a result of a high sample mean.
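Since z = 1.4 in both tests, the different decisions come entirely from the different critical values. A sketch (Python standard library):

```python
from statistics import NormalDist

z, alpha = 1.4, 0.10
z_crit_two = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.645: alpha split across two tails
z_crit_one = NormalDist().inv_cdf(1 - alpha)      # ~1.282: all of alpha in one tail

print(z > z_crit_two)  # False: fail to reject in the two-tail test
print(z > z_crit_one)  # True: reject in the one-tail test
```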
V. Getting Rid of the Bogus Assumptions
We assumed above that the true standard deviation was known. Just as with CI’s, this is a
weird assumption. Why would we know the true standard deviation but not the true
mean?
When we have a large sample, we can get away with substituting the sample standard
deviation for the true one and continuing to use the z-distribution. This gives us the
following z-score formula:
z = |x̄ − μ_H0| / (s / √n)
But what if you don’t know the true standard deviation and the sample size is small?
Then we have to use the t-distribution. We calculate a t-score instead of a z-score:
t = |x̄ − μ_H0| / (s / √n)
And then we find a t-critical value instead of a z-critical value.
Example: Same example as above, doing a one-tail test. But this time, we don’t know that
the standard deviation is 2, and our sample size was only 16. Our sample standard
deviation turns out to be 1.9, and we use this to find our t-score:
t = |x̄ − μ_H0| / (s / √n) = |6.9 − 6.5| / (1.9 / √16) ≈ 0.84
In the t-table, with df = 16 − 1 = 15, the 90% confidence level gives a t-critical of 1.75,
but that value assumes a two-tail test. Since we want a one-tail test, we instead look in
the column of the table headed by 0.1000 (ignore the 0.8000 confidence level below it,
because that also assumes a two-tail test). We get 1.341. Since 0.84 < 1.341, we do not
reject the null.
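The same comparison in code: Python's `statistics` module has no t-distribution, so the critical values below are hard-coded from a t-table with df = 15 (the 1.75 and 1.341 values just cited); the variable names are mine.

```python
xbar, mu0, s, n = 6.9, 6.5, 1.9, 16
t = abs(xbar - mu0) / (s / n ** 0.5)  # |6.9 - 6.5| / (1.9/4) ~ 0.84

t_crit_two_tail = 1.75   # t-table, df = 15, alpha = 0.10 split across two tails
t_crit_one_tail = 1.341  # t-table, df = 15, all of alpha = 0.10 in one tail

print(round(t, 2))          # 0.84
print(t > t_crit_one_tail)  # False -> do not reject the null
```

Note that here even the more generous one-tail critical value leaves the null standing: with only 16 observations, the same 0.4-year gap is much weaker evidence than it was with 49.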
VI. P-Values
Remember that we could have picked any value of α. Picking a large one (such as 0.10)
makes it more likely you’ll reject the null hypothesis; picking a relatively small one (such
as 0.01) makes it less likely. So for any given hypothesis test, you might ask: what is the
lowest value of α that would still cause me to reject the null hypothesis? The answer is
called the p-value.
[Show standard bell curve for one-tail test; mark rejection region as alpha. Show a
z-value that is to the right of z-critical, so it would result in rejection. Then show the
region that would correspond to the p-value: the region in the remaining tail to the right
of the z-value.]
Example: In the one-tail test of CSUN graduation times, we found z = 1.4. Table 3 tells
us the area to the left of 1.4 is 0.9192, so the area to the right of it is 0.0808. This is the
p-value. So any alpha greater than 0.0808 will lead to rejection of the null; any alpha less
than 0.0808 will lead to non-rejection; or to put it another way, 0.0808 is the lowest alpha
that will result in rejection of the null.
But for a two-tail test, remember that the area to the right of your z-value (or to the left if
you have a negative z-value) is only one-half the alpha. So you need to double the area
to the right (left) of it.
Example: In the above example, for the two-tail test, the p-value is 2(0.0808) = 0.1616.
That’s the lowest alpha that will lead to rejection of the null in the two-tail test.
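Both p-values can be recovered from the standard normal CDF. A sketch using Python's standard library:

```python
from statistics import NormalDist

z = 1.4
p_one_tail = 1 - NormalDist().cdf(z)  # area to the right of 1.4, ~0.0808
p_two_tail = 2 * p_one_tail           # doubled for a two-tail test

print(round(p_one_tail, 4))  # 0.0808
print(round(p_two_tail, 4))  # 0.1615 (the table's rounded 0.0808 doubles to 0.1616)
```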
It is possible to find p-values when we’re using the t-statistics as well. But the t-table in a
book doesn’t give us enough information to find the p-value with much precision. A
statistical software program can do it for us, though.