Introduction to Hypothesis Testing

Mention Errors! Statistical vs Practical Significance.
Introduction to Hypothesis Testing or Statistical Inference
Pre-Lecture Items
Recent research has suggested that dogs may be helpful as a supplement to standard medical
diagnostic tests in detecting if a person has cancer. Naturally there are doubters. With this in
mind, consider the following:
An experiment was conducted to analyze a dog's ability to detect the correct urine specimen of a
person with cancer in comparison to what would be expected by random guessing. Six (6) dogs
were run through nine (9) separate trials (this makes 54 total trials) where in each trial a dog was
presented with six (6) urine samples: of these 6, one (1) was from a bladder cancer patient while
the remaining five (5) were clean.
Questions to ponder:
1. For each trial, the probability of guessing correctly would be 1/6 since there was one of six
samples that was from a cancer patient. So for 54 trials, about how many would you expect to get
right just from random guessing?
2. Based on your answer to question 1, if you as the experimenter wanted to demonstrate that
dogs could detect cancer (i.e. better than random guessing), how many would the dogs need to get
correct in order for you to believe they did significantly better than random guessing? What if
you were trying to show that dogs were NOT better than random guessing?
3. In just such an experiment, the dogs got 22 out of 54 (about 40%) correct. Assuming that to
start we couldn't say the dogs were better (or worse) than random guessing, what do you think the
chance (i.e. probability) was of the dogs getting at least 22 out of 54 correct? Do you think this
result would be likely or unlikely, again keeping in mind that we start from the assumption that
they are NOT different from random guessing?
4. If 540 instead of 54 trials were conducted would the differences you chose in question 2
change much (e.g. if you said you would need the dogs to get at least 5 more correct than what is
expected by guessing would you still say 5 for the longer trial size)?
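The questions above can also be explored numerically. The following is a minimal sketch, assuming each trial is an independent guess with probability 1/6 of being correct, so that the number correct follows a binomial distribution. The helper name `binomial_tail` is introduced here for illustration only.

```python
from math import comb, sqrt

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_guess = 1 / 6  # chance of a correct pick by random guessing

for n in (54, 540):
    expected = n * p_guess                      # expected number correct
    spread = sqrt(n * p_guess * (1 - p_guess))  # standard deviation of the count
    print(f"n={n}: expected correct = {expected:.0f}, standard deviation = {spread:.2f}")

# Chance of at least 22 correct out of 54 if the dogs were only guessing
tail = binomial_tail(54, 22, p_guess)
print(f"P(X >= 22 | n=54, p=1/6) = {tail:.2e}")
```

Comparing the two standard deviations gives a feel for question 4: the typical chance variation in the count grows with the number of trials, but more slowly than the number of trials itself.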
Statistical inference – drawing conclusions about our population based on our sample statistics.
Last lesson we constructed 1-proportion and 1-mean confidence intervals to estimate the true
population proportion or true population mean. Now we will introduce hypothesis tests for 1-proportion and 1-mean.
Five Steps in a Hypothesis Test
1. Write null and alternative hypotheses.
2. Set a level of significance called alpha.
3. Calculate an appropriate test statistic.
4. Determine a p-value associated with the test statistic.
5. Decide between the null and alternative hypotheses and state a "real world" conclusion.
Step 1:
TERMINOLOGY:
A statistical hypothesis test is a procedure for deciding between two possible statements about a
population. The phrase significance test means the same thing as the phrase "hypothesis test."
The two competing statements about a population are called the null hypothesis and the
alternative hypothesis.
• A typical null hypothesis, Ho, is a statement that two variables are not related. Other
examples are statements that there is no difference between two groups (or treatments)
or that there is no difference from an existing standard value.
• An alternative hypothesis, Ha, is a statement that there is a relationship between two
variables or there is a difference between two groups or there is a difference from a
previous or existing standard.
Considering the pre-lecture scenario, as the experimenter, what would you construct as the null and
alternative hypotheses?
Ho: The dogs were no different from random guessing in identifying cancer patients
Ha: There is a difference between dogs and random guessing in identifying cancer patients.
NOTATION:
The notation Ho represents a null hypothesis and Ha represents an alternative hypothesis.
The possible hypothesis statements are:
1-Proportion
Ho: p = po
Ha: p ≠ po
or
Ha: p > po
or
Ha: p < po [Remember, only select one Ha]
1-mean
Ho: μ = μo
Ha: μ ≠ μo
or
Ha: μ > μo
or
Ha: μ < μo [Remember, only select one Ha]
The first Ha is called a two-sided test since "not equal" implies that the true value could be either
greater than or less than the hypothesized value. This two-sided alternative is the most common
setup. However, the other two Ha are referred to as one-sided tests since they restrict the
conclusion to a specific side of a hypothesized value.
Returning to our pre-lecture scenario, we have by random guessing a 1/6 chance of correctly
identifying a patient. Since this would be categorical data (i.e. for each trial we would either say
the dog was "correct" or "incorrect" and then calculate the proportion of the total the dogs got
correct), using our new notation we have:
Ho: p = 1/6 Ha: p ≠ 1/6
Special Note: po can be any value from 0 to 1. E.g. in the lab activity where we analyzed the
proportion of students who smoke cigarettes and compared this interval to the U.S. Dept of
Health’s statement that 24% of U.S. adults between 18 – 24 smoke, the po value is 0.24.
Step 2
The level of significance, alpha (α), is a “cut-off” that is used to determine if a particular
hypothesis test can be considered significant. For our class we will set this value to 0.05 or 5%.
Referring to the pre-lecture, this answers the question "how unlikely would our results have to
be for us to conclude that they were too unlikely to be due to chance?"
Step 3
The general test statistic format is: (sample statistic – hypothesized value)/S.E.
1-Proportion
$$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$
1-mean
$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$
Keep in mind that the use of these test statistics is based on the “rules” we discussed for
confidence intervals. That is, for 1-proportion that the number of successes AND failures is at
least 10; and for 1-mean that either the distribution of the population is approximately normal or
if not that the sample size is at least 30 (i.e. the Central Limit Theorem).
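The two test statistics can be written as small functions. This is a sketch only: the function names are invented for illustration, and the numbers in the usage lines are hypothetical, not taken from any data set in these notes.

```python
from math import sqrt

def one_proportion_z(successes, n, p0):
    """z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # built from p0, the hypothesized value
    return (p_hat - p0) / standard_error

def one_mean_t(sample_mean, mu0, sample_sd, n):
    """t = (x_bar - mu0) / (s / sqrt(n)), used with n - 1 degrees of freedom."""
    standard_error = sample_sd / sqrt(n)
    return (sample_mean - mu0) / standard_error

# Hypothetical inputs, chosen only to exercise the functions
print(one_proportion_z(successes=60, n=200, p0=0.24))
print(one_mean_t(sample_mean=52, mu0=50, sample_sd=8, n=64))
```

Both functions follow the general format (sample statistic – hypothesized value)/S.E.; only the standard error differs.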
From the pre-lecture, the sample proportion, p̂, is 22/54 ≈ 0.4. This results in:
$$Z = \frac{0.4 - 1/6}{\sqrt{\dfrac{(1/6)(1 - 1/6)}{54}}} = \frac{0.233}{0.051} \approx 4.56$$
Step 4
Keep this in mind: The method for finding the p-value is based on the null hypothesis. Minitab
will provide the p-value. If doing by hand, then find p-value from Table A1 for 1-Proportion and
the T-table for 1-Mean.
Probability Value (p-value): the probability of obtaining a result at least as extreme as the one
observed in the data, assuming the null hypothesis is true. Therefore the smaller the p-value the
stronger the evidence against the null hypothesis.
1-Proportion
For Ha: p ≠ po then p-value = 2*P(Z ≥ |z|). That is, find 1 – P(Z < |z|) and then multiply
this probability by 2.
For Ha: p > po then p-value = P( Z ≥ z)
For Ha: p < po then p-value = P( Z ≤ z)
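The three p-value rules above can be sketched in code using the standard normal distribution from Python's standard library in place of a Z-table. The function name and the labels "not equal", "greater", and "less" are choices made here for illustration.

```python
from statistics import NormalDist

def z_test_p_value(z, alternative):
    """p-value for a z statistic; alternative is "not equal", "greater", or "less"."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "not equal":
        return 2 * (1 - std_normal.cdf(abs(z)))  # 2*P(Z >= |z|)
    if alternative == "greater":
        return 1 - std_normal.cdf(z)             # P(Z >= z)
    if alternative == "less":
        return std_normal.cdf(z)                 # P(Z <= z)
    raise ValueError(f"unknown alternative: {alternative!r}")

# The same test statistic gives a different p-value under each alternative
for alt in ("not equal", "greater", "less"):
    print(alt, round(z_test_p_value(1.96, alt), 4))
```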
1-Mean
Keep in mind that the degrees of freedom (DF) are n – 1 and that table values
represent the area to the right of the absolute value of the t test statistic.
a. Using the t test statistic from Step 3, go across the top row of Table A3 to locate the test
statistic. Usually the test statistic will not be listed exactly, so compare it to the listed values (i.e. less
than the first one, between two, or greater than the last one).
b. After locating where the test statistic would “fall” in the table, locate the row for the
proper DF from n – 1.
c. Get the p-value(s) from the table that correspond to the column t-value(s) found in part a.
d. If Ha is one-sided, use the p-value(s) in part c. If Ha is two-sided (i.e. not equal), then
double the p-value(s) found in part c.
Step 5
Decision Rule: If the p-value is less than alpha (i.e. 0.05) then reject the null hypothesis, Ho. If the p-value is greater than alpha then fail to reject Ho.
Step 6
Put your decision into words. That is, recap your p-value and decision, and state what this means in terms
of concluding the alternative or not having enough evidence to conclude the alternative.
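Steps 5 and 6 can be sketched as two small functions. The names `decide` and `summarize` and the p-values in the usage lines are hypothetical. How to treat a p-value exactly equal to alpha is not stated in the decision rule, so the choice below is an assumption.

```python
def decide(p_value, alpha=0.05):
    """Step 5: reject Ho when the p-value is below alpha, otherwise fail to reject."""
    # Assumption: a p-value exactly equal to alpha is treated as "fail to reject".
    return "reject Ho" if p_value < alpha else "fail to reject Ho"

def summarize(p_value, alternative_claim, alpha=0.05):
    """Step 6: put the decision into words."""
    if decide(p_value, alpha) == "reject Ho":
        return (f"With a p-value of {p_value}, we reject Ho and conclude that "
                f"{alternative_claim}.")
    return (f"With a p-value of {p_value}, we fail to reject Ho; there is not enough "
            f"evidence to conclude that {alternative_claim}.")

# Hypothetical p-values, one on each side of alpha
claim = "the true proportion differs from the hypothesized value"
print(summarize(0.03, claim))
print(summarize(0.40, claim))
```

Note that the second wording never says Ho was shown to be true, only that the evidence against it was insufficient.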
EXAMPLES
1-Proportion
Continuing with the pre-lecture example...
Check conditions: note that we now use n*po instead of n*(sample proportion).
Step 1:
Ho: p = 1/6 Ha: p ≠ 1/6
Step 2:
We set our level of significance, α, at 0.05
Step 3:
Calculate our test statistic. Since our test is a one proportion test we use:
$$Z = \frac{0.4 - 1/6}{\sqrt{\dfrac{(1/6)(1 - 1/6)}{54}}} = \frac{0.233}{0.051} \approx 4.56$$
Step 4:
We next get the p-value. Since our alternative is "not equal" we use:
2*P(Z ≥ |z|) = 2*P(Z ≥ |4.56|). In the Z-table the closest we get to 4.56 is 3.49, which has an area to
the right of 0.0002 (from 1 - 0.9998). Then 2*0.0002 gets us 0.0004, so we know the p-value is
less than 0.0004. The interpretation of this is that if dogs were no different than random chance
(i.e. 1/6) in identifying cancer patients, then there would be less than a 0.0004 chance that this
sample of dogs would give a result at least as far from 1/6 as the 40% they correctly identified. Pretty unlikely!
Step 5:
We compare this p-value to 0.05 and if less we reject Ho and if greater we fail to reject
Ho. Here, p-value is less than 0.0004 which in turn is less than 0.05 requiring us to reject
the null hypothesis. We had expected the dogs to get about 9 out of the 54 correct (1/6) if
they were no different from random guessing. However, the dogs getting 22 out of 54
correct was just too unlikely a difference from 9 out of 54 to be due simply to chance.
Step 6:
Putting all this together into words: With our p-value being less than 0.0004, we reject
the null hypothesis and conclude that dogs are better than random chance in identifying
patients with cancer.
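The dog example can be rerun end to end as a check on the hand calculation. This is a sketch that uses the normal distribution from Python's standard library instead of a Z-table. It keeps the sample proportion 22/54 unrounded, so z comes out somewhat larger than the hand value of 4.56, which used rounded intermediate figures.

```python
from math import sqrt
from statistics import NormalDist

correct, trials, p0, alpha = 22, 54, 1 / 6, 0.05

p_hat = correct / trials                       # sample proportion, kept unrounded
standard_error = sqrt(p0 * (1 - p0) / trials)  # based on p0, not p_hat
z = (p_hat - p0) / standard_error

# Two-sided p-value, matching Ha: p ≠ 1/6
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
print(f"z = {z:.2f}, p-value = {p_value:.2e}, decision: {decision}")
```

The computed p-value is an exact tail area rather than a table bound, so it is consistent with, and smaller than, the "less than 0.0004" obtained from the table.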
1-Mean
Example (from Spring 2013 class survey):
From www.collegedata.com, 2011 PSU-UP graduates exited with an average loan debt of
$33,530. Using our survey as a random sample of all PSU-UP, do we have evidence that conflicts
with this amount?
Check conditions: Sample size is 177 which is greater than 30.
Step 1: Ho: μ = 33,530 Ha: μ ≠ 33,530
Step 2: α = 0.05
Step 3:
$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{36352 - 33530}{54056/\sqrt{177}} = 0.68$$
Step 4: First, our DF are 177 - 1 = 176, so use the row for 100. Going across the row for 100 we see that
0.68 is less than the first value of 1.29, which corresponds to a right-tail probability (from the top
of this column) of 0.100. The right-tail probability for 0.68 is therefore greater than 0.100. Since the
alternative hypothesis, Ha, is "not equal", we double this value to get a final p-value range of 0.200 to 1.00.
Step 5: Using a significance value, α, of 0.05 we fail to reject the null hypothesis, Ho, since the
p-value is greater than 0.05.
Step 6: With a p-value between 0.200 and 1.00, we fail to reject the null hypothesis and conclude
there is not enough evidence to say that the true mean loan debt of PSU-UP graduates is
different than $33,530.
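The loan debt example can be retraced in code, including the table lookup in Step 4. This is a sketch under stated assumptions. The row of t values for 100 DF is typed in from standard t critical values and should be checked against the course's own table. The function name `two_sided_p_range` is invented for illustration. Because the code does not round intermediate values, the statistic may differ from the hand value in the second decimal place.

```python
from math import sqrt

# Summary statistics quoted in the loan debt example
sample_mean, mu0, sample_sd, n, alpha = 36352, 33530, 54056, 177, 0.05

# Assumed excerpt of a t-table row for 100 DF: right-tail areas and the
# t values that cut them off. Verify against the course table before use.
RIGHT_TAIL_AREAS = [0.100, 0.050, 0.025, 0.010, 0.005]
T_VALUES_DF_100 = [1.290, 1.660, 1.984, 2.364, 2.626]

def two_sided_p_range(t_stat):
    """Bracket the two-sided p-value the way a table lookup does."""
    t_abs = abs(t_stat)
    low, high = 0.0, 0.5  # a right-tail area beyond |t| is at most 0.5
    for area, t_crit in zip(RIGHT_TAIL_AREAS, T_VALUES_DF_100):
        if t_abs >= t_crit:
            high = area  # tail area is no larger than this column's area
        else:
            low = area   # first column the statistic falls short of
            break
    return 2 * low, 2 * high  # double for a "not equal" alternative

t_stat = (sample_mean - mu0) / (sample_sd / sqrt(n))
low, high = two_sided_p_range(t_stat)

if low >= alpha:
    decision = "fail to reject Ho"
elif high <= alpha:
    decision = "reject Ho"
else:
    decision = "table range straddles alpha"

print(f"t = {t_stat:.2f}, p-value between {low} and {high}, decision: {decision}")
```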