Chapter 22, Part 2: Computing p-values for significance tests

Reminders
• Last HW and last quiz on Thursday
• My office hours will be today from 11-1
• If you won’t be around during the final week to take the
Final Project, please email me ASAP to arrange a
time for you to take it.
Warmup
• A drug company develops an AIDS treatment that they
hope will reduce the proportion of AIDS patients who die
within 5 years. In a randomized controlled trial, 35% of
patients in the control group died within 5 years. The
drug company would like to show that the proportion of
patients who die within 5 years in the treatment group is
less than this. What is the null hypothesis for this
experiment? What is the alternative hypothesis for
this experiment?
• H0 : p = 0.35
• Ha : p < 0.35
Warmup
• It turns out that 28% of the patients in the treatment
group died within 5 years. The drug company calculates
that the p-value for the experiment is .014. What does
this p-value mean?
• If the null hypothesis were true, there would be a .014
chance (14 in 1000) of observing a result as extreme as
ours or more extreme (28% or fewer dying within 5 years).
• Before the trial, the drug company set the
significance level of the test at α = 1% = .01. What
is the conclusion of this experiment?
• Since the p-value (.014) is larger than the significance
level (.01), we fail to reject the null hypothesis: the
difference we observed could be due to random
chance alone. So, we don’t have enough evidence to
conclude that the treatment group has a lower
proportion of people dying within 5 years.
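The comparison in that last bullet is the same every time, so it can be written down once. Here is a minimal sketch; the helper name `decide` is made up for illustration and is not from the textbook. It treats a p-value exactly equal to α as "fail to reject", which is an arbitrary choice since the slides only cover the "smaller" and "larger" cases.

```python
def decide(p_value, alpha):
    """Compare a p-value to a significance level chosen before the test."""
    # Hypothetical helper: reject only when the p-value is strictly below alpha.
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


# Warmup numbers: p-value = .014, significance level = .01
print(decide(0.014, 0.01))  # prints "fail to reject H0"
```

Notice that the same p-value of .014 would lead to rejecting H0 at α = .05, which is why α has to be fixed before looking at the data.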
Chapter 22, Part 2: Computing p-values
for significance tests
Aaron Zimmerman
STAT 220 - Summer 2014
Department of Statistics
University of Washington - Seattle
Practice
The U.S. military would like to know whether the proportion
of women in the military has changed in the last 20 years.
They know that in 1992, 4.6% of active-duty soldiers were
women. They would like to know if the current proportion is
different from this value.
What is the null hypothesis for this experiment?
What is the alternative hypothesis?
H0 : p = 0.046
Ha : p ≠ 0.046
Practice
It turns out that 16% of the active-duty soldiers they surveyed
are women. The military calculates that the p-value for the
experiment is .003.
What does this p-value mean?
It means that there’s a 3 in 1000 chance (.003) that we would
observe a result at least this far from 4.6%, in either
direction, if the null hypothesis were true.
Before the trial, the military set the significance level of the
test at α = 5% = .05. Remember, the p-value of the test is
.003.
What is the conclusion of this experiment?
Since the p-value is less than α, we reject the null hypothesis
and conclude that our data gives us evidence suggesting that
the percent of active-duty soldiers who are women has changed
since 1992.
Steps of a Test of Significance
• Returning to our motivating example from Monday,
remember that the prosecution in the Kristin Gilbert case
found that there were 34/1384 = .025 deaths per shift
when Nurse Gilbert wasn’t working, and that there were
40/257 = .156 deaths per shift when she was working.
• We’d like to know if the high rate of deaths during her
shifts can be explained by random variation. That is, we’d
like to know if the rate of deaths during her shifts is truly
different from .025.
• Question: Is there sufficient evidence against the null
hypothesis that the rate of deaths on Nurse Gilbert’s
shifts is the same as the baseline .025 rate if the
significance level is α = .05?
Step 0 and 1: Significance Level & The Hypotheses
• Before we even start, we set the significance level
(α = .05)
• Remember, the claim being tested in a statistical test is
called the null hypothesis (H0 ).
• Nurse Gilbert’s defense claims that she’s unlucky and that
the rate of deaths during her shifts is the same as everyone
else’s (.025).
  ◦ So, H0 : p = 0.025
• The statement we hope or suspect is true instead of H0 is
called the alternative hypothesis (Ha or H1 ).
• The prosecution wants to show that the rate of deaths
under Nurse Gilbert is larger than .025.
  ◦ So, Ha : p > 0.025
Step 2: The Sampling Distribution (if H0 is true)
• Remember, in a test of significance, we start by
assuming that H0 is true
• If H0 (p = 0.025) is true, what is the sampling
distribution of p̂?
  ◦ The sampling distribution is Normal
  ◦ The mean is p = 0.025
  ◦ The standard deviation is:
    $\sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.025(1-.025)}{257}} = .00974$
[Figure: Sampling Distribution (Normal curve; x-axis from −0.05 to 0.20)]
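If you'd like to check the standard deviation with a computer, here is a small sketch of the formula above. The function name `sd_of_sample_proportion` is just an illustrative choice.

```python
import math


def sd_of_sample_proportion(p, n):
    """Standard deviation of the sample proportion when the true proportion is p."""
    return math.sqrt(p * (1 - p) / n)


p0 = 0.025  # proportion claimed by H0
n = 257     # number of shifts Nurse Gilbert worked
sd = sd_of_sample_proportion(p0, n)
print(round(sd, 5))  # prints 0.00974
```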
Step 3: The Data
• There were 40 deaths out of 257 shifts under Nurse
Gilbert
• So, p̂ = 40/257 = .156
[Figure: Sampling Distribution (Normal curve; x-axis from −0.05 to 0.20)]
Step 4: The p-value (NEW)
• Remember: a p-value is the probability of observing an
outcome as extreme or more extreme than what we actually
observed if the null hypothesis were true
• In this problem, the alternative hypothesis is one-sided
(p > 0.025)
• So, the p-value is the area under the Normal curve that is
at least as far above the mean of the distribution as our
observation.
NOTE: We’d look at the area under the curve to the left of
the observation if the alternative was Ha : p < .025.
[Figure: Sampling Distribution, annotated "p-value is the ‘more extreme’ area under the Normal curve"; x-axis: p, from −0.05 to 0.20]
Step 4: The p-value (NEW)
• What percent of the sampling distribution is greater than the
observation of 40/257 = 0.156?
  ◦ Mean = 0.025
  ◦ SD = 0.0097
  ◦ Standard score: $\frac{.156 - .025}{.0097} = 13.5$!
• Look up the standard score in Table B. Not so helpful - it just
tells us that it must be less than 1-.9997 = .0003
• My computer says the p-value is less than 1/100,000,000
[Figure: Sampling Distribution, annotated "p-value is the ‘more extreme’ area under the Normal curve"; x-axis: p, from −0.05 to 0.20]
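Here is one way a computer could get that tiny p-value. This is only a sketch: it assumes we get the Normal tail area from Python's `math.erfc`, and the helper name `upper_tail` is made up. Because it doesn't round p̂ or the SD first, the standard score comes out near 13.4 instead of the 13.5 we got by hand.

```python
import math


def upper_tail(z):
    """Area under the standard Normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


p0, n = 0.025, 257                 # H0 proportion and number of shifts
p_hat = 40 / 257                   # observed proportion of deaths per shift
sd = math.sqrt(p0 * (1 - p0) / n)  # SD of the sampling distribution under H0
z = (p_hat - p0) / sd              # standard score of the observation
p_value = upper_tail(z)            # one-sided p-value for Ha: p > 0.025

print(round(z, 1))
print(p_value < 1 / 100_000_000)   # prints True
```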
Step 5: Conclusion
• The p-value of less than .00000001 means that there is less
than a 1 in 100,000,000 chance that Kristin Gilbert would
randomly (and unluckily) have a percent of deaths that
extreme during her shifts if the proportion of deaths during
shifts was actually .025 (H0 ).
• Since my significance level is α = .05 and
α > p-value, this test IS statistically significant.
• Conclusion: We have enough evidence to reject the
null hypothesis and conclude that the percent of
deaths during Nurse Gilbert’s shifts is larger than
the baseline rate of .025.
• REMEMBER: this doesn’t mean she was killing people,
but it does imply that something different was happening
under her watch.
Note #1: p-values in 2-sided tests
• In practice, when Ha is two-sided (Ha : p ≠ .025), we
calculate the area that’s more extreme than the observation
in one direction and then multiply by two
• We do this because in the 2-sided setting, “more extreme”
could be extreme and large or extreme and small. Either way
gives us evidence against the null hypothesis H0
• We’re not doing it for this problem, but you should be
aware of it!
[Figure: Sampling Distribution, annotated "2-sided p-value is found by looking at ‘more extreme’ in both directions!"; x-axis: p, from −0.05 to 0.20]
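Why is doubling allowed? Because the Normal curve is symmetric, the area below −z is the same as the area above z. The sketch below checks this for a standard score of 1.5, which is a made-up number for illustration and not from the Gilbert example. The helper names `upper_tail` and `lower_tail` are also just illustrative.

```python
import math


def upper_tail(z):
    """Area under the standard Normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def lower_tail(z):
    """Area under the standard Normal curve to the left of z."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


z = 1.5  # hypothetical standard score, chosen only for illustration

both_tails = upper_tail(z) + lower_tail(-z)  # "more extreme" in both directions
doubled = 2 * upper_tail(abs(z))             # the shortcut from the slide

print(round(both_tails, 4), round(doubled, 4))
```

Using `abs(z)` in the shortcut means it also works when the observation falls below the mean.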
Note #2: Different Sample Sizes
• What if we only saw 7 of Nurse Gilbert’s shifts?
• 7 × 40/257 ≈ 1. So using 1 death in 7 shifts is about the
same ratio.
• Then the sampling distribution would have mean .025,
but SD = $\sqrt{\frac{.025(1-.025)}{7}} = .059$
• And the standard score would be $\frac{.156 - .025}{.059} = 2.22$
• So the p-value would be 1-.9861 = .0139 (using a standard
score of 2.2 in Table B)
• While we still would reject the null at α = .05, the
evidence isn’t as strong, and we wouldn’t reject at
α = .01.
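To see the effect of sample size side by side, here is a short sketch that repeats the calculation for n = 257 and n = 7 with the same p̂ = .156. The function name `one_sided_test` is hypothetical. Since the computer doesn't round the standard score to 2.2 the way the table lookup does, the n = 7 p-value comes out a little smaller than .0139 (close to .013), but the conclusion is the same.

```python
import math


def one_sided_test(p_hat, p0, n):
    """Return (standard score, p-value) for Ha: p > p0 using the Normal curve."""
    sd = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # area to the right of z
    return z, p_value


z_full, p_full = one_sided_test(0.156, 0.025, 257)  # all 257 shifts
z_small, p_small = one_sided_test(0.156, 0.025, 7)  # only 7 shifts

print("n = 257:", round(z_full, 2), p_full)
print("n = 7:  ", round(z_small, 2), round(p_small, 4))
print("reject at .05?", p_small < 0.05, " reject at .01?", p_small < 0.01)
```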
Significance Tests for Means
• There’s no reason that we can’t apply the proportions
significance testing framework directly towards
significance tests for means.
• We’ll still use the same steps
• Chat with your neighbor about the strategy we’re
going to take to perform significance tests about a
mean
Very generally, we find the sampling distribution assuming the
null hypothesis is true, and then we see how unlikely it would
be to record data as extreme as what we’ve seen (still assuming
H0 is true).
Steps 0-5 for Significance Tests on Means
• Step 0: Pick a significance level (usually α = .05 unless
you have a reason to use a different level)
• Step 1: Write down the hypotheses (both H0 & Ha )
• Step 2: Determine the sampling distribution if H0 is true.
It will be Normal with mean equal to the claim in H0 and
a standard error like the standard errors used in
confidence intervals (from the CLT)
• Step 3: The data. Figure out what the sample mean is
from your data
• Step 4: Find the p-value. It’s the area under the sampling
distribution more extreme than the observed sample mean.
Multiply this p-value by 2 if you have a
2-sided alternative.
• Step 5: Make a conclusion. If the p-value is smaller than
α, reject H0 . If the p-value is larger than α, fail to reject
H0
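Steps 0-5 can be collected into one small function. The following is a rough sketch, not an official recipe: the function name `z_test_for_mean`, its arguments, and the example numbers at the bottom (sample mean 10.3, SD 1.5, n = 36, claimed mean 10) are all made up for illustration. It uses the Normal curve with standard error s/√n, as in Step 2.

```python
import math


def z_test_for_mean(sample_mean, sample_sd, n, mu0,
                    alternative="greater", alpha=0.05):
    """Run Steps 2-5 for a test about a mean; alpha and hypotheses are Steps 0-1."""
    # Step 2: sampling distribution under H0 is Normal(mu0, se)
    se = sample_sd / math.sqrt(n)
    # Step 3: standardize the observed sample mean
    z = (sample_mean - mu0) / se
    # Step 4: p-value is the "more extreme" area under the Normal curve
    upper = 0.5 * math.erfc(z / math.sqrt(2))   # area to the right of z
    lower = 0.5 * math.erfc(-z / math.sqrt(2))  # area to the left of z
    if alternative == "greater":
        p_value = upper
    elif alternative == "less":
        p_value = lower
    elif alternative == "two-sided":
        p_value = 2 * min(upper, lower)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    # Step 5: compare the p-value to the significance level
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision


# Hypothetical data, for illustration only
z, p_value, decision = z_test_for_mean(10.3, 1.5, 36, 10.0, "greater")
print(round(z, 2), round(p_value, 4), decision)
```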
Your turn
A doctor claims that 17 year olds have an average body
temperature that is higher than the commonly accepted
average human temperature of 98.6 degrees Fahrenheit. A
simple random statistical sample of 25 people, each of age 17,
is selected. The average temperature of the 17 year olds is
found to be 98.83 degrees, with standard deviation of 0.6
degrees.
The doctor hires you to perform a statistical
significance test to check the validity of his claim.
Perform the significance test.
How would your work change if he instead suspected
that 17 year olds have a temperature different than
98.6 but wasn’t sure if they were hotter or colder?
Your turn
• Step 0: Significance level: α = 0.05
• Step 1: Hypotheses: H0 : µ = 98.6 vs. Ha : µ > 98.6
• Step 2: Sampling distribution: Normal with mean = 98.6 and
SD = $0.6/\sqrt{25} = .12$
• Step 3: Data: Observation is 98.83, and we standardize it to
$\frac{98.83 - 98.6}{.12} \approx 1.9$
• Step 4: P-value: From Table B, 1-.9713 = .0287
• Step 5: Since .0287 < .05 (p-value < α), we reject the null
hypothesis and claim that we have a significant test result at
the .05 level. So, we conclude that we have enough evidence
to reject the null hypothesis that 17-year-olds have a 98.6
degree average temperature in favor of the claim that
they have a higher average body temperature.
Your turn
[Figure: Sampling Distribution of Sample Mean, assuming the null hypothesis is true (µ = 98.6). Annotations: sample mean from 17 year olds (98.83); p-value (.0287). x-axis: mean body temp, from 97 to 100]
Your turn
And if we instead did the two-sided test,
• Step 0: Significance level: α = 0.05
• Step 1: Hypotheses: H0 : µ = 98.6, Ha : µ ≠ 98.6
• Step 2: Sampling distribution: Normal with mean = 98.6 and
SD = $0.6/\sqrt{25} = .12$
• Step 3: Data: Observation is 98.83, and we standardize it to
$\frac{98.83 - 98.6}{.12} \approx 1.9$
• Step 4: P-value: From Table B,
2 × (1 − .9713) = 2 × .0287 = .0574
• Step 5: Since .0574 > .05 (p-value > α), we now fail to
reject the null hypothesis! So we don’t have enough evidence
at the .05 significance level to suggest that 17-year-olds have
a different average body temperature than the average
human.
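The one-sided and two-sided answers for the body temperature problem can be compared with a few lines of code. This is a sketch that assumes we use `math.erfc` for the Normal tail area instead of Table B. The computer keeps the standard score at about 1.92 instead of rounding to 1.9, so the p-values land near .028 and .055 instead of .0287 and .0574. The two conclusions don't change.

```python
import math

mu0 = 98.6          # claim in H0
sample_mean = 98.83
sample_sd = 0.6
n = 25
alpha = 0.05

se = sample_sd / math.sqrt(n)                  # 0.6 / 5 = 0.12
z = (sample_mean - mu0) / se                   # standard score of the sample mean
one_sided = 0.5 * math.erfc(z / math.sqrt(2))  # Ha: mu > 98.6
two_sided = 2 * one_sided                      # Ha: mu != 98.6

print(round(z, 2), round(one_sided, 4), round(two_sided, 4))
print("one-sided rejects H0:", one_sided < alpha)  # prints True
print("two-sided rejects H0:", two_sided < alpha)  # prints False
```

Same data, same α, different conclusions: that's why the alternative hypothesis has to be chosen before looking at the data.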
Your turn
[Figure: Sampling Distribution of Sample Mean, assuming the null hypothesis is true. Annotations: sample mean from 17 year olds; p-value. x-axis: mean body temp, from 97 to 100]
Homework
• The final HW is up on the website
• After today you can finish reading Ch. 22
• Do problems:
22.24 (use significance level α = 5% = .05)
22.27 (use significance level α = 1% = .01)
22.28
22.32
22.34) A professor once claimed to Aaron that in a small
discussion course, about 10% of the students fall asleep
at some point during class. During Monday’s lecture,
Aaron counted that 1 out of the 20 students in class was
asleep at some point. Is this evidence that the true
proportion is different from 10%? (use significance level α
= 5% = .05, and note that you will need a two-sided
alternative hypothesis)