Comparing Two Proportions

Case Study
Recall the question that was actually asked
in the CPR study reported in the NEJM.
• Do we need to give mouth-to-mouth
ventilation and chest compression?
• Or will just doing chest compression alone
be just as effective?
Summary
• In the Seattle study, heart-attack victims were
randomly assigned to two groups: full CPR or
chest compression alone.
• They found a 10.4% survival rate for those
receiving full CPR (x = 29, n = 278) and a 14.6%
survival rate for those receiving chest
compression alone (x = 35, n = 240).
• The trial was designed to detect a 3.5%
improvement of chest compression alone over
full CPR.
Question
• Is there any difference in the survival
proportions of dispatcher-instructed
bystander administered CPR depending
on whether mouth-to-mouth ventilation is
used or not?
Steps for Hypothesis Testing
Phase 1: State the Question
1. Evaluate and describe the data
2. Review the assumptions
3. State the question—in the form of
hypotheses
©AMB
Phase 2: Decide How to
Answer the Question
4. Decide on a summary number—a
statistic—that reflects the question
5. How could random variation affect that
statistic?
6. State a decision rule, using p-values, to
answer the question
Phase 3: Answer the Question
7. Calculate the statistic
8. Make a statistical decision
9. State the substantive conclusion
Phase 4: Communicate the
Answer to the Question
10. Document our understanding with text,
tables, or figures
The NEJM CPR Results
How do these steps get applied in the
case of comparing two proportions?
Phase 1: State the Question
1. Evaluate and describe the data
Contingency Table
• The number of patients in each group, and
the number of survivors (but not the non-survivors), are shown in Table 4.
• This form of tabular display is somewhat
like a contingency table.
• The contingency table corresponding to
these results is:
This form of tabular display is called:
• a contingency table
• a cross-tabulation table
• a two-way classification
• a 2 x 2 table
Observations
• We observed n = 278 CPR patients who
received instructions by phone, of whom x = 29
survived to hospital discharge.
• We observed n = 240 chest-compression alone
patients, of whom x = 35 survived. Overall
(ignoring group membership), there were 64
survivors out of a total of 518.
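The observed proportions follow directly from these counts. A minimal Python sketch (the dictionary layout and the names `groups`, `proportion`, `p_hat`, and `p_overall` are illustrative assumptions, not part of the study):

```python
# Observed counts from the study: x survivors out of n patients per group.
groups = {
    "full CPR": {"x": 29, "n": 278},
    "chest compression alone": {"x": 35, "n": 240},
}


def proportion(x, n):
    """Return the sample proportion x / n."""
    return x / n


# Survival proportion within each group.
p_hat = {name: proportion(g["x"], g["n"]) for name, g in groups.items()}

# Overall proportion, ignoring group membership.
total_x = sum(g["x"] for g in groups.values())
total_n = sum(g["n"] for g in groups.values())
p_overall = proportion(total_x, total_n)

for name, p in p_hat.items():
    print(f"{name}: {p:.3f}")
print(f"overall: {p_overall:.3f} ({total_x} of {total_n})")
```

Rounded to three decimals, these are the 0.104, 0.146, and 0.124 used throughout.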
Histogram
A histogram visually compares two things that
should be compared (proportions)
Tabular Display of Proportions
• Notice the columns sum to 1
• Each proportion was calculated separately for
each population or treatment group
Questions?
• What proportion of everyone receiving chest
compression plus mouth-to-mouth ventilation, survived
to hospital discharge?
• Of those receiving chest compression alone, what
proportion survived to discharge?
• Of those receiving chest compression plus mouth-to-mouth
ventilation, what proportion did not survive?
• Of those receiving chest compression alone, what
proportion did not survive?
Which display?
The intent is to compare the two survival
proportions. So which display(s) are best?
Use a display(s) that describes the sample
and the statistic being compared.
Example
Population proportion (N)

Variable         Chest compression and   Chest compression   Row total
                 mouth-to-mouth          alone
Survived         0.104 (29)              0.146 (35)          0.124 (64)
Did not survive  0.896 (249)             0.854 (205)         0.876 (454)
Column total     1.00 (278)              1.00 (240)          1.00 (518)
2. Review assumptions
As in the case where we’re interested in a
single proportion, with two proportions we
must also meet three assumptions:
• representativeness,
• independence, and
• sample size
Representativeness
Are the subjects in each group representative of
some population of interest?
If the study subjects were chosen as a simple random sample from a larger population and if
these subjects were randomly assigned to the
two groups, then we can be comfortable that the
information in this sample is representative of
the population of interest.
Independence
Does the response of one subject depend
on the response of another?
If not, then the subjects are independent.
Sufficient size
In order for the test statistic to follow the normal
distribution, n must be large enough to expect both 5
survivals and 5 non-survivals in each group.
As in the single population case, we are not asking whether
you observed at least 5 subjects in each cell.
To check this, we must calculate the expected number
of subjects under the null-hypothesis. But we have
not yet stated the hypotheses. Let’s do that and then
come back
3. State the question
Are the proportions in the two groups the
same?
The alternative is that the two groups have a
different survival proportion.
H0: pCPR = pchest
HA: pCPR ≠ pchest
If Null is True
• The two groups are said to be
homogeneous (Of uniform nature, similar in kind)
• The two proportions are the same.
• If they are the same, it’s convenient to think
of the proportion as a single number, p.
• So, another way to think of the null
hypothesis is:
H0: pCPR = pchest = p
What is p?
What is the best estimate of p, the survival
proportion under the null hypothesis?
We observed a total of 64 survivors out of 518
people so we’ll use this, called p-bar:
$$ \bar{p} = \frac{x_1 + x_2}{n_1 + n_2} $$
Revisit sample size
If the true proportion is the same for both
groups, we should use p-bar to determine
if there is sufficient size.
If the null hypothesis is true, how many
people do we expect to see in each of the
four cells?
We keep the number of subjects in each
group fixed and use p-bar
Survival groups
If you have 278 people (full CPR) and a proportion of 0.124
survive, how many do you expect to survive?
(p-bar) × nCPR = 278 × 0.124 = 34.3
If you have 240 people (chest compression alone) and a proportion
of 0.124 survive, how many do you expect to survive?
(p-bar) × nChest = 240 × 0.124 = 29.7
(Both results use the unrounded p-bar = 64/518.)
Non-survival groups?
Expected counts under the null hypothesis:

Variable         Chest compression and   Chest compression   Row total
                 mouth-to-mouth          alone
Survived         34.3                    29.7                64
Did not survive  243.7                   210.3               454
Column total     278                     240                 518
• Is this assumption for our statistical test met?
(Are the expected counts in all cells at least 5?)
• If it is, then we can trust that the sample
proportion will be normally distributed. If we can
trust that the sample proportion is normally
distributed, then we can calculate a p-value.
• If we can calculate a p-value we trust, then we
can make a decision with understandable risk.
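The sample-size check can be sketched in Python as follows. This is only an illustration: the function name `expected_counts` and the group labels are assumptions, and the threshold of 5 is the rule of thumb stated in these slides.

```python
def expected_counts(x1, n1, x2, n2):
    """Expected (survivors, non-survivors) per group under H0.

    Under H0 both groups share one survival proportion, estimated by the
    pooled proportion p_bar. Group sizes n1 and n2 are held fixed.
    """
    p_bar = (x1 + x2) / (n1 + n2)
    return {
        "group1": (n1 * p_bar, n1 * (1 - p_bar)),
        "group2": (n2 * p_bar, n2 * (1 - p_bar)),
    }


# Group 1 = full CPR (29 of 278), group 2 = chest compression alone (35 of 240).
expected = expected_counts(29, 278, 35, 240)

# The assumption holds when every expected cell count is at least 5.
size_ok = all(count >= 5 for cells in expected.values() for count in cells)

for name, (surv, non_surv) in expected.items():
    print(f"{name}: survive {surv:.1f}, not survive {non_surv:.1f}")
print("sufficient size:", size_ok)
```

Note that the check uses expected counts, not the observed ones.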
Phase 2: Decide How to Answer the
Question
4. Decide on a summary statistic that reflects
the question
• We want to know if the two proportions are the
same:
H0: pCPR = pchest = p
• This is equivalent to asking if the difference
between the two is zero:
H0: pCPR - pchest = 0
One versus Two Proportions
• Recall that when looking at one proportion
there were three possibilities for null
hypotheses.
• In the case when we’re looking at two
proportions we’re almost always interested
in the null-hypothesis: “same proportions”
and the alternative hypothesis: “different
proportions.”
Generic Test Statistic
• From our earlier discussion, recall that the
generic test statistic is:
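$$ z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 (1 - p_0)}{n}}} $$

For H0: pCPR - pchest = 0, with group 1 = full CPR and group 2 = chest
compression alone, this becomes:

$$ z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{SE_0} = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n_1} + \dfrac{\bar{p}(1 - \bar{p})}{n_2}}} $$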
5. How could random variation
affect that statistic?
• If the null hypothesis is true, then z is expected
to be close to zero. Since the assumptions are met,
z is approximately normally distributed.
• Extreme values of z reflect larger
differences and thus favor the alternative
hypothesis.
6. State a decision rule, using the
statistic, to answer the question
• Just like in the first case study, if we are willing to
reject a true null hypothesis 5% of the time, our
decision rule is to choose to believe:
H0: pCPR – pchest = 0. Choose this if p-value ≥ α (usually 0.05)
HA: pCPR – pchest ≠ 0. Choose this if p-value < α (usually 0.05)
Phase 3: Answer the Question
7. Calculate the statistic
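We’ve already calculated the sample proportion for full CPR as 0.104.
For chest compression alone, nchest = 240:

$$ \hat{p}_{chest} = \frac{35}{240} = 0.146 $$

$$ \bar{p} = \frac{x_1 + x_2}{n_1 + n_2} = 0.124 $$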
Z-score
$$ z = \frac{(0.104 - 0.146) - 0}{\sqrt{\dfrac{0.124 (1 - 0.124)}{278} + \dfrac{0.124 (1 - 0.124)}{240}}} = \frac{-0.042}{0.029} = -1.432 $$
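The same calculation can be checked numerically. A minimal Python sketch, assuming group 1 is full CPR and group 2 is chest compression alone (the function name `two_proportion_z` is illustrative):

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0, using the pooled standard error."""
    p1 = x1 / n1
    p2 = x2 / n2
    # Pooled proportion: the best estimate of the common p under H0.
    p_bar = (x1 + x2) / (n1 + n2)
    # Standard error of the difference when H0 is true.
    se0 = math.sqrt(p_bar * (1 - p_bar) / n1 + p_bar * (1 - p_bar) / n2)
    return (p1 - p2) / se0


z = two_proportion_z(29, 278, 35, 240)
print(f"z = {z:.3f}")
```

Working from the raw counts instead of the rounded proportions avoids accumulating rounding error in the intermediate steps.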
8. Make a Statistical Decision
• Determine the p-value
• To calculate a p-value, use the “two-tail”
method where we are interested in
calculating the probability of differences
between the two proportions as large or
larger than we observed.
Using p-value Calculator
For z = -1.43
In words
• The p-value = 0.1521.
• Since p-value > α = 0.05, we will fail to
reject the null hypothesis.
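If no p-value calculator is at hand, the two-tailed p-value can be computed from the standard normal distribution. A minimal Python sketch using only the standard library (the names `two_tailed_p`, `alpha`, and `decision` are illustrative):

```python
import math


def two_tailed_p(z):
    """Two-tailed p-value for a standard normal test statistic z."""
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


p_value = two_tailed_p(-1.432)
alpha = 0.05

# Decision rule: choose HA only when the p-value falls below alpha.
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p-value = {p_value:.4f}: {decision}")
```

Because the absolute value of z is used, the sign of z does not affect the result.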
9. State the Substantive Conclusion
There is insufficient evidence to
conclude that the two survival
proportions are different.
Phase 4: Communicate the Answer to the
Question
10. Document our understanding with text, tables, or
figures
For a dispatcher-instructed bystander-administered
intervention after a cardiac arrest, is the survival
proportion for full CPR different from the survival
proportion with chest compression alone? In this study, n
= 278 patients were randomized to the chest-compression and mouth-to-mouth ventilation group, and
we observed p = 0.104 (x =29) survived until hospital
discharge.
Step 10 (cont)
And n = 240 patients were randomized to
the chest-compression alone group, where
we observed p = 0.146 (x =35) survived
until hospital discharge. Thus, there was a
nominal improvement in survival of 4.2%
but the two proportions were compared
and found to be not significantly different
(z = 1.4, p-value = 0.1521).
Question: Why did we report a
positive z value?
If all we’re doing is testing “is A
different than B?”, we could just as
well have phrased the question as “is B different
than A?”.
Thus, the sign does not matter. So, by convention,
we report the positive value.
Question: Why is our p-value different than the one
reported in the NEJM paper?
On page 1547 of the paper, in the last
paragraph of methods it says:
“The primary analysis consisted of a
simple comparison of proportions by
Fisher’s exact test.”
Fisher’s Exact Test
Fisher’s exact test determines the exact
probability of obtaining the observed results
or results that are more extreme.
The p-value from the z-score is an asymptotic
approximation, based on large samples, that requires
the normality assumption to be met.
Advantage to Fisher’s
• We can use it even if the sample sizes are
too small for the normal approximation
assumptions to be met.
• That is, we can use it even when we don’t expect
to see at least 5 responses in each cell.
Fisher’s method
• Fisher’s idea was that with small samples
we don’t have to approximate the
distribution with z to calculate p-values.
• We can enumerate (count) all the possible
outcomes and calculate p-values exactly.
Enumeration
Let’s look at a simple example. Fisher used an
example of a woman tasting tea.
• A British woman claimed to be able to
distinguish whether milk or tea was added to the
cup first.
• The null hypothesis is that there is no such ability.
• Let’s use a more up-to-date question. Can you
tell the difference between Coke and Pepsi?
Two cups
• Pour, hidden from you, two soft-drink
cups. One with Coke and one with Pepsi.
• Then I ask you: “Which is Coke? And
which is Pepsi?”
• What are the possible outcomes of this
experiment?
Possible Results
• We can look at the exact distribution of
the number correct.
• Thus we can determine the p-value for
each possible result.
Four Cups
Assuming an equal number of Cokes and
Pepsis, the next larger experiment would be
4 cups:
Results?
If someone is guessing randomly, these 6
possibilities are equally likely.
Conclusion
• So if someone got all 4 right, we would be able to
conclude that this person could “… tell the
difference between Coke and Pepsi, p-value = 0.1667.”
• Would this be convincing?
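The enumeration can be reproduced by listing every way of naming half the cups as Coke. A minimal Python sketch, assuming an even number of cups split equally between the two drinks (the function name `guess_distribution` is illustrative):

```python
from itertools import combinations


def guess_distribution(n_cups=4):
    """Exact distribution of the number of cups identified correctly
    when the taster guesses at random.

    Half the cups hold Coke, and the taster names which half.
    """
    k = n_cups // 2
    truth = set(range(k))  # assume cups 0..k-1 really hold Coke
    picks = list(combinations(range(n_cups), k))  # all equally likely guesses
    counts = {}
    for pick in picks:
        coke_right = len(truth & set(pick))
        # With equal halves, each correctly named Coke cup also means
        # one correctly named Pepsi cup.
        correct = 2 * coke_right
        counts[correct] = counts.get(correct, 0) + 1
    return {c: n / len(picks) for c, n in sorted(counts.items())}


dist = guess_distribution(4)
for correct, prob in dist.items():
    print(f"{correct} correct: probability {prob:.4f}")
```

With 4 cups there are 6 possible guesses, and only one of them gets all 4 right, which is where the p-value of 1/6 = 0.1667 comes from.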
Calculation of Fisher’s exact p-values
• How are we going to use this exact test in
practice?
• Fortunately, software can calculate these
p-values easily.
• So how do you interpret the output?
Reports all p-values
Which one?
• The most conservative p-value to report is
the “2-tail” one.
• In this case that’s what they did in the
NEJM paper.
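To illustrate what such software does, here is a minimal Python sketch of a two-sided Fisher's exact test for a 2 x 2 table. It assumes one common definition of "two-sided" (sum the probabilities of all tables, with the same margins, that are no more likely than the observed table). Statistical packages may use other conventions, and this is not necessarily the calculation behind the published result. The function name `fisher_exact_two_sided` is illustrative.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]].

    Row and column totals are held fixed, so the table is determined by
    cell a, which follows a hypergeometric distribution under H0.
    """
    row1 = a + b  # first row total (e.g. all survivors)
    col1 = a + c  # first column total (e.g. size of group 1)
    n = a + b + c + d
    denom = comb(n, col1)

    def prob(k):
        # Exact probability that cell a equals k, kept as a fraction.
        return Fraction(comb(row1, k) * comb(n - row1, col1 - k), denom)

    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # Add up every table that is no more likely than the observed one.
    total = sum(p for p in (prob(k) for k in range(lo, hi + 1)) if p <= p_obs)
    return float(total)


# CPR data: rows = survived / did not survive, columns = full CPR / chest alone.
p_fisher = fisher_exact_two_sided(29, 35, 249, 205)
print(f"Fisher exact two-sided p-value = {p_fisher:.4f}")
```

Exact fractions are used so that ties between table probabilities are compared without floating-point error.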
Short cut: Comparing Two
Proportions
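We start by labeling the four cells with the letters a
through d, with the groups in columns and the outcomes in rows:

                 Group 1   Group 2
Survived         a         b
Did not survive  c         d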
The Statistic
It’s actually the square of the z statistic we have
already seen:

$$ \chi^2 = \frac{n (ad - bc)^2}{(a + c)(b + d)(a + b)(c + d)} $$
CPR Example
$$ \chi^2 = \frac{518 (29 \cdot 205 - 35 \cdot 249)^2}{(29 + 249)(35 + 205)(29 + 35)(249 + 205)} = 2.05 $$

→ notice that $\sqrt{2.05} = 1.43$
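The shortcut is easy to verify numerically. A minimal Python sketch, assuming the cell labels a = 29, b = 35, c = 249, d = 205 used in the example (the function name `chi_square_2x2` is illustrative):

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2 x 2 table via the shortcut formula."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    return numerator / denominator


chi2 = chi_square_2x2(29, 35, 249, 205)
z_abs = math.sqrt(chi2)  # equals the absolute value of the pooled z statistic
print(f"chi-square = {chi2:.2f}, square root = {z_abs:.2f}")
```

The square root matches the magnitude of the z statistic computed with the pooled standard error, without the sign.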
Decision Rule
• The decision rule is straightforward.
• Take the square root of the χ² value (it is
the absolute value of z) and look up the p-value.
Confidence Interval
• Similar to the one-proportion CI, but use both
observed proportions and an “average” SE:

$$ (\hat{p}_1 - \hat{p}_2) \pm z_{(1 - \alpha/2)} \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}} $$
CPR Example
$$ (0.104 - 0.146) \pm 1.96 \sqrt{\frac{0.104 (1 - 0.104)}{278} + \frac{0.146 (1 - 0.146)}{240}} $$

$$ -0.042 \pm 1.96 \, (0.029) $$

$$ [-0.099, \; 0.016] $$
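The interval can be reproduced from the raw counts. A minimal Python sketch, assuming group 1 is full CPR and group 2 is chest compression alone, with 1.96 as the critical value for 95% confidence (the function name `diff_ci` is illustrative):

```python
import math


def diff_ci(x1, n1, x2, n2, z_crit=1.96):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1 = x1 / n1
    p2 = x2 / n2
    # Unlike the test statistic, the CI uses each group's own proportion.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se


lower, upper = diff_ci(29, 278, 35, 240)
print(f"95% CI for the difference: [{lower:.3f}, {upper:.3f}]")
```

The standard error here differs from the one in the hypothesis test, which pools the two groups because it assumes the null hypothesis is true.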
Interpretation
We’re 95% confident that the interval –9.9%
to 1.6% covers the true difference in the
population survival proportion from full
CPR versus chest compression alone.
Notice
Note: The 95% CI includes zero, meaning that,
using the confidence interval alone to test the
difference, we would fail to reject the hypothesis
that the difference is zero. We cannot conclude that
the treatment groups differ.
If you find a significant difference, you should add
the confidence interval about the observed
difference to step 10 of the hypothesis testing
steps.
Review
We have applied the ten steps of hypothesis testing to
comparing a single observed proportion to an assumed
proportion and to comparing two observed proportions.
We tested the two observed proportions by actually testing
if the difference of the two observed proportions is equal
to no difference.
We will continue to apply the 10 steps of hypothesis testing
to other types of hypothesis tests, such as comparing a
single mean to an assumed mean, comparing two
means, and comparing several means.