COMPARING TWO POPULATION PROPORTIONS USING INDEPENDENT SAMPLES

COMPARING TWO POPULATION (OR TREATMENT) PROPORTIONS
EXAMPLE: The article “Foraging Behavior in the
Indian False Vampire Bat” reported that 36 of 193
female bats in flight spent more than 5 minutes in the
air before locating food. For male bats, 64 of 168
exceeded 5 minutes when locating food. Is there
sufficient evidence to indicate that the proportion of
flights taking longer than 5 minutes differs for the
two sexes?
Note: two independent samples and the interest is in
comparing the proportions for the two genders
Notation:
Population   Population Proportion   Sample Size   Sample Proportion
1            π1                      n1            π̂1
2            π2                      n2            π̂2
To compare two population proportions we usually
consider the size of the difference π1 − π2:
π1 − π2 = 0 ⇒ π1 = π2
π1 − π2 > 0 ⇒ π1 > π2
π1 − π2 < 0 ⇒ π1 < π2
Our sampling estimate of this difference is the
difference in the sample proportions, π̂1 − π̂2, when
the two samples are independent of one another.
The estimator of π1 − π2 is π̂1 − π̂2.
Sampling Distribution of π̂1 − π̂2 when the two
samples are independently and randomly taken:
1) the mean of the distribution is
$$\mu_{\hat\pi_1 - \hat\pi_2} = \pi_1 - \pi_2$$
(that is, π̂1 − π̂2 is unbiased)
2) the standard deviation of the distribution is
$$\sigma_{\hat\pi_1 - \hat\pi_2} = \sqrt{\frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}}$$
3) the shape of the distribution is approximately
normal (a bell curve) if both n1 and n2 are large.
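The mean and standard deviation stated above can be checked informally by simulation. The sketch below is only an illustration: the population proportions 0.20 and 0.40, the number of repetitions, the seed, and the function name `simulate_differences` are all assumptions made for the demonstration, not values from the bat study.

```python
import random
import statistics

def simulate_differences(pi1, pi2, n1, n2, reps=5000, seed=1):
    """Draw `reps` pairs of independent samples and return the simulated
    mean and standard deviation of (sample proportion 1 - sample proportion 2)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        # Count successes in each independent sample
        x1 = sum(rng.random() < pi1 for _ in range(n1))
        x2 = sum(rng.random() < pi2 for _ in range(n2))
        diffs.append(x1 / n1 - x2 / n2)
    return statistics.mean(diffs), statistics.pstdev(diffs)

# Hypothetical population proportions, chosen only for illustration
pi1, pi2 = 0.20, 0.40
n1, n2 = 193, 168

sim_mean, sim_sd = simulate_differences(pi1, pi2, n1, n2)

# Values predicted by the formulas for the mean and standard deviation
theory_mean = pi1 - pi2
theory_sd = (pi1 * (1 - pi1) / n1 + pi2 * (1 - pi2) / n2) ** 0.5

print(f"mean: simulated {sim_mean:.4f}, formula {theory_mean:.4f}")
print(f"sd:   simulated {sim_sd:.4f}, formula {theory_sd:.4f}")
```

With samples of this size the simulated values should land close to the formula values, and a histogram of the simulated differences would look roughly bell-shaped.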
The sample sizes are usually considered to be
sufficiently large if the following statement is true:
n1π̂1 ≥ 5, n1(1 − π̂1) ≥ 5
n2π̂2 ≥ 5, n2(1 − π̂2) ≥ 5
This, in words, says that each sample should contain
at least 5 successes and at least 5 failures. Hence,
at a minimum, n1 ≥ 10 and n2 ≥ 10 (and samples that
small are enough only if π̂1 = π̂2 = 0.5).
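Since n·π̂ is just the number of successes and n(1 − π̂) the number of failures, the large-sample condition can be checked directly from the counts. A minimal sketch follows, using the bat counts from the opening example; the function name `large_sample_ok` is a hypothetical helper introduced here.

```python
def large_sample_ok(successes, n, minimum=5):
    """Return True if a sample has at least `minimum` successes
    and at least `minimum` failures."""
    failures = n - successes
    return successes >= minimum and failures >= minimum

# Counts from the bat example: (flights over 5 minutes, total flights)
samples = {"female": (36, 193), "male": (64, 168)}

for label, (x, n) in samples.items():
    print(label, "successes:", x, "failures:", n - x,
          "condition met:", large_sample_ok(x, n))
```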
As we'll see, the estimator of
$$\sigma_{\hat\pi_1 - \hat\pi_2} = \sqrt{\frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}}$$
depends on whether we are constructing a confidence
interval or performing a test of the difference in the
two population proportions.
Large Sample Test of the Difference in Two
Population Proportions Based on Two
Independent Samples
Null hypothesis:
H0: π1 − π2 = 0
Alternative hypothesis is one of three:
a) HA: π1 − π2 > 0
b) HA: π1 − π2 < 0
c) HA: π1 − π2 ≠ 0
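Test Statistic:
$$z = \frac{\hat\pi_1 - \hat\pi_2}{\sqrt{\hat\pi_C(1 - \hat\pi_C)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where
$$\hat\pi_C = \frac{n_1\hat\pi_1 + n_2\hat\pi_2}{n_1 + n_2} = \frac{\text{total number of successes in both samples}}{\text{total sample size}}$$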
P-value: depends on the alternative hypothesis:
a) P-value = Pr( Z > z)
b) P-value = Pr( Z < z)
c) P-value = 2 Pr( Z < - |z| )
Decision Rule: reject H0 if P-value ≤ α
Assumptions:
1. n1 and n2 are large enough for the sample
proportions to be approximately normally distributed
2. each sample was drawn at random and makes up no
more than 5% of its population
3. the two samples are independently taken
EXAMPLE Bats:
Sample Statistics:
Population   Sample Size   # Successes   Sample Proportion
1 = female   n1 = 193      36            π̂1 = 36/193 = .1865
2 = male     n2 = 168      64            π̂2 = 64/168 = .3809
Hypotheses:
H0: π1 − π2 = 0
HA: π1 − π2 ≠ 0
Assumptions:
n1π̂1 ≥ 5, n1(1 − π̂1) ≥ 5
n2π̂2 ≥ 5, n2(1 − π̂2) ≥ 5
have been met (36 successes and 157 failures for the
females, 64 successes and 104 failures for the males).
And we have 2 random samples.
Significance level:
let’s use α = 0.05
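Test Statistic: first we need the common proportion
$$\hat\pi_C = \frac{n_1\hat\pi_1 + n_2\hat\pi_2}{n_1 + n_2} = \frac{36 + 64}{193 + 168} = .277$$
Then,
$$z = \frac{\hat\pi_1 - \hat\pi_2}{\sqrt{\hat\pi_C(1 - \hat\pi_C)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{.1865 - .3809}{\sqrt{.277(1 - .277)\left(\frac{1}{193} + \frac{1}{168}\right)}} = -4.12$$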
P-value: 2 Pr(Z < −|z|) = 2 Pr(Z < −4.12) < 0.0001
Conclusions: We reject the null hypothesis since the
P-value (< 0.0001) is far below α = 0.05. There is strong
evidence, based on these samples, that the population
proportion of female false vampire bats who take
longer than 5 minutes searching for food is different
from the proportion for male bats.
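The test procedure can be collected into a short function and used to check the bat calculation. This is a sketch under the large-sample assumptions listed earlier; the function name `two_proportion_z_test` and the labels for the three alternatives are choices made here, not part of the original notes. It relies on `statistics.NormalDist` from the Python standard library (Python 3.8 or later) for the standard normal probabilities.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Large-sample z test of H0: pi1 - pi2 = 0 from success counts x1, x2.

    alternative: "greater" (HA: pi1 - pi2 > 0), "less" (HA: pi1 - pi2 < 0),
    or "two-sided" (HA: pi1 - pi2 != 0). Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    # Combined proportion: total successes over total sample size
    pc = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se

    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)          # Pr(Z > z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)              # Pr(Z < z)
    elif alternative == "two-sided":
        p_value = 2 * std_normal.cdf(-abs(z))    # 2 Pr(Z < -|z|)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value

# Bat example: 36 of 193 females and 64 of 168 males exceeded 5 minutes
z, p_value = two_proportion_z_test(36, 193, 64, 168, alternative="two-sided")
print(f"z = {z:.2f}, P-value = {p_value:.6f}")
```

Run on the bat counts, the function should reproduce z ≈ −4.12 and a P-value below 0.0001.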
EXAMPLE Old Faithful, the geyser at Yellowstone
National Park, is known to have two distinct types of
eruptions: long-duration (> 3 minutes) and
short-duration (< 3 minutes). If the types of eruptions
are equally likely at all times of the day, then the
proportion of long-duration eruptions occurring
during the day should be the same as the proportion
at night. A geologist hypothesized that eruption
duration is affected by solar heating during the day
and hence that the proportion of daytime long-duration
eruptions should be higher than the night-time
proportion. Two samples were taken in August over
several days and nights. The geologist observed 53%
long-duration eruptions during the day (out of 35
eruptions) and 49% (out of 41 eruptions) at night. Is
there sufficient evidence to support the geologist's
claim? Use a significance level of 0.025.
Hypotheses:
Ho: π1 − π 2 = 0
HA: π1 − π 2 > 0
(population 1 is the daytime eruptions and 2, the
night time)
Assumptions:
n1π̂1 ≥ 5, n1(1 − π̂1) ≥ 5
n2π̂2 ≥ 5, n2(1 − π̂2) ≥ 5 ?
random samples?
Significance level:
α = 0.025
Test Statistic: first we need the common proportion
Then,
$$z = \frac{\hat\pi_1 - \hat\pi_2}{\sqrt{\hat\pi_C(1 - \hat\pi_C)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
P-value:
Conclusions:
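One possible way to work through the blank steps of the Old Faithful example numerically is sketched below. It is not the notes' own solution. Because the reported percentages (53% of 35, 49% of 41) do not correspond to whole numbers of eruptions, the sketch plugs the reported proportions straight into the formulas rather than guessing counts; that choice is an assumption, as are the variable names.

```python
import math
from statistics import NormalDist

# Reported sample sizes and proportions (1 = daytime, 2 = night-time)
n1, p1 = 35, 0.53
n2, p2 = 41, 0.49
alpha = 0.025

# Large-sample check: expected successes and failures in each sample
checks = [n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)]
conditions_met = all(value >= 5 for value in checks)

# Combined proportion and test statistic for H0: pi1 - pi2 = 0
pc = (n1 * p1 + n2 * p2) / (n1 + n2)
se = math.sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

# Upper-tailed alternative HA: pi1 - pi2 > 0, so P-value = Pr(Z > z)
p_value = 1 - NormalDist().cdf(z)

print("conditions met:", conditions_met)
print(f"combined proportion = {pc:.3f}")
print(f"z = {z:.2f}, P-value = {p_value:.3f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

Under these assumptions z comes out at roughly 0.35 with a P-value of about 0.36, well above α = 0.025, so the sketch would not reject H0. Whether the samples can be treated as random is a separate question the calculation cannot answer.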
Large Sample Confidence Interval Estimation of
The Difference Between Two Proportions Based
on Independent Samples:
Interval Estimator:
$$(\hat\pi_1 - \hat\pi_2) \pm z_{\alpha/2}\sqrt{\frac{\hat\pi_1(1 - \hat\pi_1)}{n_1} + \frac{\hat\pi_2(1 - \hat\pi_2)}{n_2}}$$
where the z critical value is based on the confidence
level (1 − α) desired
Assumptions:
1. n1 and n2 are large enough for the sample
proportions to be approximately normally distributed
2. the sampling was random
3. the two samples are independently taken
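Note that the estimator of SE(π̂1 − π̂2) is different
from the one used in hypothesis testing: the interval
uses the separate sample proportions π̂1 and π̂2, while
the test uses the combined proportion π̂C.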
EXAMPLE For the bats, let's use a 90% C.I. to
estimate the difference between females and males in
the proportion of flights that take more than 5 minutes
to locate food.
Now, the z critical value for 90% is 1.645. So, a 90%
C.I. is
$$(\hat\pi_1 - \hat\pi_2) \pm 1.645\sqrt{\frac{\hat\pi_1(1 - \hat\pi_1)}{n_1} + \frac{\hat\pi_2(1 - \hat\pi_2)}{n_2}}$$
$$= (.187 - .381) \pm 1.645\sqrt{\frac{.187(1 - .187)}{193} + \frac{.381(1 - .381)}{168}}$$
$$= -.194 \pm 1.645(.0468)$$
$$= -.194 \pm .077 = (-.271,\ -.117)$$
Hence, with 90% confidence, the population
proportion of female false vampire bats that spend
more than 5 minutes locating food is between 11.7
and 27.1 percentage points lower than the population
proportion of male bats that spend more than 5 minutes
locating food. (We could reverse that and say that the
proportion of males spending more than 5 minutes is
between 11.7 and 27.1 percentage points higher than
the proportion of females.)
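The interval calculation can be reproduced with a few lines of code. The sketch below assumes the large-sample conditions hold and uses the unpooled standard error from the interval estimator; the function name `two_proportion_ci` is hypothetical, and the critical value comes from `statistics.NormalDist.inv_cdf` in the Python standard library rather than from a z table.

```python
import math
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, confidence=0.90):
    """Large-sample confidence interval for pi1 - pi2 from success counts."""
    p1, p2 = x1 / n1, x2 / n2
    # Unpooled standard error: each sample keeps its own proportion
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Critical value leaving (1 - confidence)/2 in each tail
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    margin = z_crit * se
    return diff - margin, diff + margin

# Bat example: 1 = female (36 of 193), 2 = male (64 of 168)
lower, upper = two_proportion_ci(36, 193, 64, 168, confidence=0.90)
print(f"90% C.I. for pi1 - pi2: ({lower:.3f}, {upper:.3f})")
```

On the bat counts this should agree with the hand calculation, giving an interval of about (−.271, −.117).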