Estimation & Tests of Population

ESTIMATION AND TESTING POPULATION PROPORTIONS
INTERVAL ESTIMATION OF THE
POPULATION PROPORTION π
Recall the binomial experiment in which we randomly
select individuals from a population and record, for each
individual, which of two categories they belong to. One of
the two categories is defined as a “Success” and by default
the other is a “Failure”.
e.g. categories that are exhaustive and mutually exclusive:
male vs female
succeed vs. fail
herbaceous vs. woody
trained vs. untrained
The population proportion of successes is denoted π and
the sample proportion of successes is denoted
$$\hat{\pi} = \frac{\text{number of observed successes}}{\text{number of trials}}.$$
e.g. suppose we are studying ESP and we perform an
experiment in which an individual is tested with three different
playing cards. Each test consists of a card being held up and
their guess as to which card it is. If they do not have ESP, then
they have a 1/3 chance (π = 0.33 ) of picking a card correctly.
Now, suppose I run this experiment with the person being
tested 25 times (n) and they get 10 correct. Then I observed a
sample proportion of $\hat{\pi} = 10/25 = 0.40$ correct responses.
If the sample size (n) is sufficiently large (both nπ and
n(1 − π ) ≥ 5 ), then
the sample proportion $\hat{\pi}$ is approximately Normally
distributed with mean $\mu_{\hat{\pi}} = \pi$ and variance
$$\sigma^2_{\hat{\pi}} = \frac{\pi(1-\pi)}{n}.$$
Since we don’t know the population proportion, estimate
the unknown variance with
$$s^2_{\hat{\pi}} = \frac{\hat{\pi}(1-\hat{\pi})}{n}$$
and we check if the sample size is sufficiently large by
checking that both nπˆ and n(1 − πˆ ) ≥ 5 .
e.g. Parents of autistic children are often told that their child is
autistic around 1-2 years of age, approximately the same age
that children receive their MMR vaccinations (mumps, measles
and rubella). As a result some parents claim that the vaccine
causes autism. To test this, a study was done to estimate the
rate of autism in children who receive the MMR vaccine. In a
sample of 8,500 randomly selected children who did receive
the MMR vaccine, the proportion with autism was .00282. Can
we assume approximate normality?
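Here $\hat{\pi} = 0.00282$ and $n = 8500$, so
$n\hat{\pi} = 8500(0.00282) = 23.97 \ge 5$ and
$n(1-\hat{\pi}) = 8500 - 23.97 = 8476.03 \ge 5$.
Both conditions hold, so approximate normality is reasonable.
The estimated variance is
$$s^2_{\hat{\pi}} = \frac{\hat{\pi}(1-\hat{\pi})}{n} = \frac{0.00282(0.99718)}{8500} \approx 3.308 \times 10^{-7},$$
which gives an estimated standard deviation of $s_{\hat{\pi}} \approx 0.0005752$.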
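The large-sample check and the estimated variance for the MMR sample can be reproduced with a few lines of Python. This is only a sketch: the function names and the default threshold argument are illustrative choices, not part of the notes.

```python
from math import sqrt

def large_sample_check(p_hat, n, threshold=5.0):
    """Return (n*p_hat, n*(1 - p_hat), True if both are at least threshold)."""
    successes = n * p_hat
    failures = n * (1 - p_hat)
    return successes, failures, (successes >= threshold and failures >= threshold)

def estimated_variance(p_hat, n):
    """Plug-in estimate of the variance of the sample proportion."""
    return p_hat * (1 - p_hat) / n

# Values from the MMR example: sample proportion 0.00282, n = 8500
p_hat, n = 0.00282, 8500
successes, failures, ok = large_sample_check(p_hat, n)
var_hat = estimated_variance(p_hat, n)
sd_hat = sqrt(var_hat)

print(f"n*p_hat = {successes:.2f}, n*(1 - p_hat) = {failures:.2f}, large enough: {ok}")
print(f"estimated variance = {var_hat:.4g}, estimated sd = {sd_hat:.7f}")
```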
A large sample 95% confidence interval for the
population proportion π is
$$\hat{\pi} \pm 1.96\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}$$
• Large-sample means that the sampling was done
randomly and the sample size is sufficiently large to
invoke the Central Limit Theorem.
• 1.96 is the z-score, z*, that makes the following
statement true: 0.95 = Pr(−z* < Z < +z*). We use it
because the CLT implies that sample proportions are
approximately Normally distributed for large samples.
The formula is easily adapted for other confidence levels.
Simply replace 1.96 with the appropriate number from the
table below. The z critical values for common confidence
levels are:
Confidence level    z critical value
80%                 1.28
90%                 1.645
95%                 1.96
98%                 2.33
99%                 2.58
99.9%               3.29
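The critical values in the table can be recovered from the standard Normal distribution rather than looked up. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `z_critical` is an illustrative choice.

```python
from statistics import NormalDist

def z_critical(confidence):
    """z* such that Pr(-z* < Z < z*) equals the given confidence level."""
    alpha = 1.0 - confidence
    # The central area is `confidence`, so each tail holds alpha / 2.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

levels = [0.80, 0.90, 0.95, 0.98, 0.99, 0.999]
table = {level: z_critical(level) for level in levels}

for level, z in table.items():
    print(f"{level:.1%} confidence: z* = {z:.3f}")
```

Rounded to the precision used in the table, these agree with the listed values.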
e.g. Autistic children. A 95% CI for the true proportion of
children who have received the MMR vaccine that are autistic
is given by
$$\hat{\pi} \pm 1.96\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}} = 0.00282 \pm 1.96(0.0005752) = 0.00282 \pm 0.00113.$$
We interpret this to mean that we are 95% confident that the
true proportion of children who have received the MMR vaccine
that are autistic is within the interval (0.00169, 0.00395).
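The interval above can be checked numerically. The following is a minimal sketch of the large-sample interval; the function name `proportion_ci` is hypothetical, and it uses the unrounded critical value from `statistics.NormalDist` instead of the tabled 1.96.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# MMR example: sample proportion 0.00282 from n = 8500 children
lower, upper = proportion_ci(0.00282, 8500)
print(f"95% CI: ({lower:.5f}, {upper:.5f})")
```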
e.g. A researcher flew to the South Pacific and collected 150
fiddler crabs. For each crab she recorded whether the left or
right pincer was dominant and observed that 20 crabs were
left-pincered. Calculate a 90% C.I. to estimate the true
proportion of left-pincered crabs on the island.
1) Is the sample size large enough to use our method?
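2) $\hat{\pi} = 20/150 = 0.1333$, and $s_{\hat{\pi}} = \sqrt{\hat{\pi}(1-\hat{\pi})/n} = \sqrt{0.1333(0.8667)/150} \approx 0.0278$
3) 90% confidence ⇒ z = 1.645
4) The 90% C.I. then is $0.1333 \pm 1.645(0.0278) \approx 0.1333 \pm 0.0457$, or about (0.088, 0.179)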
5) What if we had calculated a 95% C.I.? Would it be wider or
shorter than the 90% C.I.?
6) What if she had seen 13.3% based on a sample of 300
crabs? Would the 90% C.I. be wider or shorter than the one
based on 150 crabs?
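One way to explore questions 4) to 6) is to compute the intervals directly and compare their widths. The sketch below assumes, for question 6), a hypothetical count of 40 left-pincered crabs out of 300, which gives the same 13.3% sample proportion; the function name is also an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence):
    """Large-sample CI for a proportion; returns (lower, upper)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

ci90_150 = proportion_ci(20, 150, 0.90)   # the original sample
ci95_150 = proportion_ci(20, 150, 0.95)   # higher confidence, same data
ci90_300 = proportion_ci(40, 300, 0.90)   # assumed count: same proportion, n doubled

def width(ci):
    return ci[1] - ci[0]

for label, ci in [("90%, n=150", ci90_150), ("95%, n=150", ci95_150), ("90%, n=300", ci90_300)]:
    print(f"{label}: ({ci[0]:.3f}, {ci[1]:.3f}), width {width(ci):.3f}")
```

Raising the confidence level widens the interval, while doubling n with the same sample proportion shrinks the width by a factor of √2.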
Defn: Confidence intervals can be written in the form
point estimate ± MARGIN OF ERROR
where the margin of error (ME) is the product of the
critical value and the standard deviation of the point
estimate.
Suppose the scientist is planning to repeat the fiddler crab
experiment and wants to calculate a 95% confidence interval
with a margin of error of no more than 2.5%. How big a
sample size should she take in the new experiment?
The margin of error must satisfy
$$\text{ME} = 1.96\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}} = 0.025.$$
The earlier experiment gives the estimate $\hat{\pi} = 0.133$, so
we’ll use that. Now we need to solve
$$1.96\sqrt{\frac{0.133(1-0.133)}{n}} = 0.025$$
for n.
The general equation for the sample size needed to achieve a
specified margin of error (ME) when estimating a
population proportion is
$$n = \pi_0(1-\pi_0)\left(\frac{z_{\alpha/2}}{\text{ME}}\right)^2$$
where
• $\pi_0$ is hypothesized as the likely true proportion in
the population (if completely unsure use $\pi_0 = 0.5$)
• $z_{\alpha/2}$ is the z critical value for the desired confidence
level (1−α)100%
• ME is the desired margin of error (as a decimal)
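The sample-size formula is easy to evaluate in code. The sketch below applies it to the fiddler crab question (π0 = 0.133, ME = 0.025, 95% confidence). Rounding up to the next whole number is an assumption made here so that the margin of error is not exceeded; the notes do not state a rounding rule, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(pi0, margin, confidence=0.95):
    """n = pi0 * (1 - pi0) * (z / ME)^2, rounded up (assumed convention)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(pi0 * (1 - pi0) * (z / margin) ** 2)

n_crabs = required_sample_size(0.133, 0.025)        # using the earlier estimate
n_conservative = required_sample_size(0.5, 0.025)   # "completely unsure" case

print(f"Using pi0 = 0.133: n = {n_crabs}")
print(f"Using pi0 = 0.5:   n = {n_conservative}")
```

Because π0(1 − π0) is largest at π0 = 0.5, that choice gives the most conservative (largest) sample size.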
TESTING THE POPULATION PROPORTION π
Let’s walk through one example, put the pieces into a
testing procedure for proportions, and then use the
procedure in another example.
e.g. autistic children. Some parents claim that the MMR
vaccine causes autism. To test this, a study was done to
compare the rate of autism in children who receive the MMR
vaccine to the known population rate for children who do not
receive the vaccine. Among those who did not receive the
vaccine, the proportion of children with autism is 0.0021. In a
sample of 8,500 randomly selected children who did receive
the MMR vaccine, the proportion with autism was .0028. Is
this sufficient evidence to indicate that the vaccine is related to
autism?
1. Hypotheses: H0: π = 0.0021
HA: π > 0.0021
2. Significance level: α =
3. If the null hypothesis is true (which is assumed to be
true until proven otherwise), then
the distribution of the sample proportion that we get from
doing such an experiment, πˆ, has a mean of
$$\mu_{\hat{\pi}} = \pi = 0.0021$$
and a standard deviation of
$$\sigma_{\hat{\pi}} = \sqrt{\frac{\pi(1-\pi)}{n}} = \sqrt{\frac{0.0021(0.9979)}{8500}} \approx 0.0004965.$$
Further, the distribution of πˆ is approximately Normal in
shape if the sample size is big enough (both nπ and
n(1 − π) ≥ 5).
Check:
nπ = 8500(0.0021) = 17.85 > 5
n(1 − π) = 8500 − 17.85 = 8482.15 > 5
So, suppose H0 is true. Is a value of πˆ = 0.0028
sufficiently larger than π0 = 0.0021 to imply that the true
rate is larger than 0.0021?
Let’s convert the observed sample proportion to a z-score
so that we can interpret the difference more easily:
$$z = \frac{\hat{\pi} - \mu_{\hat{\pi}}}{\sigma_{\hat{\pi}}} = \frac{0.0028 - 0.0021}{0.0004965} \approx 1.41$$
(this z-score is assuming that H0 is true!)
This says that 0.0028 is 1.41 standard deviation units above
the hypothesized value of 0.0021. Is this very likely if the
null hypothesis is true? Is it supportive of H0 or HA?
What is the p-value associated with this z-score? We
calculate p-value = Pr(Z > 1.41):
Pr(Z > 1.41) = 1 − Pr(Z ≤ 1.41) = 1 − 0.9207 = 0.0793
So, the probability that a random sample would yield a
sample proportion of 0.0028 or more by chance alone when
the null hypothesis is true is approximately 8%.
So, are the data sufficiently contradictory of H0 for us to
reject it?
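The z-score and upper-tail p-value for the MMR example can be reproduced as a sketch in Python. The variable names are illustrative, and the computation uses the unrounded z rather than the table lookup at 1.41, so the last digits may differ slightly from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist

pi0 = 0.0021    # rate among unvaccinated children (value under H0)
p_hat = 0.0028  # observed proportion in the vaccinated sample
n = 8500

# Standard deviation of the sample proportion, computed assuming H0 is true
se_null = sqrt(pi0 * (1 - pi0) / n)
z = (p_hat - pi0) / se_null

# Upper-tail p-value, since the alternative is pi > pi0
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Whether this p-value leads to rejecting H0 depends on the significance level chosen in step 2.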
Large Sample Hypothesis Test of a Population
Proportion
Null hypothesis: H0: π = πo, where πo is the hypothesized value
Alternative hypothesis is one of three:
a) HA: π > πo
b) HA: π < πo
c) HA: π ≠ πo
Test statistic:
$$z = \frac{\hat{\pi} - \pi_o}{\sqrt{\pi_o(1-\pi_o)/n}}$$
where n is the sample size and πˆ is the observed sample
proportion
P-value: depends on the alternative hypothesis:
a) p-value = Pr( Z > z)
b) p-value = Pr( Z < z)
c) p-value = 2 Pr( Z < -|z|)
Decision Rule: reject Ho if P-value ≤ α
Assumptions:
1. n is large enough for πˆ to be approximately normally
distributed ( nπo≥5 and n(1-πo)≥5 )
2. the sampling was random
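The procedure above can be collected into a single function. This is a sketch under stated assumptions: the function name, the `alternative` labels ("greater", "less", "two-sided"), and the choice to raise an error when the sample-size condition fails are all illustrative, and the data in the usage lines (60 successes in 100 trials, πo = 0.5) are made up for demonstration.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, pi0, alternative="two-sided", alpha=0.05):
    """Large-sample z test of H0: pi = pi0. Returns (z, p_value, reject_H0)."""
    # Assumption 1: sample large enough under H0 (random sampling is not checkable here)
    if n * pi0 < 5 or n * (1 - pi0) < 5:
        raise ValueError("sample too small for the Normal approximation")
    p_hat = successes / n
    z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
    std_normal = NormalDist()
    if alternative == "greater":      # a) HA: pi > pi0
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":       # b) HA: pi < pi0
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":  # c) HA: pi != pi0
        p_value = 2 * std_normal.cdf(-abs(z))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p_value, p_value <= alpha

# Made-up data for illustration only
z, p_value, reject = one_proportion_z_test(60, 100, 0.5, alternative="two-sided")
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```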
e.g. The incidence rate of a certain type of chromosome defect
in adult males in the U.S. is believed to be 1 in 80. A random
sample of 1000 men in prison revealed 20 men with defects. Is
there evidence to suggest that the rate for prisoners differs
from that in the general population? Use a significance level of
0.05.
Null hypothesis: H0: π = πo = 1/80 = .0125
Alternative hypothesis: HA: π ≠ .0125
Check assumptions:
1) nπo = 1000(.0125) = 12.5 ≥ 5
n(1-πo) = 987.5 ≥ 5
2) sampling was given to be random
Test statistic:
$$z = \frac{\hat{\pi} - \pi_o}{\sqrt{\pi_o(1-\pi_o)/n}} = \frac{.02 - .0125}{\sqrt{.0125(1-.0125)/1000}} = 2.1347$$
P-value:
2 Pr(Z < −|z|) = 2 Pr(Z < −2.13) = 2(0.0166) = 0.0332
Conclusion: reject the null hypothesis since
0.0332 < 0.05 = α. There is sufficient evidence based on this
sample to conclude that the population of adult males in the
U.S. penal system has a different rate of this type of
chromosome defect than the general adult male population.
e.g. Suppose a genetic crossing experiment was performed. If
independent sorting of the genes occurs then it is expected that
25% of the offspring would display a certain characteristic. If a
particular type of non-independent event occurs, the proportion
should be smaller. The experiment resulted in 50 plants out of
230 having the characteristic. Is this sufficient evidence to
reject independent sorting of the genes?
Significance level: α=.10
Null hypothesis: H0: π = πo =
Alternative Hypothesis: HA: π
Check assumptions:
1) nπo =
n(1-πo) =
2) sampling?
Test Statistic:
P-value:
Conclusion:
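After working the exercise by hand, the numbers can be checked with a short script. The sketch below assumes a lower-tailed alternative (the text says the proportion "should be smaller" under non-independent sorting) and assumes the plants can be treated as a random sample, which the problem does not state.

```python
from math import sqrt
from statistics import NormalDist

successes, n = 50, 230
pi0 = 0.25     # proportion expected under independent sorting
alpha = 0.10

# Sample-size condition under H0
expected_yes = n * pi0
expected_no = n * (1 - pi0)

p_hat = successes / n
z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)

# Lower-tail p-value for the assumed alternative HA: pi < pi0
p_value = NormalDist().cdf(z)
reject = p_value <= alpha

print(f"n*pi0 = {expected_yes}, n*(1 - pi0) = {expected_no}")
print(f"p_hat = {p_hat:.4f}, z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```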