
Chapter 10 – Inferences Concerning Proportions
Inferences for a Single Population Proportion
We want to make inferences about the value of p, the proportion of a population possessing a certain characteristic. To do so, we need to perform a binomial experiment. To review, an experiment is binomial if it possesses the following characteristics:
1) It consists of a fixed number, n, of trials;
2) The trials are identical to each other, in that they are performed the same way;
3) The trials are independent of each other, meaning that the outcome of one trial gives no information about the outcome of any other trial;
4) Each trial results in one of two possible outcomes: Success or Failure;
5) P(Success) = p for each of the trials.
Let Y = number of successes in our binomial experiment. Then Y is the sum of n independent and identically distributed Bernoulli random variables, and the Central Limit Theorem says that the random variable

$$\hat{p} = \frac{Y}{n} \sim \text{Normal}\left(p,\ \frac{p(1-p)}{n}\right),$$

approximately, for large n; that is, p̂ is approximately normal with mean p and variance p(1 - p)/n.
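A quick Monte Carlo check can make this normal approximation concrete. The sketch below uses only the Python 3.8+ standard library; the function name `simulate_phat`, the values n = 100 and p = 0.3, the number of repetitions and the seed are illustrative assumptions, not from the text. It repeats the binomial experiment many times and compares the simulated mean and variance of p̂ = Y/n with p and p(1 - p)/n.

```python
import random
import statistics

def simulate_phat(n, p, reps, seed=0):
    """Run `reps` binomial experiments of n trials each; return the sample proportions Y/n."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    phats = []
    for _rep in range(reps):
        # Y = number of successes in n independent trials with P(Success) = p
        y = sum(1 for _trial in range(n) if rng.random() < p)
        phats.append(y / n)
    return phats

# Illustrative values (assumptions, not from the text)
n, p = 100, 0.3
phats = simulate_phat(n, p, reps=5000)
print("mean of p-hat:    ", statistics.fmean(phats), " theory:", p)
print("variance of p-hat:", statistics.pvariance(phats), " theory:", p * (1 - p) / n)
```

With these settings the simulated mean should sit close to 0.3 and the simulated variance close to 0.3(0.7)/100 = 0.0021; a histogram of `phats` would look roughly bell-shaped.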

We need one more theoretical result before we can construct our confidence interval estimate for p. Because P̂ converges in probability to p, Slutsky's Theorem tells us that if

$$\frac{\hat{P} - p}{\sqrt{\dfrac{p(1-p)}{n}}}$$

has an approximate standard normal distribution, then

$$\frac{\hat{P} - p}{\sqrt{\dfrac{\hat{P}(1-\hat{P})}{n}}}$$

also has an approximate standard normal distribution.
Confidence Interval for p:
Given a confidence level, 1 - α, we can make the following statement, using the Central Limit Theorem together with Slutsky's Theorem:

$$P\left(-z_{\alpha/2} \le \frac{\hat{P} - p}{\sqrt{\dfrac{\hat{P}(1-\hat{P})}{n}}} \le z_{\alpha/2}\right) \approx 1 - \alpha.$$
Rearranging, we obtain:

$$P\left(\hat{P} - z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}} \le p \le \hat{P} + z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}}\right) \approx 1 - \alpha.$$
Hence an approximate (1 - α)100% confidence interval estimate for p is

$$\hat{P} \pm z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}}.$$
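As a minimal sketch of this interval, assuming Python 3.8+ for `statistics.NormalDist` (the function name `proportion_ci` and the 50-out-of-100 data are hypothetical), the following computes P̂ ± z(α/2)·sqrt(P̂(1 - P̂)/n):

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Approximate large-sample CI for a single population proportion p."""
    p_hat = successes / n
    alpha = 1 - confidence
    # z(alpha/2): the standard normal value with upper-tail area alpha/2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Estimated standard error sqrt(p_hat(1 - p_hat)/n), as justified by Slutsky's Theorem
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical data: 50 successes in n = 100 trials
low, high = proportion_ci(50, 100, confidence=0.95)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")  # prints "95% CI for p: (0.402, 0.598)"
```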
Example: p. 282, Exercise 10.3
Sample Size for a Specified Margin of Error:
As part of our experimental design, we want to specify the margin of error, E, that is acceptable for our estimate of p, and choose a sample size to ensure that we achieve this margin of error. We let

$$E = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}.$$

Solving for n, we obtain

$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 p(1-p).$$
Now, we know E and α, but we need to find a usable value for p(1 - p) before we can find the sample size. We use the fact that for any value of p between 0 and 1, we have p(1 - p) ≤ 0.25. Then

$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \cdot \frac{1}{4},$$

rounded up to the next whole number, gives us an upper bound on the sample size needed to ensure that we achieve our desired margin of error with confidence level 1 - α.
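The conservative sample-size rule can be sketched as follows, again assuming Python 3.8+; `sample_size` and its optional `p_guess` parameter (which plugs a planning value of p into the first formula instead of the 0.25 bound) are illustrative names, not from the text.

```python
import math
from statistics import NormalDist

def sample_size(margin, confidence=0.95, p_guess=None):
    """Sample size n so that z(alpha/2) * sqrt(p(1-p)/n) <= margin.

    If p_guess is None, use the bound p(1-p) <= 0.25, which gives a
    sample size that works for every value of p.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    pq = 0.25 if p_guess is None else p_guess * (1 - p_guess)
    # Round up: n must be an integer, and rounding down would make the margin exceed E
    return math.ceil((z / margin) ** 2 * pq)

print(sample_size(0.05))  # 385
print(sample_size(0.03))  # 1068
```

Any planning value p_guess other than 0.5 gives a smaller n, which is why the 0.25 bound is the safe choice when nothing is known about p.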
Example: p. 283, Exercise 10.11
Testing Hypotheses Concerning a Proportion:
We want to test hypotheses of the following possible forms:
1) H0: p = p0 vs. Ha: p ≠ p0
2) H0: p ≥ p0 vs. Ha: p < p0
3) H0: p ≤ p0 vs. Ha: p > p0
The test statistic to be used is

$$Z = \frac{\hat{P} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}.$$

Under the null hypothesis, the Central Limit Theorem says that this statistic has an approximate standard normal distribution.
For the three types of alternative hypotheses, the rejection regions are:
1) Ha: p ≠ p0
Reject H0 if |z| > z(α/2)
2) Ha: p < p0
Reject H0 if z < -z(α)
3) Ha: p > p0
Reject H0 if z > z(α)
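A sketch of the test statistic and the three rejection rules, assuming Python 3.8+; the function name `one_proportion_z_test`, the `alternative` labels and the example data are hypothetical. Note that the standard error uses p0, as in the statistic above, rather than P̂.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, alpha=0.05, alternative="two-sided"):
    """Large-sample z test about a single proportion.

    alternative: "two-sided" (Ha: p != p0), "less" (Ha: p < p0), or "greater" (Ha: p > p0).
    Returns (z, reject_H0).
    """
    p_hat = successes / n
    # Standard error computed under H0, i.e. with p0 rather than p_hat
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()
    if alternative == "two-sided":
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)   # |z| > z(alpha/2)
    elif alternative == "less":
        reject = z < -std_normal.inv_cdf(1 - alpha)           # z < -z(alpha)
    elif alternative == "greater":
        reject = z > std_normal.inv_cdf(1 - alpha)            # z > z(alpha)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, reject

# Hypothetical data: 60 successes in 100 trials, testing H0: p <= 0.5 vs. Ha: p > 0.5
z, reject = one_proportion_z_test(60, 100, p0=0.5, alpha=0.05, alternative="greater")
print(f"z = {z:.3f}, reject H0: {reject}")
```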
Example: p. 290, Exercise 10.19
Inference for Two Population Proportions Using Independent Samples
We have two separate populations and one specified characteristic. We want to compare the
proportion of members of population 1 who have this characteristic to the proportion of members of
population 2 who have the characteristic.
For population 1: p1 = proportion of population 1 who have the characteristic of interest, n1 = size of random sample selected from population 1, and p̂1 = proportion of sample 1 who have the characteristic.
For population 2: p2 = proportion of population 2 who have the characteristic of interest, n2 = size of random sample selected from population 2, and p̂2 = proportion of sample 2 who have the characteristic.
We want to test one of the following alternative hypotheses against the appropriate null hypothesis.
Ha: p1 – p2 ≠ 0
Ha: p1 – p2 > 0
Ha: p1 – p2 < 0
The parameter of interest to us is p1 – p2, and its point estimator is p̂1 – p̂2. If both samples are large, then the sampling distribution of p̂1 – p̂2 is approximately normal, and the following random variable has an approximate standard normal distribution:

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}}.$$
To test any of the above alternative hypotheses against the corresponding null hypothesis, we use the
following statistic:
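$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}(1-\bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}},$$

where

$$\bar{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$$

is the pooled sample proportion: the average of the two sample proportions, weighted by sample size (equivalently, the total number of sample members with the characteristic divided by n1 + n2). When p1 = p2, as under the null hypothesis, this statistic has an approximate standard normal distribution.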
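The pooled statistic can be sketched as a small function of the success counts, assuming Python (the name `two_proportion_z` and the example counts are hypothetical). Computing p̄ as total successes over total sample size gives the same weighted average as the formula above.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for comparing p1 and p2 (standard error computed assuming p1 = p2).

    x1, x2 are the numbers of sample members with the characteristic;
    n1, n2 are the sample sizes.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Pooled proportion: total successes over total sample size,
    # identical to (n1*p1_hat + n2*p2_hat) / (n1 + n2)
    p_bar = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

# Hypothetical counts: 45 of 100 in sample 1, 30 of 100 in sample 2
print(f"z = {two_proportion_z(45, 100, 30, 100):.3f}")
```

The resulting z is then compared with the same rejection regions as in the one-sample case: z(α/2) for Ha: p1 – p2 ≠ 0, and z(α) for the one-sided alternatives.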
Example: Photolithography plays a central role in manufacturing integrated circuits made on thin discs
of silicon. Prior to a quality-improvement program, too many rework operations were required. In a
sample of 200 units, 26 required reworking of the photolithographic step. Following training in the use
of Pareto charts and other approaches to identify significant problems, a new sample of size 200 had
only 12 that needed rework. Is this sufficient evidence to conclude at the 0.01 level of significance that
the improvements have been effective in reducing the rework?
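One way to carry out the computation is sketched below. It assumes population 1 is the process before training and population 2 the process after, so that the alternative is Ha: p1 – p2 > 0 (a labeling choice not stated in the text), and takes z(0.01) ≈ 2.326 from the standard normal table.

$$
\begin{aligned}
\hat{p}_1 &= \frac{26}{200} = 0.13, \qquad \hat{p}_2 = \frac{12}{200} = 0.06, \qquad \bar{p} = \frac{26 + 12}{200 + 200} = 0.095,\\
z &= \frac{0.13 - 0.06}{\sqrt{0.095(1 - 0.095)\left(\frac{1}{200} + \frac{1}{200}\right)}} = \frac{0.07}{\sqrt{0.00085975}} \approx \frac{0.07}{0.02932} \approx 2.39.
\end{aligned}
$$

Since 2.39 > z(0.01) ≈ 2.326, this computation rejects H0 at the 0.01 level of significance.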
Confidence Interval for a Difference of Population Proportions
We want to estimate the difference between the proportion of population 1 who have a characteristic of interest and the proportion of population 2 who have the characteristic. The formula for the (1 - α)100% confidence interval is

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}.$$
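A minimal sketch of this interval, assuming Python 3.8+; `diff_proportion_ci` and the counts in the usage line are hypothetical. Unlike the test statistic, the standard error here is unpooled, because the interval is not computed under the assumption p1 = p2.

```python
import math
from statistics import NormalDist

def diff_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Approximate CI for p1 - p2 from two independent samples.

    Uses the unpooled standard error sqrt(p1_hat(1-p1_hat)/n1 + p2_hat(1-p2_hat)/n2).
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z(alpha/2)
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se

# Hypothetical counts: 45 of 100 in sample 1, 30 of 100 in sample 2
low, high = diff_proportion_ci(45, 100, 30, 100, confidence=0.95)
print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")
```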
Example: In the previous example, we want an approximate 99% confidence interval estimate of the
difference between the proportion of units requiring rework after the improvement program and the
proportion of units requiring rework prior to the improvement program.
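Taking population 1 to be the process after the improvement program and population 2 the process before (so the interval estimates p_after – p_before), and using z(0.005) ≈ 2.576 from the standard normal table, the arithmetic can be sketched as:

$$
\begin{aligned}
\hat{p}_1 - \hat{p}_2 &= \frac{12}{200} - \frac{26}{200} = 0.06 - 0.13 = -0.07,\\
\sqrt{\frac{0.06(0.94)}{200} + \frac{0.13(0.87)}{200}} &= \sqrt{0.000282 + 0.0005655} = \sqrt{0.0008475} \approx 0.02911,\\
-0.07 \pm 2.576(0.02911) &\approx -0.07 \pm 0.0750, \text{ i.e. } (-0.145,\ 0.005).
\end{aligned}
$$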