Confidence Interval for a Difference of Population Proportions

Inference for Two Population Proportions Using Independent Samples
We have two separate populations and one specified characteristic. We want to compare the
proportion of members of population 1 who have this characteristic to the proportion of members of
population 2 who have the characteristic.
For population 1: $p_1$ = proportion of population 1 who have the characteristic of interest, $n_1$ = size of random sample selected from population 1, and $\hat{p}_1$ = proportion of sample 1 who have the characteristic.
For population 2: $p_2$ = proportion of population 2 who have the characteristic of interest, $n_2$ = size of random sample selected from population 2, and $\hat{p}_2$ = proportion of sample 2 who have the characteristic.
We want to test one of the following alternative hypotheses against the appropriate null hypothesis:

$$H_a\colon p_1 - p_2 \neq 0 \qquad H_a\colon p_1 - p_2 > 0 \qquad H_a\colon p_1 - p_2 < 0$$
The parameter of interest to us is $p_1 - p_2$, and its point estimator is $\hat{p}_1 - \hat{p}_2$. If both samples are large, then the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approximately normal, and the following random variable has an approximate standard normal distribution:

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}.$$
To test any of the above alternative hypotheses against the corresponding null hypothesis, we use the
following statistic:
$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \text{where } \bar{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$$

is the weighted average (pooled estimate) of the two sample proportions.
Under the null hypothesis, this statistic has an approximate standard normal distribution.
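As a quick illustration (not part of the original notes), the pooled statistic above can be computed directly from the two sample counts. The sketch below is one possible Python version; the function name and argument layout are choices made here.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for testing H0: p1 - p2 = 0."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)                       # pooled sample proportion
    se = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))  # standard error under H0
    return (p1_hat - p2_hat) / se
```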
Example: Photolithography plays a central role in manufacturing integrated circuits made on thin discs
of silicon. Prior to a quality-improvement program, too many rework operations were required. In a
sample of 200 units, 26 required reworking of the photolithographic step. Following training in the use
of Pareto charts and other approaches to identify significant problems, a new sample of size 200 had
only 12 that needed rework. Is this sufficient evidence to conclude at the 0.01 level of significance that
the improvements have been effective in reducing the rework?
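One way the calculation might be carried out is sketched below. Labeling the pre-improvement sample as population 1 and the post-improvement sample as population 2 (so the alternative of interest is $H_a\colon p_1 - p_2 > 0$) is an assumption made here for concreteness.

```python
from math import sqrt
from scipy.stats import norm

x1, n1 = 26, 200   # before the improvement program (labeled population 1 here)
x2, n2 = 12, 200   # after the improvement program (labeled population 2 here)
p1_hat, p2_hat = x1 / n1, x2 / n2                       # 0.13 and 0.06
p_bar = (x1 + x2) / (n1 + n2)                           # pooled proportion 0.095
z = (p1_hat - p2_hat) / sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
p_value = norm.sf(z)                                    # upper-tail p-value for Ha: p1 - p2 > 0
print(z, p_value)   # z is roughly 2.39; the p-value is below 0.01, so H0 is rejected
```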
Confidence Interval for a Difference of Population Proportions
We want to estimate the difference between the proportion of population 1 who have a characteristic of interest and the proportion of population 2 who have the characteristic. The formula for the $(1 - \alpha)100\%$ confidence interval is

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}.$$
Example: In the previous example, we want an approximate 99% confidence interval estimate of the
difference between the proportion of units requiring rework after the improvement program and the
proportion of units requiring rework prior to the improvement program.
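A sketch of that interval calculation, using the summary counts from the example and taking population 1 to be the post-improvement units as the wording above suggests:

```python
from math import sqrt
from scipy.stats import norm

p1_hat, n1 = 12 / 200, 200   # after the improvement program
p2_hat, n2 = 26 / 200, 200   # before the improvement program
z_crit = norm.ppf(1 - 0.01 / 2)    # z_{alpha/2} for a 99% interval, about 2.576
se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
diff = p1_hat - p2_hat
print(diff - z_crit * se, diff + z_crit * se)   # roughly (-0.145, 0.005)
```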
Inference About the Ratio of Two Variances of Normal Populations
When we discussed inference about the difference between the means of two independent populations,
there were two cases to consider – either we could assume that the two populations had equal
variances, or we could not make such an assumption. We want to be able to test whether the two
population variances are unequal. In other words, we want to test the two hypotheses
$$H_0\colon \sigma_1^2 = \sigma_2^2 \qquad \text{vs.} \qquad H_a\colon \sigma_1^2 \neq \sigma_2^2.$$
Defn: Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution which is normal with mean $\mu$ and variance $\sigma^2$. The sample variance is defined as

$$S^2 = \frac{1}{n - 1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2.$$

The random variable

$$\chi^2 = \frac{(n - 1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{\sigma^2}$$

has a chi-square distribution with d.f. = $n - 1$. The p.d.f. for a distribution which is chi-square with $k$ degrees of freedom is given by

$$f(y) = \frac{1}{2^{k/2}\,\Gamma\!\left(\tfrac{k}{2}\right)}\, y^{\frac{k}{2} - 1} e^{-\frac{y}{2}}, \quad \text{for } y > 0.$$

(Note that this is just a gamma distribution with $\alpha = \tfrac{k}{2}$ and $\beta = 2$.)

The mean of a chi-square($k$) distribution is $\mu = k$. The variance of the distribution is $\sigma^2 = 2k$.
Defn: Let $W$ and $Y$ be independent chi-square random variables with $u$ and $v$ degrees of freedom, respectively. Then the random variable

$$F = \frac{W/u}{Y/v}$$

has an F distribution with numerator degrees of freedom $u$ and denominator degrees of freedom $v$. The p.d.f. of this distribution is

$$f(y) = \frac{\Gamma\!\left(\frac{u + v}{2}\right)}{\Gamma\!\left(\frac{u}{2}\right)\Gamma\!\left(\frac{v}{2}\right)} \left(\frac{u}{v}\right)^{\frac{u}{2}} \frac{y^{\frac{u}{2} - 1}}{\left(1 + \frac{u}{v}\,y\right)^{\frac{u + v}{2}}}, \quad \text{for } y > 0.$$

The mean of an $F_{u,v}$ distribution is $\mu = \dfrac{v}{v - 2}$, provided $v > 2$. The variance of an $F_{u,v}$ distribution is

$$\sigma^2 = \frac{2v^2(u + v - 2)}{u(v - 2)^2(v - 4)}, \quad \text{provided } v > 4.$$
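For reference, these moment formulas can be checked against scipy's F distribution; the degrees of freedom used below are arbitrary illustrative values.

```python
from scipy.stats import f

u, v = 9, 9                                   # arbitrary degrees of freedom
mean, var = f.stats(u, v, moments='mv')
print(mean, v / (v - 2))                      # both about 1.286
print(var, 2 * v**2 * (u + v - 2) / (u * (v - 2)**2 * (v - 4)))   # both about 1.176
```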
Let $X_{11}, X_{12}, \ldots, X_{1n_1}$ be a random sample from a distribution which is normal with mean $\mu_1$ and variance $\sigma_1^2$. Let $X_{21}, X_{22}, \ldots, X_{2n_2}$ be a random sample from a distribution which is normal with mean $\mu_2$ and variance $\sigma_2^2$. Then the random variable

$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$

has an F distribution with numerator degrees of freedom $u = n_1 - 1$ and denominator degrees of freedom $v = n_2 - 1$.
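A small simulation can be used to see this result; the sketch below uses arbitrary sample sizes and variances and compares simulated quantiles of $(S_1^2/\sigma_1^2)/(S_2^2/\sigma_2^2)$ with the corresponding $F(n_1 - 1, n_2 - 1)$ quantiles.

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(1)
n1, n2, sigma1, sigma2 = 8, 12, 1.5, 3.0        # arbitrary illustrative values
x1 = rng.normal(0.0, sigma1, size=(50_000, n1))
x2 = rng.normal(0.0, sigma2, size=(50_000, n2))
ratio = (x1.var(axis=1, ddof=1) / sigma1**2) / (x2.var(axis=1, ddof=1) / sigma2**2)
print(np.quantile(ratio, [0.50, 0.95]))          # simulated median and 95th percentile
print(f.ppf([0.50, 0.95], n1 - 1, n2 - 1))       # theoretical F(7, 11) quantiles
```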
Testing Hypotheses About the Equality of Variances
We will assume that we have two independent random samples from two normal distributions, the first having variance $\sigma_1^2$, and the second having variance $\sigma_2^2$. We want to test whether the two variances are equal. The test statistic to be used is

$$F = \frac{S_1^2}{S_2^2}.$$

Under the null hypothesis, this statistic has an F distribution with numerator d.f. = $n_1 - 1$ and denominator d.f. = $n_2 - 1$.
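A sketch of this test as a small helper is given below; the function name and the convention of doubling the smaller tail probability to obtain a two-sided p-value are choices made here, not part of the notes.

```python
from scipy.stats import f

def variance_ratio_test(s1, n1, s2, n2):
    """F test of H0: sigma1^2 = sigma2^2, computed from sample standard deviations."""
    F = s1**2 / s2**2
    df1, df2 = n1 - 1, n2 - 1
    p_two_sided = 2 * min(f.cdf(F, df1, df2), f.sf(F, df1, df2))
    return F, p_two_sided
```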
Example: The void volume within a textile fabric affects comfort, flammability, and insulation
properties. Permeability of a fabric refers to the accessibility of void spaces to the flow of a gas or
liquid. The paper “The relationship between porosity and air permeability of woven textile fabrics”
(Journal of Testing and Evaluation, 1997: 108-114) gave summary information on air permeability
(cm3/cm2/sec) for a number of different fabric types. Consider the following data on two different
types of plain-weave fabric:
Fabric Type    Sample Size    Sample Mean    Sample SD
Cotton              10            51.71         0.79
Triacetate          10           136.14         3.59
We want to test whether plain-weave triacetate has a higher mean permeability than plain-weave
cotton. However, to do this test, we need to check the assumption of equal population variances, so
that we know which test statistic to use to compare the means. (Since we have small samples, there is
an additional assumption that needs to be checked, the assumption of normality. However, since we
do not have the raw data for this example, we cannot do normal probability plots.)
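Working from the summary statistics alone, the variance-ratio test might be carried out as sketched below; putting the larger sample variance in the numerator is simply a convention chosen here for convenience.

```python
from scipy.stats import f

s_cotton, n_cotton = 0.79, 10      # sample SD and size, plain-weave cotton
s_tri, n_tri = 3.59, 10            # sample SD and size, plain-weave triacetate
F = s_tri**2 / s_cotton**2         # larger variance on top, about 20.65
p_two_sided = 2 * f.sf(F, n_tri - 1, n_cotton - 1)
print(F, p_two_sided)              # the two-sided p-value is far below 0.05
```

Since the hypothesis of equal variances is clearly rejected, the comparison of mean permeabilities would proceed with the unequal-variances form of the two-sample t procedure.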