Inference for Two Population Proportions Using Independent Samples

We have two separate populations and one specified characteristic. We want to compare the proportion of members of population 1 who have this characteristic to the proportion of members of population 2 who have the characteristic.

For population 1: $p_1$ = proportion of population 1 who have the characteristic of interest, $n_1$ = size of random sample selected from population 1, and $\hat{p}_1$ = proportion of sample 1 who have the characteristic.

For population 2: $p_2$ = proportion of population 2 who have the characteristic of interest, $n_2$ = size of random sample selected from population 2, and $\hat{p}_2$ = proportion of sample 2 who have the characteristic.

We want to test one of the following alternative hypotheses against the appropriate null hypothesis.

$H_a: p_1 - p_2 \neq 0$
$H_a: p_1 - p_2 > 0$
$H_a: p_1 - p_2 < 0$

The parameter of interest to us is $p_1 - p_2$, and its point estimator is $\hat{p}_1 - \hat{p}_2$. If both samples are large, then the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approximately normal, and the following random variable has an approximate standard normal distribution:

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}.$$

To test any of the above alternative hypotheses against the corresponding null hypothesis, we use the following statistic:

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \text{where } \bar{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$$

is the average of the two sample proportions, weighted by sample size (the pooled sample proportion). Under the null hypothesis, this statistic has an approximate standard normal distribution.

Example: Photolithography plays a central role in manufacturing integrated circuits made on thin discs of silicon. Prior to a quality-improvement program, too many rework operations were required. In a sample of 200 units, 26 required reworking of the photolithographic step. Following training in the use of Pareto charts and other approaches to identify significant problems, a new sample of size 200 had only 12 that needed rework.
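The pooled test statistic above can be evaluated with a few lines of Python. The sketch below is only an illustration, not part of the original notes: the function names `normal_cdf` and `two_proportion_z` are made up for this example, and it assumes population 1 is "before the program" and population 2 is "after", so that a reduction in rework corresponds to $H_a: p_1 - p_2 > 0$. The counts are those given in the photolithography example.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_proportion_z(x1, n1, x2, n2):
    """Pooled Z statistic for H0: p1 - p2 = 0.

    x1, x2 are the counts having the characteristic; n1, n2 are the sample sizes.
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled proportion: (n1*p1_hat + n2*p2_hat) / (n1 + n2) = (x1 + x2) / (n1 + n2)
    p_bar = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(p_bar * (1.0 - p_bar) * (1.0 / n1 + 1.0 / n2))
    return (p1_hat - p2_hat) / std_err

# Assumed labeling: sample 1 = before the program, sample 2 = after.
z = two_proportion_z(26, 200, 12, 200)
# Upper-tail p-value for Ha: p1 - p2 > 0
p_value = 1.0 - normal_cdf(z)
print(f"z = {z:.4f}, one-sided p-value = {p_value:.4f}")
```

The one-sided p-value can then be compared with the chosen significance level. For a two-sided alternative, the p-value would instead be `2 * (1 - normal_cdf(abs(z)))`.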
Is this sufficient evidence to conclude at the 0.01 level of significance that the improvements have been effective in reducing the rework?

Confidence Interval for a Difference of Population Proportions

We want to estimate the difference between the proportion of population 1 who have a characteristic of interest and the proportion of population 2 who have the characteristic. The formula for the $(1 - \alpha)100\%$ confidence interval is

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}.$$

Example: In the previous example, we want an approximate 99% confidence interval estimate of the difference between the proportion of units requiring rework after the improvement program and the proportion of units requiring rework prior to the improvement program.

Inference About the Ratio of Two Variances of Normal Populations

When we discussed inference about the difference between the means of two independent populations, there were two cases to consider: either we could assume that the two populations had equal variances, or we could not make such an assumption. We want to be able to test whether the two population variances are unequal. In other words, we want to test the two hypotheses $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_a: \sigma_1^2 \neq \sigma_2^2$.

Defn: Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution which is normal with mean $\mu$ and variance $\sigma^2$. The sample variance is defined as

$$S^2 = \frac{1}{n - 1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2.$$

The random variable

$$X^2 = \frac{(n - 1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{\sigma^2}$$

has a chi-square distribution with d.f. = $n - 1$. The p.d.f. for a distribution which is chi-square with $k$ degrees of freedom is given by

$$f(y) = \frac{1}{\Gamma\!\left(\frac{k}{2}\right)2^{k/2}}\, y^{\frac{k}{2} - 1} e^{-y/2}, \quad \text{for } y > 0.$$

(Note that this is just a gamma distribution with $\alpha = k/2$ and $\beta = 2$.) The mean of a chi-square($k$) distribution is $k$. The variance of the distribution is $2k$.

Defn: Let $W$ and $Y$ be independent chi-square random variables with $u$ and $v$ degrees of freedom, respectively. Then the random variable

$$F = \frac{W/u}{Y/v}$$

has an F distribution with numerator degrees of freedom $u$ and denominator degrees of freedom $v$. The p.d.f. of this distribution is

$$f(y) = \frac{\Gamma\!\left(\frac{u + v}{2}\right)\left(\frac{u}{v}\right)^{u/2} y^{\frac{u}{2} - 1}}{\Gamma\!\left(\frac{u}{2}\right)\Gamma\!\left(\frac{v}{2}\right)\left(1 + \frac{u}{v}\,y\right)^{\frac{u + v}{2}}}, \quad \text{for } y > 0.$$

The mean of an $F_{u,v}$ distribution is $\dfrac{v}{v - 2}$, provided $v > 2$. The variance of an $F_{u,v}$ distribution is $\dfrac{2v^2(u + v - 2)}{u(v - 2)^2(v - 4)}$, provided $v > 4$.

Let $X_{11}, X_{12}, \ldots, X_{1n_1}$ be a random sample from a distribution which is normal with mean $\mu_1$ and variance $\sigma_1^2$. Let $X_{21}, X_{22}, \ldots, X_{2n_2}$ be a random sample, independent of the first, from a distribution which is normal with mean $\mu_2$ and variance $\sigma_2^2$. Then the random variable

$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$

has an F distribution with numerator degrees of freedom $u = n_1 - 1$, and denominator degrees of freedom $v = n_2 - 1$.

Testing Hypotheses About the Equality of Variances

We will assume that we have two independent random samples from two normal distributions, the first having variance $\sigma_1^2$, and the second having variance $\sigma_2^2$. We want to test whether the two variances are equal. The test statistic to be used is

$$F = \frac{S_1^2}{S_2^2}.$$

Under the null hypothesis, this statistic has an F distribution with numerator d.f. = $n_1 - 1$, and denominator d.f. = $n_2 - 1$.

Example: The void volume within a textile fabric affects comfort, flammability, and insulation properties. Permeability of a fabric refers to the accessibility of void spaces to the flow of a gas or liquid. The paper “The relationship between porosity and air permeability of woven textile fabrics” (Journal of Testing and Evaluation, 1997: 108-114) gave summary information on air permeability (cm³/cm²/sec) for a number of different fabric types. Consider the following data on two different types of plain-weave fabric:

Fabric Type    Sample Size    Sample Mean    Sample SD
Cotton         10             51.71          0.79
Triacetate     10             136.14         3.59

We want to test whether plain-weave triacetate has a higher mean permeability than plain-weave cotton.
However, to do this test, we need to check the assumption of equal population variances, so that we know which test statistic to use to compare the means. (Since we have small samples, there is an additional assumption that needs to be checked, the assumption of normality. However, since we do not have the raw data for this example, we cannot do normal probability plots.)
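
The variance check described above can be carried out numerically from the summary statistics alone. The following Python sketch computes $F = S_1^2/S_2^2$ and approximates a two-sided p-value by integrating the F p.d.f. given earlier with Simpson's rule. Several things here are assumptions made for illustration and are not part of the original notes: the function names `f_pdf` and `f_cdf`, the number of Simpson steps, the labeling of cotton as sample 1, and the convention of doubling the smaller tail area to get a two-sided p-value. The integration as written also assumes numerator d.f. of at least 2, so that the density is finite at 0.

```python
import math

def f_pdf(y, u, v):
    """Density of the F distribution with u and v degrees of freedom (y >= 0, u >= 2)."""
    log_const = (math.lgamma((u + v) / 2) - math.lgamma(u / 2) - math.lgamma(v / 2)
                 + (u / 2) * math.log(u / v))
    return math.exp(log_const) * y ** (u / 2 - 1) / (1 + u * y / v) ** ((u + v) / 2)

def f_cdf(x, u, v, steps=2000):
    """Approximate P(F <= x) by composite Simpson's rule on [0, x]; steps must be even."""
    h = x / steps
    total = f_pdf(0.0, u, v) + f_pdf(x, u, v)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * f_pdf(i * h, u, v)
    return total * h / 3

# Summary statistics from the table (assumed labeling: sample 1 = cotton, sample 2 = triacetate)
n1, s1 = 10, 0.79
n2, s2 = 10, 3.59

f_stat = s1 ** 2 / s2 ** 2                      # F = S1^2 / S2^2
lower_tail = f_cdf(f_stat, n1 - 1, n2 - 1)      # P(F <= observed value) under H0
p_two_sided = 2 * min(lower_tail, 1 - lower_tail)

print(f"F = {f_stat:.4f}, two-sided p-value = {p_two_sided:.6f}")
```

A small p-value here would count as evidence against equal population variances, which in turn points toward the unequal-variance form of the test for comparing the two means.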