Paul T. von Hippel
Department of Sociology & Initiative in Population Research
Ohio State University
300 Bricker Hall, 190 N. Oval Mall
Columbus, OH 43210
von-hippel.1@osu.edu
614 688-3768

592 words

Difference of proportions. When people are divided into groups, different proportions of each group will display certain traits, attitudes, or outcomes. For example, different proportions of men and women vote for Democrats, different proportions of Americans and Japanese suffer from breast cancer, and different proportions of mainline and fundamentalist Protestants approve of abortion for unmarried women.

The difference between the abortion attitudes of mainline and fundamentalist Protestants may be estimated using data from the US portion of the 1990 World Values Survey, summarized in the following table.

Table 1.

              Mainline     Fundamentalist   Totals
Approve       172 (n11)    23 (n12)         195 (n1.)
Disapprove    404 (n21)    125 (n22)        529 (n2.)
Totals        576 (n.1)    148 (n.2)        724 (n..)

Note. To simplify the example, we treat the World Values Survey like a SIMPLE RANDOM SAMPLE. The actual survey design was more complicated.

Paul von Hippel Page 1 2/16/2016

Of n.1=576 mainline Protestants, n11=172 approved of abortion for unmarried women, a proportion of p1=172/576=.2986. Of n.2=148 fundamentalist Protestants, n12=23 approved of abortion for unmarried women, a proportion of p2=23/148=.1554. Among the surveyed Protestants, then, the difference of proportions was p1-p2=.1432: the surveyed mainline Protestants were 14.32 percentage points more likely to approve of abortion for unmarried women.

If we wish to generalize from this sample to the US Protestant population, we need to know the SAMPLING DISTRIBUTION for the difference of proportions. In large samples, the difference of proportions p1-p2 approximately follows a NORMAL DISTRIBUTION, whose STANDARD DEVIATION is known as the STANDARD ERROR for the difference of proportions. Different standard errors are used for CONFIDENCE INTERVALS and HYPOTHESIS TESTS.
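The sample proportions and their difference can be reproduced directly from the counts in Table 1. The following is a minimal Python sketch; the variable names (n11, n_1, and so on) are illustrative stand-ins for the table's notation and are not part of the original entry.

```python
# Counts taken from Table 1 (variable names are illustrative).
n11, n12 = 172, 23    # approvers among mainline and fundamentalist Protestants
n_1, n_2 = 576, 148   # column totals n.1 and n.2

p1 = n11 / n_1        # proportion of mainline Protestants who approve
p2 = n12 / n_2        # proportion of fundamentalist Protestants who approve
diff = p1 - p2        # sample difference of proportions

print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, p1 - p2 = {diff:.4f}")
# prints: p1 = 0.2986, p2 = 0.1554, p1 - p2 = 0.1432
```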
For confidence intervals, the most common standard error formula is

$$ s_{p_1-p_2} = \sqrt{\frac{p_1(1-p_1)}{n_{\cdot 1}} + \frac{p_2(1-p_2)}{n_{\cdot 2}}}, \quad \text{where } p_1 = \frac{n_{11}}{n_{\cdot 1}} \text{ and } p_2 = \frac{n_{12}}{n_{\cdot 2}}, $$

and the corresponding approximate confidence interval is

$$ p_1 - p_2 \pm z_{1-\alpha/2}\, s_{p_1-p_2}, $$

where $z_{1-\alpha/2}$ is the (1-α/2)×100th PERCENTILE of the STANDARD NORMAL DISTRIBUTION. Although popular, this confidence interval has poor coverage, especially when the table counts nij are low. Better coverage, especially for low counts, is obtained by adding 1 to each nij and 2 to each n.j (Agresti & Caffo, 2000). The improved standard error is

$$ \tilde{s}_{p_1-p_2} = \sqrt{\frac{\tilde{p}_1(1-\tilde{p}_1)}{n_{\cdot 1}+2} + \frac{\tilde{p}_2(1-\tilde{p}_2)}{n_{\cdot 2}+2}}, \quad \text{where } \tilde{p}_1 = \frac{n_{11}+1}{n_{\cdot 1}+2} \text{ and } \tilde{p}_2 = \frac{n_{12}+1}{n_{\cdot 2}+2}, $$

and the corresponding approximate confidence interval is

$$ \tilde{p}_1 - \tilde{p}_2 \pm z_{1-\alpha/2}\, \tilde{s}_{p_1-p_2}. $$

For the Protestant data, $\tilde{p}_1-\tilde{p}_2=.1393$, $\tilde{s}_{p_1-p_2}=.0355$, and $z_{.975}=1.96$, so a 95% confidence interval runs from .0698 to .2088. In other words, the population of mainline Protestants is probably between 6.98 and 20.88 percentage points more likely to approve of abortion for unmarried women. (Sometimes the confidence limits calculated from this formula fall outside the range of values that a difference of proportions can take, below -1 or above +1. On such occasions, it is customary to adjust the confidence limits inward.)

If we wish to test the NULL HYPOTHESIS of equal proportions, we use a slightly different standard error. Under the null hypothesis, we regard the surveyed Protestants as a single sample of n..=724 persons of whom n1.=195 approve of abortion for unmarried women, a proportion of p=195/724=.2693. In this setting, an appropriate standard error formula is

$$ \hat{s}_{p_1-p_2} = \sqrt{p(1-p)\left(\frac{1}{n_{\cdot 1}} + \frac{1}{n_{\cdot 2}}\right)}, \quad \text{where } p = \frac{n_{1\cdot}}{n_{\cdot\cdot}}. $$

For the Protestant data, $\hat{s}_{p_1-p_2}=.0409$. An appropriate test statistic is

$$ z = \frac{p_1-p_2}{\hat{s}_{p_1-p_2}}, $$

which, under the null hypothesis, follows an approximate standard normal distribution. For the Protestant data, z=.1432/.0409=3.503, which has a 2-TAILED P VALUE of .00046.
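The adjusted confidence interval and the pooled z test described above can be checked numerically. The sketch below uses only the Python standard library; the function names adjusted_ci and pooled_z_test are hypothetical, chosen for illustration. Clipping the limits to the range from -1 to +1 is one assumed reading of "adjust the confidence limits inward" and has no effect on the Protestant data.

```python
from math import sqrt
from statistics import NormalDist


def adjusted_ci(n11, n_1, n12, n_2, level=0.95):
    """Adjusted interval: add 1 to each count and 2 to each group total."""
    p1 = (n11 + 1) / (n_1 + 2)
    p2 = (n12 + 1) / (n_2 + 2)
    se = sqrt(p1 * (1 - p1) / (n_1 + 2) + p2 * (1 - p2) / (n_2 + 2))
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # e.g. 1.96 for 95%
    diff = p1 - p2
    # Assumption: "adjusting inward" is implemented as clipping to [-1, 1].
    lower = max(-1.0, diff - z * se)
    upper = min(1.0, diff + z * se)
    return diff, se, lower, upper


def pooled_z_test(n11, n_1, n12, n_2):
    """z test of equal proportions using the pooled standard error."""
    p1, p2 = n11 / n_1, n12 / n_2
    p = (n11 + n12) / (n_1 + n_2)  # pooled proportion under the null
    se = sqrt(p * (1 - p) * (1 / n_1 + 1 / n_2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed
    return se, z, p_value


# Counts from Table 1: approvers and column totals for each group.
diff, se, lower, upper = adjusted_ci(172, 576, 23, 148)
print(f"{diff:.4f} {se:.4f} {lower:.4f} {upper:.4f}")
# prints: 0.1393 0.0355 0.0698 0.2088

se0, z, p_value = pooled_z_test(172, 576, 23, 148)
print(f"{se0:.4f} {z:.3f} {p_value:.5f}")
# prints: 0.0409 3.503 0.00046
```

Both functions take the approving counts and group totals rather than the full table, since the disapproving counts are implied by the totals.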
Because this p value is so small, we reject the null hypothesis that mainline and fundamentalist Protestants are equally likely to approve of abortion for unmarried women.

When squared, this z statistic becomes the CHI-SQUARE (χ²) statistic used to test for association in a 2×2 contingency table. For the Protestant data, χ²=z²=12.27, which, like the z statistic, has a p value of .00046.

The z and χ² tests are considered reasonable approximations when all table counts nij are at least 5 or so. For tables with lower counts, z and χ² should be avoided in favor of Fisher's exact test (Hollander & Wolfe, 1999).

PAUL T. VON HIPPEL

References

Agresti, Alan, & Caffo, Brian. (2000). Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures. The American Statistician, 54(4), 280-288.

Agresti, Alan, & Finlay, Barbara. (1997). Statistical methods for the social sciences. Upper Saddle River, NJ: Prentice Hall.

Hollander, Myles, & Wolfe, Douglas A. (1999). Nonparametric statistical methods (2nd ed.). New York: Wiley.