2. Chi2-tests for equality of proportions

Introduction: Two Samples

Consider comparing the sample proportions p1 and p2 in independent random samples of sizes n1 and n2 from two populations which have a certain characteristic with respective probabilities π1 and π2. We know from STAT.1030 that the relevant test statistic for equality of proportions is

    z = (p1 − p2) / √( p(1 − p)(1/n1 + 1/n2) )  ∼ N(0, 1) under H0: π1 = π2,

where p = (n1 p1 + n2 p2) / (n1 + n2) denotes the combined proportion of both samples. Recall that this z-statistic is based upon the normal approximation of the binomial distribution and therefore requires sufficiently large sample sizes (n ≳ 30). SAS EG does not offer this z-test, but the same test may be equivalently formulated as a χ2 independence test as follows.

Consider the contingency table below:

             pop. 1          pop. 2          sum
  success    n1 p1           n2 p2           np = f•1
  failure    n1(1 − p1)      n2(1 − p2)      n(1 − p) = f•2
  sum        n1 = f1•        n2 = f2•        n

Since p1 and p2 are unbiased estimators of their population counterparts π1 and π2, H0: π1 = π2 is equivalent to independence of the success and population variables. Applying the formula for two-way tables,

    χ2 = n (f11 f22 − f12 f21)² / (f1• f2• f•1 f•2),

yields after some manipulation

    χ2 = (p1 − p2)² / ( p(1 − p)(1/n1 + 1/n2) )  ∼ χ2(1) under H0.

It should not surprise us that this is just the square of our familiar z-statistic, since we know from STAT.1030 that the square of a N(0, 1)-distributed random variable is χ2(1)-distributed.

Homogeneity tests in SAS EG

Since SAS does not offer the familiar z-test, we resort to the equivalent χ2-independence test instead. The advantage of the χ2-test is that we may use it to compare arbitrarily many sample proportions and/or variables with arbitrarily many outcomes (rather than just success/failure). For example, our introductory example (satisfaction with the company's management) of section 1.3.1 may be regarded as a comparison of the conditional distributions of satisfaction on a 3-level scale for employees with and without vocational education.

The disadvantage of the χ2-test as compared to the z-test is that the alternative is always two-sided (H1: π1 ≠ π2). To perform one-sided tests such as H1: π1 < π2, check first that p1 − p2 has the sign suggested by H1, and reject H0 in favour of H1 if α > p/2, where p denotes the p-value of the χ2-independence test.

Example. Consider the following table about consumption of alcohol for men and women:

  Consumption    female (pop. 1)    male (pop. 2)
  yes            68                 86
  no             34                 26

As is evident from the output on the next slide, we may not reject H0: π1 = π2 against the two-sided alternative H1: π1 ≠ π2 (p = 0.0998), but we may reject H0 against the one-sided alternative H1: π2 > π1 (p = 0.0499).

Using Fisher's exact test, the p-value of H0: π1 = π2 against the two-sided alternative H1: π1 ≠ π2 is p = 0.1274, leading to the same conclusion as above. But against the one-sided alternative H1: π2 > π1 it is p = 0.0676, which means that we cannot even reject H0 in a one-sided test at α = 5%. If we have access to software, we prefer using the output from Fisher's exact test. It is, however, too hard to calculate by hand, and in larger tables even software might fail to calculate it.
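If you want to double-check these numbers outside SAS EG, the following sketch reproduces the χ2- and Fisher p-values of the example using Python with scipy (an assumption made here for illustration; neither is part of the course software).

```python
# Cross-check of the alcohol-consumption example (illustrative sketch, Python/scipy assumed)
from scipy.stats import chi2_contingency, fisher_exact

# Rows: yes/no, columns: female (pop. 1), male (pop. 2)
table = [[68, 86],
         [34, 26]]

chi2, p_two_sided, dof, expected = chi2_contingency(table, correction=False)
print(chi2, p_two_sided)       # approximately 2.709 and 0.0998, as in the SAS output

# One-sided test H1: pi2 > pi1 -- halve the two-sided p-value after checking
# that p2 - p1 = 86/112 - 68/102 > 0 points in the direction suggested by H1
print(p_two_sided / 2)         # approximately 0.0499

# Fisher's exact test (two-sided)
oddsratio, p_fisher = fisher_exact(table)
print(p_fisher)                # approximately 0.1274
```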
Table Analysis Results (SAS EG output)

Table of suku by käyttä (frequency / row pct):

  suku      käyttä = 1       käyttä = 0       Total
  nainen    68  (66.67)      34  (33.33)      102
  mies      86  (76.79)      26  (23.21)      112
  Total     154              60               214

Statistics for Table of suku by käyttä:

  Statistic                      DF   Value     Prob
  Chi-Square                     1    2.7092    0.0998
  Likelihood Ratio Chi-Square    1    2.7111    0.0997
  Continuity Adj. Chi-Square     1    2.2309    0.1353
  Mantel-Haenszel Chi-Square     1    2.6965    0.1006
  Phi Coefficient                     -0.1125
  Contingency Coefficient             0.1118
  Cramer's V                          -0.1125

  Fisher's Exact Test
  Cell (1,1) Frequency (F)    68
  Left-sided Pr <= F          0.0676
  Right-sided Pr >= F         0.9640
  Table Probability (P)       0.0316
  Two-sided Pr <= P           0.1274

  Sample Size = 214

General test for homogeneity of proportions

The χ2-test for equality of proportions may be generalized to more than two independent samples as follows. Assemble the frequencies of success ni pi and failure ni(1 − pi) for the respective populations i in a (2×c) contingency table, where ni and pi denote the size and the proportion of success in population i, and c denotes the number of populations. The homogeneity of the populations (with respect to the success variable),

    H0: π1 = π2 = · · · = πc   versus   H1: not all πi, i = 1, . . . , c, are equal,

is then assessed by a χ2 independence test of the population and success variables.

Example. An insurance company wants to test whether the proportion of people who submit claims for automobile accidents is about the same for the three age groups 25 and under, over 25 and under 50, and 50 and over:

  H0: The age groups are homogeneous wrt. claims,
  H1: The age groups are not homogeneous wrt. claims.

The (2×3) contingency table for this example is:

                 Claim   No claim   Total
  age < 25       40      60         100
  25 ≤ age < 50  35      65         100
  50 ≤ age       60      40         100
  Total          135     165        300

The expected frequencies are:

                 Claim   No claim   Total
  age < 25       45      55         100
  25 ≤ age < 50  45      55         100
  50 ≤ age       45      55         100
  Total          135     165        300

The χ2 statistic is

    χ2 = (40 − 45)²/45 + (35 − 45)²/45 + (60 − 45)²/45
       + (60 − 55)²/55 + (65 − 55)²/55 + (40 − 55)²/55 = 14.14,

and the degrees of freedom are (2 − 1)(3 − 1) = 2. Looking up a table or calling CHIDIST(14.14;2) in Excel establishes that the p-value in this case is less than 0.1%, so we reject homogeneity with respect to claims in favour of inhomogeneity at all conventional significance levels.

The median test

The hypotheses for the median test are

  H0: The c populations have the same median,
  H1: Not all c populations have the same median.

This is in fact a special case of the homogeneity test for independent samples which we just discussed, with the indicator "value ≷ median" as the success variable. Median means here the grand median, which is the median of all observations regardless of the population.

Example. An economist wants to test the null hypothesis that the median family incomes in three rural areas are approximately equal. Random samples of family incomes (in $1000/year) in three regions (A/B/C) are given below:

  A    B    C
  22   31   28
  29   37   42
  36   26   21
  40   25   47
  35   20   18
  50   43   23
  38   27   51
  25   41   16
  62   57   30
  16   32   48

Example (continued). Arranging all observations in order reveals that the grand median is 31.5. The observed and expected counts for each region are displayed below:

              ≤ median (expected)   > median (expected)   Total
  Region A    4 (5)                 6 (5)                 10
  Region B    5 (5)                 5 (5)                 10
  Region C    6 (5)                 4 (5)                 10
  Total       15                    15                    30

Note that here

    eij = fi• f•j / n = (n/2 · ni) / n = ni / 2.

The χ2 statistic is

    χ2 = (4 − 5)²/5 + (5 − 5)²/5 + (6 − 5)²/5 + (6 − 5)²/5 + (5 − 5)²/5 + (4 − 5)²/5 = 0.8,

and the degrees of freedom are (2 − 1)(3 − 1) = 2. Comparing this value with critical points χ2α(2) of the χ2-distribution with 2 degrees of freedom, we conclude that there is no evidence to reject the null hypothesis. The p-value of the test is CHIDIST(0.8;2) = 0.67.
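The median test is just the general homogeneity test applied to the above/below-median counts, so it is easy to verify by hand. The sketch below (assuming Python with numpy and scipy, which are not part of the course software) recomputes the grand median, the table of counts and the uncorrected χ2 statistic; the insurance example above can be checked with the same chi2_contingency call.

```python
# Hand verification of the median test for the income example (sketch, Python/numpy/scipy assumed)
import numpy as np
from scipy.stats import chi2_contingency

A = [22, 29, 36, 40, 35, 50, 38, 25, 62, 16]
B = [31, 37, 26, 25, 20, 43, 27, 41, 57, 32]
C = [28, 42, 21, 47, 18, 23, 51, 16, 30, 48]

grand_median = np.median(A + B + C)          # 31.5

# Rows: <= grand median / > grand median, columns: regions A, B, C
table = np.array([[sum(x <= grand_median for x in region) for region in (A, B, C)],
                  [sum(x >  grand_median for x in region) for region in (A, B, C)]])
print(table)                                  # [[4 5 6], [6 5 4]]

chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p)   # approximately 0.8 and 0.67; SAS EG reports 0.7733 = 0.8 * 29/30
                 # because of its finite-sample correction (see the next section)
```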
The median test in SAS EG

You find the median test in SAS Enterprise Guide under the menu point Analyze/ANOVA/Nonparametric One-Way ANOVA. In the Task Roles window choose the variable of interest as Dependent variable and the classification variable as Independent variable. Then check the Median box in the Test Scores pane of the Analysis window and click Run. SAS makes a finite-sample correction by multiplying χ2 by (N − 1)/N, where N denotes the number of observations.

Nonparametric One-Way ANOVA (SAS EG output)

Median Scores (Number of Points Above Median) for Variable Income, Classified by Variable Region:

  Region   N    Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
  1        10   6.0             5.0                 1.313064           0.60
  2        10   5.0             5.0                 1.313064           0.50
  3        10   4.0             5.0                 1.313064           0.40

  Average scores were used for ties.

  Median One-Way Analysis
  Chi-Square                    0.7733
  DF                            2
  Asymptotic Pr > Chi-Square    0.6793
  Exact Pr >= Chi-Square        0.8968

McNemar test on paired proportions

Occasionally there is a need to compare two proportions when the samples drawn are not independent. This is always the case when the same statistical unit generates a pair of observations, for example in studies with a "before and after" design. The test requires the same n statistical units to be included in the before and after measurements (matched pairs). The observed counts may be assembled in a two-way table of the form:

                 Sample II: 1    Sample II: 0    sum
  Sample I: 1    a = n p11       b = n p10       a + b
  Sample I: 0    c = n p01       d = n p00       c + d
  sum            a + c           b + d           n

The null hypothesis that the probability of success is the same in both samples reduces then to testing b = c = (b + c)/2, that is, p10 = p01. The approximate test statistic, including a continuity correction, is

    χ2 = (|b − c| − 1)² / (b + c)  ∼ χ2(1) under H0.

Example (Bland 2000). 1319 schoolchildren were questioned on the prevalence of symptoms of severe cold at the age of 12 and again at the age of 14 years:

  Severe colds      Severe colds at age 14
  at age 12         Yes     No      Total
  Yes               212     144     356
  No                256     707     963
  Total             468     851     1319

In order to assess whether the increase is significant, calculate the test statistic

    χ2 = (|144 − 256| − 1)² / (144 + 256) = 30.8025,

which is far beyond the critical value χ2_0.001(1) = 10.83, so the increase in the prevalence of symptoms of severe cold is statistically significant.

Note: SAS skips the continuity correction and calculates the test statistic as

    χ2 = |144 − 256|² / (144 + 256) = 31.36.

However, you may ask SAS for exact p-values, which is even better than the continuity correction and works also in small samples. To apply the test in SAS, check Measures in the Table Statistics/Agreement window under Describe/Table Analysis.
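As a cross-check of the hand calculation above, the following sketch (assuming Python with scipy, which is not part of the course software) reproduces both versions of the statistic and the exact binomial p-value that SAS reports on request.

```python
# Hand calculation of the McNemar statistic for the severe-colds example (sketch, Python/scipy assumed)
from scipy.stats import chi2, binom

b, c = 144, 256                            # discordant pairs: Yes->No and No->Yes

# Continuity-corrected statistic, as on the slide above
chi2_cc = (abs(b - c) - 1) ** 2 / (b + c)
print(chi2_cc, chi2.sf(chi2_cc, df=1))     # 30.8025 and a p-value far below 0.001

# SAS's uncorrected version
chi2_raw = (b - c) ** 2 / (b + c)
print(chi2_raw)                            # 31.36

# Exact two-sided p-value: under H0 each discordant pair is Yes->No with probability 1/2
p_exact = min(1.0, 2 * binom.cdf(min(b, c), b + c, 0.5))
print(p_exact)
```

If you prefer not to code the formula yourself, the Python statsmodels package also provides a ready-made McNemar test, with or without the exact binomial p-value.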