CHI-SQUARE TEST
Prepared by: Francis Joseph Campeña

Goodness-of-Fit Test
Test for Independence
Test for Homogeneity/Test for Several Proportions

Goodness-of-Fit Test

A test to determine if a population has a specified theoretical distribution.

Test Statistic: $\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$, $v = k - 1$

Critical Region: $\chi^2 > \chi^2_{\alpha}$, where each expected frequency is at least 5.

Here $o_i$ and $e_i$ are the observed and expected frequencies of the $i$-th category, and $v$ is the number of degrees of freedom.

Example

Suppose that a die is tossed 120 times and each outcome is recorded. We want to test if the die is balanced.

FACE        1    2    3    4    5    6
OBSERVED    20   22   17   18   19   24
EXPECTED    20   20   20   20   20   20

EXAMPLE

A machine is supposed to mix peanuts, hazelnuts, cashews and pecans in the ratio 5:2:2:1. A can containing 500 of these mixed nuts was found to have 269 peanuts, 112 hazelnuts, 74 cashews, and 45 pecans. At a 0.05 level of significance, test the hypothesis that the machine is mixing the nuts in the ratio 5:2:2:1.

Test for Independence

The chi-square test can be used to test the hypothesis of independence of two variables of classification. The table of observed frequencies for this classification is called a contingency table. The row and column totals are called marginal frequencies.

Expected frequency: $e_i = \frac{\text{column total} \times \text{row total}}{\text{grand total}}$

Test Statistic: $\chi^2 = \sum_{i} \frac{(o_i - e_i)^2}{e_i}$, $v = (r - 1)(c - 1)$, where the sum runs over all cells of the table with $r$ rows and $c$ columns.

Critical Region: $\chi^2 > \chi^2_{\alpha}$

Example

Suppose that we wish to determine whether the opinions of the voting residents of the state of Illinois concerning a new tax reform are independent of their levels of income.

                  INCOME LEVELS
TAX REFORM    Low    Medium    High    TOTAL
For           182    213       203     598
Against       154    138       110     402
TOTAL         336    351       313     1000

1. Ho: The opinion of the voting residents regarding the new tax reform is independent of their level of income.
   Ha: The opinion of the voting residents regarding the new tax reform is not independent of their level of income.
2. $\alpha = 0.05$
3. Critical Region: $\chi^2 > \chi^2_{0.05} \Rightarrow \chi^2 > 5.991$
4. Test Statistic: $\chi^2 = \frac{(182 - 200.9)^2}{200.9} + \frac{(213 - 209.9)^2}{209.9} + \cdots + \frac{(110 - 125.8)^2}{125.8} = 7.85$
5. Decision: Reject Ho, the hypothesis that the opinion of the voting residents regarding the new tax reform is independent of their level of income.

Test for Homogeneity

We can also use the chi-square test to determine whether the categories of the column variable are homogeneous with respect to each row.

Example

Suppose we select 200 LP, 150 UNA and 150 Nationalist Party (NP) voters and record whether they are for a proposed abortion law, against it, or undecided.

Observed frequencies:

                 Political Affiliation
Abortion Law    LP     UNA    NP     Total
For             82     70     62     214
Against         93     62     67     222
Undecided       25     18     21     64
TOTAL           200    150    150    500

Observed frequencies with expected frequencies in parentheses:

                 Political Affiliation
Abortion Law    LP           UNA          NP           Total
For             82 (85.6)    70 (64.2)    62 (64.2)    214
Against         93 (88.8)    62 (66.6)    67 (66.6)    222
Undecided       25 (25.6)    18 (19.2)    21 (19.2)    64
TOTAL           200          150          150          500

Test for Several Proportions

1. Ho: $\pi_1 = \pi_2 = \cdots = \pi_k$
   Ha: At least one pair of population proportions is not equal.
2. Level of significance: $\alpha$
3. Critical Region: $\chi^2 > \chi^2_{\alpha}$
4. Test Statistic: $\chi^2 = \sum_{i} \frac{(o_i - e_i)^2}{e_i}$, $v = k - 1$, where the sum runs over all $2k$ cells of the table and
   $e_i = \frac{\text{column total} \times \text{row total}}{\text{grand total}}$

EXAMPLE

In a shop study, a set of data was collected to determine whether or not the proportion of defectives produced was the same for workers on the day, evening and night shifts. Use a 0.025 level of significance.

Shift             DAY    EVENING    NIGHT
Defectives        45     55         70
Non-defectives    905    890        870

1. Ho: $\pi_1 = \pi_2 = \pi_3$
   Ha: At least one pair of work shifts has different proportions of defectives produced.
2. $\alpha = 0.025$
3. Critical Region: $\chi^2 > \chi^2_{0.025} \Rightarrow \chi^2 > 7.378$
4. Test Statistic: $\chi^2 = \frac{(45 - 57)^2}{57} + \frac{(55 - 56.7)^2}{56.7} + \cdots + \frac{(870 - 883.7)^2}{883.7} = 6.29$
5. Decision: We do not reject the null hypothesis.

EXERCISES
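
The worked examples in these notes can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the helper names (`chi_square_statistic`, `expected_frequencies`, `chi_square_table`) are hypothetical, and the critical values 5.991 and 7.378 are simply the ones quoted in the examples. It computes the goodness-of-fit statistic for the die and mixed-nuts data and the contingency-table statistic for the tax reform, abortion law, and work shift data.

```python
# Minimal sketch (hypothetical helper names) of the chi-square computations
# used in the notes. Pure Python, no external libraries.

def chi_square_statistic(observed, expected):
    """Sum of (o - e)^2 / e over paired observed and expected frequencies."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_frequencies(table):
    """Expected cell counts: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_table(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    expected = expected_frequencies(table)
    observed_flat = [o for row in table for o in row]
    expected_flat = [e for row in expected for e in row]
    stat = chi_square_statistic(observed_flat, expected_flat)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Goodness of fit: die tossed 120 times, 20 expected per face.
die_observed = [20, 22, 17, 18, 19, 24]
die_expected = [20] * 6
die_stat = chi_square_statistic(die_observed, die_expected)

# Goodness of fit: 500 nuts, expected counts from the ratio 5:2:2:1.
nuts_observed = [269, 112, 74, 45]
ratio = [5, 2, 2, 1]
nuts_expected = [500 * r / sum(ratio) for r in ratio]
nuts_stat = chi_square_statistic(nuts_observed, nuts_expected)

# Independence: tax reform opinion (rows) by income level (columns).
tax_table = [[182, 213, 203], [154, 138, 110]]
tax_stat, tax_df = chi_square_table(tax_table)

# Homogeneity: abortion law opinion (rows) by political affiliation (columns).
law_table = [[82, 70, 62], [93, 62, 67], [25, 18, 21]]
law_stat, law_df = chi_square_table(law_table)

# Several proportions: defectives and non-defectives by shift.
shift_table = [[45, 55, 70], [905, 890, 870]]
shift_stat, shift_df = chi_square_table(shift_table)

print(f"die:    chi2 = {die_stat:.3f}, df = {len(die_observed) - 1}")
print(f"nuts:   chi2 = {nuts_stat:.3f}, df = {len(nuts_observed) - 1}")
print(f"tax:    chi2 = {tax_stat:.3f}, df = {tax_df}, reject Ho: {tax_stat > 5.991}")
print(f"law:    chi2 = {law_stat:.3f}, df = {law_df}")
print(f"shifts: chi2 = {shift_stat:.3f}, df = {shift_df}, reject Ho: {shift_stat > 7.378}")
```

Because this sketch keeps the expected frequencies unrounded, its statistics for the tax reform and work shift data can differ in the second decimal place from the hand computations, which round the expected frequencies to one decimal. The decisions are the same.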