Statistics
Chapter 13: Categorical Data Analysis

Where We’ve Been
  - Presented methods for making inferences about the population proportion associated with a two-level qualitative variable (i.e., a binomial variable)
  - Presented methods for making inferences about the difference between two binomial proportions

McClave, Statistics, 11th ed.

Where We’re Going
  - Discuss qualitative (categorical) data with more than two outcomes
  - Present a chi-square hypothesis test for comparing the category proportions associated with a single qualitative variable, called a one-way analysis
  - Present a chi-square hypothesis test relating two qualitative variables, called a two-way analysis

13.1: Categorical Data and the Multinomial Experiment

Properties of the Multinomial Experiment
1. The experiment consists of n identical trials.
2. There are k possible outcomes (called classes, categories, or cells) to each trial.
3. The probabilities of the k outcomes, denoted by $p_1, p_2, \dots, p_k$, where $p_1 + p_2 + \dots + p_k = 1$, remain the same from trial to trial.
4. The trials are independent.
5. The random variables of interest are the cell counts $n_1, n_2, \dots, n_k$ of the number of observations that fall into each of the k categories.

13.2: Testing Categorical Probabilities: One-Way Table

Suppose three candidates are running for office, and 150 voters are asked their preferences.
  - Candidate 1 is the choice of 61 voters.
  - Candidate 2 is the choice of 53 voters.
  - Candidate 3 is the choice of 36 voters.
Do these data suggest the population may prefer one candidate over the others?
With n = 150 voters and observed counts $n_1 = 61$, $n_2 = 53$, $n_3 = 36$, the hypotheses are

$H_0: p_1 = p_2 = p_3 = 1/3$ (no preference)
$H_a$: At least one of the proportions exceeds 1/3

Under $H_0$, the expected number of votes for each candidate is

\[ E_1 = E_2 = E_3 = 150 \cdot \tfrac{1}{3} = 50 \]

A chi-square ($\chi^2$) test is used to test $H_0$:

\[ \chi^2 = \frac{[n_1 - E_1]^2}{E_1} + \frac{[n_2 - E_2]^2}{E_2} + \frac{[n_3 - E_3]^2}{E_3} = \frac{[61 - 50]^2}{50} + \frac{[53 - 50]^2}{50} + \frac{[36 - 50]^2}{50} = 6.52 \]

\[ \chi^2_{.05,\ df = 2} = 5.99147 \]

Because $\chi^2 = 6.52 > 5.99147$, reject the null hypothesis.

Test of a Hypothesis about Multinomial Probabilities: One-Way Table

$H_0: p_1 = p_{1,0},\ p_2 = p_{2,0},\ \dots,\ p_k = p_{k,0}$, where $p_{1,0}, p_{2,0}, \dots, p_{k,0}$ represent the hypothesized values of the multinomial probabilities
$H_a$: At least one of the multinomial probabilities does not equal its hypothesized value

Test statistic:

\[ \chi^2 = \sum_{i=1}^{k} \frac{[n_i - E_i]^2}{E_i} \]

where $E_i = n p_{i,0}$ is the expected cell count given the null hypothesis.

Rejection region: $\chi^2 > \chi^2_{\alpha}$, with $(k - 1)$ df.

Conditions Required for a Valid $\chi^2$ Test: One-Way Table
1. A multinomial experiment has been conducted.
2. The sample size n is large enough that, for every cell, the expected cell count $E(n_i)$ is 5 or more.

Example 13.2: Distribution of Opinions About Marijuana Possession Before Television Series Has Aired

Legalization    Decriminalization    Existing Law    No Opinion
7%              18%                  65%             10%

Table 13.2: Distribution of Opinions About Marijuana Possession After Television Series Has Aired

Legalization    Decriminalization    Existing Law    No Opinion
39              99                   336             26
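As a minimal sketch of the one-way test described in this section, the Python snippet below recomputes the statistic for the three-candidate survey. The function name `one_way_chi_square` and the constant `CRITICAL_05_DF2` are hypothetical names introduced here for illustration; the critical value is simply copied from the slides rather than computed.

```python
def one_way_chi_square(observed, hypothesized_probs):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    if len(observed) != len(hypothesized_probs):
        raise ValueError("need one hypothesized probability per category")
    if abs(sum(hypothesized_probs) - 1.0) > 1e-9:
        raise ValueError("hypothesized probabilities must sum to 1")

    n = sum(observed)
    # Expected cell count under H0: E_i = n * p_{i,0}
    expected = [n * p for p in hypothesized_probs]

    # Validity condition from the slides: every expected count is 5 or more.
    if min(expected) < 5:
        raise ValueError("every expected cell count should be at least 5")

    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1, expected


# Three-candidate example: H0 says each candidate has probability 1/3.
votes = [61, 53, 36]
chi2, df, expected = one_way_chi_square(votes, [1 / 3, 1 / 3, 1 / 3])

CRITICAL_05_DF2 = 5.99147  # chi-square critical value quoted in the slides

print(f"chi-square = {chi2:.2f} with {df} df")
print("reject H0" if chi2 > CRITICAL_05_DF2 else "fail to reject H0")
```

Passing the hypothesized probabilities as an argument keeps the same function usable when the null values are unequal, as in Example 13.2.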
Expected Distribution of 500 Opinions About Marijuana Possession After Television Series Has Aired

Legalization     Decriminalization    Existing Law      No Opinion
500(.07) = 35    500(.18) = 90        500(.65) = 325    500(.10) = 50

$H_0: p_1 = .07,\ p_2 = .18,\ p_3 = .65,\ p_4 = .10$
$H_a$: At least one of the proportions differs from its null hypothesis value.

Test statistic:

\[ \chi^2 = \sum_{i=1}^{4} \frac{[n_i - E_i]^2}{E_i} \]

Rejection region: $\chi^2 > \chi^2_{.01,\ df = 3} = 11.3449$

\[ \chi^2 = \frac{(39 - 35)^2}{35} + \frac{(99 - 90)^2}{90} + \frac{(336 - 325)^2}{325} + \frac{(26 - 50)^2}{50} = 13.249 \]

Because $13.249 > 11.3449$, reject the null hypothesis.

Inferences can be made on any single proportion as well. A 95% confidence interval on the proportion of citizens in the viewing area with no opinion is

\[ \hat{p}_4 \pm 1.96\,\sigma_{\hat{p}_4} \]

where

\[ \hat{p}_4 = \frac{n_4}{n} = \frac{26}{500} = .052 \quad \text{and} \quad \sigma_{\hat{p}_4} \approx \sqrt{\frac{\hat{p}_4 (1 - \hat{p}_4)}{n}} = \sqrt{\frac{.052(.948)}{500}} = .0099 \]

so that

\[ \hat{p}_4 \pm 1.96\,\sigma_{\hat{p}_4} = .052 \pm 1.96(.0099) = .052 \pm .019 \]
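The interval arithmetic above can be checked with a short Python sketch. The function name `proportion_confidence_interval` is hypothetical; it implements the same large-sample interval used on the slide (estimate plus or minus z times the estimated standard error), with z = 1.96 assumed for 95% confidence.

```python
import math


def proportion_confidence_interval(count, n, z=1.96):
    """Large-sample confidence interval for a single multinomial proportion."""
    p_hat = count / n
    # Estimated standard error: sqrt(p_hat * (1 - p_hat) / n)
    std_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * std_error
    return p_hat, std_error, (p_hat - margin, p_hat + margin)


# "No Opinion" category after the series aired: 26 of 500 respondents.
p_hat, std_error, (low, high) = proportion_confidence_interval(26, 500)

print(f"p-hat = {p_hat:.3f}, standard error = {std_error:.4f}")
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

The whole interval lies below the hypothesized value of .10 for this category, which is consistent with the "No Opinion" cell contributing most of the chi-square statistic.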
13.3: Testing Categorical Probabilities: Two-Way Table

Chi-square analysis can also be used to investigate studies based on qualitative factors.
Does having one characteristic make it more or less likely to exhibit another characteristic?

The columns are divided according to the subcategories for one qualitative variable and the rows for the other qualitative variable.

                             Column
Row              1       2       ...     c       Row Totals
1                n11     n12     ...     n1c     R1
2                n21     n22     ...     n2c     R2
...
r                nr1     nr2     ...     nrc     Rr
Column Totals    C1      C2      ...     Cc      n

General Form of a Two-Way (Contingency) Table Analysis: A Test for Independence

$H_0$: The two classifications are independent
$H_a$: The two classifications are dependent

Test statistic:

\[ \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{[n_{ij} - E_{ij}]^2}{E_{ij}}, \quad \text{where } E_{ij} = \frac{R_i C_j}{n} \]

and $R_i$ = total for row i, $C_j$ = total for column j, n = sample size.

Rejection region: $\chi^2 > \chi^2_{\alpha}$, df = $(r - 1)(c - 1)$

The results of a survey regarding marital status and religious affiliation are reported below (Example 13.3 in the text).

                                     Religious Affiliation
Marital Status             A      B      C      D      None    Totals
Divorced                   39     19     12     28     18      116
Married, never divorced    172    61     44     70     37      384
Totals                     211    80     56     98     55      500

$H_0$: Marital status and religious affiliation are independent
$H_a$: Marital status and religious affiliation are dependent
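As a rough sketch of the test for independence, the Python snippet below applies $E_{ij} = R_i C_j / n$ to the marital status survey and sums the cell contributions. All function names are hypothetical. The p-value helper uses the closed-form upper-tail probability of the chi-square distribution, which holds only for even degrees of freedom; that restriction is an assumption of this sketch and happens to cover the df = 4 case here.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence: E_ij = R_i * C_j / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


def chi_square_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    terms = (half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * sum(terms)


# Rows: Divorced; Married, never divorced. Columns: A, B, C, D, None.
survey = [
    [39, 19, 12, 28, 18],
    [172, 61, 44, 70, 37],
]

chi2, df, expected = chi_square_independence(survey)
p_value = chi_square_upper_tail_even_df(chi2, df)

print(f"chi-square = {chi2:.4f}, df = {df}, p-value = {p_value:.4f}")
```

For tables with odd degrees of freedom, a statistics library or a printed chi-square table would be needed in place of the closed-form helper.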
The expected frequencies (see Figure 13.4) are shown in parentheses below:

                                                 Religious Affiliation
Marital Status             A               B             C             D             None          Totals
Divorced                   39 (48.95)      19 (18.56)    12 (12.99)    28 (22.74)    18 (12.76)    116
Married, never divorced    172 (162.05)    61 (61.44)    44 (43.01)    70 (75.26)    37 (42.24)    384
Totals                     211             80            56            98            55            500

The chi-square value computed with SAS is 7.1355, with p-value = .1289. Even at the $\alpha = .10$ level, we cannot reject the null hypothesis.

13.4: A Word of Caution About Chi-Square Tests

  - Relative ease of use
  - Misuse and misinterpretation
  - Widespread applications

To produce (possibly) valid $\chi^2$ results, be sure that
  - The sample is from the correct population
  - Expected counts are ≥ 5
  - You avoid Type II errors by not accepting non-rejected null hypotheses
  - You avoid mistaking dependence for causation
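The expected-count condition in the list above can be checked mechanically before trusting a $\chi^2$ result. The following Python sketch flags every cell of a two-way table whose expected count falls below 5. The function name `cells_below_threshold` and the `tiny` table are hypothetical, made up purely to show a failing case.

```python
def cells_below_threshold(table, threshold=5.0):
    """List (row index, column index, expected count) for cells with E_ij < threshold."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n  # expected count under independence
            if e < threshold:
                flagged.append((i, j, e))
    return flagged


# Marital status by religious affiliation (Example 13.3): no cell is flagged.
survey = [
    [39, 19, 12, 28, 18],
    [172, 61, 44, 70, 37],
]
print(cells_below_threshold(survey))

# Hypothetical small table, invented for illustration: three cells are flagged.
tiny = [
    [3, 2],
    [1, 14],
]
print(cells_below_threshold(tiny))
```

An empty list only confirms the sample-size condition; it says nothing about whether the sample came from the correct population or whether a dependence is causal.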