OPRE504 Study Guide
Chapter 13: Chi-Square Tests for Counts

Common Procedures for Conducting Chi-Square Tests

1. State the hypotheses.
2. Find the expected value for each cell.
3. Compute the squared residual for each cell: (Obs - Exp)^2.
4. Standardize each squared residual by dividing it by the expected value.
5. Sum the standardized squared residuals to derive the chi-square statistic ($\chi^2$):
   $$\chi^2 = \sum_{\text{all cells}} \frac{(Obs - Exp)^2}{Exp}$$
6. Find the degrees of freedom:
   df = number of cells - 1 for Goodness of Fit tests;
   df = (# of rows - 1) x (# of columns - 1) for Homogeneity and Independence tests.
7. Determine the critical value of $\chi^2$ from the $\chi^2$ table, based on df and the alpha level.
8. Decision: reject H0 if $\chi^2$ > critical value; otherwise fail to reject H0.

I Goodness of Fit Test

Purpose: Compare the distribution of a single variable to an expected distribution. It tests how well the distribution of counts in one categorical variable matches the distribution predicted by a model.

Typical Research Questions: Whether a particular day of the week is more likely to show a gain in the DJIA than any other; whether a die is fair (or some faces are more likely to appear); whether M&M's have more candies of a particular color than advertised; whether every number is equally likely to be drawn in a lottery; whether there is employment discrimination based on ethnicity, etc.

Hypotheses:
H0: The distributions of observed counts and expected counts across all categories of a single variable are the same (there is no particular pattern).
Ha: The distributions of observed counts and expected counts across all categories of a single variable are different (there is a pattern in the distribution).

Degrees of Freedom: df = number of categories in the categorical variable - 1

Chaodong Han OPRE504 Data Analysis and Decisions Class Handout Page 1 of 8

Q13.1 [Sharpe 2011, Chapter 13, Exercise 7, p.424] Maryland has a Pick-3 Lottery where 3 random digits are drawn each day. A fair game depends on every value (0-9) being equally likely at each of the three positions.
To investigate the randomness, we collect data on the winning digits over a 32-week period, as shown in the following table. Is each of the digits from 0 to 9 equally likely to be drawn?

Digit   Observed Count   Observed (%)
0       62               9.480%
1       55               8.410%
2       66               10.092%
3       64               9.786%
4       75               11.468%
5       57               8.716%
6       71               10.856%
7       74               11.315%
8       69               10.550%
9       61               9.327%

1. State Hypotheses:
   H0:
   Ha:

2. Compute Expected Counts and Standardized Squared Residuals (Steps 2-4)
   If the draws are random, each digit (0-9) has an equal chance, so for a total of 654 digits the expected count for each digit is 10% x 654 = 65.4.

Digit   Observed Count   Observed %   Expected Count   Residual (Obs-Exp)   Squared Residual   Standardized Squared Residual
0       62               9.480%
1       55               8.410%
2       66               10.092%
3       64               9.786%
4       75               11.468%
5       57               8.716%
6       71               10.856%
7       74               11.315%
8       69               10.550%
9       61               9.327%

3. Chi-square Statistic
   $\chi^2$ =

4. Determine Critical Value of $\chi^2$
   df = 10 (10 groups) - 1 = 9; at the 5% significance level: $\chi^2$* = CHIINV(0.05, 9) =

5. Decision

More Exercises: Guided Example: Stock Market Patterns (pp.403-405); Chapter 13: Exercises 2, 3, 4, 5, 8, 38.

II Chi-Square Test of Homogeneity

Purpose: Compare the distribution of counts for two or more groups on the same categorical variable. This categorical variable has multiple categories.

Typical Research Questions: Whether the responses to a survey question vary across different groups; for example, "whether the distribution of responses about the importance of looking good is the same across five countries (China, France, India, U.K. and U.S.)".
Hypotheses:
H0: The distribution of responses is homogeneous across all groups.
Ha: The distribution of responses is not homogeneous across all groups.

Degrees of Freedom: df = (number of response categories - 1) x (number of groups - 1)

Q13.2 [Sharpe 2011, Chapter 13, Exercise 14, pp.425-6] A European manufacturer of automobiles claims that its cars are preferred by the younger generation and would like to target university students in its next ad campaign. Suppose we test its claim with our own random survey of cars parked in the student and staff parking lots at a large university. The car brands are reported by country of origin in the following table. Are there differences in the national origin of cars driven by students and staff?

            American Brand   European Brand   Asian Brand
Student     107              33               55
Staff       105              12               47

1. State Hypotheses
   H0:
   Ha:

2. Compute Expected Values

OBSERVED          Student   Staff   Total   Observed Brand Distribution
American Brand    107       105     212
European Brand    33        12      45
Asian Brand       55        47      102
Total             195       164     359     100.00%

EXPECTED          Student   Staff   Expected Total
American Brand
European Brand
Asian Brand

3. Compute Standardized Squared Differences

                  (obs-exp)           (obs-exp)^2         (obs-exp)^2/exp
                  Student   Staff     Student   Staff     Student   Staff
American Brand
European Brand
Asian Brand

4. Chi-square Statistic
   $\chi^2$ =

5. Degrees of Freedom
   df = (no. of car brands - 1) x (no. of driver groups - 1) =

6. Critical Value of Chi-square
   $\chi^2$* for alpha level = 0.05 and df = ____: $\chi^2$* = CHIINV

7. Decision

More Exercises: Chapter 13: Guided Example: Attitudes on Appearance (pp.410-411); Chapter 13: Exercises 13, 14, 25, 26, 30, 37 and 41.
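The goodness-of-fit test in Section I and the homogeneity test in this section share Steps 2-5 of the common procedure: find expected counts, standardize the squared residuals, and sum them. The following is a minimal Python sketch of those steps, meant only as a way to check hand calculations for Q13.1 and Q13.2. The names `chi_square`, `expected_two_way`, `flatten`, and `CRITICAL_05` are illustrative and not part of the handout, and the 5% critical values are hard-coded from a standard chi-square table instead of being computed with Excel's CHIINV.

```python
def chi_square(observed, expected):
    """Steps 3-5: sum of (obs - exp)^2 / exp over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_two_way(table):
    """Step 2 for a two-way table: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def flatten(table):
    """Turn a list of rows into one flat list of cells."""
    return [cell for row in table for cell in row]


# Assumption: standard 5% critical values copied from a chi-square table.
CRITICAL_05 = {1: 3.841, 2: 5.991, 5: 11.070, 9: 16.919}

# Q13.1 (goodness of fit): every digit is expected equally often.
lottery = [62, 55, 66, 64, 75, 57, 71, 74, 69, 61]
lottery_expected = [sum(lottery) / len(lottery)] * len(lottery)
gof_stat = chi_square(lottery, lottery_expected)
gof_df = len(lottery) - 1

# Q13.2 (homogeneity): rows are Student, Staff; columns are
# American, European, Asian brands.
cars = [[107, 33, 55], [105, 12, 47]]
cars_expected = expected_two_way(cars)
hom_stat = chi_square(flatten(cars), flatten(cars_expected))
hom_df = (len(cars) - 1) * (len(cars[0]) - 1)

for label, stat, df in [("Q13.1", gof_stat, gof_df), ("Q13.2", hom_stat, hom_df)]:
    decision = "reject H0" if stat > CRITICAL_05[df] else "fail to reject H0"
    print(f"{label}: chi-square = {stat:.3f}, df = {df}, "
          f"critical = {CRITICAL_05[df]}, {decision}")
```

The same `expected_two_way` helper applies to any contingency table, so it can also be used for the independence test in the next section by passing the table of counts as a list of rows.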
III Chi-Square Test of Independence

Purpose: When we have a two-variable contingency table for one sample and each variable has an exhaustive list of categories, we would like to know whether one variable is independent of the other. The test uses the same calculation as a test of homogeneity.

Hypotheses:
H0: The two variables are independent of each other.
Ha: The two variables are not independent of each other.

Degrees of Freedom: Based on the contingency table, df = (no. of rows - 1) x (no. of columns - 1)

Q13.3 [Sharpe 2011, Chapter 13, Exercise 10, p.424] The following table shows the rank attained by male and female officers in the New York City Police Department (NYPD). Do these data indicate that men and women are equitably represented at all levels of the department?

Gender   Officer   Detective   Sergeant   Lieutenant   Captain   Higher Ranks
Male     21,900    4,058       3,898      1,333        359       218
Female   4,281     806         415        89           12        10

1. Hypotheses

2. Expected Counts
(1) Compute the male and female proportions and the total for each category.

         Officer   Detective   Sergeant   Lieutenant   Captain   Higher Ranks   Total    Proportion
Male     21,900    4,058       3,898      1,333        359       218            31,766
Female   4,281     806         415        89           12        10             5,613
Total    26,181    4,864       4,313      1,422        371       228            37,379   100%

(2) Compute the expected male and female counts by multiplying the overall male and female proportions by the total for each category. For example, male officers = 84.98% x 26,181 = 22,249.5; female captains = 15.02% x 371 = 55.7.

EXPECTED       Male     Female
Officer
Detective
Sergeant
Lieutenant
Captain
Higher Ranks
Total          31,766   5,613

3. Compute Squared Residuals for All Cells

Residuals (obs-exp)             Male     Female
Officer
Detective
Sergeant
Lieutenant
Captain
Higher Ranks

Squared residuals (obs-exp)^2   Male     Female
Officer
Detective
Sergeant
Lieutenant
Captain
Higher Ranks

4. Standardize Squared Residuals [(obs-exp)^2/exp]

Standardized Squares   Male     Female
Officer
Detective
Sergeant
Lieutenant
Captain
Higher Ranks

5. Chi-Square Statistic ($\chi^2$)
   $\chi^2$ =

6. Determine Critical $\chi^2$*
   df = (number of rows - 1) x (number of columns - 1) =
   At the 5% alpha level, $\chi^2$* = CHIINV( ) =

7. Decision

More Exercises: Chapter 13: Guided Example: Personal Appearance and Age (pp.415-416); Chapter 13: Exercises 10, 11, 15, 16, 27, 28, 32, 35, 36, 39, 40, 42, 43, and 44.

IV Compare Two Proportions and Confidence Intervals for the Difference of Two Proportions

Q13.4 [Sharpe 2011, p.412] A study conducted by the U.S. Department of Commerce surveyed 24-year-old Americans to see if they had finished high school and reports:

                 Men      Women    Total
HS Diploma       10,579   11,169   21,748
No HS Diploma    1,881    1,509    3,390
Total            12,460   12,678   25,138

a) Test whether the distribution of high school diplomas is different for men and women.

Step 1:
Overall percentage for completing high school =
Overall percentage for not completing high school =

Step 2: Under the homogeneity assumption, we would expect the same percentages to occur in both the Men and Women groups.
Men with HS diploma: 12,460 x ____% =
Women with HS diploma: 12,678 x ____% =

EXPECTED         Men      Women
HS Diploma
No HS Diploma
Total            12,460   12,678

Step 3: Calculate Standardized Differences

Expected         Men         Women
HS Diploma       10,779.70   10,968.30
No HS Diploma    1,680.30    1,709.70
Total            12,460      12,678

                 obs-exp          (obs-exp)^2      (obs-exp)^2/exp
                 Men     Women    Men     Women    Men     Women
HS Diploma
No HS Diploma

Sum (chi-square statistic): $\chi^2$ =

Step 4: Degrees of Freedom
df =

Step 5: Critical Value of Chi-square
Given df = ____ and alpha = 0.05, $\chi^2$* = CHIINV

Step 6: Decision:

Note: A chi-square test on a 2x2 contingency table, with df = (2-1) x (2-1) = 1, is equivalent to testing whether two proportions (for men and women in the above question) are equal.

Confidence Interval for the Difference of Two Proportions:

$\hat{p}_1$ is the proportion for group 1, and $\hat{q}_1 = 1 - \hat{p}_1$; $\hat{p}_2$ is the proportion for group 2, and $\hat{q}_2 = 1 - \hat{p}_2$.

$$CI: (\hat{p}_1 - \hat{p}_2) \pm z^* \, SE(\hat{p}_1 - \hat{p}_2), \quad \text{where} \quad SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$

Q13.4 b) Are women more likely to complete high school? Use a 95% confidence interval.

Proportion of women who have high school diplomas:
$\hat{p}_1$ =        $\hat{q}_1 = 1 - \hat{p}_1$ =        $n_1$ =

Proportion of men who have high school diplomas:
$\hat{p}_2$ =        $\hat{q}_2 = 1 - \hat{p}_2$ =        $n_2$ =

$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$ =

CI = $(\hat{p}_1 - \hat{p}_2) \pm z^* \, SE(\hat{p}_1 - \hat{p}_2)$ =

Conclusion:

More Exercises: Chapter 13: Exercises 17, 18, 19, 20, 21 and 22.
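The confidence interval formula for Q13.4 b) can be checked with a short Python sketch. It takes women as group 1 and men as group 2, as in the question, and assumes the usual z* = 1.96 for a 95% interval. The function names `two_prop_ci` and `pooled_z` are illustrative and not part of the handout. The second function uses the pooled standard error, a technique the handout does not spell out, to illustrate the note that a 2x2 chi-square test is equivalent to a two-proportion test: the squared pooled z statistic should match the chi-square statistic from part a).

```python
import math

# Q13.4 counts: number with a HS diploma and group size.
women_hs, women_n = 11169, 12678   # group 1
men_hs, men_n = 10579, 12460       # group 2


def two_prop_ci(x1, n1, x2, n2, z_star=1.96):
    """CI for p1 - p2 using SE = sqrt(p1*q1/n1 + p2*q2/n2)."""
    p1, p2 = x1 / n1, x2 / n2
    q1, q2 = 1 - p1, 1 - p2
    se = math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


def pooled_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion."""
    pooled = (x1 + x2) / (n1 + n2)
    se0 = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se0


ci_low, ci_high = two_prop_ci(women_hs, women_n, men_hs, men_n)
z = pooled_z(women_hs, women_n, men_hs, men_n)

print(f"95% CI for p_women - p_men: ({ci_low:.4f}, {ci_high:.4f})")
print(f"pooled z = {z:.3f}, z^2 = {z ** 2:.2f}")
```

If the whole interval lies above zero, the data support the conclusion that women are more likely than men to complete high school.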