* Inference for Distributions of Categorical Variables (C26 BVD)
* Chi-squared distributions are appropriate for modeling sampling distributions of counted data (categorical variables).
* If you are comparing only one count to one expected count, you can use a one-proportion z-interval or z-test instead, if its conditions are met.
* Conditions to check: random sampling/assignment; counted data (not percents, etc.); all expected cell counts at least 5 (expected, not observed!).
* If the expected cell count condition is violated, you may be able to "fix" it by collapsing the table (combining categories).

* One set of counts for one variable, compared to an expected distribution = Goodness of Fit (one-way table).
* Multiple distributions for a single variable (think: 1 question asked in a survey) = Test of Homogeneity.
* Two variables (think: 2 survey questions) = Test of Independence.

* Compute (observed - expected)² / expected for each cell.
* Sum all of those and you have your chi-squared statistic.
* Degrees of freedom for GOF = categories - 1.
* Degrees of freedom for Homogeneity and Independence = (rows - 1) × (columns - 1). Do not include the table margins.

* GOF, old calculators: Put observed in L1, expected in L2, and (o - e)²/e in L3, then sum L3. That sum is chi-squared. Then use χ²cdf(statistic, big number, df) to find the p-value.
* GOF, newer calculators: Put observed in L1 and expected in L2, then run the GOF test under the Stat-Test menu.
* Always report your test statistic, df, and p-value, then make a conclusion.

* Homogeneity/Independence on the calculator: Put the table in matrix A. Do not include margins.
* Run the chi-squared test under the Stat-Test menu.
* Don't forget to check matrix B for expected count violations.
* Always report the chi-squared statistic, df, and p-value, then make a conclusion.

* GOF: Ho: The distribution for _ is as expected (may need to be more specific). Ha: The distribution for _ is NOT as expected.
* Homogeneity: Ho: There is no difference in the distribution of __ for the populations/treatments __. Ha: There is a difference in distribution…
* Independence: Ho: There is no association between _ and _. Ha: There IS an association between _ and _.

* If you reject the null, look at each component of the sum for the chi-squared statistic (i.e. each (o - e)²/e) and see which are the largest.
* Comment on which one or two are largest. Those cells were the most different from expected and so contributed the most to rejecting Ho.

* If you need to find an expected cell count "by hand":
* GOF: Find what proportion of the total count that category is supposed to be (like 30% of M&M's are supposed to be yellow), then take that percent of the total to find the expected count. Do not round to whole numbers.
* Homogeneity/Independence: Find the totals/margins for the table. Then find what percent of the whole table is in the row category of interest (row total / grand total). Then take that percent of the total for the column of interest; that is the expected cell count. Equivalently, expected = (row total × column total) / grand total.
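
The hand calculations in these notes (expected counts, the (o - e)²/e components, the statistic, degrees of freedom, and the largest contributor) can be sketched in plain Python. This is only an illustration, not a replacement for the calculator steps: the function names and all of the counts and proportions below are made up for the example.

```python
def gof_expected(proportions, total):
    """GOF expected counts: claimed proportion times total (not rounded)."""
    return [p * total for p in proportions]


def two_way_expected(table):
    """Expected counts for a two-way table given WITHOUT margins.

    Each expected count is (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def components(observed, expected):
    """One (o - e)^2 / e term per cell, for flat lists of counts."""
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


# Hypothetical GOF data: three categories claimed to be 30%, 50%, 20%.
gof_observed = [25, 55, 20]
gof_exp = gof_expected([0.30, 0.50, 0.20], sum(gof_observed))
gof_parts = components(gof_observed, gof_exp)
gof_stat = sum(gof_parts)                 # chi-squared statistic
gof_df = len(gof_observed) - 1            # categories - 1
# Index of the cell that contributed most to the statistic.
gof_largest = gof_parts.index(max(gof_parts))

# Hypothetical two-way table (2 rows, 3 columns, no margins).
table = [[10, 20, 30],
         [20, 20, 20]]
tw_exp = two_way_expected(table)
# Check the expected-count condition on EXPECTED cells, not observed.
condition_ok = all(e >= 5 for row in tw_exp for e in row)
# Flatten both tables row by row so cells line up.
flat_obs = [o for row in table for o in row]
flat_exp = [e for row in tw_exp for e in row]
tw_stat = sum(components(flat_obs, flat_exp))
tw_df = (len(table) - 1) * (len(table[0]) - 1)   # (rows - 1)(columns - 1)

print("GOF:", gof_stat, "df =", gof_df, "largest component at index", gof_largest)
print("Two-way:", tw_stat, "df =", tw_df, "condition met:", condition_ok)
```

To get a p-value from the statistic and df, one option (assuming SciPy is installed) is `scipy.stats.chi2.sf(statistic, df)`, which plays the same role as χ²cdf(statistic, big number, df) on the calculator.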