Chi-Square Test

Most of the techniques presented so far have been for NUMERICAL data. So, what do we do if the data is CATEGORICAL?
Ex: Information gathered on gender, political party, college major, etc.

Categorical Variables

Based on observations.
Univariate: a single categorical variable.
  Example: Sample 100 people & ask if they agree or disagree with a question.
Bivariate: two categorical variables.
  Example: Sample 100 people & ask if they are male/female and what political party they support.

One-Way Frequency Table - Univariate Data

Observations:
Democrat, Democrat, Democrat, Independent, Republican, Democrat, Republican, Independent, Republican, Republican, Republican, Republican

Vertical One-Way Table

              Freq.
Democrat        4
Republican      6
Independent     2

Horizontal One-Way Table

         Democrat   Republican   Independent
Freq.        4          6             2

$\chi^2$ Goodness of Fit Test

Used to measure the extent to which the observed counts differ from the expected counts.
k = # of categories of a categorical variable
df = k - 1
Test statistic: $\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$

How Does a Hypothesis Test for Chi-Square Work?

The idea of the chi-square goodness-of-fit test is this: we compare the observed counts from our sample with the counts that would be expected if $H_o$ were true. The more the observed counts differ from the expected counts, the more evidence we have AGAINST the null hypothesis.

Assumptions
1. Observed values are based on random samples.
2. Sample size is large: each expected cell count is at least 5. (All cells ≥ 5)

Hypotheses
$H_o$: State each proportion's hypothesized value.
$H_A$: At least 1 of the proportions differs from its hypothesized value.

The test uses the chi-square distribution (chi-square chart):
Positively skewed
Uses d.f.
On calculator!

Is there a preference in type of car?

          Freq.   Expected
SUV        27        25
Truck      25        25
Sedan      29        25
Sports     19        25

$p_1$ = proportion who prefer an SUV
$p_2$ = proportion who prefer a truck
$p_3$ = proportion who prefer a sedan
$p_4$ = proportion who prefer a sports car

$H_o: p_1 = p_2 = p_3 = p_4$
$H_A$: at least 1 proportion is different

Assumptions: Random samples & all cell counts are at least 5.
Use a chi-square goodness-of-fit test.

$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \frac{(27-25)^2}{25} + \frac{(25-25)^2}{25} + \frac{(29-25)^2}{25} + \frac{(19-25)^2}{25} = 2.24$

df = 3
P-value = $\chi^2\text{cdf}(2.24, \infty, 3) \approx 0.524$

A researcher believes that the number of homicides in CA is uniformly distributed by season. To test this claim, you randomly select 1200 homicides from a recent year and record the season when each happened.

Season    Freq.
Spring     312
Summer     298
Fall       297
Winter     293

Results from a previous survey of people who go to movies at least once a month are shown in the table below. To determine whether this distribution is still the same, you randomly select 1000 people who go to movies at least once a month and record the age of each. Are the distributions the same?

Age       Survey    Freq.
2 - 17    26.70%     240
18 - 24   19.80%     214
25 - 39   19.70%     183
40 - 49   14%        156
50+       19.80%     207

What's your favorite flavor of ice cream?

Flavor   Expected %   Observed
A           40%          45
B           30%          52
C           20%          39
D            5%           8
F            5%           6

Homework: Worksheet
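
The car-preference calculation can be checked with a short Python sketch. This is only an illustration, not the calculator procedure the slides use: the helper names `chi_square_sf` and `goodness_of_fit` are hypothetical, and the p-value is computed from a power series for the incomplete gamma function (standard library only), which is assumed to be accurate enough for the small statistics seen here.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail area P(X >= x) for a chi-square variable with df degrees of freedom.

    Hypothetical helper: sums the power series for the regularized lower
    incomplete gamma function and subtracts the result from 1.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a          # first term of the series
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


def goodness_of_fit(observed, proportions):
    """Return (chi-square statistic, df, p-value) for observed counts
    tested against the hypothesized proportions under H_o."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    # Large-sample assumption: every expected count is at least 5.
    if min(expected) < 5:
        raise ValueError("every expected count should be at least 5")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # df = k - 1
    return statistic, df, chi_square_sf(statistic, df)


# Car example: SUV, Truck, Sedan, Sports with equal hypothesized proportions.
cars = [27, 25, 29, 19]
stat, df, p_value = goodness_of_fit(cars, [0.25] * 4)
print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p_value:.3f}")
# prints: chi-square = 2.24, df = 3, p-value = 0.524
```

The same function should work when the hypothesized proportions are not equal, for example `goodness_of_fit([240, 214, 183, 156, 207], [0.267, 0.198, 0.197, 0.14, 0.198])` for the movie-age data, since the expected counts are always the sample size times each hypothesized proportion.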