Chi-Square Test
Section 12.1

Categorical Variables
Based on observations.
Univariate: a single categorical variable.
    Example: Sample 100 people and ask whether they agree or disagree with a question.
Bivariate: two categorical variables.
    Example: Sample 100 people and ask whether they are male or female and which political party they support.

One-Way Frequency Table (univariate)
Data: Democrat, Democrat, Democrat, Independent, Republican, Democrat, Republican, Independent, Republican, Republican, Republican, Republican

Vertical One-Way Table
                Freq.
    Democrat    4
    Republican  6
    Independent 2

Horizontal One-Way Table
            Democrat    Republican    Independent
    Freq.   4           6             2

Goodness of Fit Test ($\chi^2$)
Used to measure the extent to which the observed counts differ from the expected counts.
k = number of categories of a categorical variable
df = k - 1
Test statistic: $\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$

Assumptions
Observed values are based on random samples.
Sample size is large: each expected cell count is at least 5.

Hypotheses
$H_0$: State each proportion's hypothesized value.
$H_A$: At least one of the proportions differs from its hypothesized value.

Chi-Square Distribution
The test uses the chi-square chart.
The distribution is positively skewed.
It uses df.
On the calculator!

Example: Is there a preference in type of car?
              Freq.    Expected
    SUV       27       25
    Truck     25       25
    Sedan     29       25
    Sports    19       25

$p_1$ = proportion who prefer an SUV
$p_2$ = proportion who prefer a truck
$p_3$ = proportion who prefer a sedan
$p_4$ = proportion who prefer a sports car

$H_0: p_1 = p_2 = p_3 = p_4 = 0.25$
$H_A$: at least one proportion is different

Assumptions: random samples, and all expected cell counts (25 each) are at least 5.
Use a chi-square goodness of fit test.

$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \frac{(27-25)^2}{25} + \frac{(25-25)^2}{25} + \frac{(29-25)^2}{25} + \frac{(19-25)^2}{25} = 2.24$

With df = 4 - 1 = 3: P-value = $\chi^2$cdf(2.24, ∞, 3) = 0.52

Practice: Homicides by Season
A researcher believes that the number of homicides in CA by season is uniformly distributed. To test this claim, you randomly select 1200 homicides from a recent year and record the season when each happened.
    Season    Freq
    Spring    312
    Summer    299
    Fall      297
    Winter    293

Practice: Moviegoers by Age
Results from a previous survey of the ages of people who go to movies at least once a month are shown in the table below. To determine whether this distribution is still the same, you randomly select 1000 people who go to movies at least once a month and record the age of each. Are the distributions the same?

    Age        Survey    Freq
    2 - 17     26.70%    240
    18 - 24    19.80%    214
    25 - 39    19.70%    183
    40 - 49    14.00%    156
    50+        19.80%    207

Homework
Worksheet
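
Checking the car example in code
The slides find the test statistic and p-value on a calculator. As a cross-check, here is a minimal Python sketch of the same goodness of fit calculation, applied to the car-preference counts (27, 25, 29, 19). The function names goodness_of_fit and chi_square_sf are made up for this illustration and are not part of the slides. The p-value helper assumes a whole-number df and uses only the standard math module, so it stands in for the calculator's cdf function rather than reproducing it.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail area P(X >= x) for a chi-square variable with whole-number df."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form sum with df/2 terms.
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: start from the df = 1 tail (erfc) and add one term per extra 2 df.
    tail = math.erfc(math.sqrt(y))
    for j in range((df - 1) // 2):
        tail += math.exp(-y) * y ** (j + 0.5) / math.gamma(j + 1.5)
    return tail


def goodness_of_fit(observed, proportions):
    """Return (chi-square statistic, df, p-value) for hypothesized proportions."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    if min(expected) < 5:
        # Large-sample assumption from the slides.
        raise ValueError("every expected cell count should be at least 5")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # df = k - 1
    return statistic, df, chi_square_sf(statistic, df)


# Car-preference example: H0 says each type is preferred by 25% of people.
observed = [27, 25, 29, 19]
statistic, df, p_value = goodness_of_fit(observed, [0.25] * 4)
print(round(statistic, 2), df, round(p_value, 2))  # 2.24 3 0.52
```

The same sketch can be pointed at the practice problems by passing the observed frequencies and the hypothesized proportions (equal proportions for the seasons, the earlier survey percentages written as decimals for the age groups).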