Lesson 3 - Goodness of Fit

Chi-Square Test

Most of the techniques presented so far have been for NUMERICAL data. So what do we do if the data is CATEGORICAL?

Ex: Information gathered on gender, political party, college major, etc.

Categorical Variables

- Based on observations
- Univariate: a single categorical variable
  Example: Sample 100 people and ask if they agree or disagree with a question.
- Bivariate: two categorical variables
  Example: Sample 100 people and ask if they are male/female and what political party they support.

One-Way Frequency Table - Univariate Data

Raw data: Democrat, Democrat, Democrat, Independent, Republican, Democrat, Republican, Independent, Republican, Republican, Republican, Republican

Vertical One-Way Table

Party         Freq.
Democrat      4
Republican    6
Independent   2

Horizontal One-Way Table

         Democrat   Republican   Independent
Freq.    4          6            2

Goodness of Fit Test ($\chi^2$)

- Used to measure the extent to which the observed counts differ from the expected counts.
- k = # categories of a categorical variable
- df = k - 1
- Test statistic: $\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$

How Does a Hypothesis Test for Chi-Square Work?

The idea of the chi-square goodness-of-fit test is this: we compare the observed counts from our sample with the counts that would be expected if $H_0$ were true. The more the observed counts differ from the expected counts, the more evidence we have AGAINST the null hypothesis.

Assumptions

1. Observed values are based on random samples.
2. Sample size is large: each expected cell count is at least 5. (All cells ≥ 5)

Hypotheses

- $H_0$: State each proportion's hypothesized value.
- $H_A$: At least 1 of the proportions differs from its hypothesized value.

Chi-Square Chart

- Positively skewed
- Uses d.f.
- On calculator!

Is there a preference in type of car?

Type      Freq. (observed)   Expected
SUV       27                 25
Truck     25                 25
Sedan     29                 25
Sports    19                 25

$p_1$ = proportion who prefer an SUV
$p_2$ = proportion who prefer a truck
$p_3$ = proportion who prefer a sedan
$p_4$ = proportion who prefer a sports car

$H_0: p_1 = p_2 = p_3 = p_4 = 0.25$
$H_A$: at least 1 proportion is different

Assumptions: random samples, and all expected cell counts are at least 5 (each is 25).

Use a chi-square goodness of fit test.

$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \frac{(27-25)^2}{25} + \frac{(25-25)^2}{25} + \frac{(29-25)^2}{25} + \frac{(19-25)^2}{25} = 2.24$

df = 4 - 1 = 3

P-value = $\chi^2$cdf(2.24, ∞, 3) = 0.524

A P-value this large is above any common significance level, so we fail to reject $H_0$: there is no convincing evidence of a preference in type of car.

A researcher believes that the number of homicide crimes in CA by season is uniformly distributed. To test this claim, you randomly select 1200 homicides from a recent year and record the season when each happened.

Season    Freq.
Spring    312
Summer    298
Fall      297
Winter    293

Results from a previous survey asking people who go to movies at least once a month are shown in the table below. To determine whether this distribution is still the same, you randomly select 1000 people who go to movies at least once a month and record the age of each. Are the distributions the same?

Age        Survey    Freq.
2 - 17     26.7%     240
18 - 24    19.8%     214
25 - 39    19.7%     183
40 - 49    14.0%     156
50+        19.8%     207

What's your favorite flavor of ice cream?

Flavor    Percent   Observed
A         40%       45
B         30%       52
C         20%       39
D         5%        8
F         5%        6

Homework: Worksheet
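
The car-preference calculation above can also be checked in code. The following is a minimal Python sketch using only the standard library. The function names (expected_counts, chi_square_statistic, chi_square_p_value) are illustrative and not part of the lesson, and the P-value is computed from a series expansion of the chi-square CDF as a stand-in for the calculator's $\chi^2$cdf.

```python
import math


def expected_counts(proportions, n):
    """Expected count for each category: hypothesized proportion times sample size."""
    return [p * n for p in proportions]


def chi_square_statistic(observed, expected):
    """Sum of (Observed - Expected)^2 / Expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value(statistic, df):
    """Upper-tail area P(X >= statistic) for a chi-square distribution with df d.f.

    Uses the series for the regularized lower incomplete gamma function,
    which is adequate for the moderate values seen in classroom examples.
    """
    if statistic <= 0:
        return 1.0
    a, x = df / 2.0, statistic / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= x / (a + n)
        total += term
        n += 1
    lower_tail = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return 1.0 - lower_tail


# Car-preference data from the lesson: SUV, Truck, Sedan, Sports
observed = [27, 25, 29, 19]
expected = expected_counts([0.25, 0.25, 0.25, 0.25], sum(observed))

# Assumption check: every expected count should be at least 5
assert all(e >= 5 for e in expected)

statistic = chi_square_statistic(observed, expected)
df = len(observed) - 1  # df = k - 1
p_value = chi_square_p_value(statistic, df)

print(f"chi-square = {statistic:.2f}, df = {df}, P-value = {p_value:.3f}")
```

The same three functions apply to the other data sets in this lesson: pass the hypothesized proportions (equal shares for the seasons, the survey percentages for the age groups) together with the sample size to get the expected counts, then compare against the observed frequencies.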
