Lesson 3 - Goodness of Fit

Chi-Square Test

Most of the techniques presented so far have been for NUMERICAL data.
So, what do we do if the data is CATEGORICAL?
• Ex: Information gathered on gender, political party, college major, etc.

Categorical Variables

Based on observations: each observation is placed into a category, and we count how many fall into each one.

Univariate – a single categorical variable
• Example: Sample 100 people and ask whether they agree or disagree with a question.

Bivariate – two categorical variables
• Example: Sample 100 people and ask whether they are male or female and which political party they support.
One-Way Frequency Table - Univariate

Data (12 responses):
Democrat, Democrat, Democrat, Independent, Republican, Democrat,
Republican, Independent, Republican, Republican, Republican, Republican

Vertical One-Way Table

Party         Freq.
Democrat        4
Republican      6
Independent     2

Horizontal One-Way Table

         Democrat   Republican   Independent
Freq.        4           6             2
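A one-way frequency table is just a count of how many observations fall in each category. As a minimal sketch (Python; the variable names are illustrative, and the 12 responses above are assumed to be typed in as a list), the standard library's `collections.Counter` does the tallying:

```python
from collections import Counter

# The 12 political-party responses from the Data list above
responses = [
    "Democrat", "Democrat", "Democrat", "Independent",
    "Republican", "Democrat", "Republican", "Independent",
    "Republican", "Republican", "Republican", "Republican",
]

# Counter tallies how many times each category appears
freq = Counter(responses)

# Print a vertical one-way table, one category per row
for party in ("Democrat", "Republican", "Independent"):
    print(f"{party:<12}{freq[party]:>3}")
```

The printed rows match the vertical table: Democrat 4, Republican 6, Independent 2.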
Goodness of Fit Test (χ²)

Used to measure the extent to which the observed counts differ from the expected counts.
• k = number of categories of the categorical variable
• df = k – 1

Test Statistic:
$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
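The test statistic can be checked by hand or with a few lines of code. The sketch below is one possible Python version; `chi_square_statistic` and `degrees_of_freedom` are hypothetical helper names (not a calculator or library function), and it assumes every expected count is positive.

```python
def chi_square_statistic(observed, expected):
    """Sum of (Observed - Expected)^2 / Expected over all k categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(observed):
    """df = k - 1, where k is the number of categories."""
    return len(observed) - 1


# Usage with the car-preference counts from later in this lesson
stat = chi_square_statistic([27, 25, 29, 19], [25, 25, 25, 25])
print(round(stat, 2), degrees_of_freedom([27, 25, 29, 19]))  # 2.24 3
```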

How Does a Hypothesis Test for Chi-Square Work?

The idea of the chi-square goodness-of-fit test is this: we compare the observed counts from our sample with the counts that would be expected if Ho were true.
• The more the observed counts differ from the expected counts (the larger χ² is), the more evidence we have AGAINST the null hypothesis.

Assumptions
1. Observed values are based on random samples.
2. The sample size is large: each expected cell count is at least 5 (all expected counts ≥ 5).
Hypotheses

Ho: State each proportion's hypothesized value.

HA: At least 1 of the proportions differs from its hypothesized value.
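Under Ho, each expected count is the sample size times that category's hypothesized proportion, and the large-sample assumption is then checked on those expected counts. A minimal sketch, assuming the proportions are given as decimals that sum to 1 (the helper names `expected_counts` and `large_sample_ok` are hypothetical):

```python
def expected_counts(n, proportions):
    """Expected count for each category = n * hypothesized proportion."""
    if abs(sum(proportions) - 1) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    return [n * p for p in proportions]


def large_sample_ok(expected, minimum=5):
    """Large-sample condition: every expected count is at least 5."""
    return all(e >= minimum for e in expected)


# n = 100 with four equal hypothesized proportions
car_expected = expected_counts(100, [0.25, 0.25, 0.25, 0.25])
print(car_expected, large_sample_ok(car_expected))  # [25.0, 25.0, 25.0, 25.0] True
```

With n = 100 and four equal proportions (the car-preference example later in this lesson), every expected count is 25, so the condition holds.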
It uses the Chi-Square Distribution (chart)
• Positively (right) skewed
• Its shape depends on the d.f.
• P-values can be found on the calculator!
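The calculator finds the P-value as the area in the right tail of the chi-square curve. As an illustration only, that area can also be computed in pure Python with a series for the regularized incomplete gamma function, since the chi-square CDF with df degrees of freedom equals P(df/2, χ²/2). `chi_square_p_value` is a hypothetical name; this series suits moderate statistics like those in this lesson, and for very large statistics or tiny P-values a library routine such as `scipy.stats.chi2.sf` is the safer choice.

```python
import math


def chi_square_p_value(stat, df, tol=1e-12, max_terms=10_000):
    """Right-tail area P(X >= stat) for a chi-square distribution with df degrees of freedom."""
    if stat <= 0:
        return 1.0
    a = df / 2.0   # shape parameter of the equivalent gamma distribution
    x = stat / 2.0
    # Series for the regularized lower incomplete gamma function P(a, x)
    term = 1.0 / a
    total = term
    denom = a
    for _ in range(max_terms):
        denom += 1.0
        term *= x / denom
        total += term
        if term < tol * total:
            break
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))  # CDF value
    return max(0.0, 1.0 - lower)  # right-tail area = 1 - CDF


# Car-preference example: chi-square = 2.24 with df = 3
print(round(chi_square_p_value(2.24, 3), 3))  # 0.524
```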

Is there a preference in type of car?

Type       SUV   Truck   Sedan   Sports
Freq.       27     25      29      19
Expected    25     25      25      25

P1 = proportion who prefer an SUV
P2 = proportion who prefer a truck
P3 = proportion who prefer a sedan
P4 = proportion who prefer a sports car
 H o : p1  p2  p3  p4

 H A : at least 1 prop. is different
(OBSERVED  PREDICTED ) 2
 
PREDICTED
2
2
2
2




27  25
25  25
29  25
19  25
2
 



25
25
25
25
 2  2.24
2
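Assumptions: Random samples & all expected cell counts (25 each) are at least 5.
Use a Chi-Square goodness-of-fit test.
df = 4 – 1 = 3
P-value = χ²cdf(2.24, ∞, 3) ≈ 0.524
Because the P-value (0.524) is large (well above common significance levels such as 0.05), we fail to reject Ho: there is not enough evidence of a preference in type of car.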
A researcher believes that the number of homicides in CA is uniformly distributed by season. To test this claim, you randomly select 1200 homicides from a recent year and record the season in which each happened.

Season   Freq
Spring   312
Summer   298
Fall     297
Winter   293
Results from a previous survey of people who go to movies at least once a month are shown in the table below. To determine whether this distribution is still the same, you randomly select 1000 people who go to movies at least once a month and record the age of each. Are the distributions the same?

Age       Survey    Freq
2 - 17    26.70%    240
18 - 24   19.80%    214
25 - 39   19.70%    183
40 - 49   14%       156
50+       19.80%    207
What’s your favorite flavor of ice cream?

Flavor   Percent   Observed
A        40%       45
B        30%       52
C        20%       39
D         5%        8
F         5%        6
Homework

Worksheet