11.1 Tests for GOF USE

Chapter 11
Inference for Tables:
Chi-Square Procedures
11.1
Target Goal: I can compute expected
counts, conditional distributions, and
contributions to the chi-square statistic.
h.w: pg. 621: 1, 3, 5, 9, 11
Test for Goodness of Fit
To analyze categorical data, we construct
two-way tables and examine the counts or
percents of the explanatory and response
variables.
 Count and record M&M colors per bag.
 Expected count: (number of M&M's in the bag) × (claimed proportion for that color).

M&M's Color Distribution (%), according to their website

Type                  Brown  Yellow  Red  Blue  Orange  Green
Plain                    13      14   13    24      20     16
Peanut                   12      15   12    23      23     15
Peanut Butter/Almond     10      20   10    20      20     20
We want to compare the observed counts
to the expected counts.
 The null hypothesis is that there is no
difference between the observed and
expected counts.
 The alternative hypothesis is that there is
a difference between the observed and
expected counts.


Simulate the color counts for a bag of M&M's,
or use your own bag.
Label the two-digit numbers by color (plain M&M's):
 01-13  Brown
 14-27  Yellow
 28-40  Red
 41-64  Blue
 65-84  Orange
 85-00  Green (00 counts as Green)
MATH:PRB:randInt(0,99,50), store in L1.
 Sort in ascending order and tally.
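
The calculator simulation can also be sketched in Python. This is a minimal sketch, not the class procedure itself: the function names `color_for` and `simulate_bag`, the bag size of 50, and the seed are assumptions chosen for illustration, and the label ranges are the plain M&M's ranges listed above.

```python
import random
from collections import Counter

# Upper end of each label range for plain M&M's (from the label list above).
UPPER_BOUNDS = [(13, "Brown"), (27, "Yellow"), (40, "Red"),
                (64, "Blue"), (84, "Orange"), (99, "Green")]

def color_for(label):
    """Map a two-digit label 0-99 to a color; 0 plays the role of "00"."""
    if label == 0:  # "00" belongs to the 85-00 range
        return "Green"
    for upper, color in UPPER_BOUNDS:
        if label <= upper:
            return color
    raise ValueError("label must be between 0 and 99")

def simulate_bag(n=50, seed=None):
    """Draw n labels (like randInt(0,99,n)), sort them, and tally the colors."""
    rng = random.Random(seed)
    labels = sorted(rng.randint(0, 99) for _ in range(n))
    return Counter(color_for(x) for x in labels)

counts = simulate_bag(50, seed=1)  # seed is arbitrary, only for repeatability
for color in ["Brown", "Yellow", "Red", "Blue", "Orange", "Green"]:
    print(color, counts[color])
```

Each run with a different seed gives a different simulated bag, which is the point: the observed counts vary from bag to bag even when the claimed percents are correct.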

Chi-square statistic

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
Go to Blank student notes.

It measures how well the observed counts
fit the expected counts, assuming that the
null hypothesis is true.
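
As a rough illustration, the statistic can be computed by summing (O - E)^2 / E over the color categories. The observed counts below are made up for a hypothetical bag of 50 plain M&M's; only the percents come from the table in these notes, and the function name `chi_square_statistic` is an assumption.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical bag of 50 plain M&M's (made-up counts, for illustration only).
# Order: Brown, Yellow, Red, Blue, Orange, Green.
observed = [8, 6, 7, 11, 9, 9]
percents = [13, 14, 13, 24, 20, 16]          # plain M&M's, from the table
expected = [sum(observed) * p / 100 for p in percents]

stat = chi_square_statistic(observed, expected)
print("expected counts:", expected)
print("chi-square statistic:", round(stat, 4))
```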
The distribution of the chi-square statistic is called
the chi-square distribution, χ².
This distribution is a density curve.
The total area under the curve is 1.
 The curve begins at zero on the horizontal
axis and is skewed right.
 As the degrees of freedom increase, the
shape of the curve becomes more
symmetric.
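
One way to see the curve becoming more symmetric is through its skewness, which for a chi-square distribution equals sqrt(8/df). The short sketch below only evaluates that formula for a few df values; the function name and the choice of df values are illustrative assumptions.

```python
import math

def chi_square_skewness(df):
    """Skewness of a chi-square distribution with df degrees of freedom."""
    return math.sqrt(8 / df)

# Skewness shrinks toward 0 (a symmetric curve) as df grows.
for df in (1, 3, 5, 10, 30):
    print(df, round(chi_square_skewness(df), 3))
```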

Pg. 703
“Goodness of Fit Test.”

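Using the M&M Minis® chi-square
statistic, find the probability of obtaining a
χ² value at least this extreme, assuming
the null hypothesis is true.

Use your chi-square statistic and df = 6 - 1 = 5.

P-value = χ²cdf(lower bound, upper bound, df), where the lower bound is your χ² statistic and the upper bound is a very large number such as E99.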
CONDITIONS for Individual
Expected Counts:
The Goodness of Fit Test may be used
when all expected counts are at least 1
and no more than 20% of the expected
counts are less than 5.
 Following the Goodness of Fit Test, check
to see which component made the
greatest contribution to the chi-square
statistic to see where the biggest changes
occurred.
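
A follow-up analysis along these lines can be sketched by listing each category's component (O - E)^2 / E and picking the largest. The observed counts are made up for a hypothetical bag of 50 plain M&M's, and the function name `contributions` is an assumption.

```python
def contributions(categories, observed, expected):
    """Return each category's contribution (O - E)^2 / E to the statistic."""
    return {c: (o - e) ** 2 / e
            for c, o, e in zip(categories, observed, expected)}

colors = ["Brown", "Yellow", "Red", "Blue", "Orange", "Green"]
observed = [8, 6, 7, 11, 9, 9]                 # hypothetical counts
expected = [6.5, 7.0, 6.5, 12.0, 10.0, 8.0]    # 50 x plain M&M's percents

parts = contributions(colors, observed, expected)
largest = max(parts, key=parts.get)            # category with the biggest component

for color in colors:
    print(color, round(parts[color], 4))
print("largest contribution:", largest)
```

The components add up to the chi-square statistic, so the largest one shows which category departs most from its expected count.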

Conditions for Chi-Square Test
Random: The data come from a random
sample or a randomized experiment.
 Large sample size: All expected counts
are at least 5.
 Independent: Individual observations
are independent. When sampling without
replacement, check the 10% condition.
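
The count-based conditions can be checked mechanically. The sketch below covers both expected-count rules stated in these notes (all at least 5, or the looser rule of all at least 1 with no more than 20% below 5) plus the 10% condition; the function names and the `strict` flag are assumptions made for illustration. The Random condition cannot be checked from counts alone.

```python
def expected_counts_ok(expected, strict=True):
    """Check the expected-count condition.

    strict=True:  every expected count is at least 5.
    strict=False: every expected count is at least 1 and no more than
                  20% of them are less than 5.
    """
    if strict:
        return all(e >= 5 for e in expected)
    small = sum(1 for e in expected if e < 5)
    # small / len(expected) <= 0.20, written without floating point
    return all(e >= 1 for e in expected) and small * 5 <= len(expected)

def ten_percent_ok(sample_size, population_size):
    """10% condition: the population is at least 10 times the sample."""
    return population_size >= 10 * sample_size

print(expected_counts_ok([6.5, 7.0, 6.5, 12.0, 10.0, 8.0]))
print(expected_counts_ok([4, 10, 10, 10, 10], strict=False))
print(ten_percent_ok(500, 5000))
```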

Ex: The Graying of America

It is believed that with better medicine
and healthier lifestyles, people are living
longer and consequently a larger
percentage of the population is of
retirement age. Compare distribution of
1980 population to 1996 population.
Step 1: State - Identify the population of interest and the parameter
you want to draw a conclusion about.
State the hypothesis in words and symbols.
We want to determine if the distribution of
age groups in the United States in 1996
has changed significantly from the 1980
distribution.
 Ho: the age group dist. in 1996 is the
same as the 1980 dist.
 Ha: the age group dist. in 1996 is different
from the 1980 dist.

Or,
State the hypothesis as proportions.
Ho: p_{0-24} = 0.4139, p_{25-44} = 0.2768, p_{45-64}
= 0.1964, p_{65+} = 0.1128.
 Ha: at least one of the proportions differs
from the stated values.

Goal of “Goodness of Fit Tests”

The more the observed counts differ from
the expected counts, the more
evidence we have to reject Ho and thus
conclude that the population dist. in 1996
is significantly different from 1980.
Always a good idea to plot the data.
Step 2: Plan - Choose the appropriate inference
procedure. Verify the conditions for using the selected
procedure.
If the conditions are met, conduct a chi-square goodness of fit test.
 Random: We must assume the two
distributions of age groups come from a
randomized experiment.

Calculate expected counts in each age category
and verify that they are large enough (see
conditions). Yes, all > 5; proceed with
chi-square calculations.
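
The expected counts can be sketched as sample size times the 1980 proportion for each age group. The proportions are the ones stated in Ho; the sample size of 500 is an assumption taken from the 10% condition check in this example (10(500)), since the notes do not state it separately.

```python
# 1980 proportions from the null hypothesis.
proportions_1980 = {"0-24": 0.4139, "25-44": 0.2768,
                    "45-64": 0.1964, "65+": 0.1128}

n_1996 = 500  # assumed 1996 sample size, inferred from the 10% condition check

# Expected count = sample size x hypothesized proportion.
expected = {group: n_1996 * p for group, p in proportions_1980.items()}

for group, count in expected.items():
    print(group, round(count, 2))
print("all expected counts > 5:", all(c > 5 for c in expected.values()))
```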
Independent:
 We clearly have two independent age
groups, one from 1980 and one from
1996. We must check the 10% condition.
 There are at least 10(286,598) U.S.
citizens in 1980 and at least 10(500) U.S.
citizens in 1996.

Step 3: Do - If the conditions are met, carry out
the inference procedure.

Calculate the χ² statistic to measure how
much the observed counts (O) differ from
the expected counts (E) under Ho.

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

A large value of χ² shows more evidence
against Ho and also results in a small P-value.
Calculate P-value
df: use n - 1 degrees of freedom, where n is the number of categories.
 The χ² family of curves is
used to assess evidence against Ho.
 Since we are using percentages, 3 of the 4
percentages are allowed to vary; the 4th is
not, because they must sum to 100%.
 df = 4 - 1 = 3.

Table C: for a tail probability of 0.05 with df = 3, the critical value
is 7.81.
 Calc: 2nd VARS: χ²cdf(8.2275, E99, 3) = 0.0415
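
The calculator's upper-tail area can be reproduced with the Python standard library alone. This is a sketch under stated assumptions: the function name `chi_square_upper_tail` is made up, it handles whole-number df only, and it uses closed forms for df = 1 and df = 2 plus a standard recurrence for the incomplete gamma function rather than whatever method the calculator uses internally.

```python
import math

def chi_square_upper_tail(x, df):
    """P(chi-square with whole-number df >= x), i.e. the area from x to infinity."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive whole number")
    if x <= 0:
        return 1.0
    t = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-t)                 # df = 2: tail is e^(-x/2)
    else:
        a, q = 0.5, math.erfc(math.sqrt(t))      # df = 1: tail is erfc(sqrt(x/2))
    # Step up two degrees of freedom at a time:
    # Q(a + 1, t) = Q(a, t) + t^a * e^(-t) / Gamma(a + 1)
    while a < df / 2.0:
        q += t ** a * math.exp(-t) / math.gamma(a + 1)
        a += 1.0
    return q

p_value = chi_square_upper_tail(8.2275, 3)
print(round(p_value, 4))
```

Since the P-value is below 0.05, this agrees with the Table C comparison of 8.2275 against the critical value 7.81.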

Step 4. Conclude - Interpret the results in
the context of the problem.

Since our value of 8.2275 is more
extreme than 7.81, we reject Ho and
conclude that the population dist. in 1996
is significantly different from the 1980
dist. at the 5% level.
To be cont.