chi-square test for homogeneity

Chi-Square Procedures
Chi-Square Test for Goodness of
Fit, Independence of Variables,
and Homogeneity of Proportions
The chi-square Goodness of Fit Test:
you have only one set of data on a single
characteristic, and you want to know if it
matches an expected distribution based on
the laws of probability (1 variable, 1 population)
In a chi-square goodness of fit test, the null
hypothesis is always
Ho: The data follow a specified
distribution
The alternative hypothesis is always
Ha: The data do not follow the
specified distribution
The idea behind testing these types of claims is
to compare actual counts to the counts we would
expect if the null hypothesis were true. If a
significant difference between the actual counts
and expected counts exists, we would take this
as evidence against the null hypothesis.
The method for obtaining the expected
counts requires that we determine the
number of observations within each cell
under the assumption the null hypothesis is
true.
Test Statistic for the Test of Goodness of Fit
Let Oi represent the observed number of counts in the
ith cell, Ei represent the expected number of counts in
the ith cell. Then,
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
approximately follows the chi-square distribution with
(# of cells – 1) degrees of freedom, where # of cells is
the number of categories in the table
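The statistic defined above can be sketched in a few lines of Python. This is only an illustration: the function name `chi_square_statistic` and the three-cell counts are hypothetical and do not come from the slides.

```python
def chi_square_statistic(observed, expected):
    """Return the sum over all cells of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three cells (illustration only, not slide data).
observed = [18, 22, 20]
expected = [20, 20, 20]

statistic = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1  # (# of cells - 1) for goodness of fit

print(f"chi-square = {statistic:.2f}, df = {degrees_of_freedom}")
```

Each cell contributes its squared deviation scaled by the expected count, so a cell with a small expected count can dominate the total.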
The Chi-Square Test for Goodness of Fit
If a claim is made regarding the data
following a certain distribution, we can
use the following steps to test the claim
provided
1. the data is randomly selected
2. all expected frequencies are greater than or
equal to 1.
3. at least 80% of the expected cell counts are
greater than or equal to 5.
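Conditions 2 and 3 can be checked mechanically once the expected counts are known. The sketch below assumes a hypothetical helper name, `expected_counts_ok`, and made-up count lists. Condition 1 (random selection) depends on how the data were collected and cannot be checked from the counts.

```python
def expected_counts_ok(expected):
    """Check the two requirements that concern expected counts.

    (2) every expected count is at least 1
    (3) at least 80% of the expected counts are at least 5
    """
    all_at_least_1 = all(e >= 1 for e in expected)
    share_at_least_5 = sum(1 for e in expected if e >= 5) / len(expected)
    return all_at_least_1 and share_at_least_5 >= 0.80


# Hypothetical lists of expected counts (illustration only).
print(expected_counts_ok([80, 80, 80, 80, 80]))  # every cell is large
print(expected_counts_ok([12, 9, 6, 7, 4]))      # 4 of 5 cells are >= 5
print(expected_counts_ok([12, 9, 6, 4, 0.5]))    # one cell is below 1
```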
EXAMPLE
Testing for Goodness of Fit
In consumer marketing, a common problem that any marketing
manager faces is the selection of appropriate colors for package
design. Assume that a marketing manager wishes to compare
five different colors of package design. He wants to know
whether consumers prefer any of the five colors, so that the
preferred design can be introduced in the market. A random
sample of 400 consumers reveals the following. Do the consumer
preferences for package colors show any significant difference?
Package Color           Red   Blue   Green   Pink   Orange   Total
Customers' Preference    70    106      80     70       74     400
Step 1. A claim is made regarding the data fit to
a certain distribution.
Ho: the number of customers who prefer
each color is the same.
Ha: the number of customers who prefer
each color is not the same.
Step 2: Calculate the expected frequencies
(counts) for each cell in the contingency table.
Observed Counts
Package Color           Red   Blue   Green   Pink   Orange   Total
Customers' Preference    70    106      80     70       74     400
Expected Counts (if Ho is true, each color gets 400 / 5 = 80)
Package Color           Red   Blue   Green   Pink   Orange   Total
Customers' Preference    80     80      80     80       80     400
Step 3: Verify the requirements for the chi-square
test for goodness of fit are satisfied.
(1) data is randomly selected
(2) all expected frequencies are greater than
or equal to 1
(3) at least 80% of the expected cell counts are
greater than or equal to 5.
Here every expected count is 80, so requirements
(2) and (3) are met.
Step 4: Select a proper level of significance α (here α = 0.05)
Step 5: Compute the test statistic and P-value
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = 11.4$$
P-value = χ²cdf(min, max, df) = χ²cdf(11.4, ∞, 4) = 0.0224
If P-value < α, reject the null hypothesis
11.4 > 9.49 (the critical value for α = 0.05 with 4
degrees of freedom) and 0.0224 < 0.05. Therefore I would
reject the null hypothesis. The result is
statistically significant and I am led to believe
that there is a difference in preference of
package color.
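The package-color example can be reproduced with the Python standard library. In this sketch the helper `chi2_sf_even_df` is a hypothetical name, and it relies on a closed form for the chi-square upper-tail probability that holds only when the degrees of freedom are even (here df = 4). For a general df, a library routine such as `scipy.stats.chi2.sf` would be the usual choice.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Uses exp(-x/2) * sum_{k < df/2} (x/2)^k / k!, valid only for even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Observed counts from the example.
observed = {"Red": 70, "Blue": 106, "Green": 80, "Pink": 70, "Orange": 74}

n = sum(observed.values())    # 400 consumers
expected = n / len(observed)  # 80 per color if Ho is true
statistic = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1        # 5 cells - 1 = 4
p_value = chi2_sf_even_df(statistic, df)
alpha = 0.05

print(f"chi-square = {statistic:.2f}, df = {df}, P-value = {p_value:.4f}")
print("reject Ho" if p_value < alpha else "fail to reject Ho")
```

Blue contributes (106 - 80)^2 / 80 = 8.45 of the total 11.4, so most of the evidence against Ho comes from that one color.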
The chi-square independence test:
you have two characteristics of a
population, and you want to see if there is
any association between the
characteristics (2 variables, 1 population)
In a chi-square independence test, the null
hypothesis is always
Ho: the variables are independent
The alternative hypothesis is always
Ha: the variables are dependent
The idea behind testing these types of claims is
to compare actual counts to the counts we would
expect if the null hypothesis were true (if the
variables are independent). If a significant
difference between the actual counts and
expected counts exists, we would take this as
evidence against the null hypothesis.
The method for obtaining the expected
counts requires that we determine the
number of observations within each cell
under the assumption the null hypothesis is
true.
Expected Frequencies in a Chi-Square
Independence Test
To find the expected frequencies in a cell when
performing a chi-square independence test,
multiply the row total of the row containing the
cell by the column total of the column containing
the cell and divide this result by the table total.
That is,
$$E = \frac{(\text{row total})(\text{column total})}{\text{table total}}$$
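The row-total times column-total rule can be sketched for a table of any size. The function name `expected_counts` and the 2 x 2 table below are hypothetical and serve only to show the arithmetic.

```python
def expected_counts(table):
    """Return expected counts: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    table_total = sum(row_totals)
    return [
        [row_total * column_total / table_total for column_total in column_totals]
        for row_total in row_totals
    ]


# Hypothetical 2 x 2 table of observed counts (illustration only).
table = [
    [30, 20],  # row total 50
    [10, 40],  # row total 50
]
# Column totals are 40 and 60; the table total is 100.

for row in expected_counts(table):
    print(row)
```

The expected counts keep the same row and column totals as the observed table, which is a quick check on the arithmetic.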
Test Statistic for the Test of Independence
Let Oi represent the observed number of counts in the
ith cell, Ei represent the expected number of counts in
the ith cell. Then,
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
approximately follows the chi-square distribution with
(r – 1)(c – 1) degrees of freedom where r is the number
of rows and c is the number of columns in the
contingency table
The Chi-Square Test for Independence
If a claim is made regarding the association
between (or independence of) two variables
in a contingency table, we can use the
following steps to test the claim provided
1. the data is randomly selected
2. all expected frequencies are greater than or
equal to 1.
3. at least 80% of the expected cell counts are
greater than or equal to 5.
EXAMPLE
Testing for Independence
          Money   Health   Love
Men          82      446    355
Women        46      574    273
Step 1. A claim is made regarding the
independence of the data.
Ho: there is no association between gender
and lifestyle choice; the variables are
independent
Ha: there is an association between
gender and lifestyle choice; the variables
are dependent
Step 2: Calculate the expected frequencies
(counts) for each cell in the contingency table.
Observed Counts
          Money   Health   Love
Men          82      446    355
Women        46      574    273
Expected Counts
          Money   Health     Love
Men       63.64   507.13   312.23
Women     64.36   512.87   315.77
Step 3: Verify the requirements for the chi-square
test for independence are satisfied.
(1) data is randomly selected
(2) all expected frequencies are greater than
or equal to 1
(3) at least 80% of the expected cell counts are
greater than or equal to 5.
Here the smallest expected count is 63.64, so
requirements (2) and (3) are met.
Step 4: Select a proper level of significance α (here α = 0.05)
Step 5: Compute the test statistic and P-value
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = 36.84$$
P-value = χ²cdf(min, max, df) = χ²cdf(36.84, ∞, 2) = 0.00000001
If P-value < α, reject the null hypothesis
36.84 > 5.99 (the critical value for α = 0.05 with 2
degrees of freedom) and 0.00000001 < 0.05. Therefore I
would reject the null hypothesis. The result is
statistically significant and I am led to believe
that there is an association between gender
and lifestyle choice and that these variables
are dependent.
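The independence example can be checked end to end in Python. This is a sketch under one stated assumption: the P-value line uses the closed form exp(-x/2), which is the chi-square upper-tail probability only when df = 2, as it is for this 2 x 3 table. The variable names are illustrative.

```python
import math

# Observed counts from the example; columns are Money, Health, Love.
observed = [
    [82, 446, 355],  # Men
    [46, 574, 273],  # Women
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
table_total = sum(row_totals)

# Expected count = row total * column total / table total.
expected = [[r * c / table_total for c in column_totals] for r in row_totals]

statistic = sum(
    (o - e) ** 2 / e
    for observed_row, expected_row in zip(observed, expected)
    for o, e in zip(observed_row, expected_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r - 1)(c - 1)

if df != 2:
    raise ValueError("the closed form below is valid only for df = 2")
p_value = math.exp(-statistic / 2)  # upper-tail probability for df = 2
alpha = 0.05

for label, row in zip(["Men", "Women"], expected):
    print(label, [round(value, 2) for value in row])
print(f"chi-square = {statistic:.2f}, df = {df}, P-value = {p_value:.1e}")
print("reject Ho" if p_value < alpha else "fail to reject Ho")
```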
The chi-square test for homogeneity:
you take samples from different populations,
and you want to test whether the proportions
in various categories are the same for each
population (1 variable, multiple populations)
In a chi-square homogeneity test, the null
hypothesis is always
Ho: populations have the same
proportion of individuals with some
characteristic.
The alternative hypothesis is always
Ha: populations have different
proportions of individuals with some
characteristic.
The idea behind testing these types of claims is
to compare actual counts to the counts we would
expect if the null hypothesis were true
(proportions are equal). If a significant
difference between the actual counts and
expected counts exists, we would take this as
evidence against the null hypothesis.
The method for obtaining the expected
counts requires that we determine the
number of observations within each cell
under the assumption the null hypothesis is
true.
Expected Frequencies in a Chi-Square
Homogeneity Test
To find the expected frequencies in a cell
when performing a chi-square
homogeneity test, multiply the row total of
the row containing the cell by the column
total of the column containing the cell and
divide this result by the table total. That is,
$$E = \frac{(\text{row total})(\text{column total})}{\text{table total}}$$
Test Statistic for the Test of Homogeneity
Let Oi represent the observed number of counts in the
ith cell, Ei represent the expected number of counts in
the ith cell. Then,
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
approximately follows the chi-square distribution with
(r – 1)(c – 1) degrees of freedom where r is the number
of rows and c is the number of columns in the
contingency table
The Chi-Square Test for Homogeneity
If a claim is made that different
populations have the same proportion of
individuals with some characteristic, we
can use the following steps to test the
claim provided
1. the data is randomly selected
2. all expected frequencies are greater than or
equal to 1.
3. at least 80% of the expected cell counts are
greater than or equal to 5.
EXAMPLE A Test of Homogeneity of Proportions
The following question was asked of a random
sample of individuals in 1992, 1998, and 2001:
“Would you tell me if you feel being a teacher is an
occupation of very great prestige?” The results of
the survey are presented below:
        1992   1998   2001
Yes      549    539    570
No       522    578    599
Step 1. A claim is made regarding the
homogeneity of the data.
Ho: the proportions of individuals who feel
teaching is an occupation of very great
prestige in each year are equal
Ha: the proportions of individuals who feel
teaching is an occupation of very great
prestige in each year are not equal
Step 2: Calculate the expected frequencies
(counts) for each cell in the contingency table.
Observed Counts
          1992     1998     2001
Yes        549      539      570
No         522      578      599
Expected Counts
          1992     1998     2001
Yes     528.96   551.68   577.36
No      542.04   565.32   591.64
Step 3: Verify the requirements for the chi-square
test for homogeneity are satisfied.
(1) data is randomly selected
(2) all expected frequencies are greater than
or equal to 1
(3) at least 80% of the expected cell counts are
greater than or equal to 5.
Here the smallest expected count is 528.96, so
requirements (2) and (3) are met.
Step 4: Select a proper level of significance α (here α = 0.01)
Step 5: Compute the test statistic and P-value
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = 2.26$$
P-value = χ²cdf(min, max, df) = χ²cdf(2.26, ∞, 2) = 0.3228
If P-value < α, reject the null hypothesis
2.26 < 9.21 (the critical value for α = 0.01 with 2
degrees of freedom) and 0.323 > 0.01. Therefore I would
fail to reject the null hypothesis. The result is
not statistically significant and I cannot
conclude that the proportion of individuals
who feel teaching is an occupation of very
great prestige differs from year to year.
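The teacher-prestige example can be reproduced in the same way, and printing the sample proportion of "Yes" answers per year shows what the test compares. This is a sketch: both closed forms used here (P-value = exp(-x/2) and critical value = -2 ln α) hold only for df = 2, and the variable names are illustrative.

```python
import math

years = ["1992", "1998", "2001"]
# Observed counts from the example; one column per year.
observed = [
    [549, 539, 570],  # Yes
    [522, 578, 599],  # No
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
table_total = sum(row_totals)

# Sample proportion answering "Yes" in each year.
yes_proportions = [yes / total for yes, total in zip(observed[0], column_totals)]

expected = [[r * c / table_total for c in column_totals] for r in row_totals]
statistic = sum(
    (o - e) ** 2 / e
    for observed_row, expected_row in zip(observed, expected)
    for o, e in zip(observed_row, expected_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r - 1)(c - 1)

if df != 2:
    raise ValueError("the closed forms below are valid only for df = 2")
alpha = 0.01
p_value = math.exp(-statistic / 2)      # upper-tail probability for df = 2
critical_value = -2 * math.log(alpha)   # value x with P(X > x) = alpha, df = 2

for year, proportion in zip(years, yes_proportions):
    print(f"{year}: proportion answering Yes = {proportion:.3f}")
print(f"chi-square = {statistic:.2f}, critical value = {critical_value:.2f}")
print(f"P-value = {p_value:.4f}")
print("reject Ho" if p_value < alpha else "fail to reject Ho")
```

The three sample proportions are close to one another, which matches the small test statistic and the decision not to reject Ho.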