Chi-Square Test

Two Mini-Lectures
Chapter Contents
15.1 Chi-Square Test for Independence
15.2 Chi-Square Tests for Goodness-of-Fit
15.3 Uniform Goodness-of-Fit Test
15.4 Poisson Goodness-of-Fit Test
15.5 Normal Chi-Square Goodness-of-Fit Test
15.6 ECDF Tests (Optional)
Chapter 15
Chi-Square Tests
Chi-Square Test for Independence
Contingency Tables
A contingency table is a cross-tabulation of n paired observations into categories.
Each cell shows the count of observations that fall into the
category defined by its row (r) and column (c) heading.
Chi-Square Test
In a test of independence for an r × c contingency table, the hypotheses are:
H0: Variable A is independent of variable B
H1: Variable A is not independent of variable B
Use the chi-square test for independence to test these hypotheses.
This nonparametric test is based on frequencies.
The n data pairs are classified into c columns and r rows and then the
observed frequency fjk is compared with the expected frequency ejk.
Chi-Square Distribution
The critical value comes from the chi-square probability distribution
with d.f. degrees of freedom.
d.f. = degrees of freedom = (r – 1)(c – 1)
r = number of rows in the table
c = number of columns in the table
Appendix E contains critical values for right-tail areas of the chi-square
distribution, or use Excel’s =CHISQ.INV.RT(α,d.f.)
The mean of a chi-square distribution is d.f., and its variance is 2 × d.f.
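As a rough illustration of what the table lookup does, the right-tail area and the critical value can be computed numerically. This is a minimal sketch using only the Python standard library; the function names chi2_right_tail and chi2_critical are illustrative and do not come from the text.

```python
import math

def chi2_right_tail(x, df):
    """Right-tail area P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    # Series expansion of the regularized lower incomplete gamma function P(a, x/2).
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= half_x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return 1.0 - lower

def chi2_critical(alpha, df):
    """Right-tail critical value found by bisection on chi2_right_tail."""
    lo, hi = 0.0, df + 10.0
    while chi2_right_tail(hi, df) > alpha:   # widen the bracket if needed
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_right_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(chi2_critical(0.05, 6), 2))       # critical value for d.f. = 6, alpha = .05
print(round(chi2_right_tail(12.59, 6), 3))    # right-tail area at that value
```

Bisection is used here only because it is easy to follow; statistical software uses faster methods.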
Consider the shape of the chi-square distribution: it starts at zero, is skewed to the right, and becomes more symmetric as d.f. increases.
Expected Frequencies
Assuming that H0 is true, the expected frequency of row j and column k is:
ejk = RjCk/n
Rj = total for row j (j = 1, 2, …, r)
Ck = total for column k (k = 1, 2, …, c)
n = sample size
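The formula ejk = RjCk/n can be sketched in a few lines of Python. The observed counts below are hypothetical, and the function name expected_frequencies is illustrative.

```python
def expected_frequencies(observed):
    """Expected cell counts e_jk = R_j * C_k / n under independence."""
    row_totals = [sum(row) for row in observed]          # R_j
    col_totals = [sum(col) for col in zip(*observed)]    # C_k
    n = sum(row_totals)                                  # sample size
    return [[r * c / n for c in col_totals] for r in row_totals]

observed = [[30, 10],
            [20, 40]]          # hypothetical 2 x 2 table of counts
expected = expected_frequencies(observed)
print(expected)                # [[20.0, 20.0], [30.0, 30.0]]
```

Note that the expected frequencies have the same row and column totals as the observed table.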
Steps in Testing the Hypotheses
Step 1: State the Hypotheses
H0: Variable A is independent of variable B
H1: Variable A is not independent of variable B
Step 2: Specify the Decision Rule
Calculate d.f. = (r – 1)(c – 1)
For a given α, look up the right-tail critical value (χ²R) from
Appendix E or by using Excel =CHISQ.INV.RT(α,d.f.).
Reject H0 if the test statistic exceeds χ²R.
For example, for d.f. = 6 and α = .05, χ².05 = 12.59.
The rejection region is the right tail of the chi-square distribution, beyond the critical value.
Step 3: Calculate the Expected Frequencies
ejk = RjCk/n
Step 4: Calculate the Test Statistic
The chi-square test statistic is
$$\chi^2_{calc} = \sum_{j=1}^{r} \sum_{k=1}^{c} \frac{(f_{jk} - e_{jk})^2}{e_{jk}}$$
Step 5: Make the Decision
Reject H0 if the test statistic χ²calc > χ²R or if the p-value ≤ α.
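Steps 3 through 5 can be sketched as follows. The counts are hypothetical, the function name chi_square_statistic is illustrative, and the critical value 3.841 (d.f. = 1, α = .05) is taken from a standard chi-square table rather than computed.

```python
def chi_square_statistic(observed):
    """Return the chi-square statistic and d.f. for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for j, row in enumerate(observed):
        for k, f in enumerate(row):
            e = row_totals[j] * col_totals[k] / n    # expected frequency e_jk
            stat += (f - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

observed = [[30, 10],
            [20, 40]]                  # hypothetical counts
stat, df = chi_square_statistic(observed)
critical = 3.841                       # table value for d.f. = 1, alpha = .05
reject = stat > critical               # decision rule: reject H0 if stat exceeds critical value
print(round(stat, 3), df, reject)      # 16.667 1 True
```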
Example: MegaStat
All cells have ejk ≥ 5, so Cochran’s Rule is met.
Caution: Don’t highlight row or column totals.
The p-value = 0.2154 is not small enough to reject the hypothesis of independence at α = .05.
Test of Two Proportions
• For a 2 × 2 contingency table, the chi-square test is equivalent to a two-tailed z test for two proportions.
• The hypotheses are:
H0: π1 = π2
H1: π1 ≠ π2
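The equivalence can be checked numerically: for a 2 × 2 table, the square of the pooled two-proportion z statistic equals the chi-square statistic. The sketch below uses hypothetical counts, treats each row as one sample, and uses illustrative function names.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: the two population proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def chi_square_2x2(table):
    """Chi-square statistic for a 2 x 2 table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum((table[j][k] - rows[j] * cols[k] / n) ** 2 / (rows[j] * cols[k] / n)
               for j in range(2) for k in range(2))

table = [[30, 10],
         [20, 40]]                     # hypothetical: rows are samples, column 1 = "successes"
z = two_proportion_z(30, 40, 20, 60)
chi2 = chi_square_2x2(table)
print(round(z ** 2, 3), round(chi2, 3))
```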
Small Expected Frequencies
• The chi-square test is unreliable if the expected frequencies are
too small.
• Rules of thumb:
• Cochran’s Rule requires that ejk > 5 for all cells.
• A less strict rule allows up to 20% of the cells to have ejk < 5.
• Most agree that a chi-square test is infeasible if ejk < 1 in any cell.
• If this happens, try combining adjacent rows or columns to enlarge the
expected frequencies.
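These rules of thumb are easy to check once the expected frequencies are known. The following sketch summarizes a table of expected frequencies; the function name and the example values are hypothetical.

```python
def small_cell_report(expected, threshold=5.0):
    """Summarize how a table of expected frequencies fares against the rules of thumb."""
    cells = [e for row in expected for e in row]
    below = sum(1 for e in cells if e < threshold)
    return {
        "min_expected": min(cells),
        "share_below_threshold": below / len(cells),   # compare with the 20% guideline
        "any_below_1": any(e < 1 for e in cells),      # test considered infeasible if True
    }

expected = [[2.5, 7.5],
            [7.5, 22.5]]               # hypothetical expected frequencies
report = small_cell_report(expected)
print(report)
```

Here one cell in four falls below 5, so this table would fail both the strict rule and the 20% guideline.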
Cross-Tabulating Raw Data
Chi-square tests for independence can also be used to analyze
quantitative variables by coding them into categories.
For example, the variables Infant Deaths per 1,000 and Doctors
per 100,000 can each be coded into various categories:
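A minimal sketch of the coding step is shown below. The data pairs, the cut points, and the category labels are all hypothetical and chosen only to illustrate the mechanics.

```python
def code_value(x, cut_points, labels):
    """Return the label of the first category whose upper cut point exceeds x."""
    for cut, label in zip(cut_points, labels):
        if x < cut:
            return label
    return labels[-1]                  # values at or above the last cut point

def cross_tabulate(pairs, row_labels, col_labels):
    """Count coded (row, column) pairs into a contingency table."""
    table = {r: {c: 0 for c in col_labels} for r in row_labels}
    for r, c in pairs:
        table[r][c] += 1
    return table

labels = ["Low", "Med", "High"]
# Hypothetical (infant deaths per 1,000, doctors per 100,000) pairs
data = [(5.1, 310), (7.2, 240), (9.4, 180), (6.5, 260), (8.8, 200), (4.9, 330)]
coded = [(code_value(d, [6.0, 8.0], labels), code_value(m, [220, 280], labels))
         for d, m in data]
table = cross_tabulate(coded, labels, labels)
print(table)
```

The resulting table of counts can then be analyzed with the chi-square test for independence.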
Why Do a Chi-Square Test on Numerical Data?
The researcher may believe there’s a relationship between X
and Y, but doesn’t want to use regression.
There are outliers or anomalies that prevent us from assuming
that the data came from a normal population.
The researcher has numerical data for one variable but not
the other.
3-Way Tables and Higher
More than two variables can be compared using contingency tables.
However, it is difficult to visualize a higher-order table.
For example, you could visualize a cube as a stack of tiled 2-way
contingency tables.
Major computer packages permit three-way tables.
Purpose of the Test
The goodness-of-fit (GOF) test helps you decide whether your
sample resembles a particular kind of population.
The chi-square test is versatile and easy to understand.
Hypotheses for GOF Tests
The hypotheses are:
H0: The population follows a _____ distribution
H1: The population does not follow a ______ distribution
The blank may contain the name of any theoretical distribution (e.g.,
uniform, Poisson, normal).
Chi-Square Tests for Goodness-of-Fit ML 10.2
Test Statistic and Degrees of Freedom for GOF
Assuming n observations, the observations are grouped into c classes
and then the chi-square test statistic is found using:
$$\chi^2_{calc} = \sum_{j=1}^{c} \frac{(f_j - e_j)^2}{e_j}$$
where
fj = the observed frequency of observations in class j
ej = the expected frequency in class j if the sample came from the hypothesized population
If the proposed distribution gives a good fit to the sample, the test
statistic will be near zero.
The test statistic follows the chi-square distribution with degrees of freedom
d.f. = c – m – 1,
where c is the number of classes used in the test and m is the number
of parameters estimated.
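The GOF statistic and its degrees of freedom can be sketched as below, using a uniform hypothesis as the simplest case (no parameters estimated, so m = 0). The counts are hypothetical and the function name gof_statistic is illustrative.

```python
def gof_statistic(observed, expected, params_estimated=0):
    """Chi-square GOF statistic and d.f. = c - m - 1."""
    stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    df = len(observed) - params_estimated - 1
    return stat, df

observed = [18, 22, 25, 15]                        # hypothetical counts in c = 4 classes
n = sum(observed)
expected = [n / len(observed)] * len(observed)     # uniform hypothesis: n/c in each class
stat, df = gof_statistic(observed, expected)
print(round(stat, 2), df)                          # 2.9 3
```

A statistic near zero indicates a close fit; the value would be compared with the right-tail critical value for the computed d.f.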
Normal Chi-Square GOF Test
Is the Sample from a Normal Population?
Many statistical tests assume a normal population, so this is the most
common GOF test.
Two parameters, the mean μ and the standard deviation σ, fully
describe a normal distribution.
Unless μ and σ are known a priori, they must be estimated from a
sample in order to perform a GOF test for normality.
Method 1: Standardize the Data
Transform sample observations x1, x2, …, xn into standardized z-values: zi = (xi – x̄)/s.
Count the sample observations within each interval on the z-scale and
compare them with expected normal frequencies ej.
Problem: Frequencies will be small in the end bins yet large in the
middle bins (this may violate Cochran’s Rule and seems inefficient).
Method 2: Equal Bin Widths
Step 1: Divide the exact data range into c groups of equal
width, and count the sample observations in each bin to get
observed bin frequencies fj.
Step 2: Convert the bin limits into standardized z-values: z = (x – x̄)/s.
Step 3: Find the normal area within each bin assuming a
normal distribution.
Step 4: Find expected frequencies ej by multiplying each
normal area by the sample size n.
Problem: Frequencies will be small in the end bins yet large in the
middle bins (this may violate Cochran’s Rule and seems inefficient).
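The four steps of Method 2 can be sketched with the standard library's NormalDist. The data are hypothetical, the function name is illustrative, and the sample mean and standard deviation stand in for μ and σ.

```python
from statistics import NormalDist, mean, stdev

def equal_width_expected(data, c):
    """Observed and expected normal frequencies for c equal-width bins over the data range."""
    n = len(data)
    xbar, s = mean(data), stdev(data)
    lo, hi = min(data), max(data)
    width = (hi - lo) / c
    limits = [lo + i * width for i in range(c + 1)]
    # Step 1: count observations in each bin (the maximum goes in the last bin).
    observed = [0] * c
    for x in data:
        observed[min(int((x - lo) / width), c - 1)] += 1
    # Steps 2-4: standardize the limits, find the normal area, multiply by n.
    std_normal = NormalDist()
    expected = []
    for j in range(c):
        z_lo = (limits[j] - xbar) / s
        z_hi = (limits[j + 1] - xbar) / s
        expected.append(n * (std_normal.cdf(z_hi) - std_normal.cdf(z_lo)))
    return limits, observed, expected

data = [50, 54, 56, 58, 59, 60, 60, 61, 62, 64, 66, 70]   # hypothetical sample
limits, observed, expected = equal_width_expected(data, 4)
print(observed)
print([round(e, 2) for e in expected])
```

Because the bins span only the observed data range, the expected frequencies sum to slightly less than n; the tail areas beyond the smallest and largest observations are left out in this sketch.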
Method 3: Equal Expected Frequencies
• Define histogram bins in such a way that an equal number of
observations would be expected under the hypothesis of a
normal population, i.e., so that ej = n/c.
• A normal area of 1/c is expected in each bin.
• The first and last classes must be open-ended, so to define c bins
we need c-1 cut points.
• Count the observations fj within each bin.
• Compare the fj with the expected frequencies ej = n/c.
Advantage: Makes efficient use of the sample.
Disadvantage: Cut points on the z-scale may seem strange.
• Standard normal cut points for equal area bins.
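The cut points for Method 3 can be generated from the inverse of the standard normal CDF. The sketch below assumes Python's statistics.NormalDist; the function names and the sample values are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist

def equal_area_cut_points(c):
    """The c - 1 standard normal cut points that give area 1/c in each of c bins."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(i / c) for i in range(1, c)]

def count_in_bins(values, cut_points):
    """Count values in the open-ended bins defined by sorted cut points."""
    counts = [0] * (len(cut_points) + 1)
    for v in values:
        counts[bisect_right(cut_points, v)] += 1
    return counts

cuts = equal_area_cut_points(4)
print([round(z, 4) for z in cuts])     # quartile cut points on the z-scale

z_values = [-1.8, -0.9, -0.3, 0.1, 0.4, 0.8, 1.5, 2.1]   # hypothetical standardized data
print(count_in_bins(z_values, cuts))   # observed frequencies f_j; each e_j = n/c = 2
```

To work on the original scale instead of the z-scale, each cut point z can be converted to x̄ + z·s.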
Critical Values for Normal GOF Test
Two parameters, μ and σ, are estimated from the sample, so m = 2 and the
degrees of freedom are d.f. = c – m – 1 = c – 3.
We need at least four bins to ensure at least one degree of freedom.
Small Expected Frequencies
Cochran’s Rule suggests at least ej ≥ 5 in each bin (e.g., with 4 bins
we would want n ≥ 20, and so on).
Visual Tests
The fitted normal superimposed on a histogram gives visual
clues as to the likely outcome of the GOF test.
A simple “eyeball” inspection of the histogram may suffice to
rule out a normal population by revealing outliers or other non-normality issues.
ECDF Tests for Normality
There are alternatives to the chi-square test for normality based
on the empirical cumulative distribution function (ECDF).
ECDF tests are done by computer. Details are omitted here.
A small p-value casts doubt on normality of the population.
The Kolmogorov-Smirnov (K-S) test uses the largest absolute
difference between the actual and expected cumulative relative
frequency of the n data values.
The Anderson-Darling (A-D) test is based on a probability plot.
When the data fit the hypothesized distribution closely, the
probability plot will be close to a straight line. The A-D test is
widely used because of its power and attractive visual.
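To make the K-S idea concrete, the sketch below computes the largest absolute difference between the empirical cumulative relative frequency and a fitted normal CDF. The data are hypothetical, the function name ks_statistic is illustrative, and no p-value is computed; that step is left to statistical software.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic(data):
    """Largest absolute gap between the ECDF and a normal CDF fitted to the sample."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))   # assumption: mu and sigma estimated from the sample
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # The ECDF jumps from (i - 1)/n to i/n at x, so check both sides of the step.
        d = max(d, abs(i / n - cdf), abs(cdf - (i - 1) / n))
    return d

data = [50, 54, 56, 58, 59, 60, 60, 61, 62, 64, 66, 70]   # hypothetical sample
d = ks_statistic(data)
print(round(d, 3))
```

A small value of the statistic means the ECDF tracks the fitted normal CDF closely.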
ECDF Tests
Example: Minitab’s Anderson-Darling Test for Normality
Data: weights of 80 babies (in ounces).
The near-linear probability plot suggests a good fit to the normal distribution.
The p-value = 0.122 is not small enough to reject a normal population at α = .05.
Example: MegaStat’s Normality Tests
Data: weights of 80 babies (in ounces).
The p-value = 0.2487 is not small enough to reject a normal population at α = .05 in this chi-square test.
The near-linear probability plot suggests a good fit to the normal distribution.
Note: MegaStat’s chi-square test is not as powerful as the A-D test,
so we would prefer the A-D test if software is available. The
MegaStat probability plot is good, but shows no p-value.