Chapter 1 Looking at Data— Distributions

advertisement
Chapter 9
Analysis of
Two-Way Tables
Introduction to the Practice of
STATISTICS
SEVENTH
EDI T I O N
Moore / McCabe / Craig
Lecture Presentation Slides
Chapter 9
Analysis of Two-Way
Tables
9.1 Inference for Two-Way Tables
9.2 Formulas and Models for Two-Way Tables
9.3 Goodness of Fit
2
9.1 Inference for Two-Way
Tables
 Two-Way Tables
 Expected Cell Counts
 The Chi-Square Statistic
 The Chi-Square Distributions
 The Chi-Square Test
3
Two-Way Tables
The two-sample z procedures of Chapter 8 allow us to compare the
proportions of successes in two populations or for two treatments. What
if we want to compare more than two samples or groups? More
generally, what if we want to compare the distributions of a single
categorical variable across several populations or treatments? We need
a new statistical test. The new test starts by presenting the data in a
two-way table.
Two-way tables of counts have more general uses than
comparing distributions of a single categorical variable. They can
be used to describe relationships between any two categorical
variables.
4
Expected Cell Counts
Two-way tables sort the data according to two categorical variables.
We want to test the hypothesis that there is no relationship between
these two categorical variables (H0).
To test this hypothesis, we compare actual counts from the sample
data with expected counts, given the null hypothesis of no
relationship.
The expected count in any cell of a two-way table when H0 is true is:
5
The Problem of
Multiple Comparisons
To perform a test of
H0: there is no difference in the distribution of a categorical variable for
several populations or treatments
Ha: there is a difference in the distribution of a categorical variable for several
populations or treatments
we compare the observed counts in a two-way table with the counts we would
expect if H0 were true.
The problem of how to do many comparisons at once with an overall measure of
confidence in all our conclusions is common in statistics. This is the problem of
multiple comparisons. Statistical methods for dealing with multiple comparisons
usually have two parts:
1. An overall test to see if there is good evidence of any differences among the
parameters that we want to compare.
2. A detailed follow-up analysis to decide which of the parameters differ and to estimate
how large the differences are.
The overall test uses the chi-square statistic and distributions.
6
The Chi-Square Statistic
To see if the data give convincing evidence against the null hypothesis,
we compare the observed counts from our sample with the expected
counts assuming H0 is true.
The test statistic that makes the comparison is the chi-square statistic.
The chi-square statistic is a measure of how far the observed counts
are from the expected counts. The formula for the statistic is:
(Observed - Expected) 2
 
Expected
2
where “observed” represents an observed cell count, “expected”
represents the expected count for the same cell, and the sum is over
all r  c cellsin the table.
7
The Chi-Square Distributions
If the expected counts are large and the observed counts are very different, a
large value of 2 will result, providing evidence against the null hypothesis.
The P-value for a 2 test comes from comparing the value of the 2 statistic
with critical values for a chi-square distribution.
The Chi-Square Distributions
The chi-square distributions are a family of
distributions that take only positive values and
are skewed to the right. A particular 2
distribution is specified by giving its degrees of
freedom.
The 2 test for a two-way table with r rows and c
columns uses critical values from the 2
distribution with (r – 1)(c – 1) degrees of
freedom. The P-value is the area under the
density curve of this 2 distribution to the right of
the value of the test statistic.
8
Cell Counts Required for the
Chi-Square Test
The chi-square test is an approximate method that becomes more
accurate as the counts in the cells of the table get larger. We must
therefore check that the counts are large enough to allow us to trust the
P-value. Fortunately, the chi-square approximation is accurate for quite
modest counts.
Cell Counts Required for the Chi-Square Test
You can safely use the chi-square test with critical values from
the chi-square distribution when the average of the expected
counts is 5 or more and all individual expected counts are 1 or
greater. In particular, all four expected counts in a 2  2 table
should be 5 or greater.
9
The Chi-Square Test
The chi-square test is an overall test for detecting relationships between
two categorical variables. If the test is significant, it is important to look at
the data to learn the nature of the relationship. We have three ways to
look at the data:
1)Compare selected percents: which cells occur in quite different
percents of all cells?
2)Compare observed and expected cell counts: which cells have
more or less observations than we would expect if H0 were true?
3)Look at the terms of the chi-square statistic: which cells contribute
the most to the value of χ2?
10
The Chi-Square Test
One of the most useful properties of the chi-square test is that it tests the
null hypothesis “the row and column variables are not related to each
other” whenever this hypothesis makes sense for a two-way table.
Uses of the Chi-Square Test
Use the chi-square test to test the null hypothesis
H0: there is no relationship between two categorical variables
when you have a two-way table from one of these situations:
Independent SRSs from two or more populations, with each
individual classified according to one categorical variable
A single SRS, with each individual classified according to both of
two categorical variables
11
9.2 Formulas and Models for
Two-Way Tables*
 Computations for Two-Way Tables
 Computing Conditional Distributions
 Computing Expected Cell Counts
 The Chi-Square Statistic and its P-value
 Models for Two-Way Tables
12
Computations for Two-Way Tables
The calculations required to analyze a two-way table are
straightforward, but tedious. In practice, it is recommended to use
software. However, it is possible to do the work with a calculator. When
analyzing relationships between two categorical variables, follow this
procedure:
1)Calculate descriptive statistics that convey the important information
in the table. Usually these will be column or row percents.
2)Find the expected counts and use them to compute the 2 statistic.
3)Compare your 2 statistic to the chi-square critical values from Table
F to find the approximate P-value for your test.
4)Draw a conclusion about the association between the row and
column variables.
13
Computing Conditional Distributions
The calculated percents within a two-way table represent the
conditional distributions describing the “relationship” between both
variables.
For every two-way table, there are two sets of possible conditional
distributions (column percents or row percents).
For column percents, divide each cell count by the column total.
The sum of the percents in each column should be 100, except for
possible small roundoff errors.
When one variable is clearly explanatory, it makes sense to describe
the relationship by comparing the conditional distributions of the
response variable for each value (level) of the explanatory variable.
14
Comparing Conditional Distributions
Market researchers suspect that background music may affect the mood and
buying behavior of customers. One study in a supermarket compared three
randomly assigned treatments: no music, French accordion music, and Italian
string music. Under each condition, the researchers recorded the numbers of
bottles of French, Italian, and other wine purchased. Here is a table that
summarizes the data:
PROBLEM:
a)Calculate the conditional distribution (in proportions) of the type of wine sold for
each treatment.
b)Make an appropriate graph for comparing the conditional distributions in part (a).
c)Are the distributions of wine purchases under the three music treatments similar or
different? Give appropriate evidence from parts (a) and (b) to support your answer.
15
Comparing Conditional Distributions
(a) When no music was playing, the distribution of wine purchases was
30
11
43
French :
= 0.357 Italian :
= 0.131 Other :
= 0.512
84
84
84
When French accordion music was playing, the distribution of wine purchases was
39
1
35
French :
= 0.520 Italian :
= 0.013 Other :
= 0.467
75
75
75
When Italian string music was playing, the distribution of wine purchases was
30
19
35
French :
= 0.357 Italian :
= 0.226 Other :
= 0.417
84
84
84
The type of wine that customers buy seems to differ considerably across the three
music treatments. Sales of Italian wine are very low (1.3%) when French music is
playing but are higher when Italian music (22.6%) or no music (13.1%) is playing.
French wine appears popular in this market, selling well under all music conditions
but notably better when French music is playing. For all three music treatments, the
percent of Other wine purchases was similar.
16
Calculating Expected Cell Counts
Finding the expected counts is not that difficult, as the following example
illustrates.
The
overall proportion of French wine bought during the study was 99/243 =
0.407. So the expected counts of French wine bought under each treatment are:
99
99 experiment is that there’s no
99
The
null: hypothesis
in theFrench
wine and
music
No
music
× 84 = 34.22
music
:
× 75 = 30.56
Italian music :
× 84 = 34.22
243
243
243
difference in the distribution of wine purchases in the store when no music,
French accordion music, or Italian string music is played.
To find
the proportion
expected counts,
wewine
startbought
by assuming
thatstudy
H0 is was
true.31/243
We can=see
The
overall
of Italian
during the
from the
table counts
that 99of
ofItalian
the 243
bottles
of wine
bought
during theare:
0.128.
So two-way
the expected
wine
bought
under
each treatment
study were 31
French wines.
31
31
× 84 = 10.72
French music :
× 75 = 9.57
243
243
If the specific type of music that’s
No music :
Italian music :
243
× 84 = 10.72
playing has no effect on wine
purchases, the proportion of French
The
overall
Other wine bought during the study was 113/243 =
wine
sold proportion
under eachofmusic
0.465.
So the
expected
counts= of
Other wine bought under each treatment are:
condition
should
be 99/243
0.407.
No music :
113
× 84 = 39.06
243
French music :
113
× 75 = 34.88
243
Italian music :
113
× 84 = 39.06
243
17
Calculating Expected Cell Counts
Consider the expected
count of French wine
bought when no music
was playing:
99
99
× 84 = 34.22
243
84
243
The values in the calculation are the row total for French wine, the column
total for no music, and the table total. We can rewrite the original
calculation as:
•
= 34.22
Expected Counts
The expected count in any cell of a two-way table when H0 is true is:
expected count =
row total × column total
table total
18
The Chi-Square Calculation
For the French wine with no music, the observed count is 30 bottles and the
expected count is 34.22. The contribution to the c 2 statistic for this cell is
(Observed - Expected)2 (30 - 34.22) 2
=
= 0.52
Expected
34.22
The c 2 statistic is the sum of nine such terms :
(Observed - Expected)2 (30 - 34.22) 2 (39 - 30.56) 2
(35 - 39.06) 2
c =å
=
+
+ ...+
Expected
34.22
30.56
39.06
2
= 0.52 + 2.33 + ...+ 0.42 = 18.28
19
The 2 Statistic and its P-Value
H0: There is no difference in the distributions of wine purchases at this
store when no music, French accordion music, or Italian string music is
played.
Ha: There is a difference in the distributions of wine purchases at this
store when no music, French accordion music, or Italian string music is
played.
Our calculated test statistic is χ2 = 18.28.
To find the P-value
using a chi-square table
look in the
df = (3-1)(3-1) = 4.
P
df
.0025
.001
4
16.42
18.47
The small P-value (between 0.001 and 0.0025) gives us convincing
evidence to reject H0 and conclude that there is a difference in the
distributions of wine purchases at this store when no music, French
accordion music, or Italian string music is played.
20
Models for Two-Way Tables
The chi-square test is an overall technique for comparing any number
of population proportions, testing for evidence of a relationship between
two categorical variables. We can either:
 Compare several populations: Randomly select several SRSs
each from a different population (or from a population subjected
to different treatments)  experimental study.
 Test for independence: Take one SRS and classify the
individuals in the sample according to two categorical variables
(attribute or condition)  observational study, historical design.
Both models use the 2 test to test the hypothesis of no relationship.
21
Comparing Several Populations
Select independent SRSs from each of c populations, of sizes
n1, n2, . . . , nc. Classify each individual in a sample according to a
categorical response variable with r possible values. There are c
different probability distributions, one for each population.
The null hypothesis is that the distributions of the response variable are
the same in all c populations. The alternative hypothesis says that
these c distributions are not all the same.
22
Comparing Several Populations
Random digit dialing telephone surveys used to exclude cell phone numbers. If the opinions
of people who have only cell phones differ from those of people who have landline service,
the poll results may not represent the entire adult population. The Pew Research Center
interviewed separate random samples of cell-only and landline telephone users who were
less than 30 years old. Here’s what the Pew survey found about how these people describe
their political party affiliation.
We want to perform a test of:
H0: There is no difference in the distribution of party affiliation in the cellonly and landline populations.
Ha: There is a difference in the distribution of party affiliation in the cellonly and landline populations.
23
Comparing Several Populations
If the conditions are met, we can conduct a chi-square test.
• Random: The data came from separate random samples of 96 cell-only
and 104 landline users.
• Large Sample Size: We used a calculator to determine the expected
counts for each cell. The calculator screenshot confirms all expected
counts ≥ 5.
• Independent: Researchers took independent samples of cell-only and
landline phone users. Sampling without replacement was used, so there
need to be at least 10(96) = 960 cell-only users under age 30 and at least
10(104) = 1040 landline users under age 30. This is safe to assume.
24
Comparing Several Populations
Since the conditions are satisfied, we can a perform chi-square test. We
begin by calculating the test statistic.
Test statistic :
(Observed - Expected) 2
2
 
Expected
(49  46.08) 2 (47  49.92) 2
(30  32.24) 2


 ...
 3.22
46.08
49.92
32.24
P-Value:
Using df = (3 – 1)(2 – 1) = 2, the P-value is 0.20.
Because the P-value, 0.20, is greater than α = 0.05, we fail to reject H0.
There is not enough evidence to conclude that the distribution of party
affiliation differs in the cell-only and landline user populations.
25
Testing for Independence
Suppose we have a single sample from a single population. For each
individual in this SRS of size n we measure two categorical variables.
The results are then summarized in a two-way table.
The null hypothesis is that the row and column variables are
independent. The alternative hypothesis is that the row and column
variables are dependent.
26
Testing for Independence
We’re interested in whether angrier people tend to get heart disease
more often. We can compare the percents of people who did and did
not get heart disease in each of the three anger categories:
There is a clear trend: as the anger score
increases, so does the percent who suffer
heart disease. A much higher percent of
people in the high-anger category developed
CHD (4.27%) than in the moderate (2.33%)
and low (1.70%) anger categories.
27
Testing for Independence
Here is the complete table of observed and expected counts for the CHD
and anger study side by side. Do the data provide convincing evidence of
an association between anger level and heart disease in the population of
interest?
We want to perform a test of:
H0: There is no association between anger level and heart disease in the
population of people with normal blood pressure.
Ha: There is an association between anger level and heart disease in the
population of people with normal blood pressure.
28
Testing for Independence
If the conditions are met, we should conduct a chi-square test for
association/independence.
• Random: The data came from a random sample of 8474 people with
normal blood pressure.
• Large Sample Size: All the expected counts are at least 5, so this
condition is met.
• Independent: Knowing the values of both variables for one person in
the study gives us no meaningful information about the values of the
variables for another person. So individual observations are
independent. Because we are sampling without replacement, we need
to check that the total number of people in the population with normal
blood pressure is at least 10(8474) = 84,740. This seems reasonable to
assume.
29
Testing for Independence
Since the conditions are satisfied, we can perform a chi-test for
association/independence. We begin by calculating the test statistic.
Test statistic :
(Observed - Expected) 2
 
Expected
2
(53  69.73) 2 (110 106.08) 2
(606  618.81) 2


 ...
69.73
106.08
618.81
 4.014  0.145  ... 0.265  16.077
P-Value:
The two-way table of anger level versus heart disease has 2 rows and 3 columns. We
will use the
 chi-square distribution with df = (2 – 1)(3 – 1) = 2 to find the P-value.
Table: Look at the df = 2 line in Table F. The observed statistic χ2 = 16.077 is larger
than the critical value 15.20 for α = 0.0005. So the P-value is less than 0.0005.
Technology: The calculator command χ2cdf(16.077,1000,2) gives 0.00032.
Because the P-value is clearly less than α = 0.05, we reject H0 and conclude that
anger level and heart disease are associated in the population of people with
normal blood pressure.
30
9.3 Goodness of Fit*
 The Chi-Square Goodness of Fit Test
31
The Chi-Square Test for
Goodness of Fit
Mars, Inc. makes milk chocolate candies. Here’s what the company’s
Consumer Affairs Department says about the color distribution of its
M&M’S milk chocolate candies:
On average, the new mix of colors of M&M’S milk chocolate candies will
contain 13 percent of each of browns and reds, 14 percent yellows, 16
percent greens, 20 percent oranges, and 24 percent blues.
The one-way table below summarizes the data from a sample bag of
M&M’S milk chocolate candies. In general, one-way tables display the
distribution of a categorical variable for the individuals in a sample.
Color
Blue
Orange
Green
Yellow
Red
Brown
Total
Count
9
8
12
15
10
6
60
The sample proportion of blue M & M’ S is pˆ 
9
 0.15.
60
32
The Chi-Square Test for
Goodness of Fit
Since the company claims that 24% of all M&M’S milk chocolate candies are
blue, we might believe that something fishy is going on. We could use the z
test for a proportion to test the hypotheses:
H0: p = 0.24
Ha: p ≠ 0.24
where p is the true population proportion of blue M&M’S. We could then
perform additional significance tests for each of the remaining colors.
However, performing a one-sample z test for each proportion would be pretty
inefficient and would lead to the problem of multiple comparisons.
More important, performing one-sample z tests for each color wouldn’t tell us
how likely it is to get a random sample of 60 candies with a color distribution
that differs as much from the one claimed by the company as this bag does
(taking all the colors into consideration at one time).
For that, we need a new kind of significance test, called a
chi-square test for goodness of fit.
33
The Chi-Square Test for
Goodness of Fit
We can write the hypotheses in symbols as:
H0: pblue = 0.24, porange = 0.20, pgreen = 0.16,
pyellow = 0.14, pred = 0.13, pbrown = 0.13,
Ha: At least one of the pi’s is incorrect
where pcolor = the true population proportion of M&M’S milk chocolate candies of
that color.
The idea of the chi-square test for goodness of fit is this: we compare the
observed counts from our sample with the counts that would be expected if
H0 is true. The more the observed counts differ from the expected counts,
the more evidence we have against the null hypothesis.
In general, the expected counts can be obtained by multiplying the proportion
of the population distribution in each category by the sample size.
34
The Chi-Square Test for
Goodness of Fit
Assuming that the color distribution stated by Mars, Inc. is true, 24% of all
M&M’S milk chocolate candies produced are blue.
For random samples of 60 candies, the average number of blue M&M’S should
be (0.24)(60) = 14.40. This is our expected count of blue M&M’S.
Using this same method, we can find the expected counts for the other color
categories:
Orange: (0.20)(60) = 12.00
Green: (0.16)(60) = 9.60
Yellow: (0.14)(60) = 8.40
Red:
(0.13)(60) = 7.80
Brown: (0.13)(60) = 7.80
35
The Chi-Square Test for
Goodness of Fit
To calculate the chi-square statistic, use the same formula as you did
earlier in the chapter.
(Observed - Expected)2
c =å
Expected
2
(9 -14.40) 2 (8 -12.00) 2 (12 - 9.60) 2
c =
+
+
14.40
12.00
9.60
2
(15 - 8.40) 2 (10 - 7.80) 2 (6 - 7.80) 2
+
+
+
8.40
7.80
7.80
c 2 = 2.025 +1.333 + 0.600 + 5.186 + 0.621+ 0.415
= 10.180
36
The Chi-Square Test for
Goodness of Fit
The Chi-Square Test for Goodness of Fit
A categorical variable has k possible outcomes, with probabilities p1, p2, p3,
… , pk. That is, pi is the probability of the ith outcome. We have n
independent observations from this categorical variable.
To test the null hypothesis that the probabilities have specified values
H0: p1 = ___, p2 = ___, …, pk = ___.
find the expected count for each category assuming that H0 is true. Then
calculate the chi-square statistic:
(Observed - Expected)2
2
c =å
Expected
where the sum is over the k different categories. The P - value is the area to
the right of c 2 under the density curve of the chi - square distribution with k -1
degrees of freedom.
37
The Chi-Square Test for
Goodness of Fit
We computed the chi - square statistic for our sample of 60 M & M’ S to be
 2  10.180. Because all of the expected counts are at least 5, the  2
statistic will follow a chi - square distributi on with df = 6 - 1 = 5 reasonably
well when H 0 is true.
P
df
.15
.10
.05
4
6.74
7.78
9.49
5
8.12
9.24
11.07
6
9.45
10.64
12.59
The value c 2 =10.180 falls between the critical values 9.24 and 11.07. The
corresponding
areasisinbetween
the right0.05
tail ofand
the0.10,
chi - square
distribution
df = 5
Since our P-value
it is greater
than αwith
= 0.05.
are
0.10 andwe
0.05.
Therefore,
fail to reject H0. We don’t have sufficient evidence to
conclude that the company’s claimed color distribution is incorrect.
38
So, the P - value for a test based on our sample data is between 0.05 and 0.10.
Chapter 9
Analysis of Two-Way
Tables
9.1 Inference for Two-Way Tables
9.2 Formulas and Models for Two-Way Tables
9.3 Goodness of Fit
39
Download