Uploaded by Jessi Bender

stats chap 11

Chi-square test for goodness of fit -> to perform a test for the distribution of a categorical
variable with two or more categories
Chi-square test for homogeneity -> to perform a test to compare the distribution of a
categorical variable for two or more populations or treatments (where the variable can
have two or more categories)
Chi-square test for independence -> to perform a test to determine if there is convincing
evidence for an association between two variables in a population
11.1: Chi-square test for goodness of fit
We compare the observed counts from our sample with the counts that would be
expected if the null was true
● **We assume that the null is true when performing a significance test
● The more the observed counts differ from the expected counts -> the more
evidence we have against the null hypothesis and for the alternative hypothesis
Candy example:
● Company’s claims: brown, red, yellow, and green are each 12.5% and orange and blue
are each 25%
● Sample of 60: brown = 12, red = 3, yellow = 7, green = 9, orange = 9, blue = 20
p̂ of brown = 12/60 = 0.20, which is not the 0.125 that the company claimed; we might use a
one-sample z test for a proportion
● p = true proportion of M&M’s in the large bag that are brown
● Hₒ: p = 0.125, Hₐ: p ≠ 0.125
○ Perform a test like this for all 6 colors
This method would be inefficient and lead to the problem of multiple tests
● Wouldn’t tell us how likely it is to get a random sample of 60 candies with a color
DISTRIBUTION that differs as much from the one claimed by the company as the
sample does (taking all colors into consideration at one time)
Stating Hypotheses
● Null hypothesis: should state a claim about the distribution of a single categorical
variable in the population of interest
● Alternative hypothesis: should state that the categorical variable does not have
the specified distribution
○ Don’t state the alternative hypothesis in a way that suggests that all the
proportions in the hypothesized distribution are wrong
Back to candy man: categorical variable is color, population of interest is all M&Ms in the
large bag
● Hₒ : the distribution of color in the large bag of M&Ms is the same as the claimed
distribution
○ Hₒ : p brown = 0.125, p red = 0.125, p yellow = 0.125, p green = 0.125, p
orange = 0.25, p blue = 0.25
● Hₐ : the distribution of color in the large bag of M&Ms is NOT the same as the
claimed distribution
○ Hₐ : at least two of the p color’s are incorrect
■ Where p color = the true proportion of M&Ms in the large bag of that
color
○ This shouldn’t be written as “all the p color’s are not equal to the claimed values”
We don’t say at least one because if the stated proportion in one category is wrong -> the
stated proportion in at least one other category must be wrong because the sum of the p
color’s must be 1
Calculating Expected Counts In a Chi-Square Test for GOF
The expected count for category i in the distribution of a categorical variable is npᵢ
● Where pᵢ is the relative frequency for category i specified by the null
hypothesis
● The count in a specific category in a random sample is a binomial random variable
● Whose expected value is np = the average count in that category
○ Expected count is not likely to be a whole number + shouldn’t be rounded
to a whole number
Back to candy man:
Expected color distribution for the random sample of 60 candies in all color categories
● Red: 60 (0.125) = 7.5
● Yellow: 60 (0.125) = 7.5
● Green: 60 (0.125) = 7.5
● Brown: 60 (0.125) = 7.5
● Orange: 60 (0.25) = 15
● Blue: 60 (0.25) = 15
To see if the data give convincing evidence for the alternative -> compare observed counts
with expected counts
● If observed counts are far from expected counts -> that is evidence for the alternative
We can see some pretty big differences between the observed and expected counts in
several color categories
● How likely is it that differences this large or larger would occur just by chance in
random samples of size 60 from the population distribution claimed by the
website?
○ Calculate a statistic that measures how far apart the observed and
expected counts are -> chi-square test statistic
Chi-Square Test Statistic
Chi-square test statistic: a measure of how far the observed counts are from the
expected counts
● χ² = ∑ (observed count − expected count)² / expected count
○ Where the sum is over all possible values of the categorical variable
Back to candy example:
χ² = (12−7.5)²/7.5 + (3−7.5)²/7.5 + . . . + (20−15)²/15
= 2.7 + 2.7 + 0.03 + 0.30 + 2.4 + 1.67 = 9.8
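A minimal Python sketch of the candy calculation above, to check the arithmetic. The variable names are just illustrative; the counts and claimed proportions are the ones from these notes.

```python
# Observed counts from the sample of 60 candies and the company's claimed proportions.
observed = {"brown": 12, "red": 3, "yellow": 7, "green": 9, "orange": 9, "blue": 20}
claimed = {"brown": 0.125, "red": 0.125, "yellow": 0.125, "green": 0.125,
           "orange": 0.25, "blue": 0.25}

n = sum(observed.values())  # sample size, 60

# Expected count for each category is n * p_i (left unrounded).
expected = {color: n * p for color, p in claimed.items()}

# Each category contributes (observed - expected)^2 / expected.
components = {color: (observed[color] - expected[color]) ** 2 / expected[color]
              for color in observed}
chi_square = sum(components.values())

for color in observed:
    print(f"{color:>6}: O = {observed[color]:2d}, E = {expected[color]:4.1f}, "
          f"component = {components[color]:.2f}")
print(f"chi-square = {chi_square:.2f}")
```

The per-color components printed here are the same terms that are added up by hand in the calculation.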
Fair die example: Carrie made a 6-sided die and rolled it 90 times to test if each side was
equally likely to show up
Outcome of roll   1    2    3    4    5    6    Total
Frequency         12   28   12   13   10   15   90
State the hypotheses
● Hₒ : the sides of Carrie’s die are equally likely to show up
○ Carrie’s die is fair
○ The distribution of outcomes for Carrie’s die is uniform
● Hₐ : the sides of Carrie’s die are not equally likely to show up
Calculate the expected count for each of the possible outcomes
● If the null is true, each of the 6 sides should show up ⅙ of the time
● The expected count is 90 (⅙) = 15 for each side
Calculate the value of the chi-square statistic
● χ² = (12−15)²/15 + (28−15)²/15 + . . . + (15−15)²/15
= 0.6 + 11.27 + 0.6 + 0.27 + 1.67 + 0 = 14.41
Chi-Square Distributions and P-values
χ² is a measure of the distance the observed counts are from the expected counts
● Distance is always zero or positive
● Zero is only when the observed counts are exactly equal to the expected counts
● Large values of χ² -> stronger evidence for the alternative (observed counts are
far from what we would expect if the null was true)
● Small values of χ² -> the data are consistent with the null
Back to candy example: is the value from the candy sample, χ² = 9.8, a large value?
● In a simulation of 1000 random samples of size 60 from the claimed distribution, only 87
of the 1000 simulated samples resulted in a chi-square test statistic of 9.8 or higher
● So estimated P-value is 87/1000 = 0.087 > ⍺ = 0.05 -> we fail to reject the null -> we don’t
have convincing evidence that the color distribution in the large bag is different from the
distribution claimed by the company
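A rough sketch of this kind of simulation in Python. The seed and the 10,000 trials are arbitrary choices (the notes used 1000 samples), so the estimate will not match 0.087 exactly; it should land in the same neighborhood.

```python
import random

colors = ["brown", "red", "yellow", "green", "orange", "blue"]
claimed = [0.125, 0.125, 0.125, 0.125, 0.25, 0.25]
n = 60
expected = [n * p for p in claimed]

def chi_square_for_sample(sample):
    """Chi-square statistic comparing one simulated sample to the expected counts."""
    stat = 0.0
    for color, exp in zip(colors, expected):
        obs = sample.count(color)
        stat += (obs - exp) ** 2 / exp
    return stat

def estimate_p_value(observed_stat, trials=10_000, seed=1):
    """Share of simulated samples whose statistic is at least observed_stat."""
    rng = random.Random(seed)  # fixed (arbitrary) seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Draw 60 candies assuming the null (claimed distribution) is true.
        sample = rng.choices(colors, weights=claimed, k=n)
        # Small tolerance so samples that tie the observed value are counted.
        if chi_square_for_sample(sample) >= observed_stat - 1e-9:
            hits += 1
    return hits / trials

p_hat = estimate_p_value(9.8)
print(f"estimated P-value: {p_hat:.3f}")
```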
The sampling distribution of the chi-square test statistic is NOT a Normal distribution
● It’s a right-skewed distribution that allows only nonnegative values (χ² can’t be negative)
● Sampling distribution depends on the number of possible values for the
categorical variable (# of categories)
When the expected counts are all at least 5 -> sampling distribution of the χ² test statistic is
modeled well by a chi-square distribution with degrees of freedom = number of
categories minus 1
● Chi-square distribution: defined by a density curve that takes only nonnegative
values and is skewed to the right
A particular chi-square distribution is specified by its degrees of freedom
● As the degrees of freedom increase -> the density curve becomes less skewed ->
larger values become more probable
○ Mean of chi-square distribution = degrees of freedom
○ When df > 2, the mode of the chi-square density curve is at df − 2
Back to candy: χ² = 9.8, because all the expected counts are at least 5 -> the χ² test
statistic will be modeled well by a chi-square distribution when the null is true
● P-value is probability of getting a value of χ² as large or larger than 9.8 when the
null is true (using 6 − 1 = 5 degrees of freedom)
● P-value = 0.081
What conclusions should we draw about Hₒ : the distribution of color in the large bag of
M&Ms is the same as the claimed distribution?
● Because our P-value of 0.081 > ⍺ = 0.05 -> we fail to reject the null
● We don’t have convincing evidence that the distribution of color in the large
bag of M&Ms differs from the claimed distribution
○ **Remember -> failing to reject the null does not mean that the null is true,
we can only say that the sample data did not provide convincing evidence
to reject the null
In calc -> go to 6: χ²cdf, use the value of χ² as the lower bound and infinity as the upper
bound, and enter the degrees of freedom
Back to fair die example: Find the P-value
● Df = 6 − 1 = 5
● Using χ² = 14.41, P-value = 0.0132
We reject the null because the P-value of 0.0132 < ⍺ = 0.05. There is convincing evidence that
Carrie’s die is unfair
Carrying Out a Test
Chi-square test for goodness of fit uses an approximation that becomes more accurate as
we take larger samples
Conditions for Performing a Chi-Square Test for Goodness of Fit
● Random: the data comes from a random sample from the population of interest
● 10%: when sampling without replacement, n < 0.10N
● Large Counts: all EXPECTED counts are at least 5
○ Expected, not observed
To compare the distribution of a categorical variable in one population to a claimed
distribution -> use chi-square test for goodness of fit
The Chi-Square Test for Goodness of Fit
To test a hypothesized model for the distribution of a categorical variable
Suppose the conditions are met. To perform a test of Hₒ : the stated distribution of a
categorical variable in the population of interest is correct
● Compute the chi-square test statistic (where the sum is over the k different categories)
● P-value is area to the right of χ² under the chi-square density curve with k − 1 degrees
of freedom
Hockey birthday example: random sample of 80 NHL players from recent seasons
selected, birthdays recorded -> to see if birth date is related to success (judged by if
a player makes it into the NHL). Do these data provide convincing evidence that the
birthdays of NHL players are not uniformly distributed across the four quarters of the
year?
Birthday            Jan-Mar   Apr-Jun   Jul-Sep   Oct-Dec
Number of players   32        20        16        12
State
● Hₒ : the birthdays of all NHL players are uniformly distributed across the four
quarters of the year
● Hₐ : the birthdays of all NHL players are NOT uniformly distributed across the
four quarters of the year
● We’ll use ⍺ = 0.05
Plan
● Chi-square test for goodness of fit
○ Random: data comes from random sample of all NHL players
○ 10%: must assume that 80 is less than 10% of all NHL players
○ Large Counts: all expected counts = 80 (¼) = 20 ≥ 5
■ Some evidence in favor of the alternative because the observed
counts differ from the expected counts
Do
● χ² = (32−20)²/20 + (20−20)²/20 + . . . + (12−20)²/20
= 7.2 + 0 + 0.8 + 3.2 = 11.2
● P-value with df = 3 is equal to 0.011
Conclude
● Because the P-value of 0.011 < ⍺ = 0.05, we reject the null. We have convincing evidence
that the birthdays of NHL players are not uniformly distributed across the four quarters of the
year
In calc -> go to tests -> χ² GOF Test
● When using calc, write out first few terms of the chi-square calculation, name the
procedure, test statistic, degrees of freedom, and P-value
How to investigate HOW the distribution is different?
● Identify the categories that contribute the most to the chi-square statistic
● Describe how the observed and expected counts differ in those categories, noting
the direction of the difference
Back to hockey birthday example:
Birthday   Observed   Expected   O - E   (O - E)² / E
Jan-Mar    32         20         12      7.2
Apr-Jun    20         20         0       0.00
Jul-Sep    16         20         -4      0.8
Oct-Dec    12         20         -8      3.2
The last column shows the contributions/components of the chi-square test statistic
● The two biggest contributions to the chi-square statistic came from Jan-Mar and
Oct-Dec
○ Jan-March -> 12 more players born than expected
○ October-December -> 8 fewer players were born than expected
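The follow-up table can be rebuilt with a short Python sketch. The names are illustrative; the counts are the hockey data from these notes.

```python
quarters = ["Jan-Mar", "Apr-Jun", "Jul-Sep", "Oct-Dec"]
observed = [32, 20, 16, 12]
n = sum(observed)        # 80 players
expected = [n / 4] * 4   # uniform across the four quarters -> 20 each

# One row per quarter: (quarter, O, E, O - E, component).
rows = []
for quarter, obs, exp in zip(quarters, observed, expected):
    diff = obs - exp
    rows.append((quarter, obs, exp, diff, diff ** 2 / exp))

chi_square = sum(row[4] for row in rows)

# Largest contributions first; the sign of O - E gives the direction of the difference.
ranked = sorted(rows, key=lambda row: row[4], reverse=True)
for quarter, obs, exp, diff, comp in ranked:
    print(f"{quarter}: O - E = {diff:+.0f}, component = {comp:.1f}")
print(f"chi-square = {chi_square:.1f}")
```

Sorting by component puts Jan-Mar and Oct-Dec on top, which are the two categories worth describing in the follow-up.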
11.2 Inference for Two-Way Tables
If we want to compare the proportions of successes in more than two samples or groups
OR compare the distributions of a single categorical variable across several populations
or treatments -> chi-square test for homogeneity
● Present data in a two-way table
○ Can be used to compare distributions of a single categorical variable
○ Can also be used to summarize relationships between two categorical
variables
To determine if there is convincing evidence of an association between two categorical
variables -> chi-square test for independence
Tests for Homogeneity
Stating Hypotheses
● Null hypothesis (in general) says that there is no difference in the true
distribution of a categorical variable in the populations of interest or for the
treatments in an experiment
● Alternative hypothesis (in general) says that there is a difference in the
distributions but does not specify the nature of that difference
○ Alternative does not state that each distribution is different from each of
the others
■ Alternative will be true even if just one of the true distributions is
different from the others
Restaurant example: does background music influence what customers buy? ->
experiment in a restaurant compared 3 randomly assigned treatments (no music,
French accordion music, Italian string music) + recorded # of customers who ordered
French, Italian, and other entrees
● Hₒ : there is no difference in the true distributions of entrees ordered at this
restaurant when no music, French accordion music, or Italian string music is
played
○ Can also say -> the distribution of a categorical variable is the same for
each of several populations or treatments
○ Prefer “no difference” because it’s more consistent with the language we
used in previous sig tests
● Hₐ : there is a difference in the true distributions of entrees ordered at this
restaurant when no music, French accordion music, or Italian string music is
played
○ Any difference among the three observed distributions of entrees ordered
is evidence against the null and for the alternative
Relative frequency bar graphs comparing the distributions of entrees ordered for
different music conditions
● Type of entree that customers order seems to differ considerably across the three
music treatments
● Orders of Italian entrees are very low (1.3%) when French music is playing but
higher when Italian music (22.6%) or no music (13.1%) is playing
● French entrees seem popular as they are ordered often under all music
conditions, but more often when French music is playing
● For all three music treatments, the percent of Other entrees ordered was similar
Do the differences in these distributions provide convincing evidence that background
music affects customer behavior at this restaurant? OR Is it plausible that the
background music has no effect on customer behavior and that these differences are
due to chance involved in the random assignment of treatments?
● We have to know how likely it is to get differences this big or bigger when the
null is true -> need a P-value
We could compare many pairs of proportions -> ending up with many tests and many P-values
● This is a bad idea -> performing multiple tests on the same data increases the
probability that we make a Type I error in at least one of the tests
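A quick illustration of why multiple tests are a problem, sketched in Python. This assumes every null is true and the tests are independent, which is a simplification (tests run on the same data are not independent), so treat the numbers as a rough guide only.

```python
def prob_at_least_one_type_i_error(alpha, num_tests):
    """P(at least one false rejection) across num_tests independent tests at level alpha."""
    return 1 - (1 - alpha) ** num_tests

# 1 test, 3 tests (e.g. pairs of music treatments), 6 tests (e.g. one per candy color).
for k in (1, 3, 6):
    print(f"{k} tests: {prob_at_least_one_type_i_error(0.05, k):.3f}")
```

Even at ⍺ = 0.05 per test, the chance of at least one Type I error grows quickly with the number of tests, which is the motivation for one overall chi-square test.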
It’s also a bad idea to pick out one large difference from the two-way table and then
perform a sig test as if it were the only comparison we had in mind -> P-hacking
● A test comparing the proportions of French entrees ordered under the no music
and French accordion music treatments shows that the difference is statistically
significant (z = 2.06, P = 0.039)
● But the proportions of Italian entrees ordered for the no music and Italian string
music treatments do not differ significantly (z = 1.61, P = 0.107)
○ Reporting the results of only the first test wouldn’t tell the whole story
The problem of how to do many comparisons at once without increasing the overall
probability of a Type I error is common in stats
We can:
● Perform an overall test to see if there is convincing evidence of any differences
among the parameters that we want to compare
○ Uses the chi-square test statistic
● When the overall test shows there IS convincing evidence of a difference ->
perform a detailed follow-up analysis to decide which of the parameters differ +
estimate how large the differences are
Expected Counts and the Chi-Square Test Statistic
● Hₒ : there is no difference in the distribution of a categorical variable for several
populations or treatments
● Hₐ : there is a difference in the distribution of a categorical variable for several
populations or treatments
Compare the observed counts in a two-way table with the counts we would expect if the
null was true
Calculating expected counts for a Chi-Square Test Based on Data in a Two-Way Table
When the null is true, the expected count in any cell of a two-way table is
● Expected count = row total (column total) / table total
We then compare the observed counts with the expected counts using the chi-square
statistic
● χ² = ∑ (observed count − expected count)² / expected count
Back to restaurant example: null -> no difference in distribution of entrees ordered
when no music, French accordion music, or Italian string music is played
Find expected counts by assuming that null is true
● Out of the 243 entrees ordered, 99 were French -> so we expect the same
proportion of French entrees to be ordered under all 3 music conditions (99/243
= 0.4074)
○ No music (84 entrees): 84 (0.4074) = 34.22 expected French entrees
○ French music (75 entrees): 75 (0.4074) = 30.56 expected French entrees
○ Italian music (84 entrees): 84 (0.4074) = 34.22 expected French entrees
● Out of the 243 entrees ordered, 31 were Italian -> so we expect the same
proportion of Italian entrees to be ordered under all 3 music conditions (31/243 =
0.1276)
○ No music (84 entrees): 84 (0.1276) = 10.72 expected Italian entrees
○ French music (75 entrees): 75 (0.1276) = 9.57 expected Italian entrees
○ Italian music (84 entrees): 84 (0.1276) = 10.72 expected Italian entrees
● Out of the 243 entrees ordered, 113 were Other -> so we expect the same
proportion of Other entrees to be ordered under all 3 music conditions (113/243 =
0.465)
○ No music (84 entrees): 84 (0.465) = 39.06 expected Other entrees
○ French music (75 entrees): 75 (0.465) = 34.88 expected Other entrees
○ Italian music (84 entrees): 84 (0.465) = 39.06 expected Other entrees
● Expected counts for no music and Italian music are the same -> 84 total entrees were
ordered under each condition, and under the null we expect the distributions of entree
choice to be the same
● Example: we found the expected count of French entrees when no music was
playing by doing 84 (99/243) = 34.22
○ Rewrite as 84 (99) / 243 or 99 (84) / 243 = 34.22, which is
row total (column total) / table total
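All nine expected counts follow from the row and column totals alone. A small Python sketch (illustrative names; the totals are the ones quoted in these notes):

```python
row_totals = {"French": 99, "Italian": 31, "Other": 113}                # entree type
col_totals = {"No music": 84, "French music": 75, "Italian music": 84}  # treatment
table_total = sum(row_totals.values())  # 243 entrees in all

# Expected count for each cell = row total * column total / table total.
expected = {
    entree: {music: r * c / table_total for music, c in col_totals.items()}
    for entree, r in row_totals.items()
}

for entree, counts in expected.items():
    cells = ", ".join(f"{music}: {count:.2f}" for music, count in counts.items())
    print(f"{entree:>7} -> {cells}")
```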
Chi-square statistic
● Sum is over all the cells (9 total)
● χ² = (30−34.22)²/34.22 + (39−30.56)²/30.56 + . . . + (35−39.06)²/39.06
= 0.52 + 2.33 + … + 0.42 = 18.28
Female president example: does the gender of an interviewer affect the responses to a
survey question? Half of the 100 males were randomly assigned to be asked “would
you vote for a female president” by a female interviewer, the other half by a male
interviewer
State the appropriate null and alternative hypotheses
● Hₒ : there is no difference in the true distributions of response to this question
when asked by a male interviewer and when asked by a female interviewer for
subjects like these
● Hₐ : there is a difference in the true distributions of response to this question
when asked by a male interviewer and when asked by a female interviewer for
subjects like these
Show the calculation for the expected count in the Male/Yes cell + provide a complete
table of expected counts
● Expected count for Male/Yes cell is 50 (69) / 100 = 34.5
Calculate the value of the chi-square test statistic
● χ² = (30−34.5)²/34.5 + (39−34.5)²/34.5 + . . . = 4.25
Conditions and P-values
Conditions for Performing a Chi-Square Test for Homogeneity
● Random: the data comes from independent random samples or from groups in a
randomized experiment
● 10%: when sampling without replacement, n < 0.10 N for each sample
● Large Counts: all EXPECTED counts are at least 5
Think of the chi-square test statistic χ² as a measure of how much the observed counts
deviate from the expected counts
● Large values of χ² are evidence against the null and in favor of the alternative
P-value measures the strength of the evidence
● When conditions are met -> P-values for a chi-square test for homogeneity come
from a chi-square distribution
● Df = (number of rows − 1) (number of columns − 1)
In calc: use matrices
● 2nd -> x^-1 which is matrix -> edit -> choose A
● Enter dimensions of matrix as the dimensions of the table
● Enter the observed counts in the same locations as in the table
● Stat -> tests -> χ² test -> observed is matrix A and expected is matrix B
○ With calc be sure to name the procedure (χ² test for homogeneity) and report
the test statistic, degrees of freedom, and P-value
Back to restaurant example: conditions are met
● Random: the three treatments were assigned at random
● Large Counts: all expected counts are at least 5 (see table)
● 10%: doesn’t need to be checked because the researchers were not sampling
without replacement from some population of interest -> they performed an
experiment using customers who happened to be in the restaurant at the time
What are the degrees of freedom for the distribution?
● Df = (3-1) (3-1) = 4
What is the P-value?
● P (χ² > 18.28) w/ df = 4 is 0.0011
○ Because the P-value of 0.0011 < ⍺ = 0.05, we reject the null. There is
convincing evidence that there is a difference in the true distributions of
entrees ordered at this restaurant when no music, French accordion music, or
Italian string music is played
Back to female president example:
Verify that the conditions for inference are met
● Random: treatments were randomly assigned
● Large Counts: all expected counts are greater than or equal to 5 (see table of
expected counts)
● 10%: don’t need to check because they aren’t randomly selecting subjects from
some population
Find the P-value
● Df = (3-1)(2-1) = 2
● P (χ² > 4.25) = 0.119
Interpret the P-value
● Assuming that the gender of the interviewer doesn’t affect responses to this
question, there is a 0.119 probability of observing differences in the
distributions of responses as large as or larger than those in this study by chance
alone
What conclusion would you draw?
● Because the P-value of 0.119 > ⍺ = 0.05, we fail to reject the null. There is not
convincing evidence of a difference in the true distributions of response to this
question when asked by a male interviewer and when asked by a female interviewer
for subjects like these
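Both two-way examples so far have an even number of degrees of freedom, and in that case the area to the right of χ² has a simple closed form. The sketch below uses that shortcut; the function names are my own, and it is not meant to replace the calculator's χ² test, only to check the P-values quoted above.

```python
import math

def degrees_of_freedom(num_rows, num_cols):
    """Df for a two-way table: (rows - 1)(columns - 1)."""
    return (num_rows - 1) * (num_cols - 1)

def chi2_upper_tail_even_df(stat, df):
    """P(chi-square >= stat) when df is a positive even integer (closed-form sum)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive even df")
    half = stat / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

# Restaurant: 3 x 3 table, chi-square = 18.28. Female president: 3 x 2 table, chi-square = 4.25.
p_restaurant = chi2_upper_tail_even_df(18.28, degrees_of_freedom(3, 3))
p_president = chi2_upper_tail_even_df(4.25, degrees_of_freedom(3, 2))
print(f"restaurant: df = 4, P = {p_restaurant:.4f}")
print(f"president:  df = 2, P = {p_president:.3f}")
```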
The Chi-Square Test for Homogeneity
To compare the distribution of a categorical variable in several populations or for
several treatments
Suppose the conditions are met. To perform a test of Hₒ : there is no difference in the
distribution of a categorical variable for several populations or treatments
● Compute the chi-square test statistic
○ Sum is over all cells in the two-way table
● P-value is the area to the right of χ² under the chi-square density curve with
degrees of freedom = (number of rows − 1) (number of columns − 1)
Speaking English example: survey residents of Australia, UK, and US “how important do
you think it is to be able to speak English?”
Do these data provide convincing evidence at the ⍺ = 0.05 level that the distributions of opinion about
speaking English differ for residents of Australia, the UK, and the US?
State:
● Hₒ : there is no difference in the true distributions of opinion about speaking
English for residents of Australia, the UK, and the US
● Hₐ : there is a difference in the true distributions of opinion about speaking
English for residents of Australia, the UK, and the US
● We’ll use ⍺ = 0.05
Plan:
● Chi-square test for homogeneity
○ Random: independent random samples of residents from the three countries
○ 10%: 1000 < 10% of all Australian residents, 1460 < 10% of all UK residents, 1003 <
10% of all US residents
○ Large Counts: all expected counts are at least 5
■ Can’t just say this but have to SHOW the expected counts
Do:
● Test statistic: χ² = (690−741.8)²/741.8 + . . . + (1177−1083.1)²/1083.1 = 68.57
● Df = (4-1)(3-1) = 6
● P (χ² > 68.57) with df = 6 is approximately 0
Conclude:
● Because the P-value of approximately 0 < ⍺ = 0.05, we reject the null. There is
convincing evidence that there is a difference in the true distributions of opinion
about speaking English for residents of Australia, the UK, and the US
○ There is some evidence for the alternative because the observed counts
differ from the expected counts
What if we want to compare several populations?
● Many studies involve comparing the proportion of successes for each of several
populations or treatments
○ Two-sample z test allows us to test the null hypothesis which states that p1
and p2 are the same (the two proportions of successes for the two
populations or treatments)
○ Chi-square test for homogeneity allows us to test the null hypothesis
which states that p1 = p2 = … = pk (no difference in the proportions of
successes for the k populations or treatments) against the alternative
hypothesis (at least two of the true proportions are different)
■ It’s not that ALL the proportions are different
■ The opposite of “all proportions are equal” is “some of the proportions
are not equal”
● Chi-square test for homogeneity compares the distribution of a categorical
variable for any number of populations or treatments
○ If the test allows us to reject the null hypothesis of no difference -> it’s time
to examine the differences in detail
■ Identify the cells that contribute the most to the chi-square statistic
■ Describe how the observed and expected counts differ in those
cells (note direction of the difference)
● Restaurant example: significant differences among the distributions of entrees
ordered under each of the three music conditions
○ The two components that contribute the most to the chi-square statistic
are Italian entrees with French music and Italian entrees with Italian
music -> orders of Italian entrees are far below what we expect when
French music is playing and far above what we expect when Italian music
is playing
■ Orders of Italian entrees are strongly affected by Italian and French
music
Relationships Between Two Categorical Variables
Two-way tables can summarize data from different types of studies. We can compare the
distribution of a categorical variable for several populations or treatments and use the
chi-square test for homogeneity to perform inference in such settings.
A two-way table can also be made when a SINGLE random sample of individuals is
chosen from a SINGLE population and then classified based on TWO categorical
variables -> we want to analyze the relationship between the variables
Stating Hypotheses
We are interested in whether the sample data provide convincing evidence that the
variables have an association in the population
● Does knowing the value of one variable help predict the value of the other
variable for individuals in the population?
● To determine if evidence from a sample is convincing -> perform a chi-square
test for independence
Null -> there is NO association between the two categorical variables in the population of
interest
● No association = knowing the value of one variable does not help us predict the
value of the other (the variables are INDEPENDENT)
Alternative -> there IS an association between the variables
Expected Counts
Compare the observed counts in a two-way table with the expected counts if the null is
true
● If we assume the null is true -> we assume the two categorical variables of
interest are independent
● Therefore, we can use the definition of independent events to calculate the
expected counts
○ P (A | B) = P (A)
● Expected count = row total (column total) / table total
Conditions and Calculations
10% and Large Counts conditions for chi-square test for independence are the same as
the test for homogeneity
● Difference in the Random condition -> a test for independence uses data from a
single random sample (but a test for homogeneity uses data from two or more
independent random samples or from two or more groups in a randomized
experiment)
Conditions for Performing a Chi-Square Test for Independence
● Random: the data comes from a random sample from the population of interest
● 10%: when sampling without replacement, n < 0.10 N
● Large Counts: all EXPECTED counts are at least 5
When the conditions are met -> use the familiar χ² test statistic to measure the strength of
the association between the two variables in the sample
● P-values come from a chi-square distribution with
○ Df = (number of rows − 1) x (number of columns − 1)
The Chi-Square Test for Independence
When we want to test for an association between two categorical variables in a
population, we use a chi-square test for independence
Suppose the conditions are met. To perform a test of Hₒ : there is no association between
two categorical variables in the population of interest
● Compute the chi-square test statistic
○ Sum is over all cells in the two-way table
● P-value is the area to the right of χ² under the chi-square density curve with
degrees of freedom = (number of rows − 1) (number of columns − 1)
Anger and heart disease example: are people who are prone to sudden anger more likely
to develop heart disease? Observational study with random sample of 8474 people with
normal blood pressure, they were free of heart disease at the beginning of the study,
took the Spielberger Trait Anger Scale to measure how prone they are to sudden anger
and recorded if each individual developed coronary heart disease
● As the anger score increases, so does the percent who suffer heart disease
● A much higher percent of people in the high anger category developed CHD
(4.27%) than in the moderate (2.33%) and low (1.70%) anger categories
Do the data provide convincing evidence of an association between the variables in
the larger population? Or is it plausible that there is no association between the
variables in the population and that we observed an association in the sample by chance
alone?
Testing the hypotheses:
● Hₒ : there is no association between anger level and heart-disease status in the
population of people with normal blood pressure
○ OR anger and heart-disease status are independent in the population of
people with normal blood pressure
● Hₐ : there is an association between anger level and heart-disease status in the
population of people with normal blood pressure
○ OR anger and heart-disease status are not independent in the population of
people with normal blood pressure
Expected counts:
● Null -> there is no association between anger level and heart disease status in the
population of interest
○ If we assume that this is true -> anger level and CHD status are
independent
○ Use def of independent events
○ Chance process in this case is randomly selecting a person and recording
their anger level and CHD status
● Start with “Yes” and “low anger”
○ 190/8474 people had CHD so P (Yes) = 190/8474
○ If null is true and anger level and CHD status are independent -> knowing
that the selected individual is low anger doesn’t change the probability
that this person develops CHD
■ P (Yes | low anger) = P (Yes) = 190/8474 = 0.02242
■ 3110 low anger (0.02242) = 69.73 -> we expect that 69.73 of the 3110
low-anger people in the study would get CHD
■ row total (column total) / table total = 190 (3110) / 8474 = 69.73
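A short Python check that the two routes to the expected count agree, plus the Low anger + Yes contribution to χ². The variable names are illustrative; 53 is the observed Low anger + Yes count used in the test statistic for this example.

```python
chd_yes_total = 190      # people who developed CHD (row total)
low_anger_total = 3110   # people in the low-anger group (column total)
table_total = 8474       # everyone in the sample
observed_low_yes = 53    # low-anger people who developed CHD

# Route 1: independence says P(Yes | low anger) = P(Yes).
p_yes = chd_yes_total / table_total
expected_via_probability = low_anger_total * p_yes

# Route 2: the general two-way-table formula.
expected_via_formula = chd_yes_total * low_anger_total / table_total

# Contribution of this one cell to the chi-square statistic.
component = (observed_low_yes - expected_via_formula) ** 2 / expected_via_formula

print(f"expected (probability route): {expected_via_probability:.2f}")
print(f"expected (formula route):     {expected_via_formula:.2f}")
print(f"Low anger + Yes component:    {component:.3f}")
```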
Conditions
● Random: the data comes from a random sample of 8474 people with normal
blood pressure
● 10%: it is reasonable to assume that 8474 < 10% of all people with normal blood
pressure
● Large Counts: all the expected counts are at least 5 (see previous table)
Test statistic and P-value
● χ² = (53−69.73)²/69.73 + (110−106.08)²/106.08 + . . . = 16.077
● With df = (2-1) x (3-1) = 2 -> P-value = 0.00032
● Because the P-value of 0.00032 < ⍺ = 0.05, we reject the null. We have convincing
evidence of an association between anger level and heart-disease status in the
population of people with normal blood pressure
Follow-up analysis
● The two cells that contribued most of the chi-square test statistic were Low anger
+ Yes (4.014) and High anger + Yes (11.564)
● A much smaller number of low-anger people developed CHD than expected and a
much larger number of high-anger people got CHD than expected
● BUT we cannot conclude that proneness to anger causes heart disease
○ The anger and heart-disease study is an observational study (not an
experiment)
○ It’s not surprising that some other variables are confounded with anger
level (ex: people prone to anger are more likely to be men who drink and
smoke)
■ Don’t know if the increased rate of heart disease among those with high
anger levels in the study is because of their anger or because of
their drinking and smoking or because of their gender
Snowmobiles example: random sample of 1526 winter visitors to Yellowstone, asked “do you belong to
an environmental club?” and “what is your experience with a snowmobile: own, rent,
never used?”
Do the data provide convincing evidence of an association between environmental club
status and type of snowmobile use in the population of winter visitors to Yellowstone
National Park?
State:
● Hₒ : there is no association between environmental club status and type of
snowmobile use in the population of winter visitors to Yellowstone
○ OR environmental club status and type of snowmobile use are independent
in the population of winter visitors to Yellowstone
● Hₐ : there is an association between environmental club status and type of
snowmobile use in the population of winter visitors to Yellowstone
○ OR environmental club status and type of snowmobile use are not
independent in the population of winter visitors to Yellowstone
● We’ll use ⍺ = 0.05
Plan:
Chi-square test for independence
● Random: random sample of 1526 winter visitors to Yellowstone
● 10%: it is reasonable to assume that 1526 < 10% of all winter visitors to Yellowstone
● Large Counts: all expected counts are at least 5
○ Can’t just say this but have to SHOW the expected counts
● Because the observed counts differ from the expected counts, there is some
evidence for the alternative hypothesis
Do:
● Test statistic: $\chi^2 = \frac{(445-525.7)^2}{525.7} + \frac{(212-131.3)^2}{131.3} + \cdots = 116.6$
● Df = (3-1)(2-1) = 2
● $P(\chi^2 > 116.6)$ with df = 2 is 4.82 x 10⁻²⁶
○ Remember that P-values are probabilities that MUST be between 0 and 1 so
double check with the calculator if it looks like your P-value is greater than
1
Conclude:
● Because the p-value of approximately 0 < ⍺ = 0.05, we reject the null. We have
convincing evidence of an association between environmental club status and type of
snowmobile use in the population of winter visitors to Yellowstone National Park
○ There is some evidence for the alternative because the observed counts
differ from the expected counts
Using Chi-Square Tests Wisely
The chi-square test for homogeneity and the chi-square test for independence start with
a two-way table of observed counts, calculate the test statistic, degrees of freedom, and
P-value in the same way BUT
● A chi-square test for homogeneity tests whether the distribution of a categorical
variable is the same for each of several populations or treatments
● The chi-square test for independence tests whether two categorical variables are
associated in some population of interest
Examples:
In tests for homogeneity -> one set of totals is known by the researchers before the
data are collected
● Only one set of totals was left to vary
● Select independent random samples (or randomly assign treatment) and
compare the distribution of a single categorical variable
● Gender of interviewer: Abby and Mia decided in advance to randomly assign 50
subjects to each treatment
● English speaking: researchers knew in advance that they would survey 1000
people from Australia, 1460 from the U.K, and 1003 from the U.S.
In tests for independence -> neither set of totals is known in advance
● Select one sample and record the values of two variables for each member
● Yellowstone: researchers didn’t know anything about either variable ahead of
time, only knew that they would survey 1526 visitors
Watch out: it is common to see questions about association when a test for
homogeneity applies, and to see questions about differences between proportions or the
distribution of a variable when a test for independence applies
● **consider how the data was produced**
● If data comes from two or more independent random samples or treatment
groups in a randomized experiment -> do a chi-square test for homogeneity
● If data comes from a single random sample w/ individuals classified according to
two categorical variables -> use chi-square test for independence
Scary movies: are men and women equally likely to suffer lingering fear from watching
scary movies as children? Asked a random sample of 117 college students to write
narrative accounts of exposure to scary movies before the age of 13, more than ¼ said
that some of the fright symptoms are still present when awake
Assume that conditions for performing inference are met. Output for a chi-square test
using these data is shown
Should a chi-square test for independence or a chi-square test for homogeneity be used
in this setting?
● Chi-square test for independence -> data was produced using a single random
sample of college students then classified according to two variables (gender and
whether they had fright symptoms)
○ Chi-square test for homogeneity requires independent random samples
from each population
State an appropriate pair of hypotheses for researchers to test in this setting
● Hₒ : there is no association between gender and whether or not college students
have lingering fright symptoms
● Hₐ : there is an association between gender and whether or not college students
have lingering fright symptoms
Which cell contributes the most to the chi-square test statistic? In what way does this
cell differ from what the null hypothesis suggests?
● Men + having fright symptoms accounts for the largest component of the chi-square test statistic
○ Far fewer men in the sample admitted to lingering fright symptoms than we
would expect if the null was true
Interpret the P-value. What conclusion would you draw at ⍺ = 0.01?
● If there is no association between gender and whether or not college students
have lingering fright symptoms, there is a 0.045 probability of obtaining an
association as strong or stronger than the one observed in the random sample
of 117 students
● Because the P-value of 0.045 > ⍺ = 0.01, we fail to reject the null. We do not have
convincing evidence that there is an association between gender and whether or not
college students have lingering fright symptoms
What if we want to compare two proportions?
Shopping example: second hand stores have become popular, study of customers’
attitudes toward second hand stores interviewed separate random samples of shoppers
at two second hand stores of the same chain in different cities
Do the data provide convincing evidence of a difference in the distributions of gender
for shoppers at these two stores -> chi-square test for homogeneity
Hypotheses:
● Hₒ : there is no difference in the distributions of gender for shoppers at these two
stores
● Hₐ : there is a difference in the distributions of gender for shoppers at these two
stores
A difference in distribution of gender -> a difference in the true proportion of female
shoppers at the two stores -> can use two-sample z test
Hypotheses:
● Hₒ : Pa - Pb = 0
● Hₐ: Pa - Pb ≠ 0
○ Pa and Pb are the true proportions of female shoppers at Store A and Store
B
When performing both tests, the textbook obtained the same P-values
● The chi-square test statistic was the same as the square of the two-sample z
statistic
The chi-square test for homogeneity based on a 2 x 2 table is equivalent to a two-sample
z test for p1 - p2 with a two-sided alternative hypothesis HOWEVER
● If the two-way table is larger than 2 x 2 -> use the chi-square test (a two-sample
z test compares only two proportions)
● If the table is 2 x 2 and the alternative hypothesis is one-sided -> use a two-sample z test for a difference in proportions rather than a chi-square test
● If the table is 2 x 2 and you want to construct a confidence interval for a
difference in proportions -> only option is a two-sample z interval
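The link between the two tests on a 2 x 2 table (chi-square statistic = z squared) can be checked numerically. The sketch below uses hypothetical counts, not the textbook's data, and the function names are made up for illustration. The z statistic uses the pooled proportion, as in the usual two-sample z test for Hₒ : p1 - p2 = 0.

```python
import math

# Hypothetical counts (NOT the textbook's data): (female, male) at each store.
store_a = (60, 40)
store_b = (45, 55)

def two_sample_z(x1, n1, x2, n2):
    """Two-sample z statistic for p1 - p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def chi_square_statistic(table):
    """Chi-square statistic for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

z = two_sample_z(store_a[0], sum(store_a), store_b[0], sum(store_b))
chi_sq = chi_square_statistic([list(store_a), list(store_b)])
print(z ** 2, chi_sq)  # the two values agree up to floating-point rounding
```

Squaring z throws away its sign, which is one way to see why the chi-square test cannot handle a one-sided alternative.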
Grouping Quantitative Data Into Categories
Can convert a quantitative variable to a categorical variable by grouping together
intervals of values
Example: researchers surveyed independent random samples of shoppers at two
second hand stores of the same chain in different cities, table summarizes data on the
incomes of the shoppers in the two samples
● Personal income is a quantitative variable but by grouping the values of this
variable we created a categorical variable
● Can then carry out chi-square test for homogeneity because data comes from
independent random samples of shoppers at the two stores
○ Comparing the distribution of income for shoppers at the two stores would
give more info than just comparing mean incomes
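As a rough illustration of the grouping step, the Python sketch below turns dollar incomes into category counts for a two-way table. The cutoffs, category labels, and incomes are all hypothetical (the textbook's table is not reproduced here), and the samples are far too small to pass the Large Counts condition; the point is only how the quantitative variable becomes categorical.

```python
from collections import Counter

# Hypothetical income categories, chosen only for illustration.
CATEGORIES = ["under 20k", "20k to 34,999", "35k or more"]

def income_category(income):
    """Map a dollar income to one of the hypothetical categories."""
    if income < 20_000:
        return CATEGORIES[0]
    if income < 35_000:
        return CATEGORIES[1]
    return CATEGORIES[2]

# Made-up incomes (dollars) for shoppers sampled at each store.
incomes = {
    "Store A": [12_500, 18_000, 22_000, 41_000, 27_500, 9_800],
    "Store B": [36_000, 52_000, 19_500, 44_000, 31_000, 38_250],
}

# One row of counts per store, one column per income category.
table = {}
for store, values in incomes.items():
    counts = Counter(income_category(v) for v in values)
    table[store] = [counts[c] for c in CATEGORIES]

print(table)  # {'Store A': [3, 2, 1], 'Store B': [1, 1, 4]}
```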
What if some of the expected cell counts are less than 5?
One strategy is to collapse the table by combining two or more rows or columns to
ensure that all expected counts are greater than or equal to 5 -> will allow us to run the
chi-square test we want to run
● Ries also said it is ok if less than 20% of the cells have an expected count that is
less than 5
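Here is a minimal Python sketch of the collapsing strategy, using a hypothetical table of observed counts in which two small categories produce expected counts below 5. The helper names are invented for illustration.

```python
def expected_counts(table):
    """Expected counts: (row total)(column total) / (table total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def share_below_5(table):
    """Fraction of cells whose expected count is less than 5."""
    cells = [e for row in expected_counts(table) for e in row]
    return sum(e < 5 for e in cells) / len(cells)

def collapse_rows(table, i, j):
    """Return a new table with rows i and j combined into one row."""
    merged = [a + b for a, b in zip(table[i], table[j])]
    kept = [row for k, row in enumerate(table) if k not in (i, j)]
    return kept + [merged]

# Hypothetical observed counts; the last two rows are small categories.
observed = [[30, 40], [25, 35], [3, 4], [2, 5]]
collapsed = collapse_rows(observed, 2, 3)

print(share_below_5(observed))   # 0.5 -> half the cells fail Large Counts
print(collapsed)                 # [[30, 40], [25, 35], [5, 9]]
print(share_below_5(collapsed))  # 0.0 -> all expected counts are at least 5
```

Collapsing also changes the degrees of freedom: this table goes from (4-1)(2-1) = 3 to (3-1)(2-1) = 2. Only combine categories that make sense together in context.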