Lecture 5
Chi-Squares II (other categorical measures of association)
Measures of Association for
Categorical Data
• One problem with the hypothesis testing framework that we'll discuss later is the fact that any observed difference has the potential to be statistically significant, provided the sample size is large enough.
• Hence, the results of a hypothesis test could be viewed as a test of whether your sample size is large enough to detect the true difference between two populations.
• A more informative way of describing observed differences relies on effect size indices (statistics that attempt to depict differences in a metric that provides substantive meaning to the observed difference).
• In the context of chi-square tests, the appropriate effect size indices are measures of association (statistics that depict the magnitude of the relationship between the two variables in the table).
Measures of Association for
Categorical Data
• For example, consider the two tables below. Both have comparable chi-square and p-values, but most people would say that the one on the left shows evidence of a stronger relationship than the one on the right (particularly given the expected values shown in parentheses). Measures of association for these tables illustrate this difference.

Left table (expected frequencies in parentheses):
     30 (24.5)    19 (24.5)
     19 (24.5)    30 (24.5)
χ² = 4.94, p = .03

Right table:
    300 (281)    262 (281)
    262 (281)    300 (281)
χ² = 5.14, p = .02
Measures of Association for
Categorical Data
• There are five relevant measures of association for nominal (categorical) data. The first three are interpreted similarly, and the last two have a different interpretation.
• Contingency coefficient
• Phi Coefficient
• Cramer’s phi coefficient
• Odds ratio
• Risk ratio
• The contingency coefficient cannot reach a maximum value of 1, so its interpretation is somewhat difficult.
Measures of Association for
Categorical Data
• The phi coefficient and Cramer’s phi coefficient have a range from 0 to 1
with 0 indicating no association and 1 indicating a perfect relationship
between the two variables in the contingency table.
• As a rule, values less than .2 indicate a negligible relationship, values from
.2 up to .5 indicate an important relationship, and values from .5 up to 1
indicate a very strong relationship.
• The phi coefficient only applies to 2 x 2 tables, and Cramer's phi (aka Cramer's V) applies to any two-way table. As the equations below show, the three indices are similar.
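For reference, the standard defining equations (with χ² the Pearson chi-square statistic, N the total sample size, and k the smaller of the number of rows and columns) are:

    Contingency coefficient:  C = √( χ² / (χ² + N) )
    Phi coefficient:          φ = √( χ² / N )
    Cramer's V:               V = √( χ² / (N(k − 1)) )

For a 2 x 2 table, k − 1 = 1, so Cramer's V reduces to the phi coefficient.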
Measures of Association for
Categorical Data
• Going back to our original example, let's apply what we now know…

    300 (281)    262 (281)
    262 (281)    300 (281)
χ² = 5.14, p = .02
Measures of Association for
Categorical Data
• Going back to our original example, let's apply what we now know…

     30 (24.5)    19 (24.5)
     19 (24.5)    30 (24.5)
χ² = 4.94, p = .03
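A minimal sketch in Python (assuming numpy and scipy are available) that reproduces the chi-square values above and adds the phi coefficient for each table:

import numpy as np
from scipy.stats import chi2_contingency

# The two example tables shown earlier (observed counts only)
small = np.array([[30, 19],
                  [19, 30]])    # N = 98
large = np.array([[300, 262],
                  [262, 300]])  # N = 1124

for name, table in (("small", small), ("large", large)):
    chi2, p, dof, expected = chi2_contingency(table, correction=False)
    phi = np.sqrt(chi2 / table.sum())   # phi = sqrt(chi-square / N)
    print(f"{name}: chi2 = {chi2:.2f}, p = {p:.3f}, phi = {phi:.2f}")

# small: chi2 = 4.94, p = 0.026, phi = 0.22  -> an important relationship
# large: chi2 = 5.14, p = 0.023, phi = 0.07  -> a negligible relationship

Both tests are significant, but only the smaller table shows a non-negligible effect size.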
Measures of Association for
Categorical Data
• The introduction of these correlation-based statistics (we'll learn more about correlations in Chapter 9) introduces a new way of thinking about the null hypothesis of the Pearson chi-square test of association.
• Recall that our null hypothesis is that there is no relationship between the two variables depicted by the table and that we represent this symbolically as
• H₀: ρ_ij = ρ_(i+1)j for all i and j.
• That is, the proportion of observations in one row equals the proportion of observations in another row for each column of the table.
Measures of Association for
Categorical Data
• Recall that this is a test of association and that Cramer's V is a measure of association. Also, note that the null hypothesis implies that there is no relationship between the two variables in the table (i.e., that the proportion of observations in an individual cell is dictated by the marginal frequencies for the two variables).
• Hence, we can restate the null hypothesis for the Pearson chi-square test of association as
• H₀: Cramer's V = 0.
• That is, there is no relationship between the two variables in the table, which is equivalent to saying that the proportion of observations in one row equals the proportion of observations in another row for each column of the table.
Measures of Association for
Categorical Data
• The odds ratio (OR) is a little more difficult to understand, but it also has a straightforward interpretation. Note that the odds of an event are represented as a fraction, e.g., 2/1, sometimes written 2:1 or "2 to 1."
• The odds represent the likelihood of one event relative to the likelihood of its converse.
• For example, you could describe the odds of a 1, 2, 3, or 4 on the roll of a six-sided die, rather than something other than a 1 through 4 (i.e., a 5 or a 6), as 2/1 or simply 2 (4 to 2, simplified to 2 to 1). This means that an outcome of 1 through 4 is twice as likely as an outcome of 5 or 6. Hence, the odds of a 5 or 6 rather than its converse are 2/4, or 1/2, or simply 0.5: a 5 or 6 is half as likely as a 1 through 4.
• Note that in this example, the odds are a simplification of a ratio of probabilities. The probability of a 1 through 4 is 4/6 and the probability of a 5 or 6 is 2/6. So, the odds of a 1 through 4 are

    (4/6) / (2/6) = 4/2 = 2/1 = 2
Measures of Association for
Categorical Data
• An odds ratio, on the other hand, is a ratio of odds compared between two groups.
• Let's compare the odds of a fair die resulting in a 1 through 4 outcome to the odds of a "loaded" die resulting in a 1 through 4 outcome as the ratio of the odds for each event.
• For the fair die, the odds would be 2/1 or 2 (as stated on the previous slide).
• For the loaded die, the odds might be 4/1: loading the die has doubled the odds of seeing a 1 through 4.
• Hence, the odds ratio between an outcome of 1 through 4 for a fair versus a loaded die would be (2/1)/(4/1) = 2/4, which equals 0.50. That means that the odds of a fair die showing a 1 through 4 are only 50% as large as the odds of a loaded die showing a 1 through 4.
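A tiny illustrative sketch in Python (the loaded-die probability of .80 is the hypothetical value implied above): odds are p/(1 − p), and an odds ratio is just a ratio of two such odds.

def odds(p):
    # Convert a probability into odds: p / (1 - p)
    return p / (1 - p)

p_fair = 4 / 6     # probability of rolling 1-4 on a fair die
p_loaded = 4 / 5   # hypothetical probability of rolling 1-4 on the loaded die

print(odds(p_fair))                    # ≈ 2.0  ("2 to 1")
print(odds(p_loaded))                  # ≈ 4.0
print(odds(p_fair) / odds(p_loaded))   # ≈ 0.5, the fair-versus-loaded odds ratio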
Measures of Association for
Categorical Data
• Alternatively, you can turn this odds ratio around by inverting it. Hence, 1/.50 = 2, which is the odds ratio between an outcome of 1-4 for a loaded die versus a fair die. We can confirm this by constructing the odds ratio from the actual odds of each event:
• (4/1)/(2/1) = 4/2 = 2. Hence, the odds of a 1-4 on a loaded die are 2 times larger than the odds of a 1-4 on a fair die.

Example in our text…
Measures of Association for
Categorical Data
Table 6.4. The effect of aspirin on the incidence of heart attacks

                        Outcome
                 Heart Attack    No Heart Attack    Total
    Aspirin           104             10933         11037
    Placebo           189             10845         11034
    Total             293             21778         22071

Odds of heart attack given that participants did not take aspirin:
    Odds_NoAspirin = 189/10845 = 0.0174
Odds of heart attack given that participants did take aspirin:
    Odds_Aspirin = 104/10933 = 0.0095
    OR = Odds_NoAspirin / Odds_Aspirin = 0.0174/0.0095 = 1.83
• Thus, the odds of having a heart attack given you didn't take aspirin are 1.83 times greater than the odds of having a heart attack with aspirin.
Measures of Association for
Categorical Data
• An alternative calculation is simply dividing the cross products. As on the previous slide, we want the odds of the no-treatment (placebo) group in the numerator and the odds of the treatment (aspirin) group in the denominator.
• Note that AD/BC and BC/AD will yield different ORs and therefore different interpretations.
• Example in our text:

Table 6.4. The effect of aspirin on the incidence of heart attacks (cells labeled A-D)

                        Outcome
                 Heart Attack    No Heart Attack    Total
    Aspirin           104 (A)         10933 (B)     11037
    Placebo           189 (C)         10845 (D)     11034
    Total             293             21778         22071

    OR = BC/AD = (10933 × 189) / (104 × 10845) = 2,066,337 / 1,127,880 = 1.83

which matches the odds-ratio calculation on the previous slide.
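A small Python sketch (cell counts from Table 6.4) showing that the ratio-of-odds and cross-product calculations give the same odds ratio:

a, b = 104, 10933   # aspirin row:  heart attack (A), no heart attack (B)
c, d = 189, 10845   # placebo row:  heart attack (C), no heart attack (D)

odds_aspirin = a / b                  # ≈ 0.0095
odds_placebo = c / d                  # ≈ 0.0174
print(odds_placebo / odds_aspirin)    # ≈ 1.83

print((b * c) / (a * d))              # cross products BC/AD ≈ 1.83, the same value
print((a * d) / (b * c))              # AD/BC ≈ 0.546, the aspirin-versus-placebo OR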
Measures of Association for
Categorical Data
• Another commonly seen measure of association is relative risk (RR). The relative risk is a measure of the relative size of the probabilities of two events: p1/p2.
• We know that the probability of a 1 through 4 on a fair die is 4/6 (or 2/3 = .67). From the odds for the loaded die, we can see that the probability of a 1 through 4 is 4/5 (p/(1 − p) = 4, so p = .80).
• Hence, the relative risk of a 1 through 4 on a fair versus a loaded die is (2/3)/(4/5), or .83. That is, the likelihood of a 1 through 4 on a fair die is 83% of the likelihood of a 1 through 4 on a loaded die. This is different from the odds ratio for these events, which equals .50.
Measures of Association for
Categorical Data
• Back to the aspirin example from the earlier slides:
• Risk of heart attack given that participants did not take aspirin:
    Risk_NoAspirin = 189/11034 = 0.0171
  Risk of heart attack given that participants did take aspirin:
    Risk_Aspirin = 104/11037 = 0.0094
    Risk difference = .0171 − .0094 = .0077
    RR = Risk_NoAspirin / Risk_Aspirin = 0.0171/0.0094 = 1.82
• Therefore, the risk of having a heart attack given you did not take aspirin is 1.82 times as large as if you had taken aspirin.
• Note: The odds ratio is only relevant for 2 x 2 tables.
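A companion sketch in Python for the relative risk and the risk difference, using the group sizes (row totals) rather than the within-group "no heart attack" counts:

n_aspirin, n_placebo = 11037, 11034    # row totals (group sizes)

risk_aspirin = 104 / n_aspirin         # ≈ 0.0094
risk_placebo = 189 / n_placebo         # ≈ 0.0171

print(risk_placebo - risk_aspirin)     # risk difference ≈ 0.0077
print(risk_placebo / risk_aspirin)     # relative risk   ≈ 1.82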
Measures of Association for
Categorical Data
• Some quick notes on risk and odds
– Risk is intuitive but limited
  • It is future-oriented and inapplicable in retrospective studies
– Odds are less intuitive
  • But they are applicable in both retrospective and prospective studies
  • Odds can be made more intuitive with some simple transformations
Measures of Association for
Categorical Data
• Example

                        Outcome
                 Heart Attack    No Heart Attack    Total
    Aspirin           104             10933         11037
    Placebo           189             10845         11034
    Total             293             21778         22071
• The odds of having a heart attack given you took aspirin are .54 times the odds of having a heart attack given you were in the placebo group (1/1.83 = .54).
• One simple transformation: OR/(1 + OR) = .54/1.54 = .35 is the probability that, if we compare one aspirin taker with one placebo taker and exactly one of them has a heart attack, the heart attack belongs to the aspirin taker.
• The corresponding probability for the placebo taker is 1.83/2.83 = .65.
• .65 + .35 = 1
Measures of Association for
Categorical Data
• A quick reminder…
• All of the tests that we present in this course place certain requirements, expectations, or assumptions on the data in order for the test interpretation to be valid. For the chi-square test, the assumptions are:
• Independence: We assume that observations are independent of one another. That is, the value of any one observation does not depend on and is not influenced by the value of other observations in the dataset. Don't confuse this with the test of independence, which focuses on independence between variables (not observations).
• One way to ensure independence among observations is to verify that the categories constitute mutually exclusive codes (an individual cannot be a member of multiple categories).
• Another way to ensure independence among observations is to use simple random sampling from the population. A third way is to evaluate your research design to determine whether there are opportunities for participants to interact or to form group clusters.
Measures of Association for
Categorical Data
• Normality: Recall that the chi-square distribution can be formed by summing squared observations from a standard normal curve (z-scores from a normal distribution). This suggests that the chi-square distribution relies on a normality assumption in some way.
– Look at the tables below. If you fix the margins as indicated, there are several configurations for allocating individuals to cells that allow you to maintain these marginal frequencies.
Fifteen of the possible 2 x 3 tables with row totals fixed at 15 and column totals fixed at 10, 10, 10 (each pair of rows is one table, with the column margins shown beneath):

     5  5  5 | 15      4  6  5 | 15      4  5  6 | 15      4  4  7 | 15      4  3  8 | 15
     5  5  5 | 15      6  4  5 | 15      6  5  4 | 15      6  6  3 | 15      6  7  2 | 15
    10 10 10          10 10 10          10 10 10          10 10 10          10 10 10

     3  5  7 | 15      3  7  5 | 15      3  4  8 | 15      3  3  9 | 15      3  2 10 | 15
     7  5  3 | 15      7  3  5 | 15      7  6  2 | 15      7  7  1 | 15      7  8  0 | 15
    10 10 10          10 10 10          10 10 10          10 10 10          10 10 10

     6  4  5 | 15      6  3  6 | 15      6  2  7 | 15      6  1  8 | 15      6  0  9 | 15
     4  6  5 | 15      4  7  4 | 15      4  8  3 | 15      4  9  2 | 15      4 10  1 | 15
    10 10 10          10 10 10          10 10 10          10 10 10          10 10 10
Measures of Association for
Categorical Data
– In fact, the distribution of possible values for any single cell in the table is normally distributed, given that the sample size is large enough and the probability of an observation falling in that cell is not extreme.
– Also, recall that the expected cell frequencies for the chi-square test are defined as Np (total sample size times the probability of being in that cell). Hence, the requirement of normality can be satisfied if the expected cell frequencies are of sufficient size. A rule of thumb is that all of the expected cell frequencies should be 5 or greater.
Measures of Association for
Categorical Data
• Sensitivity is the probability of a positive result on some (predictive) measure given that the outcome it predicts is actually present.
• Specificity is the opposite: the probability of a negative result on the measure given that the outcome is absent (i.e., that criteria for the outcome are not met).

                 Cancer    No Cancer    Total
    Screen Pos      9         110         119
    Screen Neg      1         880         881
    Total          10         990        1000
Measures of Association for
Categorical Data
                 Cancer    No Cancer    Total
    Screen Pos      8         110         118
    Screen Neg      2         880         882
    Total          10         990        1000
• This data is similar to mammography data
predicting the presence and absence of
breast cancer (not real data)
• We need to consider the conditional and
marginal distributions to get at the answer of
sensitivity and specificity
Measures of Association for
Categorical Data
                 Cancer    No Cancer    Total
    Screen Pos      8         110         118
    Screen Neg      2         880         882
    Total          10         990        1000

• Sensitivity = 8/10 = .80 (the probability of screening positive among those who actually have cancer)
• Specificity = 880/990 = .89 (the probability of screening negative among those who do not have cancer)
Measures of Association for
Categorical Data
• Going one step further…
• What if we wanted to use all of this information to answer the question, "What is the probability of having cancer, given you screened positive for cancer?"
  – Guesses?
• We can answer this with Bayes' theorem.
Measures of Association for
Categorical Data
P(C) = (8 + 2)/1000 = .01; P(C′) = .99
P(+) = (8 + 110)/1000 = .118; sensitivity = P(+ | C) = .8

By Bayes' theorem:
    P(C | +) = P(+ | C) P(C) / P(+) = (.8)(.01)/.118 ≈ .07

or, reading directly from the table, P(C | +) = 8/118 ≈ .07:

                 Screen Pos    Screen Neg    Total
    Cancer            8             2           10
    No Cancer       110           880          990
    Total           118           882         1000

Thoughts? Is this what you expected?
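A brief Python sketch (counts from the table above) that reproduces the sensitivity and specificity and then answers the question via Bayes' theorem:

true_pos, false_neg = 8, 2        # cancer cases:     screened positive, screened negative
false_pos, true_neg = 110, 880    # non-cancer cases: screened positive, screened negative
n = true_pos + false_neg + false_pos + true_neg    # 1000

sensitivity = true_pos / (true_pos + false_neg)    # 8/10 = .80
specificity = true_neg / (false_pos + true_neg)    # 880/990 ≈ .89
prevalence = (true_pos + false_neg) / n            # 10/1000 = .01

# P(cancer | positive screen) via Bayes' theorem
p_positive = (true_pos + false_pos) / n            # 118/1000 = .118
p_cancer_given_positive = sensitivity * prevalence / p_positive
print(round(p_cancer_given_positive, 3))           # ≈ 0.068, i.e., about 7%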
Measures of Association for
Categorical Data
– When this requirement is not met, you can use an exact statistic to perform the hypothesis test. The exact statistic is based on the empirical probability of observing a certain configuration of cell frequencies with fixed marginal frequencies. On an earlier slide, several such configurations were shown. To perform an exact test, you would rank order the tables based on the value of one of the cells, determine the probability of observing a value in that cell equal to or less than the observed value, and declare that probability as the p-value for your hypothesis test. For this class, you don't need to know how to do an exact test, but you do need to know that it is an alternative when expected cell frequencies are small (see the sketch below).
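For a 2 x 2 table, one readily available version of this idea is Fisher's exact test. A minimal sketch, assuming scipy is available and using made-up counts that produce small expected frequencies:

from scipy.stats import fisher_exact

# Hypothetical 2 x 2 table with small expected cell frequencies
table = [[2, 8],
         [7, 3]]

result = fisher_exact(table, alternative="two-sided")
print(result)    # the sample odds ratio and the exact p-value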
• Inclusion of Nonoccurrences: Another requirement of the chi-square test is that all cases in the data set be included in the contingency table. That is, the coding system must be exhaustive; it must represent all elements of the sample.
Measures of Association for
Categorical Data
• A slightly different index, a measure of agreement rather than association, is coefficient kappa (κ, aka Cohen's kappa). This index is referred to as a measure of agreement rather than a measure of association because it goes beyond merely indicating whether there is a relationship between two variables: kappa actually indicates the degree to which the categorizations of the two variables are identical.
• Coefficient kappa is commonly used to depict the level of agreement between two raters.
Measures of Association for
Categorical Data
• For example, the frequency table below shows how a measure of association can give an overly optimistic picture of the level of agreement between two raters. Cramer's V for this table equals 0.54, indicating fairly strong association. However, the raters agree in only 12 out of 36 cases.

Rater agreement table (cell counts from the SAS PROC FREQ output; the cell, row, and column percentages are omitted):

                     rater2
    rater1       0       1       2    Total
    0            1      10       1      12
    1            1       1      10      12
    2            1       1      10      12
    Total        3      12      21      36
Measures of Association for
Categorical Data
• One measure of agreement in the table below would be to sum the relative cell frequencies in the cases where the two raters agree (i.e., cells 0,0; 1,1; and 2,2). For the table on the previous slide, the percentage of agreement between the raters would be 33% (12/36), which is not that great.

                     rater2
    rater1       0       1       2    Total
    0            1      10       1      12
    1            1       1      10      12
    2            1       1      10      12
    Total        3      12      21      36
Measures of Association for
Categorical Data
• However, such an index is misleading because it ignores the fact that raters may agree by chance. Cohen's kappa corrects for this problem by depicting the proportion of agreement attained beyond that attainable by chance. As shown below, kappa gives us the proportion of agreement that was attained once the proportion attainable by chance is removed from the actual proportion of agreement.

[Diagram: a 0-to-1 scale divided into agreement attainable by chance and agreement attainable beyond chance, with the actual level of agreement marked; kappa is the portion of the actual agreement falling beyond chance.]
Measures of Association for
Categorical Data
• The computational formula for kappa demonstrates this relationship. In this formula, D indicates the diagonal elements of the frequency table (the cells in which the raters agree).
• The first element of the numerator is the sum of the observed agreements. The second element of the numerator indicates that you subtract the sum of the expected agreements (where each expected value is defined as the product of the marginal frequencies divided by N, as was the case for the chi-square test). Hence, the numerator gives you the number of observations in agreement beyond those expected by chance.
• The denominator takes the total number of observations and subtracts the number of expected agreements, giving us the number of observations beyond those expected to agree given the marginal frequencies. Hence, the numerator divided by the denominator (kappa) gives us the proportion of observations in agreement beyond those expected by chance.
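In symbols (with f_O and f_E the observed and expected frequencies, D the set of diagonal cells, and N the total number of observations), the computational formula described above is:

    κ = ( Σ_D f_O − Σ_D f_E ) / ( N − Σ_D f_E )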
Measures of Association for
Categorical Data
• Recall that in the table below, Cramer's V equals 0.54 and the observed level of agreement equals 0.33 (12 out of 36 cases). Cohen's kappa for this table equals 0.00, indicating that the observed level of agreement (0.33) is no better than that expected by chance alone.

                     rater2
    rater1       0       1       2    Total
    0            1      10       1      12
    1            1       1      10      12
    2            1       1      10      12
    Total        3      12      21      36
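A short Python sketch (counts from the table above) that reproduces Cramer's V, the observed agreement, and kappa:

import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[1, 10,  1],
                  [1,  1, 10],
                  [1,  1, 10]])
n = table.sum()                                    # 36

chi2, p, dof, expected = chi2_contingency(table)
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))

observed_agreement = np.trace(table) / n           # proportion of cases on the diagonal
chance_agreement = np.trace(expected) / n          # agreement expected from the margins alone
kappa = (observed_agreement - chance_agreement) / (1 - chance_agreement)

print(round(cramers_v, 2), round(observed_agreement, 2), round(kappa, 2))
# 0.54 0.33 0.0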
Measures of Association for
Categorical Data
• 6.29 Dabbs and Morris (1990) examined archival data from military records to study the relationship between high testosterone levels and antisocial behavior in males. Of 4016 men in the Normal Testosterone group, 10.0% had a record of adult delinquency. Of 446 men in the High Testosterone group, 22.6% had a record of adult delinquency. Is this relationship significant?
• 6.30 What's the odds ratio? How would you interpret it?
Measures of Association for
Categorical Data
• According to the description, the data for this study look like:

                         Testosterone
    Delinquency       High      Normal      Total
    No                 345        3614       3959
    Yes                101         402        503
    Total              446        4016       4462

χ² = Σ (O − E)²/E
   = (345 − 395.723)²/395.723 + (3614 − 3563.277)²/3563.277 + (101 − 50.277)²/50.277 + (402 − 452.723)²/452.723
   = 64.08

• The critical value for this test is χ²(1) = 3.84 at the 0.05 level. Since 64.08 > 3.84, we reject the null hypothesis: the relationship between testosterone level and adult delinquency is significant.
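The same result from Python (a quick check, assuming scipy is available; correction=False requests the uncorrected Pearson chi-square):

import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[345, 3614],    # no delinquency:  high, normal testosterone
                  [101,  402]])   # delinquency:     high, normal testosterone

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(round(chi2, 2), p)          # 64.08 and a p-value far below .05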
Measures of Association for
Categorical Data
• According to the description, the data for this study look like:

                         Testosterone
    Delinquency       High      Normal      Total
    No                 345        3614       3959
    Yes                101         402        503
    Total              446        4016       4462

The odds of adult delinquency for the high testosterone group are
    Odds_High = 101/345 = 0.2928
The odds of adult delinquency for the normal testosterone group are
    Odds_Normal = 402/3614 = 0.1112
and the odds ratio is OR = .2928/.1112 = 2.63.

• The odds of engaging in adult delinquency are 2.63 times as large for members of the high testosterone group as for members of the normal testosterone group.