Lecture 5
Chi-Squares II (other categorical measures of association)
Measures of Association for
Categorical Data
• One problem with the hypothesis testing framework that we'll discuss later is the fact that any observed difference has the potential to be statistically significant, provided the sample size is large enough.
• Hence, the results of a hypothesis test could be viewed as a test of whether your sample size is large enough to detect the true difference between two populations.
• A more informative way of describing observed differences relies on effect size indices (statistics that attempt to depict differences in a metric that provides substantive meaning to the observed difference).
• In the context of chi-square tests, the appropriate effect size indices are measures of association (statistics that depict the magnitude of the relationship between the two variables in the table).
Measures of Association for
Categorical Data
• For example, consider the two tables below. Both have comparable chi-square and p-values, but most people would say that the one on the left shows evidence of a stronger relationship than the one on the right (particularly given the expected values shown in parentheses). Measures of association for these tables illustrate this difference.

Left table (expected frequencies in parentheses):
     30 (24.5)    19 (24.5)
     19 (24.5)    30 (24.5)
χ² = 4.94, p = .03

Right table:
    300 (281)    262 (281)
    262 (281)    300 (281)
χ² = 5.14, p = .02
Measures of Association for
Categorical Data
• There are five relevant measures of association for nominal (categorical) data. The first three are interpreted similarly, and the last two have a different interpretation.
• Contingency coefficient
• Phi Coefficient
• Cramer’s phi coefficient
• Odds ratio
• Risk ratio
• The contingency coefficient cannot reach a maximum value of 1, so its interpretation is somewhat difficult.
Measures of Association for
Categorical Data
• The phi coefficient and Cramer’s phi coefficient have a range from 0 to 1
with 0 indicating no association and 1 indicating a perfect relationship
between the two variables in the contingency table.
• As a rule, values less than .2 indicate a negligible relationship, values from
.2 up to .5 indicate an important relationship, and values from .5 up to 1
indicate a very strong relationship.
• The phi coefficient only applies to 2 x 2 tables, and Cramer's phi (aka Cramer's V) applies to any two-way table. As the equations below show, the three indices are similar.
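For reference, the standard defining equations (with χ² the Pearson chi-square statistic, N the total sample size, and k the smaller of the number of rows and columns) are:

    Contingency coefficient:  C = √( χ² / (χ² + N) )
    Phi coefficient:          φ = √( χ² / N )
    Cramer's V:               V = √( χ² / (N(k − 1)) )

For a 2 x 2 table, k − 1 = 1, so Cramer's V reduces to the phi coefficient.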
Measures of Association for
Categorical Data
• Going back to our original example, let's apply what we now know…

    300 (281)    262 (281)
    262 (281)    300 (281)
χ² = 5.14, p = .02
Measures of Association for
Categorical Data
• Going back to our original example, let's apply what we now know…

     30 (24.5)    19 (24.5)
     19 (24.5)    30 (24.5)
χ² = 4.94, p = .03
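A minimal sketch in Python (assuming numpy and scipy are available) that reproduces the chi-square values above and adds the phi coefficient for each table:

import numpy as np
from scipy.stats import chi2_contingency

# The two example tables shown earlier (observed counts only)
small = np.array([[30, 19],
                  [19, 30]])    # N = 98
large = np.array([[300, 262],
                  [262, 300]])  # N = 1124

for name, table in (("small", small), ("large", large)):
    chi2, p, dof, expected = chi2_contingency(table, correction=False)
    phi = np.sqrt(chi2 / table.sum())   # phi = sqrt(chi-square / N)
    print(f"{name}: chi2 = {chi2:.2f}, p = {p:.3f}, phi = {phi:.2f}")

# small: chi2 = 4.94, p = 0.026, phi = 0.22  -> an important relationship
# large: chi2 = 5.14, p = 0.023, phi = 0.07  -> a negligible relationship

Both tests are significant, but only the smaller table shows a non-negligible effect size.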
Measures of Association for
Categorical Data
• The introduction of these correlation-based statistics (we'll learn more about correlations in Chapter 9) introduces a new way of thinking about the null hypothesis of the Pearson chi-square test of association.
• Recall that our null hypothesis is that there is no relationship between the two variables depicted by the table and that we represent this symbolically as
• H₀: ρ_ij = ρ_(i+1)j for all i and j.
• That is, the proportion of observations in one row equals the proportion of observations in another row for each column of the table.
Measures of Association for
Categorical Data
• Recall that this is a test of association and that Cramer's V is a measure of association. Also, note that the null hypothesis implies that there is no relationship between the two variables in the table (i.e., that the proportion of observations in an individual cell is dictated by the marginal frequencies for the two variables).
• Hence, we can restate the null hypothesis for the Pearson chi-square test of association as
• H₀: Cramer's V = 0.
• That is, there is no relationship between the two variables in the table, which is equivalent to saying that the proportion of observations in one row equals the proportion of observations in another row for each column of the table.
Measures of Association for
Categorical Data
• The odds ratio (OR) is a little more difficult to understand, but it also has a straightforward interpretation. Note that the odds of an event are represented as a fraction, e.g., 2/1, sometimes written 2:1 or "2 to 1."
• The odds represent the likelihood of one event relative to the likelihood of its converse.
• For example, you could describe the odds of a 1, 2, 3, or 4 on the roll of a six-sided die, rather than something other than a 1 through 4 (i.e., a 5 or a 6), as 2/1 or simply 2 (4 to 2, simplified to 2 to 1). This means that an outcome of 1 through 4 is twice as likely as an outcome of 5 or 6. Hence, the odds of a 5 or 6 rather than its converse are 2/4, or 1/2, or simply 0.5: a 5 or 6 is half as likely as a 1 through 4.
• Note that in this example, the odds are a simplification of a ratio of probabilities. The probability of a 1 through 4 is 4/6 and the probability of a 5 or 6 is 2/6. So, the odds of a 1 through 4 are

    (4/6) / (2/6) = 4/2 = 2/1 = 2
Measures of Association for
Categorical Data
• An odds ratio, on the other hand, is a ratio of odds compared between two groups.
• Let's compare the odds of a fair die resulting in a 1 through 4 outcome to the odds of a "loaded" die resulting in a 1 through 4 outcome as the ratio of the odds for each event.
• For the fair die, the odds would be 2/1 or 2 (as stated on the previous slide).
• For the loaded die, the odds might be 4/1: loading the die has doubled the odds of seeing a 1 through 4.
• Hence, the odds ratio between an outcome of 1 through 4 for a fair versus a loaded die would be (2/1)/(4/1) = 2/4, which equals 0.50. That means that the odds of a fair die showing a 1 through 4 are only 50% as large as the odds of a loaded die showing a 1 through 4.
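A tiny illustrative sketch in Python (the loaded-die probability of .80 is the hypothetical value implied above): odds are p/(1 − p), and an odds ratio is just a ratio of two such odds.

def odds(p):
    # Convert a probability into odds: p / (1 - p)
    return p / (1 - p)

p_fair = 4 / 6     # probability of rolling 1-4 on a fair die
p_loaded = 4 / 5   # hypothetical probability of rolling 1-4 on the loaded die

print(odds(p_fair))                    # ≈ 2.0  ("2 to 1")
print(odds(p_loaded))                  # ≈ 4.0
print(odds(p_fair) / odds(p_loaded))   # ≈ 0.5, the fair-versus-loaded odds ratio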
Measures of Association for
Categorical Data
• Alternatively, you can turn this odds ratio around by inverting it. Hence, 1/.50 = 2, which is the odds ratio between an outcome of 1-4 for a loaded die versus a fair die. We can confirm this by constructing the odds ratio from the actual odds of each event:
• (4/1)/(2/1) = 4/2 = 2. Hence, the odds of a 1-4 on a loaded die are 2 times larger than the odds of a 1-4 on a fair die.

Example in our text…
Measures of Association for
Categorical Data
Table 6.4. The effect of aspirin on the incidence of heart attacks

                        Outcome
                 Heart Attack    No Heart Attack    Total
    Aspirin           104             10933         11037
    Placebo           189             10845         11034
    Total             293             21778         22071

Odds of heart attack given that participants did not take aspirin:
    Odds_NoAspirin = 189/10845 = 0.0174
Odds of heart attack given that participants did take aspirin:
    Odds_Aspirin = 104/10933 = 0.0095
    OR = Odds_NoAspirin / Odds_Aspirin = 0.0174/0.0095 = 1.83
• Thus, the odds of having a heart attack given you didn't take aspirin are 1.83 times greater than the odds of having a heart attack with aspirin.
Measures of Association for
Categorical Data
• An alternative calculation is simply dividing the cross products. As on the previous slide, we want the odds of the no-treatment (placebo) group in the numerator and the odds of the treatment (aspirin) group in the denominator.
• Note that AD/BC and BC/AD will yield different ORs and therefore different interpretations.
• Example in our text:

Table 6.4. The effect of aspirin on the incidence of heart attacks (cells labeled A-D)

                        Outcome
                 Heart Attack    No Heart Attack    Total
    Aspirin           104 (A)         10933 (B)     11037
    Placebo           189 (C)         10845 (D)     11034
    Total             293             21778         22071

    OR = BC/AD = (10933 × 189) / (104 × 10845) = 2,066,337 / 1,127,880 = 1.83

which matches the odds-ratio calculation on the previous slide.
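A small Python sketch (cell counts from Table 6.4) showing that the ratio-of-odds and cross-product calculations give the same odds ratio:

a, b = 104, 10933   # aspirin row:  heart attack (A), no heart attack (B)
c, d = 189, 10845   # placebo row:  heart attack (C), no heart attack (D)

odds_aspirin = a / b                  # ≈ 0.0095
odds_placebo = c / d                  # ≈ 0.0174
print(odds_placebo / odds_aspirin)    # ≈ 1.83

print((b * c) / (a * d))              # cross products BC/AD ≈ 1.83, the same value
print((a * d) / (b * c))              # AD/BC ≈ 0.546, the aspirin-versus-placebo OR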
Measures of Association for
Categorical Data
• Another commonly seen measure of association is relative risk (RR). The relative risk is a measure of the relative size of the probabilities of two events: p1/p2.
• We know that the probability of a 1 through 4 on a fair die is 4/6 (or 2/3 = .67). From the odds for the loaded die, we can see that the probability of a 1 through 4 is 4/5 (p/(1 − p) = 4, so p = .80).
• Hence, the relative risk of a 1 through 4 on a fair versus a loaded die is (2/3)/(4/5), or .83. That is, the likelihood of a 1 through 4 on a fair die is 83% of the likelihood of a 1 through 4 on a loaded die. This is different from the odds ratio for these events, which equals .50.
Measures of Association for
Categorical Data
• Back to the aspirin example from the earlier slides:
• Risk of heart attack given that participants did not take aspirin:
    Risk_NoAspirin = 189/11034 = 0.0171
  Risk of heart attack given that participants did take aspirin:
    Risk_Aspirin = 104/11037 = 0.0094
    Risk difference = .0171 − .0094 = .0077
    RR = Risk_NoAspirin / Risk_Aspirin = 0.0171/0.0094 = 1.82
• Therefore, the risk of having a heart attack given you did not take aspirin is 1.82 times as large as if you had taken aspirin.
• Note: The odds ratio is only relevant for 2 x 2 tables.
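A companion sketch in Python for the relative risk and the risk difference, using the group sizes (row totals) rather than the within-group "no heart attack" counts:

n_aspirin, n_placebo = 11037, 11034    # row totals (group sizes)

risk_aspirin = 104 / n_aspirin         # ≈ 0.0094
risk_placebo = 189 / n_placebo         # ≈ 0.0171

print(risk_placebo - risk_aspirin)     # risk difference ≈ 0.0077
print(risk_placebo / risk_aspirin)     # relative risk   ≈ 1.82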
Measures of Association for
Categorical Data
• Some quick notes on risk and odds
– Risk is intuitive but limited
  • It is future-oriented and inapplicable in retrospective studies
– Odds are less intuitive
  • But they are applicable in both retrospective and prospective studies
  • Odds can be made more intuitive with some simple transformations
Measures of Association for
Categorical Data
• Example

                        Outcome
                 Heart Attack    No Heart Attack    Total
    Aspirin           104             10933         11037
    Placebo           189             10845         11034
    Total             293             21778         22071
• The odds of having a heart attack given you took aspirin are .54 times the odds of having a heart attack given you were in the placebo group (1/1.83 = .54).
• One simple transformation: OR/(1 + OR) = .54/1.54 = .35 is the probability that, if we compare one aspirin taker with one placebo taker and exactly one of them has a heart attack, the heart attack belongs to the aspirin taker.
• The corresponding probability for the placebo taker is 1.83/2.83 = .65.
• .65 + .35 = 1
Measures of Association for
Categorical Data
• A quick reminder…
• All of the tests that we present in this course place certain requirements, expectations, or assumptions on the data in order for the test interpretation to be valid. For the chi-square test, the assumptions are:
• Independence: We assume that observations are independent of one another. That is, the value of any one observation does not depend on and is not influenced by the value of other observations in the dataset. Don't confuse this with the test of independence, which focuses on independence between variables (not observations).
• One way to ensure independence among observations is to verify that the categories constitute mutually exclusive codes (an individual cannot be a member of multiple categories).
• Another way to ensure independence among observations is to use simple random sampling from the population. A third way is to evaluate your research design to determine whether there are opportunities for participants to interact or to form group clusters.
Measures of Association for
Categorical Data
• Normality: Recall that the chi-square distribution can be formed by summing squared observations from a standard normal curve (z-scores from a normal distribution). This suggests that the chi-square distribution relies on a normality assumption in some way.
– Look at the tables below. If you fix the margins as indicated, there are several configurations for allocating individuals to cells that allow you to maintain these marginal frequencies.
Fifteen of the possible 2 x 3 tables with row totals fixed at 15 and column totals fixed at 10, 10, 10 (each pair of rows is one table, with the column margins shown beneath):

     5  5  5 | 15      4  6  5 | 15      4  5  6 | 15      4  4  7 | 15      4  3  8 | 15
     5  5  5 | 15      6  4  5 | 15      6  5  4 | 15      6  6  3 | 15      6  7  2 | 15
    10 10 10          10 10 10          10 10 10          10 10 10          10 10 10

     3  5  7 | 15      3  7  5 | 15      3  4  8 | 15      3  3  9 | 15      3  2 10 | 15
     7  5  3 | 15      7  3  5 | 15      7  6  2 | 15      7  7  1 | 15      7  8  0 | 15
    10 10 10          10 10 10          10 10 10          10 10 10          10 10 10

     6  4  5 | 15      6  3  6 | 15      6  2  7 | 15      6  1  8 | 15      6  0  9 | 15
     4  6  5 | 15      4  7  4 | 15      4  8  3 | 15      4  9  2 | 15      4 10  1 | 15
    10 10 10          10 10 10          10 10 10          10 10 10          10 10 10
Measures of Association for
Categorical Data
– In fact, the distribution of possible values for any single cell in the table is normally distributed, given that the sample size is large enough and the probability of an observation falling in that cell is not extreme.
– Also, recall that the expected cell frequencies for the chi-square test are defined as Np (total sample size times the probability of being in that cell). Hence, the requirement of normality can be satisfied if the expected cell frequencies are of sufficient size. A rule of thumb is that all of the expected cell frequencies should be 5 or greater.
Measures of Association for
Categorical Data
• Sensitivity is the probability of a positive result on some (predictive) measure given that the outcome it predicts is actually present.
• Specificity is the opposite: the probability of a negative result on the measure given that the outcome is absent (i.e., that criteria for the outcome are not met).

                 Cancer    No Cancer    Total
    Screen Pos      9         110         119
    Screen Neg      1         880         881
    Total          10         990        1000
Measures of Association for
Categorical Data
                 Cancer    No Cancer    Total
    Screen Pos      8         110         118
    Screen Neg      2         880         882
    Total          10         990        1000
• This data is similar to mammography data
predicting the presence and absence of
breast cancer (not real data)
• We need to consider the conditional and
marginal distributions to get at the answer of
sensitivity and specificity
Measures of Association for
Categorical Data
                 Cancer    No Cancer    Total
    Screen Pos      8         110         118
    Screen Neg      2         880         882
    Total          10         990        1000

• Sensitivity = 8/10 = .80 (the probability of screening positive among those who actually have cancer)
• Specificity = 880/990 = .89 (the probability of screening negative among those who do not have cancer)
Measures of Association for
Categorical Data
• Going one step further…
• What if we wanted to use all of this information to answer the question, "What is the probability of having cancer, given you screened positive for cancer?"
  – Guesses?
• We can answer this with Bayes' theorem.
Measures of Association for
Categorical Data
P(C) = (8 + 2)/1000 = .01; P(C′) = .99
P(+) = (8 + 110)/1000 = .118; sensitivity = P(+ | C) = .8

By Bayes' theorem:
    P(C | +) = P(+ | C) P(C) / P(+) = (.8)(.01)/.118 ≈ .07

or, reading directly from the table, P(C | +) = 8/118 ≈ .07:

                 Screen Pos    Screen Neg    Total
    Cancer            8             2           10
    No Cancer       110           880          990
    Total           118           882         1000

Thoughts? Is this what you expected?
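A brief Python sketch (counts from the table above) that reproduces the sensitivity and specificity and then answers the question via Bayes' theorem:

true_pos, false_neg = 8, 2        # cancer cases:     screened positive, screened negative
false_pos, true_neg = 110, 880    # non-cancer cases: screened positive, screened negative
n = true_pos + false_neg + false_pos + true_neg    # 1000

sensitivity = true_pos / (true_pos + false_neg)    # 8/10 = .80
specificity = true_neg / (false_pos + true_neg)    # 880/990 ≈ .89
prevalence = (true_pos + false_neg) / n            # 10/1000 = .01

# P(cancer | positive screen) via Bayes' theorem
p_positive = (true_pos + false_pos) / n            # 118/1000 = .118
p_cancer_given_positive = sensitivity * prevalence / p_positive
print(round(p_cancer_given_positive, 3))           # ≈ 0.068, i.e., about 7%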
Measures of Association for
Categorical Data
– When this requirement is not met, you can use an exact statistic to perform the hypothesis test. The exact statistic is based on the empirical probability of observing a certain configuration of cell frequencies with fixed marginal frequencies. On an earlier slide, several such configurations were shown. To perform an exact test, you would rank order the tables based on the value of one of the cells, determine the probability of observing a value in that cell equal to or less than the observed value, and declare that probability as the p-value for your hypothesis test. For this class, you don't need to know how to do an exact test, but you do need to know that it is an alternative when expected cell frequencies are small (see the sketch below).
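For a 2 x 2 table, one readily available version of this idea is Fisher's exact test. A minimal sketch, assuming scipy is available and using made-up counts that produce small expected frequencies:

from scipy.stats import fisher_exact

# Hypothetical 2 x 2 table with small expected cell frequencies
table = [[2, 8],
         [7, 3]]

result = fisher_exact(table, alternative="two-sided")
print(result)    # the sample odds ratio and the exact p-value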
• Inclusion of Nonoccurrences: Another requirement of the chi-square test is that all cases in the data set be included in the contingency table. That is, the coding system must be exhaustive; it must represent all elements of the sample.
Measures of Association for
Categorical Data
• A slightly different index, a measure of agreement rather than association, is coefficient kappa (κ, aka Cohen's kappa). This index is referred to as a measure of agreement rather than a measure of association because it goes beyond merely indicating whether there is a relationship between two variables: kappa actually indicates the degree to which the categorizations of the two variables are identical.
• Coefficient kappa is commonly used to depict the level of agreement between two raters.
Measures of Association for
Categorical Data
• For example, the frequency table below shows how a measure of association can give an overly optimistic picture of the level of agreement between two raters. Cramer's V for this table equals 0.54, indicating fairly strong association. However, the raters agree in only 12 out of 36 cases.

Rater agreement table (cell counts from the SAS PROC FREQ output; the cell, row, and column percentages are omitted):

                     rater2
    rater1       0       1       2    Total
    0            1      10       1      12
    1            1       1      10      12
    2            1       1      10      12
    Total        3      12      21      36
Measures of Association for
Categorical Data
• One measure of agreement in the table below would be to sum the relative cell frequencies in the cases where the two raters agree (i.e., cells 0,0; 1,1; and 2,2). For the table on the previous slide, the percentage of agreement between the raters would be 33% (12/36), which is not that great.

                     rater2
    rater1       0       1       2    Total
    0            1      10       1      12
    1            1       1      10      12
    2            1       1      10      12
    Total        3      12      21      36
Measures of Association for
Categorical Data
• However, such an index is misleading because it ignores the fact that raters may agree by chance. Cohen's kappa corrects for this problem by depicting the proportion of agreement attained beyond that attainable by chance. As shown below, kappa gives us the proportion of agreement that was attained once the proportion attainable by chance is removed from the actual proportion of agreement.

[Diagram: a 0-to-1 scale divided into agreement attainable by chance and agreement attainable beyond chance, with the actual level of agreement marked; kappa is the portion of the actual agreement falling beyond chance.]
Measures of Association for
Categorical Data
• The computational formula for kappa demonstrates this relationship. In this formula, D indicates the diagonal elements of the frequency table (the cells in which the raters agree).
• The first element of the numerator is the sum of the observed agreements. The second element of the numerator indicates that you subtract the sum of the expected agreements (where each expected value is defined as the product of the marginal frequencies divided by N, as was the case for the chi-square test). Hence, the numerator gives you the number of observations in agreement beyond those expected by chance.
• The denominator takes the total number of observations and subtracts the number of expected agreements, giving us the number of observations beyond those expected to agree given the marginal frequencies. Hence, the numerator divided by the denominator (kappa) gives us the proportion of observations in agreement beyond those expected by chance.
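In symbols (with f_O and f_E the observed and expected frequencies, D the set of diagonal cells, and N the total number of observations), the computational formula described above is:

    κ = ( Σ_D f_O − Σ_D f_E ) / ( N − Σ_D f_E )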
Measures of Association for
Categorical Data
• Recall that in the table below, Cramer's V equals 0.54 and the observed level of agreement equals 0.33 (12 out of 36 cases). Cohen's kappa for this table equals 0.00, indicating that the observed level of agreement (0.33) is no better than that expected by chance alone.

                     rater2
    rater1       0       1       2    Total
    0            1      10       1      12
    1            1       1      10      12
    2            1       1      10      12
    Total        3      12      21      36
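A short Python sketch (counts from the table above) that reproduces Cramer's V, the observed agreement, and kappa:

import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[1, 10,  1],
                  [1,  1, 10],
                  [1,  1, 10]])
n = table.sum()                                    # 36

chi2, p, dof, expected = chi2_contingency(table)
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))

observed_agreement = np.trace(table) / n           # proportion of cases on the diagonal
chance_agreement = np.trace(expected) / n          # agreement expected from the margins alone
kappa = (observed_agreement - chance_agreement) / (1 - chance_agreement)

print(round(cramers_v, 2), round(observed_agreement, 2), round(kappa, 2))
# 0.54 0.33 0.0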
Measures of Association for
Categorical Data
• 6.29 Dabbs and Morris (1990) examined archival data from military records to study the relationship between high testosterone levels and antisocial behavior in males. Of 4016 men in the Normal Testosterone group, 10.0% had a record of adult delinquency. Of 446 men in the High Testosterone group, 22.6% had a record of adult delinquency. Is this relationship significant?
• 6.30 What's the odds ratio? How would you interpret it?
Measures of Association for
Categorical Data
• According to the description, the data for this study look like:

                         Testosterone
    Delinquency       High      Normal      Total
    No                 345        3614       3959
    Yes                101         402        503
    Total              446        4016       4462

χ² = Σ (O − E)²/E
   = (345 − 395.723)²/395.723 + (3614 − 3563.277)²/3563.277 + (101 − 50.277)²/50.277 + (402 − 452.723)²/452.723
   = 64.08

• The critical value for this test is χ²(1) = 3.84 at the 0.05 level. Since 64.08 > 3.84, we reject the null hypothesis: the relationship between testosterone level and adult delinquency is significant.
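The same result from Python (a quick check, assuming scipy is available; correction=False requests the uncorrected Pearson chi-square):

import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[345, 3614],    # no delinquency:  high, normal testosterone
                  [101,  402]])   # delinquency:     high, normal testosterone

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(round(chi2, 2), p)          # 64.08 and a p-value far below .05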
Measures of Association for
Categorical Data
• According to the description, the data for this study look like:

                         Testosterone
    Delinquency       High      Normal      Total
    No                 345        3614       3959
    Yes                101         402        503
    Total              446        4016       4462

The odds of adult delinquency for the high testosterone group are
    Odds_High = 101/345 = 0.2928
The odds of adult delinquency for the normal testosterone group are
    Odds_Normal = 402/3614 = 0.1112
and the odds ratio is OR = .2928/.1112 = 2.63.

• The odds of engaging in adult delinquency are 2.63 times as large for members of the high testosterone group as for members of the normal testosterone group.