Chapter 12

Two-by-two table for the survey questions “What is your Sex?” and “Do you believe female and male college sports programs should receive equal financial support?”

Tabulated statistics: Sex, Equal Support
Rows: Sex   Columns: EqualSupport

            No              Yes             All
Female      16  (14.41)     95  (85.59)     111  (100.00)
Male         8  (40.00)     12  (60.00)      20  (100.00)
All         24  (18.32)    107  (81.68)     131  (100.00)

Cell Contents: Count (% of Row)

Commonly the explanatory variable is placed on the rows and the response variable on the columns. So for these data, Sex would be explaining the attitude toward monetary support.

Conditional percents for rows are computed by dividing a cell count by its row total; for columns, the conditional percent is the cell count divided by its column total. For example, of Females the conditional percent who said “Yes” to equal support is 95/111 = 85.59%; for Males the conditional percent who said “Yes” is 12/20 = 60%. As to columns, of those who said “Yes” the conditional percent who were Female is 95/107 = 88.79%.

Probability, Risk, and Odds

If we randomly selected a student, what is the probability that the student said “Yes”? 107/131 = 0.817
What is the risk that a randomly selected student said “Yes”? Again, 107/131 = 0.817
What is the proportion of those who said “Yes”? Again, 107/131 = 0.817
What is the percent of those who said “Yes”? Similarly, 107/131 = 0.817, times 100% = 81.7%

As we can see, these four say the same thing using different phrasing.

What are the odds a person says “Yes”? This is now 107/24, which is approximately 4.5 to 1.

Questions to class:
What is the probability that a Male says “Yes”?
12/20 = 0.60
What is the risk that a Male says “Yes”?
12/20 = 0.60
How about the proportion and percentage of Males who say Yes?
The proportion is also 12/20 = 0.60, while the percentage is 60%.
What are the odds that a Male says Yes?
12/8, or 3 to 2 (note that even odds of 1 to 1 would imply that for every male saying ‘Yes’ there is a male saying ‘No’).

How are the odds interpreted? We would say that the odds of a Male saying Yes are 3 to 2; in other words, for every five males roughly three say Yes and two say No.

Relative risk is the comparison of two risks. For instance, the relative risk of Females saying Yes to Males saying Yes is the ratio of the risk of Females saying Yes to the risk of Males saying Yes. The Female-Yes risk is 95/111 and the Male-Yes risk is 12/20. This makes the relative risk (95/111)/(12/20), which computes to 0.8559/0.60, or about 1.4. The interpretation is that the risk of agreeing that female and male sports programs deserve equal financial support is about 1.4 times as large for Females as for Males.

Baseline risk is the risk to which another risk is compared in a relative risk. That is, it is the risk in the denominator of the relative risk. In the previous example, the Female risk of Yes was compared to the Male risk of Yes, putting the Male risk in the denominator. This makes the baseline risk the risk of Males saying Yes.

Odds ratio is similar to relative risk except that it is the ratio of two odds. The odds ratio of Females saying Yes to Males saying Yes is (95/16)/(12/8) = 4.0, meaning the odds of Females saying Yes are about 4 times the odds of Males saying Yes.

Important To Interpret

When reading a report that provides a relative risk, it is extremely important to know or be given the baseline risk. For instance, say a study reported that women who binge drink are 3 times more likely to develop liver disease than women who do not drink. This may alarm some women, understandably. But what if the risk of getting liver disease for women who do not binge drink (i.e. the baseline risk) was 0.001, meaning that 1 out of 1000 women who do not binge drink is likely to develop liver disease?
This would mean that the risk of developing liver disease for women who do binge drink is 3/1000, or 0.003. Not that alarming!

To calculate:
Odds = (number of interest with trait)/(number of interest without trait)
Risk = (number of interest with trait)/(total number of interest)

Chapter 13

Ho: The two variables are not related in the population (i.e. they are independent)
Ha: The two variables are related in the population (i.e. they are dependent)

Keeping with the Sex and Equal Support example from Chapter 12, how would tables look as they went from independent to dependent?

Tabulated statistics: Sex, Equal Support
Rows: Sex   Columns: EqualSupport

            No              Yes             All
Female      16  (14.41)     95  (85.59)     111  (100.00)
Male         8  (40.00)     12  (60.00)      20  (100.00)
All         24  (18.32)    107  (81.68)     131  (100.00)

Cell Contents: Count (% of Row)

Pearson Chi-Square = 7.413, DF = 1, P-Value = 0.006

What indicates a relationship is a difference in the conditional percents from one level of the explanatory variable to another. In the 2x2 table above, the percentage of Yes for Females is very different from that for Males. But what determines HOW different? We apply what is called a Chi-square Test of Independence. This is done by first using the observed values to compute what you would expect to see if the two variables were independent. The expected count in each cell of the table is found by multiplying that cell's row total by its column total and dividing by the overall total.

Finding Expected Table (i.e.
based on the data collected, these are the counts we would expect to find if the variables were independent)

          Female                    Male                     Total
NO        (111x24)/131 = 20.34      (20x24)/131 = 3.66        24
Yes       (111x107)/131 = 90.66     (20x107)/131 = 16.34     107
Total     111                       20                       131

Observed Counts from Data Collection

          Female    Male    Total
NO           16        8       24
Yes          95       12      107
Total       111       20      131

Expected Counts if Two Variables Independent (based on observed data)

          Female    Male    Total
NO        20.34     3.66       24
Yes       90.66    16.34      107
Total       111       20      131

If the variables were independent, we would expect to see the same conditional percentages for Females and Males: 18.3% No and 81.7% Yes. That is, based on these sample data, if a student’s sex and attitude toward equal support were independent, we would have expected about 18.3% of both Females and Males to say No and about 81.7% to say Yes.

The next step is to statistically compare what we observed to what we expected. To do this we calculate a chi-square test statistic and its associated p-value (for this class the p-value will be provided). The general formula is:

\[ \chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} \]

Applying that to these data (the total of 7.41 uses the unrounded expected counts):

\[ \chi^2 = \frac{(16 - 20.34)^2}{20.34} + \frac{(95 - 90.66)^2}{90.66} + \frac{(8 - 3.66)^2}{3.66} + \frac{(12 - 16.34)^2}{16.34} \approx 7.41 \]

The p-value for this comes to 0.006.

Decision and conclusion: As with the test for a correlation, the p-value is the probability of getting sample results at least as extreme as those observed if the null hypothesis is true. A small p-value indicates that the variables are related. Again we will use 0.05 as the level of significance to compare with our p-value. Here the p-value of 0.006 is less than 0.05, so we reject the null hypothesis and conclude that sex and attitude about equal level of support are related, with Females more likely than Males to believe in equal monetary support for the sexes.

NOTE: Keep in mind that the results can only be extended beyond the sample group if the data were randomly selected, and we cannot conclude a causal relationship unless random assignment was involved.
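The expected-count and chi-square steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name chi_square_2x2 and the dictionary layout are made up for this example, and the p-value line relies on the fact that, with 1 degree of freedom, the upper-tail chi-square probability equals erfc(sqrt(statistic/2)), so it applies to 2x2 tables only.

```python
import math

# Observed counts from the Sex by Equal Support table (rows: response, columns: sex).
observed = {
    "No":  {"Female": 16, "Male": 8},
    "Yes": {"Female": 95, "Male": 12},
}

def chi_square_2x2(table):
    """Return (expected counts, chi-square statistic, p-value) for a 2x2 table.

    Hypothetical helper for illustration; valid only when DF = 1.
    """
    rows = list(table)
    cols = list(table[rows[0]])
    row_totals = {r: sum(table[r].values()) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    n = sum(row_totals.values())

    # Expected count = (row total x column total) / overall total.
    expected = {
        r: {c: row_totals[r] * col_totals[c] / n for c in cols} for r in rows
    }

    # Sum of (Observed - Expected)^2 / Expected over the four cells.
    stat = sum(
        (table[r][c] - expected[r][c]) ** 2 / expected[r][c]
        for r in rows
        for c in cols
    )

    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return expected, stat, p_value

expected, stat, p_value = chi_square_2x2(observed)
print(round(stat, 3), round(p_value, 3))
```

Run on the observed counts, this should reproduce the Pearson Chi-Square of about 7.413 and the p-value of about 0.006 reported above. For larger tables a statistics package would normally be used instead of this shortcut.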
Possible Effect of Confounding Variables in Categorical Data Relationships

Sometimes when a confounding variable is present (i.e. a variable that affects the relationship but was not considered in the study) the statistical results may not reflect the true relationship. This can lead to what is called Simpson’s Paradox. Simpson’s Paradox occurs when combined data lead to one result, but separating the data by a lurking variable gives the opposite result.

Example: Following a 1972 Supreme Court ruling to eliminate racial disparities in capital cases, several studies were conducted to follow up on sentences of those found guilty of capital offenses. One such study considered homicides in Florida between 1976 and 1977 to examine whether a relationship existed between race and assignment of the death penalty (see Michael Radelet, American Sociological Review, 1981, vol. 46).

Overall table:

Defendant/Death Penalty    White    Black    Total
NO                           141      149      290
Yes                           19       17       36
Total                        160      166      326
% Yes                      11.9%    10.2%

This table shows that White defendants who were found guilty were slightly more likely to get the death penalty than Black defendants who were found guilty. However, a lurking variable, the victim’s race, provides a different look:

Victim: White

Defendant/Death Penalty    White    Black    Total
NO                           132       52      184
Yes                           19       11       30
Total                        151       63      214
% Yes                      12.6%    17.5%

Victim: Black

Defendant/Death Penalty    White    Black    Total
NO                             9       97      106
Yes                            0        6        6
Total                          9      103      112
% Yes                         0%     5.8%

As we can see, in both instances, when the victim’s race is considered, the percentage of White defendants who received the death penalty is lower than the percentage of Black defendants who did.
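The reversal can be checked directly from the counts in the tables above. The following is a minimal sketch in Python; the names counts, pct_yes, stratified_rates, and combined_rates are invented for this illustration and are not part of the original study.

```python
# (death penalty Yes, death penalty No) by victim's race and defendant's race,
# copied from the two stratified tables above.
counts = {
    "White victim": {"White": (19, 132), "Black": (11, 52)},
    "Black victim": {"White": (0, 9), "Black": (6, 97)},
}

def pct_yes(yes, no):
    """Percent of defendants who received the death penalty."""
    return 100 * yes / (yes + no)

def stratified_rates(data):
    """Percent Yes for each defendant race within each victim-race group."""
    return {
        victim: {race: pct_yes(*cell) for race, cell in cells.items()}
        for victim, cells in data.items()
    }

def combined_rates(data):
    """Percent Yes for each defendant race after adding the groups together."""
    totals = {}
    for cells in data.values():
        for race, (yes, no) in cells.items():
            prev_yes, prev_no = totals.get(race, (0, 0))
            totals[race] = (prev_yes + yes, prev_no + no)
    return {race: pct_yes(*cell) for race, cell in totals.items()}

by_victim = stratified_rates(counts)
overall = combined_rates(counts)

for victim, rates in by_victim.items():
    print(victim, {race: round(p, 1) for race, p in rates.items()})
print("Combined", {race: round(p, 1) for race, p in overall.items()})
```

Within each victim group the Black-defendant percentage is the larger one, while in the combined data the White-defendant percentage is larger, which is the pattern Simpson’s Paradox describes.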