Methods for Two Categorical Variables – R x C Tables

In previous notes we could use either Fisher’s Exact Test or the Chi-square Test to analyze contingency tables with two rows and two columns. However, to analyze contingency tables with r rows and c columns (where r > 2 or c > 2) we will use only the Chi-square Test. Usually this is referred to as a Chi-square Test of Independence. This test is used to compare two or more population proportions.

Example: Prevention of deep venous thrombosis (DVT) is a critical issue in patients undergoing total hip replacement surgery. Orthopedic surgeons recognize the importance of prophylaxis in the management of their patients, but do not agree on an optimal method. In this study, three different prophylactic measures are to be compared for the prevention of a proximal DVT after total hip replacement surgery. Three independent groups of patients undergoing total hip replacements were given different prophylactics. After surgery it was noted whether or not patients had complications from a proximal DVT. The results are given in the contingency table below.

                    Group 1   Group 2   Group 3   Total
DVT Complication       76        71        69      216
No Complication         9         4        11       24
Total                  85        75        80      240

Step 0: Define the research question

Are prophylactic measure and DVT status independent of one another? That is, is there a relationship between prophylactic measure and DVT status?

Step 1: Determine the null and alternative hypotheses

H0: Prophylactic measure and DVT status are independent of one another (i.e., there is no relationship)
Ha: Prophylactic measure and DVT status are not independent of one another (i.e., there is a relationship)

Step 2: Calculate the test statistic and p-value

We will again use the Chi-square test statistic. Therefore, we must again find expected counts in order to make sure the Chi-square test is valid. We’ll first have to find the overall percentage for DVT complications and No complications.
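As a rough check on the hand calculation in this step, the overall percentages and the expected counts built from them can be sketched in Python. Python is not used in these notes (the analysis is done in JMP), so the variable names below are illustrative assumptions. The sketch uses the standard rule that an expected count is the row total times the column total, divided by the grand total.

```python
# Observed counts from the DVT table (rows: outcome, columns: Group 1-3).
observed = {
    "DVT Complication": [76, 71, 69],
    "No Complication": [9, 4, 11],
}

row_totals = {label: sum(counts) for label, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# Overall proportion of patients in each outcome category.
overall = {label: total / grand_total for label, total in row_totals.items()}

# Expected count = (row total * column total) / grand total,
# i.e. the overall proportion applied to each group's size.
expected = {
    label: [row_totals[label] * c / grand_total for c in col_totals]
    for label in observed
}

for label in observed:
    print(f"{label}: overall {overall[label]:.1%}, expected {expected[label]}")
# DVT Complication: overall 90.0%, expected [76.5, 67.5, 72.0]
# No Complication: overall 10.0%, expected [8.5, 7.5, 8.0]
```

Each expected count is what we would see in a group if the overall percentage applied equally to all three groups, which is exactly what the null hypothesis of independence says.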
DVT Complications: ______     No Complications: ______

Once we have the overall percentages, we can find the expected counts for each cell in the contingency table.

                    Group 1   Group 2   Group 3   Total
DVT Complication     ____      ____      ____      216
No Complication      ____      ____      ____       24
Total                  85        75        80      240

We can use JMP to find the test statistic and p-value. We’ll have to enter the data into JMP as follows:

Then we’ll again choose Analyze > Fit Y by X, and put Prophylactic Measure in for X, Factor and DVT Status in for Y, Response. You should then get the following output. You’ll again want to use the Pearson test statistic and p-value.

Step 3: Report the conclusion in context of the research question

We can also learn about the relationship between the two variables by looking at the mosaic plot.

Example: A standardized procedure for determining a person’s susceptibility to hypnosis is the Stanford Hypnotic Scale, Form C (SHSS:C). Recently, a new method called the Computer-Assisted Hypnosis Scale (CAHS), which uses a computer as a facilitator of hypnosis, has been developed. Each scale classifies a person’s hypnotic susceptibility as low, medium, or high. Researchers at the University of Tennessee compared the two scales by administering both tests to each of 130 undergraduate volunteers (Psychological Assessment, March 1995). The hypnotic classifications are summarized in the contingency table given below.

                              SHSS:C Level
                       Low    Medium    High    Total
CAHS Level   Low        32      11        6       49
             Medium     14      14       16       44
             High        2       6       29       37
             Total      48      31       51      130

Questions:

1. Looking at the mosaic plot given below, does it appear there is a relationship between SHSS:C level and CAHS level? Explain.

2. Carry out the hypothesis test to determine whether there is a relationship between SHSS:C level and CAHS level.

Step 0: Define the research question

Are SHSS:C level and CAHS level independent of one another? That is, is there a relationship between SHSS:C level and CAHS level?
Step 1: Determine the null and alternative hypotheses

Step 2: Calculate the test statistic and p-value

Step 3: Report the conclusion in context of the research question

Example: Boles and Johnson (Journal of Addictive Diseases 2001) examined the beliefs held by adolescents regarding smoking and weight. Respondents classified their weight into one of three categories: underweight, overweight, or appropriate. Smoking status was determined by the question “Do you currently smoke, meaning one or more cigarettes per day?” The data are given in the table below.

               Smoke   Do Not Smoke   Total
Underweight      17          97         114
Overweight       25         142         167
Appropriate      96         816         912
Total           138       1,055       1,193

Questions:

3. Create a mosaic plot of the data. Looking at the mosaic plot, does it appear there is a relationship between an adolescent’s perception of weight and smoking status? Explain.

4. Carry out the hypothesis test to determine whether there is a relationship between perception of weight and smoking status.

Step 0: Define the research question

Are perception of weight and smoking status independent of each other? That is, is there a relationship between perception of weight and smoking status?

Step 1: Determine the null and alternative hypotheses

Step 2: Calculate the test statistic and p-value

Step 3: Report the conclusion in context of the research question

Example: Gardemann et al. (1998) surveyed genotypes at an insertion/deletion polymorphism of the apolipoprotein B signal peptide in 2,259 men. The data are given in the following table.

                             Ins/Ins   Ins/Del   Del/Del   Total
No Coronary Artery Disease      268       199        42      509
Coronary Artery Disease         807       759       184    1,750
Total                         1,075       958       226    2,259

Questions:

5. Create a mosaic plot of the data. Looking at the mosaic plot, does it appear there is a relationship between apolipoprotein B signal peptide and coronary artery disease? Explain.

6. Carry out the hypothesis test to determine whether there is a relationship between apolipoprotein B signal peptide and coronary artery disease.

Step 0: Define the research question

Are apolipoprotein B signal peptide and coronary artery disease independent of each other? That is, is there a relationship between apolipoprotein B signal peptide and coronary artery disease?

Step 1: Determine the null and alternative hypotheses

Step 2: Calculate the test statistic and p-value

Step 3: Report the conclusion in context of the research question

Example: A study was done to investigate the relationship between maternal drinking and congenital malformation. After the first three months of pregnancy, the women in the sample completed a questionnaire about alcohol consumption. Following childbirth, observations were recorded on the presence of congenital sex organ malformations. The data are given in the contingency table below.

                             Malformation
Alcohol Consumption     Absent    Present     Total
0                       17,066       48      17,114
<1                      14,464       38      14,502
1–2                        788        5         793
3–5                        126        1         127
≥6                          37        1          38
Total                   32,481       93      32,574

Questions:

7. Create a mosaic plot of the data. Looking at the mosaic plot, does it appear there is a relationship between alcohol consumption and malformation status? Explain.

8. Carry out the hypothesis test to determine whether there is a relationship between alcohol consumption and malformation status.

Step 0: Define the research question

Are alcohol consumption and malformation status independent of each other? That is, is there a relationship between alcohol consumption and malformation status?

Step 1: Determine the null and alternative hypotheses

Step 2: Calculate the test statistic and p-value

Step 3: Report the conclusion in context of the research question
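To check a JMP result outside JMP, the Step 2 calculation (expected counts, Pearson test statistic, degrees of freedom, p-value) can be sketched in Python. This is only a sketch under stated assumptions. Python is not used in these notes, and the function name chi_square_test is hypothetical. The p-value uses a closed-form expression that holds only when the degrees of freedom, (r − 1)(c − 1), are even, which is the case for every table in these notes (2 or 4). The cutoff of 5 for flagging small expected counts is a common rule of thumb, not a value stated in these notes. The alcohol consumption table is used as the input.

```python
import math


def chi_square_test(table):
    """Pearson chi-square test of independence (hypothetical helper).

    `table` is a list of rows of observed counts. Returns the test
    statistic, degrees of freedom, p-value, and expected counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected count = (row total * column total) / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum of (observed - expected)^2 / expected over every cell.
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    if df < 2 or df % 2 != 0:
        # Assumption of this sketch: only even df are supported.
        raise ValueError("closed-form p-value here needs even df >= 2")

    # Upper-tail chi-square probability for even df:
    # exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    half = statistic / 2
    p_value = math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )
    return statistic, df, p_value, expected


# Alcohol consumption table: rows are 0, <1, 1-2, 3-5, >=6;
# columns are malformation Absent, Present.
alcohol = [
    [17066, 48],
    [14464, 38],
    [788, 5],
    [126, 1],
    [37, 1],
]

statistic, df, p_value, expected = chi_square_test(alcohol)
print(f"Pearson chi-square = {statistic:.2f}, df = {df}, p-value = {p_value:.4f}")

# Flag cells whose expected count is below 5 (rule-of-thumb cutoff,
# an assumption not stated in the notes).
small_cells = [
    (i, j, round(exp, 2))
    for i, row in enumerate(expected)
    for j, exp in enumerate(row)
    if exp < 5
]
print("Cells with expected count below 5:", small_cells)
```

Passing any of the other tables above as a list of rows should give a statistic comparable to the Pearson row of the JMP output. For the alcohol table, several expected counts in the Present column fall below the cutoff, so the Chi-square approximation should be treated with some caution there.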