STA 6505, Fall 2008, Homework #3 Solutions

Chapter 3: 3.4acd, 3.9b, 3.11a, 3.13ab (no need to discuss how the small-sample C.I. was calculated; it is somewhat complicated)

3.4. We have the following contingency table, with explanatory variable X = race, having two levels, Black and White, and response variable Y = party affiliation, having three levels, Democrat, Independent, and Republican.

                 Party Affiliation
  Race     Democrat   Independent   Republican
  Black       103          15            11
  White       341         105           405

a) Using $X^2$ and $G^2$, test the hypothesis of independence between Party Affiliation and Race. Report the p-values and interpret.

Step 1: H0: $\pi_{ij} = \pi_{i+}\pi_{+j}$, for all i = 1, 2 and j = 1, 2, 3. HA: $\pi_{ij} \neq \pi_{i+}\pi_{+j}$, for some i and j.

Step 2: We have n = 980, I = 2, J = 3, and we choose $\alpha$ = 0.05.

Step 3: The test statistic is either $X^2 = \sum_{i=1}^{2}\sum_{j=1}^{3} (n_{ij} - \hat{\mu}_{ij})^2 / \hat{\mu}_{ij}$ or $G^2 = -2\ln\Lambda = 2\sum_{i=1}^{2}\sum_{j=1}^{3} n_{ij}\ln(n_{ij}/\hat{\mu}_{ij})$, where $\hat{\mu}_{ij} = n\hat{\pi}_{ij} = n\hat{\pi}_{i+}\hat{\pi}_{+j} = n_{i+}n_{+j}/n$ for all i, j, and under the null hypothesis, either statistic has an approximate chi-square distribution with d.f. = 2.

Step 4: We will reject the null hypothesis if either $X^2 > \chi^2_{2,\,0.05} = 5.99$ or $G^2 > \chi^2_{2,\,0.05} = 5.99$.

Step 5: From the table above or the SAS output below, we have $X^2$ = 79.4310 with a p-value < 0.0001, and $G^2$ = 90.3311 with a p-value < 0.0001.

  Comparison Between Race and Party
  The FREQ Procedure
  Table of race by party

  race      Democrat   Independent   Republican   Total
  Black        103          15            11        129
  White        341         105           405        851
  Total        444         120           416        980

  Statistics for Table of race by party

  Statistic                      DF     Value     Prob
  Chi-Square                      2    79.4310   <.0001
  Likelihood Ratio Chi-Square     2    90.3311   <.0001
  Mantel-Haenszel Chi-Square      1    79.3336   <.0001
  Phi Coefficient                       0.2847
  Contingency Coefficient               0.2738
  Cramer's V                            0.2847

  Sample Size = 980

Step 6: We reject the null hypothesis at the 0.05 level of significance.
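As a cross-check on Step 5 of part (a), both statistics can be recomputed directly from the observed counts. The sketch below is in Python rather than SAS, and the function name `chi_square_stats` is only illustrative. It assumes the 2 x 3 table of part (a) and uses the fact that, for d.f. = 2, the chi-square upper-tail probability is exp(-x/2).

```python
import math


def chi_square_stats(table):
    """Return (X2, G2, df) for a two-way table of counts under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    g2 = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            mu_ij = row_totals[i] * col_totals[j] / n  # estimated expected count
            x2 += (n_ij - mu_ij) ** 2 / mu_ij
            if n_ij > 0:  # an empty cell contributes 0 to G2
                g2 += 2.0 * n_ij * math.log(n_ij / mu_ij)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, g2, df


# Rows: Black, White. Columns: Democrat, Independent, Republican.
race_by_party = [[103, 15, 11], [341, 105, 405]]
x2, g2, df = chi_square_stats(race_by_party)

# Closed-form upper tail of a chi-square variable with 2 degrees of freedom.
p_x2 = math.exp(-x2 / 2)
p_g2 = math.exp(-g2 / 2)

print(f"X2 = {x2:.4f}, G2 = {g2:.4f}, df = {df}")
print(f"p-values: {p_x2:.3g} (X2), {p_g2:.3g} (G2)")
```

Both p-values are far below 0.0001, in line with the SAS output.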
Conclusion for part (a): we have sufficient evidence to conclude that Race and Party Affiliation are not independent.

c) Partition chi-squared into components regarding the choice between Democrat and Independent and between those two combined and Republican. Interpret.

First subtable: X = Race, Y = Party Affiliation (Democrat v. Independent).

Step 1: H0: $\pi_{ij} = \pi_{i+}\pi_{+j}$, for all i = 1, 2 and j = 1, 2. HA: $\pi_{ij} \neq \pi_{i+}\pi_{+j}$, for some i and j.

Step 2: We have n = 564, I = 2, J = 2, and we choose $\alpha$ = 0.05.

Step 3: The test statistic is either $X^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} (n_{ij} - \hat{\mu}_{ij})^2 / \hat{\mu}_{ij}$ or $G^2 = -2\ln\Lambda = 2\sum_{i=1}^{2}\sum_{j=1}^{2} n_{ij}\ln(n_{ij}/\hat{\mu}_{ij})$, where $\hat{\mu}_{ij} = n\hat{\pi}_{ij} = n\hat{\pi}_{i+}\hat{\pi}_{+j} = n_{i+}n_{+j}/n$ for all i, j, and under the null hypothesis, either statistic has an approximate chi-square distribution with d.f. = 1.

Step 4: We will reject the null hypothesis if either $X^2 > \chi^2_{1,\,0.05} = 3.84$ or $G^2 > \chi^2_{1,\,0.05} = 3.84$.

Step 5: From the table above or the SAS output below, we have $X^2$ = 6.5350 with a p-value = 0.0106, and $G^2$ = 7.1561 with a p-value = 0.0075.

  Comparison Between Race and Party
  The FREQ Procedure
  Table of race by party

  race      Democrat   Independent   Total
  Black        103          15        118
  White        341         105        446
  Total        444         120        564

  Statistics for Table of race by party

  Statistic                      DF     Value     Prob
  Chi-Square                      1     6.5350   0.0106
  Likelihood Ratio Chi-Square     1     7.1561   0.0075
  Continuity Adj. Chi-Square      1     5.9044   0.0151
  Mantel-Haenszel Chi-Square      1     6.5234   0.0106
  Phi Coefficient                       0.1076
  Contingency Coefficient               0.1070
  Cramer's V                            0.1076

  Statistic                              Value     ASE
  Gamma                                 0.3578   0.1299
  Kendall's Tau-b                       0.1076   0.0361
  Stuart's Tau-c                        0.0717   0.0246
  Somers' D C|R                         0.1083   0.0367
  Somers' D R|C                         0.1070   0.0362
  Pearson Correlation                   0.1076   0.0361
  Spearman Correlation                  0.1076   0.0361
  Lambda Asymmetric C|R                 0.0000   0.0000
  Lambda Asymmetric R|C                 0.0000   0.0000
  Lambda Symmetric                      0.0000   0.0000
  Uncertainty Coefficient C|R           0.0123   0.0087
  Uncertainty Coefficient R|C           0.0124   0.0087
  Uncertainty Coefficient Symmetric     0.0123   0.0087

  Estimates of the Relative Risk (Row1/Row2)

  Type of Study                  Value    95% Confidence Limits
  Case-Control (Odds Ratio)     2.1144     1.1789     3.7921
  Cohort (Col1 Risk)            1.1417     1.0476     1.2442
  Cohort (Col2 Risk)            0.5400     0.3270     0.8916

  Sample Size = 564

Step 6: We reject the null hypothesis at the 0.05 level of significance. We have sufficient evidence to conclude that Race and Party Affiliation are not independent when Party Affiliation is restricted to Democrat v. Independent. The phi coefficient is 0.1076, showing a relatively weak positive association between Race and Party Affiliation in this subtable.

Second subtable: X = Race, Y = Party Affiliation (Democrat/Independent v. Republican).

Step 1: H0: $\pi_{ij} = \pi_{i+}\pi_{+j}$, for all i = 1, 2 and j = 1, 2. HA: $\pi_{ij} \neq \pi_{i+}\pi_{+j}$, for some i and j.

Step 2: We have n = 980, I = 2, J = 2, and we choose $\alpha$ = 0.05.

Step 3: The test statistic is either $X^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} (n_{ij} - \hat{\mu}_{ij})^2 / \hat{\mu}_{ij}$ or $G^2 = -2\ln\Lambda = 2\sum_{i=1}^{2}\sum_{j=1}^{2} n_{ij}\ln(n_{ij}/\hat{\mu}_{ij})$, where $\hat{\mu}_{ij} = n\hat{\pi}_{ij} = n\hat{\pi}_{i+}\hat{\pi}_{+j} = n_{i+}n_{+j}/n$ for all i, j, and under the null hypothesis, either statistic has an approximate chi-square distribution with d.f. = 1.

Step 4: We will reject the null hypothesis if either $X^2 > \chi^2_{1,\,0.05} = 3.84$ or $G^2 > \chi^2_{1,\,0.05} = 3.84$.

Step 5: From the table above or the SAS output below, we have $X^2$ = 69.9721 with a p-value < 0.0001, and $G^2$ = 83.1750 with a p-value < 0.0001.

  Comparison Between Race and Party Affiliation
  The FREQ Procedure
  Table of race by party

  race      Democrat/Independent   Republican   Total
  Black             118                11        129
  White             446               405        851
  Total             564               416        980

  Statistics for Table of race by party

  Statistic                      DF     Value     Prob
  Chi-Square                      1    69.9721   <.0001
  Likelihood Ratio Chi-Square     1    83.1750   <.0001
  Continuity Adj. Chi-Square      1    68.3822   <.0001
  Mantel-Haenszel Chi-Square      1    69.9007   <.0001
  Phi Coefficient                       0.2672
  Contingency Coefficient               0.2582
  Cramer's V                            0.2672

  Statistic                              Value     ASE
  Gamma                                 0.8138   0.0545
  Kendall's Tau-b                       0.2672   0.0222
  Stuart's Tau-c                        0.1786   0.0185
  Somers' D C|R                         0.3906   0.0300
  Somers' D R|C                         0.1828   0.0188
  Pearson Correlation                   0.2672   0.0222
  Spearman Correlation                  0.2672   0.0222
  Lambda Asymmetric C|R                 0.0000   0.0000
  Lambda Asymmetric R|C                 0.0000   0.0000
  Lambda Symmetric                      0.0000   0.0000
  Uncertainty Coefficient C|R           0.0623   0.0117
  Uncertainty Coefficient R|C           0.1090   0.0191
  Uncertainty Coefficient Symmetric     0.0792   0.0143

  Estimates of the Relative Risk (Row1/Row2)

  Type of Study                  Value    95% Confidence Limits
  Case-Control (Odds Ratio)     9.7411     5.1758    18.3332
  Cohort (Col1 Risk)            1.7454     1.6065     1.8963
  Cohort (Col2 Risk)            0.1792     0.1014     0.3167

  Sample Size = 980

Step 6: We reject the null hypothesis at the 0.05 level of significance. We have sufficient evidence to conclude that Race and Party Affiliation are not independent.
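The partition in part (c) can be checked numerically. The following Python sketch (helper names such as `xlogx` and `lr_g2` are hypothetical) uses the closed form $G^2 = 2[\sum_{i,j} n_{ij}\ln n_{ij} - \sum_i n_{i+}\ln n_{i+} - \sum_j n_{+j}\ln n_{+j} + n\ln n]$, which is algebraically the same as the statistic in Step 3, and compares the two 1-d.f. components with the 2-d.f. statistic for the full table.

```python
import math


def xlogx(count):
    """Return count * ln(count), with the convention 0 * ln(0) = 0."""
    return count * math.log(count) if count > 0 else 0.0


def lr_g2(table):
    """Likelihood-ratio statistic G2 for independence in a two-way table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    cells = sum(xlogx(x) for r in table for x in r)
    return 2.0 * (cells - sum(map(xlogx, rows)) - sum(map(xlogx, cols))
                  + xlogx(sum(rows)))


def pearson_x2(table):
    """Pearson statistic X2 for independence in a two-way table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum((x - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i, r in enumerate(table) for j, x in enumerate(r))


full = [[103, 15, 11], [341, 105, 405]]   # Democrat, Independent, Republican
dem_vs_ind = [[103, 15], [341, 105]]      # first subtable
di_vs_rep = [[118, 11], [446, 405]]       # second subtable

g_full, g_a, g_b = lr_g2(full), lr_g2(dem_vs_ind), lr_g2(di_vs_rep)
x_full, x_a, x_b = pearson_x2(full), pearson_x2(dem_vs_ind), pearson_x2(di_vs_rep)

print(f"G2: {g_a:.4f} + {g_b:.4f} = {g_a + g_b:.4f} (full table {g_full:.4f})")
print(f"X2: {x_a:.4f} + {x_b:.4f} = {x_a + x_b:.4f} (full table {x_full:.4f})")
```

The $G^2$ components add up to the overall $G^2$ exactly, while the $X^2$ components add up only approximately, which is why partitioning is usually stated in terms of $G^2$.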
The phi coefficient for the second subtable is 0.2672, showing a somewhat weak positive association between Race and Party Affiliation when Party Affiliation is dichotomized as Democrat/Independent v. Republican. The two likelihood-ratio components, each with d.f. = 1, sum to the overall statistic from part (a): 7.1561 + 83.1750 = 90.3311 with d.f. = 2. Most of the association therefore comes from the contrast between Republican and the other two parties combined.

d) Summarize association by constructing a 95% confidence interval for the odds ratio between Race and whether a Democrat or Republican. Interpret.

If we look only at Democrats and Republicans, we have n = 860, and the estimated odds of a Black person being a Democrat are 11.1210 times the odds of a White person being a Democrat. A 95% confidence interval for the odds ratio is (5.8747, 21.0524). Since this interval does not contain 1, we conclude that the odds ratio is significantly different from 1.

  Estimates of the Relative Risk (Row1/Row2)

  Type of Study                  Value    95% Confidence Limits
  Case-Control (Odds Ratio)    11.1210     5.8747    21.0524
  Cohort (Col1 Risk)            1.9766     1.7911     2.1813
  Cohort (Col2 Risk)            0.1777     0.1010     0.3129

  Sample Size = 860

3.9. The table below classifies a sample of psychiatric patients by their diagnosis and by whether their treatment prescribed drugs.

  Diagnosis               Drugs   No drugs
  Schizophrenia            105        8
  Affective disorder        12        2
  Neurosis                  18       19
  Personality disorder      47       52
  Special symptoms           0       13

b) Partition chi-squared into three components to describe differences and similarities among the diagnoses, by comparing i) the first two rows, ii) the third and fourth rows, and iii) the last row to the first and second rows combined and the third and fourth rows combined.

i) The comparison of X = Diagnosis v. Y = Treatment, with X having two values: 1 = Schizophrenia and 2 = Affective Disorder.

Step 1: H0: $\pi_{ij} = \pi_{i+}\pi_{+j}$, for all i = 1, 2 and j = 1, 2. HA: $\pi_{ij} \neq \pi_{i+}\pi_{+j}$, for some i and j.

Step 2: We have n = 127, I = 2, J = 2, and we choose $\alpha$ = 0.05.

Step 3: The test statistic is either $X^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} (n_{ij} - \hat{\mu}_{ij})^2 / \hat{\mu}_{ij}$ or $G^2 = -2\ln\Lambda = 2\sum_{i=1}^{2}\sum_{j=1}^{2} n_{ij}\ln(n_{ij}/\hat{\mu}_{ij})$, where $\hat{\mu}_{ij} = n\hat{\pi}_{ij} = n\hat{\pi}_{i+}\hat{\pi}_{+j} = n_{i+}n_{+j}/n$ for all i, j, and under the null hypothesis, either statistic has an approximate chi-square distribution with d.f. = 1.

Step 4: We will reject the null hypothesis if either $X^2 > \chi^2_{1,\,0.05} = 3.84$ or $G^2 > \chi^2_{1,\,0.05} = 3.84$.

Step 5: From the table above or the SAS output below, we have $X^2$ = 0.8917 with a p-value = 0.3450, and $G^2$ = 0.7530 with a p-value = 0.3855.

  Relationship Between Diagnosis And Treatment
  The FREQ Procedure
  Table of diag by drug

  diag                  Drugs   No Drugs   Total
  Schizophrenia          105        8       113
  Affective Disorder      12        2        14
  Total                  117       10       127

  Statistics for Table of diag by drug

  Statistic                      DF     Value     Prob
  Chi-Square                      1     0.8917   0.3450
  Likelihood Ratio Chi-Square     1     0.7530   0.3855
  Continuity Adj. Chi-Square      1     0.1750   0.6757
  Mantel-Haenszel Chi-Square      1     0.8847   0.3469
  Phi Coefficient                       0.0838
  Contingency Coefficient               0.0835
  Cramer's V                            0.0838

  WARNING: 25% of the cells have expected counts less than 5. Chi-Square may not be a valid test.
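The SAS warning for the Schizophrenia v. Affective Disorder subtable refers to the estimated expected counts $n_{i+}n_{+j}/n$. A short Python sketch (variable names are illustrative and not taken from the SAS program) shows where the 25% figure comes from.

```python
# Rows: Schizophrenia, Affective Disorder. Columns: Drugs, No Drugs.
table = [[105, 8], [12, 2]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Estimated expected counts under independence: n_i+ * n_+j / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Cells whose expected count falls below the conventional threshold of 5.
small = [e for row in expected for e in row if e < 5]
share_small = len(small) / (len(table) * len(table[0]))

for row in expected:
    print("  ".join(f"{e:8.3f}" for e in row))
print(f"share of cells with expected count < 5: {share_small:.0%}")
```

Only the Affective Disorder / No Drugs cell has an expected count below 5 (about 1.1), so one cell in four triggers the warning even though every observed count is positive.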
  Statistics for Table of diag by drug (Schizophrenia v. Affective Disorder)

  Statistic                              Value     ASE
  Gamma                                 0.3725   0.3648
  Kendall's Tau-b                       0.0838   0.1110
  Stuart's Tau-c                        0.0283   0.0384
  Somers' D C|R                         0.0721   0.0966
  Somers' D R|C                         0.0974   0.1296
  Pearson Correlation                   0.0838   0.1110
  Spearman Correlation                  0.0838   0.1110
  Lambda Asymmetric C|R                 0.0000   0.0000
  Lambda Asymmetric R|C                 0.0000   0.0000
  Lambda Symmetric                      0.0000   0.0000
  Uncertainty Coefficient C|R           0.0108   0.0265
  Uncertainty Coefficient R|C           0.0085   0.0211
  Uncertainty Coefficient Symmetric     0.0095   0.0234

  Estimates of the Relative Risk (Row1/Row2)

  Type of Study                  Value    95% Confidence Limits
  Case-Control (Odds Ratio)     2.1875     0.4157    11.5117
  Cohort (Col1 Risk)            1.0841     0.8701     1.3506
  Cohort (Col2 Risk)            0.4956     0.1166     2.1054

  Sample Size = 127

Step 6: We fail to reject the null hypothesis at the 0.05 level of significance. We do not have sufficient evidence to conclude that there is a relationship between Diagnosis and Treatment when Diagnosis is dichotomized as either 1 = Schizophrenia or 2 = Affective Disorder.

ii) The comparison of X = Diagnosis v. Y = Treatment, with X having two values: 1 = Neurosis and 2 = Personality Disorder.

Step 1: H0: $\pi_{ij} = \pi_{i+}\pi_{+j}$, for all i = 1, 2 and j = 1, 2. HA: $\pi_{ij} \neq \pi_{i+}\pi_{+j}$, for some i and j.

Step 2: We have n = 136, I = 2, J = 2, and we choose $\alpha$ = 0.05.

Step 3: The test statistic is either $X^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} (n_{ij} - \hat{\mu}_{ij})^2 / \hat{\mu}_{ij}$ or $G^2 = -2\ln\Lambda = 2\sum_{i=1}^{2}\sum_{j=1}^{2} n_{ij}\ln(n_{ij}/\hat{\mu}_{ij})$, where $\hat{\mu}_{ij} = n\hat{\pi}_{ij} = n\hat{\pi}_{i+}\hat{\pi}_{+j} = n_{i+}n_{+j}/n$ for all i, j, and under the null hypothesis, either statistic has an approximate chi-square distribution with d.f. = 1.

Step 4: We will reject the null hypothesis if either $X^2 > \chi^2_{1,\,0.05} = 3.84$ or $G^2 > \chi^2_{1,\,0.05} = 3.84$.

Step 5: From the table above or the SAS output below, we have $X^2$ = 0.0149 with a p-value = 0.9029, and $G^2$ = 0.0149 with a p-value = 0.9029.

  Relationship Between Diagnosis And Treatment
  The FREQ Procedure
  Table of diag by drug

  diag                    Drugs   No Drugs   Total
  Neurosis                  18       19        37
  Personality Disorder      47       52        99
  Total                     65       71       136

  Statistics for Table of diag by drug

  Statistic                      DF     Value     Prob
  Chi-Square                      1     0.0149   0.9029
  Likelihood Ratio Chi-Square     1     0.0149   0.9029
  Continuity Adj. Chi-Square      1     0.0000   1.0000
  Mantel-Haenszel Chi-Square      1     0.0148   0.9033
  Phi Coefficient                       0.0105
  Contingency Coefficient               0.0105
  Cramer's V                            0.0105

  Statistic                              Value     ASE
  Gamma                                 0.0235   0.1927
  Kendall's Tau-b                       0.0105   0.0858
  Stuart's Tau-c                        0.0093   0.0763
  Somers' D C|R                         0.0117   0.0963
  Somers' D R|C                         0.0093   0.0764
  Pearson Correlation                   0.0105   0.0858
  Spearman Correlation                  0.0105   0.0858
  Lambda Asymmetric C|R                 0.0000   0.0000
  Lambda Asymmetric R|C                 0.0000   0.0000
  Lambda Symmetric                      0.0000   0.0000
  Uncertainty Coefficient C|R           0.0001   0.0013
  Uncertainty Coefficient R|C           0.0001   0.0015
  Uncertainty Coefficient Symmetric     0.0001   0.0014

  Estimates of the Relative Risk (Row1/Row2)

  Type of Study                  Value    95% Confidence Limits
  Case-Control (Odds Ratio)     1.0482     0.4923     2.2318
  Cohort (Col1 Risk)            1.0247     0.6934     1.5143
  Cohort (Col2 Risk)            0.9777     0.6785     1.4087

  Sample Size = 136

Step 6: We fail to reject the null hypothesis at the 0.05 level of significance. We do not have sufficient evidence to conclude that there is a relationship between Diagnosis and Treatment, when Diagnosis is dichotomized as either 1 = Neurosis or 2 = Personality Disorder.

iii) The comparison of X = Diagnosis v. Y = Treatment, with X having three values: 1 = Schizophrenia/Affective Disorder, 2 = Neurosis/Personality Disorder, or 3 = Special Symptoms.

Step 1: H0: $\pi_{ij} = \pi_{i+}\pi_{+j}$, for all i = 1, 2, 3 and j = 1, 2. HA: $\pi_{ij} \neq \pi_{i+}\pi_{+j}$, for some i and j.

Step 2: We have n = 276, I = 3, J = 2, and we choose $\alpha$ = 0.05.

Step 3: The test statistic is either $X^2 = \sum_{i=1}^{3}\sum_{j=1}^{2} (n_{ij} - \hat{\mu}_{ij})^2 / \hat{\mu}_{ij}$ or $G^2 = -2\ln\Lambda = 2\sum_{i=1}^{3}\sum_{j=1}^{2} n_{ij}\ln(n_{ij}/\hat{\mu}_{ij})$, where $\hat{\mu}_{ij} = n\hat{\pi}_{ij} = n\hat{\pi}_{i+}\hat{\pi}_{+j} = n_{i+}n_{+j}/n$ for all i, j, and under the null hypothesis, either statistic has an approximate chi-square distribution with d.f. = 2.

Step 4: We will reject the null hypothesis if either $X^2 > \chi^2_{2,\,0.05} = 5.99$ or $G^2 > \chi^2_{2,\,0.05} = 5.99$.

Step 5: From the table above or the SAS output below, we have $X^2$ = 83.8839 with a p-value < 0.0001, and $G^2$ = 95.7691 with a p-value < 0.0001.

  Relationship Between Diagnosis And Treatment
  The FREQ Procedure
  Table of diagnose by drug

  diagnose                             Drugs   No Drugs   Total
  Schizophrenia/Affective Disorder      117       10       127
  Neurosis/Personality Disorder          65       71       136
  Special Symptoms                        0       13        13
  Total                                 182       94       276

  Statistics for Table of diagnose by drug

  Statistic                      DF     Value     Prob
  Chi-Square                      2    83.8839   <.0001
  Likelihood Ratio Chi-Square     2    95.7691   <.0001
  Mantel-Haenszel Chi-Square      1    83.5334   <.0001
  Phi Coefficient                       0.5513
  Contingency Coefficient               0.4828
  Cramer's V                            0.5513

  Statistic                              Value     ASE
  Gamma                                 0.8852   0.0394
  Kendall's Tau-b                       0.5327   0.0421
  Stuart's Tau-c                        0.5263   0.0474
  Somers' D C|R                         0.4844   0.0401
  Somers' D R|C                         0.5859   0.0471
  Pearson Correlation                   0.5511   0.0409
  Spearman Correlation                  0.5435   0.0438
  Lambda Asymmetric C|R                 0.2021   0.1160
  Lambda Asymmetric R|C                 0.3714   0.0764
  Lambda Symmetric                      0.3034   0.0807
  Uncertainty Coefficient C|R           0.2705   0.0431
  Uncertainty Coefficient R|C           0.2042   0.0309
  Uncertainty Coefficient Symmetric     0.2327   0.0357

  Sample Size = 276

Step 6: We reject the null hypothesis at the 0.05 level of significance. We have sufficient evidence to conclude that there is a relationship between Diagnosis and Treatment, when Diagnosis is coded as 1 = Schizophrenia/Affective Disorder, 2 = Neurosis/Personality Disorder, or 3 = Special Symptoms. If we look at the frequency table, we see that, in the sample, Special Symptoms are never treated with drugs, Schizophrenia and Affective Disorder are most often treated with drugs, and Neurosis and Personality Disorder are nearly equally likely to be treated with or without drugs. Up to rounding, the three likelihood-ratio components sum to 0.7530 + 0.0149 + 95.7691 = 96.5370 with d.f. = 1 + 1 + 2 = 4, so essentially all of the association lies between the three groups of diagnoses rather than within the first two groups.

3.11 a) A study on educational aspirations of high school students (S. Crysdale, International Journal of Comparative Sociology, 16: 19-36, 1975) measured aspirations with the scale (some high school, high school graduate, some college, college graduate). The student counts in these categories were (9, 44, 13, 10) when family income was low, (11, 52, 23, 22) when family income was middle, and (9, 41, 12, 27) when family income was high. Test independence of Educational Aspirations and Family Income using either Pearson's chi-square statistic or the likelihood ratio chi-square statistic. Explain the deficiency of this test for these data.

The comparison of X = Family Income Level v. Y = Educational Aspiration.

Step 1: H0: $\pi_{ij} = \pi_{i+}\pi_{+j}$, for all i = 1, 2, 3 and j = 1, 2, 3, 4. HA: $\pi_{ij} \neq \pi_{i+}\pi_{+j}$, for some i and j.

Step 2: We have n = 273, I = 3, J = 4, and we choose $\alpha$ = 0.05.

Step 3: The test statistic is either $X^2 = \sum_{i=1}^{3}\sum_{j=1}^{4} (n_{ij} - \hat{\mu}_{ij})^2 / \hat{\mu}_{ij}$ or $G^2 = -2\ln\Lambda = 2\sum_{i=1}^{3}\sum_{j=1}^{4} n_{ij}\ln(n_{ij}/\hat{\mu}_{ij})$, where $\hat{\mu}_{ij} = n\hat{\pi}_{ij} = n\hat{\pi}_{i+}\hat{\pi}_{+j} = n_{i+}n_{+j}/n$ for all i, j, and under the null hypothesis, either statistic has an approximate chi-square distribution with d.f. = 6.
Step 4: We will reject the null hypothesis if either $X^2 > \chi^2_{6,\,0.05} = 12.59$ or $G^2 > \chi^2_{6,\,0.05} = 12.59$.

Step 5: From the table above or the SAS output below, we have $X^2$ = 8.8709 with a p-value = 0.1810, and $G^2$ = 8.9165 with a p-value = 0.1783.

  Relationship Between Educational Aspiration And Family Income Level
  The FREQ Procedure
  Table of edu by inc

  edu          Low   Middle   High   Total
  Some HS        9      11      9      29
  HS Grad       44      52     41     137
  Some Coll     13      23     12      48
  Coll Grad     10      22     27      59
  Total         76     108     89     273

  Statistics for Table of edu by inc

  Statistic                      DF     Value     Prob
  Chi-Square                      6     8.8709   0.1810
  Likelihood Ratio Chi-Square     6     8.9165   0.1783
  Mantel-Haenszel Chi-Square      1     4.7489   0.0293
  Phi Coefficient                       0.1803
  Contingency Coefficient               0.1774
  Cramer's V                            0.1275

  Statistic                              Value     ASE
  Gamma                                 0.1625   0.0795
  Kendall's Tau-b                       0.1076   0.0530
  Stuart's Tau-c                        0.1064   0.0525
  Somers' D C|R                         0.1076   0.0530
  Somers' D R|C                         0.1075   0.0530
  Pearson Correlation                   0.1321   0.0594
  Spearman Correlation                  0.1212   0.0600
  Lambda Asymmetric C|R                 0.0303   0.0418
  Lambda Asymmetric R|C                 0.0000   0.0000
  Lambda Symmetric                      0.0166   0.0230
  Uncertainty Coefficient C|R           0.0150   0.0099
  Uncertainty Coefficient R|C           0.0134   0.0088
  Uncertainty Coefficient Symmetric     0.0141   0.0094

  Sample Size = 273

Step 6: We fail to reject the null hypothesis at the 0.05 level of significance. We do not have sufficient evidence to conclude that there is a relationship between Family Income Level and Educational Aspiration. The test used here assumes that the two categorical variables are nominal.
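The Mantel-Haenszel chi-square in the SAS output for this problem is one statistic that does use the ordering of the categories. As a hedged illustration, the Python sketch below assumes equally spaced integer scores (1, 2, 3 for income and 1, 2, 3, 4 for aspiration, taken here to correspond to the default table scores in SAS) and computes $M^2 = (n-1)r^2$, where r is the Pearson correlation of the scores. The variable names are illustrative only.

```python
import math

# Rows: family income (low, middle, high).
# Columns: some HS, HS graduate, some college, college graduate.
counts = [[9, 44, 13, 10], [11, 52, 23, 22], [9, 41, 12, 27]]
row_scores = [1, 2, 3]       # assumed equally spaced scores for income
col_scores = [1, 2, 3, 4]    # assumed equally spaced scores for aspiration

n = sum(map(sum, counts))
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]

sum_u = sum(u * t for u, t in zip(row_scores, row_totals))
sum_v = sum(v * t for v, t in zip(col_scores, col_totals))
sum_uu = sum(u * u * t for u, t in zip(row_scores, row_totals))
sum_vv = sum(v * v * t for v, t in zip(col_scores, col_totals))
sum_uv = sum(u * v * counts[i][j]
             for i, u in enumerate(row_scores)
             for j, v in enumerate(col_scores))

# Corrected sums of squares and cross products of the scores.
s_uv = sum_uv - sum_u * sum_v / n
s_uu = sum_uu - sum_u ** 2 / n
s_vv = sum_vv - sum_v ** 2 / n

r = s_uv / math.sqrt(s_uu * s_vv)
m2 = (n - 1) * r * r

# Upper tail of a chi-square variable with 1 degree of freedom.
p_value = math.erfc(math.sqrt(m2 / 2))

print(f"r = {r:.4f}, M2 = {m2:.4f}, p-value = {p_value:.4f}")
```

With d.f. = 1 concentrated on a linear trend, this statistic is significant at the 0.05 level even though the 6-d.f. tests of general association are not.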
In fact, both Family Income Level and Educational Aspiration are ordinal. The $X^2$ and $G^2$ tests ignore this ordering, so a test that uses it, such as the 1-d.f. Mantel-Haenszel chi-square in the output above (4.7489, p-value = 0.0293), may be more appropriate for these data.

3.13. The first table shown below shows the results of a retrospective study comparing radiation therapy with surgery in treating cancer of the larynx. The response variable indicates whether the cancer was controlled for at least two years following treatment. The SAS program for analyzing the data, using Fisher's exact test and calculating 95% C.I.'s for the odds ratio, is listed next, followed by the SAS output.

                       Cancer controlled   Cancer not controlled
  Surgery                     21                    2
  Radiation therapy           15                    3

  proc format;
    value outfmt 1 = "Cancer Controlled    "
                 2 = "Cancer Not Controlled";
    value trtfmt 1 = "Surgery  "
                 2 = "Radiation";
  data severe;
    input treat outcome count;
    format outcome outfmt. treat trtfmt.;
    cards;
  1 1 21
  1 2 2
  2 1 15
  2 2 3
  ;
  proc freq;
    weight count;
    tables treat*outcome / norow nocol nopercent;
    exact fisher or / alpha=0.05;
  run;

SAS output:

  The SAS System
  The FREQ Procedure
  Table of treat by outcome

  treat        Cancer Controlled   Cancer Not Controlled   Total
  Surgery             21                    2                23
  Radiation           15                    3                18
  Total               36                    5                41

  Statistics for Table of treat by outcome

  Statistic                      DF     Value     Prob
  Chi-Square                      1     0.5992   0.4389
  Likelihood Ratio Chi-Square     1     0.5948   0.4406
  Continuity Adj. Chi-Square      1     0.0860   0.7694
  Mantel-Haenszel Chi-Square      1     0.5845   0.4445
  Phi Coefficient                       0.1209
  Contingency Coefficient               0.1200
  Cramer's V                            0.1209

  WARNING: 50% of the cells have expected counts less than 5. Chi-Square may not be a valid test.
  Fisher's Exact Test

  Cell (1,1) Frequency (F)        21
  Left-sided Pr <= F          0.8947
  Right-sided Pr >= F         0.3808
  Table Probability (P)       0.2755
  Two-sided Pr <= P           0.6384

  Estimates of the Relative Risk (Row1/Row2)

  Type of Study                  Value    95% Confidence Limits
  Case-Control (Odds Ratio)     2.1000     0.3116    14.1523
  Cohort (Col1 Risk)            1.0957     0.8601     1.3957
  Cohort (Col2 Risk)            0.5217     0.0973     2.7981

  Odds Ratio (Case-Control Study)

  Odds Ratio                  2.1000

  Asymptotic Conf Limits
  95% Lower Conf Limit        0.3116
  95% Upper Conf Limit       14.1523

  Exact Conf Limits
  95% Lower Conf Limit        0.2089
  95% Upper Conf Limit       27.5522

  Sample Size = 41

a) Report and interpret the p-value for Fisher's exact test with (i) Ha: $\theta > 1$. Explain how the p-value is calculated.

For this directional hypothesis test, the p-value is 0.3808, so we do not reject H0: $\theta = 1$ in favor of $\theta > 1$. The test statistic for Fisher's exact test is the cell frequency in the 1-1 cell, which under the null hypothesis has a hypergeometric distribution with parameters n = 41, $n_{1+}$ = 23, and $n_{+1}$ = 36. The possible values of this cell frequency are the integers from 18 through 23. The table of the distribution is given below:

  n11    P(N11 = n11)
  18       0.0449
  19       0.2127
  20       0.3616
  21       0.2755
  22       0.0939
  23       0.0114

The observed value for the 1-1 cell frequency was 21. The p-value for this hypothesis test is then the sum of the last three probabilities in the table: 0.2755 + 0.0939 + 0.0114 = 0.3808.

(ii) Ha: $\theta \neq 1$. Explain how the p-value is calculated.

For the non-directional hypothesis test, the p-value is 0.6384, so again we do not reject H0. The p-value is calculated by summing all of those probabilities in the table that are no greater than the probability of the observed cell frequency. In this case, 0.6384 = 0.2755 + 0.0939 + 0.0114 + 0.2127 + 0.0449.

b) Interpret the confidence interval for $\theta$.
In this case, we are 95% confident that the odds that Surgery results in control of the cancer are between 0.3116 and 14.1523 times the odds that Radiation results in control of the cancer. Since the C.I. includes 1, we cannot conclude that Surgery is more likely to result in control of the cancer than Radiation. The exact interval in the SAS output, (0.2089, 27.5522), is wider and leads to the same conclusion.
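The numbers used in parts (a) and (b) of 3.13 can be reproduced from the four cell counts. The following Python sketch (function and variable names are illustrative) builds the hypergeometric distribution of $n_{11}$, forms the one-sided and two-sided Fisher p-values as described in part (a), and computes the large-sample (Wald) 95% interval for the odds ratio on the log scale, which is assumed here to be what SAS labels the asymptotic confidence limits. It does not attempt the exact interval.

```python
import math
from statistics import NormalDist

# Observed 2 x 2 table: rows Surgery, Radiation; columns controlled, not controlled.
n11, n12, n21, n22 = 21, 2, 15, 3
row1 = n11 + n12              # Surgery total
row2 = n21 + n22              # Radiation total
col1 = n11 + n21              # "controlled" total
n = row1 + row2


def hyper_pmf(k):
    """P(N11 = k) under H0, conditioning on both sets of margins."""
    return math.comb(row1, k) * math.comb(row2, col1 - k) / math.comb(n, col1)


# Values of the 1-1 cell that are compatible with the fixed margins.
support = range(max(0, col1 - row2), min(row1, col1) + 1)
pmf = {k: hyper_pmf(k) for k in support}

p_obs = pmf[n11]
# One-sided (theta > 1): tables with a 1-1 cell at least as large as observed.
p_right = sum(p for k, p in pmf.items() if k >= n11)
# Two-sided: tables no more probable than the observed one
# (a small relative tolerance guards against floating-point ties).
p_two = sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-7))

# Wald interval for the odds ratio, computed on the log scale.
odds_ratio = (n11 * n22) / (n12 * n21)
se_log = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
z = NormalDist().inv_cdf(0.975)
lower = math.exp(math.log(odds_ratio) - z * se_log)
upper = math.exp(math.log(odds_ratio) + z * se_log)

for k in support:
    print(f"{k}  {pmf[k]:.4f}")
print(f"one-sided p = {p_right:.4f}, two-sided p = {p_two:.4f}")
print(f"odds ratio = {odds_ratio:.4f}, 95% Wald C.I. = ({lower:.4f}, {upper:.4f})")
```

The lower limit printed by this sketch agrees with the 0.3116 shown in the SAS output.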