STA 6505, Fall 2008, Homework #1 Solutions

STA 6505, Fall 2008, Homework #3 Solutions
Chapter 3: 3.4acd, 3.9b, 3.11a, 3.13ab (no need to discuss how the small-sample C.I. was
calculated; it is somewhat complicated)
3.4. We have the following contingency table, with explanatory variable X = race, having two
levels, Black and White, and response variable Y = party affiliation, having three levels,
Democrat, Independent, and Republican.
                    Party Affiliation
Race      Democrat   Independent   Republican
Black        103          15            11
White        341         105           405
a) Using $X^2$ and $G^2$, test the hypothesis of independence between Party Affiliation and Race.
Report the p-values and interpret.
Step 1: H0: $\pi_{ij} = \pi_{i+}\pi_{+j}$, for all i = 1, 2 and j = 1, 2, 3.
HA: $\pi_{ij} \neq \pi_{i+}\pi_{+j}$, for some i and j.
Step 2: We have n = 980, I = 2, J = 3, and we choose $\alpha = 0.05$.
Step 3: The test statistic is either
$$X^2 = \sum_{i=1}^{2}\sum_{j=1}^{3} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} \quad\text{or}\quad G^2 = -2\ln\Lambda = 2\sum_{i=1}^{2}\sum_{j=1}^{3} n_{ij}\ln\left(\frac{n_{ij}}{\hat{\mu}_{ij}}\right),$$
where $\hat{\mu}_{ij} = n\hat{\pi}_{ij} = n\hat{\pi}_{i+}\hat{\pi}_{+j} = \frac{n_{i+}n_{+j}}{n}$ for all i, j, and under the null hypothesis, either statistic has
an approximate chi-square distribution with d.f. = 2.
Step 4: We will reject the null hypothesis if either $X^2 > \chi^2_{2,\,0.05} = 5.99$ or $G^2 > \chi^2_{2,\,0.05} = 5.99$.
Step 5: From the table above or the SAS output below, we have $X^2 = 79.4310$ with a
p-value < 0.0001, and $G^2 = 90.3311$ with a p-value < 0.0001.
Comparison Between Race and Party
The FREQ Procedure
Table of race by party

race         party
Frequency  | Democrat | Independent | Republican | Total
-----------+----------+-------------+------------+------
Black      |      103 |          15 |         11 |   129
-----------+----------+-------------+------------+------
White      |      341 |         105 |        405 |   851
-----------+----------+-------------+------------+------
Total             444           120          416     980

Statistics for Table of race by party

Statistic                      DF      Value     Prob
------------------------------------------------------
Chi-Square                      2    79.4310   <.0001
Likelihood Ratio Chi-Square     2    90.3311   <.0001
Mantel-Haenszel Chi-Square      1    79.3336   <.0001
Phi Coefficient                       0.2847
Contingency Coefficient               0.2738
Cramer's V                            0.2847

Sample Size = 980
Step 6: We reject the null hypothesis at the 0.05 level of significance. We have sufficient
evidence to conclude that Race and Party Affiliation are not independent.
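The statistics in Step 5 can be reproduced directly from the formulas in Step 3. The following is a minimal sketch in Python (not part of the original SAS analysis); the function name `independence_statistics` is hypothetical, and the p-value uses the fact that a chi-square upper tail with 2 d.f. equals exp(-x/2).

```python
import math

# Observed counts from the race-by-party table
# (rows: Black, White; columns: Democrat, Independent, Republican).
observed = [[103, 15, 11],
            [341, 105, 405]]

def independence_statistics(table):
    """Return (X2, G2, df) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    g2 = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            mu_ij = row_totals[i] * col_totals[j] / n  # fitted count under H0
            x2 += (n_ij - mu_ij) ** 2 / mu_ij
            if n_ij > 0:  # a zero cell contributes nothing to G2
                g2 += 2.0 * n_ij * math.log(n_ij / mu_ij)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, g2, df

x2, g2, df = independence_statistics(observed)

# For d.f. = 2 only, the chi-square upper-tail probability is exp(-x / 2).
p_x2 = math.exp(-x2 / 2)
p_g2 = math.exp(-g2 / 2)

print(f"X2 = {x2:.4f}, G2 = {g2:.4f}, d.f. = {df}")
print(f"p-values: {p_x2:.3g} and {p_g2:.3g}")
```

Both p-values are far below 0.0001, in agreement with the `<.0001` entries in the SAS listing.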
c) Partition chi-squared into components regarding the choice between Democrat and
Independent and between those two combined and Republican. Interpret.
First subtable; X = Race, Y = Party Affiliation (Democrat v. Independent).
Step 1: H0: $\pi_{ij} = \pi_{i+}\pi_{+j}$, for all i = 1, 2 and j = 1, 2.
HA: $\pi_{ij} \neq \pi_{i+}\pi_{+j}$, for some i and j.
Step 2: We have n = 564, I = 2, J = 2, and we choose $\alpha = 0.05$.
Step 3: The test statistic is either
$$X^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} \quad\text{or}\quad G^2 = -2\ln\Lambda = 2\sum_{i=1}^{2}\sum_{j=1}^{2} n_{ij}\ln\left(\frac{n_{ij}}{\hat{\mu}_{ij}}\right),$$
where $\hat{\mu}_{ij} = n\hat{\pi}_{ij} = n\hat{\pi}_{i+}\hat{\pi}_{+j} = \frac{n_{i+}n_{+j}}{n}$ for all i, j, and under the null hypothesis, either statistic has
an approximate chi-square distribution with d.f. = 1.
Step 4: We will reject the null hypothesis if either $X^2 > \chi^2_{1,\,0.05} = 3.84$ or $G^2 > \chi^2_{1,\,0.05} = 3.84$.
Step 5: From the table above or the SAS output below, we have $X^2 = 6.5350$ with a
p-value = 0.0106, and $G^2 = 7.1561$ with a p-value = 0.0075.
Comparison Between Race and Party
The FREQ Procedure
Table of race by party

race         party
Frequency  | Democrat | Independent | Total
-----------+----------+-------------+------
Black      |      103 |          15 |   118
-----------+----------+-------------+------
White      |      341 |         105 |   446
-----------+----------+-------------+------
Total             444           120     564

Statistics for Table of race by party

Statistic                      DF      Value     Prob
------------------------------------------------------
Chi-Square                      1     6.5350   0.0106
Likelihood Ratio Chi-Square     1     7.1561   0.0075
Continuity Adj. Chi-Square      1     5.9044   0.0151
Mantel-Haenszel Chi-Square      1     6.5234   0.0106
Phi Coefficient                       0.1076
Contingency Coefficient               0.1070
Cramer's V                            0.1076

The FREQ Procedure
Statistics for Table of race by party

Statistic                              Value       ASE
------------------------------------------------------
Gamma                                 0.3578    0.1299
Kendall's Tau-b                       0.1076    0.0361
Stuart's Tau-c                        0.0717    0.0246
Somers' D C|R                         0.1083    0.0367
Somers' D R|C                         0.1070    0.0362
Pearson Correlation                   0.1076    0.0361
Spearman Correlation                  0.1076    0.0361
Lambda Asymmetric C|R                 0.0000    0.0000
Lambda Asymmetric R|C                 0.0000    0.0000
Lambda Symmetric                      0.0000    0.0000
Uncertainty Coefficient C|R           0.0123    0.0087
Uncertainty Coefficient R|C           0.0124    0.0087
Uncertainty Coefficient Symmetric     0.0123    0.0087

Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value     95% Confidence Limits
-----------------------------------------------------------------
Case-Control (Odds Ratio)      2.1144      1.1789      3.7921
Cohort (Col1 Risk)             1.1417      1.0476      1.2442
Cohort (Col2 Risk)             0.5400      0.3270      0.8916

Sample Size = 564
Step 6: We reject the null hypothesis at the 0.05 level of significance. We have sufficient
evidence to conclude that Race and Party Affiliation are not independent. The phi coefficient is
0.1076, showing a relatively weak positive association between Race and Party Affiliation, when
Party Affiliation includes only Democrat v. Independent.
Second subtable – X = Race, Y = Party Affiliation (Democrat/Independent v. Republican).
Step 1: H0: $\pi_{ij} = \pi_{i+}\pi_{+j}$, for all i = 1, 2 and j = 1, 2.
HA: $\pi_{ij} \neq \pi_{i+}\pi_{+j}$, for some i and j.
Step 2: We have n = 980, I = 2, J = 2, and we choose $\alpha = 0.05$.
Step 3: The test statistic is either
$$X^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} \quad\text{or}\quad G^2 = -2\ln\Lambda = 2\sum_{i=1}^{2}\sum_{j=1}^{2} n_{ij}\ln\left(\frac{n_{ij}}{\hat{\mu}_{ij}}\right),$$
where $\hat{\mu}_{ij} = n\hat{\pi}_{ij} = n\hat{\pi}_{i+}\hat{\pi}_{+j} = \frac{n_{i+}n_{+j}}{n}$ for all i, j, and under the null hypothesis, either statistic has
an approximate chi-square distribution with d.f. = 1.
Step 4: We will reject the null hypothesis if either $X^2 > \chi^2_{1,\,0.05} = 3.84$ or $G^2 > \chi^2_{1,\,0.05} = 3.84$.
Step 5: From the table above or the SAS output below, we have $X^2 = 69.9721$ with a
p-value < 0.0001, and $G^2 = 83.1750$ with a p-value < 0.0001.
Comparison Between Race and Party Affiliation
The FREQ Procedure
Table of race by party

race         party
Frequency  | Democrat/Independent | Republican | Total
-----------+----------------------+------------+------
Black      |                  118 |         11 |   129
-----------+----------------------+------------+------
White      |                  446 |        405 |   851
-----------+----------------------+------------+------
Total                         564          416     980

Statistics for Table of race by party

Statistic                      DF      Value     Prob
------------------------------------------------------
Chi-Square                      1    69.9721   <.0001
Likelihood Ratio Chi-Square     1    83.1750   <.0001
Continuity Adj. Chi-Square      1    68.3822   <.0001
Mantel-Haenszel Chi-Square      1    69.9007   <.0001
Phi Coefficient                       0.2672
Contingency Coefficient               0.2582
Cramer's V                            0.2672

The FREQ Procedure
Statistics for Table of race by party

Statistic                              Value       ASE
------------------------------------------------------
Gamma                                 0.8138    0.0545
Kendall's Tau-b                       0.2672    0.0222
Stuart's Tau-c                        0.1786    0.0185
Somers' D C|R                         0.3906    0.0300
Somers' D R|C                         0.1828    0.0188
Pearson Correlation                   0.2672    0.0222
Spearman Correlation                  0.2672    0.0222
Lambda Asymmetric C|R                 0.0000    0.0000
Lambda Asymmetric R|C                 0.0000    0.0000
Lambda Symmetric                      0.0000    0.0000
Uncertainty Coefficient C|R           0.0623    0.0117
Uncertainty Coefficient R|C           0.1090    0.0191
Uncertainty Coefficient Symmetric     0.0792    0.0143

Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value     95% Confidence Limits
-----------------------------------------------------------------
Case-Control (Odds Ratio)      9.7411      5.1758     18.3332
Cohort (Col1 Risk)             1.7454      1.6065      1.8963
Cohort (Col2 Risk)             0.1792      0.1014      0.3167

Sample Size = 980
Step 6: We reject the null hypothesis at the 0.05 level of significance. We have sufficient
evidence to conclude that Race and Party Affiliation are not independent. In particular, the phi
coefficient is 0.2672, showing a somewhat weak positive correlation between Race and Party
Affiliation when Party Affiliation is dichotomized as Democrat/Independent v. Republican.
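The two subtables partition the likelihood-ratio statistic exactly: from the SAS listings, 7.1561 + 83.1750 = 90.3311, which is the $G^2$ of the full 2 x 3 table on 1 + 1 = 2 d.f. The Pearson components do not add up in the same way (6.5350 + 69.9721 = 76.5071, not 79.4310). A minimal Python sketch of this check follows; the helper name `g2` is hypothetical and the code is an illustration rather than part of the original SAS analysis.

```python
import math

def g2(table):
    """Likelihood-ratio statistic G2 for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            if n_ij > 0:  # a zero cell contributes nothing
                mu_ij = row_totals[i] * col_totals[j] / n
                total += n_ij * math.log(n_ij / mu_ij)
    return 2.0 * total

# Full table (rows: Black, White; columns: Democrat, Independent, Republican).
full = [[103, 15, 11], [341, 105, 405]]
# First subtable: Democrat v. Independent.
dem_vs_ind = [[103, 15], [341, 105]]
# Second subtable: Democrat/Independent combined v. Republican.
dem_ind_vs_rep = [[118, 11], [446, 405]]

g2_full = g2(full)
g2_first = g2(dem_vs_ind)
g2_second = g2(dem_ind_vs_rep)

print(f"G2 components: {g2_first:.4f} + {g2_second:.4f} = {g2_first + g2_second:.4f}")
print(f"G2 full table: {g2_full:.4f}")
```

Most of the association comes from the second component, that is, from the contrast between Republican and the other two affiliations.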
d) Summarize association by constructing a 95% confidence interval for the odds ratio between
Race and whether a Democrat or Republican. Interpret.
If we look only at Democrats and Republicans, we have n = 860, and the estimated odds of a Black
person being a Democrat are 11.1210 times the odds of a White person being a Democrat. A 95%
confidence interval for the odds ratio is (5.8747, 21.0524). Since this interval does not contain 1,
we conclude that the odds ratio is statistically significant.
Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value     95% Confidence Limits
-----------------------------------------------------------------
Case-Control (Odds Ratio)     11.1210      5.8747     21.0524
Cohort (Col1 Risk)             1.9766      1.7911      2.1813
Cohort (Col2 Risk)             0.1777      0.1010      0.3129

Sample Size = 860
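The odds ratio and its confidence limits in the listing are consistent with the usual large-sample (Wald) interval on the log scale, $\exp\{\ln\hat{\theta} \pm z_{0.025}\sqrt{1/n_{11} + 1/n_{12} + 1/n_{21} + 1/n_{22}}\}$. The sketch below assumes that this is the method behind the SAS limits; the function name `odds_ratio_wald_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def odds_ratio_wald_ci(n11, n12, n21, n22, level=0.95):
    """Sample odds ratio and Wald confidence limits computed on the log scale."""
    theta_hat = (n11 * n22) / (n12 * n21)
    se_log = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for level = 0.95
    lower = math.exp(math.log(theta_hat) - z * se_log)
    upper = math.exp(math.log(theta_hat) + z * se_log)
    return theta_hat, lower, upper

# Rows: Black, White; columns: Democrat, Republican (Independents dropped).
theta_hat, lower, upper = odds_ratio_wald_ci(103, 11, 341, 405)
print(f"odds ratio = {theta_hat:.4f}, 95% C.I. = ({lower:.4f}, {upper:.4f})")
```

The interval is symmetric around $\ln\hat{\theta}$ rather than around $\hat{\theta}$, which is why the upper limit lies much farther from the estimate than the lower limit.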
3.9. The table below classifies a sample of psychiatric patients by their diagnosis and by whether
their treatment prescribed drugs.
Diagnosis               Drugs   No drugs
Schizophrenia            105        8
Affective disorder        12        2
Neurosis                  18       19
Personality disorder      47       52
Special symptoms           0       13
b) Partition chi-squared into three components to describe differences and similarities among the
diagnoses, by comparing i) the first two rows, ii) the third and fourth rows, and iii) the last row to
the first and second rows combined and the third and fourth rows combined.
i) The comparison of X = Diagnosis v. Y = Treatment, with X having two values: 1 =
Schizophrenia and 2 = Affective Disorder.
Step 1: H0: $\pi_{ij} = \pi_{i+}\pi_{+j}$, for all i = 1, 2 and j = 1, 2.
HA: $\pi_{ij} \neq \pi_{i+}\pi_{+j}$, for some i and j.
Step 2: We have n = 127, I = 2, J = 2, and we choose $\alpha = 0.05$.
Step 3: The test statistic is either
$$X^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} \quad\text{or}\quad G^2 = -2\ln\Lambda = 2\sum_{i=1}^{2}\sum_{j=1}^{2} n_{ij}\ln\left(\frac{n_{ij}}{\hat{\mu}_{ij}}\right),$$
where $\hat{\mu}_{ij} = n\hat{\pi}_{ij} = n\hat{\pi}_{i+}\hat{\pi}_{+j} = \frac{n_{i+}n_{+j}}{n}$ for all i, j, and under the null hypothesis, either statistic has
an approximate chi-square distribution with d.f. = 1.
Step 4: We will reject the null hypothesis if either $X^2 > \chi^2_{1,\,0.05} = 3.84$ or $G^2 > \chi^2_{1,\,0.05} = 3.84$.
Step 5: From the table above or the SAS output below, we have $X^2 = 0.8917$ with a
p-value = 0.3450, and $G^2 = 0.7530$ with a p-value = 0.3855.
Relationship Between Diagnosis And Treatment
The FREQ Procedure
Table of diag by drug

diag                  drug
Frequency           | Drugs | No Drugs | Total
--------------------+-------+----------+------
Schizophrenia       |   105 |        8 |   113
--------------------+-------+----------+------
Affective Disorder  |    12 |        2 |    14
--------------------+-------+----------+------
Total                   117         10     127

Statistics for Table of diag by drug

Statistic                      DF      Value     Prob
------------------------------------------------------
Chi-Square                      1     0.8917   0.3450
Likelihood Ratio Chi-Square     1     0.7530   0.3855
Continuity Adj. Chi-Square      1     0.1750   0.6757
Mantel-Haenszel Chi-Square      1     0.8847   0.3469
Phi Coefficient                       0.0838
Contingency Coefficient               0.0835
Cramer's V                            0.0838

WARNING: 25% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.

Relationship Between Diagnosis And Treatment
The FREQ Procedure
Statistics for Table of diag by drug

Statistic                              Value       ASE
------------------------------------------------------
Gamma                                 0.3725    0.3648
Kendall's Tau-b                       0.0838    0.1110
Stuart's Tau-c                        0.0283    0.0384
Somers' D C|R                         0.0721    0.0966
Somers' D R|C                         0.0974    0.1296
Pearson Correlation                   0.0838    0.1110
Spearman Correlation                  0.0838    0.1110
Lambda Asymmetric C|R                 0.0000    0.0000
Lambda Asymmetric R|C                 0.0000    0.0000
Lambda Symmetric                      0.0000    0.0000
Uncertainty Coefficient C|R           0.0108    0.0265
Uncertainty Coefficient R|C           0.0085    0.0211
Uncertainty Coefficient Symmetric     0.0095    0.0234

Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value     95% Confidence Limits
-----------------------------------------------------------------
Case-Control (Odds Ratio)      2.1875      0.4157     11.5117
Cohort (Col1 Risk)             1.0841      0.8701      1.3506
Cohort (Col2 Risk)             0.4956      0.1166      2.1054

Sample Size = 127
Step 6: We fail to reject the null hypothesis at the 0.05 level of significance. We do not have
sufficient evidence to conclude that there is a relationship between Diagnosis and Treatment
when Diagnosis is dichotomized as either 1 = Schizophrenia or 2 = Affective Disorder.
ii) The comparison of X = Diagnosis v. Y = Treatment, with X having two values: 1 = Neurosis
and 2 = Personality Disorder.
Step 1: H0: $\pi_{ij} = \pi_{i+}\pi_{+j}$, for all i = 1, 2 and j = 1, 2.
HA: $\pi_{ij} \neq \pi_{i+}\pi_{+j}$, for some i and j.
Step 2: We have n = 136, I = 2, J = 2, and we choose $\alpha = 0.05$.
Step 3: The test statistic is either
$$X^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} \quad\text{or}\quad G^2 = -2\ln\Lambda = 2\sum_{i=1}^{2}\sum_{j=1}^{2} n_{ij}\ln\left(\frac{n_{ij}}{\hat{\mu}_{ij}}\right),$$
where $\hat{\mu}_{ij} = n\hat{\pi}_{ij} = n\hat{\pi}_{i+}\hat{\pi}_{+j} = \frac{n_{i+}n_{+j}}{n}$ for all i, j, and under the null hypothesis, either statistic has
an approximate chi-square distribution with d.f. = 1.
Step 4: We will reject the null hypothesis if either $X^2 > \chi^2_{1,\,0.05} = 3.84$ or $G^2 > \chi^2_{1,\,0.05} = 3.84$.
Step 5: From the table above or the SAS output below, we have $X^2 = 0.0149$ with a
p-value = 0.9029, and $G^2 = 0.0149$ with a p-value = 0.9029.
Relationship Between Diagnosis And Treatment
The FREQ Procedure
Table of diag by drug

diag                    drug
Frequency             | Drugs | No Drugs | Total
----------------------+-------+----------+------
Neurosis              |    18 |       19 |    37
----------------------+-------+----------+------
Personality Disorder  |    47 |       52 |    99
----------------------+-------+----------+------
Total                      65         71     136

Statistics for Table of diag by drug

Statistic                      DF      Value     Prob
------------------------------------------------------
Chi-Square                      1     0.0149   0.9029
Likelihood Ratio Chi-Square     1     0.0149   0.9029
Continuity Adj. Chi-Square      1     0.0000   1.0000
Mantel-Haenszel Chi-Square      1     0.0148   0.9033
Phi Coefficient                       0.0105
Contingency Coefficient               0.0105
Cramer's V                            0.0105

Relationship Between Diagnosis And Treatment
The FREQ Procedure
Statistics for Table of diag by drug

Statistic                              Value       ASE
------------------------------------------------------
Gamma                                 0.0235    0.1927
Kendall's Tau-b                       0.0105    0.0858
Stuart's Tau-c                        0.0093    0.0763
Somers' D C|R                         0.0117    0.0963
Somers' D R|C                         0.0093    0.0764
Pearson Correlation                   0.0105    0.0858
Spearman Correlation                  0.0105    0.0858
Lambda Asymmetric C|R                 0.0000    0.0000
Lambda Asymmetric R|C                 0.0000    0.0000
Lambda Symmetric                      0.0000    0.0000
Uncertainty Coefficient C|R           0.0001    0.0013
Uncertainty Coefficient R|C           0.0001    0.0015
Uncertainty Coefficient Symmetric     0.0001    0.0014

Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value     95% Confidence Limits
-----------------------------------------------------------------
Case-Control (Odds Ratio)      1.0482      0.4923      2.2318
Cohort (Col1 Risk)             1.0247      0.6934      1.5143
Cohort (Col2 Risk)             0.9777      0.6785      1.4087

Sample Size = 136
Step 6: We fail to reject the null hypothesis at the 0.05 level of significance. We do not have
sufficient evidence to conclude that there is a relationship between Diagnosis and Treatment,
when Diagnosis is dichotomized as either 1 = Neurosis or 2 = Personality Disorder.
iii) The comparison of X = Diagnosis v. Y = Treatment, with X having three values: 1 =
Schizophrenia/Affective Disorder, 2 = Neurosis/Personality Disorder, or 3 = Special Symptoms.
Step 1: H0: $\pi_{ij} = \pi_{i+}\pi_{+j}$, for all i = 1, 2, 3 and j = 1, 2.
HA: $\pi_{ij} \neq \pi_{i+}\pi_{+j}$, for some i and j.
Step 2: We have n = 276, I = 3, J = 2, and we choose $\alpha = 0.05$.
Step 3: The test statistic is either
$$X^2 = \sum_{i=1}^{3}\sum_{j=1}^{2} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} \quad\text{or}\quad G^2 = -2\ln\Lambda = 2\sum_{i=1}^{3}\sum_{j=1}^{2} n_{ij}\ln\left(\frac{n_{ij}}{\hat{\mu}_{ij}}\right),$$
where $\hat{\mu}_{ij} = n\hat{\pi}_{ij} = n\hat{\pi}_{i+}\hat{\pi}_{+j} = \frac{n_{i+}n_{+j}}{n}$ for all i, j, and under the null hypothesis, either statistic has
an approximate chi-square distribution with d.f. = 2.
Step 4: We will reject the null hypothesis if either $X^2 > \chi^2_{2,\,0.05} = 5.99$ or $G^2 > \chi^2_{2,\,0.05} = 5.99$.
Step 5: From the table above or the SAS output below, we have $X^2 = 83.8839$ with a
p-value < 0.0001, and $G^2 = 95.7691$ with a p-value < 0.0001.
Relationship Between Diagnosis And Treatment
The FREQ Procedure
Table of diagnose by drug

diagnose                            drug
Frequency                         | Drugs | No Drugs | Total
----------------------------------+-------+----------+------
Schizophrenia/Affective Disorder  |   117 |       10 |   127
----------------------------------+-------+----------+------
Neurosis/Personality Disorder     |    65 |       71 |   136
----------------------------------+-------+----------+------
Special Symptoms                  |     0 |       13 |    13
----------------------------------+-------+----------+------
Total                                 182         94     276

Statistics for Table of diagnose by drug

Statistic                      DF      Value     Prob
------------------------------------------------------
Chi-Square                      2    83.8839   <.0001
Likelihood Ratio Chi-Square     2    95.7691   <.0001
Mantel-Haenszel Chi-Square      1    83.5334   <.0001
Phi Coefficient                       0.5513
Contingency Coefficient               0.4828
Cramer's V                            0.5513

Relationship Between Diagnosis And Treatment
The FREQ Procedure
Statistics for Table of diagnose by drug

Statistic                              Value       ASE
------------------------------------------------------
Gamma                                 0.8852    0.0394
Kendall's Tau-b                       0.5327    0.0421
Stuart's Tau-c                        0.5263    0.0474
Somers' D C|R                         0.4844    0.0401
Somers' D R|C                         0.5859    0.0471
Pearson Correlation                   0.5511    0.0409
Spearman Correlation                  0.5435    0.0438
Lambda Asymmetric C|R                 0.2021    0.1160
Lambda Asymmetric R|C                 0.3714    0.0764
Lambda Symmetric                      0.3034    0.0807
Uncertainty Coefficient C|R           0.2705    0.0431
Uncertainty Coefficient R|C           0.2042    0.0309
Uncertainty Coefficient Symmetric     0.2327    0.0357

Sample Size = 276
Step 6: We reject the null hypothesis at the 0.05 level of significance. We have sufficient
evidence to conclude that there is a relationship between Diagnosis and Treatment when
Diagnosis is coded as 1 = Schizophrenia/Affective Disorder, 2 = Neurosis/Personality
Disorder, or 3 = Special Symptoms. The frequency table shows that, in the sample, Special
Symptoms are never treated with drugs, Schizophrenia and Affective Disorder are most often
treated with drugs, and Neurosis and Personality Disorder are nearly equally likely to be
treated with or without drugs.
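The three comparisons form an exact partition of the likelihood-ratio statistic for the original 5 x 2 table: the components reported above sum to 0.7530 + 0.0149 + 95.7691 = 96.5370 on 1 + 1 + 2 = 4 d.f. The Python sketch below checks this identity numerically; the helper name `g2` is hypothetical, and the code is only an illustration alongside the SAS analysis.

```python
import math

def g2(table):
    """Likelihood-ratio statistic G2 for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            if n_ij > 0:  # the zero cell (Special symptoms, Drugs) adds nothing
                mu_ij = row_totals[i] * col_totals[j] / n
                total += n_ij * math.log(n_ij / mu_ij)
    return 2.0 * total

# Columns: Drugs, No drugs.
full = [[105, 8], [12, 2], [18, 19], [47, 52], [0, 13]]
rows_1_2 = full[0:2]                       # i) Schizophrenia v. Affective disorder
rows_3_4 = full[2:4]                       # ii) Neurosis v. Personality disorder
collapsed = [
    [a + b for a, b in zip(*rows_1_2)],    # rows 1 and 2 combined
    [a + b for a, b in zip(*rows_3_4)],    # rows 3 and 4 combined
    full[4],                               # Special symptoms
]

components = [g2(rows_1_2), g2(rows_3_4), g2(collapsed)]
g2_full = g2(full)

print("components:", ", ".join(f"{c:.4f}" for c in components))
print(f"sum = {sum(components):.4f}, full table G2 = {g2_full:.4f}")
```

Nearly all of the statistic comes from the third component, so the diagnoses within each combined group behave alike while the three groups differ sharply.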
3.11 a) A study on educational aspirations of high school students (S. Crysdale, International
Journal of Comparative Sociology, 16: 19-36, 1975) measured aspirations with the scale (some
high school, high school graduate, some college, college graduate). The student counts in these
categories were (9, 44, 13, 10) when family income was low, (11, 52, 23, 22) when family
income was middle, and (9, 41, 12, 27) when family income was high. Test independence of
Educational Aspirations and Family Income using either Pearson’s chi-square statistic or the
likelihood ratio chi-square statistic. Explain the deficiency of this test for these data.
The comparison of X = Family Income Level v. Y = Educational Aspiration.
Step 1: H0: $\pi_{ij} = \pi_{i+}\pi_{+j}$, for all i = 1, 2, 3 and j = 1, 2, 3, 4.
HA: $\pi_{ij} \neq \pi_{i+}\pi_{+j}$, for some i and j.
Step 2: We have n = 273, I = 3, J = 4, and we choose $\alpha = 0.05$.
Step 3: The test statistic is either
$$X^2 = \sum_{i=1}^{3}\sum_{j=1}^{4} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}} \quad\text{or}\quad G^2 = -2\ln\Lambda = 2\sum_{i=1}^{3}\sum_{j=1}^{4} n_{ij}\ln\left(\frac{n_{ij}}{\hat{\mu}_{ij}}\right),$$
where $\hat{\mu}_{ij} = n\hat{\pi}_{ij} = n\hat{\pi}_{i+}\hat{\pi}_{+j} = \frac{n_{i+}n_{+j}}{n}$ for all i, j, and under the null hypothesis, either statistic has
an approximate chi-square distribution with d.f. = 6.
Step 4: We will reject the null hypothesis if either $X^2 > \chi^2_{6,\,0.05} = 12.59$ or $G^2 > \chi^2_{6,\,0.05} = 12.59$.
Step 5: From the table above or the SAS output below, we have $X^2 = 8.8709$ with a
p-value = 0.1810, and $G^2 = 8.9165$ with a p-value = 0.1783.
Relationship Between Educational Aspiration And Family Income Level
The FREQ Procedure
Table of edu by inc

edu          inc
Frequency  | Low | Middle | High | Total
-----------+-----+--------+------+------
Some HS    |   9 |     11 |    9 |    29
-----------+-----+--------+------+------
HS Grad    |  44 |     52 |   41 |   137
-----------+-----+--------+------+------
Some Coll  |  13 |     23 |   12 |    48
-----------+-----+--------+------+------
Coll Grad  |  10 |     22 |   27 |    59
-----------+-----+--------+------+------
Total         76      108     89     273

Statistics for Table of edu by inc

Statistic                      DF      Value     Prob
------------------------------------------------------
Chi-Square                      6     8.8709   0.1810
Likelihood Ratio Chi-Square     6     8.9165   0.1783
Mantel-Haenszel Chi-Square      1     4.7489   0.0293
Phi Coefficient                       0.1803
Contingency Coefficient               0.1774
Cramer's V                            0.1275

Relationship Between Educational Aspiration And Family Income Level
The FREQ Procedure
Statistics for Table of edu by inc

Statistic                              Value       ASE
------------------------------------------------------
Gamma                                 0.1625    0.0795
Kendall's Tau-b                       0.1076    0.0530
Stuart's Tau-c                        0.1064    0.0525
Somers' D C|R                         0.1076    0.0530
Somers' D R|C                         0.1075    0.0530
Pearson Correlation                   0.1321    0.0594
Spearman Correlation                  0.1212    0.0600
Lambda Asymmetric C|R                 0.0303    0.0418
Lambda Asymmetric R|C                 0.0000    0.0000
Lambda Symmetric                      0.0166    0.0230
Uncertainty Coefficient C|R           0.0150    0.0099
Uncertainty Coefficient R|C           0.0134    0.0088
Uncertainty Coefficient Symmetric     0.0141    0.0094

Sample Size = 273
Step 6: We fail to reject the null hypothesis at the 0.05 level of significance. We do not have
sufficient evidence to conclude that there is a relationship between Family Income Level and
Educational Aspiration. The test used here treats the two categorical variables as nominal.
In fact, both of these variables may be considered to be ordinal, so a test that uses the
ordering of the categories is more appropriate for the relationship. For example, the
Mantel-Haenszel chi-square statistic in the SAS output above, which has d.f. = 1, equals 4.7489
with a p-value of 0.0293.
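One test that does use the ordering is the linear trend statistic $M^2 = (n - 1)r^2$, where r is the correlation between row and column scores; under independence it is approximately chi-square with 1 d.f. The sketch below assumes equally spaced scores (1, 2, 3 for income and 1, 2, 3, 4 for aspiration), which is a choice made here for illustration and not stated in the original solution. With these scores the result appears to match the Pearson Correlation and Mantel-Haenszel Chi-Square lines of the SAS output. The function name `linear_trend` is hypothetical.

```python
import math

# Rows: family income (low, middle, high); columns: aspiration
# (some high school, high school graduate, some college, college graduate).
table = [[9, 44, 13, 10],
         [11, 52, 23, 22],
         [9, 41, 12, 27]]
row_scores = [1, 2, 3]      # assumed equally spaced scores
col_scores = [1, 2, 3, 4]   # assumed equally spaced scores

def linear_trend(table, row_scores, col_scores):
    """Return (r, M2), where M2 = (n - 1) * r**2 has 1 d.f. under independence."""
    n = sum(sum(row) for row in table)
    sum_u = sum_v = sum_uu = sum_vv = sum_uv = 0.0
    for u, row in zip(row_scores, table):
        for v, n_ij in zip(col_scores, row):
            sum_u += u * n_ij
            sum_v += v * n_ij
            sum_uu += u * u * n_ij
            sum_vv += v * v * n_ij
            sum_uv += u * v * n_ij
    cov = sum_uv - sum_u * sum_v / n
    var_u = sum_uu - sum_u ** 2 / n
    var_v = sum_vv - sum_v ** 2 / n
    r = cov / math.sqrt(var_u * var_v)
    return r, (n - 1) * r ** 2

r, m2 = linear_trend(table, row_scores, col_scores)
# Upper-tail probability of a chi-square variable with 1 d.f.
p_value = math.erfc(math.sqrt(m2 / 2))
print(f"r = {r:.4f}, M2 = {m2:.4f}, p-value = {p_value:.4f}")
```

Concentrating the association into a single degree of freedom gives more power against a monotone trend than the 6 d.f. tests, which is the deficiency the question asks about.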
3.13. The first table shown below shows the results of a retrospective study comparing radiation
therapy with surgery in treating cancer of the larynx. The response variable indicates whether
the cancer was controlled for at least two years following treatment. The SAS program for
analyzing the data, using Fisher’s exact test and calculating 95% C.I.’s for the odds ratio, is listed
next, followed by the SAS output.
                     Cancer controlled   Cancer not controlled
Surgery                      21                    2
Radiation therapy            15                    3
proc format;
  value outfmt 1 = "Cancer Controlled"
               2 = "Cancer Not Controlled";
  value trtfmt 1 = "Surgery"
               2 = "Radiation";
run;

/* One record per cell of the 2x2 table: treatment, outcome, cell count */
data severe;
  input treat outcome count;
  format outcome outfmt. treat trtfmt.;
  cards;
1 1 21
1 2 2
2 1 15
2 2 3
;

/* Fisher's exact test and the odds ratio with 95% confidence limits */
proc freq;
  weight count;
  tables treat*outcome / norow nocol nopercent;
  exact fisher or / alpha=0.05;
run;
SAS output:
The SAS System
The FREQ Procedure
Table of treat by outcome

treat        outcome
Frequency  | Cancer Controlled | Cancer Not Controlled | Total
-----------+-------------------+-----------------------+------
Surgery    |                21 |                     2 |    23
-----------+-------------------+-----------------------+------
Radiation  |                15 |                     3 |    18
-----------+-------------------+-----------------------+------
Total                       36                       5      41

Statistics for Table of treat by outcome

Statistic                      DF      Value     Prob
------------------------------------------------------
Chi-Square                      1     0.5992   0.4389
Likelihood Ratio Chi-Square     1     0.5948   0.4406
Continuity Adj. Chi-Square      1     0.0860   0.7694
Mantel-Haenszel Chi-Square      1     0.5845   0.4445
Phi Coefficient                       0.1209
Contingency Coefficient               0.1200
Cramer's V                            0.1209

WARNING: 50% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)        21
Left-sided Pr <= F          0.8947
Right-sided Pr >= F         0.3808

Table Probability (P)       0.2755
Two-sided Pr <= P           0.6384

The SAS System
The FREQ Procedure
Statistics for Table of treat by outcome

Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value     95% Confidence Limits
-----------------------------------------------------------------
Case-Control (Odds Ratio)      2.1000      0.3116     14.1523
Cohort (Col1 Risk)             1.0957      0.8601      1.3957
Cohort (Col2 Risk)             0.5217      0.0973      2.7981

Odds Ratio (Case-Control Study)
-----------------------------------
Odds Ratio                   2.1000

Asymptotic Conf Limits
95% Lower Conf Limit         0.3116
95% Upper Conf Limit        14.1523

Exact Conf Limits
95% Lower Conf Limit         0.2089
95% Upper Conf Limit        27.5522

Sample Size = 41
a) Report and interpret the p-value for Fisher’s exact test with
(i) Ha: $\theta > 1$. Explain how the p-value is calculated.
For this directional hypothesis test, the p-value is 0.3808. The test statistic for Fisher’s exact test
is the cell frequency in the 1-1 cell, which under the null hypothesis has a hypergeometric
distribution with parameters n = 41, n1+ = 23, and n+1 = 36. The possible values of this cell
frequency are positive integers from 18 through 23. The table of the distribution is given below:
N11    P(N11 = n11)
18        0.0449
19        0.2127
20        0.3616
21        0.2755
22        0.0939
23        0.0114
The observed value for the 1-1 cell frequency was 21. The p-value for this hypothesis test is
then the sum of the last three probabilities in the table.
(ii) Ha: $\theta \neq 1$. Explain how the p-value is calculated.
For the non-directional hypothesis test, the p-value is 0.6384. The p-value is calculated by
summing all of those probabilities in the table that are no greater than the probability of the
observed cell frequency. In this case, 0.6384 = 0.2755 + 0.0939 + 0.0114 + 0.2127 + 0.0449.
b) Interpret the confidence interval for $\theta$. In this case, we are 95% confident that the odds that
Surgery results in control of the cancer are between 0.3116 and 14.1523 times the
odds that Radiation results in control of the cancer. Since the C.I. includes 1, we cannot be
confident that Surgery is more likely to result in control of the cancer than Radiation.
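The hypergeometric probabilities and both p-values in part a) can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `hypergeom_pmf` and the small relative tolerance in the two-sided rule are choices made here, not part of the SAS program.

```python
from math import comb

def hypergeom_pmf(k, n, row1, col1):
    """P(N11 = k) given the table total n, first row total and first column total."""
    return comb(row1, k) * comb(n - row1, col1 - k) / comb(n, col1)

n, row1, col1 = 41, 23, 36   # n, n1+ and n+1 from the surgery/radiation table
observed = 21                # observed count in cell (1,1)

# Possible values of N11 given the fixed margins.
support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
pmf = {k: hypergeom_pmf(k, n, row1, col1) for k in support}

# (i) Ha: theta > 1 uses the upper tail starting at the observed count.
p_right = sum(p for k, p in pmf.items() if k >= observed)

# (ii) Ha: theta != 1 sums every outcome no more probable than the observed
# one; the tolerance guards against floating-point ties.
p_obs = pmf[observed]
p_two_sided = sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-9))

for k, p in pmf.items():
    print(f"{k}  {p:.4f}")
print(f"right-sided p = {p_right:.4f}, two-sided p = {p_two_sided:.4f}")
```

Only the value 20 is more probable than the observed 21, so the two-sided p-value is one minus P(N11 = 20).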