Tutorial 5 Answers - Individual.utoronto.ca

TUTORIAL 5
1. Determine the association between the number of physician visits by women during
the first trimester of pregnancy (FTV) and the risk of having a low weight baby.
A.
Reminder of our dataset:
proc freq; tables low*ftv; run;
The FREQ Procedure
Table of low by ftv

low       ftv

Frequency|
Percent  |
Row Pct  |
Col Pct  |      0 |      1 |      2 |      3 |      4 |      6 |  Total
---------+--------+--------+--------+--------+--------+--------+
0        |     64 |     36 |     23 |      3 |      3 |      1 |    130
         |  33.86 |  19.05 |  12.17 |   1.59 |   1.59 |   0.53 |  68.78
         |  49.23 |  27.69 |  17.69 |   2.31 |   2.31 |   0.77 |
         |  64.00 |  76.60 |  76.67 |  42.86 |  75.00 | 100.00 |
---------+--------+--------+--------+--------+--------+--------+
1        |     36 |     11 |      7 |      4 |      1 |      0 |     59
         |  19.05 |   5.82 |   3.70 |   2.12 |   0.53 |   0.00 |  31.22
         |  61.02 |  18.64 |  11.86 |   6.78 |   1.69 |   0.00 |
         |  36.00 |  23.40 |  23.33 |  57.14 |  25.00 |   0.00 |
---------+--------+--------+--------+--------+--------+--------+
Total         100       47       30        7        4        1      189
            52.91    24.87    15.87     3.70     2.12     0.53   100.00
In class – pooled data for women with 4 or more visits (modeled as a categorical
variable):
data x1;
  infile "c:\phd\ta\bwt.dat";
  input id low age lwt race smoke ptl ht ui ftv bwt;
  if ftv ge 4 then ftv = 4;  /* pool 4 or more visits into one category */
run;
proc logistic descending;
  class ftv (param = ref ref = '0');  /* reference coding, 0 visits as baseline */
  model low = ftv;
run;
The LOGISTIC Procedure

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio       5.6804    4       0.2243
Score                  5.7541    4       0.2183
Wald                   5.4946    4       0.2402

Type III Analysis of Effects
                  Wald
Effect   DF   Chi-Square   Pr > ChiSq
ftv       4       5.4946       0.2402

Analysis of Maximum Likelihood Estimates
                             Standard
Parameter     DF  Estimate      Error   Chi-Square   Pr > ChiSq
Intercept      1   -0.5754     0.2083       7.6273       0.0057
ftv       1    1   -0.6103     0.4026       2.2976       0.1296
ftv       2    1   -0.6142     0.4793       1.6422       0.2000
ftv       3    1    0.8630     0.7917       1.1885       0.2756
ftv       4    1   -0.8109     1.1373       0.5084       0.4758

Odds Ratio Estimates
                 Point         95% Wald
Effect         Estimate    Confidence Limits
ftv 1 vs 0        0.543      0.247     1.196
ftv 2 vs 0        0.541      0.211     1.384
ftv 3 vs 0        2.370      0.502    11.186
ftv 4 vs 0        0.444      0.048     4.129
Now refit the model without pooling, again treating FTV as a categorical variable:
data x1;
  infile "c:\phd\ta\bwt.dat";
  input id low age lwt race smoke ptl ht ui ftv bwt;
run;
proc logistic descending;
  class ftv (param = ref ref = '0');  /* reference coding, 0 visits as baseline */
  model low = ftv;
run;
The LOGISTIC Procedure

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio       6.1858    5       0.2886
Score                  5.9870    5       0.3075
Wald                   5.2697    5       0.3839

Type III Analysis of Effects
                  Wald
Effect   DF   Chi-Square   Pr > ChiSq
ftv       5       5.2697       0.3839

Analysis of Maximum Likelihood Estimates
                             Standard
Parameter     DF  Estimate      Error   Chi-Square   Pr > ChiSq
Intercept      1   -0.5754     0.2083       7.6273       0.0057
ftv       1    1   -0.6103     0.4026       2.2976       0.1296
ftv       2    1   -0.6142     0.4793       1.6422       0.2000
ftv       3    1    0.8630     0.7917       1.1885       0.2756
ftv       4    1   -0.5232     1.1733       0.1989       0.6556
ftv       6    1  -13.8292     1342.5       0.0001       0.9918

Odds Ratio Estimates
                 Point         95% Wald
Effect         Estimate    Confidence Limits
ftv 1 vs 0        0.543      0.247     1.196
ftv 2 vs 0        0.541      0.211     1.384
ftv 3 vs 0        2.370      0.502    11.186
ftv 4 vs 0        0.593      0.059     5.909
ftv 6 vs 0       <0.001     <0.001  >999.999
Model FTV as a quantitative variable:
proc logistic descending; model low = ftv; run;
The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates
                          Standard
Parameter  DF  Estimate     Error   Chi-Square   Pr > ChiSq
Intercept   1   -0.6868    0.1948      12.4277       0.0004
ftv         1   -0.1351    0.1567       0.7432       0.3886

Odds Ratio Estimates
           Point         95% Wald
Effect   Estimate    Confidence Limits
ftv         0.874      0.643     1.188
CONCLUSIONS:
If a logistic model is fitted with dummy variables and without pooling when one
category has a zero cell count, proc logistic cannot produce a usable odds ratio for
that category: the estimate diverges (here -13.8292 for FTV = 6) and its standard
error is very large (1342.5). The validity of the resulting tests is then in question.
Modelling FTV as a quantitative variable, provided the relationship is linear on the
logit scale, avoids this problem and gives a more parsimonious model (one slope
instead of five dummy-variable coefficients).
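
The estimates for a model with a single categorical predictor can be checked by hand, because each coefficient is just the log odds ratio computed from the frequency table. The sketch below uses Python rather than SAS purely for the arithmetic, and the function name `odds_ratio_vs_reference` is illustrative, not part of any SAS output. It reproduces the pooled "4 or more visits" estimate and shows why the unpooled FTV = 6 category, which has no low weight births, has no finite estimate.

```python
import math

# Cell counts from the PROC FREQ table: ftv -> (low = 1, low = 0)
counts = {0: (36, 64), 1: (11, 36), 2: (7, 23), 3: (4, 3), 4: (1, 3), 6: (0, 1)}


def odds_ratio_vs_reference(cases, noncases, ref_cases, ref_noncases):
    """Return (log OR, Wald SE, OR) for one category against the reference.

    Returns None when any cell is zero, because the log odds ratio is then
    infinite and the usual standard error formula is undefined.
    """
    cells = (cases, noncases, ref_cases, ref_noncases)
    if min(cells) == 0:
        return None
    log_or = math.log((cases / noncases) / (ref_cases / ref_noncases))
    se = math.sqrt(sum(1 / c for c in cells))
    return log_or, se, math.exp(log_or)


ref = counts[0]

# Pool ftv = 4 and ftv = 6 into a single "4 or more" category.
pooled = (counts[4][0] + counts[6][0], counts[4][1] + counts[6][1])

log_or, se, or_ = odds_ratio_vs_reference(*pooled, *ref)
print(f"4+ vs 0: estimate {log_or:.4f}, SE {se:.4f}, OR {or_:.3f}")

# Without pooling, ftv = 6 has a zero cell, so no finite estimate exists.
unpooled_6 = odds_ratio_vs_reference(*counts[6], *ref)
print("6 vs 0:", unpooled_6)
```

The pooled values agree with the SAS output for the pooled model (estimate -0.8109, standard error 1.1373, odds ratio 0.444). For FTV = 6 the function returns `None`, which corresponds to the runaway estimate and standard error that proc logistic reports.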
2. Use the dummy variable scheme in Table 1 to model the association
between the number of physician visits and the risk of having a low weight
infant.
data x1; infile "c:\phd\ta\bwt.dat";
input id low age lwt race smoke ptl ht ui ftv bwt;
D1 = 0; D2 = 0; D3 = 0; D4 = 0;
if ftv eq 1 then D1 = 1;
if ftv eq 2 then D1 = 1;
if ftv eq 2 then D2 = 1;
if ftv eq 3 then D1 = 1;
if ftv eq 3 then D2 = 1;
if ftv eq 3 then D3 = 1;
if ftv ge 4 then D1 = 1;
if ftv ge 4 then D2 = 1;
if ftv ge 4 then D3 = 1;
if ftv ge 4 then D4 = 1;
;
proc logistic descending;
model low = D1 D2 D3 D4; run;
The LOGISTIC Procedure

Model Fit Statistics
                              Intercept
               Intercept         and
Criterion        Only        Covariates
AIC            236.672        238.992
SC             239.914        255.200
-2 Log L       234.672        228.992

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio       5.6804    4       0.2243
Score                  5.7541    4       0.2183
Wald                   5.4946    4       0.2402

Type III Analysis of Effects
                  Wald
Effect   DF   Chi-Square   Pr > ChiSq
D1        1       2.2976       0.1296
D2        1       0.0001       0.9943
D3        1       2.8354       0.0922
D4        1       1.5285       0.2163

Analysis of Maximum Likelihood Estimates
                             Standard
Parameter     DF  Estimate      Error   Chi-Square   Pr > ChiSq
Intercept      1   -0.5754     0.2083       7.6273       0.0057
D1        1    1   -0.6103     0.4026       2.2976       0.1296
D2        1    1  -0.00396     0.5523       0.0001       0.9943
D3        1    1    1.4773     0.8773       2.8354       0.0922
D4        1    1   -1.6740     1.3540       1.5285       0.2163

Odds Ratio Estimates
                 Point         95% Wald
Effect         Estimate    Confidence Limits
D1 1 vs 0         0.543      0.247     1.196
D2 1 vs 0         0.996      0.337     2.940
D3 1 vs 0         4.381      0.785    24.453
D4 1 vs 0         0.188      0.013     2.664
CONCLUSIONS:
With this coding, each regression coefficient compares the j’th group with the
(j-1)’th group, rather than comparing every group with the same reference category.
D1: log odds ratio of having a low weight baby comparing individuals with 1
physician visit during the first trimester to individuals with 0 visits.
D2: log odds ratio of having a low weight baby comparing individuals with 2
physician visits during the first trimester to individuals with 1 visit.
D3: log odds ratio of having a low weight baby comparing individuals with 3
physician visits during the first trimester to individuals with 2 visits.
D4: log odds ratio of having a low weight baby comparing individuals with 4 or
more physician visits during the first trimester to individuals with 3 visits.
Advantage: a trend is easier to observe, i.e., an incremental increase or decrease
in the odds ratios across successive categories.
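
The successive-category interpretation can be checked directly from the frequency table. The following is a minimal sketch in Python (used only for the arithmetic; the variable names are illustrative), with FTV = 4 and FTV = 6 pooled as in the data step. It computes each coefficient as the log of the ratio of adjacent odds, then accumulates the coefficients to recover the comparisons against 0 visits.

```python
import math

# Cell counts (low = 1, low = 0) for 0, 1, 2, 3 and 4+ visits (4 and 6 pooled).
counts = [(36, 64), (11, 36), (7, 23), (4, 3), (1, 4)]


def odds(cases, noncases):
    """Odds of a low weight baby within one category."""
    return cases / noncases


# Each coefficient D_j compares category j with category j - 1.
successive_log_or = [
    math.log(odds(*counts[j]) / odds(*counts[j - 1]))
    for j in range(1, len(counts))
]

for j, b in enumerate(successive_log_or, start=1):
    print(f"D{j}: estimate {b:.4f}, odds ratio {math.exp(b):.3f}")

# Summing D1..Dj recovers the log odds ratio of category j versus 0 visits.
cumulative = 0.0
versus_zero = []
for j, b in enumerate(successive_log_or, start=1):
    cumulative += b
    versus_zero.append(math.exp(cumulative))
    print(f"{j} vs 0: odds ratio {versus_zero[-1]:.3f}")
```

The successive estimates agree, to rounding, with the D1 to D4 coefficients reported above, and the cumulative odds ratios agree with the reference-coded estimates from the pooled model in question 1. The two parameterisations describe the same fitted model; only the comparisons being reported differ.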
3. Determine if pre-treatment PSA scores are associated with whether or not a
subject’s cancer is pathologically organ confined.
LOGISTIC REGRESSION:
data x1; input id poc psa;
cards;
1 0 5.0
2 0 6.1
3 0 6.1
4 0 7.5
5 0 8.2
6 0 8.7
7 0 9.5
8 0 11.0
9 0 11.0
10 0 13.7
11 0 14.1
12 0 16.0
13 0 20.0
14 0 21.4
15 0 23.0
16 0 30.0
17 1 2.0
18 1 3.3
19 1 4.8
20 1 5.4
21 1 5.7
;
proc logistic descending; model poc = psa; run;
The LOGISTIC Procedure

Model Fit Statistics
                              Intercept
               Intercept         and
Criterion        Only        Covariates
AIC             25.053         11.208
SC              26.097         13.297
-2 Log L        23.053          7.208

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio      15.8447    1       <.0001
Score                  5.9234    1       0.0149
Wald                   1.8047    1       0.1791

Analysis of Maximum Likelihood Estimates
                          Standard
Parameter  DF  Estimate     Error   Chi-Square   Pr > ChiSq
Intercept   1   11.8528    8.9166       1.7670       0.1838
psa         1   -2.1477    1.5987       1.8047       0.1791

Odds Ratio Estimates
           Point         95% Wald
Effect   Estimate    Confidence Limits
psa         0.117      0.005     2.680

TWO-SAMPLE T TEST:
proc ttest; class poc; var psa; run;
The TTEST Procedure
Statistics
                              Lower CL              Upper CL   Lower CL
Variable  Class          N        Mean      Mean        Mean    Std Dev   Std Dev
psa       0             16      9.3889    13.206      17.024      5.292    7.1639
psa       1              5      2.3072      4.24      6.1728     0.9326    1.5566
psa       Diff (1-2)            2.0975    8.9663      15.835     4.8711    6.4053

Statistics
                        Upper CL
Variable  Class          Std Dev   Std Err   Minimum   Maximum
psa       0               11.088     1.791         5        30
psa       1                4.473    0.6961         2       5.7
psa       Diff (1-2)      9.3554    3.2817

T-Tests
Variable  Method          Variances     DF   t Value   Pr > |t|
psa       Pooled          Equal         19      2.73     0.0132
psa       Satterthwaite   Unequal     18.3      4.67     0.0002
MANN-WHITNEY U-TEST (WILCOXON RANK-SUM TEST):
proc npar1way; class poc; var psa; run;
The NPAR1WAY Procedure

Wilcoxon Scores (Rank Sums) for Variable psa
Classified by Variable poc

               Sum of    Expected      Std Dev       Mean
poc      N     Scores    Under H0     Under H0      Score
----------------------------------------------------------
0       16      214.0       176.0    12.102735    13.3750
1        5       17.0        55.0    12.102735     3.4000
Average scores were used for ties.

Wilcoxon Two-Sample Test
Statistic                 17.0000

Normal Approximation
Z                         -3.0985
One-Sided Pr < Z           0.0010
Two-Sided Pr > |Z|         0.0019

t Approximation
One-Sided Pr < Z           0.0028
Two-Sided Pr > |Z|         0.0057

Z includes a continuity correction of 0.5.
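
The rank-sum statistic, its null standard deviation (with the correction for ties) and the continuity-corrected Z can be reproduced by hand from the 21 PSA values. The sketch below uses Python only for the arithmetic; the helper `midranks` and the variable names are illustrative, and the tie-corrected variance is the standard textbook formula rather than anything quoted from the SAS documentation.

```python
import math

# PSA values from the data step, split by poc.
psa_poc0 = [5.0, 6.1, 6.1, 7.5, 8.2, 8.7, 9.5, 11.0, 11.0, 13.7,
            14.1, 16.0, 20.0, 21.4, 23.0, 30.0]
psa_poc1 = [2.0, 3.3, 4.8, 5.4, 5.7]


def midranks(values):
    """Map each distinct value to its average rank (ties share a midrank)."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    return ranks


pooled = psa_poc0 + psa_poc1
ranks = midranks(pooled)
n0, n1, n = len(psa_poc0), len(psa_poc1), len(pooled)

# Rank sum for the smaller group (poc = 1) and its expectation under H0.
statistic = sum(ranks[v] for v in psa_poc1)
expected = n1 * (n + 1) / 2

# Null variance with the usual correction for tied values.
tie_sizes = [pooled.count(v) for v in set(pooled)]
tie_term = sum(t**3 - t for t in tie_sizes)
variance = n0 * n1 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
std_dev = math.sqrt(variance)

# Continuity correction: move the statistic 0.5 towards its expectation.
correction = 0.5 if statistic < expected else -0.5
z = (statistic - expected + correction) / std_dev
two_sided_p = math.erfc(abs(z) / math.sqrt(2))

print(statistic, expected, round(std_dev, 6), round(z, 4), round(two_sided_p, 4))
```

The rank sum of 17 arises because the five organ-confined subjects hold ranks 1, 2, 3, 5 and 6 among all 21 PSA values; only one subject from the other group (PSA 5.0, rank 4) falls among them.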
CONCLUSIONS:
Pre-treatment PSA scores are negatively associated with organ confinement of the cancer: higher PSA scores
suggest that the cancer is not pathologically organ confined, while lower PSA scores suggest that it is.
This study has a relatively small sample size (5 individuals whose cancer is pathologically organ confined
and 16 individuals whose cancer is not) and the estimated effect is large. The Wald test suggests no
association (p = 0.1791), but it is not as robust as the other tests: the likelihood ratio test (p < .0001),
the score test (p = 0.0149), the t tests and the Wilcoxon rank-sum test all indicate an association. See
Hosmer and Lemeshow (p. 16), who refer to Hauck and Donner (1977). Those authors examined the performance
of the Wald test and found that it can behave in an aberrant manner when the sample size is small and the
estimated odds ratio is far from 1, often failing to reject the null hypothesis when the coefficient was
significant. They recommend that the likelihood ratio test be used instead.
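
The disagreement between the two tests can be seen from the reported values alone. The sketch below (Python, used only for the arithmetic; `chi2_sf_1df` is an illustrative helper) recomputes the likelihood ratio statistic as the drop in -2 Log L and the Wald statistic as the squared ratio of the estimate to its standard error, each referred to a chi-square distribution with 1 degree of freedom.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


# Values reported by PROC LOGISTIC for the model poc = psa.
neg2logl_intercept_only = 23.053
neg2logl_with_psa = 7.208
estimate, std_error = -2.1477, 1.5987

# Likelihood ratio: improvement in fit from adding psa.
lr_chi2 = neg2logl_intercept_only - neg2logl_with_psa
lr_p = chi2_sf_1df(lr_chi2)

# Wald: squared estimate relative to its (here very large) standard error.
wald_chi2 = (estimate / std_error) ** 2
wald_p = chi2_sf_1df(wald_chi2)

print(f"LR   chi-square {lr_chi2:.3f}, p = {lr_p:.5f}")
print(f"Wald chi-square {wald_chi2:.4f}, p = {wald_p:.4f}")
```

The Wald statistic is small not because the estimate is close to zero but because the standard error is inflated when the fitted model separates the two groups almost perfectly. The likelihood ratio statistic is not affected in the same way.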