TUTORIAL 5

1. Determine association between the number of physician visits by women during the first trimester of pregnancy (FTV) and the risk of having a low weight baby

A. Reminder of our dataset:

    proc freq;
      tables low*ftv;
    run;

    The FREQ Procedure

    Table of low by ftv

    low       ftv

    Frequency|
    Percent  |
    Row Pct  |
    Col Pct  |       0|       1|       2|       3|       4|       6|  Total
    ---------+--------+--------+--------+--------+--------+--------+
            0|     64 |     36 |     23 |      3 |      3 |      1 |    130
             |  33.86 |  19.05 |  12.17 |   1.59 |   1.59 |   0.53 |  68.78
             |  49.23 |  27.69 |  17.69 |   2.31 |   2.31 |   0.77 |
             |  64.00 |  76.60 |  76.67 |  42.86 |  75.00 | 100.00 |
    ---------+--------+--------+--------+--------+--------+--------+
            1|     36 |     11 |      7 |      4 |      1 |      0 |     59
             |  19.05 |   5.82 |   3.70 |   2.12 |   0.53 |   0.00 |  31.22
             |  61.02 |  18.64 |  11.86 |   6.78 |   1.69 |   0.00 |
             |  36.00 |  23.40 |  23.33 |  57.14 |  25.00 |   0.00 |
    ---------+--------+--------+--------+--------+--------+--------+
    Total         100       47       30        7        4        1      189
                52.91    24.87    15.87     3.70     2.12     0.53   100.00

In class: pooled data for women with 4 or more visits (modeled as a categorical variable):

    data x1;
      infile "c:\phd\ta\bwt.dat";
      input id low age lwt race smoke ptl ht ui ftv bwt;
      if ftv ge 4 then ftv=4;
    run;

    proc logistic descending;
      class ftv (param = ref ref = '0');
      model low = ftv;
    run;

    The LOGISTIC Procedure

    Testing Global Null Hypothesis: BETA=0

    Test                Chi-Square    DF    Pr > ChiSq
    Likelihood Ratio        5.6804     4        0.2243
    Score                   5.7541     4        0.2183
    Wald                    5.4946     4        0.2402

    Type III Analysis of Effects

    Effect    DF    Wald Chi-Square    Pr > ChiSq
    ftv        4             5.4946        0.2402

    Analysis of Maximum Likelihood Estimates

                                  Standard
    Parameter     DF   Estimate      Error   Chi-Square   Pr > ChiSq
    Intercept      1    -0.5754     0.2083       7.6273       0.0057
    ftv       1    1    -0.6103     0.4026       2.2976       0.1296
    ftv       2    1    -0.6142     0.4793       1.6422       0.2000
    ftv       3    1     0.8630     0.7917       1.1885       0.2756
    ftv       4    1    -0.8109     1.1373       0.5084       0.4758

    Odds Ratio Estimates

                      Point        95% Wald
    Effect         Estimate    Confidence Limits
    ftv 1 vs 0        0.543      0.247     1.196
    ftv 2 vs 0        0.541      0.211     1.384
    ftv 3 vs 0        2.370      0.502    11.186
    ftv 4 vs 0        0.444      0.048     4.129

Now, refit
the data without pooling and model FTV as a categorical variable:

    data x1;
      infile "c:\phd\ta\bwt.dat";
      input id low age lwt race smoke ptl ht ui ftv bwt;
    run;

    proc logistic descending;
      class ftv (param = ref ref = '0');
      model low = ftv;
    run;

    The LOGISTIC Procedure

    Testing Global Null Hypothesis: BETA=0

    Test                Chi-Square    DF    Pr > ChiSq
    Likelihood Ratio        6.1858     5        0.2886
    Score                   5.9870     5        0.3075
    Wald                    5.2697     5        0.3839

    Type III Analysis of Effects

    Effect    DF    Wald Chi-Square    Pr > ChiSq
    ftv        5             5.2697        0.3839

    Analysis of Maximum Likelihood Estimates

                                  Standard
    Parameter     DF   Estimate      Error   Chi-Square   Pr > ChiSq
    Intercept      1    -0.5754     0.2083       7.6273       0.0057
    ftv       1    1    -0.6103     0.4026       2.2976       0.1296
    ftv       2    1    -0.6142     0.4793       1.6422       0.2000
    ftv       3    1     0.8630     0.7917       1.1885       0.2756
    ftv       4    1    -0.5232     1.1733       0.1989       0.6556
    ftv       6    1   -13.8292     1342.5       0.0001       0.9918

    Odds Ratio Estimates

                      Point        95% Wald
    Effect         Estimate    Confidence Limits
    ftv 1 vs 0        0.543      0.247     1.196
    ftv 2 vs 0        0.541      0.211     1.384
    ftv 3 vs 0        2.370      0.502    11.186
    ftv 4 vs 0        0.593      0.059     5.909
    ftv 6 vs 0       <0.001     <0.001  >999.999

Model FTV as a quantitative variable:

    proc logistic descending;
      model low = ftv;
    run;

    The LOGISTIC Procedure

    Analysis of Maximum Likelihood Estimates

                                  Standard
    Parameter     DF   Estimate      Error   Chi-Square   Pr > ChiSq
    Intercept      1    -0.6868     0.1948      12.4277       0.0004
    ftv            1    -0.1351     0.1567       0.7432       0.3886

    Odds Ratio Estimates

                      Point        95% Wald
    Effect         Estimate    Confidence Limits
    ftv               0.874      0.643     1.188

CONCLUSIONS:

If you fit a logistic model with dummy variables and do not pool categories when a cell count is zero, proc logistic cannot produce a usable odds ratio for that category. Here the only woman with ftv = 6 did not have a low weight baby, so the estimate for ftv 6 runs off toward minus infinity (-13.8292) and its standard error is very large (1342.5). The validity of the resulting tests is then in question.

Modeling FTV as a quantitative variable, assuming linearity on the logit scale, avoids the zero-cell problem and gives a more parsimonious model (one slope instead of five dummy coefficients).

2. Use the dummy variable scheme in Table 1 to model the association between the number of physician visits and the risk of having a low weight infant.
    data x1;
      infile "c:\phd\ta\bwt.dat";
      input id low age lwt race smoke ptl ht ui ftv bwt;
      D1 = 0; D2 = 0; D3 = 0; D4 = 0;
      if ftv eq 1 then D1 = 1;
      if ftv eq 2 then D1 = 1;
      if ftv eq 2 then D2 = 1;
      if ftv eq 3 then D1 = 1;
      if ftv eq 3 then D2 = 1;
      if ftv eq 3 then D3 = 1;
      if ftv ge 4 then D1 = 1;
      if ftv ge 4 then D2 = 1;
      if ftv ge 4 then D3 = 1;
      if ftv ge 4 then D4 = 1;
    run;

    proc logistic descending;
      model low = D1 D2 D3 D4;
    run;

    The LOGISTIC Procedure

    Model Fit Statistics

                     Intercept    Intercept and
    Criterion             Only       Covariates
    AIC                236.672          238.992
    SC                 239.914          255.200
    -2 Log L           234.672          228.992

    Testing Global Null Hypothesis: BETA=0

    Test                Chi-Square    DF    Pr > ChiSq
    Likelihood Ratio        5.6804     4        0.2243
    Score                   5.7541     4        0.2183
    Wald                    5.4946     4        0.2402

    Type III Analysis of Effects

    Effect    DF    Wald Chi-Square    Pr > ChiSq
    D1         1             2.2976        0.1296
    D2         1             0.0001        0.9943
    D3         1             2.8354        0.0922
    D4         1             1.5285        0.2163

    Analysis of Maximum Likelihood Estimates

                                  Standard
    Parameter     DF   Estimate      Error   Chi-Square   Pr > ChiSq
    Intercept      1    -0.5754     0.2083       7.6273       0.0057
    D1        1    1    -0.6103     0.4026       2.2976       0.1296
    D2        1    1   -0.00396     0.5523       0.0001       0.9943
    D3        1    1     1.4773     0.8773       2.8354       0.0922
    D4        1    1    -1.6740     1.3540       1.5285       0.2163

    Odds Ratio Estimates

                      Point        95% Wald
    Effect         Estimate    Confidence Limits
    D1 1 vs 0         0.543      0.247     1.196
    D2 1 vs 0         0.996      0.337     2.940
    D3 1 vs 0         4.381      0.785    24.453
    D4 1 vs 0         0.188      0.013     2.664

CONCLUSIONS:

Each regression coefficient compares the j-th group with the (j-1)-th group, rather than comparing every group with the same reference category.

D1: log odds ratio of having a low weight baby comparing individuals with 1 physician visit during the first trimester to individuals with 0 visits.

D2: log odds ratio of having a low weight baby comparing individuals with 2 physician visits during the first trimester to individuals with 1 visit.

D3: log odds ratio of having a low weight baby comparing individuals with 3 physician visits during the first trimester to individuals with 2 visits.
D4: log odds ratio of having a low weight baby comparing individuals with 4 or more physician visits during the first trimester to individuals with 3 visits.

Advantage: it is easier to observe a trend, i.e., an incremental increase or decrease in the odds ratios.

3. Determine if pre-treatment PSA scores are associated with whether or not a subject's cancer is pathologically organ confined.

LOGISTIC REGRESSION:

    data x1;
      input id poc psa;
      cards;
     1 0  5.0
     2 0  6.1
     3 0  6.1
     4 0  7.5
     5 0  8.2
     6 0  8.7
     7 0  9.5
     8 0 11.0
     9 0 11.0
    10 0 13.7
    11 0 14.1
    12 0 16.0
    13 0 20.0
    14 0 21.4
    15 0 23.0
    16 0 30.0
    17 1  2.0
    18 1  3.3
    19 1  4.8
    20 1  5.4
    21 1  5.7
    ;

    proc logistic descending;
      model poc=psa;
    run;

    The LOGISTIC Procedure

    Model Fit Statistics

                     Intercept    Intercept and
    Criterion             Only       Covariates
    AIC                 25.053           11.208
    SC                  26.097           13.297
    -2 Log L            23.053            7.208

    Testing Global Null Hypothesis: BETA=0

    Test                Chi-Square    DF    Pr > ChiSq
    Likelihood Ratio       15.8447     1        <.0001
    Score                   5.9234     1        0.0149
    Wald                    1.8047     1        0.1791

    Analysis of Maximum Likelihood Estimates

                                  Standard
    Parameter     DF   Estimate      Error   Chi-Square   Pr > ChiSq
    Intercept      1    11.8528     8.9166       1.7670       0.1838
    psa            1    -2.1477     1.5987       1.8047       0.1791

    Odds Ratio Estimates

                      Point        95% Wald
    Effect         Estimate    Confidence Limits
    psa               0.117      0.005     2.680

TWO-SAMPLE T TEST:

    proc ttest;
      class poc;
      var psa;
    run;

    The TTEST Procedure

    Statistics

                                Lower CL              Upper CL    Lower CL
    Variable  Class         N       Mean      Mean        Mean     Std Dev   Std Dev
    psa       0            16     9.3889    13.206      17.024       5.292    7.1639
    psa       1             5     2.3072      4.24      6.1728      0.9326    1.5566
    psa       Diff (1-2)          2.0975    8.9663      15.835      4.8711    6.4053

    Statistics

                            Upper CL
    Variable  Class          Std Dev   Std Err   Minimum   Maximum
    psa       0               11.088     1.791         5        30
    psa       1                4.473    0.6961         2       5.7
    psa       Diff (1-2)      9.3554    3.2817

    T-Tests

    Variable  Method          Variances      DF    t Value    Pr > |t|
    psa       Pooled          Equal          19       2.73      0.0132
    psa       Satterthwaite   Unequal      18.3       4.67      0.0002

MANN-WHITNEY U-TEST (WILCOXON RANK-SUM TEST):

    proc npar1way;
      class poc;
      var psa;
    run;

    The NPAR1WAY Procedure

    Wilcoxon Scores (Rank Sums) for Variable psa
    Classified by Variable poc

                    Sum of     Expected      Std Dev        Mean
    poc      N      Scores     Under H0     Under H0       Score
    ------------------------------------------------------------
    0       16       214.0        176.0    12.102735     13.3750
    1        5        17.0         55.0    12.102735      3.4000

    Average scores were used for ties.

    Wilcoxon Two-Sample Test

    Statistic                 17.0000

    Normal Approximation
    Z                         -3.0985
    One-Sided Pr <  Z          0.0010
    Two-Sided Pr > |Z|         0.0019

    t Approximation
    One-Sided Pr <  Z          0.0028
    Two-Sided Pr > |Z|         0.0057

    Z includes a continuity correction of 0.5.

CONCLUSIONS:

Pre-treatment PSA scores are negatively associated with organ confinement of the cancer. Higher PSA scores suggest that the cancer is not pathologically organ confined, while lower PSA scores suggest that it is.

This study has a relatively small sample size (5 individuals whose cancer is pathologically organ confined and 16 whose cancer is not), and the size of the effect is large. The Wald test suggests no association (p = 0.1791), but it is not as robust a test as the others: the likelihood ratio, score, t, and Wilcoxon tests all reject the null hypothesis at the 0.05 level.

See Hosmer and Lemeshow (p 16), which refers to Hauck and Donner (1977). Hauck and Donner examined the performance of the Wald test and found that it can behave in an aberrant manner when the sample size is small and the estimated odds ratio is far from 1, often failing to reject the null hypothesis when the coefficient was significant. They recommend that the likelihood ratio test be used.
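The disagreement between the Wald and likelihood ratio tests for psa can be checked by hand from the proc logistic output in part 3. The following is a minimal sketch and not part of the original tutorial: a data _null_ step that recomputes both statistics from the printed estimate, standard error, and -2 Log L values. The variable names (beta, se, wald, lr, p_wald, p_lr) are illustrative assumptions; probchi is the SAS chi-square distribution function.

```sas
/* Sketch: recompute the Wald and likelihood ratio statistics for psa */
/* from numbers printed in the PROC LOGISTIC output (part 3).         */
/* Variable names are illustrative, not from the original tutorial.   */
data _null_;
  /* Wald statistic: (estimate / standard error)**2, 1 df */
  beta   = -2.1477;
  se     = 1.5987;
  wald   = (beta / se)**2;
  p_wald = 1 - probchi(wald, 1);

  /* Likelihood ratio statistic: drop in -2 Log L, 1 df */
  lr   = 23.053 - 7.208;
  p_lr = 1 - probchi(lr, 1);

  /* Write both statistics and p-values to the log */
  put wald= p_wald=;
  put lr= p_lr=;
run;
```

Up to rounding of the printed inputs, the two statistics should land close to the 1.8047 and 15.8447 reported by proc logistic. The Wald statistic is small because the standard error is large relative to the estimate, while the likelihood ratio statistic compares the fit of the two models directly.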