Lesson 11 - Topics
• Statistical procedures: PROC LOGISTIC, REG, LIFETEST, & PHREG
• Multiple logistic and linear regression
• Life-table plots and Cox regression
• Programs 21-22

Logistic Regression
Model a binary factor (yes/no) as a function of one or more independent variables.
TOMHS Example: Smoking as a function of age, gender, race, and education

    DATA stat ;
      INFILE '~/SAS_Files/tomhsfull.data' ;
      INPUT @1   ptid    $10.
            @27  age      2.
            @30  sex      1.
            @32  race     1.
            @49  educ     1.
            @51  eversmk  1.
            @53  nowsmk   1.
            @180 energy   5. ;
      * Indicator (0/1) variables for the models ;
      if race = 2 then aa = 1; else aa = 0;
      if sex = 2 then women = 1; else women = 0;
      if educ in(1,2,3,4,5,6) then collgrad = 0; else
      if educ in(7,8,9) then collgrad = 1;
      * Never-smokers skip the second question, so set them to "no" ;
      if eversmk = 2 then currsmk = 2; else currsmk = nowsmk;

Questionnaire items behind currsmk:
• Did you ever smoke cigarettes? 1 = yes, 2 = no (Var: eversmk)
• Do you now smoke cigarettes? 1 = yes, 2 = no (Var: nowsmk)
Note: The second question is answered only if the first question is answered yes.

    PROC MEANS;
      VAR age women collgrad aa;
      CLASS currsmk;
    RUN;

               N
    currsmk  Obs  Variable     N     Mean
    -------------------------------------
          1   98  age         98    52.31
                  women       98     0.44
                  collgrad    98     0.23
                  aa          98     0.45
          2  801  age        801    55.08
                  women      801     0.38
                  collgrad   799     0.38
                  aa         801     0.17
    -------------------------------------

    ODS SELECT ParameterEstimates OddsRatios ;
    * By default PROC LOGISTIC models the probability of the lowest ;
    * response level, here currsmk = 1 (current smoker) ;
    PROC LOGISTIC;
      MODEL currsmk = age women collgrad aa ;
    RUN;

    Analysis of Maximum Likelihood Estimates

                              Standard       Wald
    Parameter  DF  Estimate      Error  Chi-Square  Pr > ChiSq
    Intercept   1    1.7422     1.0235      2.8976      0.0887
    age         1   -0.0732     0.0189     15.0704      0.0001
    women       1   -0.2367     0.2407      0.9672      0.3254
    collgrad    1   -0.6866     0.2618      6.8805      0.0087
    aa          1    1.3394     0.2416     30.7354      <.0001

    Odds Ratio Estimates

                 Point      95% Wald
    Effect    Estimate  Confidence Limits
    age          0.929    0.896    0.964
    women        0.789    0.492    1.265
    collgrad     0.503    0.301    0.841
    aa           3.817    2.377    6.128

OR = exp(estimate)
OR (age) = exp(-0.07) = 0.93

Comparison of univariate versus multivariate results

    Multivariate
                              Standard       Wald
    Parameter  DF  Estimate      Error  Chi-Square  Pr > ChiSq
    Intercept   1    1.7422     1.0235      2.8976      0.0887
    age         1   -0.0732     0.0189     15.0704      0.0001
    women       1   -0.2367     0.2407      0.9672      0.3254
    collgrad    1   -0.6866     0.2618      6.8805      0.0087
    aa          1    1.3394     0.2416     30.7354      <.0001

    Univariate (Separate regression runs)
                              Standard       Wald
    Parameter  DF  Estimate      Error  Chi-Square  Pr > ChiSq
    Intercept   1    1.7422     1.0235      2.8976      0.0887
    age         1   -0.0736     0.0185     15.8221      <.0001
    women       1    0.2561     0.2162      1.4026      0.2363
    collgrad    1   -0.6945     0.2492      7.7635      0.0053
    aa          1    1.4091     0.2242     39.5071      <.0001

(The Intercept row repeats the multivariate values; each separate run has its own intercept.)

Note: The women coefficient changes sign after adjustment. Women are more likely than men to be AA in TOMHS, and AA participants are more likely to be smokers.

Linear Regression
• Model a continuous factor as a function of one or more independent variables.
• TOMHS Example: Energy (calories) intake as a function of age, gender, race, and education

    ODS SELECT ParameterEstimates ;
    PROC REG;
      MODEL energy = age women collgrad aa ;
    RUN;

    The REG Procedure
    Model: MODEL1
    Dependent Variable: energy

    Parameter Estimates

                    Parameter    Standard
    Variable   DF    Estimate       Error  t Value  Pr > |t|
    Intercept   1  3574.78842   184.91689    19.33    <.0001
    age         1   -20.67969     3.25993    -6.34    <.0001
    women       1  -570.45804    44.34733   -12.86    <.0001
    collgrad    1  -109.19062    44.01230    -2.48    0.0133
    aa          1  -253.62159    54.07279    -4.69    <.0001

Energy = 3575 - 21*age - 570*women - 109*collgrad - 254*aa

    Multivariate Analysis
                    Parameter    Standard
    Variable   DF    Estimate       Error  t Value  Pr > |t|
    age         1   -20.67969     3.25993    -6.34    <.0001
    women       1  -570.45804    44.34733   -12.86    <.0001
    collgrad    1  -109.19062    44.01230    -2.48    0.0133
    aa          1  -253.62159    54.07279    -4.69    <.0001

    Univariate Analysis (Separate regression runs)
                    Parameter    Standard
    Variable   DF    Estimate       Error  t Value  Pr > |t|
    age         1   -17.1154      3.60184    -4.75    <.0001
    women       1  -595.40078    43.74189   -13.61    <.0001
    collgrad    1    41.21749    48.61549     0.85    0.3968
    aa          1  -388.19448    57.32940    -6.77    <.0001

Note: The collgrad coefficient changes sign after adjustment. Women are less likely to be college graduates and also have lower caloric intake.
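To make the fitted multivariate equation for energy concrete, the worked example below plugs in one hypothetical participant: a 50-year-old woman who is not African American and not a college graduate. The profile is an illustrative assumption, not a TOMHS record, and the coefficients are the REG estimates rounded to two decimals.

```latex
\begin{align*}
\widehat{\text{Energy}} &= 3574.79 - 20.68(50) - 570.46(1) - 109.19(0) - 253.62(0) \\
                        &= 3574.79 - 1034.00 - 570.46 \\
                        &\approx 1970 \text{ calories}
\end{align*}
```

The same profile for a man gives about 2541 calories; the difference of roughly 570 calories is the women coefficient.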
Mean energy intake by gender, race, and education:

    PROC MEANS;
      VAR energy;
      CLASS women aa collgrad;
    RUN;

    Analysis Variable : energy

                             N
    women  aa  collgrad    Obs     N       Mean
    -------------------------------------------
        0   0         0    277   276   2445.043
        0   0         1    213   213   2338.319
        0   1         0     42    42   2141.714
        0   1         1     23    23   1992.261
        1   0         0    162   162   1795.938
        1   0         1     71    71   1853.366
        1   1         0     92    92   1694.196
        1   1         1     20    20   1532.300

Time to Event Analyses - Framework
• Each patient has an event indicator (1=yes, 0=no)
• Each patient has a follow-up time
  – Time from entry into study until time of event
  – Time from entry into study until time patient is no longer followed (end of study, lost to follow-up, or death)
For each person there is a time zero at which the person becomes at risk for the event of interest.

Kaplan-Meier Life Curves
PROGRAM 22

    DATA lifetable;
      INFILE 'C:\SAS_Files\endpoint.csv' DSD FIRSTOBS=2;
      INPUT ptid $ age allcvd tallcvd active;
      LABEL active  = 'Treatment Group';
      LABEL tallcvd = 'Follow-up Time in Years';

    PROC PRINT DATA=lifetable (OBS=20);
      TITLE 'First 20 Obs of Dataset Lifetable';
    RUN;

    First Observations of Dataset Lifetable

    Obs  ptid    age  allcvd  tallcvd  active
      1  A00001   54       1    3.868       1
      2  A00010   62       0    5.334       0
      3  A00021   64       0    5.014       1
      4  A00023   47       0    5.279       1
      5  A00056   51       0    5.277       1
      6  A00075   62       0    4.992       1
      7  A00083   59       0    5.066       1
      8  A00105   63       1    4.753       1
      9  A00133   64       0    5.052       1
     10  A00143   52       0    5.049       1

Goal: Do life-table analyses and create K-M plot

    PROC FORMAT;
      VALUE groupF 1 = 'Active' 0 = 'Placebo';
    RUN;

    ODS GRAPHICS ON;
    * PLOTS= creates the survival curve ;
    PROC LIFETEST DATA=lifetable
         PLOTS = survival (NOCENSOR TEST ATRISK = 0 to 5 by 1) ;
      TIME tallcvd*allcvd(0);
      STRATA active ;
      FORMAT active groupF.;
    RUN;

In the TIME statement, tallcvd is the time variable, allcvd is the event indicator variable, and (0) is the value that marks a censored observation.

Results from PROC LIFETEST

    The LIFETEST Procedure

    Summary of the Number of Censored and Uncensored Values

                                              Percent
    Stratum  active   Total  Failed  Censored  Censored
          1  Active     668      74       594     88.92
          2  Placebo    234      38       196     83.76
    ---------------------------------------------------
      Total             902     112       790     87.58

    Test of Equality over Strata

    Test       Chi-Square  DF  Pr > Chi-Square
    Log-Rank       4.6639   1           0.0308
    Wilcoxon       4.9973   1           0.0254
    -2Log(LR)      4.3354   1           0.0373

Goal: Do life-table analyses and create customized K-M plot

    * OUTSURV= creates an output dataset of life-table points ;
    * WHERE= keeps only the non-censored points ;
    PROC LIFETEST NOTABLE DATA=lifetable
         OUTSURV=ltpoints (WHERE=(_censor_ ne 1));
      TIME tallcvd*allcvd(0);
      STRATA active;
    RUN;

    PROC PRINT DATA=ltpoints (OBS=20);
      TITLE 'Display of Life Table Points';
    RUN;

    Display of Life Table Points

    Obs  active  tallcvd  _CENSOR_  SURVIVAL  SDF_LCL  SDF_UCL  STRATUM
      1       0    0.000         0   1.00000  1.00000  1.00000        1
      2       0    0.236         0   0.99573  0.98737  1.00000        1
      3       0    0.359         0   0.99145  0.97966  1.00000        1
      4       0    0.803         0   0.98718  0.97277  1.00000        1
      5       0    0.849         0   0.98291  0.96630  0.99951        1
      6       0    0.879         0   0.97863  0.96010  0.99716        1
      7       0    0.901         0   0.97436  0.95411  0.99461        1
      8       0    1.000         0   0.94017  0.90978  0.97056        1
      9       0    1.060         0   0.93590  0.90451  0.96728        1
     10       0    1.326         0   0.93162  0.89929  0.96396        1

Creating KM Plot Using PROC SGPLOT
Want to plot variable survival (y) by variable tallcvd (x) for each treatment group.

    * The STEP statement connects the points with a step function ;
    PROC SGPLOT DATA=ltpoints;
      XAXIS LABEL = 'Years of Follow-up' VALUES = (0 to 5 by 1);
      YAXIS LABEL = "Survival Rate" VALUES = (.6 to 1 by .1);
      STEP X=tallcvd Y=survival / GROUP=active;
      FORMAT active groupF.;
      TITLE 'Life Table Graph Comparing Active to Placebo';
    RUN;

Cox Regression

    PROC PHREG DATA=lifetable ;
      MODEL tallcvd*allcvd(0) = active / RL;
      TITLE 'Results from PROC PHREG';
    RUN;

PARTIAL PHREG OUTPUT

    Summary of the Number of Event and Censored Values

                              Percent
    Total  Event  Censored   Censored
      902    112       790      87.58

    Analysis of Maximum Likelihood Estimates

                  Parameter  Standard                           Hazard  95% Hazard Ratio
    Variable  DF   Estimate     Error  Chi-Square  Pr > ChiSq    Ratio  Confidence Limits
    active     1   -0.42652   0.19958      4.5671      0.0326    0.653    0.441    0.965

35% lower risk of CVD with treatment
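The "35% lower risk" statement follows directly from the PHREG estimate for active. The worked equation below shows the arithmetic. It assumes the confidence limits requested by the RL option are Wald limits built with the normal quantile 1.96, which reproduces the printed values.

```latex
\begin{align*}
\text{HR} &= \exp(\hat\beta) = \exp(-0.42652) \approx 0.653 \\
1 - \text{HR} &\approx 0.347 \quad (\text{about a 35\% lower hazard of CVD on active treatment}) \\
95\%\ \text{CI} &= \exp\bigl(\hat\beta \pm 1.96 \cdot \text{SE}\bigr)
                 = \exp(-0.42652 \pm 1.96 \times 0.19958) \\
                &\approx (0.441,\ 0.965)
\end{align*}
```

Because the upper limit is below 1, the interval agrees with the Chi-Square p-value of 0.0326.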