Lesson 11 - Topics
• Statistical procedures: PROC LOGISTIC, REG, LIFETEST, & PHREG
• Multiple logistic and linear regression
• Life-table plots and Cox regression
• Programs 21-22
Logistic Regression
Model a binary outcome (yes/no) as a function of one or more independent variables.
TOMHS Example:
Smoking as a function of age, gender, race, and education
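In equation form, the logistic model relates the log odds of the outcome to a linear combination of the predictors. The sketch below writes it out for the TOMHS example. It assumes PROC LOGISTIC's default behavior, which models the probability of the lower-ordered response value. Here that is currsmk = 1 (current smoker).

```latex
\log\frac{p}{1-p} = \beta_0 + \beta_1\,\mathrm{age} + \beta_2\,\mathrm{women} + \beta_3\,\mathrm{collgrad} + \beta_4\,\mathrm{aa},
\qquad p = \Pr(\mathrm{currsmk} = 1)
```

Holding the other predictors fixed, exp(β_j) is the odds ratio for a one-unit increase in predictor j.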
DATA stat ;
  INFILE '~/SAS_Files/tomhsfull.data' ;
  INPUT @1   ptid     $10.
        @27  age       2.
        @30  sex       1.
        @32  race      1.
        @49  educ      1.
        @51  eversmk   1.
        @53  nowsmk    1.
        @180 energy    5. ;
  /* 0/1 indicator variables for the regression models */
  if race = 2 then aa = 1; else aa = 0;
  if sex = 2 then women = 1; else women = 0;
  if educ in(1,2,3,4,5,6) then collgrad = 0; else
  if educ in(7,8,9) then collgrad = 1;        /* stays missing for other educ values */
  /* currsmk: 1 = current smoker, 2 = not a current smoker (never smoked or quit) */
  if eversmk = 2 then currsmk = 2; else currsmk = nowsmk;
RUN;
Did you ever smoke cigarettes? (1 = yes, 2 = no)    Var: eversmk
Do you now smoke cigarettes? (1 = yes, 2 = no)      Var: nowsmk
Note: The second question is answered only if the first question is answered yes.
PROC MEANS;
  VAR age women collgrad aa;   /* dietfat is not read in the DATA step, so it is omitted */
  CLASS currsmk;
RUN;
                 N
currsmk        Obs   Variable       N      Mean
------------------------------------------------
   1            98   age           98     52.31
                     women         98      0.44
                     collgrad      98      0.23
                     aa            98      0.45
   2           801   age          801     55.08
                     women        801      0.38
                     collgrad     799      0.38
                     aa           801      0.17
------------------------------------------------
ODS SELECT ParameterEstimates OddsRatios;
PROC LOGISTIC;
  MODEL currsmk = age women collgrad aa ;
RUN;
Analysis of Maximum Likelihood Estimates

                               Standard         Wald
Parameter    DF    Estimate       Error   Chi-Square   Pr > ChiSq
Intercept     1      1.7422      1.0235       2.8976       0.0887
age           1     -0.0732      0.0189      15.0704       0.0001
women         1     -0.2367      0.2407       0.9672       0.3254
collgrad      1     -0.6866      0.2618       6.8805       0.0087
aa            1      1.3394      0.2416      30.7354       <.0001
Odds Ratio Estimates

               Point          95% Wald
Effect      Estimate     Confidence Limits
age            0.929       0.896     0.964
women          0.789       0.492     1.265
collgrad       0.503       0.301     0.841
aa             3.817       2.377     6.128
OR = exp(estimate)
OR (age) = exp(-0.0732) = 0.93 per 1-year increase in age
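The remaining printed values for age can be checked the same way, using the estimate and standard error from the table above. The 95% Wald limits are exp(estimate ± 1.96 × SE). The Wald chi-square is (estimate/SE)². Its small gap from the printed 15.0704 comes from rounding of the displayed estimate and SE.

```latex
\begin{aligned}
95\%\ \mathrm{CI} &= e^{-0.0732 \pm 1.96(0.0189)} = \left(e^{-0.1102},\ e^{-0.0362}\right) = (0.896,\ 0.964) \\
\chi^2_{\text{Wald}} &= \left(\frac{-0.0732}{0.0189}\right)^2 \approx 15.0 \\
\mathrm{OR}_{\text{10 years}} &= e^{10(-0.0732)} = e^{-0.732} \approx 0.48
\end{aligned}
```

By the last line, a 10-year age difference corresponds to roughly half the odds of being a current smoker.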
Comparison of univariate versus multivariate results

Multivariate
                               Standard         Wald
Parameter    DF    Estimate       Error   Chi-Square   Pr > ChiSq
Intercept     1      1.7422      1.0235       2.8976       0.0887
age           1     -0.0732      0.0189      15.0704       0.0001
women         1     -0.2367      0.2407       0.9672       0.3254
collgrad      1     -0.6866      0.2618       6.8805       0.0087
aa            1      1.3394      0.2416      30.7354       <.0001

Univariate (Separate regression runs)
                               Standard         Wald
Parameter    DF    Estimate       Error   Chi-Square   Pr > ChiSq
Intercept     1      1.7422      1.0235       2.8976       0.0887
age           1     -0.0736      0.0185      15.8221       <.0001
women         1      0.2561      0.2162       1.4026       0.2363
collgrad      1     -0.6945      0.2492       7.7635       0.0053
aa            1      1.4091      0.2242      39.5071       <.0001
Note: In TOMHS, women are more likely than men to be AA, and AA participants are more likely to be smokers.
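Converting the women coefficients from the two tables above into odds ratios shows the reversal that this confounding produces:

```latex
\text{Univariate: } e^{0.2561} \approx 1.29 \qquad \text{Multivariate: } e^{-0.2367} \approx 0.79
```

Unadjusted, women appear to have about 29% higher odds of being current smokers. With race, age, and education held fixed, their odds are about 21% lower. Neither estimate is statistically significant (p = 0.2363 and 0.3254).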
Linear Regression
• Model a continuous outcome as a function of one or more independent variables.
• TOMHS Example:
– Energy (calorie) intake as a function of age, gender, race, and education
ODS SELECT ParameterEstimates ;
PROC REG;
MODEL energy = age women collgrad aa ;
RUN;
The REG Procedure
Model: MODEL1
Dependent Variable: energy

Parameter Estimates

                    Parameter     Standard
Variable     DF      Estimate        Error   t Value   Pr > |t|
Intercept     1    3574.78842    184.91689     19.33     <.0001
age           1     -20.67969      3.25993     -6.34     <.0001
women         1    -570.45804     44.34733    -12.86     <.0001
collgrad      1    -109.19062     44.01230     -2.48     0.0133
aa            1    -253.62159     54.07279     -4.69     <.0001
Energy = 3575 - 21*age - 570*women - 109*collgrad - 254*aa
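To use the fitted equation, plug in a covariate pattern. As a purely illustrative case, take a hypothetical 55-year-old woman who is a college graduate and not African American (age = 55, women = 1, collgrad = 1, aa = 0):

```latex
\widehat{\mathrm{energy}} = 3574.79 - 20.68(55) - 570.46(1) - 109.19(1) - 253.62(0) \approx 1758\ \text{kcal}
```

Each coefficient is the expected change in intake with the other variables held fixed. For example, each additional year of age lowers predicted intake by about 21 kcal. Each t value is the estimate divided by its standard error. For collgrad, -109.19/44.01 = -2.48.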
Multivariate Analysis
                    Parameter     Standard
Variable     DF      Estimate        Error   t Value   Pr > |t|
age           1     -20.67969      3.25993     -6.34     <.0001
women         1    -570.45804     44.34733    -12.86     <.0001
collgrad      1    -109.19062     44.01230     -2.48     0.0133
aa            1    -253.62159     54.07279     -4.69     <.0001

Univariate Analysis (Separate regression runs)
                    Parameter     Standard
Variable     DF      Estimate        Error   t Value   Pr > |t|
age           1     -17.1154       3.60184     -4.75     <.0001
women         1    -595.40078     43.74189    -13.61     <.0001
collgrad      1      41.21749     48.61549      0.85     0.3968
aa            1    -388.19448     57.32940     -6.77     <.0001
Women are less likely to be college graduates and also have lower caloric intake.
PROC MEANS;
VAR energy;
CLASS women aa collgrad;
RUN;
Analysis Variable : energy

                              N
women    aa    collgrad     Obs      N         Mean
----------------------------------------------------
  0       0       0         277    276     2445.043
                  1         213    213     2338.319
          1       0          42     42     2141.714
                  1          23     23     1992.261
  1       0       0         162    162     1795.938
                  1          71     71     1853.366
          1       0          92     92     1694.196
                  1          20     20     1532.300
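This table also reproduces the univariate collgrad estimate. Weighting each cell mean by its N (non-missing energy) gives crude means for graduates and non-graduates, and their difference equals the univariate coefficient. Graduates are less often women (91 of 327 vs 254 of 572), and that pushes their crude mean up.

```latex
\begin{aligned}
\bar{E}_{\text{grad}} &= \frac{213(2338.319) + 23(1992.261) + 71(1853.366) + 20(1532.300)}{327} \approx 2159.4 \\
\bar{E}_{\text{non-grad}} &= \frac{276(2445.043) + 42(2141.714) + 162(1795.938) + 92(1694.196)}{572} \approx 2118.2 \\
\bar{E}_{\text{grad}} - \bar{E}_{\text{non-grad}} &\approx 41.2
\end{aligned}
```

Within three of the four gender-by-race cells, graduates instead have lower intake. For example, among non-AA men it is 2338.3 vs 2445.0. This pattern is in line with the adjusted estimate of -109.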
Time to Event Analyses - Framework
• Each patient has an event indicator (1 = yes, 0 = no)
• Each patient has a follow-up time
– Time from entry into study until time of event
– Time from entry into study until the patient is no longer followed (end of study, lost to follow-up, or death)
For each person there is a time zero at which the person becomes at risk for the event of interest.
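The Kaplan-Meier (product-limit) estimate combines these two pieces of information. Let t_1 < t_2 < ... be the distinct event times. Let d_i be the number of events at t_i, and n_i the number of patients still at risk (followed and event-free) just before t_i. The estimated probability of remaining event-free past time t is then:

```latex
\hat{S}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right)
```

Censored patients count in n_i only up to their last follow-up time. This is why a follow-up time is needed for every patient, not only for those with an event.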
Kaplan-Meier Life Curves
PROGRAM 22
DATA lifetable;
  INFILE 'C:\SAS_Files\endpoint.csv' DSD FIRSTOBS=2;
  INPUT ptid $ age allcvd tallcvd active;   /* allcvd: 1 = CVD event, 0 = censored */
  LABEL active  = 'Treatment Group';
  LABEL tallcvd = 'Follow-up Time in Years';
RUN;
PROC PRINT DATA=lifetable (OBS=20);
  TITLE 'First 20 Obs of Dataset Lifetable';
RUN;
First Observations of Dataset Lifetable

Obs    ptid      age   allcvd   tallcvd   active
  1    A00001     54      1      3.868       1
  2    A00010     62      0      5.334       0
  3    A00021     64      0      5.014       1
  4    A00023     47      0      5.279       1
  5    A00056     51      0      5.277       1
  6    A00075     62      0      4.992       1
  7    A00083     59      0      5.066       1
  8    A00105     63      1      4.753       1
  9    A00133     64      0      5.052       1
 10    A00143     52      0      5.049       1
Goal: Do life-table analyses and create K-M plot
PROC FORMAT;
VALUE groupF 1='Active' 0 = 'Placebo';
RUN;
ODS GRAPHICS ON;
/* Create survival curve */
PROC LIFETEST DATA=lifetable PLOTS = survival
              (NOCENSOR TEST ATRISK = 0 to 5 by 1) ;
  TIME tallcvd*allcvd(0);   /* tallcvd = time variable; allcvd = event indicator, (0) = censored */
  STRATA active ;
  FORMAT active groupF.;
RUN;
Results from PROC LIFETEST
The LIFETEST Procedure

Summary of the Number of Censored and Uncensored Values

                                                    Percent
Stratum    active      Total    Failed   Censored   Censored
   1       Active        668        74        594      88.92
   2       Placebo       234        38        196      83.76
-------------------------------------------------------------
Total                    902       112        790      87.58
Test of Equality over Strata

                                        Pr >
Test          Chi-Square     DF    Chi-Square
Log-Rank          4.6639      1        0.0308
Wilcoxon          4.9973      1        0.0254
-2Log(LR)         4.3354      1        0.0373
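With one degree of freedom, each chi-square statistic is the square of a standard normal z. The log-rank p-value can therefore be checked by hand, where Φ is the standard normal CDF:

```latex
z = \sqrt{4.6639} \approx 2.16, \qquad p = 2\left[1 - \Phi(2.16)\right] \approx 2(0.0154) \approx 0.031
```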
Goal: Do life-table analyses and create customized K-M plot
PROC LIFETEST NOTABLE DATA=lifetable
              OUTSURV=ltpoints (WHERE=(_censor_ NE 1));  /* create output dataset; include only non-censored points */
  TIME tallcvd*allcvd(0);
  STRATA active;
RUN;
PROC PRINT DATA=ltpoints (OBS=20);
TITLE 'Display of Life Table Points';
RUN;
Display of Life Table Points

Obs   active   tallcvd   _CENSOR_   SURVIVAL   SDF_LCL   SDF_UCL   STRATUM
  1      0      0.000        0      1.00000    1.00000   1.00000      1
  2      0      0.236        0      0.99573    0.98737   1.00000      1
  3      0      0.359        0      0.99145    0.97966   1.00000      1
  4      0      0.803        0      0.98718    0.97277   1.00000      1
  5      0      0.849        0      0.98291    0.96630   0.99951      1
  6      0      0.879        0      0.97863    0.96010   0.99716      1
  7      0      0.901        0      0.97436    0.95411   0.99461      1
  8      0      1.000        0      0.94017    0.90978   0.97056      1
  9      0      1.060        0      0.93590    0.90451   0.96728      1
 10      0      1.326        0      0.93162    0.89929   0.96396      1
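These rows can be checked against the product-limit formula. The placebo stratum (active = 0) starts with 234 patients. Each of the first event times shown drops survival by exactly 1/234. That is consistent with one event at each time and no censoring this early, so the estimates reduce to simple fractions:

```latex
\begin{aligned}
\hat{S}(0.236) &= 1 - \tfrac{1}{234} = 0.99573 \\
\hat{S}(0.359) &= 0.99573\left(1 - \tfrac{1}{233}\right) = \tfrac{232}{234} = 0.99145
\end{aligned}
```

The larger drop at 1.000 year, from 0.97436 (228/234) to 0.94017 (220/234), corresponds to 8 events at that time.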
Want to plot variable survival (y) by variable tallcvd (x) for each treatment group
PROC SGPLOT DATA=ltpoints;
  XAXIS LABEL = 'Years of Follow-up'
        VALUES = (0 to 5 by 1);
  YAXIS LABEL = "Survival Rate"
        VALUES = (.6 to 1 by .1);
  STEP X=tallcvd Y=survival / GROUP=active;   /* step function connects the points */
  FORMAT active groupF.;
  TITLE 'Life Table Graph Comparing Active to Placebo';
RUN;
Creating KM Plot Using PROC SGPLOT
PROC PHREG DATA=lifetable ;
MODEL tallcvd*allcvd(0) = active/RL;
TITLE 'Results from PROC PHREG';
RUN;
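PROC PHREG fits the Cox proportional-hazards model. With the single covariate active (1 = active, 0 = placebo), the model and the hazard ratio it estimates are:

```latex
h(t \mid \mathrm{active}) = h_0(t)\, e^{\beta\,\mathrm{active}}, \qquad
\mathrm{HR} = \frac{h(t \mid \mathrm{active}=1)}{h(t \mid \mathrm{active}=0)} = e^{\beta}
```

The baseline hazard h_0(t) is left unspecified and cancels from the ratio, so the model assumes the hazard ratio is constant over follow-up. The RL option requests confidence limits for the hazard ratio.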
PARTIAL PHREG OUTPUT

Summary of the Number of Event and Censored Values

                                   Percent
Total      Event     Censored     Censored
  902        112          790        87.58

Analysis of Maximum Likelihood Estimates

                 Parameter    Standard                              Hazard    95% Hazard Ratio
Variable   DF     Estimate       Error   Chi-Square   Pr > ChiSq     Ratio    Confidence Limits
active      1     -0.42652     0.19958       4.5671       0.0326     0.653     0.441     0.965
Hazard ratio 0.653: about 35% lower risk (hazard) of CVD with active treatment
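The hazard ratio, its limits, and the Wald chi-square follow from the estimate and standard error. The calculation is the same one used for the odds ratios from PROC LOGISTIC:

```latex
\begin{aligned}
\mathrm{HR} &= e^{-0.42652} = 0.653, \qquad 1 - 0.653 \approx 0.35 \\
95\%\ \mathrm{CI} &= e^{-0.42652 \pm 1.96(0.19958)} = \left(e^{-0.8177},\ e^{-0.0354}\right) = (0.441,\ 0.965) \\
\chi^2_{\text{Wald}} &= \left(\frac{-0.42652}{0.19958}\right)^2 = 4.567
\end{aligned}
```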