Lecture 23
1. Survival time
2. Censored observations
3. Proc Lifetest: Kaplan-Meier estimate of the survival distribution
4. Comparing survival distributions
5. Proportional hazards regression: Proc PHreg
References:
Collett (2003) Modelling Survival Data in Medical Research, 2nd ed.
Allison (1995) Survival Analysis Using the SAS System.
Cantor (2003) SAS Survival Analysis Techniques for Medical Research.
Singer & Willett (2003) Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence.
Time-to-event or survival data
In many situations, time until an event occurs is important:
• New treatment for brain cancer: do patients survive longer than after standard
treatment?
• In the AHC, are men awarded tenure earlier and more often than women?
• Are young adults getting married later than 10 years ago? Are women delaying
the birth of their first child?
In theory, each individual has their own time $T_i$ to the event. In practice, some individuals do not have the event during the observation period (censoring).
The objective is not a point estimate (mean, slope, odds ratio) but an estimate of the distribution of these times $\{T_i\}$.
US Census Bureau cross-section (“synthetic cohort”) for 2002
[Figure: histogram of survival times for the 2002 US population, truncated at 101. x-axis: US Population, Age in Years in 2002 (0 to 100); y-axis: Percent of Deaths by age, 2002 (0.0 to 3.5).]
Source: E. Arias (2004) United States Life Tables, 2002. National Vital Statistics Reports, vol 53 no 6. Hyattsville, Maryland: National Center for Health Statistics.
Survivor function S(t) = chance of surviving to age t
= percent still alive (free of the event) at age t
[Figure: survivor function. x-axis: US Population, Age in Years in 2002 (0 to 100); y-axis: Percent Surviving (0 to 100).]
Hazard function h(t) = age-specific death rate
= percent dying at age t of those alive at age ≥ t
[Figure: hazard function. x-axis: US Population, Age in Years in 2002; y-axis: Age-Specific Death Rate per 100,000 (ticks 0.00 to 0.20).]
Probability theory defines a distribution by:
• the histogram of lifetimes, called the probability density function f(t)
• the cumulative distribution function = cumulative area under the histogram, starting from the left:
$$F(t) = \int_{-\infty}^{t} f(u)\,du$$
Survivor function S(t) = 1 − F(t).
Percent without the event (still alive) at time t.
$$h(t) = \frac{f(t)}{S(t)} = \frac{\text{chance of event at time } t}{\text{percent at risk at time } t}$$
Hazard h(t) gives the chance of event during a short interval after time t, for those who are at risk (alive) at time t.
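The relation between the histogram, the survivor function and the hazard can be checked on a small discrete example. The Python sketch below uses four made-up age bands (the shares are illustrative, not the 2002 life table) and treats the hazard as the share dying in a band among those alive at its start, which is a discrete stand-in for f(t)/S(t).

```python
# Hypothetical share of all deaths falling in each of four age bands
# (illustrative numbers, not the 2002 life table).
density = [0.1, 0.2, 0.3, 0.4]

alive = 1.0      # proportion still alive at the start of the current band
survivor = []    # S: proportion alive at the start of each band
hazard = []      # h: share dying in the band among those alive at its start
for share in density:
    survivor.append(alive)
    hazard.append(share / alive)
    alive -= share

for band, (s, h) in enumerate(zip(survivor, hazard)):
    print(f"band {band}: S = {s:.2f}, h = {h:.3f}")
```

The hazard in the last band is 1 because everyone still alive at its start dies within it, even though only 40% of all deaths fall there.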
Outline: two main analyses for survival data
1. Estimate survivor function, compare survivor functions between groups.
Proc Lifetest gives nonparametric product-limit (Kaplan-Meier) or lifetable
estimate, draws graphs, tests for differences.
Nice pictures, but no adjustments—only strata.
Proc LifeReg gives regression adjustment but must specify parametric formula
for survivor function; rarely used in health sciences.
2. Estimate ratio of hazard functions between groups, compare ratio to 1.
Proc PHreg does proportional hazards regression to estimate ratio.
No pictures (almost) but regression adjustment for fixed and time-varying
predictors.
Censored observation times
A common problem in survival data is that we don’t observe all event times:
• we stop the study and analyze the data before everyone has had the event
• a person leaves the study and we cannot find out whether they had the event
In these cases, all we have is the final time $t_0$ at which the subject was known to be alive; we know only that $T > t_0$.
The final time $t_0$ is called a censored observation, and it is a lower bound for the unknown event time $T$.
Clinical study example: eligible participants were enrolled as soon as they
volunteered, and recruitment lasted 2.5 years. The study ended on 1/1/2008.
Subjects died (open circle), dropped out (triangle), or were still alive at study end
(gray dot).
[Figure: participant timelines in calendar time. x-axis: Calendar Time, with ticks at 1/1/2005, 1/1/2006, 1/1/2007 and 1/1/2008; study start and end are marked.]
Analysis of clinical study example:
each subject’s time is aligned to start at “study time” = 0.
[Figure: the same participant timelines shifted to a common origin. x-axis: Time from Enrollment (years), 0.0 to 3.0; start and end are marked.]
* marks study enrollment, horizontal line indicates time participant was alive,
deaths are indicated by an open circle, censoring by a gray dot.
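Aligning calendar dates to study time is a subtraction of dates. The Python sketch below shows one way to do it; the three participants, the 365.25-day year, and the function name `align` are assumptions made for illustration, and only the study end date (1/1/2008) comes from the example above.

```python
from datetime import date

# Hypothetical participants: (enrollment date, date last known alive, outcome).
# Outcomes follow the three cases in the example: died, dropped out, alive at end.
STUDY_END = date(2008, 1, 1)
participants = [
    (date(2005, 1, 1), date(2006, 1, 1), "died"),
    (date(2006, 7, 1), STUDY_END, "alive at end"),
    (date(2005, 6, 1), date(2006, 3, 1), "dropped out"),
]

def align(enrolled, last_known, outcome):
    """Return (years from enrollment, died); died = 0 marks a censored time."""
    years = (last_known - enrolled).days / 365.25  # assumed year length
    died = 1 if outcome == "died" else 0
    return years, died

aligned = [align(*p) for p in participants]
for years, died in aligned:
    print(f"{years:.3f} years from enrollment, died = {died}")
```

Both dropouts and participants still alive at study end come out as censored (died = 0); the analysis treats them the same way.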
No histogram of survival times with censored data
We can draw a histogram of all the times $t_i$.
If some times are censored, we know only that $t_i$ < actual survival time.
There is no correct place in the histogram for censored observations, because they are lower bounds, not observed event times.
However, excluding them gives a biased histogram.
Kaplan and Meier (1958) proposed a breakthrough method to estimate the survivor function S(t) from partially censored data.
Kaplan-Meier estimate of the survivor function
Order the event times from earliest to latest: $t_0$ (baseline), $t_1, t_2, \ldots, t_v$.
Within each interval $[t_i, t_{i+1})$ (left end included, right end excluded) let
$n_i$ = number at risk of event at start of interval
$d_i$ = number of events within interval
Then $d_i/n_i$ = event rate in interval,
and $(1 - d_i/n_i)$ = proportion with no event (surviving).
Survivor function gives chance of surviving to time t
Estimate this by the product of the chances of surviving each interval up to t:
$$\hat{S}(t) = \prod_{t_i < t} \left(1 - \frac{d_i}{n_i}\right)$$
Notice that the length of the time intervals is ignored.
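The product-limit calculation is short enough to carry out by hand. The Python sketch below is a minimal version, assuming a made-up data set of six subjects; the function name `kaplan_meier` is hypothetical. It reports the value the curve takes just after each event time, and it counts a subject censored at an event time as still at risk at that time.

```python
from collections import Counter

def kaplan_meier(times, died):
    """Return [(event_time, S_hat)] with S_hat taken just after each event time."""
    deaths = Counter(t for t, d in zip(times, died) if d == 1)
    surv = 1.0
    curve = []
    for t in sorted(deaths):
        n_at_risk = sum(1 for u in times if u >= t)   # n_i
        surv *= 1.0 - deaths[t] / n_at_risk           # multiply by (1 - d_i/n_i)
        curve.append((t, surv))
    return curve

# Hypothetical data: died = 1 is an event, died = 0 is a censored time.
times = [1, 2, 2, 3, 4, 5]
died = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(times, died)
for t, s in km_curve:
    print(f"t = {t}: S_hat = {s:.4f}")
```

The censored times 2 and 4 never produce a step of their own; they only reduce the number at risk for later events.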
Stomach cancer example
Survival times after treatments A or B for 89 patients with stomach cancer
(source: Chapter 12, Der and Everitt).
• 45 received treatment A: 38 died, 7 still alive at end of study (= censored)
• 44 received treatment B: 41 died, 3 still alive
Obs   days   trt     years   died
  1     17   A     0.04654      1
  2    185   A     0.50650      1
  3    542   A     1.48392      1
  4      1   B     0.00274      1
  5    383   B     1.04860      1
days and years give the time t at which each patient was last known alive.
died = 1 if an event happened at t; died = 0 if the time is censored.
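The years column is consistent with days divided by 365.25. That conversion is inferred here from the five rows listed, not stated in the source; a quick Python check:

```python
# days and years for the five listed observations
days = [17, 185, 542, 1, 383]
listed_years = [0.04654, 0.50650, 1.48392, 0.00274, 1.04860]

# Assumed conversion: one year = 365.25 days
computed_years = [round(d / 365.25, 5) for d in days]

for d, y in zip(days, computed_years):
    print(f"{d:4d} days = {y:.5f} years")
```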
Proc Lifetest: Kaplan-Meier (Product-Limit) estimate of survivor function
ODS graphics on;
Proc Lifetest data = stomach_cancer
     plots = (survival(atrisk=0 to 4 by 1));
  TIME years * died(0);
  STRATA trt;
run;
ODS graphics off;
The TIME statement is like a model statement; it specifies the response:
  TIME length-of-time * event-status ( censored-value ) ;
The STRATA statement names the variable identifying the treatment groups to be compared by the tests.
[Figure: Kaplan-Meier plot produced with plots=(survival(atrisk=0 to 4 by 1)) censoredsymbol="|" ;]
Sample sizes are given at the bottom of the plot. Need at least 10–15 in each group.
[Figure: Kaplan-Meier plot produced with plots=( survival( CL atrisk=0 to 4 by 1)) ;]
The LIFETEST Procedure
Stratum 1: trt = A

                                    Survival
                                    Standard     Number     Number
    years    Survival    Failure       Error     Failed       Left
  0.00000      1.0000          0           0          0         45
  0.04654      0.9778     0.0222      0.0220          1         44
  0.11499      0.9556     0.0444      0.0307          2         43
  0.12047      0.9333     0.0667      0.0372          3         42
  0.13142      0.9111     0.0889      0.0424          4         41
  0.16427      0.8889     0.1111      0.0468          5         40
  . . . .
  3.32375      0.1750     0.8250      0.0572         37          7
  3.37303*          .          .           .         37          6
  3.73990      0.1458     0.8542      0.0546         38          5
  3.98357*          .          .           .         38          4
  4.33949*          .          .           .         38          3
  4.44079*          .          .           .         38          2
  4.45175*          .          .           .         38          1
  4.75291*          .          .           .         38          0
NOTE: The marked survival times are censored observations.
years: time t at which the estimated survivor function steps to a new value
Survival: Kaplan-Meier (product-limit) estimate Ŝ(t) of the survivor function
Failure: Kaplan-Meier estimate of cumulative mortality, 1 − Ŝ(t) = F̂(t)
Survival Standard Error: the pointwise standard error of the estimate Ŝ(t)
Number Failed: the cumulative number of events up to time t
Number Left: the number still under observation and at risk for the event
A 95% confidence interval for the estimated survivor function comes from the usual formula with the standard error from the output: Ŝ(t) ± 1.96 × SE{Ŝ(t)}
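As a worked example of the interval formula, the Python sketch below applies it to two rows of the stratum A listing (Ŝ = 0.8889 with SE 0.0468, and Ŝ = 0.9778 with SE 0.0220). Clipping the limits to the range 0 to 1 is an assumption added here, since a proportion cannot leave that range; the function name `pointwise_ci` is hypothetical.

```python
def pointwise_ci(s_hat, se, z=1.96):
    """Return (lower, upper) = s_hat -/+ z * se, clipped to [0, 1] (assumed)."""
    half_width = z * se
    return max(0.0, s_hat - half_width), min(1.0, s_hat + half_width)

# Rows taken from the stratum A listing: (years, Survival, Standard Error)
rows = [(0.16427, 0.8889, 0.0468), (0.04654, 0.9778, 0.0220)]
intervals = [pointwise_ci(s, se) for _, s, se in rows]
for (years, s, _), (lo, hi) in zip(rows, intervals):
    print(f"t = {years}: S_hat = {s}, 95% CI = ({lo:.4f}, {hi:.4f})")
```

For the second row the unclipped upper limit would exceed 1, which shows why this simple formula behaves poorly when Ŝ(t) is close to 0 or 1.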
Stratum 1: trt = A
Quartile Estimates
                Point     95% Confidence Interval
Percent      Estimate     [Lower         Upper)
     75       1.58795     1.27036         .
     50       0.69541     0.52841        1.32512
     25       0.39425     0.20260        0.53388

   Mean    Standard Error
1.34660           0.19441
Median survival time is the time t at which Ŝ(t) reaches 0.5, that is, the survivor function equals 50%.
If Ŝ(t) = 0.5 over an interval, the median is the midpoint of the interval.
Mean survival time is the area under the Kaplan-Meier survival curve.
If the largest observed time in the data is censored, the curve never reaches zero and this area is not fully determined.
Don’t report mean survival time if there is any censoring.
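The median rule can be read off the steps of the estimated curve. The Python sketch below is one minimal reading of it, using two made-up curves; the function name `km_median` and the choice to return `None` when the curve never reaches 50% are assumptions, not Proc Lifetest behavior.

```python
def km_median(curve, tol=1e-12):
    """curve: [(event_time, S_hat)] sorted by time, S_hat taken after each step."""
    for i, (t, s) in enumerate(curve):
        if abs(s - 0.5) <= tol:
            # Curve is flat at exactly 0.5 until the next event time:
            # take the midpoint of that interval.
            if i + 1 < len(curve):
                return (t + curve[i + 1][0]) / 2
            return None  # no later event; left undetermined in this sketch
        if s < 0.5:
            return t     # first time the curve drops below 0.5
    return None          # curve never reaches 50%

# Hypothetical curves
flat_at_half = [(1, 0.75), (2, 0.5), (4, 0.25)]
steps_past_half = [(1, 0.8333), (2, 0.6667), (3, 0.4444)]
median_flat = km_median(flat_at_half)
median_step = km_median(steps_past_half)
print(median_flat, median_step)
```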
Summary of censoring in each group.
Summary of the Number of Censored and Uncensored Values
                                                   Percent
Stratum   group    Total    Failed    Censored    Censored
      1   A           45        38           7       15.56
      2   B           44        41           3        6.82
-----------------------------------------------------------
Total                 89        79          10       11.24
Precision of estimates depends on the number of events (“Failed”), not the number of observations.
Tests to compare population survivor functions
Lifetest compares population survivor functions S(t ) between groups listed in
the STRATA statement. Null hypothesis: all groups have the same population
survivor function; here, S A (t ) = S B (t ).
• Log rank
• Wilcoxon
• Likelihood ratio test
Ignore the likelihood ratio test: it depends on a strong assumption (exponential density) that is usually wrong.
Rank Statistics
trt    Log-Rank    Wilcoxon
A        3.3043      502.00
B       -3.3043     -502.00

Test of Equality over Strata
                                     Pr >
Test         Chi-Square    DF    Chi-Square
Log-Rank         0.5654     1        0.4521
Wilcoxon         4.3162     1        0.0378
-2Log(LR)        0.3574     1        0.5500
The two usable tests disagree here.
All three tests are based on $H_0: S_A(t) = S_B(t)$:
• combine all groups to get a common event rate on each time interval
• for each group in each interval, multiply the event rate by the sample size to get the expected number of events
$e_{jk}$ = expected number of events in group j during time period k
$d_{jk}$ = observed number of events in group j during time period k.
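Written as a formula, the two steps above give the expected count below. The symbols $n_{jk}$ (number at risk in group j in period k), $n_k$ and $d_k$ (pooled number at risk and pooled number of events in period k) are notation assumed here for illustration.

```latex
e_{jk} = n_{jk} \cdot \frac{d_k}{n_k},
\qquad d_k = \sum_j d_{jk},
\qquad n_k = \sum_j n_{jk}
```

For instance, with made-up counts of 20 people at risk in total, 10 of them in group 1, and 2 events in the period, the expected number in group 1 is $e_{1k} = 10 \cdot 2/20 = 1$.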
The log-rank test statistic is the cumulative difference between observed and expected events:
$$d_L = \sum_k \left(d_{1k} - e_{1k}\right)$$
Rank Statistics
trt    Log-Rank    Wilcoxon
A        3.3043      502.00
B       -3.3043     -502.00
The test statistic for A was +3.3043, indicating more deaths than expected.
The test statistic for B was −3.3043, indicating fewer deaths than expected.
The log-rank test is usually the more sensitive test. It is the best test when the estimated survivor functions do not cross each other, and it is often the basis for sample size calculations.
Wilcoxon test. Sum of differences between observed and expected events, weighted by sample size:
$$d_W = \sum_k n_k \left(d_{1k} - e_{1k}\right)$$
Rank Statistics
trt    Log-Rank    Wilcoxon
A        3.3043      502.00
B       -3.3043     -502.00
The Wilcoxon test gives more weight to the early part of the estimated survivor functions, where there is more information.
The Wilcoxon test is less sensitive to late differences in survivor functions.
Use the Wilcoxon test when the estimated survivor functions cross each other.
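Both rank statistics use the same observed-minus-expected differences and differ only in the weight given to each event time. The Python sketch below computes them for two small made-up groups; the function name `rank_statistics` is hypothetical, and the sketch stops at the statistics themselves and does not compute the variance needed for the chi-square test.

```python
def rank_statistics(times_1, died_1, times_2, died_2):
    """Return (log-rank, Wilcoxon) statistics for group 1."""
    all_times = list(times_1) + list(times_2)
    all_died = list(died_1) + list(died_2)
    event_times = sorted({t for t, d in zip(all_times, all_died) if d == 1})
    log_rank = 0.0
    wilcoxon = 0.0
    for t in event_times:
        n_k = sum(1 for u in all_times if u >= t)    # pooled number at risk
        n_1k = sum(1 for u in times_1 if u >= t)     # group 1 number at risk
        d_k = sum(1 for u, d in zip(all_times, all_died) if u == t and d == 1)
        d_1k = sum(1 for u, d in zip(times_1, died_1) if u == t and d == 1)
        e_1k = d_k * n_1k / n_k                      # expected events in group 1
        log_rank += d_1k - e_1k                      # weight 1
        wilcoxon += n_k * (d_1k - e_1k)              # weight n_k
    return log_rank, wilcoxon

# Hypothetical data: died = 1 is an event, died = 0 is a censored time.
group_a = ([1, 3, 5], [1, 1, 0])
group_b = ([2, 4, 6], [1, 0, 1])
d_L, d_W = rank_statistics(*group_a, *group_b)
print(f"log-rank = {d_L:.4f}, Wilcoxon = {d_W:.4f}")
```

Because the weight $n_k$ shrinks as subjects leave the risk set, early event times dominate the Wilcoxon sum.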
Rank Statistics
trt    Log-Rank    Wilcoxon
A        3.3043      502.00
B       -3.3043     -502.00

Test of Equality over Strata
                                     Pr >
Test         Chi-Square    DF    Chi-Square
Log-Rank         0.5654     1        0.4521
Wilcoxon         4.3162     1        0.0378
-2Log(LR)        0.3574     1        0.5500
Which test should we report?
Think about sample size as well as whether the survivor curves cross.
Usually display K-M curves only where the sample size is at least 10.
To truncate the plot, set maxtime.
ODS graphics on;
Proc Lifetest data=two_years maxtime=3.0
     plots=survival(atrisk=0 to 4 by 1);
  time years * censor(1);
  strata trt;
run;
ODS graphics off;
Setting maxtime has no effect on the tests, which still use all the data.
Proc Lifetest TEST statement
Proc Lifetest also compares groups identified in the TEST statement.
This is intended to test the effect of a continuous explanatory variable.
When TEST is used with a categorical variable, such as treatment, the results are not the same as those from STRATA.
Use STRATA, not TEST.
Summarizing tests comparing survivor functions
When a test finds a significant difference between survivor curves, it does not tell us when they differ.
Three common approaches to summarizing a comparison of two groups A and B:
1. If the curves are consistently separated, report the overall result.
If $\hat{S}_A$ is consistently lower than $\hat{S}_B$, then group A suffered more early events and had a smaller percent surviving at any given time. Report the result of the log-rank test. For example:
During the ten years after treatment, the proportion surviving was significantly higher in group B (log-rank test, p = .016).
2. Select a percent surviving, and report times when each group reached it.
Common choice is 50% surviving, the median survival time.
Quartile Estimates
                         Point     95% Confidence Interval
Group     Percent     Estimate     [Lower         Upper)
A              75      1.58795     1.27036         .
A              50      0.69541     0.52841        1.32512
A              25      0.39425     0.20260        0.53388
B              75      2.39836     1.55784        3.47981
B              50      1.38535     1.04860        1.85079
B              25      0.95277     0.68446        1.06229
Median survival in group A was 0.7 years (95% CI: 0.5–1.3 years), while in group B median survival was 1.4 years (95% CI: 1.0–1.9 years), twice as long.
3. Select a time and report percent in each group surviving at that time.
Percent surviving (and confidence interval) beyond a particular time comes from the Proc Lifetest listing of Ŝ(t).
Stratum 1: group = A

                                    Survival
                                    Standard     Number     Number
    years    Survival    Failure       Error     Failed       Left
  0.00000      1.0000          0           0          0         45
  . . . .
  0.86242      0.4444     0.5556      0.0741         25         20
  1.09788      0.4222     0.5778      0.0736         26         19
After one year of treatment, the estimated proportion surviving in group A was 44% ± 7%, but in group B the estimated proportion surviving was 68% ± 7%.
Hazard function
$$h(t) = \frac{f(t)}{S(t)} = \frac{\text{chance of an event at time } t}{\text{percent alive at time } t}$$
Hazard h(t) is the time-specific event rate.
Survivor function S(t) = percent still at risk (alive) at time t.
Hazard function h(t) = chance of event at time t for the subset of people at risk.
Proportional Hazards Regression
Proportional hazards regression (D. R. Cox, 1972) assumes that different groups have proportional hazard functions. With two groups A and B, there is a common hazard function h(t), which applies to group A. Being in group B multiplies the hazard by r:
$$h_B(t) = r \cdot h_A(t)$$
Proportional hazards regression estimates r without estimating h(t).
Since hazards are chances, the ratio of the hazard functions
$$r = \frac{h_B(t)}{h_A(t)}$$
can be interpreted as a relative risk or relative rate.
Proportional hazards regression makes several assumptions:
1. There is a baseline hazard function $h_0(t)$ common to all individuals in all the study groups.
2. Study group j has a hazard function $h_j(t)$ that is a positive multiple of the baseline hazard:
$$h_j(t) = r_j \, h_0(t)$$
Each group has its own hazard ratio $r_j$. For the reference group, $r_j = 1$.
3. Explanatory variables act only on the $r_j$, not on the baseline hazard.
Proportional Hazards Regression Model
Model the hazard ratios (relative risks) on the log scale as a function of predictors:
$$\log(\text{hazard ratio } r_j) = \beta_1 \text{Group} + \beta_2 X + \beta_3 Z + \ldots$$
There is no intercept; it is part of the baseline hazard.
What makes proportional hazards regression work is that we can fit the model without needing to estimate the baseline hazard $h_0(t)$.
Proportional hazards regression is about the hazard ratio or relative risk, not the hazard.
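Combining the multiplicative assumption on the hazard with the log-scale model gives the hazard itself. This is only a restatement of those two equations, written out as a sketch:

```latex
h_j(t) = r_j \, h_0(t)
       = h_0(t)\,\exp\left(\beta_1 \text{Group} + \beta_2 X + \beta_3 Z + \ldots\right)
```

Setting every predictor to zero gives $\exp(0) = 1$, so the hazard reduces to $h_0(t)$. Any constant term would simply rescale $h_0(t)$, which is why the model carries no separate intercept.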
Interpretation of the regression coefficients is very similar to logistic regression:
• Class variable A:
$\exp(\hat\beta_i)$ is the hazard ratio or relative risk comparing the i-th level of A to the reference level.
• Continuous variable X:
$\exp(\hat\beta_j)$ is the relative risk corresponding to a 1-unit increase in X, comparing those with X = x + 1 to those with X = x.
Breast cancer example
Study in 1987 compared survival times of women diagnosed with breast cancer
divided into two groups: staining test of biopsy tissue positive or negative.
Data from Collett (2003) Example 1.2.
Proc Lifetest data=breast_cancer;
  TIME surv_months * died(0);
  STRATA positive_stain;
run;

           positive_                                     Percent
Stratum    stain        Total    Failed    Censored    Censored
      1    0               13         5           8       61.54
      2    1               32        21          11       34.38

                                     Pr >
Test         Chi-Square    DF    Chi-Square
Log-Rank         3.5150     1        0.0608
Wilcoxon         4.1800     1        0.0409
-2Log(LR)        4.3563     1        0.0369
Which test is appropriate here?
The tests give no estimate of the difference between groups.
PHreg: proportional hazards regression
Proc PHreg ;
  MODEL time * event-status ( censored_value ) = predictors
        / risklimits     /* CI for hazard ratios */
          ties=efron ;   /* fitting method option */

Proc PHreg data = breast_cancer;
  model surv_months * died(0) = positive_stain
        / risklimits ties=efron;
run;
The response is specified in the same way as for Proc Lifetest.
Analysis of Maximum Likelihood Estimates
                        Parameter    Standard
Parameter         DF     Estimate       Error    Chi-Square    Pr > ChiSq
positive_stain     1      0.90933     0.50089        3.2957        0.0695

Analysis of Maximum Likelihood Estimates
                    Hazard    95% Hazard Ratio
Parameter            Ratio    Confidence Limits
positive_stain       2.483     0.930      6.626
The hazard rate in the positive-stain group was estimated to be 2.5 times that in the negative-stain group (95% CI: 0.93 to 6.63), although this did not reach significance (p = .0695).
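The hazard ratio and its confidence limits can be reproduced from the parameter estimate and standard error in the output. The Python sketch below assumes the limits are the usual Wald limits, exp(estimate ± 1.96 × SE), and that the chi-square is the squared ratio of estimate to standard error; both assumptions agree with the printed values to rounding.

```python
import math

beta = 0.90933   # Parameter Estimate for positive_stain (from the output)
se = 0.50089     # Standard Error (from the output)

hazard_ratio = math.exp(beta)
lower = math.exp(beta - 1.96 * se)   # assumed Wald lower limit
upper = math.exp(beta + 1.96 * se)   # assumed Wald upper limit
wald_chi_square = (beta / se) ** 2

print(f"hazard ratio = {hazard_ratio:.3f}")
print(f"95% limits   = ({lower:.3f}, {upper:.3f})")
print(f"chi-square   = {wald_chi_square:.4f}")
```

The interval is symmetric around the estimate on the log scale, not on the hazard-ratio scale, which is why 2.483 sits much closer to the lower limit than to the upper one.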