Lecture 10: Hypothesis Testing II

Weight Functions
Trend Tests
Testing >2 Samples in R
> ### 2-sample testing using the toy example
> library(survival)
> time<-c(3,6,9,9,11,16,8,9,10,12,19,23)
> cens<-c(1,0,1,1,0,1,1,1,0,0,1,0)
> grp<-c(1,1,1,1,1,1,2,2,2,2,2,2)
> grp<-as.factor(grp)
>
> sdat<-Surv(time, cens)
> survdiff(sdat~grp)
Call:
survdiff(formula = sdat ~ grp)

      N Observed Expected (O-E)^2/E (O-E)^2/V
grp=1 6        4     2.57     0.800      1.62
grp=2 6        3     4.43     0.463      1.62

 Chisq= 1.6  on 1 degrees of freedom, p= 0.203
> survdiff(sdat~grp, rho=1)
Call:
survdiff(formula = sdat ~ grp, rho = 1)

      N Observed Expected (O-E)^2/E (O-E)^2/V
grp=1 6     3.20     2.15     0.513      1.23
grp=2 6     2.11     3.16     0.349      1.23

 Chisq= 1.2  on 1 degrees of freedom, p= 0.268
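As a check on where these numbers come from, here is a minimal base-R sketch that recomputes both toy-example tests without the survival package. It assumes, consistent with the rho = 1 output above, that the weight at each event time is the pooled Kaplan-Meier estimate just before that time raised to the power ρ; `gRho_test` is a hypothetical helper name, not part of any package.

```r
# Toy data from the slides
time <- c(3, 6, 9, 9, 11, 16, 8, 9, 10, 12, 19, 23)
cens <- c(1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0)
grp  <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)

# Hypothetical helper: two-sample G-rho test with W(t_i) = S(t_{i-1})^rho,
# where S is the pooled Kaplan-Meier estimate (assumption, see lead-in)
gRho_test <- function(time, cens, grp, rho = 0) {
  tm  <- sort(unique(time[cens == 1]))                   # distinct event times
  Yi  <- sapply(tm, function(t) sum(time >= t))          # pooled risk set
  di  <- sapply(tm, function(t) sum(time == t & cens == 1))
  Yi1 <- sapply(tm, function(t) sum(time >= t & grp == 1))
  di1 <- sapply(tm, function(t) sum(time == t & cens == 1 & grp == 1))
  S   <- cumprod(1 - di / Yi)                            # pooled KM at t_i
  W   <- c(1, head(S, -1))^rho                           # KM just before t_i
  Ei1 <- Yi1 * di / Yi                                   # expected events, group 1
  # pmax() guards against 0/0 when only one subject is at risk
  vi  <- (Yi1 / Yi) * (1 - Yi1 / Yi) * ((Yi - di) / pmax(Yi - 1, 1)) * di
  Z   <- sum(W * (di1 - Ei1))                            # weighted O - E
  V   <- sum(W^2 * vi)                                   # its variance
  list(obs = sum(W * di1), exp = sum(W * Ei1), Z = Z, var = V,
       chisq = Z^2 / V, p = pchisq(Z^2 / V, df = 1, lower.tail = FALSE))
}

lr <- gRho_test(time, cens, grp, rho = 0)   # compare with survdiff(sdat~grp)
pp <- gRho_test(time, cens, grp, rho = 1)   # compare with survdiff(sdat~grp, rho=1)
round(c(lr$chisq, pp$obs, pp$exp, pp$chisq), 2)
```

With rho = 0 every weight is 1 and the result is the ordinary log-rank test.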
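Revisit ‘Linear Dependence’ of Zj(t)
• How are they linearly dependent?
– At each event time the O − E terms sum to zero over the groups, because $\sum_j d_{ij} = d_i$ and $\sum_j Y_{ij} = Y_i$; the same holds after weighting
• Two-sample case: $Z_1(\tau) + Z_2(\tau) = 0$, i.e. $Z_2(\tau) = -Z_1(\tau)$
• K-sample case: $\sum_{j=1}^{K} Z_j(\tau) = 0$, so only K − 1 of the $Z_j(\tau)$ are linearly independent (hence K − 1 degrees of freedom)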
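The dependence is easy to confirm numerically. The minimal base-R sketch below (the helper `oe_by_group` is hypothetical) builds the matrix of per-time O − E terms for each group, using the toy data and, for the K-sample case, an arbitrary three-group relabeling chosen only for illustration.

```r
time <- c(3, 6, 9, 9, 11, 16, 8, 9, 10, 12, 19, 23)
cens <- c(1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0)
grp  <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)

# Hypothetical helper: matrix of d_ij - Y_ij * d_i / Y_i
# (rows = distinct event times, columns = groups)
oe_by_group <- function(time, cens, grp) {
  tm <- sort(unique(time[cens == 1]))
  groups <- sort(unique(grp))
  sapply(groups, function(j) {
    sapply(tm, function(t) {
      Yi  <- sum(time >= t)
      di  <- sum(time == t & cens == 1)
      Yij <- sum(time >= t & grp == j)
      dij <- sum(time == t & cens == 1 & grp == j)
      dij - Yij * di / Yi
    })
  })
}

oe <- oe_by_group(time, cens, grp)
Zj <- colSums(oe)          # Z_1(tau), Z_2(tau) with W = 1
Zj                          # Z_2 = -Z_1
rowSums(oe)                 # zero (up to rounding) at every event time

# Arbitrary 3-group labeling, for illustration only: the Z_j still sum to zero
grp3 <- c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3)
sum(colSums(oe_by_group(time, cens, grp3)))
```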
Beyond Log-Rank
• Log-rank has optimum power to detect Ha when the
hazard rates of our K groups are proportional
• What if they’re not…
• We’ve mentioned using other weight functions
• Depending on the choice of weight functions, we can
place emphasis on different regions of the survival
curve.
Example: Kidney Infection
• Data on 119 kidney dialysis patients
• Comparing time to kidney infection between
two groups
– Catheters placed percutaneously (n = 76)
– Catheters placed surgically (n = 43)
Log-Rank Test
 t_i  Y_i1 d_i1  Y_i2 d_i2   Y_i  d_i    E_i1  O_i1 - E_i1    V_i
 0.5    43    0    76    6   119    6   2.168       -2.168  1.326
 1.5    43    1    60    0   103    1   0.417        0.583  0.243
 2.5    42    0    56    2    98    2   0.857       -0.857  0.485
 3.5    40    1    49    1    89    2   0.899        0.101  0.489
 4.5    36    2    43    0    79    2   0.911        1.089  0.490
 5.5    33    1    40    0    73    1   0.452        0.548  0.248
 6.5    31    0    35    1    66    1   0.470       -0.470  0.249
 8.5    25    2    30    0    55    2   0.909        1.091  0.487
 9.5    22    1    27    0    49    1   0.449        0.551  0.247
10.5    20    1    25    0    45    1   0.444        0.556  0.247
11.5    18    1    22    0    40    1   0.450        0.550  0.248
15.5    11    1    14    1    25    2   0.880        0.120  0.472
16.5    10    1    13    0    23    1   0.435        0.565  0.246
18.5     9    1    11    0    20    1   0.450        0.550  0.248
23.5     4    1     5    0     9    1   0.444        0.556  0.247
26.5     2    1     3    0     5    1   0.400        0.600  0.240
 Sum                                                3.964  6.211

E_i1 = Y_i1 d_i / Y_i;  O_i1 - E_i1 = d_i1 - Y_i1 d_i / Y_i;  V_i = Y_i1 Y_i2 d_i (Y_i - d_i) / [Y_i^2 (Y_i - 1)]
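The Sum row and the log-rank result can be reproduced directly from the table columns. A minimal base-R sketch, typing in Y_i1, d_i1, Y_i and d_i from the table above:

```r
# Columns of the log-rank table (kidney infection data)
Yi1 <- c(43, 43, 42, 40, 36, 33, 31, 25, 22, 20, 18, 11, 10, 9, 4, 2)
di1 <- c(0, 1, 0, 1, 2, 1, 0, 2, 1, 1, 1, 1, 1, 1, 1, 1)
Yi  <- c(119, 103, 98, 89, 79, 73, 66, 55, 49, 45, 40, 25, 23, 20, 9, 5)
di  <- c(6, 1, 2, 2, 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1)

Ei1 <- Yi1 * di / Yi                                    # expected events, group 1
OmE <- di1 - Ei1                                        # observed minus expected
Vi  <- (Yi1 / Yi) * (1 - Yi1 / Yi) * ((Yi - di) / (Yi - 1)) * di

Z1    <- sum(OmE)                                       # Z_1(tau)
sig11 <- sum(Vi)                                        # sigma_11
chisq <- Z1^2 / sig11                                   # log-rank chi-square, 1 df
pval  <- pchisq(chisq, df = 1, lower.tail = FALSE)
round(c(Z1 = Z1, sig11 = sig11, chisq = chisq, p = pval), 3)
```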
Comparisons
Test                              W(t_i)                                  Z_1(τ)    σ̂_11     χ²     p-value
Log-Rank                          1                                         3.96      6.21    2.53    0.112
Gehan                             Y_i                                      -9     38862       0.002   0.964
Tarone-Ware                       √Y_i                                     13.2      432.83   0.4     0.526
Peto-Peto                         S̃(t_i)                                    2.47      4.36    1.4     0.237
Modified Peto-Peto                S̃(t_i) Y_i/(Y_i + 1)                      2.31      4.2     1.28    0.259
Fleming-Harrington p=0; q=1       1 − Ŝ(t_{i−1})                            1.41      0.21    9.67    0.002
Fleming-Harrington p=1; q=0       Ŝ(t_{i−1})                                2.55      4.69    1.39    0.239
Fleming-Harrington p=1; q=1       Ŝ(t_{i−1})[1 − Ŝ(t_{i−1})]               1.02      0.11    9.83    0.002
Fleming-Harrington p=0.5; q=0.5   Ŝ(t_{i−1})^0.5 [1 − Ŝ(t_{i−1})]^0.5       2.47      0.66    9.28    0.002
Fleming-Harrington p=0.5; q=2     Ŝ(t_{i−1})^0.5 [1 − Ŝ(t_{i−1})]^2         0.32      0.01    8.18    0.004

S̃(t_i) = Π_{t_j ≤ t_i} [1 − d_j/(Y_j + 1)]; Ŝ is the pooled Kaplan-Meier estimate; χ² = Z_1(τ)²/σ̂_11 on 1 df.
Notice the Differences!
• The same data lead to different conclusions depending on the weight function
• Need to be sure you are testing what you
think you are testing
• Check
– Look at the hazards
– Do they cross?
• Problem
– Estimating hazards is imprecise (as we’ve
discussed)
Cumulative Hazards
Hazard Rate (smoothing spline)
Misconception
• Whether the survival curves cross tells us whether the log-rank test is appropriate
• Not true
– Whether we observe the survival curves crossing depends on censoring and study duration
– What if they cross, but we don’t follow patients long enough to see it?
• Consider
– Survival curves cross ⇒ hazards cross
– Hazards cross ⇏ survival curves cross (they may or may not)
• Solution?
– Test regions of t separately
– Before and after the crossing point, located by looking at the hazards
– Some tests allow for crossing hazards (Yang and Prentice 2005)
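One way to act on "test regions of t" is a weighted log-rank test with indicator weights: W(t_i) = 1 for t_i ≤ c and 0 otherwise, then the reverse for t_i > c. The base-R sketch below applies this to the toy data with a hypothetical cut point c = 9; in practice c would come from the estimated hazards, and choosing it after looking at the same data makes the resulting p-values optimistic.

```r
time <- c(3, 6, 9, 9, 11, 16, 8, 9, 10, 12, 19, 23)
cens <- c(1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0)
grp  <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)

# Per-event-time pieces of the two-sample statistic
tm  <- sort(unique(time[cens == 1]))
Yi  <- sapply(tm, function(t) sum(time >= t))
di  <- sapply(tm, function(t) sum(time == t & cens == 1))
Yi1 <- sapply(tm, function(t) sum(time >= t & grp == 1))
di1 <- sapply(tm, function(t) sum(time == t & cens == 1 & grp == 1))
OmE <- di1 - Yi1 * di / Yi
Vi  <- (Yi1 / Yi) * (1 - Yi1 / Yi) * ((Yi - di) / (Yi - 1)) * di

# Weighted chi-square (1 df) for an arbitrary weight vector W
wchisq <- function(W) sum(W * OmE)^2 / sum(W^2 * Vi)

cut_pt <- 9                                    # hypothetical crossing point
early  <- wchisq(as.numeric(tm <= cut_pt))     # event times t_i <= 9 only
late   <- wchisq(as.numeric(tm >  cut_pt))     # event times t_i > 9 only
round(c(early = early, late = late), 3)
```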
Take-home
• Choice of weight function can be critical
• K&M recommend applying log-rank and
Gehan
• Cox regression (simple) is akin to log-rank
• Think carefully about the distribution of
weights and about possible crossing of
hazards
What About Weights…
• We know that R’s survdiff offers only a limited selection of weights (the G^ρ family, through its rho argument).
• SAS doesn’t seem to allow us to specify any
weights (at least not in proc lifetest)
• So of course we can write our own function…
R Function for Different Weights
• What information will we need to construct
the different weights?
• Can we get this information from R?
Building Our R Function
> times<-kidney$Time
> cens<-kidney$d
> grp<-kidney$cath
> fit<-survfit(Surv(times, cens)~1)
> tm<-summary(fit)$time
> Yi<-fit$n.risk[which(fit$time%in%tm)]
> di<-fit$n.event[which(fit$time%in%tm)]
> Yi
[1] 119 103 98 89 79 73 66 55 49 45 40 25 23 20 9 5
> di
[1] 6 1 2 2 2 1 1 2 1 1 1 2 1 1 1 1
> st<-Surv(times, cens)
> fit<-survfit(st~kidney$cath)   # note: this replaces the pooled fit above
> summary(fit)
Call: survfit(formula = st ~ kidney$cath)

                kidney$cath=1
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
  1.5     43       1    0.977  0.0230       0.9327        1.000
  3.5     40       1    0.952  0.0329       0.8899        1.000
  4.5     36       2    0.899  0.0478       0.8104        0.998
  5.5     33       1    0.872  0.0536       0.7732        0.984
  8.5     25       2    0.802  0.0683       0.6790        0.948
  9.5     22       1    0.766  0.0743       0.6332        0.926
 10.5     20       1    0.728  0.0799       0.5868        0.902
 11.5     18       1    0.687  0.0851       0.5392        0.876
 15.5     11       1    0.625  0.0976       0.4599        0.849
 16.5     10       1    0.562  0.1060       0.3886        0.813
 18.5      9       1    0.500  0.1111       0.3233        0.773
 23.5      4       1    0.375  0.1366       0.1835        0.766
 26.5      2       1    0.187  0.1491       0.0394        0.891

                kidney$cath=2
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
  0.5     76       6    0.921  0.0309        0.862        0.984
  2.5     56       2    0.888  0.0376        0.817        0.965
  3.5     49       1    0.870  0.0409        0.793        0.954
  6.5     35       1    0.845  0.0467        0.758        0.942
 15.5     14       1    0.785  0.0726        0.655        0.941

> names(fit)
 [1] "n"         "time"      "n.risk"    "n.event"   "n.censor"  "surv"      "type"      "strata"    "std.err"   "upper"
[11] "lower"     "conf.type" "conf.int"  "call"
> fit$n.risk
[1] 43 42 40 36 33 31 29 25 22 20 18 16 14 13 11 10 9 8 6 4 3 2 1 76 60 56 49 43 40 35 33 30 27
[34] 25 22 20 16 14 13 11 10 7 6 5 4 3 1
> fit$n.event
[1] 1 0 1 2 1 0 0 2 1 1 1 0 0 0 1 1 1 0 0 1 0 1 0 6 0 2 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
> names(summary(fit))
 [1] "n"         "time"      "n.risk"    "n.event"   "n.censor"  "surv"      "type"      "strata"    "std.err"   "upper"
[11] "lower"     "conf.type" "conf.int"  "call"      "table"
> summary(fit)$n.risk
 [1] 43 40 36 33 25 22 20 18 11 10  9  4  2 76 56 49 35 14
> summary(fit)$n.event
 [1] 1 1 2 1 2 1 1 1 1 1 1 1 1 6 2 1 1 1
Building Our R Function
• We still need Y_i1 and d_i1 at every pooled event time (every time at which at least one event occurs in either group)
– including event times at which group 1 has no events of its own
• We can construct these risk sets ourselves from what we already get out of R.
– Recall how we find the risk set…
Building Our R Function
> dat<-cbind(times, cens)[which(grp==1),]
> yij<-dij<-c()
> for (i in 1:length(tm))
{
tmi<-tm[i]
yij<-append(yij, length(which(dat[,1]>=tmi)))
dij<-append(dij, sum(dat[which(dat[,1]==tmi),2]))
}
> yij
[1] 43 43 42 40 36 33 31 25 22 20 18 11 10 9 4 2
> dij
[1] 0 1 0 1 2 1 0 2 1 1 1 1 1 1 1 1
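The same risk-set counts can be computed without growing vectors via append(). A minimal base-R sketch; `risk_counts` is a hypothetical helper, checked here on the toy data, whose pooled event times are 3, 8, 9, 16 and 19:

```r
# Hypothetical helper: risk-set sizes and event counts for one group
# at each pooled event time in tm
risk_counts <- function(times, cens, grp, tm, group = 1) {
  in_g <- grp == group
  yij <- vapply(tm, function(t) sum(in_g & times >= t), integer(1))
  dij <- vapply(tm, function(t) sum(in_g & times == t & cens == 1), integer(1))
  list(yij = yij, dij = dij)
}

time <- c(3, 6, 9, 9, 11, 16, 8, 9, 10, 12, 19, 23)
cens <- c(1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0)
grp  <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
tm   <- sort(unique(time[cens == 1]))   # pooled event times
rc   <- risk_counts(time, cens, grp, tm)
rc$yij
rc$dij
```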
Test Statistic
-We now have all the parts we need for the per-time pieces of the test statistic that do not depend on the weights.
>OmEi<-dij-yij*(di/Yi)
>vi<-(yij/Yi)*(1-yij/Yi)*((Yi-di)/(Yi-1))*di
> round(OmEi, 3)
[1] -2.168 0.583 -0.857 0.101 1.089 0.548 -0.470 1.091 0.551
[10] 0.556 0.550 0.120 0.565 0.550 0.556 0.600
> round(vi, 3)
[1] 1.326 0.243 0.485 0.489 0.490 0.248 0.249 0.487 0.247 0.247 0.248 0.472
0.246 0.248 0.247 0.240
-Now we need to estimate the weights so we can construct the weighted
versions…
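For reference, with weights W(t_i) at the D distinct event times, the per-time terms OmEi and vi combine into the weighted statistic, its variance, and the chi-square test:

```latex
Z_1(\tau) = \sum_{i=1}^{D} W(t_i)\left[d_{i1} - Y_{i1}\frac{d_i}{Y_i}\right],
\qquad
\hat\sigma_{11} = \sum_{i=1}^{D} W(t_i)^2\,\frac{Y_{i1}}{Y_i}\left(1 - \frac{Y_{i1}}{Y_i}\right)\left(\frac{Y_i - d_i}{Y_i - 1}\right) d_i,
\qquad
\chi^2 = \frac{Z_1(\tau)^2}{\hat\sigma_{11}} \sim \chi^2_1 \text{ under } H_0 .
```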
Different Weight functions
#generating weights (fit here must be the pooled fit, survfit(Surv(times, cens)~1))
Sim1<-c(1,fit$surv[which(fit$time%in%tm)][1:(length(tm)-1)])   # pooled KM at t_(i-1)
if (wt=="lr") Wti<-rep(1, length(tm))                 # log-rank
if (wt=="geh") Wti<-Yi                                # Gehan
if (wt=="tw") Wti<-sqrt(Yi)                           # Tarone-Ware
if (wt=="pp") Wti<-cumprod(1-di/(Yi+1))               # Peto-Peto
if (wt=="mpp") Wti<-cumprod(1-di/(Yi+1))*Yi/(Yi+1)    # modified Peto-Peto
if (wt=="fh")                                         # Fleming-Harrington
{
if(missing(p) || missing(q)) stop("Use of Fleming-Harrington Weights requires values for p and q")
else Wti<-Sim1^p*(1-Sim1)^q
}
#Example Using the Gehan weight
> wt<-"geh"
> if (wt=="geh") Wti<-Yi
> Wti
[1] 119 103 98 89 79 73 66 55 49 45 40 25 23 20 9 5
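All of these weights depend only on the pooled Y_i and d_i. The sketch below is a base-R variant of the same options (the name `make_weights` is hypothetical); it uses switch() instead of a chain of if statements and builds Ŝ(t_{i−1}) with cumprod(), which for the pooled fit matches fit$surv at the event times. The Y_i and d_i values are the kidney values printed earlier.

```r
Yi <- c(119, 103, 98, 89, 79, 73, 66, 55, 49, 45, 40, 25, 23, 20, 9, 5)
di <- c(6, 1, 2, 2, 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1)

# Hypothetical helper mirroring the wt codes used above
make_weights <- function(Yi, di, wt, p = NULL, q = NULL) {
  S_km <- cumprod(1 - di / Yi)              # pooled Kaplan-Meier at t_i
  Sim1 <- c(1, head(S_km, -1))              # KM just before t_i, S(t_{i-1})
  S_pp <- cumprod(1 - di / (Yi + 1))        # Peto-Peto S-tilde(t_i)
  switch(wt,
    lr  = rep(1, length(Yi)),
    geh = Yi,
    tw  = sqrt(Yi),
    pp  = S_pp,
    mpp = S_pp * Yi / (Yi + 1),
    fh  = {
      if (is.null(p) || is.null(q)) stop("Fleming-Harrington weights need p and q")
      Sim1^p * (1 - Sim1)^q
    },
    stop("unknown weight: ", wt))
}

round(make_weights(Yi, di, "pp")[1:3], 4)
round(make_weights(Yi, di, "fh", p = 1, q = 0)[1:3], 4)
```

With p = 0, q = 1 the first event time gets weight (1 − 1)^1 = 0, one way to see why that Fleming-Harrington choice emphasizes later differences.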
Final Calculations
#Apply the chosen weight to our test statistic and its variance
> OmE<-as.numeric(t(Wti)%*%OmEi)
> v<-as.numeric(t(Wti^2)%*%vi)
> tstat<-OmE^2/v
> pval<-pchisq(tstat, df=1, lower.tail=F)
> OmE
[1] -9
>v
[1] 38861.81
> tstat
[1] 0.002084309
> pval
[1] 0.9635858
library(survival)

survdiff_wts<-function(times, cens, grp, wt, p, q) {
  # pooled risk sets and event counts at each distinct event time
  fit<-survfit(Surv(times, cens)~1)
  tm<-summary(fit)$time
  Yi<-fit$n.risk[which(fit$time%in%tm)]
  di<-fit$n.event[which(fit$time%in%tm)]
  # group-1 risk sets and event counts at the same times
  dat<-cbind(times, cens)[which(grp==1),]
  yij<-dij<-c()
  for (i in 1:length(tm)) {
    tmi<-tm[i]
    yij<-append(yij, length(which(dat[,1]>=tmi)))
    dij<-append(dij, sum(dat[which(dat[,1]==tmi),2]))
  }
  # unweighted per-time O - E and variance terms
  OmEi<-dij-yij*(di/Yi)
  vi<-(yij/Yi)*(1-yij/Yi)*((Yi-di)/(Yi-1))*di
  vi[Yi==1]<-0   # avoid 0/0 when only one subject is at risk
  # weights; Sim1 is the pooled KM estimate at t_(i-1)
  Sim1<-c(1,fit$surv[which(fit$time%in%tm)][1:(length(tm)-1)])
  if (wt=="lr") Wti<-rep(1, length(tm))
  if (wt=="geh") Wti<-Yi
  if (wt=="tw") Wti<-sqrt(Yi)
  if (wt=="pp") Wti<-cumprod(1-di/(Yi+1))
  if (wt=="mpp") Wti<-cumprod(1-di/(Yi+1))*Yi/(Yi+1)
  if (wt=="fh") {
    if(missing(p) || missing(q)) stop("Use of Fleming-Harrington Weights requires values for p and q")
    else Wti<-Sim1^p*(1-Sim1)^q
  }
  # weighted statistic, its variance, and the chi-square test on 1 df
  OmE<-as.numeric(t(Wti)%*%OmEi)
  v<-as.numeric(t(Wti^2)%*%vi)
  tstat<-OmE^2/v
  pval<-pchisq(tstat, df=1, lower.tail=FALSE)
  ans<-list(weights=Wti, Z_tau=OmE, sig_11=v, chisq=tstat, pval=pval)
  names(ans)<-c("Weights", "Z_tau","sig_11","chisq value","pvalue")
  return(ans)
}
Larynx Cancer
• 90 patients diagnosed with larynx cancer
(1970’s)
• Patients classified according to disease stage
– Stages I-IV
• We are interested in survival
• BUT we want to compare the four stages
Kaplan-Meier curves
R: survdiff
>lar<-read.csv("H:public.html\\BMTRY_722_Summer2015\\Date\\larynx.csv")
>time<-lar$time; death<-lar$death; stage<-lar$stage
>st<-Surv(time, death)
> test0<-survdiff(st~stage)
> test0
Call: survdiff(formula = st ~ stage)

         N Observed Expected (O-E)^2/E (O-E)^2/V
stage=1 33       15    22.57     2.537     4.741
stage=2 17        7    10.01     0.906     1.152
stage=3 27       17    14.08     0.603     0.856
stage=4 13       11     3.34    17.590    19.827

 Chisq= 22.8  on 3 degrees of freedom, p= 4.53e-05
> test1<-survdiff(st~stage, rho=1)
> test1
Call: survdiff(formula = st ~ stage, rho=1)
…
Chisq= 23.1 on 3 degrees of freedom, p= 3.85e-05
> test2<-survdiff(st~stage, rho=3)
> test2
Call: survdiff(formula = st ~ stage, rho=3)
…
Chisq= 21.8 on 3 degrees of freedom, p= 7.03e-05
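Recall: survdiff’s rho argument uses W(t_i) = Ŝ(t_{i−1})^ρ, where Ŝ is the pooled Kaplan-Meier estimate, i.e. the Fleming-Harrington weight Ŝ(t_{i−1})^p[1 − Ŝ(t_{i−1})]^q with p = ρ and q = 0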
What about our hazards
R: survdiff
> test3<-survdiff(st[stage<3]~stage[stage<3])
Chisq= 0 on 1 degrees of freedom, p= 0.866
> test4<-survdiff(st~factor(stage, exclude=c(2,4)))
Chisq= 3.1 on 1 degrees of freedom, p= 0.0801
> test5<-survdiff(st~factor(stage, exclude=c(2,3)))
Chisq= 23.4 on 1 degrees of freedom, p= 1.32e-06
> test6<-survdiff(st~factor(stage, exclude=c(1,4)))
Chisq= 1.5 on 1 degrees of freedom, p= 0.266
> test7<-survdiff(st~factor(stage, exclude=c(1,3)))
Chisq= 11.5 on 1 degrees of freedom, p= 0.000679
> test8<-survdiff(st[stage>2]~stage[stage>2])
Chisq= 0.5 on 1 degrees of freedom, p= 0.769
What about the differences
• Not much evidence of the hazards crossing
• When the hazards do not cross, the different weighted tests tend to give consistent conclusions
• Log-rank: most appropriate when the hazards are proportional
Test For Trends
• We generally perform tests of trends for
ordinal variables
– Dose level
– PSA categories (prostate cancer)
– Cancer stage
• Different from treating the variable as continuous,
although that is one ‘accepted’ approach
• For continuous covariates, we need a
regression model (we will get there shortly)
Formal Tests for Trend
• Our hypotheses are
$$H_0: h_1(t) = h_2(t) = \cdots = h_K(t)$$
$$H_A: h_1(t) \le h_2(t) \le \cdots \le h_K(t) \quad \text{with at least one strict inequality}$$
• Any weight function discussed previously can be used
• Test statistic:
$$Z = \frac{\sum_{j=1}^{K} a_j Z_j(\tau)}{\sqrt{\sum_{j=1}^{K}\sum_{g=1}^{K} a_j a_g \hat\sigma_{jg}}} \sim N(0,1) \text{ under } H_0$$
• $a_j$: weights (scores), often chosen as $a_j = j$ but can be user specified
• $\hat\sigma_{jg}$: the (j, g)th element of the variance-covariance matrix of the $Z_j(\tau)$
Stage: Ordinal Categories
Trend Test in R
#Test Trend in R
#test: a survdiff object (fitted with any rho); wt: the scores a_j
surv.trendtest<-function(test, wt)
{
zj<-test$obs-test$exp     # Z_j(tau): weighted O - E for each group
zv<-test$var              # variance-covariance matrix of the Z_j(tau)
num<-sum(wt*zj)
den<-0
for (i in 1:length(wt)) {
for (g in 1:length(wt))
{ den<-den+wt[i]*wt[g]*zv[i,g]}
}
den<-sqrt(den)
zz<-num/den
pval<-2*(1-pnorm(abs(zz)))
return(list(Z=zz, pvalue=pval))
}
Trend Test in R
>test.t0<-surv.trendtest(test=test0, wt=1:4)
>test.t0
$Z
[1] 3.718959
$pvalue
[1] 0.0002000459
>test.t1<-surv.trendtest(test=test1, wt=1:4)
> test.t1
$Z
[1] 4.120055
$pvalue
[1] 3.787827e-05
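A quick sanity check on the trend statistic: with K = 2, Z_2(τ) = −Z_1(τ) and the covariance matrix is σ̂_11 times [1, −1; −1, 1], so Z² equals the two-sample chi-square for any scores with a_1 ≠ a_2. The base-R sketch below (the helper `trend_z` is hypothetical and takes the O − E vector and covariance matrix directly instead of a survdiff object) checks this with the toy-example values Z_1 = 43/30 (O − E = 4 − 2.57) and σ̂_11 = 11.41/9 ≈ 1.268, which reproduce that example’s chi-square of 1.62.

```r
# Hypothetical helper: trend Z from O - E vector zj, covariance zv, scores aj
trend_z <- function(zj, zv, aj) {
  num <- sum(aj * zj)
  den <- sqrt(as.numeric(t(aj) %*% zv %*% aj))   # sum_j sum_g a_j a_g sigma_jg
  z <- num / den
  list(Z = z, pvalue = 2 * pnorm(-abs(z)))
}

z1    <- 43 / 30                  # toy example: O - E for group 1
sig11 <- 11.41 / 9                # toy example: variance of Z_1
zj <- c(z1, -z1)                  # linear dependence: Z_2 = -Z_1
zv <- sig11 * matrix(c(1, -1, -1, 1), nrow = 2)

res <- trend_z(zj, zv, aj = 1:2)
c(Z_squared = res$Z^2, logrank_chisq = z1^2 / sig11)
```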
Next Time…
• Stratified tests
• Other K-sample tests