Lecture 10: Hypothesis Testing II
Weight Functions, Trend Tests

Testing >2 Samples in R

> ### 2-sample testing using toy example
> time<-c(3,6,9,9,11,16,8,9,10,12,19,23)
> cens<-c(1,0,1,1,0,1,1,1,0,0,1,0)
> grp<-c(1,1,1,1,1,1,2,2,2,2,2,2)
> grp<-as.factor(grp)
>
> sdat<-Surv(time, cens)
> survdiff(sdat~grp)
Call:
survdiff(formula = sdat ~ grp)

      N Observed Expected (O-E)^2/E (O-E)^2/V
grp=1 6        4     2.57     0.800      1.62
grp=2 6        3     4.43     0.463      1.62

 Chisq= 1.6  on 1 degrees of freedom, p= 0.203

Testing >2 Samples in R

> survdiff(sdat~grp, rho=1)
Call:
survdiff(formula = sdat ~ grp, rho = 1)

      N Observed Expected (O-E)^2/E (O-E)^2/V
grp=1 6     3.20     2.15     0.513      1.23
grp=2 6     2.11     3.16     0.349      1.23

 Chisq= 1.2  on 1 degrees of freedom, p= 0.268

Revisit 'Linear Dependence' of Zj(t)
• How are they linearly dependent?
• Two-sample case: Z1(τ) = -Z2(τ), so the second statistic is completely determined by the first
• K-sample case: Z1(τ) + Z2(τ) + … + ZK(τ) = 0, so any one Zj(τ) is determined by the other K-1 (which is why the test is built on only K-1 of them)

Beyond Log-Rank
• The log-rank test has optimum power to detect HA when the hazard rates of our K groups are proportional
• What if they're not?
• We've mentioned using other weight functions
• Depending on the choice of weight function, we can place emphasis on different regions of the survival curve

Example: Kidney Infection
• Data on 119 kidney dialysis patients
• Comparing time to kidney infection between two groups
  – Catheters placed percutaneously (n = 76)
  – Catheters placed surgically (n = 43)

Log-Rank Test

   ti   Yi1  di1  Yi2  di2   Yi  di   E1i = Yi1*di/Yi   di1 - E1i    V1i
  0.5    43   0    76   6   119   6        2.168          -2.168    1.326
  1.5    43   1    60   0   103   1        0.417           0.583    0.243
  2.5    42   0    56   2    98   2        0.857          -0.857    0.485
  3.5    40   1    49   1    89   2        0.899           0.101    0.489
  4.5    36   2    43   0    79   2        0.911           1.089    0.490
  5.5    33   1    40   0    73   1        0.452           0.548    0.248
  6.5    31   0    35   1    66   1        0.470          -0.470    0.249
  8.5    25   2    30   0    55   2        0.909           1.091    0.487
  9.5    22   1    27   0    49   1        0.449           0.551    0.247
 10.5    20   1    25   0    45   1        0.444           0.556    0.247
 11.5    18   1    22   0    40   1        0.450           0.550    0.248
 15.5    11   1    14   1    25   2        0.880           0.120    0.472
 16.5    10   1    13   0    23   1        0.435           0.565    0.246
 18.5     9   1    11   0    20   1        0.450           0.550    0.248
 23.5     4   1     5   0     9   1        0.444           0.556    0.247
 26.5     2   1     3   0     5   1        0.400           0.600    0.240
  Sum                                                      3.964    6.211

where V1i = (Yi1/Yi)(1 - Yi1/Yi)((Yi - di)/(Yi - 1))di

Comparisons

Test                              W(ti)                                   Z1(τ)    σ̂11     χ²      p-value
Log-Rank                          1                                        3.96     6.21    2.53    0.112
Gehan                             Yi                                      -9.00    38862    0.002   0.964
Tarone-Ware                       sqrt(Yi)                                13.20   432.83    0.40    0.526
Peto-Peto                         S~(ti)                                   2.47     4.36    1.40    0.237
Modified Peto-Peto                S~(ti) Yi/(Yi+1)                         2.31     4.20    1.28    0.259
Fleming-Harrington p=0, q=1       1 - S^(t_(i-1))                          1.41     0.21    9.67    0.002
Fleming-Harrington p=1, q=0       S^(t_(i-1))                              2.55     4.69    1.39    0.239
Fleming-Harrington p=1, q=1       S^(t_(i-1)) (1 - S^(t_(i-1)))            1.02     0.11    9.83    0.002
Fleming-Harrington p=0.5, q=0.5   S^(t_(i-1))^0.5 (1 - S^(t_(i-1)))^0.5    2.47     0.66    9.28    0.002
Fleming-Harrington p=0.5, q=2     S^(t_(i-1))^0.5 (1 - S^(t_(i-1)))^2      0.32     0.01    8.18    0.004

Notice the Differences!
• The different weight functions lead to noticeably different inferences here
• Need to be sure you are testing what you think you are testing
• Check
  – Look at the hazards
  – Do they cross? (see the short plotting sketch below)
• Problem
  – Estimating hazards is imprecise (as we've discussed)

Cumulative Hazards (figure)

Hazard Rate (smoothing spline) (figure)

Misconception
• That survival curves crossing (or not) tells you whether the log-rank test is appropriate
• Not true
  – Whether the survival curves cross depends on censoring and study duration
  – What if they cross, but we don't follow the data far enough out to see it?
• Consider
  – Survival curves cross ⇒ the hazards cross
  – Hazards cross ⇒ the survival curves may or may not cross
• Solution?
  – Test regions of t, before and after the cross, based on looking at the hazards
  – Some tests allow for crossing (Yang and Prentice 2005)
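To act on the "do the hazards cross?" check, a quick plot of the cumulative hazards by group is usually enough. The code below is only a minimal sketch, assuming the kidney-infection data sit in a data frame named kidney with columns Time, d, and cath (the names used later in these notes); it is not part of the original analysis.

library(survival)

## Minimal sketch: cumulative hazards by catheter placement group
## (assumes a data frame `kidney` with columns Time, d, cath)
fit.ch <- survfit(Surv(Time, d) ~ cath, data = kidney)
plot(fit.ch, fun = "cumhaz", lty = 1:2,
     xlab = "Time to infection", ylab = "Cumulative hazard")
legend("topleft", legend = c("cath = 1", "cath = 2"), lty = 1:2, bty = "n")
## If the two curves cross, proportional hazards is doubtful and the plain
## log-rank weights are no longer the most powerful choice.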
Take-home
• The choice of weight function can be critical
• K&M recommend applying both the log-rank and the Gehan tests
• (Simple) Cox regression is akin to the log-rank test
• Think carefully about the distribution of the weights and about possible crossing of the hazards

What About Weights?
• We know that R has a limited selection of weights (the rho family in survdiff)
• SAS doesn't seem to allow us to specify any weights (at least not in proc lifetest)
• So of course we can write our own function...

R Function for Different Weights
• What information will we need to construct the different weights?
• Can we get this information from R?

Building Our R Function

> times<-kidney$Time
> cens<-kidney$d
> grp<-kidney$cath
> fit<-survfit(Surv(times, cens)~1)
> tm<-summary(fit)$time
> Yi<-fit$n.risk[which(fit$time%in%tm)]
> di<-fit$n.event[which(fit$time%in%tm)]
> Yi
 [1] 119 103  98  89  79  73  66  55  49  45  40  25  23  20   9   5
> di
 [1] 6 1 2 2 2 1 1 2 1 1 1 2 1 1 1 1

> st<-Surv(times, cens)
> fit<-survfit(st~kidney$cath)
> summary(fit)
Call: survfit(formula = st ~ kidney$cath)

                kidney$cath=1
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
  1.5     43       1    0.977  0.0230       0.9327        1.000
  3.5     40       1    0.952  0.0329       0.8899        1.000
  4.5     36       2    0.899  0.0478       0.8104        0.998
  5.5     33       1    0.872  0.0536       0.7732        0.984
  8.5     25       2    0.802  0.0683       0.6790        0.948
  9.5     22       1    0.766  0.0743       0.6332        0.926
 10.5     20       1    0.728  0.0799       0.5868        0.902
 11.5     18       1    0.687  0.0851       0.5392        0.876
 15.5     11       1    0.625  0.0976       0.4599        0.849
 16.5     10       1    0.562  0.1060       0.3886        0.813
 18.5      9       1    0.500  0.1111       0.3233        0.773
 23.5      4       1    0.375  0.1366       0.1835        0.766
 26.5      2       1    0.187  0.1491       0.0394        0.891

                kidney$cath=2
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
  0.5     76       6    0.921  0.0309        0.862        0.984
  2.5     56       2    0.888  0.0376        0.817        0.965
  3.5     49       1    0.870  0.0409        0.793        0.954
  6.5     35       1    0.845  0.0467        0.758        0.942
 15.5     14       1    0.785  0.0726        0.655        0.941

> names(fit)
 [1] "n"         "time"      "n.risk"    "n.event"   "n.censor"  "surv"      "type"      "strata"    "std.err"   "upper"
[11] "lower"     "conf.type" "conf.int"  "call"

Building Our R Function

> fit$n.risk
 [1] 43 42 40 36 33 31 29 25 22 20 18 16 14 13 11 10  9  8  6  4  3  2  1 76 60 56 49 43 40 35 33 30 27
[34] 25 22 20 16 14 13 11 10  7  6  5  4  3  1
> fit$n.event
 [1] 1 0 1 2 1 0 0 2 1 1 1 0 0 0 1 1 1 0 0 1 0 1 0 6 0 2 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
> names(summary(fit))
 [1] "n"         "time"      "n.risk"    "n.event"   "n.censor"  "surv"      "type"      "strata"    "std.err"   "upper"
[11] "lower"     "conf.type" "conf.int"  "call"      "table"
> summary(fit)$n.risk
 [1] 43 40 36 33 25 22 20 18 11 10  9  4  2 76 56 49 35 14
> summary(fit)$n.event
 [1] 1 1 2 1 2 1 1 1 1 1 1 1 1 6 2 1 1 1

Building Our R Function
• We still need to work out Yi1 and di1 at every time where at least one event occurs in the pooled sample
  – including times where group 1 contributes no event (it is only at risk or censored there)
• We can certainly construct the risk set using what we get out of R
  – Recall how we find the risk set...

Building Our R Function

> dat<-cbind(times, cens)[which(grp==1),]
> yij<-dij<-c()
> for (i in 1:length(tm)) {
+   tmi<-tm[i]
+   yij<-append(yij, length(which(dat[,1]>=tmi)))
+   dij<-append(dij, sum(dat[which(dat[,1]==tmi),2]))
+ }
> yij
 [1] 43 43 42 40 36 33 31 25 22 20 18 11 10  9  4  2
> dij
 [1] 0 1 0 1 2 1 0 2 1 1 1 1 1 1 1 1
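As an aside, the explicit for-loop above can be written in a vectorized form. This is only a sketch, reusing the tm, times, cens, and grp objects already created; the names yij.v and dij.v are mine, and they should reproduce yij and dij exactly.

## Vectorized alternative to the loop above (same inputs as before)
dat1  <- cbind(times, cens)[grp == 1, ]
## number at risk in group 1 at each pooled event time
yij.v <- sapply(tm, function(t) sum(dat1[, 1] >= t))
## number of group-1 events at each pooled event time
dij.v <- sapply(tm, function(t) sum(dat1[dat1[, 1] == t, 2]))
## sanity check: both should match the loop results
all(yij.v == yij) & all(dij.v == dij)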
Test Statistic
• We have all the parts we need to construct the "constant" (unweighted) portion of our test statistic:

> OmEi<-dij-yij*(di/Yi)
> vi<-(yij/Yi)*(1-yij/Yi)*((Yi-di)/(Yi-1))*di
> round(OmEi, 3)
 [1] -2.168  0.583 -0.857  0.101  1.089  0.548 -0.470  1.091  0.551
[10]  0.556  0.550  0.120  0.565  0.550  0.556  0.600
> round(vi, 3)
 [1] 1.326 0.243 0.485 0.489 0.490 0.248 0.249 0.487 0.247 0.247 0.248 0.472 0.246 0.248 0.247 0.240

• Now we need to estimate the weights so we can construct the weighted versions...

Different Weight Functions

#generating weights
#Sim1 uses the pooled fit (survfit(Surv(times, cens)~1)) to get S^(t_(i-1))
Sim1<-c(1, fit$surv[which(fit$time%in%tm)][1:(length(tm)-1)])
if (wt=="lr")  Wti<-rep(1, length(tm))
if (wt=="geh") Wti<-Yi
if (wt=="tw")  Wti<-sqrt(Yi)
if (wt=="pp")  Wti<-cumprod(1-di/(Yi+1))
if (wt=="mpp") Wti<-cumprod(1-di/(Yi+1))*Yi/(Yi+1)
if (wt=="fh") {
  if (missing(p) | missing(q)) stop("Use of Fleming-Harrington weights requires values for p and q")
  else Wti<-Sim1^p*(1-Sim1)^q
}

#Example using the Gehan weight
> wt<-"geh"
> if (wt=="geh") Wti<-Yi
> Wti
 [1] 119 103  98  89  79  73  66  55  49  45  40  25  23  20   9   5

Final Calculations

#Apply the chosen weight to our test statistic and its variance
> OmE<-as.numeric(t(Wti)%*%OmEi)
> v<-as.numeric(t(Wti^2)%*%vi)
> tstat<-OmE^2/v
> pval<-pchisq(tstat, df=1, lower.tail=F)
> OmE
[1] -9
> v
[1] 38861.81
> tstat
[1] 0.002084309
> pval
[1] 0.9635858

survdiff_wts<-function(times, cens, grp, wt, p, q) {
  #pooled fit (both groups combined) gives the event times, Yi, di, and the pooled KM estimate
  fit<-survfit(Surv(times, cens)~1)
  tm<-summary(fit)$time
  Yi<-fit$n.risk[which(fit$time%in%tm)]
  di<-fit$n.event[which(fit$time%in%tm)]
  #group-1 numbers at risk (yij) and events (dij) at each pooled event time
  dat<-cbind(times, cens)[which(grp==1),]
  yij<-dij<-c()
  for (i in 1:length(tm)) {
    tmi<-tm[i]
    yij<-append(yij, length(which(dat[,1]>=tmi)))
    dij<-append(dij, sum(dat[which(dat[,1]==tmi),2]))
  }
  #unweighted observed-minus-expected and variance contributions
  OmEi<-dij-yij*(di/Yi)
  vi<-(yij/Yi)*(1-yij/Yi)*((Yi-di)/(Yi-1))*di
  #pooled KM estimate at the previous event time, S^(t_(i-1))
  Sim1<-c(1, fit$surv[which(fit$time%in%tm)][1:(length(tm)-1)])
  if (wt=="lr")  Wti<-rep(1, length(tm))
  if (wt=="geh") Wti<-Yi
  if (wt=="tw")  Wti<-sqrt(Yi)
  if (wt=="pp")  Wti<-cumprod(1-di/(Yi+1))
  if (wt=="mpp") Wti<-cumprod(1-di/(Yi+1))*Yi/(Yi+1)
  if (wt=="fh") {
    if (missing(p) | missing(q)) stop("Use of Fleming-Harrington weights requires values for p and q")
    else Wti<-Sim1^p*(1-Sim1)^q
  }
  #weighted test statistic, its variance, and the chi-square test
  OmE<-as.numeric(t(Wti)%*%OmEi)
  v<-as.numeric(t(Wti^2)%*%vi)
  tstat<-OmE^2/v
  pval<-pchisq(tstat, df=1, lower.tail=F)
  ans<-list(weights=Wti, Z_tau=OmE, sig_11=v, chisq=tstat, pval=pval)
  names(ans)<-c("Weights", "Z_tau", "sig_11", "chisq value", "pvalue")
  return(ans)
}

Larynx Cancer
• 90 patients diagnosed with larynx cancer (1970s)
• Patients classified according to disease stage
  – Stages I-IV
• We are interested in survival
• BUT we want to compare the four stages

Kaplan-Meier curves (figure)

R: survdiff

> lar<-read.csv("H:public.html\\BMTRY_722_Summer2015\\Date\\larynx.csv")
> time<-lar$time; death<-lar$death; stage<-lar$stage
> st<-Surv(time, death)
> test0<-survdiff(st~stage)
> test0
Call:
survdiff(formula = st ~ stage)

         N Observed Expected (O-E)^2/E (O-E)^2/V
stage=1 33       15    22.57     2.537     4.741
stage=2 17        7    10.01     0.906     1.152
stage=3 27       17    14.08     0.603     0.856
stage=4 13       11     3.34    17.590    19.827

 Chisq= 22.8  on 3 degrees of freedom, p= 4.53e-05

R: survdiff

> test1<-survdiff(st~stage, rho=1)
> test1
Call:
survdiff(formula = st ~ stage, rho = 1)
…
 Chisq= 23.1  on 3 degrees of freedom, p= 3.85e-05

> test2<-survdiff(st~stage, rho=3)
> test2
Call:
survdiff(formula = st ~ stage, rho = 3)
…
 Chisq= 21.8  on 3 degrees of freedom, p= 7.03e-05

Recall: W(ti) = Y(ti) S(t_(i-1))^p (1 - S(t_(i-1)))^q

What about our hazards (figure)

R: survdiff

> test3<-survdiff(st[stage<3]~stage[stage<3])
 Chisq= 0  on 1 degrees of freedom, p= 0.866
> test4<-survdiff(st~factor(stage, exclude=c(2,4)))
 Chisq= 3.1  on 1 degrees of freedom, p= 0.0801
> test5<-survdiff(st~factor(stage, exclude=c(2,3)))
 Chisq= 23.4  on 1 degrees of freedom, p= 1.32e-06
> test6<-survdiff(st~factor(stage, exclude=c(1,4)))
 Chisq= 1.5  on 1 degrees of freedom, p= 0.266
> test7<-survdiff(st~factor(stage, exclude=c(1,3)))
 Chisq= 11.5  on 1 degrees of freedom, p= 0.000679
> test8<-survdiff(st[stage>2]~stage[stage>2])
 Chisq= 0.5  on 1 degrees of freedom, p= 0.769
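These pairwise comparisons can also be generated in a single pass. The sketch below is a convenience loop, not part of the original slides; it reuses the time, death, and stage vectors built from the larynx data above and reports the unadjusted log-rank chi-square and p-value for each pair of stages.

library(survival)

## Sketch: all pairwise log-rank comparisons of the four stages
## (assumes the time, death, stage vectors created above)
stage.pairs <- combn(1:4, 2)
pw <- apply(stage.pairs, 2, function(pr) {
  keep <- stage %in% pr
  out  <- survdiff(Surv(time[keep], death[keep]) ~ stage[keep])
  c(stageA = pr[1], stageB = pr[2],
    chisq = out$chisq, pvalue = 1 - pchisq(out$chisq, df = 1))
})
round(t(pw), 4)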
What about the differences?
• Not much evidence of the hazards crossing
• If the hazards don't cross, the different tests will be somewhat consistent
• Log-rank: most appropriate when the hazards are proportional

Test For Trends
• We generally perform tests for trend on ordinal variables
  – Dose level
  – PSA categories (prostate cancer)
  – Cancer stage
• This is different from treating the variable as continuous, although that is one 'accepted' approach
• For continuous covariates, we need a regression model (we will get there shortly)

Formally: Tests for Trend
• Our hypotheses are
  H0: h1(t) = h2(t) = … = hK(t)
  HA: h1(t) ≤ h2(t) ≤ … ≤ hK(t), with at least one strict inequality
• Any weight function discussed previously can be used
• Test statistic:
  Z = [ Σ(j=1 to K) aj Zj(τ) ] / sqrt[ Σ(j=1 to K) Σ(g=1 to K) aj ag σ̂jg ]  ~  N(0,1) under H0

Formally: Tests for Trend
• aj: weights (scores), often chosen as aj = j, but they can be user specified
• σ̂jg: the (j,g)th element of the estimated variance-covariance matrix of the Zj(τ)

Stage: Ordinal Categories (figure)

Trend Test in R

#Test for trend in R
#takes a fitted survdiff object (test) and a vector of scores (aj), one per ordered group
surv.trendtest<-function(test, aj) {
  zj<-test$obs-test$exp
  zv<-test$var
  num<-sum(aj*zj)
  den<-0
  for (i in 1:length(aj)) {
    for (g in 1:length(aj)) {
      den<-den+aj[i]*aj[g]*zv[i,g]
    }
  }
  den<-sqrt(den)
  zz<-num/den
  pval<-2*(1-pnorm(abs(zz)))
  return(list(Z=zz, pvalue=pval))
}

Trend Test in R

> test.t0<-surv.trendtest(test=test0, aj=1:4)
> test.t0
$Z
[1] 3.718959

$pvalue
[1] 0.0002000459

> test.t1<-surv.trendtest(test=test1, aj=1:4)
> test.t1
$Z
[1] 4.120055

$pvalue
[1] 3.787827e-05

Next Time…
• Stratified tests
• Other K-sample tests