Introduction to Survival Analysis
Rich Holubkov, Ph.D.
September 23, 2010

Today's Class
• Introduction to, and motivation for, basic survival analysis techniques (and why we need advanced techniques)
• Presented at a "newbie Master's statistician" level
• I will not assume you've seen survival data before, but necessarily will go through this material extremely quickly
• I do assume you know biostats basics like heuristics of chi-square statistics

Motivation for survival analysis
• Let's say we have an event of interest, such as death or failure of a device.
• We may be interested in event rates at a particular timepoint (1 year), or in assessing event rates over time as patients are followed up.
• Could just calculate one-year rates…
  – Losses to follow-up before one year?
  – Patients not yet followed for one year?
  – What if most events happen within 30 days?

Motivation for survival analysis
• In studies where people are followed over a period of time, we usually have:
  – people in the study with different lengths of follow-up (whether they had the event or not)
  – people dropping out of the study for various reasons at different times
• We want to estimate rates of an event (like death, or liver transplant failure) at a particular time point (or at all timepoints), taking this differential follow-up into account.

Motivation for survival analysis
• We will often want to compare survival data between treatment groups (logrank test), quantify effects of factors on outcomes (Cox model and alternatives), sometimes in settings where patients can have competing outcomes (competing risks).
• Today, I will talk mainly about Kaplan-Meier curves and their comparison via logrank tests. No parametrics!

Motivating Example (briefly)
• My Registry has enrolled 1000 kids undergoing liver transplants from 1985 till 2010 (now). I want to estimate the chance of a child being alive at 10 years post-transplant.
• The problem is, the Registry has a child who has been followed for 20 years, a child followed for just one year, …
• How can I use every child's information?

Makes sense to look year by year! ("Actuarial life tables")
• I'll see how many kids I had at the beginning of each year of follow-up (1000 enrolled in my Registry, 950 were around at the beginning of year two, etc.)
• If a child dropped out of the analysis without dying (let's say, during the first year of follow-up, 25 withdrew or just weren't around long enough yet), it makes sense to count that child as being around and "at risk" for half a year.

Actuarial Life Tables
• Of my 1000 enrolled kids, at 1 year, 950 were alive, 25 dead, 25 dropouts/not followed long enough. 1-yr survival is (950 + 12.5)/(950 + 25 + 12.5) = 97.5%
• Examine the 950 left at the beginning of Year 2 in the same way. At the end of 2 years, 900 were alive, 20 more dead, and 30 dropouts/not followed long enough. Additional 1-yr survival for these 950 is then (900 + 15)/(900 + 20 + 15) = 97.9%.
• 2-yr survival = 97.5% x 97.9% = 95.4%

Extend this idea, event by event!
• I can do this for 10 years, year by year.
• But I can do this much more finely, effectively calculating an event rate at every timepoint where a child dies. I just need to know the number left in the study ("at risk") at each of those timepoints. This way of generating survival curves is called Kaplan-Meier analysis or the product-limit method.
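Aside: the life-table arithmetic in R
A minimal R sketch of the actuarial calculation above, using the numbers from the slides; the object names are mine, and the half-interval credit for dropouts is the usual actuarial convention.

  # Actuarial (life-table) survival, reproducing the slides' numbers
  n_start  <- c(1000, 950)   # at risk at the start of years 1 and 2
  deaths   <- c(25, 20)      # deaths during each year
  dropouts <- c(25, 30)      # withdrew, or not yet followed long enough

  effective_n <- n_start - dropouts / 2       # dropouts count as "at risk" for half a year
  cond_surv   <- 1 - deaths / effective_n     # 0.975 and 0.979, as on the slides
  cumprod(cond_surv)                          # cumulative: 0.975 at 1 year, 0.954 at 2 years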
A Kaplan-Meier Example
http://www.cancerguide.org/scurve_km.html
Let's say we have just seven patients:
  Patient A: died at 1 year
  Patient B: dropped out at 2 years
  Patient C: dropped out at 3 years
  Patient D: died at 4 years
  Patient E: still in study, alive at 5 years
  Patient F: died at 10 years
  Patient G: still in study, alive at 12 years
So, three deaths at 1, 4, and 10 years.

Kaplan-Meier calculation
http://www.cancerguide.org/scurve_km.html

  Interval      # At Risk at   # Censored        # At Risk at   # Who Died at    Proportion Surviving   Cumulative Survival
  (Start-End)   Start          During Interval   End            End of Interval  at End of Interval     at End of Interval
  0-1           7              0                 7              1                6/7 = 0.86             0.86
  1-4           6              2                 4              1                3/4 = 0.75             0.86 * 0.75 = 0.64
  4-10          3              1                 2              1                1/2 = 0.5              0.86 * 0.75 * 0.5 = 0.31
  10-12         1              0                 1              0                1/1 = 1.0              0.86 * 0.75 * 0.5 * 1.0 = 0.31

This is our Kaplan-Meier curve! (It's smoother with more observations)

For basic survival analysis, we need for each patient:
• A well-defined "time zero", the date that follow-up begins
• A well-defined "date of event or last contact", the last time we know whether the patient had an event or not
• The above two elements are equivalent to "time on study"
• An indicator of whether or not the patient had an event at last contact.

Simple Analysis Dataset
For our seven-patient example:
  Patient                            Time   Event
  A: died at 1 year                    1      1
  B: dropped out at 2 years            2      0
  C: dropped out at 3 years            3      0
  D: died at 4 years                   4      1
  E: in study, alive at 5 years        5      0
  F: died at 10 years                 10      1
  G: in study, alive at 12 years      12      0

Dropouts: a critical assumption
• Statistically: "any dropouts must be noninformative". Dropout time should be independent of failure time.
• Practically: a child who drops out of the analysis cannot differ from a child who stays in the analysis, in terms of chances of having an event as follow-up continues.
• Is this assumption ever reasonable?
• Is this assumption formally testable?

Simple Example
• A registry follows kids receiving transplants in rural states.
• Kids whose transplants are not doing as well tend to move to urban areas to be close to major medical centers. Let's say they are lost to follow-up.
• My registry now has relatively healthier kids left as follow-up gets longer.
• Thus, my long-term follow-up data will give a rosier picture of follow-up for children living in rural states.

A Variant…
• Recent improvements in surgical techniques/post-surgical treatment have improved prognosis. So, kids transplanted decades ago have worse long-term prognosis. But these are the only kids in my registry with long-term follow-up.
• So, my long-term follow-up data will give an unnecessarily grim picture for recently treated kids. It's just like healthy kids dropping out early!
• Changes in survival probability over time of enrollment act like nonrandom dropout.

A Variant I've Seen
• A surgeon keeps his own registry. His research assistant regularly followed patients until about 5 years ago.
• He still regularly updates the registry database for important outcomes. Specifically, whenever he is notified that a patient has died, that patient's survival data are updated.
• What will be wrong with the database for survival analysis?
• How can it be fixed?

Informative Censoring
• If we stay nonparametric, noninformative censoring cannot be tested using the observed data of failure/censoring times!
• Parametric methods do exist, for example jointly modeling dropouts and events.
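Aside: the seven-patient example in R
A minimal sketch using the survival package (object names are mine) that reproduces the product-limit estimates in the Kaplan-Meier table above. Note that the dropouts at years 2 and 3 enter only through the censoring indicator, which is exactly where the noninformative-censoring assumption is doing its work.

  library(survival)

  # Time on study and event indicator for patients A through G
  time  <- c(1, 2, 3, 4, 5, 10, 12)
  event <- c(1, 0, 0, 1, 0, 1, 0)   # 1 = died, 0 = censored (dropout or still in study)

  km <- survfit(Surv(time, event) ~ 1)
  summary(km)
  # Survival drops to 6/7 = 0.86 at year 1, then 0.64 at year 4, then 0.32 at year 10,
  # matching the product-limit table above up to rounding.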
Informative Censoring
• Can compare entry characteristics and risk profiles of censored versus uncensored patients
• Should look for time trends in long-term follow-up studies or RCTs
• If the censoring rate is appreciable, there is usually concern regarding potential bias.

How can we compare two (or more) survival curves?
• The standard approach to comparing survival curves between two or more groups is termed the "logrank test", sometimes known as the Mantel-Cox test.
• The test can be motivated in several ways. I present a "chi-squared table" approach, much like Mantel (1966).

How a logrank test works!
• At every timepoint when one or more kids have an event, we calculate the expected number of events in both groups assuming identical survival.
• If one child has an event when there were 25 kids in Group A and 75 kids in Group B, we "expect" 0.25 events in Group A and 0.75 in Group B.
• Do this for all events, and see if the sum of the "Observed - Expected" for a group is large, as in the chi-squared test.

Heuristic Derivation
• j indexes times of events, 1 to J
• N1j, N2j = number at risk at time j, Nj = N1j + N2j
• O1j, O2j = number of events at time j, Oj = O1j + O2j
• Consider as a 2 x 2 table for each time j:

            Arm 1        Arm 2        Total
  Dead      O1j          O2j          Oj
  Alive     N1j - O1j    N2j - O2j    Nj - Oj
  At Risk   N1j          N2j          Nj

How a logrank test works!
• If the two arms truly have the same survival distribution, then O1j is hypergeometric, so it has expectation E1j = Oj(N1j/Nj) and variance V1j = N1j N2j Oj (Nj - Oj) / (Nj^2 (Nj - 1))
• Now get a statistic summing over the J event times, treating the terms as a sum of independent variables (this is very heuristic!!!)

How a logrank test works!
• Under the null, with E1j = Oj(N1j/Nj) and V1j = N1j N2j Oj (Nj - Oj) / (Nj^2 (Nj - 1)), the statistic
  (Σj (O1j - E1j))^2 / Σj V1j
  has a chi-squared distribution with 1 d.f. This is the standard logrank test for testing equality between two survival curves.
• Readily generalizes to more than 2 groups, and to subgroups (strata) within each group to be compared.

Why this derivation?
• Using our E1j = Oj(N1j/Nj) and V1j = N1j N2j Oj (Nj - Oj) / (Nj^2 (Nj - 1)), we can apply weights wj ≡ w(tj) at each event time tj, j = 1, …, J. Then, the weighted statistic
  (Σj wj(O1j - E1j))^2 / Σj wj^2 V1j
  still has a chi-squared distribution with 1 d.f.
• But using wj ≡ 1 yields a test that is uniformly most powerful against all alternatives with proportional hazards (where the "risk of event in Arm 1 vs. Arm 2" is constant throughout follow-up).

Why is this of interest?
• This is why we generally use the standard logrank (Mantel-Cox) test!
• But programs like SAS will give or offer you other variants:
  – wj = Nj (Gehan-Wilcoxon or Breslow test, gives more weight to earlier observations)
  – wj = estimated proportion surviving at tj (Peto-Peto-Prentice Wilcoxon test, fully efficient for alternatives where the odds ratio of the event for Arm 2 vs. Arm 1 is constant over time)
  – There are whole classes of these test types…

Which test to use?
• For the practicing statistician, the main thing is to be aware of these various test statistics and how they differ.
• We usually use the standard Mantel-Cox logrank test when planning a trial comparing survival curves, but whichever test is selected has to be prespecified.
• Can't go hunting for significant p-values after an RCT. What about before?

Variance of Kaplan-Meier Estimates
• Using the delta method, if the estimated survival at time t is S(t), an estimate of the variance of S(t) is
  S(t)^2 Σ(ti ≤ t) Oi / (Ni (Ni - Oi)).
• This estimate (Greenwood's formula), used by SAS, changes only at timepoints when events occur. Thus, the estimated variance may not be proportional to the number of subjects remaining at risk.
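Aside: Greenwood's formula by hand
A minimal sketch, again for the seven-patient example, computing Greenwood's variance directly and comparing it with the standard errors reported by survfit(), which are Greenwood-based by default. Object names are mine.

  library(survival)

  time  <- c(1, 2, 3, 4, 5, 10, 12)
  event <- c(1, 0, 0, 1, 0, 1, 0)

  fit <- summary(survfit(Surv(time, event) ~ 1))   # one row per event time

  # Greenwood: Var{S(t)} = S(t)^2 * sum over event times ti <= t of Oi / (Ni * (Ni - Oi))
  gw_var <- fit$surv^2 * cumsum(fit$n.event / (fit$n.risk * (fit$n.risk - fit$n.event)))

  cbind(time = fit$time, surv = fit$surv,
        se_by_hand = sqrt(gw_var), se_survfit = fit$std.err)
  # The two standard-error columns agree.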
Product-Limit Survival Estimates (SAS output; * denotes a censored observation)

  days_180    Survival   Failure   Survival        Number    Number
                                   Standard Error  Failed    Left
     0.000     1.0000    0         0                  0        43
     7.000*     .        .         .                  0        42
    15.000*     .        .         .                  0        41
    17.000     0.9756    0.0244    0.0241             1        40
    18.000     0.9512    0.0488    0.0336             2        39
    39.000*     .        .         .                  2        38
    42.000*     .        .         .                  2        37
    42.000*     .        .         .                  2        36
    44.000*     .        .         .                  2        35
    59.000*     .        .         .                  2        34
    59.000*     .        .         .                  2        33
    70.000*     .        .         .                  2        32
    92.000     0.9215    0.0785    0.0438             3        31
   120.000*     .        .         .                  3        30
   130.000*     .        .         .                  3        29
   162.000*     .        .         .                  3        28
   174.000*     .        .         .                  3        27

Variance of Kaplan-Meier Estimates
• You can implement Greenwood's formula to get confidence intervals for a Kaplan-Meier rate at a particular timepoint, or to test hypotheses about that rate.
• Be aware that using the Kaplan-Meier nonparametric estimation approach will give substantially larger standard error estimates than parametrically based survival models.

Variance of Kaplan-Meier Estimates
• Peto (1977) derived an alternative formula: if the estimated survival at time t is S(t) and the number at risk at time t is N(t), the Peto estimate of the variance of S(t) is S(t)^2 [1 - S(t)] / N(t).
• Easy to compute, heuristically makes sense, perhaps(?) better when N is smaller
• Worth knowing about and examining if testing an event rate is your primary goal!
• Neither formula is great with small N or heavy censoring.

Interval Censoring
• A patient dropping out of the analysis before having an event is "right censored".
• A patient who is known to have an event during some interval is called "interval censored". For example, a cardiac echo on 12/1/2009 shows a valve leak. The previous echo was on 11/1/2008, and the leak could have started anytime between the two echos.

Interval Censoring
[Figure]

Left Censoring
• A patient who is known to have an event, but not more specifically than before a particular date, is called "left censored".
• For example (Klein/Moeschberger), a study looking at time until first marijuana use may have information from some kids who report use, but cannot recall the date at all.
• A special case of interval censoring.

Left Censoring (#1, #4)
[Figure]

Interval Censoring
• Turnbull (1976) developed a nonparametric survival curve estimator applicable to interval-censored data. Basically, an EM algorithm is used to compute the nonparametric MLE product-limit curve iteratively. Recently, more efficient algorithms have been developed to compute Kaplan-Meier-type estimates.

Interval Censoring
• Just something to be aware of, if you encounter this type of data.
• SAS macros %EMICM and %ICSTEST allow construction of nonparametric survival curves with interval-censored data, and their comparison using a generalized version of the logrank test.
• R also has routines (the "interval" package)

Curves with Interval Censoring
[Figure]

Not a Pop Quiz, but…
• A statistician familiar with survival basics may sometimes encounter data that didn't turn out quite as expected.
• Here is an example I've certainly seen, though usually less extreme. I show hypothetical data on a six-month study with about 40 patients in each of two treatment groups…

What's Your Conclusion? Why?
[Figure: Kaplan-Meier curves for the two treatment groups, which cross during follow-up]
• Logrank test p = 0.64
• Gehan-Wilcoxon test p = 0.04

Crossing Curves Example
• If this were a pilot observational study, the logrank test is clearly inappropriate for a future trial if the data look like this.
• Can consider a different test, shorter follow-up, or otherwise varying the design of the future trial, depending on the setting (e.g., survival after cancer therapy)
• If these are the results of an actual RCT with a prespecified logrank test, you are at least partly stuck.
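Aside: weighted tests when curves cross
A minimal sketch of the crossing-curves issue using simulated data (not the hypothetical study from the slide; all object names are mine): two groups with non-proportional hazards, compared with the standard logrank test and with an early-weighted alternative. In R's survdiff(), rho = 0 gives the Mantel-Cox logrank test and rho = 1 gives the Peto & Peto modification of the Gehan-Wilcoxon test.

  library(survival)

  set.seed(1)
  n     <- 40
  group <- rep(c("A", "B"), each = n)
  # Group A: many early events; Group B: events concentrated later, so the hazards cross
  t_event  <- c(rweibull(n, shape = 0.7, scale = 4), rweibull(n, shape = 3, scale = 5))
  t_censor <- runif(2 * n, 4, 6)                 # administrative censoring around 6 months
  time   <- pmin(t_event, t_censor)
  status <- as.numeric(t_event <= t_censor)

  survdiff(Surv(time, status) ~ group, rho = 0)  # standard (Mantel-Cox) logrank test
  survdiff(Surv(time, status) ~ group, rho = 1)  # early-weighted (Peto & Peto) test
  # With crossing hazards, differently weighted tests can reach quite different conclusions.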
Basic Survival Analysis in SAS
• To do Kaplan-Meier and logrank in SAS:

  proc lifetest data=curves plots=s;
    time days*event(0);
    strata group;
  run;

• The plots=s option asks for a Kaplan-Meier plot.
• time statement: days is time on study, event is the outcome indicator, with 0 as the value for patients who are censored (versus those who had an event).
• strata statement: compare the levels of group.

Basic Survival Analysis in R
• To do Kaplan-Meier curves in R:

  > library(survival)
  > mfit <- survfit(Surv(days, event == 1) ~ group, data = curves)
  > plot(mfit)

• In the survival library, Surv() creates a survival object. The first argument is follow-up time, the second is a status indicator (if true, the event occurred; otherwise the observation is censored). survfit() then computes the Kaplan-Meier estimator.
• Logrank test:

  > survdiff(Surv(days, event == 1) ~ group, data = curves)

Summary
• I have presented basic, nonparametric approaches to the analysis of survival data. Kaplan-Meier curves and the logrank test should be part of every biostatistician's toolbox. The same is true for proportional hazards models, to be discussed next week by Nan Hu.

Summary
• The survival analysis assumption of "noninformative censoring" usually cannot be formally tested and should be assumed not to hold. So, if a study has substantial dropout, there may be bias. Compare the characteristics of dropouts to others to get an idea of how bad the situation could be.

Summary
• The usual logrank test can be viewed as just one member of a class of tests with different weightings of each event time. This test weights all event times equally, and is usually preferred as it is uniformly most powerful when one treatment's benefit (in terms of relative risk) is constant throughout follow-up. Variants may be preferred for nonstandard scenarios.

Bibliography
• Fleming TR, Harrington DP. Counting Processes and Survival Analysis. Wiley, 1991.
• Kaplan EL, Meier P (1958). Nonparametric estimation from incomplete observations. JASA 53: 457–481.
• Klein JP, Moeschberger ML. Survival Analysis: Techniques for Censored and Truncated Data. 2nd ed. Springer, 2003.
• Mantel N (1966). Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemotherapy Reports 50(3): 163–170.
• Peto R, Pike MC, Armitage P, et al. (1977). Design and analysis of randomized clinical trials requiring prolonged observation of each patient, Part II. British Journal of Cancer 35: 1–39.
• Turnbull B (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. JRSS-B 38: 290–295.

Next Seminar
• Nan Hu, Ph.D. will be discussing the Cox proportional hazards model, one week from today on September 30th.