Outline
• General endpoint considerations
• Surrogate endpoints
• Composite endpoints and recurrent events
• Safety outcomes (adverse events)

Composite Event (def.)
“An event that is considered to have occurred if any one of several different events or outcomes are observed.” Meinert CL. Clinical Trials Dictionary, 1996.
Combined Endpoint = Composite Event

Examples of Combined Endpoints (Study: Endpoint)
• Multiple Risk Factor Intervention Trial (MRFIT): CHD death (MI, sudden death)
• Systolic Hypertension in the Elderly Program (SHEP): fatal or non-fatal stroke
• Physicians’ Health Study: fatal/non-fatal myocardial infarction; fatal/non-fatal stroke
• START HIV Study: serious AIDS, serious non-AIDS, or death
• GISSI-2*: death; late congestive heart failure; EF < 35%; 45% or more injured myocardial segments; QRS score < 10
* Non-fatal events treated hierarchically

Survey of Cardiovascular Trials
• Composite outcomes in CVD trials are frequent (37% of 1,231 published trials)
• They typically comprise 3-4 individual components
• Smaller trials used more components in their composite outcomes
• The components vary in their clinical significance; death was the most common component included
Ann Intern Med 2008;149:612-617

Composite Examples for Heart Failure (HF) Studies: Time to Event Analysis
• Time to the 1st occurrence of any of the outcomes that are part of the combined endpoint
• Examples:
– Time to death or hospitalization
– Time to death or CVD hospitalization
– Time to CVD death or CVD hospitalization
– Time to CVD death or hospitalization for HF (more sensitive to treatment differences, particularly among patients with less severe heart failure)

Composite Example: CVD Death or HF Hospitalization
[Figure: follow-up timelines from 0 to t for four patients: patient 1, HF hospitalization; patient 2, CVD death; patient 3, non-CVD death; patient 4, three HF hospitalizations and a CVD death]

Progression to AIDS Endpoint (A Composite with Many Components)
• Cryptosporidiosis
• Isosporiasis
• Toxoplasmosis
• Mycobacterium avium, other non-tuberculous mycobacterial infections
• Mycobacterium tuberculosis, extrapulmonary or pulmonary
• Cryptococcosis
• Histoplasmosis
• Cytomegalovirus disease
• Lymphoma
• Kaposi’s sarcoma (visceral)
• HIV encephalopathy or AIDS dementia complex, Stage 2 or higher
• Progressive multifocal leukoencephalopathy
• HIV wasting syndrome
• Pneumocystis carinii, pulmonary or extrapulmonary
• Candidiasis, esophageal or pulmonary
• Herpes simplex bronchitis, pneumonitis, esophagitis
• Herpes zoster, disseminated
• Non-typhoidal Salmonella septicemia

Clinical Relevance?
[Figure: follow-up timelines from 0 to t for three patients: patient 1, candidiasis, then followed to end of study; patient 2, death; patient 3, candidiasis, PCP and MAI, then followed to end of study]

Composites or Combined Endpoints: Rationale
• More events = greater power (or smaller sample size or shorter trial duration) (maybe)
• Inclusion of some components may reduce or eliminate bias due to informative censoring (but may result in a loss of power)
• A solution to handling disagreement over which outcome should be primary (not always the best solution)
Freemantle N et al, JAMA 2003.

Composite Endpoint Cautions
Loss of power if:
• Treatment has little or no effect on some components
• Early events are less likely to represent “treatment failures” than later events (Yusuf and Negassa referred to this as “masking” of events)
Unclear interpretation if:
• Components show a different pattern for treatments
• Less serious or more subjectively assessed events account for the treatment difference
• “Mixing apples and oranges”
Neaton JD et al, Stat Med 1994 and Yusuf S and Negassa A, Amer Heart J 2002.
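The power trade-off in the rationale and the first caution above can be made concrete with a standard sample-size calculation for comparing two event proportions. This is a minimal sketch, assuming the common normal-approximation formula (pooled variance under H0, unpooled under H1, two-sided alpha, two equal arms); the function name is illustrative, and the figures quoted on the following slide may differ slightly because of rounding or a different formula variant.

```python
from math import ceil, sqrt
from statistics import NormalDist


def total_sample_size(p_control: float, p_treat: float,
                      alpha: float = 0.05, power: float = 0.90) -> int:
    """Total patients (two equal arms) to detect p_control vs p_treat.

    Normal approximation: pooled variance under H0, unpooled variance
    under H1; alpha is two-sided. Per-arm size is rounded up.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p_control + p_treat) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_control * (1 - p_control)
                                 + p_treat * (1 - p_treat))) ** 2
    per_arm = ceil(numerator / (p_control - p_treat) ** 2)
    return 2 * per_arm


# Event rates (control, treatment) for the composite before and after
# adding a component whose treatment effect is either kept or diluted.
scenarios = {
    "original composite, 10% vs 5%": (0.10, 0.05),
    "added component, same 50% reduction, 30% vs 15%": (0.30, 0.15),
    "added component, diluted to 25% reduction, 30% vs 22.5%": (0.30, 0.225),
}
for label, (pc, pt) in scenarios.items():
    print(f"{label}: {total_sample_size(pc, pt)} patients")
```

Run as written, the totals come out at 1,164, 322 and 1,444 patients: adding a component cuts the required size only when the treatment effect carries over to it; if the relative reduction is diluted, the trial gets larger even though more events occur.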
Adding a Component to a Composite Does Not Always Have a Favorable Effect on Sample Size
• 10% versus 5% event rate – 1,170 patients total
• Add a new component:
– 30% versus 15% event rate (relative reduction still 50%) – 330 patients
– 30% versus 22.5% event rate (relative reduction diluted to 25%) – 1,450 patients
Alpha = 0.05 (2-sided) and power = 0.90
Neaton J et al, J Cardiac Failure 2005

Informative Censoring - 1
[Figure: follow-up timelines from 0 to t for patients 1 to 4 (HF hospitalizations, CVD death, non-CVD death), as in the CVD death or HF hospitalization example]

Informative Censoring - 2
• If a patient dying from a non-CVD cause would have had a different risk of HF hospitalization (had they survived) than survivors, the censoring is “informative”.
• Bias could result if the risk of non-CVD death varied by treatment group.

PICO HF Trial: Ranked Clinical Outcome at 24 Weeks
                                               Pimobendan (N=209)   Placebo (N=108)
Test duration same as or higher than baseline  132 (63%)            64 (59%)
Test duration lower than baseline               48 (23%)            34 (31%)
Too sick to undergo exercise test                5 (2%)              4 (4%)
Died before 24 weeks                            24 (12%)             6 (6%)
P=0.5 for 63% versus 59%; P < 0.05 for difference in exercise duration.

Women’s Angiographic Vitamin and Estrogen Trial (WAVE)
• Objective: to determine whether HRT or antioxidant vitamin supplements influenced the progression of coronary artery disease as measured by serial angiograms (2x2 factorial study).
• Target population: women with 15-75% coronary stenosis at entry.
• Primary endpoint: change in lumen diameter; deaths and MIs assigned the worst rank.
JAMA 2002; 288: 2432-2440.

Freemantle Guidelines for Reporting
1. Components of composite outcomes should always be defined as secondary outcomes and reported alongside the results of the primary analysis, preferably in a table.
2. Ensure that the reporting of composite outcomes is clear and avoids the suggestion that individual components of the composite have been demonstrated to be effective.
3. Systematic overviews and quantitative meta-analysis should be used to identify the effects of treatments on rare but important endpoints that may be included as part of composite outcomes in individual trials.
Freemantle N, et al. JAMA 2003.

Guide to Interpreting Composite End Points
1. Are the component end points of similar importance to patients?
2. Did the more and less important end points occur with similar frequency?
3. Is the underlying biology of the component end points similar?
4. Are the point estimates of the relative risk reduction similar and the confidence intervals sufficiently narrow?
Montori VM et al, BMJ 2005.

Recommendations on Reporting of Composite Outcomes
• How often did each component contribute to the composite outcome (descriptive)?
• What is the relative hazard for each component of the composite, together with the separate number of events and rate for each component (“Consumer Reports approach”)?

Multiple Outcomes are a Necessity, So No Matter What You Do…
• Collect data on all components of the combined endpoint for the trial duration
• Report not only the combined endpoint, but also:
– how often each component contributed to it
– the separate number of events and rate for each component (“Consumer Reports approach”)
• See the NuCOMBO (N Engl J Med 1996) and EPHESUS (N Engl J Med 2003) trials for good examples of composite outcome reporting.

Example: NuCOMBO AIDS Trial
How often did each component occur as the 1st event?
AZT (N=372)   AZT+ddI (N=363)
Components: Death, PCP, Esophageal candidiasis, MAC, CMV, Other AIDS infections, Malignancies, Other conditions, AIDS/Death
75 32 30 54 51 22 23 20 27 30 28 29 9 10 226 13 17 244
Hazard ratio: 0.86 (0.71 to 1.03)

Example: NuCOMBO AIDS Trial
What is the separate incidence of each component of the combined endpoint (“Consumer Reports approach”)?
Component                AZT+ddI (N=363)   AZT (N=372)   Hazard ratio
Death                    176               191           0.88
PCP                       42                60           0.65
Esophageal candidiasis    43                42           0.97
MAC                       42                58           0.66
CMV                       49                49           0.96
Other AIDS infections     37                38           0.94
Malignancies              19                27           0.64
Other conditions          17                26           0.60
AIDS/Death               226               244           0.86

Composite Endpoint Pitfalls
• Components of a composite usually vary in severity and in impact on quality of life
• Time-to-event analyses usually focus on the 1st event and ignore multiple events of the same or different types.

Weighting the Components of Composite Outcomes
• Risk of death associated with different components
• Rank-ordering of outcomes in terms of severity and quality of life by clinicians and patients
• Rating the entire event profile

Some Approaches for Accounting for Severity of Events and Event Histories
• Ranking of entire event histories (Follmann et al., Stat Med 1992)
• Marginal models with ranking of events according to risk of death or subjective ranking by clinicians and/or patients (Neaton et al., Stat Med 1994)
• Rule-based ranking (Bjorling and Hodges, Stat Med 1997): severity, timing, number
• Weights determined by clinical investigators for trials of thrombolytic therapy (Armstrong P et al, Am Heart J 2011) [death 1.0, shock 0.5, CHF 0.3, recurrent MI 0.2]
• Matched pairs (Win Ratio) for heart failure trials (Pocock S et al, Eur Heart J 2012)

Considerations in Analysis of All Events
• Events are not independent, so SEs have to be adjusted
• 2nd, 3rd … events may not add much to the signal from the 1st event
• A loss of power could result with an analysis of all events if treatment was modified after the 1st event

Recurrent Events of the Same Type
• HF hospitalizations (Eur J Heart Fail 2014; 16:33-40)
• COPD exacerbations (N Engl J Med 2011; 365:689-698)
• Bacteriuria and pyuria at repeated visits in elderly women (JAMA 1994; 271:751-754)
Other examples: fungal infections; transient ischemic attacks; seizures in epileptic patients
Statistical methods: Poisson and negative binomial regression; generalized linear mixed models.

Example: COPD Exacerbations (N Engl J Med 2011)
• Fixed follow-up of 12 months
• 741 exacerbations among 558 participants given azithromycin (317 had at least one event)
• 900 exacerbations among 559 participants given placebo (380 had at least one event)
• HR (1st event) = 0.73 (95% CI: 0.63-0.84; p<0.001)
• RR (negative binomial regression) = 0.83 (95% CI: 0.72-0.95; p=0.01); p<0.001 by Poisson regression.

Example: Heart Failure Hospitalizations (Eur J Heart Fail 2014)
• Variable follow-up (median = 36.6 months)
• 392 HF hospitalizations among 1,514 participants given candesartan (230 had at least one event)
• 547 HF hospitalizations among 1,509 participants given placebo (278 had at least one event)
• HR (1st event) = 0.82 (95% CI: 0.70-0.97; p=0.018)
• RR (Poisson regression) = 0.71 (95% CI: 0.62-0.81; p<0.001); RR (negative binomial regression) = 0.68 (95% CI: 0.54-0.85) (lower point estimate but wider CI)

Alternatives to Composite or Combined Endpoints
• Single outcome (e.g., all-cause mortality)
• Co-primary endpoints (requires an adjustment to Type I error if success is defined as “significant” on any)
• Global index (may not be easily interpretable)
• Hierarchical scoring/ranking of multiple outcomes
• Primary + supportive outcome (SMART)

Multiple Primary Endpoints
• Different from a single combined endpoint
• Type I error adjustment may be required (it usually is)
• Strategy for controlling type I error depends on the research question

Early HIV (High CD4+) Treatment Trial: Co-Primary Endpoints or Single Composite?
• Serious AIDS
– Any fatal AIDS event
– Non-fatal AIDS events except herpes simplex, esophageal candidiasis and pulmonary tuberculosis
• Serious non-AIDS
– Non-AIDS deaths
– CV disease
– Liver disease
– Renal disease
– Non-AIDS malignancies (excluding skin cancer)

What is the question? Four possible alternative hypotheses:
• $H_A$: Treatment effect in at least one of K endpoints
• $H_A$: Treatment effect in all K endpoints (no type I error adjustment needed)
• $H_A$: Treatment effect in M of K endpoints
• $H_A$: Treatment effect in a weighted average of K endpoints
Capizzi T, Zhang J. Drug Info J, 30:949-956, 1996.

Strategies for (Type I Error) Adjustment for the 1st Hypothesis: Treatment Effect in at Least 1 of K Endpoints
Bonferroni adjustment is the most common, but it is conservative.
Suppose there are 2 co-primary endpoints, tested at levels $\alpha_1$ and $\alpha_2$. If the two tests are independent:
Prob[no type I error for the trial] = $1 - \alpha_T = (1-\alpha_1)(1-\alpha_2)$, so $\alpha_T = 1 - (1-\alpha_1)(1-\alpha_2)$ is the level for the trial.
For $\alpha_1 = \alpha_2 = 0.05$: $\alpha_T = 1 - 0.95^2 = 0.0975 \approx 0.098$ (unacceptably high).
For $\alpha_T = 0.05$: each $\alpha = 1 - (1-\alpha_T)^{1/2} = 0.0253$, or more generally $\alpha = 1 - (1-\alpha_T)^{1/K}$ for K endpoints.
This is approximately $\alpha_T/K$, i.e., 0.05/2 = 0.025 in this case.
Example: EPHESUS heart failure study of eplerenone (Cardiovasc Drugs Ther 15:79-87, 2001): 2 primary endpoints, total mortality (α = 0.04) and CV mortality or morbidity (α = 0.01); overall study type I error of 0.05.

Other Strategies
• Global tests, e.g., MANOVA and Hotelling’s T² (good approach if endpoints are not correlated) or O’Brien’s rank test (best when all outcomes are expected to go in the same direction). Problem: not specific enough.
• Sequential testing procedures, e.g., Holm’s step-down procedure or Hochberg’s step-up procedure (both less conservative than Bonferroni): marginal testing with control of the overall error rate

Example
• 4 endpoints (ordered by p-values): p=0.081; p=0.024; p=0.020; p=0.005
• Bonferroni: judge each against 0.05/4 = 0.0125; only the 4th endpoint is significant
• Holm step-down: reject H0 for the 4th endpoint since p=0.005 < 0.0125; the p-value for the 3rd endpoint, 0.020, exceeds 0.05/3 = 0.017, so stop and accept H0 for the other 3 endpoints
• Hochberg step-up: accept H0 for the 1st endpoint since 0.081 > 0.05; reject H0 for the 2nd endpoint and all remaining endpoints since 0.024 < 0.05/2 = 0.025.
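The Bonferroni, Holm and Hochberg decisions in the example above, and the per-endpoint level from the independent co-primary calculation, can be checked with a short sketch. It assumes the usual textbook definitions (Holm compares the smallest p-value with α/K first and stops at the first non-rejection; Hochberg compares the largest with α first and, once one hypothesis is rejected, rejects all with smaller p-values) and uses ≤ comparisons, which give the same decisions as the slide for these numbers. The function names are illustrative.

```python
def sidak_level(alpha_trial: float, k: int) -> float:
    """Per-endpoint level giving trial-wide level alpha_trial for k independent tests."""
    return 1 - (1 - alpha_trial) ** (1 / k)


def bonferroni(pvals, alpha=0.05):
    """Reject H0 for each endpoint whose p-value is at most alpha / K."""
    k = len(pvals)
    return [p <= alpha / k for p in pvals]


def holm(pvals, alpha=0.05):
    """Step-down: smallest p vs alpha/K, next vs alpha/(K-1), ... stop at first failure."""
    k = len(pvals)
    order = sorted(range(k), key=lambda i: pvals[i])  # ascending p
    reject = [False] * k
    for step, i in enumerate(order):
        if pvals[i] <= alpha / (k - step):
            reject[i] = True
        else:
            break  # accept H0 for this and all larger p-values
    return reject


def hochberg(pvals, alpha=0.05):
    """Step-up: largest p vs alpha, next vs alpha/2, ... first success rejects the rest."""
    k = len(pvals)
    order = sorted(range(k), key=lambda i: pvals[i], reverse=True)  # descending p
    reject = [False] * k
    for step, i in enumerate(order):
        if pvals[i] <= alpha / (step + 1):
            for j in order[step:]:  # this p-value and every smaller one
                reject[j] = True
            break
    return reject


pvals = [0.081, 0.024, 0.020, 0.005]
print("Sidak level, 2 endpoints:", round(sidak_level(0.05, 2), 4))
print("Bonferroni:", bonferroni(pvals))
print("Holm:      ", holm(pvals))
print("Hochberg:  ", hochberg(pvals))
```

For production analyses, a vetted implementation is preferable; statsmodels, for instance, offers `statsmodels.stats.multitest.multipletests` with `method="bonferroni"`, `"sidak"`, `"holm"` and `"simes-hochberg"`.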
See Sankoh et al, Stat Med 16:2529-2542, 1997.

O’Brien’s Rank-Sum Procedure
• Rank the responses of patients for each of the K endpoints (as in Wilcoxon’s rank-sum test)
• Sum the ranks for each patient
• Carry out an analysis of variance (ANOVA) on the sums of the ranks
O’Brien P. Biometrics 40:1079-1087, 1984. See the TOMHS report in JAMA for an application.

Advantages and Disadvantages of Different Approaches to Defining the Primary Endpoint
Approach              Advantage                    Disadvantage
Single outcome        Simple                       Sample size; multiple endpoints are a reality
Composite             Sample size                  Interpretation not easy if components show different patterns
Co-primary outcomes   Eggs not all in one basket   Sample size and power
Global index          Power                        Not easily interpretable
Hierarchical scoring  Clinical relevance           Power; clinical relevance

Summary
• In study planning, focus on methods for defining, ascertaining, and measuring major endpoints.
• Composite outcomes can be difficult to interpret if the components do not go in the same direction; choose components carefully.
• If not primary, define secondary endpoints using all events during follow-up.
• A “Consumer Reports” analysis should be kept in mind for reporting: full disclosure of all relevant outcomes.
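O’Brien’s rank-sum procedure, outlined earlier, can be sketched in a few lines. This is a minimal illustration, assuming two treatment arms, that every endpoint is oriented so larger values are better, and that ranks are computed across all patients pooled over both arms; the function names and toy data are hypothetical. With two arms, the one-way ANOVA F on the per-patient rank sums equals the square of the pooled two-sample t statistic.

```python
from statistics import mean


def average_ranks(values):
    """Ranks 1..n across all patients; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks


def obrien_rank_sum_f(group_a, group_b):
    """ANOVA F statistic (1 and n-2 df) comparing per-patient rank sums.

    Each patient is a tuple of K endpoint values, larger = better (assumed).
    Assumes the rank sums are not constant within both groups.
    """
    patients = list(group_a) + list(group_b)
    k = len(patients[0])
    sums = [0.0] * len(patients)
    for j in range(k):  # rank each endpoint separately, then add
        for i, r in enumerate(average_ranks([p[j] for p in patients])):
            sums[i] += r
    a, b = sums[:len(group_a)], sums[len(group_a):]
    grand = mean(sums)
    ss_between = len(a) * (mean(a) - grand) ** 2 + len(b) * (mean(b) - grand) ** 2
    ss_within = (sum((x - mean(a)) ** 2 for x in a)
                 + sum((x - mean(b)) ** 2 for x in b))
    return ss_between / (ss_within / (len(sums) - 2))


# Hypothetical toy data: 3 patients per arm, 2 endpoints each.
treated = [(5, 30), (6, 40), (8, 35)]
control = [(1, 10), (2, 20), (4, 25)]
print(obrien_rank_sum_f(treated, control))  # 54 / 3.5, about 15.43
```

A p-value would come from the F distribution with 1 and n − 2 degrees of freedom (for example via `scipy.stats.f.sf`); with more than two arms the same rank sums would feed an ordinary one-way ANOVA.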