Lecture 10

Outline
• General endpoint considerations
• Surrogate endpoints
• Composite endpoints and recurrent events
• Safety outcomes (adverse events)
Composite Event (def.)
“An event that is considered to have occurred if
any one of several different events or
outcomes are observed.”
Meinert CL. Clinical Trials Dictionary, 1996.
Combined Endpoint = Composite Event
Examples of Combined Endpoints
Study | Endpoint
Multiple Risk Factor Intervention Trial (MRFIT) | CHD death (MI, sudden death)
Systolic Hypertension in the Elderly Trial (SHEP) | Fatal or non-fatal stroke
Physicians’ Health Study | Fatal/non-fatal myocardial infarction; fatal/non-fatal stroke
START HIV Study | Serious AIDS, serious non-AIDS, or death
GISSI-2* | Death; late congestive heart failure; EF < 35%; 45% or more injured myocardial segments; QRS score < 10
* Non-fatal events treated hierarchically
Survey of Cardiovascular Trials
• Composite outcomes in CVD trials are frequent
(37% of 1,231 published trials)
• Typically comprise 3-4 individual components
• More components were used in the composite
outcome in smaller trials
• The components vary in their clinical significance;
death was the most common component included
Ann Intern Med 2008;149:612-617
Composite Examples for Heart
Failure (HF) Studies:
Time to Event Analysis
• Time to the 1st occurrence of any of the outcomes that are
part of the combined endpoint
• Examples:
– Time to death or hospitalization
– Time to death or CVD hospitalization
– Time to CVD death or CVD hospitalization
– Time to CVD death or hospitalization for HF (more sensitive
to treatment differences particularly among patients with
less severe heart failure)
Composite Example: CVD Death or HF Hospitalization
[Figure: follow-up timelines (time 0 to t) for four patients. Patient 1: HF hospitalization. Patient 2: CVD death. Patient 3: non-CVD death. Patient 4: three HF hospitalizations, then CVD death.]
Progression to AIDS Endpoint (A Composite with Many Components)
• Cryptosporidiosis
• Isosporiasis
• Toxoplasmosis
• Mycobacterium avium, other non-tuberculous mycobacterial infections
• Mycobacterium tuberculosis, extrapulmonary or pulmonary
• Cryptococcosis
• Histoplasmosis
• Cytomegalovirus disease
• Lymphoma
• Kaposi’s sarcoma (visceral)
• HIV encephalopathy or AIDS dementia complex, Stage 2 or higher
• Progressive multifocal leukoencephalopathy
• HIV wasting syndrome
• Pneumocystis carinii, pulmonary or extrapulmonary
• Candidiasis, esophageal or pulmonary
• Herpes simplex bronchitis, pneumonitis, esophagitis
• Herpes zoster, disseminated
• Non-typhoidal Salmonella septicemia
Clinical Relevance?
[Figure: follow-up timelines (time 0 to t) for three patients. Patient 1: candidiasis, then followed to end of study. Patient 2: death. Patient 3: candidiasis, then PCP and MAI, then followed to end of study.]
Composites or Combined Endpoints
Rationale
• More events = greater power (or smaller sample
size or shorter trial duration) (maybe)
• Inclusion of some components may
reduce/eliminate bias due to informative censoring
(but may result in a loss of power)
• A solution to handling disagreement over which
outcome should be primary (not always the best
solution)
Freemantle N et al, JAMA 2003.
Composite Endpoint Cautions
Loss of power if:
• Treatment has little or no effect on some
components
• Early events are less likely to represent “treatment
failures” compared to later events (Yusuf and
Negassa referred to this as “masking” of events)
Unclear interpretation if:
• Components show a different pattern for
treatments
• Less serious or more subjectively assessed
events are accounting for treatment difference
• “Mixing apples and oranges”
Neaton JD et al, Stat Med 1994 and Yusuf S and Negassa A, Amer Heart J 2002.
Adding a Component to a Composite
Does Not Always Have a Favorable Effect
on Sample Size
• 10% versus 5% event rate: 1,170 patients total
• Add a new component:
– 30% versus 15% event rate: 330 patients total
– 30% versus 22.5% event rate: 1,450 patients total
Alpha = 0.05 (2-sided) and power = 0.90
Neaton J et al, J Cardiac Failure 2005
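The sample sizes above can be reproduced approximately. The slide does not say which formula was used; this minimal sketch assumes the standard normal-approximation formula for comparing two proportions (two equal arms, two-sided test, no continuity correction). The function name `total_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def total_sample_size(p1: float, p2: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Total N for two equal arms comparing event rates p1 and p2.

    Assumes a two-sided z-test and the usual normal-approximation formula
    without continuity correction (an assumption; the slide gives no formula).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 1.28 for power = 0.90
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    n_per_arm = math.ceil(numerator / (p1 - p2) ** 2)
    return 2 * n_per_arm


# The three scenarios from the slide
for p1, p2 in [(0.10, 0.05), (0.30, 0.15), (0.30, 0.225)]:
    print(f"{p1:.1%} vs {p2:.1%}: {total_sample_size(p1, p2)} patients total")
```

This gives 1,164, 322 and 1,444 patients, close to the slide's rounded figures. The point carries over: adding a component that raises the control rate helps only if the treatment effect on the composite stays large; a diluted difference (30% vs 22.5%) needs more patients than the original 10% vs 5% comparison.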
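Informative Censoring - 1
[Figure: the same four patient timelines as in “Composite Example: CVD Death or HF Hospitalization”. Patient 1: HF hospitalization. Patient 2: CVD death. Patient 3: non-CVD death. Patient 4: three HF hospitalizations, then CVD death.]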
Informative Censoring - 2
• If a patient dying from a non-CVD cause
would have had a different risk of HF
hospitalization (had they survived) than
survivors, the censoring is “informative”.
• Bias could result if risk of non-CVD death
varied by treatment group.
PICO HF Trial: Ranked Clinical Outcome at 24 Weeks
Outcome (ranked best to worst) | Pimobendan (N=209) | Placebo (N=108)
Exercise test duration same as or higher than baseline | 132 (63%) | 64 (59%)
Exercise test duration lower than baseline | 48 (23%) | 34 (31%)
Too sick to undergo exercise test | 5 (2%) | 4 (4%)
Died before 24 weeks | 24 (12%) | 6 (6%)
P=0.5 for 63% versus 59%; P < 0.05 for difference in exercise duration.
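A ranked outcome like this can be summarized by comparing every pimobendan patient with every placebo patient, in the spirit of the Mann-Whitney (Wilcoxon rank-sum) statistic. The sketch below is only an illustration using the table's counts; it is not the analysis reported for PICO, and it computes no p-value. The function and variable names are hypothetical.

```python
# Counts per ranked category, best first:
# test same/higher, test lower, too sick to test, died
PIMOBENDAN = [132, 48, 5, 24]
PLACEBO = [64, 34, 4, 6]


def pairwise_comparison(treated, control):
    """Compare each treated patient with each control patient on the ranked
    outcome; return (wins, losses, ties) from the treated arm's perspective."""
    wins = losses = ties = 0
    for i, n_t in enumerate(treated):
        for j, n_c in enumerate(control):
            pairs = n_t * n_c
            if i < j:      # treated patient is in a better category
                wins += pairs
            elif i > j:    # treated patient is in a worse category
                losses += pairs
            else:
                ties += pairs
    return wins, losses, ties


wins, losses, ties = pairwise_comparison(PIMOBENDAN, PLACEBO)
total = wins + losses + ties                  # equals 209 * 108 pairs
prob_index = (wins + 0.5 * ties) / total      # P(treated better) + 0.5 * P(tie)
print(wins, losses, ties, round(prob_index, 3), round(wins / losses, 2))
```

Here there are 6,318 wins, 6,010 losses and 10,244 ties, so the probability index is close to 0.5: more early deaths on pimobendan offset its slightly better exercise results, which a comparison of the top category alone (63% vs 59%) does not show.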
Women’s Angiographic Vitamin and
Estrogen Trial (WAVE)
• Objective: to determine whether HRT or
antioxidant vitamin supplements influenced the
progression of coronary artery disease as
measured by serial angiograms (2x2 factorial
study).
• Target population: women with 15-75% coronary
stenosis at entry.
• Primary endpoint: change in lumen diameter;
deaths and MIs assigned worst rank.
JAMA 2002; 288: 2432-2440.
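Assigning deaths and MIs the worst rank lets every randomized patient enter a rank test of the continuous endpoint. A minimal sketch, assuming a larger change in lumen diameter is better and that deaths and MIs tie for the single worst rank; WAVE's actual ranking rules may differ. The data and the names `average_ranks` and `worst_rank_sum` are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks 1..n in ascending order, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def worst_rank_sum(treated, control):
    """Wilcoxon rank sum for the treated arm after worst-rank imputation.

    None marks a death or MI; it is replaced by -inf so those patients tie
    for the lowest (worst) rank.
    """
    def impute(x):
        return -math.inf if x is None else x

    pooled = [impute(x) for x in treated] + [impute(x) for x in control]
    ranks = average_ranks(pooled)
    return sum(ranks[: len(treated)])


# Hypothetical changes in lumen diameter (mm); None = died or had an MI
treated = [0.02, -0.05, None, 0.01]
control = [-0.10, None, None, -0.03]
print(worst_rank_sum(treated, control))
```

The three deaths/MIs share ranks 1 to 3 (average rank 2), and the treated arm's rank sum is then compared with its null expectation as in an ordinary Wilcoxon test.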
Freemantle Guidelines for Reporting
1. Components of composite outcomes should always be
defined as secondary outcomes and reported alongside the
results of the primary analysis, preferably in a table.
2. Ensure that the reporting of composite outcomes is clear and
avoids the suggestion that individual components of the
composite have been demonstrated to be effective.
3. Systematic overviews and quantitative meta-analysis should
be used to identify the effects of treatments on rare but
important endpoints that may be included as part of
composite outcomes in individual trials.
Freemantle N, et al. JAMA 2003.
Guide to Interpreting Composite
End Points
1. Are the component end points of similar importance to
patients?
2. Did the more and less important end points occur with
similar frequency?
3. Is the underlying biology of the component end points
similar?
4. Are the point estimates of the relative risk reduction similar
and the confidence intervals sufficiently narrow?
Montori VM et al, BMJ 2005.
Recommendations on Reporting of
Composite Outcomes
• How often did each component contribute to
composite outcome (descriptive)?
• What is the relative hazard for each
component of the composite - the separate
number of events and rate for each
component (“Consumer Reports approach”)?
Multiple Outcomes are a Necessity,
So No Matter What You Do…
• Collect data on all components of the combined
endpoint for trial duration
• Report not only the combined endpoint, but also:
– how often each component contributed to it
– the separate number of events and rate for each
component (“Consumer Reports approach”)
• See NuCOMBO (N Engl J Med 1996) and EPHESUS
(N Engl J Med 2003) trials for good examples of
composite outcome reporting.
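Both reporting summaries can be derived from the same per-patient event histories, provided all components are collected for the full trial duration. A minimal sketch with hypothetical data: it assumes each history is a list of (time, component) pairs and counts patients (not events) in the any-occurrence summary.

```python
from collections import Counter

# Hypothetical event histories: (time in months, component) for each patient
histories = {
    "p1": [(3, "PCP"), (14, "MAC"), (20, "Death")],
    "p2": [(9, "Death")],
    "p3": [(6, "Esophageal candidiasis"), (11, "PCP")],
    "p4": [],  # no events during follow-up
}


def first_event_counts(histories):
    """How often each component was the first event, i.e., what a
    time-to-first-event analysis of the composite actually counts."""
    counts = Counter()
    for events in histories.values():
        if events:
            _, component = min(events)  # earliest event by time
            counts[component] += 1
    return counts


def any_occurrence_counts(histories):
    """Number of patients with each component at any time during follow-up
    (the "Consumer Reports" view of each component)."""
    counts = Counter()
    for events in histories.values():
        for component in {c for _, c in events}:
            counts[component] += 1
    return counts


print(first_event_counts(histories))
print(any_occurrence_counts(histories))
```

Patient p1's death never appears in the first-event summary, which is why the separate incidence of each component must be reported as well.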
Example: NuCOMBO AIDS Trial
How often did each component occur as 1st event?
Component | AZT (N=372) | AZT+ddI (N=363)
Death | 54 | 75
PCP | 51 | 32
Esophageal candidiasis | 22 | 30
MAC | 30 | 23
CMV | 28 | 20
Other AIDS infections | 29 | 27
Malignancies | 13 | 9
Other conditions | 17 | 10
AIDS/Death | 244 | 226
Hazard ratio: 0.86 (0.71 to 1.03)
Example: NuCOMBO AIDS Trial
What is the separate incidence of each component of the
combined endpoint (the “Consumer Reports approach”)?
Component | AZT (N=372) | AZT+ddI (N=363) | Hazard ratio
Death | 191 | 176 | 0.88
PCP | 60 | 42 | 0.65
Esophageal candidiasis | 42 | 43 | 0.97
MAC | 58 | 42 | 0.66
CMV | 49 | 49 | 0.96
Other AIDS infections | 38 | 37 | 0.94
Malignancies | 27 | 19 | 0.64
Other conditions | 26 | 17 | 0.60
AIDS/Death | 244 | 226 | 0.86
Composite Endpoint Pitfalls
• Components of composite usually vary in
severity and in impact on quality of life
• Time to event analyses usually focus on 1st
event and ignore multiple events of the same
or different types.
Weighting the Components of
Composite Outcomes
• Risk of death associated with different
components
• Rank-ordering of outcomes in terms of
severity and quality of life by clinicians and
patients
• Rating the entire event profile
Some Approaches for Accounting for
Severity of Events and Event Histories
• Ranking of entire event histories (Follmann et al, Stat Med
1992)
• Marginal models with ranking of events according to
risk of death or subjective ranking by clinicians
and/or patients (Neaton et al, Stat Med 1994)
• Rule based ranking (Bjorling and Hodges, Stat Med, 1997)
- Severity, timing, number
• Weights determined by clinical investigators for
trials of thrombolytic therapy (Armstrong P et al, Am Heart J,
2011) [death 1.0, shock 0.5, CHF 0.3, recurrent MI 0.2]
• Matched pairs (Win Ratio) for heart failure trials
(Pocock S et al, Euro Heart J, 2012)
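The win ratio compares patients on the most serious outcome first and moves to the next outcome only when the first does not decide. Pocock et al describe a matched-pairs version; this minimal sketch instead uses the unmatched variant (every treated patient against every control patient) and assumes a common fixed follow-up with no censoring, both simplifications. The hierarchy (CV death, then HF hospitalization), the data, and all names are hypothetical.

```python
from typing import List, Optional, Tuple

# (time of CV death, time of first HF hospitalization) in months;
# None = event did not occur. Assumes the same fixed follow-up for everyone.
Patient = Tuple[Optional[float], Optional[float]]


def compare_times(a: Optional[float], b: Optional[float]) -> int:
    """+1 if A does better on this event (no event, or a later one),
    -1 if A does worse, 0 if tied."""
    if a is None and b is None:
        return 0
    if a is None:
        return 1
    if b is None:
        return -1
    return (a > b) - (a < b)


def compare(a: Patient, b: Patient) -> int:
    """Hierarchy: CV death decides first; HF hospitalization breaks ties."""
    result = compare_times(a[0], b[0])
    return result if result != 0 else compare_times(a[1], b[1])


def win_ratio(treated: List[Patient], control: List[Patient]) -> float:
    """Unmatched win ratio over all treated-control pairs."""
    wins = losses = 0
    for a in treated:
        for b in control:
            outcome = compare(a, b)
            if outcome > 0:
                wins += 1
            elif outcome < 0:
                losses += 1
    return wins / losses  # undefined if there are no losses


# Hypothetical patients for illustration only
treated = [(None, None), (None, 10.0), (20.0, 5.0)]
control = [(None, 4.0), (15.0, None), (None, None)]
print(win_ratio(treated, control))
```

Here 5 of the 9 pairs are wins, 3 are losses and 1 is a tie, so the win ratio is 5/3; tied pairs drop out of the ratio.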
Considerations in Analysis of All Events
• Events within a patient are not independent, so
standard errors (SEs) have to be adjusted
• 2nd, 3rd … events may not add much to
signal from 1st event
• A loss of power could result with an analysis of
all events if treatment was modified after 1st
event
Recurrent Events of the Same Type
• HF hospitalizations (Euro J Heart Fail 2014; 16:33-40)
• COPD exacerbations (N Engl J Med 2011; 365:689-698)
• Bacteriuria and pyuria at repeated visits in
elderly women (JAMA 1994; 271:751-754)
• Other examples:
– Fungal infections
– Transient ischemic attacks
– Seizures in epileptic patients
• Statistical methods: Poisson and negative
binomial regression; generalized linear mixed
models.
Example: COPD Exacerbations
(N Engl J Med 2011)
• Fixed follow-up of 12 months
• 741 exacerbations among 558 participants
given azithromycin (317 had at least one
event)
• 900 exacerbations among 559 participants
given placebo (380 had at least one event)
• HR (1st event)=0.73 (95% CI: 0.63-0.84;
p<0.001)
• RR (negative binomial regression) = 0.83
(95% CI: 0.72-0.95; p=0.01); p<0.001 by
Poisson regression.
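A crude rate ratio and its Poisson-based confidence interval can be computed directly from the counts above. This minimal sketch assumes every participant contributed the full 12 months (the slide gives no person-time) and ignores covariates and over-dispersion, so it is not the trial's regression analysis. The function name is hypothetical.

```python
import math


def crude_rate_ratio(events_t, n_t, events_c, n_c, years=1.0, z=1.96):
    """Crude event-rate ratio (treated / control) with a Poisson Wald CI.

    Assumes person-time = n * years in each arm. The Poisson SE of the
    log rate ratio is sqrt(1/events_t + 1/events_c); it ignores
    between-patient variation (over-dispersion).
    """
    rate_t = events_t / (n_t * years)
    rate_c = events_c / (n_c * years)
    rr = rate_t / rate_c
    se_log = math.sqrt(1 / events_t + 1 / events_c)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Azithromycin: 741 exacerbations in 558; placebo: 900 in 559
rr, lo, hi = crude_rate_ratio(741, 558, 900, 559)
print(f"RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The crude ratio is about 0.82 with a Poisson interval of roughly 0.75 to 0.91, narrower than the negative binomial interval (0.72 to 0.95) reported above; this is why the Poisson p-value is smaller when exacerbations cluster within patients.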
Example: Heart Failure Hospitalizations
(Euro J Heart Fail 2014)
• Variable follow-up (median=36.6 months)
• 392 HF hospitalizations among 1,514 participants
given candesartan (230 had at least one event)
• 547 HF hospitalizations among 1,509 participants
given placebo (278 had at least one event)
• HR (1st event)=0.82 (95% CI: 0.70-0.97; p=0.018)
• RR (Poisson regression) = 0.71 (95% CI: 0.62-0.81; p<0.001); RR (negative binomial regression)
= 0.68 (95% CI: 0.54-0.85) (lower point estimate but wider CI)
Alternatives to Composite
or Combined Endpoints
• Single outcome (e.g., all-cause mortality)
• Co-primary endpoints (requires an adjustment to Type I
error if success is defined as “significant” on any)
• Global index (may not be easily interpretable)
• Hierarchical scoring/ranking of multiple outcomes
• Primary + supportive outcome (SMART)
Multiple Primary Endpoints
• Different from a single combined endpoint
• Type I error adjustment may be required
(and usually is)
• Strategy for controlling type I error depends
on research question
Early HIV (High CD4+) Treatment Trial: Co-Primary Endpoints or Single Composite?
• Serious AIDS
– Any fatal AIDS event
– Non-fatal AIDS events except herpes simplex, esophageal
candidiasis and pulmonary tuberculosis
• Serious non-AIDS
– Non-AIDS deaths
– CV disease
– Liver disease
– Renal disease
– Non-AIDS malignancies (excluding skin cancer)
What is the question? Four possible
alternative hypotheses?
• HA: Treatment effect in at least one of K endpoints
• HA: Treatment effect in all K endpoints (no type I
error adjustment needed)
• HA: Treatment effect in M of K endpoints
• HA: Treatment effect in weighted average of K
endpoints
Capizzi T, Zhang J. Drug Info J, 30:949-956, 1996.
Strategies for α (Type I Error) Adjustment
for 1st Hypothesis: Treatment effect in at
least 1 of K endpoints
Bonferroni adjustment most common (conservative)
Suppose there are 2 co-primary endpoints, tested independently.
$\Pr[\text{no type I error for trial } (T)] = 1 - \alpha_T = (1 - \alpha_1)(1 - \alpha_2)$, so
$\alpha_T = 1 - (1 - \alpha_1)(1 - \alpha_2)$ is the $\alpha$ level for the trial.
For $\alpha_1 = \alpha_2 = 0.05$, $\alpha_T = 0.0975 \approx 0.098$ (unacceptably high).
For $\alpha_T = 0.05$, each $\alpha = 1 - (1 - \alpha_T)^{1/2} = 0.0253$, or more generally $1 - (1 - \alpha_T)^{1/n}$ for $n$ endpoints.
This is approximately $\alpha_T / n$, or $0.05/2 = 0.025$ in this case.
Example: EPHESUS heart failure study of eplerenone (Cardio Drugs and
Therapy, 15:79-87, 2001): 2 primary endpoints, total mortality (α = 0.04) and CV
mortality or morbidity (α = 0.01); overall study type I error of 0.05.
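The arithmetic above can be checked in a few lines. The per-test level $1 - (1 - \alpha_T)^{1/n}$ is known as the Šidák correction; like the formula above, this sketch assumes the tests are independent. Function names are hypothetical.

```python
def trial_alpha(alphas):
    """Probability of at least one type I error across independent tests."""
    p_no_error = 1.0
    for a in alphas:
        p_no_error *= 1 - a
    return 1 - p_no_error


def sidak_alpha(alpha_t, n):
    """Per-test level keeping the trial-wide level at alpha_t for n independent tests."""
    return 1 - (1 - alpha_t) ** (1 / n)


print(round(trial_alpha([0.05, 0.05]), 4))  # two unadjusted tests
print(round(sidak_alpha(0.05, 2), 4))       # exact per-test level
print(0.05 / 2)                             # Bonferroni approximation
print(round(trial_alpha([0.04, 0.01]), 4))  # EPHESUS-style unequal split
```

Under independence the EPHESUS split (0.04 and 0.01) gives a trial-wide level just under 0.05; because $\alpha_1 + \alpha_2 = 0.05$, the Bonferroni inequality also guarantees at most 0.05 when the endpoints are correlated.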
Other Strategies
• Global tests, e.g., MANOVA and Hotelling’s
T² (good approach if endpoints are not
correlated) or O’Brien’s rank test (best when
all outcomes are expected to go in the same
direction). Problem – not specific enough.
• Sequential testing procedures, e.g., Holm’s
step-down procedure or Hochberg’s step-up
procedure (both less conservative than
Bonferroni) – marginal testing with control of
overall error rate
Example
• 4 endpoints (ordered by p-values): p=0.081;
p=0.024; p=0.020; p=0.005
• Bonferroni: judge each against 0.05/4=0.0125;
only 4th endpoint is significant
• Holm step-down: reject 4th endpoint since
p=0.005<0.0125; p-value for 3rd endpoint = 0.020 >
0.05/3=0.017, therefore stop and accept H0 for
other 3 endpoints
• Hochberg step-up: accept H0 for 1st endpoint since
0.081 > 0.05; reject H0 for 2nd endpoint and all
remaining endpoints since 0.024< 0.05/2=0.025.
Sankoh et al Stat Med 16:2529-2542, 1997
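The three procedures in the example can be written out directly. This is a minimal sketch of the textbook Bonferroni, Holm step-down and Hochberg step-up rules (using ≤ at the boundary); function names are hypothetical.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals, alpha=0.05):
    """Step-down: start at the smallest p; stop at the first non-rejection."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for step, i in enumerate(order):
        if pvals[i] <= alpha / (m - step):
            reject[i] = True
        else:
            break
    return reject


def hochberg(pvals, alpha=0.05):
    """Step-up: start at the largest p; the first rejection also rejects
    every hypothesis with a smaller p-value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    reject = [False] * m
    for step, i in enumerate(order):
        if pvals[i] <= alpha / (step + 1):
            for j in order[step:]:
                reject[j] = True
            break
    return reject


p = [0.081, 0.024, 0.020, 0.005]
for name, method in [("Bonferroni", bonferroni), ("Holm", holm), ("Hochberg", hochberg)]:
    print(name, method(p))
```

Bonferroni and Holm reject only the endpoint with p=0.005, while Hochberg rejects all but the first, matching the slide.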
O’Brien’s Rank Sum Procedure
• Rank the responses of all patients (both groups
pooled) separately for each of the K endpoints, as in
Wilcoxon’s rank sum test
• Sum the ranks across endpoints for each patient
• Carry out an analysis of variance (ANOVA)
on the rank sums
O’Brien P. Biometrics 40:1079-1087, 1984.
See TOMHS report in JAMA for application
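The three steps of O’Brien’s procedure can be sketched as follows. This minimal version assumes every endpoint has been oriented so that larger values are better (the procedure needs all endpoints to point in the same direction) and computes the one-way ANOVA F statistic by hand; the data and names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Ranks 1..n in ascending order, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def obrien_rank_sum(groups):
    """groups: one list per arm; each patient is a list of K endpoint values.
    Returns the per-patient rank sums by arm and the ANOVA F statistic."""
    patients = [pt for arm in groups for pt in arm]
    totals = [0.0] * len(patients)
    for e in range(len(patients[0])):               # rank each endpoint, pooled
        ranks = average_ranks([pt[e] for pt in patients])
        totals = [t + r for t, r in zip(totals, ranks)]
    sums_by_arm, start = [], 0                       # split sums back by arm
    for arm in groups:
        sums_by_arm.append(totals[start:start + len(arm)])
        start += len(arm)
    grand = mean(totals)                             # one-way ANOVA on rank sums
    ss_between = sum(len(s) * (mean(s) - grand) ** 2 for s in sums_by_arm)
    ss_within = sum((x - mean(s)) ** 2 for s in sums_by_arm for x in s)
    df_between = len(groups) - 1
    df_within = len(totals) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return sums_by_arm, f_stat


# Hypothetical data: 2 endpoints per patient, larger = better
treated = [[5, 3], [7, 4], [6, 6]]
control = [[2, 1], [4, 2], [3, 5]]
sums, f = obrien_rank_sum([treated, control])
print(sums, f)
```

With two arms the F statistic equals the square of the two-sample t statistic on the rank sums, so either test can be used.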
Advantages and Disadvantages of Different
Approaches to Defining Primary Endpoint
Approach | Advantage | Disadvantage
Single outcome | Simple | Sample size; multiple endpoints are a reality
Composite | Sample size | Interpretation not easy if components show different patterns
Co-primary outcomes | Eggs not all in one basket | Sample size and power
Global index | Power | Not easily interpretable
Hierarchical scoring | Clinical relevance | Power; clinical relevance
Summary
• In study planning, focus on methods for defining,
ascertaining, and measuring major endpoints.
• Composite outcomes can be difficult to interpret if
the components do not go in the same direction –
choose components carefully.
• If not primary, define secondary endpoints using
all events during follow-up.
• A “Consumer Reports” analysis should be kept in
mind for reporting – full disclosure of all relevant
outcomes.