Meta-Analysis of Clinical Data for Regulated
Biopharmaceutical Products:
Answers to Frequently Asked Questions
Brenda Crowe, Research Advisor, Eli Lilly and Company
With special thanks to Jesse Berlin
Midwest Biopharmaceutical Statistics Workshop
May 21, 2013
Disclaimer
• The views expressed herein represent those of the
presenter and do not necessarily represent the views
or practices of the presenter’s employer or any other
party
MBSW May 21, 2013
Acknowledgements
Jesse Berlin
Amy Xia
Juergen Kuebler
Ed Whalen
Carol Koro
Agenda
• Background
• The 6 questions
– What studies should be pooled/combined?
– Method of ascertainment?
– Individual patient data (vs. aggregate patient data)?
– Multiple looks and/or multiple endpoints?
– Heterogeneity of design and results?
– Fixed-effect models or random-effects models?
• Concluding remarks
Background
• During drug development, sponsors need to recognize
safety signals early and adjust the development program
accordingly
• Crowe et al. (SPERT) gave an overview of the framework and
planning of meta-analysis (MA) in drug development but did not
provide details regarding practical issues arising during
implementation.
• Focus here on common analytical topics (6 questions)
• Emphasis on situations that arise in drug development,
mostly premarketing
SPERT = Safety Planning, Evaluation and Reporting Team
A little vocabulary (in today’s
context)
• POOL (noun): a grouping of studies used to address
a specific research question
• Swimming in data (avoid drowning)
Q1: WHAT STUDIES SHOULD
BE POOLED IN THE META-ANALYSIS?
Existing Guidance
FDA guidance on
premarketing risk
assessment
Existing Guidance
International
Conference on
Harmonization
(ICH) M4E
Existing Guidance
Council for
International
Organizations of
Medical Sciences
VI (CIOMS VI)
report
What to pool?
• Decisions on what to combine depend on the
specific questions to be answered (duh)
• Often there are several questions and these might
require different subsets of studies or subjects
Pools may be based on
• Type of control: placebo vs. active
• Dose route or regimen
• Concomitant (background) therapy
• Methods of eliciting adverse events (e.g., active vs.
passive).
• Disease state
• Duration of treatment (and follow-up?)
• Subgroups of patients based on age groups,
geographies, ethnicity groups, or severity of disease,
etc.
Table to help pick the right studies
Considerations for inclusion in a pool
Usually exclude
• Phase 1 pharmacokinetic and pharmacodynamic
studies (because short duration, healthy subjects or
patients with incurable end-stage disease).
• Studies that cannot / will not provide individual
patient level data if required for analysis.
Considerations for inclusion in a pool
• It is generally most appropriate to combine data from
studies that are similar.
• Strong similarity is not required for pooling, if the
effects of treatment don’t depend on the trial
characteristics being considered.
For example . . .
• Suppose some studies (or arms) were conducted
at a higher dose than the sponsor is proposing
for the marketing label. Would you exclude those
arms from the analysis?
• Yes, if the goal for those analyses is to
characterize adverse events from proposed
indications at the proposed doses.
• However, one might choose to combine the high-dose
studies or arms in a different pool to help assess what
could happen in an overdose situation.
Studies (or arms) at a higher or lower
dose than proposed for marketing?
• In general, exclude dose arms that are lower
than the proposed dose for marketing, as these
may dilute the effects seen at the higher
marketed dose
– However, events may occur in the lower dose studies
that should not be ignored
– Including low-dose and high-dose studies may help
understand the dose-response relationship
AEs in all those who took the drug?
• Can analyze ALL who took drug as a single cohort
without a comparator group: useful for accounting for all
events and estimating event rates for infrequent events
• Can then be compared to external reference population
rates
• However, external population rates limited by the
availability of event rates for a specific subset of the
population that is comparable to the trial population
– If the underlying disease increases the risk of a particular event,
comparisons with an external reference could be biased against the study
drug.
– Conversely, if enrollment criteria are such that high-risk patients are
excluded from trials, the on-study rates could appear to be artificially low.
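As a rough illustration of comparing a single-cohort event rate with an external reference rate, the sketch below computes an exposure-adjusted incidence rate and a standardized incidence ratio (SIR). All numbers are hypothetical, and the interval uses a simple log-scale approximation rather than an exact Poisson method.

```python
import math

# Hypothetical inputs, not taken from the presentation.
observed_events = 12          # events among all patients who took the drug
patient_years = 4000.0        # total exposure across the pooled studies
reference_rate = 2.0 / 1000   # external rate per patient-year (assumed comparable)

incidence_rate = observed_events / patient_years
expected_events = reference_rate * patient_years
sir = observed_events / expected_events  # standardized incidence ratio

# Approximate 95% interval on the log scale, using SE(log SIR) ~ 1/sqrt(observed).
half_width = 1.96 / math.sqrt(observed_events)
sir_low = sir * math.exp(-half_width)
sir_high = sir * math.exp(half_width)

print(f"rate per 1000 patient-years: {1000 * incidence_rate:.2f}")
print(f"SIR: {sir:.2f} (approx. 95% CI {sir_low:.2f} to {sir_high:.2f})")
```

The ratio is only as meaningful as the match between the reference population and the trial population, which is the limitation the bullets above describe.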
Hypothesis generating studies?
• What if a safety signal was detected in Phase 2
that resulted in a change in ascertainment of an
AE in Phase 3 (e.g., an adjudication process,
special case report form)?
• Create a grouping of Phase 3 studies designed
for that particular event
• Advantages
• Studies with consistent ascertainment analyzed together
• Excludes studies that generated the hypothesis being
tested
Hypothesis generating studies (cont.)
• The previous approach addresses type I error, but it
• sacrifices statistical power
• discards data from what may be studies in a closely monitored
population, which may also be at differential risk due to exposure to
the compound
• And it can raise all kinds of red flags (so
transparency is key – do the analysis with and
without those studies)
Caveats
• Do not do a crude unstratified analysis that
combines studies with a comparator and
studies without a comparator.
• Results can be very misleading. See Lièvre
2002, Chuang-Stein 2010 for further information
on dangers of not stratifying.
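A small numerical sketch of why crude pooling can mislead: the data below are hypothetical, and the Mantel-Haenszel relative risk is used here as one common stratified estimator (the slides do not prescribe a specific method). Within each controlled study the drug and comparator rates are identical, yet the crude pooled comparison suggests a doubling of risk, because the open-label study adds drug-only events from a higher-risk population.

```python
# Hypothetical data: (drug events, drug n, control events, control n).
studies = {
    "A: placebo-controlled, low risk": (10, 1000, 10, 1000),
    "B: placebo-controlled, high risk, 2:1": (40, 400, 20, 200),
    "C: open-label, no comparator": (50, 500, 0, 0),
}

def crude_rr(data):
    """Relative risk after summing all arms, ignoring study membership."""
    a = sum(s[0] for s in data.values())
    n1 = sum(s[1] for s in data.values())
    c = sum(s[2] for s in data.values())
    n0 = sum(s[3] for s in data.values())
    return (a / n1) / (c / n0)

def mantel_haenszel_rr(data):
    """Study-stratified relative risk; a study without a comparator adds nothing."""
    num = den = 0.0
    for a, n1, c, n0 in data.values():
        total = n1 + n0
        num += a * n0 / total
        den += c * n1 / total
    return num / den

crude = crude_rr(studies)
stratified = mantel_haenszel_rr(studies)
print(f"crude RR: {crude:.2f}, stratified RR: {stratified:.2f}")
```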
Q2: HOW DOES THE METHOD
OF ASCERTAINMENT IMPACT
THE QUALITY OF THE META-ANALYSIS?
Ascertainment method
• Can affect observed event rates, e.g., actively
solicited events will have higher reporting rates than
passively collected events
• E.g., for drugs that cross the blood–brain barrier, use
prospective tool to assess suicidal ideation and
behavior (vs. post hoc adjudication)
Retrospective adjudication
• Even with strict criteria using previously collected data,
bias could be introduced by retrospective
adjudication
– Important detailed clinical information may be missing
• If post hoc adjudication is necessary, use an
external, independent adjudication committee that
– Is masked to treatment assignment AND
– Adjudicates events across the entire development program
Q3: WHAT ARE THE
ADVANTAGES OF USING
INDIVIDUAL PATIENT DATA
(VS. AGGREGATE
SUMMARIES)?
Individual or aggregate-level data?
• For many questions get same answer with IPD as with
APD
• For analyses that do not require patient-level data,
including all relevant studies improves precision
• May also reduce bias that could be introduced by limiting
the analysis to those where patient-level data are
available
• However, there can be advantages to IPD
• Much easier to detect interactions between treatment and
patient-level characteristics with IPD than with APD
Advantages of patient-level data
• Allows mapping all data to a common version of
MedDRA (or other) increasing consistency of
terminology across trials
• Generally permits creation of common variables
across trials
• E.g., age categories may have been defined using different
category boundaries
• Different threshold hemoglobin values may have been used to
define ‘anemia’
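To make the hemoglobin example concrete, the sketch below re-derives an anemia flag from patient-level values using one common threshold. The records, the study-specific flags, and the 11.0 g/dL cutoff are all hypothetical and chosen only for illustration; they are not clinical recommendations.

```python
# Hypothetical patient-level records. Study A originally flagged anemia below
# 12 g/dL and study B below 10 g/dL, so the reported flags are not comparable.
records = [
    {"study": "A", "hgb": 11.5, "reported_anemia": True},
    {"study": "A", "hgb": 12.5, "reported_anemia": False},
    {"study": "A", "hgb": 10.5, "reported_anemia": True},
    {"study": "B", "hgb": 10.5, "reported_anemia": False},
    {"study": "B", "hgb": 9.5, "reported_anemia": True},
    {"study": "B", "hgb": 11.5, "reported_anemia": False},
]

COMMON_THRESHOLD = 11.0  # g/dL, assumed for this sketch

def count_by_study(rows, key):
    """Number of rows per study for which the given flag is true."""
    counts = {}
    for row in rows:
        counts[row["study"]] = counts.get(row["study"], 0) + int(row[key])
    return counts

# With IPD the flag can be recomputed identically for every study.
for row in records:
    row["harmonized_anemia"] = row["hgb"] < COMMON_THRESHOLD

reported = count_by_study(records, "reported_anemia")
harmonized = count_by_study(records, "harmonized_anemia")
print("as reported:", reported)
print("harmonized: ", harmonized)
```

With only aggregate counts per study, this recomputation would not be possible.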
More advantages of IPD
• Allows specification of a common set of patient-level
covariates so subgroup analyses across trials can be
performed
• Can define outcomes based on combinations of
variables that define distinct specific events but may
indicate a common mechanism, e.g., a composite
of weight loss or appetite reduction
And still more advantages of IPD
• Post hoc analyses of outcomes that require
adjudication can sometimes be derived, as in the
case of suicide event grading according to Columbia
Classification Algorithm of Suicide Assessment (C-CASA) criteria
• Creation of time-to-event variables (may not be
available in publications)
• Flexibility in defining time periods of interest for analyses, e.g.,
events occurring during “short-term” follow-up
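As a minimal sketch of the flexibility described above, the function below derives a time-to-event pair (time, event indicator) restricted to an analysis window. The 30-day window and the patient data are hypothetical; patients without an event inside the window are censored at the earlier of their last follow-up day and the end of the window.

```python
def time_to_event(event_day, last_followup_day, window_days):
    """Return (time, event_indicator) for an analysis restricted to a window."""
    if event_day is not None and event_day <= window_days:
        return (event_day, 1)
    # No event inside the window: censor at end of follow-up or end of window.
    return (min(last_followup_day, window_days), 0)

# Hypothetical patients: (day of first event or None, last follow-up day).
patients = [(10, 90), (45, 90), (None, 20), (None, 90)]

short_term = [time_to_event(ev, last, window_days=30) for ev, last in patients]
print(short_term)
```

Changing `window_days` redefines the period of interest without going back to the source studies, which is usually not possible with published summaries.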
Why not always use IPD?
• Integration required to provide the database is labor
intensive, especially if done in retrospect
• Sometimes summary statistics may be the only
information available for some studies of interest,
e.g.,
• studies of a new therapeutic approach done by an academic group
that does not share patient-level data, or
• the drug of interest may have been included as an active control by
another sponsor
Q4: SHOULD WE ADJUST FOR
MULTIPLE LOOKS AND/OR
MULTIPLE ENDPOINTS IN THE
CONTEXT OF META-ANALYSIS?
Q4: Multiple comparisons
• Complicated by having multiple looks over time and
multiple (and an unknown number of) endpoints
• Safety Planning, Evaluation, and Reporting Team
(SPERT) defined “Tier 1 events” as those for which a
prespecified hypothesis has been defined
Tier 1 Events
• E.g., to rule out an effect of a certain magnitude for
assessing a particular risk (a noninferiority test – as
for diabetes drugs)
• Generally, should consider performing formal
adjustment for multiple looks for Tier 1 events and
for multiple endpoints for other events
Diabetes drugs
• Need to rule out a relative risk of 1.8 (for CV events) for
conditional approval, and 1.3 for final approval
• Confidence level for that specific outcome may need to
be adjusted for multiple looks, which can be considered
separately from non-Tier 1 events because it needs to be
met for the drug to move forward
• An event of interest: important regardless of the specific
side effect profile
• Analogous to a primary analysis in the efficacy setting
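The margin check can be sketched as follows: compute the upper confidence bound for the relative risk and compare it with 1.8 and 1.3. The pooled estimate, its standard error, and the number of looks are hypothetical, and the Bonferroni split of alpha across looks is only one simple choice assumed here; group-sequential alpha-spending methods are a common alternative.

```python
import math
from statistics import NormalDist

def upper_bound(log_rr, se, alpha_two_sided):
    """Upper limit of the two-sided (1 - alpha) confidence interval for the RR."""
    z = NormalDist().inv_cdf(1 - alpha_two_sided / 2)
    return math.exp(log_rr + z * se)

# Hypothetical pooled estimate on the log scale.
log_rr = math.log(1.10)
se = 0.15
planned_looks = 2                      # assumed number of analyses
alpha_per_look = 0.05 / planned_looks  # Bonferroni split (assumption)

unadjusted = upper_bound(log_rr, se, 0.05)
adjusted = upper_bound(log_rr, se, alpha_per_look)

print(f"upper bound, unadjusted: {unadjusted:.2f}; adjusted: {adjusted:.2f}")
for margin in (1.8, 1.3):
    print(f"margin {margin}: ruled out = {adjusted < margin}")
```

Adjusting for multiple looks widens the interval, so a margin that is just met at the nominal level may no longer be met after adjustment.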
Multiplicity is a complicated issue in
the safety context
• Often have low power, lack of a priori definitions, and
extraneous variability
• Value in trying not to miss a safety signal, but
remember that initial detection is not the same as
proving that a given AE is definitively related to a
given drug
• Worry about reducing false negative findings in drug
safety given the known limitations of our tools
Q5: WHAT IS HETEROGENEITY
AND WHAT ARE SOURCES OF
HETEROGENEITY?
• Heterogeneity refers to differences among studies
and/or study results.
• Can be classified in 3 ways: clinical, methodological
and statistical.
Clinical Heterogeneity
Differences among trials in their
• Patient selection (e.g., disease conditions under
investigation, eligibility criteria, patient
characteristics, or geographic differences)
Clinical Heterogeneity
Differences among trials in their
• Interventions (e.g., duration, dosing, nature of
the control)
• Outcomes (e.g., definitions of endpoints, follow-up duration, cut-off points for scales)
Methodological Heterogeneity
Differences in
• Study design (e.g., the mechanism of randomization).
• Study conduct (e.g., allocation concealment, blinding,
extent and handling of withdrawals and loss to follow up,
or analysis methods).
Decisions about what constitutes clinical
heterogeneity and methodological heterogeneity do
not involve any calculation and are based on
judgment.
Statistical heterogeneity
• Numerical variability in results, beyond that expected
from sampling variability
May be caused by
• Known (or unknown) clinical and methodological
differences among trials
• Chance
Hypothetical example
• Clinical heterogeneity may not always result in
statistical heterogeneity.
• If there is clinical heterogeneity but little variation in
study results, may represent robust, generalizable
treatment effects.
Beware of Q
(unless you are James Bond)
• Cochran’s Q is a global test of heterogeneity
• I² is a measure of global heterogeneity
• KEY POINT: Both are informative, but do not rely on either of
these statistics alone
• Apparent lack of overall heterogeneity does not rule out a
specific source of heterogeneity
• Conversely, large studies with clinically small variability
can yield spuriously high statistical heterogeneity
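For reference, a short sketch of how the two statistics are computed, using the standard inverse-variance definitions: Q is the weighted sum of squared deviations from the pooled estimate, and I² = max(0, (Q − df)/Q). The log odds ratios below are hypothetical. The same spread of estimates is evaluated twice, once with large standard errors (small studies) and once with very small standard errors (large studies), to illustrate the last bullet.

```python
def cochran_q_and_i2(effects, variances):
    """Cochran's Q and I-squared with inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2

# Hypothetical log odds ratios: a clinically narrow range (OR about 1.1 to 1.2).
log_odds_ratios = [0.10, 0.15, 0.20, 0.12, 0.18]

q_small, i2_small = cochran_q_and_i2(log_odds_ratios, [0.2 ** 2] * 5)
q_large, i2_large = cochran_q_and_i2(log_odds_ratios, [0.01 ** 2] * 5)

print(f"small studies: Q = {q_small:.2f}, I2 = {100 * i2_small:.0f}%")
print(f"large studies: Q = {q_large:.2f}, I2 = {100 * i2_large:.0f}%")
```

The point estimates are identical in both runs; only the precision changes, yet I² moves from zero to above 90%.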
Q6: IS IT SUFFICIENT TO USE
FIXED-EFFECTS MODELS WHEN
COMBINING STUDIES OR DO WE
NEED TO CONSIDER RANDOM-EFFECTS MODELS?
Fixed-effect vs. random-effects
• Fixed = common effect across all studies
• Inference is to the studies at hand
• Reasonable to expect (?) when designs and populations
are similar across studies
• Random-effects models: assume that the true underlying
population effects differ from study to study and
that the true individual study effects follow a
statistical distribution
• The analytic goal is then to estimate the overall mean
and variance of the distribution of true study effects
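A compact sketch of the two models using inverse-variance weighting. The between-study variance is estimated here with the DerSimonian-Laird moment estimator, which is one common choice and not something the slides specify; the log odds ratios and variances are hypothetical.

```python
import math

def fixed_and_random(effects, variances):
    """Inverse-variance fixed-effect and DerSimonian-Laird random-effects estimates."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fe = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fe) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    sw_re = sum(w_re)
    re = sum(wi * yi for wi, yi in zip(w_re, effects)) / sw_re
    return {
        "fe": fe, "fe_se": math.sqrt(1.0 / sw),
        "re": re, "re_se": math.sqrt(1.0 / sw_re),
        "tau2": tau2,
    }

# Hypothetical log odds ratios and their variances.
result = fixed_and_random([0.0, 0.2, 0.6, 0.9], [0.01, 0.02, 0.05, 0.20])
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

When the estimated between-study variance is zero the two estimates coincide; otherwise the random-effects standard error is larger, reflecting the extra variance component.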
More on FE vs. RE
• In some situations, it may not be appropriate to
produce a single overall treatment-effect estimate
• Goal should sometimes (often) be to model and
understand sources of heterogeneity
More points on FE vs. RE
• Risk differences (RD) tend to be more heterogeneous than odds ratios
(OR) or relative risks (RR), a point that is also made in
FDA’s draft guidance for industry on noninferiority trials
• Can model on OR scale then convert to RD or RR to help
with clinical interpretability
• A constant OR implies that the effect size on the RD scale must vary
with the baseline event rate, so must decide whether to estimate the
baseline (control) event rate from external data or from the data
included in the actual meta-analysis (implications for
variance estimation)
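The conversion from an odds ratio to a risk difference or relative risk follows directly from the definition of odds: with baseline risk p0, the treated-arm odds are OR × p0/(1 − p0), and the treated-arm risk is odds/(1 + odds). The sketch below applies this to a hypothetical constant OR of 2 at three hypothetical baseline rates.

```python
def risk_from_or(odds_ratio, baseline_risk):
    """Treated-arm risk implied by an odds ratio and a control-arm risk."""
    odds = odds_ratio * baseline_risk / (1 - baseline_risk)
    return odds / (1 + odds)

odds_ratio = 2.0                       # hypothetical constant OR
baseline_risks = [0.01, 0.10, 0.30]    # hypothetical control event rates

conversions = []
for p0 in baseline_risks:
    p1 = risk_from_or(odds_ratio, p0)
    conversions.append((p0, p1 - p0, p1 / p0))
    print(f"baseline {p0:.2f}: RD = {p1 - p0:.4f}, RR = {p1 / p0:.3f}")
```

The same OR yields a risk difference below one percentage point at a 1% baseline and about 16 percentage points at a 30% baseline, so the choice of baseline rate drives the clinical interpretation.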
How to decide on FE or RE?
• Do you expect a common effect or not?
• Single indication, similar protocols, same data collection
methods, definitions, etc.: FE likely to be appropriate.
• Different populations, etc.: use RE but ALSO explore
sources of heterogeneity
• Enough data?
• Sparse data, few studies, may not permit RE estimation
• Small studies may get “up-weighted” with RE:
are small study results systematically different?
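The up-weighting of small studies can be seen by comparing relative weights under the two models. Fixed-effect weights are 1/v and random-effects weights are 1/(v + τ²), so adding a common τ² flattens the weights. The variances and the value of τ² below are hypothetical and supplied directly rather than estimated.

```python
# Hypothetical within-study variances, from the largest study to the smallest.
variances = [0.01, 0.02, 0.05, 0.20]
tau2 = 0.06  # assumed between-study variance

def relative_weights(vs, between_study_variance=0.0):
    """Normalized inverse-variance weights; a positive tau2 gives RE weights."""
    raw = [1.0 / (v + between_study_variance) for v in vs]
    total = sum(raw)
    return [r / total for r in raw]

fe_share = relative_weights(variances)
re_share = relative_weights(variances, tau2)

for v, fe, re in zip(variances, fe_share, re_share):
    print(f"variance {v:.2f}: FE weight {100 * fe:.1f}%, RE weight {100 * re:.1f}%")
```

If the smallest studies differ systematically from the rest, this shift in weight moves the pooled estimate toward them.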
Once you go Bayesian, you’ll
never go back
• Specify a prior probability distribution
• Today’s posterior becomes tomorrow’s prior
• Flexibility to deal with heterogeneity through complex
modeling
• Available under both FE and RE (use Deviance
Information Criterion to decide?)
• Bayesian inferences are based on the full ‘exact’
posterior distributions (so useful for small numbers of
events)
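A toy illustration of "today's posterior becomes tomorrow's prior", using the conjugate beta-binomial model for a single-arm event proportion. This is far simpler than a full Bayesian meta-analysis and is meant only to show the updating mechanics; the Beta(1, 1) prior and the study counts are hypothetical.

```python
def update(prior, events, n):
    """Beta(a, b) prior plus binomial data gives a Beta(a + events, b + n - events) posterior."""
    a, b = prior
    return (a + events, b + n - events)

prior = (1, 1)  # uniform prior on the event proportion (assumption)

# Sequential updating: the posterior after study 1 is the prior for study 2.
after_study_1 = update(prior, events=2, n=100)
after_study_2 = update(after_study_1, events=5, n=300)

# Updating once with the combined data gives the same posterior.
all_at_once = update(prior, events=7, n=400)

a, b = after_study_2
posterior_mean = a / (a + b)
print(after_study_2, all_at_once, round(posterior_mean, 4))
```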
For more details …
Concluding Remarks
• Meta-analysis increasingly used to address safety
concerns in drug development.
• Up-front thought allows teams to improve planning
and enhance data capture, and enhances
transparency and interpretation of the results.
Additional References
• Chuang-Stein C, Beltangady M. Reporting
cumulative proportion of subjects with an adverse event based
on data from multiple studies. Pharmaceut. Statist. 2010
• Crowe, Xia, Berlin et al. Recommendations for safety planning,
data collection, evaluation and reporting during drug, biologic
and vaccine development: a report of the safety planning,
evaluation, and reporting team. Clin Trials 2009;6:430-440
• Lièvre, Cucherat and Leizorovicz. Pooling, meta-analysis, and the
evaluation of drug safety. Current Controlled Trials in
Cardiovascular Medicine 2002
• Olkin I, Sampson A. Comparison of meta-analysis versus
analysis of variance of individual patient data. Biometrics. Mar
1998;54(1):317-322.