Meta-Analysis of Clinical Data for Regulated Biopharmaceutical Products: Answers to Frequently Asked Questions
Brenda Crowe, Research Advisor, Eli Lilly and Company
With special thanks to Jesse Berlin
Midwest Biopharmaceutical Statistics Workshop
May 21, 2013

Disclaimer
• The views expressed herein represent those of the presenter and do not necessarily represent the views or practices of the presenter’s employer or any other party
MBSW May 21, 2013

Acknowledgements
Jesse Berlin
Amy Xia
Juergen Kuebler
Ed Whalen
Carol Koro

Agenda
• Background
• The 6 questions
  – What studies should be pooled/combined?
  – Method of ascertainment?
  – Individual patient data (vs. aggregate patient data)?
  – Multiple looks and/or multiple endpoints?
  – Heterogeneity of design and results?
  – Fixed-effect models or random-effects models?
• Concluding remarks

Background
• During drug development, sponsors need to recognize safety signals early and adjust the development program accordingly
• Crowe et al. (SPERT) gave an overview of the framework and planning of meta-analysis (MA) in drug development but did not provide details regarding practical issues arising during implementation
• Focus here on common analytical topics (6 questions)
• Emphasis on situations that arise in drug development, mostly premarketing
SPERT = Safety Planning, Evaluation and Reporting Team

A little vocabulary (in today’s context)
• POOL (noun): a grouping of studies used to address a specific research question
• Swimming in data (avoid drowning)

Q1: WHAT STUDIES SHOULD BE POOLED IN THE META-ANALYSIS?

Existing Guidance
• FDA guidance on premarketing risk assessment
• International Conference on Harmonization (ICH) M4E
• Council for International Organizations of Medical Sciences VI (CIOMS VI) report

What to pool?
• Decisions on what to combine depend on the specific questions to be answered (duh)
• Often there are several questions and these might require different subsets of studies or subjects

Pools may be based on
• Type of control: placebo vs. active
• Dose route or regimen
• Concomitant (background) therapy
• Methods of eliciting adverse events (e.g., active vs. passive)
• Disease state
• Duration of treatment (and follow-up?)
• Subgroups of patients based on age groups, geographies, ethnicity groups, or severity of disease, etc.

Table to help pick the right studies (table not recovered)

Considerations for inclusion in a pool
Usually exclude
• Phase 1 pharmacokinetic and pharmacodynamic studies (because short duration, healthy subjects or patients with incurable end-stage disease)
• Studies that cannot / will not provide individual patient level data if required for analysis

Considerations for inclusion in a pool
• It is generally most appropriate to combine data from studies that are similar.
• Strong similarity is not required for pooling, if the effects of treatment don’t depend on the trial characteristics being considered.

For example . . .
• Suppose some studies (or arms) were conducted at a higher dose than the sponsor is proposing for the marketing label. Would you exclude those arms from the analysis?
• Yes, if the goal for those analyses is to characterize adverse events from proposed indications at the proposed doses.
• However, one might choose to combine the high-dose studies or arms in a different pool to help assess what could happen in an overdose situation.

Studies (or arms) at a higher or lower dose than proposed for marketing?
• In general, exclude dose arms that are lower than the proposed dose for marketing, as these may dilute the effects seen at the higher marketed dose
  – However, events may occur in the lower dose studies that should not be ignored
  – Including low-dose and high-dose studies may help understand the dose-response relationship

AEs in all those who took the drug?
• Can analyze ALL who took drug as a single cohort without a comparator group: useful for accounting for all events and estimating event rates for infrequent events
• Can then be compared to external reference population rates
• However, external population rates limited by the availability of event rates for a specific subset of the population that is comparable to the trial population
  – If the underlying disease increases the risk of a particular event, comparisons with an external reference could be biased against the study drug.
  – Conversely, if enrollment criteria are such that high-risk patients are excluded from trials, the on-study rates could appear to be artificially low.

Hypothesis generating studies?
• What if a safety signal was detected in Phase 2 that resulted in a change in ascertainment of an AE in Phase 3 (e.g., an adjudication process, special case report form)?
• Create a grouping of Phase 3 studies designed for that particular event
• Advantages
  – Studies with consistent ascertainment analyzed together
  – Excludes studies that generated the hypothesis being tested

Hypothesis generating studies (cont.)
• The previous approach addresses type I error but
  – sacrifices statistical power
  – discards data from what may be studies in a closely monitored population, which may also be at differential risk due to exposure to the compound
• And it can raise all kinds of red flags (so transparency is key: do the analysis with and without those studies)

Caveats
• Do not do a crude unstratified analysis that combines studies with a comparator and studies without a comparator.
• Results can be very misleading. See Lièvre 2002, Chuang-Stein 2010 for further information on dangers of not stratifying.

Q2: HOW DOES THE METHOD OF ASCERTAINMENT IMPACT THE QUALITY OF THE META-ANALYSIS?

Ascertainment method
• Can affect observed event rates, e.g., actively solicited events will have higher reporting rates than passively collected events
• E.g., for drugs that cross the blood–brain barrier, use a prospective tool to assess suicidal ideation and behavior (vs. post hoc adjudication)

Retrospective adjudication
Even with strict criteria using previously collected data, bias could be introduced by retrospective adjudication
  – Important detailed clinical information may be missing
• If post hoc adjudication is necessary, use an external, independent adjudication committee that
  – Is masked to treatment assignment AND
  – Adjudicates events across the entire development program

Q3: WHAT ARE THE ADVANTAGES OF USING INDIVIDUAL PATIENT DATA (VS. AGGREGATE SUMMARIES)?

Individual or aggregate-level data?
• For many questions, individual patient data (IPD) and aggregate patient data (APD) give the same answer
• For analyses that do not require patient-level data, including all relevant studies improves precision
• May also reduce bias that could be introduced by limiting the analysis to those studies where patient-level data are available
• However, there can be advantages to IPD
• Much easier to detect interactions between treatment and patient-level characteristics with IPD than with APD

Advantages of patient-level data
• Allows mapping all data to a common version of MedDRA (or other), increasing consistency of terminology across trials
• Generally permits creation of common variables across trials
  – E.g., age categories may have been defined using different category boundaries
  – Different threshold hemoglobin values may have been used to define ‘anemia’

More advantages of IPD
• Allows specification of a common set of patient-level covariates so subgroup analyses across trials can be performed
• Can define outcomes based on combinations of variables that each define a specific event but together may indicate a common mechanism, e.g., a composite of weight loss or appetite reduction

And still more advantages of IPD
• Post hoc analyses of outcomes that require adjudication can sometimes be derived, as in the case of suicide event grading according to the Columbia Classification Algorithm of Suicide Assessment (C-CASA) criteria
• Creation of time-to-event variables (may not be available in publications)
• Flexibility in defining time periods of interest for analyses, e.g., events occurring during “short-term” follow-up

Why not always use IPD?
• Integration required to provide the database is labor intensive, especially if done in retrospect
• Sometimes summary statistics may be the only information available for some studies of interest, e.g.,
  – studies of a new therapeutic approach done by an academic group that does not share patient-level data, or
  – the drug of interest may have been included as an active control by another sponsor

Q4: SHOULD WE ADJUST FOR MULTIPLE LOOKS AND/OR MULTIPLE ENDPOINTS IN THE CONTEXT OF META-ANALYSIS?

Q4: Multiple comparisons
• Complicated by having multiple looks over time and multiple (and an unknown number of) endpoints
• Safety Planning, Evaluation, and Reporting Team (SPERT) defined “Tier 1 events” as those for which a prespecified hypothesis has been defined

Tier 1 Events
• E.g., to rule out an effect of a certain magnitude for assessing a particular risk (a noninferiority test, as for diabetes drugs)
• Generally, should consider performing formal adjustment for multiple looks for Tier 1 events and for multiple endpoints for other events

Diabetes drugs
• Need to rule out a relative risk of 1.8 (for CV events) for conditional approval, and 1.3 for final approval
• Confidence level for that specific outcome may need to be adjusted for multiple looks, which can be considered separately from non-Tier 1 events because it needs to be met for the drug to move forward
• An event of interest: important regardless of the specific side effect profile, and analogous to a primary analysis in the efficacy setting

Multiplicity is a complicated issue in the safety context
• Often have low power, lack of a priori definitions, and extraneous variability
• Value in trying not to miss a safety signal, but remember that initial detection is not the same as proving that a given AE is definitively related to a given drug
• Worry about reducing false negative findings in drug safety
given the known limitations of our tools

Q5: WHAT IS HETEROGENEITY AND WHAT ARE SOURCES OF HETEROGENEITY?

• Heterogeneity refers to differences among studies and/or study results.
• Can be classified in 3 ways: clinical, methodological and statistical.

Clinical Heterogeneity
Differences among trials in their
• Patient selection (e.g., disease conditions under investigation, eligibility criteria, patient characteristics, or geographic differences)
• Interventions (e.g., duration, dosing, nature of the control)
• Outcomes (e.g., definitions of endpoints, follow-up duration, cut-off points for scales)

Methodological Heterogeneity
Differences in
• Study design (e.g., the mechanism of randomization).
• Study conduct (e.g., allocation concealment, blinding, extent and handling of withdrawals and loss to follow-up, or analysis methods).
Decisions about what constitutes clinical heterogeneity and methodological heterogeneity do not involve any calculation and are based on judgment.

Statistical heterogeneity
• Numerical variability in results, beyond that expected from sampling variability
May be caused by
• Known (or unknown) clinical and methodological differences among trials
• Chance

Hypothetical example (figure not recovered)
• Clinical heterogeneity may not always result in statistical heterogeneity.
• If there is clinical heterogeneity but little variation in study results, this may represent robust, generalizable treatment effects.
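
As a hedged illustration of "variability beyond that expected from sampling variability", the sketch below computes Cochran's Q and I² with inverse-variance weights. The choice of these two statistics, the log odds ratio scale, the function name `cochran_q_i2`, and the study values are all assumptions made for illustration; none of them come from the slides.

```python
def cochran_q_i2(effects, variances):
    """Return (pooled, Q, df, I2 in percent) for study-level estimates.

    effects   -- per-study effect estimates (e.g., log odds ratios)
    variances -- per-study sampling variances of those estimates
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted squared deviations of study effects from the pooled effect
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I2: share of total variability in excess of what sampling error explains
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, df, i2


if __name__ == "__main__":
    # Hypothetical log odds ratios and variances, for illustration only
    log_or = [0.10, 0.30, -0.05, 0.45]
    var = [0.04, 0.09, 0.02, 0.16]
    pooled, q, df, i2 = cochran_q_i2(log_or, var)
    print(f"pooled log OR = {pooled:.3f}, Q = {q:.2f} on {df} df, I2 = {i2:.0f}%")
```

Under the usual null hypothesis of homogeneity, Q is compared with a chi-square distribution on df degrees of freedom; a global summary like this cannot by itself rule out a specific source of heterogeneity.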

Beware of Q (unless you are James Bond)
• Cochran’s Q is a global test of heterogeneity
• I² is a measure of global heterogeneity
• KEY POINT: Both are informative, but do not rely on either of these statistics
• Apparent lack of overall heterogeneity does not rule out a specific source of heterogeneity
• Conversely, large studies with clinically small variability can yield spuriously high statistical heterogeneity

Q6: IS IT SUFFICIENT TO USE FIXED-EFFECTS MODELS WHEN COMBINING STUDIES OR DO WE NEED TO CONSIDER RANDOM-EFFECTS MODELS?

Fixed-effect vs. random-effects
• Fixed = common effect across all studies
  – Inference is to the studies at hand
  – Reasonable to expect (?) when designs and populations are similar across studies
• Random-effects models assume that true underlying population effects differ from study to study and that the true individual study effects follow a statistical distribution
• The analytic goal is then to estimate the overall mean and variance of the distribution of true study effects

More on FE vs. RE
• In some situations, it may not be appropriate to produce a single overall treatment-effect estimate
• Goal should sometimes (often) be to model and understand sources of heterogeneity

More points on FE vs. RE
• Risk differences (RD) tend to be more heterogeneous than odds ratios (OR) or relative risks (RR), a point that is also made in FDA’s draft guidance for industry on noninferiority trials
• Can model on OR scale then convert to RD or RR to help with clinical interpretability
• Constant OR implies effect size must vary for RD, so must decide whether to estimate the baseline (control) event rate from the external data or from the data included in the actual meta-analysis (implications for variance estimation)

How to decide on FE or RE?
• Do you expect a common effect or not?
  – Single indication, similar protocols, same data collection methods, definitions, etc., FE likely to be appropriate.
  – Different populations, etc., use RE but ALSO explore sources of heterogeneity
• Enough data?
  – Sparse data, few studies, may not permit RE estimation
  – Small studies may get “up-weighted” with RE: are small study results systematically different?

Once you go Bayesian, you’ll never go back
• Specify a prior probability distribution
• Today’s posterior becomes tomorrow’s prior
• Flexibility to deal with heterogeneity through complex modeling
• Available under both FE and RE (use Deviance Information Criterion to decide?)
• Bayesian inferences are based on the full ‘exact’ posterior distributions (so useful for small numbers of events)

For more details …

Concluding Remarks
• Meta-analysis increasingly used to address safety concerns in drug development.
• Up-front thought allows teams to improve planning and enhance data capture, and enhances transparency and interpretation of the results.

Additional References
• Chuang-Stein C, Beltangady M. Reporting cumulative proportion of subjects with an adverse event based on data from multiple studies. Pharmaceut. Statist. 2010
• Crowe, Xia, Berlin et al. Recommendations for safety planning, data collection, evaluation and reporting during drug, biologic and vaccine development: a report of the safety planning, evaluation, and reporting team. Clin Trials 2009; 6: 430-440
• Lièvre, Cucherat and Leizorovicz. Pooling, meta-analysis, and the evaluation of drug safety. Current Controlled Trials in Cardiovascular Medicine 2002
• Olkin I, Sampson A. Comparison of meta-analysis versus analysis of variance of individual patient data. Biometrics. Mar 1998;54(1):317-322.
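
Backup sketch for Q6 (fixed-effect vs. random-effects): the code below contrasts an inverse-variance fixed-effect estimate with a random-effects estimate, and shows how a small study can gain relative weight under RE. It assumes the DerSimonian-Laird moment estimator for the between-study variance, which the slides do not name; the function names and the study values are hypothetical and for illustration only.

```python
import math


def fixed_effect(effects, variances):
    """Inverse-variance fixed-effect estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return estimate, math.sqrt(1.0 / sum(weights))


def random_effects_dl(effects, variances):
    """Random-effects estimate, standard error, and tau^2 (DerSimonian-Laird)."""
    weights = [1.0 / v for v in variances]
    fe_estimate, _ = fixed_effect(effects, variances)
    q = sum(w * (y - fe_estimate) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    # Moment estimate of between-study variance, truncated at zero
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1.0 / (v + tau2) for v in variances]
    estimate = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    return estimate, math.sqrt(1.0 / sum(re_weights)), tau2


def relative_weights(variances, tau2=0.0):
    """Each study's share of the total weight (tau2=0 gives FE weights)."""
    raw = [1.0 / (v + tau2) for v in variances]
    total = sum(raw)
    return [w / total for w in raw]


if __name__ == "__main__":
    # Hypothetical log odds ratios; the last study is small (large variance)
    log_or = [0.05, 0.10, 0.00, 0.90]
    var = [0.02, 0.03, 0.02, 0.25]
    fe, fe_se = fixed_effect(log_or, var)
    re, re_se, tau2 = random_effects_dl(log_or, var)
    print(f"FE: {fe:.3f} (SE {fe_se:.3f})")
    print(f"RE: {re:.3f} (SE {re_se:.3f}), tau^2 = {tau2:.3f}")
    print("FE weight of small study:", round(relative_weights(var)[-1], 3))
    print("RE weight of small study:", round(relative_weights(var, tau2)[-1], 3))
```

When the estimated between-study variance is zero, the two estimates coincide; when it is positive, the weights become more equal across studies, which is the "up-weighting" of small studies mentioned under "How to decide on FE or RE?".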