Merits and Limitations of Meta-analyses

Daren K. Heyland, M.D.
Kingston General Hospital, Angada 3, Kingston Ontario, K7L 2FF, Canada
phone +1-613-549-6666, fax +1-613-548-2577, e-mail: dkh2@post.queensu.ca
Learning Objectives
- To describe the methods used by systematic reviews to limit bias
- To list the assumptions inherent in comparing randomized trials and meta-analyses
- To list the criteria used to evaluate the validity of a meta-analysis
- To list the various ways of handling heterogeneity in a meta-analysis
Systematic Reviews
All reviews, narrative and systematic, are retrospective, observational research studies and are therefore subject to bias. Systematic reviews differ from narrative reviews in that they
employ various methods to limit or minimize bias in the review process. These methods include a comprehensive search of all potentially relevant articles, the use of explicit, reproducible
criteria to select articles for inclusion into the review process, quality assessment of the data
incorporated into the review and a transparent, reproducible process for abstracting data. The
key features of both narrative and systematic reviews are outlined in Table 1.
Meta-analyses are a subset of systematic reviews that employ statistical strategies to aggregate the results of individual studies into a numerical estimate of overall treatment effect.
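The aggregation step can be illustrated with a minimal sketch of one common approach, fixed-effect inverse-variance weighting of log risk ratios. The study effects and standard errors below are invented for illustration and are not drawn from any real meta-analysis:

```python
import math

# Hypothetical per-study log risk ratios and their standard errors
# (illustrative numbers only, not taken from any real meta-analysis).
studies = [(-0.22, 0.10), (-0.05, 0.15), (-0.30, 0.20)]

# Fixed-effect inverse-variance weighting: each study is weighted by 1/SE^2,
# so larger, more precise studies contribute more to the pooled estimate.
weights = [1.0 / se**2 for _, se in studies]
pooled_log_rr = sum(w * lrr for (lrr, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))

# Back-transform to a risk ratio with a 95% confidence interval.
rr = math.exp(pooled_log_rr)
ci = (math.exp(pooled_log_rr - 1.96 * pooled_se),
      math.exp(pooled_log_rr + 1.96 * pooled_se))
print(f"pooled RR = {rr:.2f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}")
```

Random-effects pooling (e.g. the DerSimonian-Laird method) follows the same pattern but widens each study's variance by an estimate of between-study variability.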
Systematic reviews are less likely to be biased, more likely to be up-to-date and more likely to
detect small but clinically meaningful treatment effects than narrative reviews [1,2]. For example, it has been shown that narrative reviews may lag more than a decade behind in endorsing a treatment of proven effectiveness, or may continue to advocate a therapy long after it is considered harmful or useless [1].
Controversy Between Randomized Controlled Trials and Meta-analyses
Recently, meta-analyses have been coming under increasing scrutiny [3]. There has been increasing uncertainty regarding the validity of meta-analyses stemming from comparison of the
results of large randomized trials and meta-analyses on the same topic. Such uncertainty
likely deters both clinicians and decision makers from incorporating the results of meta-analyses into their decision making. While cautious interpretation of meta-analyses (and randomized trials!) is a good ‘rule of thumb’, discriminating against meta-analyses is not supported by
current evidence, as I will explain below.
LeLorier and colleagues compared the results of 12 large randomized trials (>1000 patients) published in four leading medical journals with 19 previously published meta-analyses [4]. They found significant differences in the results of these comparisons and concluded, “...if there had been no subsequent randomized trial, the meta-analysis would have led to the adoption of an ineffective treatment in 32% of cases and rejection of useful treatment in 33 percent of cases.” The author of the accompanying editorial in the New England Journal of Medicine wrote, “...I still prefer conventional narrative reviews” [5]. It is unclear to me why this author would conclude that narrative reviews are superior to systematic reviews. The argument here is with meta-analysis, the practice of reducing the measure of effect to a single number, not with systematic reviews.
Feature      Narrative Review                               Systematic Review
Question     No specific question, usually broad in scope   Focused clinical question
Search       Not usually specified, potentially biased      Comprehensive, explicit strategy
Selection    Not usually specified, potentially biased      Criterion-based selection
Appraisal    Variable                                       Rigorous critical appraisal
Synthesis    Qualitative                                    Qualitative + quantitative
Inferences   Sometimes evidence-based                       Evidence-based

Table 1: Narrative and Systematic Reviews.
From: Cook DJ, Mulrow C, Haynes RB. Systematic reviews: synthesis of best evidence for clinical decisions. Ann Intern Med
1997; 126: 376-80
Subsequent letters to the editor have also challenged the findings of LeLorier and colleagues, arguing that their methods may have inflated the measure of disagreement [6]. By
selecting 12 trials from four leading medical journals, the authors clearly have a non-representative sample of clinical trials. Such journals tend to publish trials whose results disagree
with prior evidence. In addition, the authors based their agreement statistics on the presence
or absence of a statistically significant p value and ignored the fact that the point estimates
may have been similar (although the confidence intervals may be different). In addition, the
credibility of this work is undermined by the fact that the authors seemed to be selective in
their choice of comparators. For example, the authors cite discordance between the 1993
EMERAS trial and a 1985 meta-analysis of thrombolysis for AMI. As suggested by David
Naylor, perhaps a more valid comparison would be ISIS-2 (a more definitive test of the
hypothesis generated from the 1985 meta-analysis) or a 1994 meta-analysis that used individual patient data from all trials of thrombolysis that had more than 1000 patients [7]. Others
have challenged the notion that large randomized trials are the gold standard to which the results of meta-analyses should be compared [5]. Potential biases occur in both randomized
trials and meta-analyses and neither should be considered the gold standard in the absence
of rigorous assessment of the methodologic quality. Finally, it is not surprising that the results
of meta-analyses differ from the results of randomized trials; they may be measuring different
things. Given the variable characteristics of patients, interventions, outcomes and methods
across studies included in a meta-analysis, discrepancies with large trials should be expected [5].
By carefully exploring the discordant results of meta-analyses and randomized trials, one can
gain important insights into the treatment effect of the study under investigation. For example,
a recent meta-analysis suggested that calcium supplementation may prevent pre-eclampsia
[8]. A subsequent large, NIH-sponsored, clinical trial concluded that there was no treatment
effect in healthy nulliparous women. DerSimonian and colleagues explored the heterogeneity
across studies in the meta-analysis and the inconsistent results with the large trial. They
stratified studies in the meta-analysis according to the baseline risk of preeclampsia (event
rate in placebo-group). When they divided studies in the meta-analyses into studies of low
and high risk patients, it was apparent that there was no treatment effect in low risk patients,
consistent with the large randomized trial of low risk women, but there still was a significant
treatment effect in the high risk patients.
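This kind of stratification can be sketched as follows; the studies, event rates, effect sizes, and the 0.10 risk cut-point below are hypothetical stand-ins, not the actual calcium-supplementation data analyzed by DerSimonian and colleagues:

```python
# Hypothetical studies: (placebo-group event rate, log risk ratio, SE).
studies = [
    (0.02, -0.05, 0.10),  # low baseline risk
    (0.03,  0.02, 0.12),  # low baseline risk
    (0.15, -0.60, 0.20),  # high baseline risk
    (0.20, -0.75, 0.25),  # high baseline risk
]

def pool(subset):
    """Fixed-effect inverse-variance pooled log risk ratio."""
    ws = [1.0 / se**2 for _, _, se in subset]
    return sum(w * lrr for (_, lrr, _), w in zip(subset, ws)) / sum(ws)

# Stratify by baseline risk (placebo-group event rate); 0.10 is an
# arbitrary illustrative cut-point.
low = [s for s in studies if s[0] < 0.10]
high = [s for s in studies if s[0] >= 0.10]
print(f"low-risk pooled log RR:  {pool(low):+.2f}")   # near zero: no effect
print(f"high-risk pooled log RR: {pool(high):+.2f}")  # clearly negative: benefit
```

Pooling within strata in this way can reveal a treatment effect confined to one subgroup that an overall pooled estimate would obscure.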
In summary, bias can exist in both randomized trials and meta-analyses. Previous comparisons of the results of meta-analyses and randomized trials on the same topic have demonstrated concordance in 80-90% of cases [9,10]. Not surprisingly, on some occasions, discordance is present. By exploring the discrepancy between meta-analyses and randomized
trials, one can gain important insights into the effect of interventions.
Validity and Generalizability of Systematic Reviews
While systematic reviews are useful tools to discern whether interventions are efficacious or
not, they still need to be evaluated for their methodological quality. Not all systematic reviews
(or randomized trials) are created (or completed) equally. What distinguishes a high quality
systematic review from a low quality review? A recent publication in Critical Care Medicine
outlined key criteria that need to be considered when evaluating the strength of review articles
[11]. To assess the validity of systematic reviews (or meta-analyses if a quantitative summary
is provided), one needs to consider whether a comprehensive search strategy was employed
to find all relevant articles, whether the validity of the original articles was appraised, whether
study selection, validity assessments and data abstraction were done in a reproducible
fashion and whether the results were similar from study to study (statistical homogeneity).
Obviously, we can make stronger inferences from studies that employ more rigorous methods
(see Figure).
Figure: Making Inferences from Meta-analyses of Randomized Trials.
Weaker inferences: selective search; small number of trials; weak trial methodology; outdated / unmeasured co-interventions; surrogate endpoints; statistical heterogeneity.
Stronger inferences: comprehensive search; large number of trials; strong trial methodology; current / documented co-interventions; clinically important endpoints; statistical homogeneity.

One of the weaknesses of randomized trials is their limited generalizability. Because meta-analyses combine many studies with subtle differences in patients, settings, interventions, etc., their results, provided it is clinically reasonable to combine these studies and no statistical heterogeneity is present, have greater generalizability than a single randomized trial. For example, a randomized trial of parenteral nutrition in surgical patients at Veterans Affairs hospitals in the USA (predominantly white males) would have limited generalizability. A meta-analysis of parenteral nutrition (TPN) in the critically ill patient combines the
results of 26 studies of different patients in different settings and therefore offers the best estimate of treatment effect that is generalizable to a broader range of patients. This is consistent with the perspectives of decision makers who are more concerned with the effects of
health care on groups of patients, rather than the individual.
Strategies to Handle Heterogeneity in a Meta-analysis
One of the main objectives of meta-analysis is to combine “similar” studies to determine the
best estimate of overall treatment effect. The question, “Are these studies ‘similar’” needs to
be asked from a clinical and statistical perspective. It has to make clinical sense to combine
several studies and to exclude others. For example, it may be quite inappropriate to combine
studies of nutritional interventions in patient populations as diverse as neonates, obese adults
and patients undergoing bone marrow transplantation, whereas it may make sense to combine studies of adult patients undergoing major elective surgery with studies of critically ill adult patients, since they share similarities in their metabolic response to injury.
Statistical tests of homogeneity (or heterogeneity) ask the question, “Are the results of the
various included studies similar (or different) to each other?” If either clinical or statistical
heterogeneity is present, it weakens, if not invalidates, any inferences from the overall estimate of treatment effect. The goal of further statistical testing is then to try to explain why such differences occur. Strategies to deal with heterogeneity include: 1) excluding studies from the meta-analysis that, on the basis of clinical judgement, appear to be different, 2) meta-regression techniques, and, most commonly, 3) subgroup analyses.
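The usual statistical screen for heterogeneity is Cochran's Q, often supplemented by Higgins' I² statistic (the proportion of variability across studies attributable to heterogeneity rather than chance). A minimal sketch, again with invented study effects:

```python
# Hypothetical log risk ratios and standard errors (illustrative only):
# two studies suggest benefit, two suggest none, so heterogeneity is expected.
effects = [-0.40, -0.35, 0.10, 0.05]
ses = [0.15, 0.20, 0.12, 0.18]

weights = [1.0 / se**2 for se in ses]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

# Cochran's Q: weighted squared deviations of each study from the pooled effect.
# Under homogeneity, Q follows a chi-squared distribution with k-1 df.
q = sum(w * (e - pooled)**2 for w, e in zip(weights, effects))
df = len(effects) - 1

# Higgins' I^2, expressed as a percentage (floored at zero).
i2 = max(0.0, (q - df) / q) * 100
print(f"Q = {q:.1f} on {df} df, I^2 = {i2:.0f}%")
```

A large Q relative to its degrees of freedom (here Q of about 9 on 3 df) signals that the studies disagree more than chance alone would explain, which is the cue to pursue the strategies listed above.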
Subgroup analyses should be specified a priori and are based on differences in baseline
patient or treatment characteristics and not events that occur post-randomization [12]. There
is considerable debate whether the results of subgroups confirm hypotheses or should be
viewed as hypothesis generating exercises. There are several criteria that one can assess to
establish the strength of inference from subgroup analysis [13]. Because of the heterogeneity
across studies and their typically larger sample size, systematic reviews can provide insights
into important subgroup effects. For example, in a recent meta-analysis of TPN in the critically ill patient, there were 26 randomized trials of 2,211 patients that compared the use of TPN to standard care (usual oral diet plus intravenous dextrose) in surgical and critically ill patients [14]. Overall, when the results of these trials were aggregated, there was no difference between the two treatments with respect to mortality (risk ratio 1.03; 95% confidence interval, 0.81-1.31). There was a trend to a lower total complication rate in patients who received TPN, although this result was not statistically significant (risk ratio 0.84; 95% confidence interval, 0.64-1.09). However, heterogeneity across studies precluded strong inferences based on the
aggregated estimate and therefore, several a priori hypotheses were examined.
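For readers unfamiliar with how such estimates arise, a risk ratio and its 95% confidence interval can be computed from a trial's 2x2 table; the counts below are invented, chosen only to yield numbers of a similar magnitude to those quoted above:

```python
import math

# Hypothetical 2x2 counts (illustrative only): deaths / total in each arm.
deaths_tpn, n_tpn = 110, 1100
deaths_std, n_std = 107, 1111

# Risk ratio and its standard error on the log scale
# (the usual large-sample approximation for a ratio of proportions).
rr = (deaths_tpn / n_tpn) / (deaths_std / n_std)
se = math.sqrt(1/deaths_tpn - 1/n_tpn + 1/deaths_std - 1/n_std)

lo = math.exp(math.log(rr) - 1.96 * se)
hi = math.exp(math.log(rr) + 1.96 * se)
print(f"RR = {rr:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

A confidence interval straddling 1.0, as here, is what "no difference with respect to mortality" means in the aggregated result.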
Studies including only malnourished patients were associated with lower complication rates
but no difference in mortality when compared to studies of non-malnourished patients.
Studies published since 1989 and studies with a higher methods score showed no treatment
effect while studies published in 1988 or before and studies with a lower methods score
demonstrated a significant treatment effect. Complication rates were lower in studies that did
not use lipids; however, there was no difference in mortality rates between studies that did
use lipids and those that did not. Studies limited to critically ill patients demonstrated a significant increase in complication and mortality rates compared to studies of surgical patients.
While we had set out to summarize the effect of TPN in critically ill patients, only six studies
included patients that would routinely be admitted to the ICU as part of their management.
Inasmuch as surgical patients and ICU patients have a similar stress response to illness, it
was considered to be reasonable to aggregate studies of surgical and critically ill patients.
However, the results of our subgroup analysis suggest that both mortality and complication
rates may be increased in critically ill patients receiving TPN and these treatment effects may
differ from the results in surgical patients. The results of studies evaluating the effect of TPN
in surgical patients therefore may not be generalizable to all types of critically ill patients. This
leaves a very limited data set from which to base the practice of providing TPN to critically ill
patients and the best estimate to date is that TPN may be doing more harm than good. These
subgroup findings have important implications for designing future studies. Obviously, the
studies would need to be well designed. It would also appear that elective surgical patients
should not be combined with critically ill patients and, for short-term administration, lipids
could be omitted.
Conclusions
In the last decade, we have seen a proliferation in the number of published systematic
reviews. Systematic reviews are an efficient way to synthesize current knowledge on topics of
relevance. Systematic reviews provide the best estimate of overall treatment effect that is
generalizable to the broadest range of individuals. Differences between the results of randomized trials and meta-analyses are to be expected and by exploring these differences, further insights into the effectiveness of interventions can be gained. To assess the validity of
systematic reviews (or meta-analyses if a quantitative summary is provided), one needs to
consider whether a comprehensive search strategy was employed to find all relevant articles,
whether the validity of the original articles was appraised, whether study selection, validity
assessments and data abstraction were done in a reproducible fashion and whether the
results were similar from study to study. Obviously, one can make stronger inferences from
meta-analyses that employ more rigorous methods. Moreover, because of the heterogeneity present in meta-analyses and their large sample sizes, they offer informative subgroup analyses. Systematic reviews and meta-analyses are important research tools for illuminating the effectiveness of our therapeutic interventions.
References
1. Antman EM, Lau J, Kupelnick B, Mosteller F, Chalmers TC. A comparison of results of meta-analyses of randomized
control trials and recommendations of clinical experts. Treatments for myocardial infarction. J Am Med Assoc 1992; 268:
240-8
2. Cooper HM, Rosenthal R. Statistical versus traditional procedures for summarizing research findings. Psychol Bull 1980;
87: 442-9
3. Editorial. Meta-analysis under scrutiny. Lancet 1997; 350: 675.
4. LeLorier J, Gregoire G, Benhaddad A, Lapierre J, Derderian F. Discrepancies between meta-analyses and subsequent
large randomized trials. N Engl J Med 1997; 337: 536-42
5. Bailar JC. The promise and problems of meta-analysis. N Engl J Med 1997; 337: 559-561
6. Ioannidis JPA, Cappeleri JC, Lau J. Meta-analyses and large randomized trials (letter to editor) N Engl J Med 1998; 338:
59
7. Naylor D. Meta-analysis and the meta epidemiology of clinical research. Br Med J 1997; 315: 617-19.
8. DerSimonian R, Levine RJ. Resolving discrepancies between a meta-analysis and a subsequent large controlled trial. J
Am Med Assoc 1999; 282: 665-670
9. Capelleri JC, Ioannidis JPA, Schmid CH, de Ferranti SD, Aubert M, Chalmers TC, Lau J. Large trials vs meta-analysis of
smaller trials: How do they compare? J Am Med Assoc 1996; 276: 1332-1338
10. Villar J, Carroli G, Belizan JM. Predictive ability of meta-analyses of randomized controlled trials. Lancet 1995; 345: 772-76
11. Cook DJ, Levy MM, Heyland DK, for the Evidence-Based Medicine in Critical Care Working Group. How to use a review article: Prophylactic endoscopic sclerotherapy for esophageal varices. Crit Care Med 1998; 26: 692-700
12. Yusuf S, Wittes J, Probstfield J, Tyroler HA. Analysis and interpretation of treatment effects in subgroups of patients in
randomized clinical trials. J Am Med Assoc 1991; 266: 93-98
13. Oxman AD, Guyatt GH. Apples, oranges and fish: A consumer’s guide to subgroup analyses. Ann Intern Med 1992; 116:
78-84
14. Heyland DK, MacDonald S, Keefe L, Drover JW. Total parenteral nutrition in the critically ill patient: A meta-analysis. J
Am Med Assoc 1998; 280: 2013-2019