Meta-analyses: what is heterogeneity?

advertisement
Meta-analyses: what is heterogeneity?
BMJ 2015; 350 doi: http://dx.doi.org/10.1136/bmj.h1435 (Published 16 March 2015) Cite this
as: BMJ 2015;350:h1435
1. Philip Sedgwick, reader in medical statistics and medical education 1Author
affiliationsp.sedgwick@sgul.ac.uk
Researchers undertook a meta-analysis to evaluate the effectiveness of multifactorial assessment
and intervention programmes in preventing falls and injuries among older people. Randomised
or quasi-randomised trials that evaluated interventions to prevent falls and injuries were
included. The intervention had to be delivered to individual patients, not at a community or
population level. It also had to be service based in an emergency department, primary care, or the
community. Control groups could receive standard care or no fall prevention. The outcomes
included the number of fallers and fall related injuries.1
In total 19 trials were identified. Of these, eight reported fall related injuries. When combined
across trials, the risk for fall related injuries was reduced after the intervention compared with the
control, but not significantly (relative risk 0.90, 95% confidence interval 0.68 to 1.20). Tests of
statistical heterogeneity for the meta-analysis of fall related injuries gave the following results:
χ2=15.77, degrees of freedom=7, P=0.03 (Cochran’s Q test), I2=55.6% (Higgins’s I2 test
statistic). Subgroup analyses using a test of interaction based on Cochran’s Q test were
subsequently performed. The resulting P values were: P=0.75 for site of delivery (hospital v
community); P=0.75 for whether a doctor was included in the team (yes v no); and P=0.52 for
whether trial participants had been selected because they were at high risk of falls (yes v no).
The study concluded that there was limited evidence that multifactorial fall prevention
programmes in primary care, community, or emergency care settings were effective in reducing
the number of fallers or fall related injuries.
Which of the following statements, if any, are true for the meta-analysis of fall related injuries?




a) The presence of statistical heterogeneity would be indicative of variation between trials
in the magnitude or direction of the sample estimates of the relative risk of fall related
injuries
b) The result of Cochran’s Q test indicated that heterogeneity existed between the sample
estimates
c) Higgins’s I2 test statistic indicated that homogeneity existed between the sample
estimates
d) Any statistical heterogeneity in the overall meta-analysis of fall related injuries was
not explained by the subgroup analyses
Answers
Statements a, b, and d are true, whereas c is false.
A total of eight trials reported fall related injuries. For each trial a sample estimate of the
population parameter of the relative risk of fall related injuries after multifactorial assessment
and intervention programmes compared with control was obtained. The aim of the meta-analysis
was to combine the sample estimates from the eight trials and provide a single estimate of the
population parameter. By combining the sample estimates, the meta-analysis reduced the
evidence to a manageable quantity. The figure⇓ shows the forest plot for the meta-analysis. The
interpretation of a forest plot has been described in a previous question.2
Forest plot of the sample estimates of the relative risk of fall related injuries after multifactorial
assessment and intervention programmes compared with the control
The combined relative risk of fall related injuries after the intervention versus the control was
0.90 (95% confidence interval: 0.68 to 1.20). Therefore, although the risk of fall related injuries
was reduced after the intervention, the difference was not significant. It was essential that the
meta-analysis incorporated a statistical test of heterogeneity. The purpose of this test was to
assess the extent of variation between the sample estimates. Heterogeneity would exist if the
sample estimates for the population relative risk were of different magnitudes or had the opposite
direction of effect (a is true). Conversely, if homogeneity existed the estimates would be of a
similar magnitude and direction. If heterogeneity existed, it would influence how the total overall
estimate was calculated, as described below. Furthermore, it would be sensible to explore the
potential sources of heterogeneity. For example, heterogeneity would occur if the effect of the
intervention differed between the sites where it was delivered (hospital v community). If so, it
would be useful to estimate the effect of the intervention separately for each of the site of
intervention subgroups. Otherwise, the results of the meta-analysis might be misleading
regarding the effectiveness of the intervention and might be detrimental to future patient care.
The most routinely used tests for statistical heterogeneity are Cochran’s Q test and Higgins’s
I2 test statistic. Cochran’s Q is the traditional test of heterogeneity and is based on the χ2 test,
which has been described in a previous question.3 The statistical test of heterogeneity using
Cochran’s Q test is carried out in a similar way to traditional statistical hypothesis testing, with a
null hypothesis and an alternative hypothesis. The null hypothesis states that homogeneity exists
between the sample estimates of the population parameter across the trials, and any variation
between them is no more than would be expected when taking samples from the same
population—that is, any variation between them is a result of sampling error. The alternative
hypothesis states that heterogeneity exists between the sample estimates. The results for the test
of heterogeneity for the meta-analysis of fall related injuries are displayed towards the bottom of
the forest plot in the line “Test for heterogeneity: χ2=15.77, df=7, P=0.03, I2=55.6%.” The first
three statistics are the χ2 test statistic, degrees of freedom (df), and P value resulting from
Cochran’s Q test of the statistical hypotheses described above. The I2 statistic refers to Higgins’s
I2 test described below. The resulting P value for Cochran’s Q test was 0.03, and because it was
smaller than the traditional critical level of significance of 0.05 (5%), the null hypothesis was
rejected in favour of the alternative, with the conclusion that heterogeneity existed between the
sample estimates (b is true).
Cochran’s Q test is not very accurate; it is conservative and often fails to detect heterogeneity
in the sample estimates. Therefore, a critical level of significance of 0.10 (10%) is often chosen
rather than the traditional one of 0.05 (5%). Because of the lack of accuracy of Cochran’s Q test,
Higgins’s I2 test statistic is often used as an additional test of heterogeneity. Higgins’s I2 test
statistic represents the proportion of variation between the sample estimates that is due to
heterogeneity rather than to sampling error. Values can range from 0% to 100%, with 0%
indicating that statistical homogeneity exists and 100% indicating that statistical heterogeneity
exists. It has been suggested that the adjectives low, moderate, and high (heterogeneity) be
assigned to I2 values of 25%, 50%, and 75%. In general, significant heterogeneity is considered
to be present if I2 is 50% or more. In the above meta-analysis, I2 was reported as 55.6%,
indicating the presence of significant heterogeneity (c is false) and confirming the result of
Cochran’s Q test.
The test of statistical heterogeneity influenced how the combined overall estimate of the
relative risk for fall related injuries after the intervention compared with the control was
obtained. The presence of heterogeneity between the sample estimates indicated that “random
effects” methods should be used to derive the combined overall estimate of the treatment effect.
If homogeneity had existed, “fixed effects” methods would have been used. A meta-analysis
incorporating random effects methodology produces a wider confidence interval for the
combined overall effect than one in which fixed effects methodology is used, resulting in a less
accurate estimate of the effect of the intervention. The reduced accuracy of the combined overall
effect of the intervention reflected the heterogeneity between the sample estimates.
The researchers undertook a subgroup analysis to investigate the heterogeneity between the
sample estimates. The aim was to establish whether the heterogeneity between the eight trials in
the sample estimates of the relative risk of fall related injuries could be explained by differences
between the trials—for example, in how the intervention was delivered or participants recruited.
If so, the effects of the intervention might vary according to how the intervention is delivered or
between subgroups of patients, which would have implications for future care. The researchers
chose the factors that were thought to be important in explaining the observed heterogeneity
between the sample estimates. These subgroups were based on three factors—site of delivery
(hospital v community), whether a doctor was included in the team (yes v no), and whether trial
participants had been selected because they were at high risk of falls (yes v no).
For each factor subgroup, a separate meta-analysis of injury related falls was performed and a
subtotal estimate for the effect of the intervention derived by combining the sample estimates
across trials within the subgroup. A test of heterogeneity was undertaken for the meta-analysis
within each subgroup, although the results were not presented. The subtotal estimates for the
effects of the intervention and tests of heterogeneity can be compared between subgroups for a
factor, although if done visually this should be done informally only. In particular, the lack of
statistical significance for the test of heterogeneity within each subgroup does not indicate that
the subgroups within the factor explain the statistical heterogeneity observed overall. It may be
misleading to compare the results of the tests of statistical heterogeneity between factor
subgroups because the subgroups may not have sufficient statistical power with respect to the
numbers of trials and participants to detect heterogeneity.
To formally investigate heterogeneity across the subgroups of a factor, the subtotal estimates
for the effects of the intervention in the subgroups were compared by a test of interaction. For
example, the test of interaction for the factor site of delivery of intervention (hospital v
community) investigated whether the effect of the intervention on the risk of fall related injuries
varied between the subgroups. Interaction is sometimes referred to as effect modification. In a
meta-analysis, interaction is investigated using Cochran’s Q test or Higgins’ I2, or both. These
tests involve comparing the subtotal estimates between the subgroups. Cochran’s Q test provides
a test of the null hypothesis that homogeneity exists between the subgroups in the subtotal
estimates of the population parameter—that is, any variation between the subgroups is no more
than would be expected as a result of sampling error. Higgins’s I2 test statistic measures the
proportion of total variation between the subgroups in the subtotal estimates that is due to
heterogeneity rather than to sampling error. The test of interaction contrasts with the test of
heterogeneity in the meta-analysis that combines all the trials, where Cochran’s Q and Higgins’s
I2 were used to compare the sample estimates of the treatment effect across all of the trials.
A test of interaction was undertaken for each factor—site of delivery, whether a doctor was
included in the team, and whether participants were selected because they were at high risk of
falls. For each factor, the researchers performed Cochran’s Q test to test for interaction between
the subgroups. The results of the test of interaction were χ2=0.1, P=0.75 for the site of delivery
(hospital v community); χ2=0.1, P=0.75 for whether a doctor was included in the team (yes v no);
and χ2=0.42, P=0.52 for whether trial participants had been selected because they were at high
risk of falls (yes v no). In none of these tests of interaction was the P value less than the
traditional critical level of significance of 0.05 (5%). Therefore, the statistical heterogeneity in
the overall meta-analysis of fall related injuries was not explained by the subgroup analyses (d is
true). The researchers commented that because the studies were carried out in several countries,
differences between the populations or healthcare systems might have contributed to the
heterogeneity. Methodological differences between trials may also contribute to statistical
heterogeneity. In particular, the meta-analysis above included trials that were randomised or
quasi-randomised and therefore of variable methodological quality. A quasi-randomised trial
uses methods of allocating participants to treatment groups that are not truly random—for
example, alternate allocation.
Caution is needed when interpreting findings from subgroup analyses that use tests of
interaction. The results may be misleading because the analyses are observational and not based
on comparisons between randomised groups of patients, and therefore prone to confounding.
1. ↵Gates S, Fisher JD, Cooke MW, et al. Multifactorial assessment and targeted
intervention for preventing falls and injuries among older people in community and
emergency care settings: systematic review and meta-analysis. BMJ2008;336:130.
2. ↵Sedgwick P. How to read a forest plot. BMJ2012;345:e8335.
3. ↵Sedgwick P. Statistical tests for independent groups: categorical data.
BMJ2012;344:e344.
Download