Two-stage individual participant data
meta-analysis
and flexible forest plots
David Fisher
MRC Clinical Trials Unit
Hub for Trials Methodology Research
at UCL
df@ctu.mrc.ac.uk
2013 UK Stata Users Group Meeting
Cass Business School, London
Outline of presentation
• Introduction to individual patient data (IPD) meta-analysis (MA)
• IPD vs aggregate-data (AD) MA
• “One-stage” vs “two-stage” IPD MA
• The ipdmetan command
• Basic use; comparison with metan
• Covariate interactions
• Combining AD with IPD
• Advanced syntax
• The forestplot command
• Interface with ipdmetan
• Stand-alone use and “stacking”
• Summary and Conclusion
Introduction to IPD meta-analysis
• Meta-analysis (MA):
• Use statistical methods to combine results of
“similar” trials to give a single estimate of effect
• Increase power & precision
• Assess whether treatment effects are similar
across trials (heterogeneity)
• Aggregate data (AD) vs IPD:
• “Traditional” MAs gather results from publications
• Aggregated across all patients in the trial; nothing is
known of individual patients
• IPD MAs gather raw data from trial investigators
• Ensures all relevant patients are included
• Ensures similar analysis across all trials
• Allows more complex analysis, e.g. patient-level
interactions
“One-stage” IPD MA
• Consider a linear regression (extension to GLMs or time-to-event regressions is straightforward)
• For a one-stage IPD MA (i = trial, j = patient):
$y_{ij} = \alpha_i + (\beta + u_i)\, x_{ij}$
where $\alpha_i$ = trial identifiers
$\beta$ = overall treatment effect estimated across all trials i
(with optional random effect $u_i$)
• Examples in Stata:
• Fixed effects: regress y x i.trial
• Random effects:
xtmixed y x i.trial || trial: x, nocons
“Two-stage” IPD MA
• For a two-stage IPD MA:
$y_{1j} = \alpha^{(1)} + \beta^{(1)} x_{1j}$ for trial 1
…
$y_{ij} = \alpha^{(i)} + \beta^{(i)} x_{ij}$ for trial i
• Then:
$\beta = \frac{\sum_i w_i \beta^{(i)}}{\sum_i w_i}$ and $se(\beta) = \frac{1}{\sqrt{\sum_i w_i}}$, where $w_i = \frac{1}{se(\beta^{(i)})^2}$
• Weights $w_i$ may be altered to give random effects
• e.g. DerSimonian & Laird, $w_i = \frac{1}{se(\beta^{(i)})^2 + \tau^2}$
• Straightforward, but currently messy in Stata
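The two-stage arithmetic can be sketched in a few lines of Python. This is an illustrative sketch only, not the ipdmetan implementation: the function name `pool_inverse_variance` is hypothetical, and the moment estimator used for tau² is the standard DerSimonian & Laird formula. The inputs are the log hazard ratios and standard errors of the four trials in the aggregate dataset subgroup1.dta shown later in this talk.

```python
import math

def pool_inverse_variance(estimates, std_errors, random_effects=False):
    """Second stage of a two-stage MA: pool per-trial estimates.

    Returns (pooled estimate, its SE, tau-squared, Cochran's Q).
    Hypothetical helper for illustration; not part of ipdmetan.
    """
    # Fixed-effect weights: w_i = 1 / se(beta_i)^2
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sum_w

    # Cochran's Q and the DerSimonian & Laird moment estimate of tau^2
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    if random_effects:
        # Random-effects weights: w_i = 1 / (se(beta_i)^2 + tau^2)
        w = [1.0 / (se ** 2 + tau2) for se in std_errors]
        sum_w = sum(w)

    pooled = sum(wi * b for wi, b in zip(w, estimates)) / sum_w
    return pooled, 1.0 / math.sqrt(sum_w), tau2, q

# Per-trial log hazard ratios and SEs (belgium, EORTC 08861, LILLE, GETCB 05CB86)
log_hr = [0.376, 0.496, 0.450, 0.362]
se_log_hr = [0.156, 0.300, 0.200, 0.123]

pooled, se, tau2, q = pool_inverse_variance(log_hr, se_log_hr)
lci, uci = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled log HR = {pooled:.3f} ({lci:.3f}, {uci:.3f}), tau2 = {tau2:.4f}")
```

With these four trials Q is smaller than its degrees of freedom, so tau² is truncated at zero and the random-effects result coincides with the fixed-effect one.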
Treatment-covariate interactions
• Assessment of patient-level covariate interactions is a
great advantage of IPD
$y_{ij} = \alpha_i + \beta x_{ij} + \gamma z_{ij} + \delta x_{ij} z_{ij}$
• Arguably best done with “one-stage”
• Main effects & interactions (& correlations) estimated
simultaneously
• But basic analysis also possible with “two-stage”
• Relative effect (interaction coefficient) only
• Same approach (inverse-variance) as for main effects
• Ensures no estimation bias from between-trial effects
• Can be presented in a forest plot, with assessment of
heterogeneity etc.
• Discussed in a published paper (Fisher 2011)
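As a sketch of the two-stage approach to interactions, the model can be fitted separately within each trial and only the interaction coefficients pooled by inverse variance. The per-trial superscripts below are an assumption, carried over from the notation of the two-stage main-effect formulas.

```latex
y_{ij} = \alpha^{(i)} + \beta^{(i)} x_{ij} + \gamma^{(i)} z_{ij} + \delta^{(i)} x_{ij} z_{ij}
\quad \text{for trial } i

\delta = \frac{\sum_i w_i \, \delta^{(i)}}{\sum_i w_i},
\qquad
se(\delta) = \frac{1}{\sqrt{\sum_i w_i}},
\qquad
w_i = \frac{1}{se\left(\delta^{(i)}\right)^2}
```

Because each $\delta^{(i)}$ is estimated from within-trial information only, differences between trials in average covariate values do not enter the pooled interaction estimate.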
“One-stage” vs “two-stage”
One-stage: pros
• All coeffs & correls estimated simultaneously
• Flexible & extendable model structure
One-stage: cons
• Requires more statistical expertise
• Challenging in certain situations, e.g. random-effects with time-to-event data
• Not a natural fit with forest plots
Two-stage: pros
• Natural extension of AD MA
• Easily presentable in forest plots
• Applicable to any set of effect estimates and SEs (incl. interactions)
• Negligible difference to one-stage in most common scenarios
Two-stage: cons
• Only a single estimate can be pooled, which limits complexity (e.g. interactions)
• Theoretically inferior in (at least) some scenarios
Example data
• IPD MA of randomised trials of post-operative
radiotherapy (PORT) in non-small cell lung cancer
• Trial ID (k=11)
• Patient ID (n=2343)
• Treatment arm
• Outcome is censored time to overall survival (death
from any cause)
• Time to event (from randomisation)
• Event type (death or censorship)
• Certain covariate measurements also available, not
necessarily for all trials or patients
• Disease stage (factor, but treat as continuous)
• (+ others)
ipdmetan syntax
Uses “prefix” command syntax:
ipdmetan [exp_list], study(study_ID) [ipd_options ad(aggregate_data_options) forestplot(forest_plot_options)] : estimation_command ...
Example:
ipdmetan, study(trialid) eform : stcox arm, strata(sex)
• ipdmetan options go after the comma, before the colon
• estimation_command and its options go after the colon
• Default is to pool coeffs from the first independent variable, here arm (excluding baseline factor levels)
Trials included: 11
Patients included: 2342

Meta-analysis pooling of main (treatment) effect estimate arm
using Fixed-effects

--------------------------------------------------------------------
trial reference       |
number                |    Effect    [95% Conf. Interval]   % Weight
----------------------+---------------------------------------------
belgium               |     1.456       1.072     1.979       11.09
EORTC 08861           |     1.643       0.913     2.956        3.02
LILLE                 |     1.568       1.060     2.319        6.81
...                   |       ...         ...       ...         ...
----------------------+---------------------------------------------
Overall effect        |     1.178       1.064     1.305      100.00
--------------------------------------------------------------------

Test of overall effect = 1:  z = 3.153  p = 0.002

Heterogeneity Measures
---------------------------------------------------
               |      value      df     p-value
---------------+-----------------------------------
Cochrane Q     |      15.88      10       0.103
I² (%)         |      37.0%
Modified H²    |      0.588
tau²           |     0.0180
---------------------------------------------------

• The row header (“trial reference number”) is the variable label
• Output style similar to metan or metaan
• I² = between-study variance (tau²) as a percentage of total variance
• Modified H² = ratio of tau² to typical within-study variance
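The two definitions above can be checked against the displayed output. The sketch below uses the Q-based forms I² = (Q − df)/Q and modified H² = (Q − df)/df; that ipdmetan computes them this way is an assumption, although these forms do reproduce the values shown (Q = 15.88 on 10 df). The function name `heterogeneity_from_q` is hypothetical.

```python
def heterogeneity_from_q(q, df):
    """Q-based heterogeneity measures (illustrative; assumed formulas).

    Returns (I-squared as a percentage, modified H-squared).
    """
    excess = max(0.0, q - df)       # heterogeneity beyond chance
    i2 = 100.0 * excess / q if q > 0 else 0.0
    h2_modified = excess / df       # ratio of tau^2 to typical within-study variance
    return i2, h2_modified

i2, h2m = heterogeneity_from_q(15.88, 10)
print(f"I-squared = {i2:.1f}%, modified H-squared = {h2m:.3f}")
# prints: I-squared = 37.0%, modified H-squared = 0.588
```

Under these forms the two measures are linked by I² = H²/(1 + H²), which matches "tau² as a percentage of total variance" when H² is the ratio of tau² to the typical within-study variance.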
Basic forest plot
[Forest plot; x-axis ticks at .25, .5, 1, 2, 4. Plotted values:]
trial reference number                    Effect (95% CI)      % Weight
belgium                                   1.46 (1.07, 1.98)       11.09
LCSG 773                                  1.12 (0.83, 1.53)       11.13
CAMS                                      1.03 (0.77, 1.38)       12.20
MRC LU11                                  0.96 (0.74, 1.24)       16.00
EORTC 08861                               1.64 (0.91, 2.96)        3.02
SLOVENIA                                  0.89 (0.54, 1.49)        3.97
LILLE                                     1.57 (1.06, 2.32)        6.81
GETCB 04CB86                              1.14 (0.80, 1.62)        8.48
GETCB 05CB86                              1.44 (1.13, 1.83)       17.84
ITALY                                     0.69 (0.40, 1.20)        3.49
KOREA                                     1.16 (0.76, 1.76)        5.98
Overall (I-squared = 37.0%, p = 0.103)    1.18 (1.06, 1.31)      100.00
Forest plot of covariate interactions
ipdmetan, study(trialid) eform interaction keepall : stcox arm##c.stage
• Default is to pool coeffs from the first interaction term

Trials included: 8
Patients included: 1962

Meta-analysis pooling of interaction effect estimate 1.arm#c.stage2
using Fixed-effects

[Forest plot; x-axis ticks at .125, .25, .5, 1, 2, 4, 8. Plotted values:]
trial reference number                    Effect (95% CI)      % Weight
belgium                                   0.92 (0.61, 1.40)       18.70
LCSG 773                                  0.76 (0.40, 1.45)        8.11
CAMS                                      0.77 (0.43, 1.39)        9.49
MRC LU11                                  0.62 (0.36, 1.07)       11.26
EORTC 08861                               0.39 (0.14, 1.09)        3.16
GETCB 04CB86                              0.94 (0.50, 1.77)        8.22
GETCB 05CB86                              0.97 (0.72, 1.30)       38.35
KOREA                                     2.09 (0.70, 6.27)        2.73
SLOVENIA                                  (Insufficient data)
LILLE                                     (Insufficient data)
ITALY                                     (Insufficient data)
Overall (I-squared = 2.7%, p = 0.409)     0.87 (0.72, 1.04)      100.00
Inclusion of aggregate data
• I don’t have a separate aggregate dataset, so I will
create one artificially from my IPD dataset

** Generate artificial trial subgrouping
gen subgroup = inlist(trialid, 1, 8, 12, 15)
label define subgroup_ 0 "Trial group 1" 1 "Trial group 2"
label values subgroup subgroup_

** Run ipdmetan within one of the subgroups; save the dataset
qui ipdmetan, study(trialid) by(subgroup) nooverall nograph ///
    saving(subgroup1.dta) ///
    : stcox arm if subgroup==1, strata(sex)
(Aside: Contents of subgroup1.dta)
_use  trialid  _labels          _ES   _seES     _lci    _uci    _wgt   _NN
1     1        belgium        0.376   0.156    0.069   0.682   0.286   202
1     8        EORTC 08861    0.496   0.300   -0.091   1.084   0.078   105
1     12       LILLE          0.450   0.200    0.058   0.841   0.176   163
1     15       GETCB 05CB86   0.362   0.123    0.120   0.603   0.460   539
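The saved estimates are on the log scale; the eform option reports them as hazard ratios. As a minimal sketch of that back-transformation (assuming a normal-approximation 95% interval with z = 1.96; the helper name `to_ratio_scale` is hypothetical), the rows above can be converted as follows.

```python
import math

# (_labels, _ES, _seES) as listed in subgroup1.dta
rows = [
    ("belgium", 0.376, 0.156),
    ("EORTC 08861", 0.496, 0.300),
    ("LILLE", 0.450, 0.200),
    ("GETCB 05CB86", 0.362, 0.123),
]

def to_ratio_scale(es, se, z=1.96):
    """Exponentiate a log-scale estimate and its Wald confidence limits."""
    return math.exp(es), math.exp(es - z * se), math.exp(es + z * se)

for label, es, se in rows:
    hr, lo, hi = to_ratio_scale(es, se)
    print(f"{label:<14}{hr:6.2f} ({lo:.2f}, {hi:.2f})")
```

Small discrepancies in the third decimal place against the screen output are expected, because the inputs here are already rounded to three decimals.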
Inclusion of aggregate data: Syntax
ipdmetan, study(trialid) eform nooverall ///
    ad(subgroup1.dta, byad) ///
    : stcox arm if subgroup==0, strata(sex)
• nooverall: do not pool IPD and aggregate data together
• ad(...): aggregate data syntax; “byad” = treat IPD & aggregate data as subgroups
• estimation_command follows the colon

Trials included from IPD: 7
Patients included: 1333
Trials included from aggregate data: 4
Patients included: 1009
Inclusion of aggregate data: Screen output
Pooling of main (treatment) effect estimate arm
using Fixed-effects

-------------------------------------------------------------------
trial reference      |
number               |    Effect    [95% Conf. Interval]   % Weight
---------------------+---------------------------------------------
IPD                  |
LCSG 773             |     1.123       0.827     1.526       11.13
CAMS                 |     1.029       0.768     1.378       12.20
...                  |       ...
Subgroup effect      |     1.021       0.896     1.163       61.25
---------------------+---------------------------------------------
Aggregate            |
belgium              |     1.456       1.072     1.979       11.09
EORTC 08861          |     1.643       0.913     2.956        3.02
...                  |       ...
Subgroup effect      |     1.479       1.256     1.743       38.75
-------------------------------------------------------------------

Tests of effect size = 1:
IPD          z = 0.305    p = 0.760
Aggregate    z = 4.682    p = 0.000
Inclusion of aggregate data: Forest plot
[Forest plot with IPD and Aggregate subgroups; x-axis ticks at .25, .5, 1, 2, 4. Plotted values:]
trial reference number                    Effect (95% CI)      % Weight
IPD
LCSG 773                                  1.12 (0.83, 1.53)       18.18
CAMS                                      1.03 (0.77, 1.38)       19.92
MRC LU11                                  0.96 (0.74, 1.24)       26.12
SLOVENIA                                  0.89 (0.54, 1.49)        6.48
GETCB 04CB86                              1.14 (0.80, 1.62)       13.85
ITALY                                     0.69 (0.40, 1.20)        5.69
KOREA                                     1.16 (0.76, 1.76)        9.76
Subtotal (I-squared = 0.0%, p = 0.740)    1.02 (0.90, 1.16)      100.00
Aggregate
belgium                                   1.46 (1.07, 1.98)       28.61
EORTC 08861                               1.64 (0.91, 2.96)        7.79
LILLE                                     1.57 (1.06, 2.32)       17.56
GETCB 05CB86                              1.44 (1.13, 1.83)       46.03
Subtotal (I-squared = 0.0%, p = 0.964)    1.48 (1.26, 1.74)      100.00
Advanced syntax example:
non “e-class” estimation command
ipdmetan (u[1,1]/V[1,1]) (1/sqrt(V[1,1])), ///
    study(trialid) eform ///
    ad(subgroup1.dta, byad) ///
    lcols(evrate=_d %3.2f "Event rate") ///
    rcols(u[1,1] %5.2f "o-E(o)" V[1,1] %5.1f "V(o)") ///
    forest(nooverall nostats nowt) ///
    : sts test arm if subgroup==0, mat(u V)
• (u[1,1]/V[1,1]) (1/sqrt(V[1,1])): effect estimate & SE are not available from e(b), so must be specified manually
Advanced syntax example:
columns of data in forestplot
Same command as in the previous example; the options that add data columns are:
    lcols(evrate=_d %3.2f "Event rate")
    rcols(u[1,1] %5.2f "o-E(o)" V[1,1] %5.1f "V(o)")
• lcols(evrate=_d ...): mean of a variable currently in memory (note the user-assigned name, to match with the varname in the aggregate dataset)
• rcols(u[1,1] ... V[1,1] ...): collect lists of returned stats
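The two expressions passed to ipdmetan are the usual log-rank ("Peto") approximation: log hazard ratio = (o − E)/V with standard error 1/sqrt(V). The sketch below applies it to the per-trial o − E and V values shown in the forest plot for this example, then pools by inverse variance; the function name `peto_log_hr` is hypothetical and the code is illustrative rather than a reproduction of ipdmetan's internals.

```python
import math

# Per-trial (o - E, V) for the seven IPD trials, as displayed in the forest plot
ipd_trials = {
    "LCSG 773": (4.77, 41.0),
    "CAMS": (1.07, 44.9),
    "MRC LU11": (-2.48, 59.4),
    "SLOVENIA": (-2.56, 15.6),
    "GETCB 04CB86": (4.95, 31.6),
    "ITALY": (-4.50, 13.2),
    "KOREA": (3.06, 22.4),
}

def peto_log_hr(o_minus_e, v):
    """Log-rank estimate of the log hazard ratio and its standard error."""
    return o_minus_e / v, 1.0 / math.sqrt(v)

per_trial = {name: peto_log_hr(u, v) for name, (u, v) in ipd_trials.items()}

# Inverse-variance pooling; with se = 1/sqrt(V) the weights are simply V
sum_w = sum(1.0 / se ** 2 for _, se in per_trial.values())
pooled = sum(est / se ** 2 for est, se in per_trial.values()) / sum_w
pooled_se = 1.0 / math.sqrt(sum_w)

print(f"pooled log HR = {pooled:.3f} "
      f"({pooled - 1.96 * pooled_se:.3f}, {pooled + 1.96 * pooled_se:.3f})")
```

Because the weights reduce to V, the pooled estimate is the sum of the o − E values divided by the sum of the V values.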
Advanced syntax example: Forest plot
[Forest plot with extra data columns; x-axis ticks at .25, .5, 1, 2, 4. Displayed columns:]
trial reference number           Event rate   o-E(o)    V(o)
IPD
LCSG 773                               0.72     4.77    41.0
CAMS                                   0.58     1.07    44.9
MRC LU11                               0.78    -2.48    59.4
SLOVENIA                               0.85    -2.56    15.6
GETCB 04CB86                           0.68     4.95    31.6
ITALY                                  0.51    -4.50    13.2
KOREA                                  0.81     3.06    22.4
Subtotal                               0.69     3.24   229.6
(I-squared = 0.0%, p = 0.710)
Aggregate
belgium                                0.83
EORTC 08861                            0.43
LILLE                                  0.64
GETCB 05CB86                           0.50
Subtotal
(I-squared = 0.0%, p = 0.964)
Advanced syntax example: Forest plot
[Same forest plot as on the previous slide, with two annotations:]
• These vars (o-E(o) and V(o)) do not appear in the aggregate dataset, so are not plotted for the aggregate trials
• Subtotal (of the event rate) cannot be calculated for aggregate data
The forestplot command
• Does not perform any calculations/estimations; simply
plots existing data as a forest plot
• Overall/subgroup estimates, spacings, labels, text
columns etc. need to be created/arranged in advance
• Ordering & spacing; marking of subgroup/overall
estimates for plotting “diamonds”: _use
• Principal left-hand data column (study IDs,
heterogeneity etc. – string fmt): _labels
• This setup is done automatically by ipdmetan before
passing to forestplot
• (but can also be done manually by user)
• Multiple datasets can be passed to forestplot at once
to create a single large “stacked” plot on common x-axis
forestplot syntax
forestplot [varlist] [if] [in] [, plot_options graph_options using_option]
• varlist = manually specify varnames to plot
• plot_options control the data plotting (within plot region)
• graph_options control the surroundings (outside plot region;
graph region)
• using_option represents one or more options that allow
suitable datasets (or parts of datasets) to be fed to
forestplot, possibly with different plot_options, to form a
single large forest plot on a single x-axis.
using_option syntax
using(filenamelist [if] [in] [, plot_options])
[using(filenamelist [if] [in] [, plot_options])
...]
• filenamelist is a list of one or more Stata-format datasets
• parts may be specified with [if] [in]
• same filename can appear more than once
• order of filenames determines placement in graph
• Different plot_options may be specified to each using option
• For same options applied to multiple files, place them in a
filenamelist
• For different options applied to each file, place each file
in a different using option
plot_options syntax
• Based on metan syntax, options refer to different parts
of the forest plot
• Most options appropriate to the underlying twoway plot
type are acceptable, with some exceptions
Option      Function                                    twoway plot type
boxopt      Weighted boxes for study point estimates    scatter [aweight]
pointopt    Points for study point estimates            scatter
ciopt       Lines for confidence intervals              rspike, hor; pcarrow
diamopt     Diamond for summary estimate                pcspike (x4)
olineopt    Vertical line through summary estimate      rspike
Example forestplot dataset
(“resultsset” from last ipdmetan example)
Columns _ES to _wgt hold the estimates, CIs and weights; columns from evrate onwards are extra data columns.
_use  _by  _study  _labels                                      _ES    _lci    _uci   _wgt  evrate  u_1_1_  V_1_1_   _NN
0     1            IPD
1     1    3       LCSG 773                                   0.116  -0.190   0.422  0.111    0.72    4.77    41.0
1     1    5       CAMS                                       0.024  -0.269   0.316  0.121    0.58    1.07    44.9
1     1    6       MRC LU11                                  -0.042  -0.296   0.213  0.160    0.78   -2.48    59.4
1     1    9       SLOVENIA                                  -0.164  -0.660   0.332  0.042    0.85   -2.56    15.6
1     1    14      GETCB 04CB86                               0.157  -0.192   0.506  0.085    0.68    4.95    31.6
1     1    13      ITALY                                     -0.341  -0.881   0.199  0.036    0.51   -4.50    13.2
1     1    16      KOREA                                      0.136  -0.278   0.550  0.061    0.81    3.06    22.4
3     1            Subtotal                                   0.019  -0.111   0.149  0.615    0.69    3.24   229.6
4     1            (I-squared = 0.0%, p = 0.710)
4     1
0     2            Aggregate
1     2    17      belgium                                    0.376   0.069   0.682  0.110    0.83                   202
1     2    18      EORTC 08861                                0.496  -0.091   1.084  0.030    0.43                   105
1     2    19      LILLE                                      0.450   0.058   0.841  0.068    0.64                   163
1     2    20      GETCB 05CB86                               0.362   0.120   0.603  0.177    0.50                   539
3     2            Subtotal                                   0.392   0.228   0.556  0.385                          1009
4     2            (I-squared = 0.0%, p = 0.964)
4     2
4                  Heterogeneity between groups: p = 0.000
5                  Overall                                    0.162   0.061   0.264  1.000                          1009
4                  (I-squared = 38.4%, p = 0.093)
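The "Overall" and "Heterogeneity between groups" rows can be approximately recovered from the two Subtotal rows alone. The sketch below assumes the between-group test is the usual chi-squared comparison of subgroup estimates on 1 degree of freedom (the slides do not state how it is computed), and it back-calculates standard errors from the rounded confidence limits, so results agree with the dataset only to about two decimal places. The helper name `se_from_ci` is hypothetical.

```python
import math

def se_from_ci(lci, uci, z=1.96):
    """Recover a standard error from symmetric 95% confidence limits."""
    return (uci - lci) / (2.0 * z)

# (_ES, _lci, _uci) from the two Subtotal rows of the dataset
subtotals = {
    "IPD": (0.019, -0.111, 0.149),
    "Aggregate": (0.392, 0.228, 0.556),
}

weights = {g: 1.0 / se_from_ci(lo, hi) ** 2 for g, (_, lo, hi) in subtotals.items()}
sum_w = sum(weights.values())

# Overall fixed-effect estimate = inverse-variance average of the subtotals
overall = sum(weights[g] * subtotals[g][0] for g in subtotals) / sum_w

# Between-group heterogeneity: Q on (number of groups - 1) = 1 df
q_between = sum(weights[g] * (subtotals[g][0] - overall) ** 2 for g in subtotals)
p_between = math.erfc(math.sqrt(q_between / 2.0))  # upper tail of chi-squared, 1 df

print(f"overall = {overall:.3f}, share of weight in IPD = {weights['IPD'] / sum_w:.3f}")
print(f"Q between groups = {q_between:.2f}, p = {p_between:.4f}")
```

The resulting p-value is below 0.001, consistent with the "p = 0.000" shown in the dataset, and the weight shares are close to the _wgt values on the Subtotal rows.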
“Stacking” of forest plots
• Imagine:
• dataset on previous slide is saved as ipdtest.dta
• we want IPD boxes to be red, and AD boxes to be green
• We proceed as follows:
• Run forestplot with two using(...) options, one for
each part of the plot, with the same filename
• (Alternatively: run ipdmetan twice and save under
different filenames)
• Specify our desired plot_options as suboptions to using()
forestplot, ///
    using(ipdtest.dta if _by==1, boxopt(mcolor(red))) ///
    using(ipdtest.dta if _by==2, boxopt(mcolor(green))) ///
    lcols(evrate) rcols(u_1_1_ V_1_1_) ///
    nooverall nostats nowt
[Resulting stacked forest plot: same rows, data columns (Event rate, o-E(o), V(o)) and x-axis ticks (.25, .5, 1, 2, 4) as the forest plot in the advanced syntax example, with box colours as specified in the two using() options.]
Summary and conclusion
• IPD is increasingly used, and its advantages widely accepted
• Large numbers of MA scientists use two-stage models for
analysing IPD
• Currently only AD MA (e.g. metan) and
one-stage IPD (e.g. xtmixed) commands exist in Stata
• ipdmetan is a universal command for two-stage IPD MA
• forestplot is a flexible forest plot command
• does not carry out analysis itself, thus not restricted by it
• may be useful outside the MA context (e.g. presenting
trial subgroups)
Further information
• Other related programs (all call forestplot by default):
• admetan: calls ipdmetan to analyse AD
(direct alternative to metan)
• ipdover: fit model within series of subgroups
• petometan: perform meta-analysis of time-to-event
data using the Peto (log-rank) method
• SSC and Stata Journal article in near future
Thank you!
• Questions, requests, bug reports:
df@ctu.mrc.ac.uk
• Thanks to:
• Jayne Tierney, Patrick Royston
• Ross Harris (author of metan) for advice & support
• Assorted colleagues for testing
• Reference:
• Fisher D. J. et al. 2011. Journal of Clinical
Epidemiology 64: 949-67