Lecture5

Stratification:
Are you a lumper or a splitter?
…and if you are a splitter, how
should you split the data and
when?
Outline of Stratification Lectures
• Definitions, examples and rationale
(credibility)
• Implementation
– Fixed allocation (permuted blocks)
– Adaptive (minimization)
• Rationale - variance reduction
– Pre- and post-stratification
Stratification in randomized trials is
different from stratified random sampling
where the population might be divided up
into strata, e.g., census tracts, and each
stratum is sampled randomly for some prespecified sample size.
Typical Situation for Stratification in Trials
• Usually, no restriction on number of
participants per stratum (goal is to enroll as
rapidly as possible and include participants
who are representative of target population)
• There are exceptions (sometimes required
by funder or regulatory authority): some
trials have goals or put caps on the
enrollment of certain subgroups:
– ELITE II heart failure trial -- at least 85% of
patients had to be > 65 years.
– Dietary study to lower BP (DASH) – a target of
50% women and 50% blacks.
Stratification
• A procedure in which factors known to be
associated with the response (prognostic factors)
are taken into account in the design (e.g.,
randomization)
• Another type of restriction on the randomization.
– Goal of permuted block randomization is to achieve
balance on the number in each treatment arm over time.
– Goal of stratification is to achieve balance between
groups with respect to important prognostic factors.
• Pre-stratification refers to a stratified design; post-stratification refers to the analysis
Example: Weight Loss Interventions
in Clinical Practice
(Appel L et al, N Engl J Med 2011)
• 415 participants randomized (1:1:1) to control
(n=138), remote support (n=139) or in-person
support (n=138) (a modest size trial)
• Methods:
– “Randomization was stratified according to sex and was
generated in blocks of 3 and 6 with use of a Web-based
program.”
– “The primary analysis was conducted with…repeated
measures, mixed-effects. The model included
adjustment for clinic, sex , age and race.”
• Results: Female sex n=88 in each treatment group
Post-stratification (def.)
Classification of experimental units into strata after they
have been randomized for the purpose of data analysis
e.g., stratified analysis of variance (normally distributed
response), Mantel-Haenszel (binary response).
Often adjustment for baseline covariates is carried out
using regression methods, e.g., linear regression or
analysis of covariance (continuous), logistic regression
(binary), or Cox regression (time to event)
This can be done irrespective of whether you employed
pre-stratification.
Note:
The term post-stratification is sometimes used to describe stratification on
data collected post-randomization. Such analyses can be very difficult to
interpret. More later on that issue.
General Problems/Issues with Post-Stratification
• Model dependence / data dredging
– How were covariates (stratifying variables) selected?
– How were cut-points (metric) chosen?
• Frequently covariates are not pre-specified
– Partial solution: Analysis plan in the protocol that
includes all covariates considered important (pre-stratification variables + others); updated analysis plan
prior to unblinding the results of the study to
investigators.
Possible Stratification Scenarios
• Pre- plus post-stratification
• Pre-stratification only
• Post-stratification only
• Neither pre- nor post- stratification
• Regression adjustment with or without
stratification
Examples
• Targeted temperature management after
cardiac arrest (N Engl J Med 2013; 369:2197-2206).
– Unadjusted and adjusted (design variables and design +
other variables) Cox regression analyses for mortality
(Table S10).
• Vaccine for influenza in children (N Engl J Med
2013; 369: 2481-2489).
– Cox model adjusted for variables used in minimization
scheme – “pre-stratification variables.”
• Solanezumab for Alzheimer’s disease (N Engl J
Med 2014; 370: 311-321).
– Mixed model, change from baseline on baseline and
other covariates.
Advantages of Pre-Stratification
• Prevents “accidental bias” resulting from maldistribution of important prognostic variables
• Increases precision (if stratifying variables are related to outcome)
• Ensures balance on stratifying factors in early interim analyses (even in large trials)
• Facilitates subgroup analysis by stratifying factor (more optimal allocation ratio)
• Results less subject to criticism
International Conference on Harmonization
(ICH) Guideline (E-9 Document)
“Stratification by important prognostic
factors measured at baseline (e.g.,
severity of disease, age, sex, etc.) may
sometimes be valuable in order to
promote balanced allocation within
strata; this has greater potential benefit
in small trials.”
Disadvantages of Pre-Stratification
Primarily relates to additional administrative
burden of implementation of randomization.
• May have several randomization schedules
• Measurements to define stratum must be
carefully made prior to randomization
What Stratification Does Not Do
1. Guarantee adequate power to make within-stratum comparisons
2. Eliminate the need to carry out covariate-adjusted analysis
– Chance imbalance on other covariates
– Analysis consistent with design
Criticisms of UGDP
• Definition of target population
• Missing data and eligibility errors
• Differences in baseline characteristics
– “Among the five treatment groups, as well as among clinics,
baseline risk factors were also unevenly distributed. This was due
to simple randomization of patients without subsequent
“stratification” to correct for chance preponderance of antecedent
risk factors in one or more of treatment groups.”
• Defects in interpretation (e.g., accounting for adherence)
Seltzer H, Diabetes 1972 (see also Feinstein A, Clin Pharm Ther 1971 and Biometric Society review, JAMA 1975)
UGDP: Baseline Characteristics

| Characteristic | Placebo (N=205) | Tolbutamide (N=204) | Insulin Standard (N=210) | Insulin Variable (N=204) |
|---|---|---|---|---|
| Age ≥ 55 (%) | 42.3 | 48.2 | 46.2 | 46.2 |
| Male (%) | 30.8 | 31.5 | 26.9 | 22.1 |
| Non-white (%) | 49.8 | 47.2 | 51.0 | 41.2 |
| Hypertension (%) | 37.1 | 29.5 | 31.1 | 28.7 |
| Diabetes (%) | 4.5 | 7.8 | 5.8 | 5.1 |
| Angina (%) | 4.5 | 7.2 | 7.7 | 3.6 |
| ECG abnormalities (%) | 3.1 | 4.1 | 5.3 | 4.1 |
| Cholesterol ≥ 300 mg/dl (%) | 8.8 | 15.5 | 16.6 | 13.8 |
| One or more of the above (%) | 47.0 | 47.8 | 50.5 | 42.6 |

Cornfield J, JAMA 1971
Baseline Characteristics of Patients in Trial to Prevent Toxoplasmic Encephalitis
(JID 1994;169:384-94)

| Characteristic | Pyrimethamine (n=264) | Placebo (n=132) |
|---|---|---|
| CD4+ count (cells/mm3) | 96.1 | 97.4 |
| AIDS OI (%) | 35.2 | 22.0 |
| Karnofsky Score | 89.5 | 89.7 |
| Hemoglobin (g/dl) | 12.6 | 12.7 |
“In view of the major imbalance between the
groups in presentation at baseline with AIDS
defining OIs, the rigorousness of the allocation
procedures need to be supported in detail if
the results are to be regarded as credible.”
NEJM referee for paper – major reason for rejection
Example
How a small difference in an important
prognostic variable can bias treatment
differences.
Baseline Characteristics in Trial of Didanosine (ddI) and Zalcitabine (ddC)
(N Engl J Med 1994; 330:657-662)

| Characteristic | ddI (N=230) Mean | ddI SD | ddC (N=237) Mean | ddC SD |
|---|---|---|---|---|
| Age (years) | 37.8 | 8.5 | 37.5 | 7.8 |
| CD4+ | 75.1 | 86.2 | 71.1 | 84.3 |
| Karnofsky Score | 87.2 | 11.9 | 85.3 | 11.9 |
| Prior AIDS Diagnosis (%) | 64.8 | | 66.7 | |
Frequency Distribution of Karnofsky Score by Treatment Group

| Karnofsky Score | ddI (%) | ddC (%) |
|---|---|---|
| < 70 | 4.8 | 6.8 |
| 70 - 79 | 10.0 | 11.8 |
| 80 - 89 | 21.3 | 24.1 |
| 90 - 99 | 36.1 | 36.7 |
| 100+ | 27.8 | 20.6 |
Death Rate by Karnofsky Level

| Karnofsky Score | Death Rate per 100 Person-years |
|---|---|
| < 70 | 169.8 |
| 70 - 79 | 84.0 |
| 80 - 89 | 41.0 |
| 90 - 99 | 31.9 |
| 100+ | 18.4 |
Comparison of Unadjusted and Adjusted Relative Risk Estimates

| Analysis | RR (ddC/ddI) | P-value |
|---|---|---|
| Unadjusted | 0.79 | 0.11 |
| Adjusted | 0.66 | 0.006 |
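A rough sketch of why the small Karnofsky imbalance matters: weight the death rate at each Karnofsky level by each arm's Karnofsky distribution. This assumes the frequencies on the slides are percentages of patients, ignores differences in follow-up time, and assumes no treatment effect at all; the variable and function names are illustrative only.

```python
# Death rate per 100 person-years by Karnofsky level (values from the slides).
death_rate = {"<70": 169.8, "70-79": 84.0, "80-89": 41.0, "90-99": 31.9, "100+": 18.4}

# Percent of each treatment arm at each Karnofsky level (values from the slides).
karnofsky_pct = {
    "ddI": {"<70": 4.8, "70-79": 10.0, "80-89": 21.3, "90-99": 36.1, "100+": 27.8},
    "ddC": {"<70": 6.8, "70-79": 11.8, "80-89": 24.1, "90-99": 36.7, "100+": 20.6},
}


def expected_rate(arm):
    """Crude expected death rate if only baseline Karnofsky score mattered."""
    return sum(karnofsky_pct[arm][level] / 100 * death_rate[level] for level in death_rate)


rate_ddi = expected_rate("ddI")
rate_ddc = expected_rate("ddC")
ratio = rate_ddc / rate_ddi  # expected ddC/ddI rate ratio from baseline imbalance alone

print(f"ddI: {rate_ddi:.1f}, ddC: {rate_ddc:.1f}, ratio: {ratio:.2f}")
```

Under these assumptions the ddC arm would be expected to have a death rate roughly 12% higher than the ddI arm before any treatment effect, which is consistent in direction with the adjusted RR (0.66) lying further from 1 than the unadjusted RR (0.79).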
A major problem with this study is the adjustment for the
“small differences at baseline” between didanosine and
zalcitabine. While there is a “small difference” noted, the
variability for each of these variables is quite large. For
example, the difference in CD4 count was 4 cells/mm3
between treatment groups; however, the standard error was
over 86 cells/mm3. Similarly, for Karnofsky performance
status, the difference between the two groups was 2, but the
standard error was 11.9. And, finally, there was no difference
in the presence of AIDS-defining illness between the two
groups. In short, the conclusion that should be drawn is that
there is, indeed, no difference between the two groups and
attempting to adjust for these small differences is
inappropriate. The discussion of Results on page 23, first
paragraph, should be eliminated.
Comments by NEJM referee – this time no rejection!
Summary
• Small differences in a very important prognostic
variable (irrespective of significance) can bias
treatment comparisons
• Large, significant differences in unimportant
variables will not bias treatment comparisons
• Remember a p-value is a function of both sample
size and effect size
• Chance imbalances can occur with large sample
sizes if there are many strata.
Stratified Design for Comparing Treatments

| Stratum | Treatment A | Treatment B | Total |
|---|---|---|---|
| 1 | m1A | m1B | m1 |
| 2 | m2A | m2B | m2 |
| 3 | m3A | m3B | m3 |
| 4 | m4A | m4B | m4 |
| Total | na | nb | |

• Typical situation: m1 ≠ m2 ≠ m3 ≠ m4
• Study is designed/powered based on na and nb
• Goal: miA = miB for all i.
Considerations in the Decision to
“Lump” or “Split”
1. Size of study
2. Homogeneity of study subjects
3. Strength of prognostic factors
(between strata variability)
4. Administrative burden
5. Credibility
Usual Implementation
• Block randomization within stratum
i.e., prepare a separate randomization schedule
for each stratum usually with relatively small block
sizes
• Simple randomization within strata makes no sense: it does nothing to enforce balance within a stratum
Note: The aim of this method is to ensure balance
within strata formed by cross-classification of all
factor levels.
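Block randomization within stratum can be sketched as follows. This is a minimal illustration, not a production randomization system: the function names, the strata labels, the block size of 4, and the fixed seed are all assumptions made for the example.

```python
import random


def permuted_block_schedule(n_blocks, block_size=4, arms=("A", "B"), rng=None):
    """Build one schedule as a sequence of randomly permuted, balanced blocks."""
    rng = rng or random.Random()
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    per_arm = block_size // len(arms)
    schedule = []
    for _block in range(n_blocks):
        block = [arm for arm in arms for _copy in range(per_arm)]
        rng.shuffle(block)  # random order within the block
        schedule.extend(block)
    return schedule


def stratified_schedules(strata, n_blocks, block_size=4, seed=None):
    """Prepare a separate permuted-block schedule for each stratum."""
    rng = random.Random(seed)
    return {s: permuted_block_schedule(n_blocks, block_size, rng=rng) for s in strata}


# Hypothetical strata: 2 clinical sites x 2 baseline levels of the outcome.
strata = [(site, level) for site in ("site 1", "site 2") for level in ("low", "high")]
schedules = stratified_schedules(strata, n_blocks=3, block_size=4, seed=2024)

for stratum, schedule in schedules.items():
    print(stratum, " ".join(schedule))
```

Each stratum draws the next unused assignment from its own schedule, so the arms never differ by more than half a block within any stratum.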
Typical Stratifying Variables
• Clinical site (good idea in multi-center study
as each site can be viewed as a replication of
study)
• Baseline level for outcome of interest
• Stage of disease
• Combination of factors, e.g., a risk score
Stratification Example: TOMHS
• Multi-center (4 clinical sites) trial with two
other strata defined by previous use of
antihypertensive treatment (Rx) (Yes/No)
• 4 x 2 = 8 strata and randomization
schedules – aim is to achieve the desired
allocation ratio across all 8 groups
In general, s stratification variables with $I_i$ levels for the $i$th variable result in $\prod_{i=1}^{s} I_i$ strata.
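The product rule for the number of strata is easy to check numerically. A small sketch (the helper name `n_strata` is illustrative), applied to TOMHS and to the testicular cancer example that appears later in these slides:

```python
from math import prod


def n_strata(levels):
    """Number of strata formed by cross-classifying factors with the given numbers of levels."""
    return prod(levels)


tomhs = n_strata([4, 2])          # 4 clinical sites x prior antihypertensive Rx (Yes/No)
testicular = n_strata([2, 3, 2])  # stage x histology x age

print(tomhs, testicular)
```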
One can calculate the probability of obtaining a
certain imbalance before the study begins. This
can be used to decide whether to stratify the
randomization.
$$p(t) = \frac{\binom{N_a}{t}\binom{N_b}{t_1 - t}}{\binom{N_a + N_b}{t_1}}$$

p(t) is the probability of randomizing t patients to group A when there are $t_1$ patients in stratum 1. For a certain imbalance, one can sum p(t) over all values of t that give that imbalance or worse.
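Example: $N_a = 100$, $N_b = 100$, $t_1 = 40$, $g = 0.16$, $h = 0.24$ (here g and h are the fractions of groups A and B that fall in stratum 1, i.e., 16/100 and 24/100)

$$p(t) = \frac{\binom{100}{t}\binom{100}{40 - t}}{\binom{200}{40}}$$

| | Group A | Group B | Total |
|---|---|---|---|
| Stratum 1 | 16 | 24 | 40 |
| Stratum 2 | 84 | 76 | 160 |
| Total | 100 | 100 | 200 |

Want the probability of obtaining the imbalance given by g = 0.16, h = 0.24, or worse:

$$\sum_{t \le 16} p(t) + \sum_{t \ge 24} p(t) = 0.216$$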
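The tail sum in this example can be evaluated directly from the hypergeometric formula. A minimal sketch, with illustrative function and argument names; it should come out close to the 0.216 quoted on the slide.

```python
from math import comb


def p(t, n_a, n_b, t1):
    """Probability that t of the t1 stratum-1 patients are in group A."""
    return comb(n_a, t) * comb(n_b, t1 - t) / comb(n_a + n_b, t1)


def prob_imbalance_or_worse(n_a, n_b, t1, t_low, t_high):
    """Sum p(t) over all feasible t with t <= t_low or t >= t_high."""
    t_min = max(0, t1 - n_b)
    t_max = min(n_a, t1)
    return sum(
        p(t, n_a, n_b, t1)
        for t in range(t_min, t_max + 1)
        if t <= t_low or t >= t_high
    )


# Slide example: 100 patients per group, 40 patients in stratum 1,
# imbalance of 16 vs 24 (or worse) in either direction.
prob = prob_imbalance_or_worse(100, 100, 40, t_low=16, t_high=24)
print(round(prob, 3))
```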
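Probability of Given Imbalance or More Extreme

| Fraction Assigned A | Fraction Assigned B | Total in Stratum: 50 | Total in Stratum: 100 | Total in Stratum: 1000 |
|---|---|---|---|---|
| .52 | .48 | 1.0 | .84 | .23 |
| .55 | .45 | .57 | .42 | .002 |
| .60 | .40 | .25 | .07 | – |
| .70 | .30 | .01 | – | – |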
Estimates for the Size of Treatment
Imbalance
• Let B = block size; K = number of strata; and D =
imbalance.
• Hallstrom and Davis (Cont Clin Trials, 1988) showed that, with 2 treatments, the maximum total trial imbalance D in the number of patients assigned to each treatment across all strata is max D = KB/2, with variance Var(D) = K(B+1)/6
• Example: Cardiac arrhythmia trial with 270 strata (site,
ejection fraction, time since MI) and block size of 4.
• Max D = 540; Var (D) = 225; SD (D) = 15; 2 SD = 30.
• In this trial, 4200 patients were to be randomized and an
imbalance of 30 with probability = 0.05 was considered
acceptable.
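The arithmetic in the cardiac arrhythmia example follows directly from the two expressions above. A short sketch (the function name is illustrative):

```python
from math import sqrt


def imbalance_summary(n_strata, block_size):
    """Maximum, variance, and SD of total imbalance for 2 treatments with permuted blocks."""
    max_d = n_strata * block_size / 2
    var_d = n_strata * (block_size + 1) / 6
    return max_d, var_d, sqrt(var_d)


# 270 strata, block size 4
max_d, var_d, sd_d = imbalance_summary(270, 4)
print(max_d, var_d, sd_d, 2 * sd_d)
```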
For small studies with a large number of strata, the use of
random permuted blocks within strata can be self-defeating.
Example: A study of testicular cancer
• 2 treatments
• 3 stratifying variables
Stage: 2 levels
Histology: 3 levels
Age: 2 levels
No. of strata = 2 x 3 x 2 = 12.
Randomization Schedules for 12 Strata

| Histology | Stage I, age < 15 | Stage I, age ≥ 15 | Stage II, age < 15 | Stage II, age ≥ 15 |
|---|---|---|---|---|
| Teratocarcinoma | A* A* B A B B | A* A* A* B B B | A* A* A* B B B | B* A* A* B B A |
| Embryonal carcinoma | A* A* B B B A | B B A A A B | B* B* A A B A | A* B* B* B* A A |
| Choriocarcinoma | B* B A A B A | B A A B B A | A* B* B* B* A A | B* B* A A A B |

* Patients randomized
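Marginal Totals for Strata

| Factor | Level | A | B |
|---|---|---|---|
| Stage | I | 7 | 1 |
| | II | 7 | 11 |
| Age | < 15 | 8 | 6 |
| | ≥ 15 | 6 | 6 |
| Histology | Teratocarcinoma | 10 | 1 |
| | Embryonal carcinoma | 3 | 5 |
| | Choriocarcinoma | 1 | 6 |
| TOTAL | | 14 | 12 |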
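The marginal totals can be tallied from per-stratum counts of randomized patients. In the sketch below the counts are read off the starred entries of the 12 schedules; the placement of counts in individual cells is an assumption about the slide's layout, chosen so that it reproduces the marginal totals shown. The names are illustrative.

```python
from collections import defaultdict

# (stage, age, histology) -> (number assigned A, number assigned B).
# Cell placement is an assumed reading of the slide layout.
randomized = {
    ("I", "<15", "Teratocarcinoma"): (2, 0),
    ("I", ">=15", "Teratocarcinoma"): (3, 0),
    ("II", "<15", "Teratocarcinoma"): (3, 0),
    ("II", ">=15", "Teratocarcinoma"): (2, 1),
    ("I", "<15", "Embryonal carcinoma"): (2, 0),
    ("I", ">=15", "Embryonal carcinoma"): (0, 0),
    ("II", "<15", "Embryonal carcinoma"): (0, 2),
    ("II", ">=15", "Embryonal carcinoma"): (1, 3),
    ("I", "<15", "Choriocarcinoma"): (0, 1),
    ("I", ">=15", "Choriocarcinoma"): (0, 0),
    ("II", "<15", "Choriocarcinoma"): (1, 3),
    ("II", ">=15", "Choriocarcinoma"): (0, 2),
}


def marginal_totals(counts):
    """Sum the A and B counts over every level of every stratifying factor."""
    totals = defaultdict(lambda: [0, 0])
    for (stage, age, histology), (n_a, n_b) in counts.items():
        keys = (("Stage", stage), ("Age", age), ("Histology", histology), ("TOTAL", ""))
        for key in keys:
            totals[key][0] += n_a
            totals[key][1] += n_b
    return dict(totals)


margins = marginal_totals(randomized)
for (factor, level), (n_a, n_b) in margins.items():
    print(f"{factor} {level}: A={n_a}, B={n_b}")
```

The overall totals are close (14 vs 12), yet Stage II (7 vs 11) and teratocarcinoma (10 vs 1) are badly unbalanced, which is the sense in which permuted blocks within many small strata can be self-defeating.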
Minimization
A method of adaptive stratification
which balances the marginal treatment
totals for each stratification variable.
Interestingly, the European Committee for
Proprietary Medicinal Products (CPMP)
discourages use of minimization due to concerns
about analysis. They note that the methods
remain “highly controversial” and are “strongly
discouraged”.
References
• Taves DR, Clin Pharmacol Ther 1974;
15:443-53.
• Pocock S, Simon R. Biometrics 1975;
31:103-15.
Some Notation
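Let $X_{ik}$ = number of patients already assigned treatment $k$ who share the new patient's level of prognostic factor $i$
$k = 1, 2$ (A or B) for our example
$i = 1, 2, \ldots, f$ indexes the prognostic factors of the new patient
$X^t_{ik} = X_{ik}$ if $t \ne k$, and $X^t_{ik} = X_{ik} + 1$ if $t = k$
$X^t_{ik}$ denotes the new allocation if the new patient is assigned to treatment $t$, $t = 1, 2$ (A, B)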
Lack of Balance Functions
B(t) could be a function of $X_{ik}$ or $X^t_{ik}$ which measures the “lack of balance”. Two examples:

$$B(t) = \sum_{i=1}^{f} X_{it}$$ (Taves: the sum of the current marginal totals for treatment $t$)

$$B(t) = \sum_{i=1}^{f} \operatorname{range}(X^t_{i1}, X^t_{i2})$$

Rule of assignment: assign the treatment with the smallest B(t) with higher probability.
Note: Pocock and Simon’s approach is more general than Taves’s. It allows for variation among assignments to be considered (e.g., range) and non-deterministic assignment.
Characteristics of New Patient
Example (Pocock, page 85):

| Factor | Level | Number on treatment A | Number on treatment B | Next patient |
|---|---|---|---|---|
| Performance status | Ambulatory | 30 | 31 | x |
| | Non-ambulatory | 10 | 9 | |
| Age | < 50 | 18 | 17 | x |
| | ≥ 50 | 22 | 23 | |
| Disease-free interval | < 2 years | 31 | 32 | |
| | ≥ 2 years | 9 | 8 | x |
| Dominant metastatic lesion | Visceral | 19 | 21 | x |
| | Osseous | 8 | 7 | |
| | Soft tissue | 13 | 12 | |

2 x 2 x 2 x 3 = 24 strata; x denotes the characteristics of the next patient to be randomized. Note: Taves would simply sum marginal totals and randomize to the treatment with the lowest total. In this case, A (76) instead of B (77).
Estimation of B(1)

| Step | Factor, level of new patient | $X_{i1}$, $X_{i2}$ | $X^1_{i1}$, $X^1_{i2}$ | Range $(X^1_{i1}, X^1_{i2})$ |
|---|---|---|---|---|
| i) | Factor 1, Level 1 | 30, 31 | 31, 31 | 31 – 31 = 0 |
| ii) | Factor 2, Level 1 | 18, 17 | 19, 17 | 19 – 17 = 2 |
| iii) | Factor 3, Level 2 | 9, 8 | 10, 8 | 10 – 8 = 2 |
| iv) | Factor 4, Level 1 | 19, 21 | 20, 21 | 21 – 20 = 1 |

B(1) = 0 + 2 + 2 + 1 = 5
Estimation of B(2)

| Step | Factor, level of new patient | $X_{i1}$, $X_{i2}$ | $X^2_{i1}$, $X^2_{i2}$ | Range $(X^2_{i1}, X^2_{i2})$ |
|---|---|---|---|---|
| i) | Factor 1, Level 1 | 30, 31 | 30, 32 | 32 – 30 = 2 |
| ii) | Factor 2, Level 1 | 18, 17 | 18, 18 | 18 – 18 = 0 |
| iii) | Factor 3, Level 2 | 9, 8 | 9, 9 | 9 – 9 = 0 |
| iv) | Factor 4, Level 1 | 19, 21 | 19, 22 | 22 – 19 = 3 |

B(2) = 2 + 0 + 0 + 3 = 5

Since B(1) = B(2), toss a coin for the next patient.
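The two lack-of-balance calculations can be sketched in a few lines. The marginal totals are those of the Pocock example for the levels the next patient falls in; the function names and the 0/1 coding of treatments A/B are assumptions made for this illustration.

```python
# Current (A, B) marginal totals for the levels of the next patient.
counts = {
    "performance status: ambulatory": (30, 31),
    "age: < 50": (18, 17),
    "disease-free interval: >= 2 years": (9, 8),
    "dominant lesion: visceral": (19, 21),
}


def lack_of_balance(counts, t):
    """Range-based B(t): sum over factors of the range after assigning treatment t (0 = A, 1 = B)."""
    total = 0
    for marginals in counts.values():
        updated = list(marginals)
        updated[t] += 1  # hypothetical assignment of the new patient
        total += max(updated) - min(updated)
    return total


def taves_total(counts, t):
    """Taves-style measure: sum of current marginal totals for treatment t."""
    return sum(marginals[t] for marginals in counts.values())


b_a = lack_of_balance(counts, 0)
b_b = lack_of_balance(counts, 1)
print("range-based:", b_a, b_b)
print("Taves totals:", taves_total(counts, 0), taves_total(counts, 1))
```

With the range measure the two treatments tie, so the next assignment is a coin toss, whereas the sum of marginal totals favors A.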
Implementation
Need to continuously update marginal totals
to determine B(t); therefore, this is best
done at a central coordinating/statistical
center
Flexibility in allocation: Examples
1. P = 1 if B(1) ≠ B(2)
P = 1/2 if B(1) = B(2)
Simple randomization if equal,
deterministic if unequal
2. P = 2/3 if B(1) ≠ B(2)
P = 1/2 if B(1) = B(2)
P denotes Prob(groups become “more equal”), i.e., the probability of
assigning the treatment with the smaller B(t)
The more P deviates from 1 when B(1) ≠ B(2),
the less effective the balancing
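A sketch of the allocation rule with a tunable P, assuming two treatments coded 1 and 2; the function name, the default P = 2/3, and the seed are illustrative choices.

```python
import random


def assign(b1, b2, p=2 / 3, rng=None):
    """Pick treatment 1 or 2 given lack-of-balance values B(1) and B(2).

    With probability p the treatment with the smaller B(t) is chosen;
    ties are broken by simple randomization.
    """
    rng = rng or random.Random()
    if b1 == b2:
        return rng.choice([1, 2])
    better = 1 if b1 < b2 else 2
    worse = 3 - better
    return better if rng.random() < p else worse


rng = random.Random(12345)
draws = [assign(4, 6, p=2 / 3, rng=rng) for _ in range(10_000)]
share_better = draws.count(1) / len(draws)
print(f"share assigned to the better-balancing treatment: {share_better:.3f}")
```

Setting `p=1` gives the deterministic rule in example 1; values closer to 1/2 make the next assignment harder to predict at the cost of weaker balancing.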
Theoretical Challenge
• Not true randomization – in some cases
deterministic
– Violation of randomization as basis for inference
– If the site knows all the margins, then it can
predict the next assignment
• Reality: When done in a multi-center trial,
with central randomization, impossible for
sites to predict
– Appears random to the sites
– Basis for inference: We do inference all the
time in non-randomized trials, doesn’t bother us
then
Summary
• Unless a very small block size is used, over-stratification is likely with use of block
randomization within strata if you have many strata
relative to the total sample size.
• Minimization should be considered for situations
where you have several important prognostic
factors and a small sample size (particularly if you
are concerned about using a very small block
size).
• Therneau (Cont Clin Trials 1993;14:98-108)
suggests that as the number of distinct groups
(strata) approaches N/2, adaptive methods be
considered.