Causality in Randomized Phase III Clinical Trials

Correcting for Selection Bias in
Randomized Clinical Trials
Vance W. Berger, NCI
9/15/05 FDA/Industry Workshop, DC
Outline






1. What do we expect of randomization (4)?
2. Chronological bias (2).
3. Randomized blocks (3).
4. Selection bias (7).
5. Correcting selection bias (5).
6. Further reading (4).
1. What Do We Expect? (1/4)



The success of randomization has often
been questioned in randomized trials,
because of baseline imbalances [1].
For example, Schor [2] raised this concern
in The University Group Diabetes Program.
Altman [3] raised this concern for a
randomized comparison of talc to mustine
for control of pleural effusions [4].
1. What Do We Expect? (2/4)


Because of an imbalance in the numbers of
patients randomized to each group (134 vs. 116),
the Western Washington Intracoronary
Streptokinase Trial statisticians were “particularly
concerned in verifying that the randomization
process had been carried out as planned” [5].
Weiss, Gill, and Hudis [6] audited a randomized
South African trial of high-dose chemotherapy for
metastatic breast cancer [7], noted imbalances in
the numbers of patients allocated over time, and
concluded that “It is unlikely that this sequence of
treatment assignments could have occurred if the
study were truly randomized.”
1. What Do We Expect? (3/4)


In a randomized study of a culturally sensitive AIDS
education program [8], Marcus [9] hypothesized that
“subjects with lower baseline knowledge scores … may
have been channeled into the treatment group”, because
of baseline imbalances across the randomized groups.
Jordhoy et al. [10] discussed a cluster randomized trial
of palliative care conducted at the Palliative Medicine
Unit of Trondheim University Hospital and noted that
“The individual patient results [meaning baseline
imbalances] suggested that diagnosis was not randomly
distributed across the two groups”.
1. What Do We Expect? (4/4)



Two common themes emerge from all of these
challenges of ostensibly randomized trials.
Questions are raised when either 1) the numbers
of subjects do not match expectations or 2) the
baseline characteristics of the participants differ
greatly across the randomized groups.
Clearly, then, we expect more from randomized
trials than just that they be randomized, and in
fact randomization does not always create the
balanced groups we would have hoped for.
2. Chronological Bias (1/2)






How can baseline imbalances be large enough that one
would question the success of the randomization?
Completely unrestricted randomization ensures
independence, but allows for unbalanced group sizes,
and so is not used very often in practice.
Instead, some form of restricted randomization is used to
ensure balanced group sizes at the end of the trial.
The random allocation rule makes this terminal balance
in group sizes its only restriction, and so it allows for
large baseline imbalances during the trial.
Suppose that many more early allocations are to one
group, and more late allocations are to the other group.
Suppose further that the covariate distribution changes
during the course of the trial; this is quite likely.
2. Chronological Bias (2/2)





There could be more females early, but during
the trial another trial opens up just for females,
so there are more males in this trial henceforth.
Gender is confounded with time, which, because
of the imbalance, is confounded with treatments.
This is chronological bias [11], although the
name is a misnomer as chronological bias does
not systematically favor one group or the other.
Still, it is one cause of baseline imbalances.
The only way to control chronological bias is to
introduce restrictions on the randomization.
3. Randomized Blocks (1/3)





Perhaps the most common form of restricted
randomization is randomized or permuted blocks.
The idea is to force perfect balance every so often.
Block sizes may be fixed (e.g., 4) or varied (e.g., 2
& 4), and the random allocation rule is used within
each block to ensure perfect balance in the block.
In unmasked trials, prior allocations are known.
Once every group but one has been exhausted in a
block (e.g., after EE in an EECC block of size 4), all
remaining allocations in that block are deterministic.
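The permuted-block procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular trial: the function names, the block sizes, and the seed are all assumptions made for the example.

```python
import random

def permuted_block_sequence(block_sizes, rng):
    """Build a 1:1 allocation sequence from permuted blocks.

    block_sizes: even block sizes, e.g. [4, 2, 4] (hypothetical values).
    rng: a random.Random instance, seeded here only for reproducibility.
    """
    sequence = []
    for size in block_sizes:
        if size % 2 != 0:
            raise ValueError("1:1 allocation needs an even block size")
        # Random allocation rule within the block: half E, half C, shuffled.
        block = ["E"] * (size // 2) + ["C"] * (size // 2)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence

def running_imbalance(sequence):
    """Return (#E - #C) after each allocation."""
    diff, out = 0, []
    for arm in sequence:
        diff += 1 if arm == "E" else -1
        out.append(diff)
    return out

rng = random.Random(2005)        # arbitrary seed
blocks = [4, 2, 4, 4, 2]         # varied block sizes of 2 and 4
seq = permuted_block_sequence(blocks, rng)
imbalance = running_imbalance(seq)
print("".join(seq))
print(imbalance)
```

The running imbalance returns to zero at the end of every block, which is exactly the forced balance that makes later allocations in a block predictable.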
3. Randomized Blocks (2/3)






In fact, in an EECC block even the 2nd is
predictable, as one can use knowledge of the 1st
allocation to do better than guessing.
Let P{E} be the proportion of the assignments
remaining in the block that are to the experimental group E.
If there is 1:1 allocation between experimental
group E and control C, with block size 4:
Block   P{E} before the 1st, 2nd, 3rd, 4th allocation
CCEE    2/4, 2/3, 2/2, 1/1
CECE    2/4, 2/3, 1/2, 1/1
CEEC    2/4, 2/3, 1/2, 0/1
EECC    2/4, 1/3, 0/2, 0/1
ECEC    2/4, 1/3, 1/2, 0/1
ECCE    2/4, 1/3, 1/2, 1/1
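The P{E} values for blocks of size 4 can be recomputed mechanically. The sketch below is illustrative only; the helper names `pe_trajectory` and `unpredictable_positions` are made up for this example. It keeps each value as an unreduced pair (remaining E allocations, remaining allocations).

```python
from itertools import permutations

def pe_trajectory(block):
    """P{E} before each allocation, as (remaining E's, remaining slots)."""
    out = []
    for i in range(len(block)):
        remaining = block[i:]
        out.append((remaining.count("E"), len(remaining)))
    return out

def unpredictable_positions(block):
    """1-based positions where P{E} equals 1/2, so no prediction is possible."""
    return [i + 1 for i, (e, n) in enumerate(pe_trajectory(block)) if 2 * e == n]

# All distinct 1:1 blocks of size 4.
blocks = sorted(set("".join(p) for p in permutations("EECC")))
for b in blocks:
    traj = ", ".join(f"{e}/{n}" for e, n in pe_trajectory(b))
    print(b, traj, "unpredictable at", unpredictable_positions(b))
```

Any position where P{E} differs from 1/2 gives an investigator who knows the past allocations an edge over guessing.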
3. Randomized Blocks (3/3)




Only the 1st allocation of an EECC or CCEE block
is unpredictable, and only the 1st and 3rd of CECE,
CEEC, ECEC, or ECCE blocks are unpredictable.
Even if the investigator has never actually seen the
allocation sequence, he or she will still know P{E}
at the time a patient is considered for trial entry.
In fact, the investigator will know both P{E} (the
predicted treatment assignment) and the set of
covariates specific to the patient being considered.
Only if P{E} equals the unconditional probability
(or 0.5 with 1:1 allocation) is there no prediction.
4. Selection Bias Mechanism (1/7)




Many authors state that, as a consequence of
randomization, any baseline imbalances in a
randomized trial must be random in origin.
Yet selection bias occurs if healthier patients are
enrolled when P{E}>0.5 and sicker patients are
enrolled when P{E}<0.5 (or vice versa).
Of course, this is not a concern in masked trials,
because unmasking is required for P{E} to assume
any value other than the uninformative 0.5.
But in practice, are there any truly masked trials?
4. Selection Bias Mechanism (2/7)




It will help to define our terms carefully.
Some define masked trials as those in which
nobody knows who got what until the end.
That is indeed the objective of masking. To define
randomization similarly, in terms of its objective,
would be to call a trial randomized if and only if
all of its baseline imbalances are random in origin.
And yet one cannot help but recall Socrates asking
if an act was pious because the heavens approved,
or if the heavens approved because it was pious.
4. Selection Bias Mechanism (3/7)



Just as one cannot confer with Zeus to inquire as
to his approval of an action one is contemplating,
so too is one unable to verify that each observed
baseline imbalance was of a random origin.
This ideal would have to be a consequence, and
not the definition, of randomization, and we are
now left to wonder – what is randomization?
To make randomization, masking, and allocation
concealment useful concepts, and avoid circular
logic, we must define these three terms as actions
that one can take (processes), and not as the
realization of their intended outcomes [12].
4. Selection Bias Mechanism (4/7)




The process of randomization is nothing more, or
less, than constructing treatment groups by
randomly selecting non-overlapping subsets of the
set of all accession numbers to be used [13].
Note that this definition allows one to actually
conduct a randomized trial (it is an action).
Can one eliminate selection bias as a consequence
of randomization according to the definition?
Without allocation concealment (often defined as
masking of each allocation only until a treatment
is assigned to the patient in question), the answer
is clearly no, but perfect masking implies perfect
allocation concealment, which implies no bias.
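Read as a process, this definition of randomization amounts to drawing random, non-overlapping subsets of the accession numbers. A minimal sketch, assuming two groups and a hypothetical helper named `randomize`:

```python
import random

def randomize(accession_numbers, n_experimental, rng):
    """Split accession numbers into two non-overlapping random subsets."""
    experimental = set(rng.sample(sorted(accession_numbers), n_experimental))
    control = set(accession_numbers) - experimental
    return experimental, control

rng = random.Random(13)              # arbitrary seed, for reproducibility only
accessions = range(1, 41)            # hypothetical accession numbers 1..40
exp_group, ctl_group = randomize(accessions, 20, rng)
print(sorted(exp_group))
print(sorted(ctl_group))
```

Nothing in this action guarantees that the resulting groups are balanced on any covariate, which is the point the slide makes: balance is a hoped-for consequence, not part of the definition.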
4. Selection Bias Mechanism (5/7)






But do masking & allocation concealment claims
confer true allocation concealment (and no bias)?
The process of masking, or not telling patients or
physicians who got what, is clearly worthwhile,
but information is not often contained very well.
For example, tell-tale side effects may lead to unmasking.
Sealed envelopes have been held up to lights, files
have been raided, and fake patients have been
called in to ascertain the next allocation [14].
So the effect of masking may not match its goal.
Unmasking may lead to evaluation biases; if it
occurs after the patients have been selected then it
should not lead to selection bias; however …
4. Selection Bias Mechanism (6/7)





Most RCTs use restricted randomization (blocks).
The patterns in the allocation sequence allow for
prediction of the future allocations based on
knowledge of the past ones, and selection bias [1].
So even “masked” randomized trials with planned
allocation concealment are not immune [12].
One can compute the expected imbalance in a
binary covariate to be 50% with blocks of size 2,
42% (block size 4), or 28% (block size 6) [15].
The result is artificially large test statistics and
posterior probabilities, artificially low p-values,
and artificially narrow confidence intervals.
4. Selection Bias Mechanism (7/7)
All patients randomized (20 male, 20 female)
20 blocks of size two each
10 ‘CE’ blocks, 10 ‘EC’ blocks
For ‘CE’, P{E}=0.5, then 1.0
For ‘EC’, P{E}=0.5, then 0.0
Females respond better than males
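[Figure: flow of patients into the two groups by P{E}, with barriers labeled
"Selectively semi-permeable" (twice) and "Permeable".]
P{E}=0.0 (10 male): 100% to the control group
P{E}=0.5 (10 male, 10 female): 50% to each group
P{E}=1.0 (10 female): 100% to the experimental group
Control Group (25% female, 75% male)
Experimental Group (75% female, 25% male)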
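The group compositions in this example follow from simple bookkeeping. The sketch below assumes the scenario exactly as stated on the slide: a female is always enrolled when P{E}=1, a male when P{E}=0, and patients enrolled at P{E}=0.5 are half female and half male. The function name `group_composition` is made up for illustration.

```python
from fractions import Fraction

def group_composition(n_ce_blocks, n_ec_blocks):
    """Fraction of females in each arm for blocks of size 2.

    Assumed selection behavior (illustrative): female enrolled when P{E}=1,
    male enrolled when P{E}=0, an even female/male mix when P{E}=0.5.
    """
    female = {"E": Fraction(0), "C": Fraction(0)}
    total = {"E": 0, "C": 0}
    for block, count in (("CE", n_ce_blocks), ("EC", n_ec_blocks)):
        first, second = block
        # Position 1: P{E}=0.5, so half of these patients are female.
        female[first] += Fraction(count, 2)
        total[first] += count
        # Position 2: fully predictable; females only if the arm is E.
        if second == "E":
            female[second] += count
        total[second] += count
    return {arm: female[arm] / total[arm] for arm in ("E", "C")}

comp = group_composition(10, 10)
print(comp)
```

With 10 CE and 10 EC blocks this gives 15 of 20 females in the experimental group and 5 of 20 in the control group, a 50-percentage-point imbalance in a trial that was, procedurally, randomized.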
5. Correcting Selection Bias (1/5)






Selection bias can be prevented, detected, and
corrected, but specialized methods are needed.
Recall that E & C are the experimental & control
treatment groups (TG), respectively; P{E} is the
proportion of E allocations remaining in the block.
If E is superior to C, then treatment group TG and
response Y are correlated, as are P{E} and TG.
P{E} should be unbalanced across the groups, and possibly prognostic.
But P{E} should not predict Y within a given TG.
Consider two patients who receive E, one known
up front to get E (P{E}=1), one not (P{E}=0.50).
5. Correcting Selection Bias (2/5)




If E[Y|TG=E, P{E}] depends on P{E}, then P{E}
is on the causal pathway of the mechanism of
action of E; this would suggest selection bias.
For example, consider a study with 24 patients, 12
blocks of size two each, six each of EC and CE.
P{E}=0.5 if block position BP=1, P{E}=0 if BP=2
(EC block), and P{E}=1 if BP=2 (CE block).
Suppose that the response data turn out as follows.



Responders/patients by treatment group and P{E} (T = total):

     BP=2, P{E}=0   BP=1, P{E}=1/2   BP=2, P{E}=1   T
C    0/6            3/6              0/0            3/12
E    0/0            3/6              6/6            9/12
5. Correcting Selection Bias (3/5)



Fisher’s exact p-values are 0.04 (two-sided) and
0.02 (one-sided) for comparing either E to C or
EC blocks to CE blocks. A test for trend in the
binomial proportions across P{E} values
(Jonckheere-Terpstra) gives p=0.0003 one-sided
or p=0.0007 two-sided.
So P{E} is even more predictive than treatment is!
Without allocation concealment P{E} is a perfect
predictor of treatment group (TG), but allocation
concealment (meaning the ability to predict but
not observe) separates the effects of P{E} and TG.
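The Fisher's exact p-values for the E versus C comparison (9/12 versus 3/12 responders) can be checked with the hypergeometric distribution. This is a minimal standard-library sketch: the function name is made up, and the two-sided value uses the common convention of summing all tables no more probable than the observed one. The trend test is not reproduced here.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """One-sided (upper tail) and two-sided Fisher exact p-values
    for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    one_sided = sum(prob(x) for x in range(a, hi + 1))
    two_sided = sum(prob(x) for x in range(lo, hi + 1)
                    if prob(x) <= p_obs * (1 + 1e-9))
    return one_sided, two_sided

# Rows: E (9 responders, 3 non-responders), C (3 responders, 9 non-responders).
one, two = fisher_exact_2x2(9, 3, 3, 9)
print(round(one, 4), round(two, 4))
```

The same table arises when EC blocks (3/12 responders) are compared with CE blocks (9/12), which is why the slide reports identical p-values for both comparisons.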
5. Correcting Selection Bias (4/5)





The Berger-Exner test of selection bias [16]
exploits this separation of effects, and is based on
the ability of P{E} to predict Y, adjusting for TG.
The quantity P{E} can also be used to correct for
selection bias, because there is no bias within a
group of patients with the same P{E} value.
That is, P{E} is a balancing score much like the
propensity score (used in observational studies).
P{E} functions as the propensity score, and was
termed the “reverse propensity score” (RPS) [17].
So compare TGs within P{E} values [17] to
ensure that the comparisons are free of bias.
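Both ideas can be illustrated on the 24-patient example from the earlier slide. The sketch below only tabulates response rates; it is not the Berger-Exner test statistic, and the data layout and variable names are assumptions made for illustration.

```python
from fractions import Fraction

# (treatment group, P{E}) -> (responders, patients), from the 24-patient example.
cells = {
    ("C", Fraction(0)): (0, 6),
    ("C", Fraction(1, 2)): (3, 6),
    ("E", Fraction(1, 2)): (3, 6),
    ("E", Fraction(1)): (6, 6),
}

def rate(pairs):
    """Pooled response rate over a list of (responders, patients) pairs."""
    return Fraction(sum(r for r, _ in pairs), sum(n for _, n in pairs))

# Unadjusted comparison: pool over P{E} within each treatment group.
crude = {tg: rate([v for (g, _), v in cells.items() if g == tg])
         for tg in ("E", "C")}

# Detection idea: does P{E} predict response within a treatment group?
within_tg = {tg: {pe: rate([v]) for (g, pe), v in cells.items() if g == tg}
             for tg in ("E", "C")}

# Correction idea: compare E with C only within a common P{E} value.
strata = {}
for pe in sorted({pe for _, pe in cells}):
    arms = {g: rate([v]) for (g, p), v in cells.items() if p == pe}
    if len(arms) == 2:               # both groups present in this stratum
        strata[pe] = arms["E"] - arms["C"]

print("crude difference:", crude["E"] - crude["C"])
print("response by P{E} within group:", within_tg)
print("difference within P{E} strata:", strata)
```

In these data the crude difference of 1/2 disappears in the only stratum that contains both groups (P{E}=1/2), while response rises with P{E} inside each group, the pattern that suggests selection bias rather than a treatment effect.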
5. Correcting Selection Bias (5/5)




That is, the suggestion is to use the RPS as a
covariate, although it is an unusual covariate.
We might call the RPS a “reverse causality”
covariate, because it does not bring about better
outcomes but rather suggests that the patient was
found to possess attributes that would do so.
So the RPS is a credential that reflects selection
based on all attributes, but is not itself an attribute.
Further work is needed to clarify if the RPS
should replace or supplement other covariates.
6. Further Reading (1/4)

More information is
available -- just send
me a message and I
will send you articles.

Vance Berger
Vb78c@nih.gov
(301) 435-5303


6. Further Reading (2/4)






[1]. Berger, V. W., Weinstein, S. (2004). Ensuring the Comparability of
Comparison Groups: Is Randomization Enough? Controlled Clinical
Trials 25, 515-524.
[2]. Schor, S. (1971). The University Group Diabetes Program: A
Statistician Looks at the Mortality Results. JAMA 217, 12, 1671-1675.
[3]. Altman, D. G. (1985). Comparability of Randomized Groups.
The Statistician 34, 125-136.
[4]. Fentiman, I. S., Rubens, R. D., Hayward, J. L. (1983). Control of
Pleural Effusions in Patients with Breast Cancer. Cancer 52, 737-739.
[5]. Hallstrom, A., Davis, K. (1988). Imbalance in Treatment
Assignments in Stratified Blocked Randomization. Controlled
Clinical Trials 9, 375-382.
[6]. Weiss, R. B., Gill, G. G., and Hudis, C. A. (2001). An On-Site
Audit of the South African Trial of High-Dose Chemotherapy for
Metastatic Breast Cancer and Associated Publications. Journal of
Clinical Oncology 19, 11, 2771-2777.
6. Further Reading (3/4)







[7]. Bezwoda, W. R., Seymour, L., and Dansey, R. D. (1995). High-Dose
Chemotherapy with Hematopoietic Rescue as Primary Treatment for
Metastatic Breast Cancer: A Randomized Trial. Journal of Clinical Oncology
13, 2483-2489.
[8]. Stevenson, H. C., Davis, G. (1994). Impact of Culturally Sensitive AIDS
Video Education on the AIDS Risk Knowledge of African American
Adolescents. AIDS Education and Prevention 6, 40-52.
[9]. Marcus, S. M. (2001). Sensitivity Analysis for Subverting Randomization in
Controlled Trials. Statistics in Medicine 20, 545-555.
[10]. Jordhoy, M. S., Fayers, P. M., Ahlner-Elmqvist, M., Kaasa, S. (2002).
Lack of Concealment May Lead To Selection Bias in Cluster Randomized
Trials of Palliative Care. Palliative Medicine 16, 43-49.
[11]. Matts, J. P. and McHugh, R. B. (1983). Conditional Markov chain design
for accrual clinical trials. Biometrical Journal 25, 563-577.
[12]. Berger, V. W., Christophi, C. A. (2003). Randomization Technique,
Allocation Concealment, Masking, and Susceptibility of Trials to
Selection Bias. JMASM 2, 1, 80-86.
[13]. Berger, V. W. (2004). Selection Bias and Baseline Imbalances in
Randomized Trials. Drug Information Journal 38, 1-2.
6. Further Reading (4/4)




[14]. Berger, V. W. (2005). Selection Bias and Covariate
Imbalances in Randomized Clinical Trials. John Wiley &
Sons, Chichester.
[15]. Berger, V. W. (2005). Quantifying the Magnitude of
Baseline Covariate Imbalances Resulting from Selection
Bias in Randomized Clinical Trials (with discussion).
Biometrical Journal 47, 2, 119-139.
[16]. Berger, V. W., Exner, D. V. (1999). Detecting Selection
Bias in Randomized Clinical Trials. Controlled Clinical
Trials 20, 319-327.
[17]. Berger, V. W. (2005). The Reverse Propensity Score
To Manage Baseline Imbalances in Randomized Trials.
Statistics in Medicine 24, in press.