PubH 7420 Clinical Trials: Supplemental Notes for Lectures 5 and 6

1. Friedman, Furberg, and DeMets. Fundamentals of Clinical Trials, Chapter 5 and
Chapter 16, pages 297-304.
Supplemental Reading References
1. Grizzle JE. A note on stratifying versus complete random assignment in clinical
trials. Cont. Clin. Trials, 3:365-368, 1982.
2. Meier P. Stratification in the design of a clinical trial. Cont. Clin. Trials,
1:355-361, 1981.
3. Scott NW, McPherson GC, Ramsay CR, Campbell MK. The method of
minimization for allocation in clinical trials: a review. Cont. Clin. Trials, 23:662-674, 2002.
4. Hallstrom A, Davis K. Imbalance in treatment assignments in stratified blocked
randomization. Cont. Clin. Trials, 9:375-382, 1988.
5. Kernan WN, Viscoli CM, Makuch RW et al. Stratified randomization for clinical
trials. J Clin Epidemiol 52:19-26, 1999.
6. Pocock SJ: Clinical Trials. A practical approach. John Wiley and Sons, Ltd.
Chapter 5 and Chapter 13, pages 216-220.
7. Meinert CL. Clinical Trials: Design, Conduct, and Analysis, Chapter 10.
Stratification (def.) - A procedure whereby factors which are known to be associated
with the response of interest (prognostic variables) are taken into account in the
randomization scheme, i.e., in the design of the study. Stratification aims to help ensure
that prognostic variables have the same distribution in all treatment groups.
Stratification is used to refer to restrictions on the randomization other than time
(blocking). In other words, blocking is a restriction placed on the randomization to
ensure the desired allocation ratio while stratification is a restriction to ensure
comparability of the treatment groups with respect to the stratifying variables. As noted
previously, in multi-clinic trials stratification on clinic is usually carried out, since the
types of patients can vary widely from clinic to clinic, as can use of concomitant
treatments and compliance with study treatment; it is not surprising to see a marked
clinic effect on the outcomes of interest. In the analysis, sites may have to be grouped,
e.g., by region, by size, or by type (HMO, university); otherwise, sparse strata could
result in a loss of power.
Stratum (def.) - a large group of experimental units more homogeneous than a randomly
assembled group of experimental units by virtue of classification on some variable or set
of variables at baseline.
Advantages:
- May prevent bias (an unfair treatment comparison) arising as a result of a chance
  imbalance between treatment groups on an important baseline prognostic factor.
- Will increase the precision (reduce the variance) of the treatment comparisons
  made.
- Will facilitate within stratum (subgroup) analysis since the treatments will be
  balanced.
- If important prognostic factors are balanced then the study will be subject to less
  criticism.
Disadvantages:
- Results in a randomization scheme which is more difficult to implement and
  therefore more prone to error. For example, in a multicenter trial if one is
  stratifying on three baseline variables and each variable has two possible
  outcomes, eight schedules would have to be prepared in advance for each clinic.
It is important to differentiate stratified randomization (also referred to as
pre-stratification or a stratified design) from post-stratification. Whether or not one uses
stratified randomization, post-stratification can be used in the analysis.
Post-Stratification (def.) - the classification of experimental units into strata after they
have been randomized for the purpose of data analysis. Usually strata are defined by
pre-randomization (baseline) measurements.
Stratified vs. Unstratified Design - Considerations
1. Size of the study; the gain in statistical efficiency is minimal for studies with
   more than 50 patients.
2. Stratifying variables should be easily observed or measured prior to
   randomization; variables used for stratification should be relatively free of
   measurement error.
3. Risk of errors in carrying out the mechanics of randomization is greater with
   stratification. Stratifying variables that involve complicated computations or
   interpretations should be avoided.
4. The gain in statistical efficiency is small unless the stratifying variables are
   strongly related to the outcome variables.
5. The desired allocation ratios may not be achieved if several strata are used with
   only a small number of patients per stratum.
6. It is unreasonable to expect to control for all prognostic variables in the design.
   Post-stratification in the analysis using variables that were not considered in the
   design is usually necessary.
Most clinical trials that employ stratification in the design do so to ensure balance with
respect to important prognostic factors. Also, for most trials with stratified designs,
sample size is estimated based on the overall number to be enrolled (all strata pooled),
and the analysis stipulates that treatment differences will be pooled across strata. Very
few trials are designed to provide high power of detecting treatment differences within
each stratum.
Usual Implementation
Block randomization within stratum, i.e., a separate randomization schedule is prepared
for each stratum. Note that it does not really make any sense to stratify unless the
treatments are assigned within stratum by block randomization or an equivalent scheme
since one of the aims is to avoid chance imbalances. For this same reason, usually the
block size chosen is relatively small so that balance is achieved in small strata.
Example: A multi-clinic trial (2 clinics) with 2 treatments (A and B) and 20 patients
expected in each clinic. We would like equal allocation to treatments A and B and also
want to ensure that a similar number of men and women receive each treatment, i.e.,
stratify by clinic and gender, giving 4 strata.
Schedules for Clinic #1

Accession No.        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Schedule for Men     B  A  A  B  A  B  B  A  B  A  B  A  B  A  B  A  A  B  B  A
Schedule for Women   B  A  B  A  A  B  A  B  B  A  B  A  B  A  B  A  A  B  A  B

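* Generated using randomly mixed blocks of size 2 and 4, chosen with random digits:
  - First digit selects the block size: 1-5 => block size 2; 6-0 => block size 4.
  - Block size 2: next digit 1-5 => AB; 6-0 => BA.
  - Block size 4: next digit 1-6 => use the corresponding permutation as in the previous
    example; skip digits 7-0.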
Schedules for Clinic #2

Accession No.        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Schedule for Men     B  A  B  A  A  A  B  B  B  A  A  B  A  B  B  A  A  A  B  B
Schedule for Women   B  A  B  A  B  A  A  B  A  B  A  B  A  B  A  A  B  B  B  A

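
The schedules above could be produced by a short program rather than a random digit table. The following is a minimal sketch, not the procedure used for these notes: it assumes block sizes 2 and 4 are chosen with equal probability, and the function names, the seed, and the truncation of the final block at 20 assignments are illustrative choices.

```python
import random

def blocked_schedule(n, block_sizes=(2, 4), treatments=("A", "B"), rng=None):
    """Return n assignments built from randomly mixed permuted blocks."""
    rng = rng or random.Random()
    schedule = []
    while len(schedule) < n:
        size = rng.choice(block_sizes)                 # assumed: sizes equally likely
        block = list(treatments) * (size // len(treatments))
        rng.shuffle(block)                             # random permutation within block
        schedule.extend(block)
    return schedule[:n]                                # last block may be cut short

def stratified_schedules(strata, n_per_stratum, seed=None):
    """Prepare one separate blocked schedule per stratum."""
    rng = random.Random(seed)
    return {s: blocked_schedule(n_per_stratum, rng=rng) for s in strata}

# Four strata: 2 clinics x 2 sexes, 20 accession numbers each.
strata = [(clinic, sex) for clinic in (1, 2) for sex in ("Men", "Women")]
schedules = stratified_schedules(strata, 20, seed=7420)
for stratum, sched in schedules.items():
    print(stratum, " ".join(sched))
```

Because every block is balanced, the imbalance within a stratum can never exceed half the largest block size at any point in the sequence.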
This method of stratification can be self-defeating in a small study with several strata,
i.e., over-stratification. More generally, Therneau (Cont Clin Trials, 1993;14:98-108)
has shown that problems can arise when the number of strata (distinct combinations of
factor levels) becomes large relative to the sample size.
A couple of examples:
Example 1: (Lancet 2000, 356:1521-1522, Letter concerning SYMPHONY study).
Stratification was carried out by site (670 sites) and indication for treatment (myocardial
infarction or unstable angina) – 1,340 total strata, and a block size of 6 was used. Even
though 9,233 patients were randomized, imbalances arose across the indication
grouping and this resulted in queries about their report. The imbalances likely arose
because not all of the blocks were filled, i.e., many sites enrolled fewer than 6 patients
with each indication.
Example 2: A study of testicular cancer with 2 treatments (A and B) and 3 stratification
factors: 1) stage (I or II), 2) histology (teratocarcinoma, embryonal carcinoma,
elements of choriocarcinoma), and 3) age (<15, ≥15), giving 2 x 3 x 2 = 12 schedules.
Reference: Tagnon HJ, Staquet MJ, editors. Controversies in Cancer: Design of Trials
and Treatment. Blocks of size 6 were used and the following 12 schedules were
prepared:

Histology             Stage I, <15     Stage I, ≥15     Stage II, <15    Stage II, ≥15
Teratocarcinoma       A* A* B A B B    A* A* A* B B B   A* A* A* B B B   B* A* A* B B A
Embryonal carcinoma   A* A* B B B A    B B A A A B      B* B* A A B A    A* B* B* B* A A
Choriocarcinoma       B* B A A B A     B A A B B A      A* B* B* B* A A  B* B* A A A B

The asterisk corresponds to the 26 patients randomized. The number of patients in
each stratum is given below:

                          A     B
Histology:
  Teratocarcinoma        10     1
  Embryonal carcinoma     3     5
  Choriocarcinoma         1     6
Stage:
  I                       7     1
  II                      7    11
Age:
  <15                     8     6
  ≥15                     6     6
Total                    14    12

Note that the distributions of histology and stage are very unbalanced. Could solve this
problem by using smaller block sizes, fewer stratifying variables or an adaptive
stratification scheme.
Minimization
An adaptive stratification procedure developed to cope with the problem of a small study
and several strata. This approach was described in papers by Taves (Clin Pharmacol
Ther, 1974) and Pocock and Simon (Biometrics, 1975). Instead of trying to achieve
balance with respect to treatment assignments for all possible combinations of the
prognostic variables (in our example all 12 strata), the minimization procedure restricts
its aim to equalizing treatment numbers at the different levels of each variable taken
separately. This is accomplished by choosing the treatment for each new patient
entering the study in such a way that the "treatment imbalance" after admitting that
patient is as small as possible.
Let $x_{ik}$ = the number of patients already assigned treatment $k$ who have the same
level of prognostic factor $i$ as the new patient, where
  $k$ = 1, 2 (in our case A, B) indexes the treatments, and
  $i$ = 1, 2, ..., $f$ indexes the prognostic factors.

Let
$$x_{itk} = \begin{cases} x_{ik} & \text{if } t \ne k \\ x_{ik} + 1 & \text{if } t = k \end{cases}$$

$x_{itk}$ represents the change in the balance of allocation if the new patient is assigned to
treatment $t$: it is the number that would then be on treatment $k$ at the new patient's level
of factor $i$.

$B(t)$ = a function of the $x_{itk}$'s which measures the "lack of balance" over all prognostic
factors if the next patient is assigned treatment $t$.

In his book, Pocock considers the simple case of
$$B(t) = \sum_{i=1}^{f} x_{it}$$
A similar rule is:
$$B(t) = \sum_{i=1}^{f} \text{Range}(x_{it1}, x_{it2})$$
where the range is the absolute difference between the largest and smallest values of
$x_{it1}$ and $x_{it2}$. Our rule will be to use the treatment with the smallest $B(t)$.
Example (Pocock, page 85):

                                               Number on each treatment
Factor                       Level             A       B       Next patient
Performance status           Ambulatory        30      31      x
                             Non-ambulatory    10       9
Age                          < 50              18      17      x
                             ≥ 50              22      23
Disease-free interval        < 2 years         31      32
                             ≥ 2 years          9       8      x
Dominant metastatic lesion   Visceral          19      21      x
                             Osseous            8       7
                             Soft tissue       13      12

2 x 2 x 2 x 3 = 24 strata; x denotes the characteristics of the next patient
to be randomized.
1. Determine B(1), i.e., the lack of balance if the next patient is assigned treatment 1 (A).
Here x_ik is the current count and x_i1k is the count after assigning treatment 1.

                               k     x_ik    x_i1k    Range(x_i11, x_i12)
i)   Factor 1, Level 1         1     30      31       |31 - 31| = 0
                               2     31      31
ii)  Factor 2, Level 1         1     18      19       |19 - 17| = 2
                               2     17      17
iii) Factor 3, Level 2         1      9      10       |10 - 8| = 2
                               2      8       8
iv)  Factor 4, Level 1         1     19      20       |20 - 21| = 1
                               2     21      21

B(1) = 0 + 2 + 2 + 1 = 5

2. Determine B(2), i.e., the lack of balance if the next patient is assigned treatment 2 (B).
Here x_i2k is the count after assigning treatment 2.

                               k     x_ik    x_i2k    Range(x_i21, x_i22)
i)   Factor 1, Level 1         1     30      30       |30 - 32| = 2
                               2     31      32
ii)  Factor 2, Level 1         1     18      18       |18 - 18| = 0
                               2     17      18
iii) Factor 3, Level 2         1      9       9       |9 - 9| = 0
                               2      8       9
iv)  Factor 4, Level 1         1     19      19       |19 - 22| = 3
                               2     21      22

B(2) = 2 + 0 + 0 + 3 = 5
Since B(1) = B(2) toss a coin for the next patient
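
The arithmetic for B(1) and B(2) can be checked with a few lines of code. This is a minimal sketch of the range rule only; the dictionary layout and the function name are illustrative assumptions, and the counts are the marginal totals at the new patient's factor levels from the Pocock example.

```python
# Current counts (A, B) at the new patient's level of each prognostic factor.
counts = {
    "performance status: ambulatory": (30, 31),
    "age: < 50": (18, 17),
    "disease-free interval: 2 years or more": (9, 8),
    "dominant lesion: visceral": (19, 21),
}

def lack_of_balance(counts, t):
    """B(t) = sum over factors of Range(x_it1, x_it2); t = 0 for A, 1 for B."""
    total = 0
    for x in counts.values():
        updated = list(x)
        updated[t] += 1                      # x_itk = x_ik + 1 when k == t
        total += max(updated) - min(updated)
    return total

b_a = lack_of_balance(counts, 0)
b_b = lack_of_balance(counts, 1)
print(b_a, b_b)  # 5 5
```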
Generalizations of this procedure
1. $B(t) = \sum_{i=1}^{f} w_i \, \text{Range}(x_{it1}, x_{it2})$,
where $w_i$ is the relative importance of factor $i$ to the other factors
2. Assign the patient with probability > 1/2 to the "preferred" treatment.
The major disadvantage of this method is that it is more difficult to implement.
For additional reading see Taves DR: Minimization: a new method of assigning patients
to treatment and control groups. Clin Pharmacol Ther, 15:443-453, 1974 and Pocock
SJ, Simon R: Sequential treatment assignment with balancing for prognostic factors in
the controlled clinical trial. Biometrics, 31:103-115, 1975.
Implementation of Minimization
Requirements:
1. Need an easy way to update the marginal totals for each treatment after each
   randomization so that the data are available for the next patient.
   - Pocock proposes index cards
   - If central randomization is used, a computer could be used
   - If each center has a microcomputer, the calculation of even complex
     B(t)'s can be accomplished quickly
2. Having determined B(t), need an assignment procedure based on a pre-chosen value of P,
   where P = Prob (assign "preferred" treatment)
Examples:
1. P = 1 if B(1) ≠ B(2)
   P = 1/2 if B(1) = B(2), i.e., simple randomization
2. P = 2/3 if B(1) ≠ B(2)
   P = 1/2 if B(1) = B(2)
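
The assignment step with a pre-chosen P might be sketched as follows. The function name and argument layout are hypothetical; the "preferred" treatment is taken to be the one with the smaller B(t), and ties fall back to a fair coin.

```python
import random

def assign(b, p=2/3, rng=random):
    """Return the index (0 or 1) of the treatment for the next patient.

    b   : lack-of-balance scores [B(1), B(2)]
    p   : probability of assigning the preferred (smaller B) treatment
    rng : random module or a random.Random instance (for reproducibility)
    """
    if b[0] == b[1]:
        return rng.choice([0, 1])            # tie: simple randomization
    preferred = 0 if b[0] < b[1] else 1
    return preferred if rng.random() < p else 1 - preferred

# Example: B(1) = 3, B(2) = 5, so treatment 1 (index 0) is preferred.
rng = random.Random(0)
picks = [assign([3, 5], p=2/3, rng=rng) for _ in range(10000)]
print(picks.count(0) / len(picks))           # close to 2/3
```

Setting p = 1 gives the deterministic version of minimization (Example 1); values below 1 keep some unpredictability in every assignment.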
Stratification and Variance Reduction
How much of an increase in precision is expected with stratification? How much of a
price does one pay for trusting randomization to achieve reasonable balance? To
address these questions, consider the relative efficiency (RE) of two designs:
$$RE = \frac{\text{Var(treatment contrast with stratification)}}{\text{Var(treatment contrast with no stratification, but post-stratified analysis)}}$$
Consider the comparison of 2 treatments, A and B, and a single dichotomous prognostic
factor, S. Also assume a balanced design, i.e., an equal number of patients given A
and B.
Grizzle (Cont. Clinical Trials, 1982) considered the question of how much loss in
efficiency occurred by trusting unstratified randomization to achieve reasonable
balance. He considered the situation of a continuous response variable with equal
variances at each level of the prognostic factor and showed that the relative efficiency
(RE) as defined above could be written as:
$$RE = \left\{ \frac{n_A g + n_B h}{n_A g(1-g) + n_B h(1-h)} \left[ 1 - \frac{n_A g + n_B h}{n_A + n_B} \right] \right\}^{-1}$$
where
$n_A$ = total number randomly assigned to A.
$n_B$ = total number randomly assigned to B.
$g$ = fraction of those given A at level 1 of prognostic factor.
$h$ = fraction of those given B at level 1 of prognostic factor.
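
                 Treatment
Stratum      A              B
1            n_A g          n_B h
2            n_A (1-g)      n_B (1-h)
Total        n_A            n_B

If the average responses in each cell are denoted $\bar{Y}_{1A}$, $\bar{Y}_{1B}$, $\bar{Y}_{2A}$ and $\bar{Y}_{2B}$, then
$$\text{Var}(\bar{Y}_{1A} - \bar{Y}_{1B}) = \left( \frac{1}{n_A g} + \frac{1}{n_B h} \right) \sigma^2$$
and:
$$\text{Var}(\bar{Y}_{2A} - \bar{Y}_{2B}) = \left( \frac{1}{n_A (1-g)} + \frac{1}{n_B (1-h)} \right) \sigma^2$$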
Grizzle's RE follows from noting that the overall estimate of the difference between A
and B is a weighted average of the stratum-specific differences above, with weights
inversely proportional to their variances.
The weights are:
$$\left( \frac{1}{n_A g} + \frac{1}{n_B h} \right)^{-1} = \frac{n_A g \, n_B h}{n_A g + n_B h}$$
and:
$$\frac{n_A (1-g) \, n_B (1-h)}{n_A (1-g) + n_B (1-h)}$$
For the stratified design, g = h, and for that situation the weights are maximized.
Note that RE = 1 when g = h. When $n_A = n_B$, RE simplifies to:
$$RE = \left\{ \frac{g + h}{g(1-g) + h(1-h)} \left[ 1 - \frac{g + h}{2} \right] \right\}^{-1}$$
Suppose g = 2h, i.e., the prevalence of the prognostic factor is twice as high for
treatment A as for treatment B.

g, h            RE
0.10, 0.05      0.99
0.25, 0.125     0.97
0.50, 0.25      0.93
0.75, 0.375     0.86

Suppose g = 1-h. Then:
RE = 4g(1-g).

g, h          RE
0.6, 0.4      0.96
0.7, 0.3      0.84
0.8, 0.2      0.64
0.9, 0.1      0.36

Grizzle concludes that "for sample sizes of 50 or more, and a prevalence of the
prognostic factor of 0.5, large deviations are unlikely, which implies that randomization
can be trusted to prevent large losses in efficiency in this case."
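
Grizzle's relative efficiency is easy to evaluate for any combination of g and h. The sketch below codes the general formula for RE given above; the function name and the default group sizes are illustrative assumptions (with equal group sizes only the ratio matters, so both default to 1).

```python
def grizzle_re(g, h, n_a=1.0, n_b=1.0):
    """Relative efficiency of unstratified vs. stratified randomization.

    g, h     : fraction of groups A and B at level 1 of the prognostic factor
    n_a, n_b : numbers randomized to A and B
    """
    s = n_a * g + n_b * h                          # patients at level 1
    within = n_a * g * (1 - g) + n_b * h * (1 - h)
    return within / (s * (1 - s / (n_a + n_b)))

# Case g = 2h
for g, h in [(0.10, 0.05), (0.25, 0.125), (0.50, 0.25), (0.75, 0.375)]:
    print(g, h, round(grizzle_re(g, h), 2))

# Case g = 1 - h, where RE reduces to 4g(1 - g)
for g in (0.6, 0.7, 0.8, 0.9):
    print(g, round(1 - g, 1), round(grizzle_re(g, 1 - g), 2))
```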
One can calculate the probability of obtaining a certain imbalance before the study
begins. This can be used to decide whether to stratify the randomization. With 2 strata
and 2 groups, the calculation is based on the hypergeometric distribution:
$$p(t) = \frac{\binom{N_a}{t} \binom{N_b}{t_1 - t}}{\binom{N_a + N_b}{t_1}}$$
p(t) is the probability of randomizing t patients to group A when there are $t_1$ patients in
stratum 1. For a certain imbalance one can sum over all p(t) for t's that give that
imbalance or worse.
e.g., Na = 100, Nb = 100, t1 = 40, g = 0.16, h = 0.24
$$p(t) = \frac{\binom{100}{t} \binom{100}{40 - t}}{\binom{200}{40}}$$

             Stratum 1    Stratum 2    Total
Group A      16           84           100
Group B      24           76           100
Total        40           160          200

Want the probability of obtaining the imbalance given by g = 0.16, h = 0.24 or worse:
$$\sum_{t \le 16 \ \text{or} \ t \ge 24} p(t) = 0.216$$
This probability can be obtained using the SAS function PROBHYPR or by creating a
data set with cell counts as above and then obtaining the 2-tailed probability for the
Fisher exact test.
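
The same probability can be computed directly from the hypergeometric formula without SAS. This is a minimal sketch using only the Python standard library; the function name is an assumption, and the sum runs over all splits of the 40 stratum-1 patients that are at least as unbalanced as 16 versus 24.

```python
from math import comb

def p_t(t, n_a, n_b, t1):
    """Probability that t of the t1 stratum-1 patients fall in group A."""
    return comb(n_a, t) * comb(n_b, t1 - t) / comb(n_a + n_b, t1)

n_a = n_b = 100
t1 = 40

# Imbalance of 16 vs. 24 or worse, in either direction.
prob = sum(p_t(t, n_a, n_b, t1) for t in range(t1 + 1) if t <= 16 or t >= 24)
print(round(prob, 3))
```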
Probability of Given Imbalance

                        N
g       h       50      100     1000
.52     .48     1       .84     .23
.55     .45     .57     .42     .002
.60     .40     .25     .07     -
.70     .30     .01     -       -

Fleiss (The Design and Analysis of Clinical Experiments, John Wiley & Sons) argues that only
when the relative efficiency of the unstratified design to the stratified design exceeds
130% is the added effort in setting up the design worth it. The relative efficiency of an
unstratified to a stratified design can be determined by computing the following variance
estimates, with sums taken over the 2 treatments and the S strata.

Let $s^2$ = pooled variance across treatment and stratum (S):
$$s^2 = \frac{\sum_j \sum_i (n_{ji} - 1) s_{ji}^2}{n_{..} - 2S}$$
Let $s_{NS}^2$ = pooled variance across treatments, but no stratification (NS):
$$s_{NS}^2 = \frac{1}{n_{..} - 2} \left[ (n_{..} - 2S) s^2 + \sum_i \sum_j n_{ji} (\bar{x}_{ji} - \bar{x}_{..})^2 \right]$$
Note that the RE ($s_{NS}^2 / s^2$) depends on the sum of squared deviations of the
stratum-specific means about the overall mean. As the variability among
stratum-specific means increases, more consideration should be given to stratification in
the design and/or in the analysis.
Now consider a Bernoulli response variable and the impact of post-stratification on RE.
Example of Post-Stratification:
Brown et al, Lancet, 227-230, 1960, Clinical Trial of Tetanus Anti-toxin in Treatment of
Tetanus; see also Meier (Controlled Clinical Trials, 1981).

             Antitoxin (A)    No Antitoxin (B)    Total
Alive        21               9                   30
Dead         20               29                  49
Total        41               38                  79

$\hat{p}$ = overall death rate = 49/79 = 0.620
$\hat{p}_A$ = 20/41 = 0.488
$\hat{p}_B$ = 29/38 = 0.763
$\hat{p}_A - \hat{p}_B = -0.275$
$$\widehat{\text{Var}}(\hat{p}_A - \hat{p}_B) = \left( \frac{1}{n_A} + \frac{1}{n_B} \right) \hat{p}(1 - \hat{p}) = 0.01195$$
$\text{SE}(\hat{p}_A - \hat{p}_B) = 0.109$
Time from first symptoms to admission turned out to be an important prognostic factor,
and since the anti-toxin group had a smaller fraction of high risk patients (28/41 = 0.68)
than the control group (30/38 = 0.79), post-stratification was carried out.

Stratum 1: < 72 Hours
             A      B
Alive        10     4
Dead         18     26
Total        28     30

Stratum 2: ≥ 72 Hours
             A      B
Alive        11     5
Dead         2      3
Total        13     8

Stratum 1 (< 72 hours):
$\hat{p}_{1A}$ = 18/28 = 0.643
$\hat{p}_{1B}$ = 26/30 = 0.867
$\hat{p}_1$ = 44/58 = 0.759
$\hat{p}_{1A} - \hat{p}_{1B} = -0.224$
$\widehat{\text{Var}}(\hat{p}_{1A} - \hat{p}_{1B}) = 0.01263$
$\text{SE}(\hat{p}_{1A} - \hat{p}_{1B}) = 0.112$

Similarly for Stratum 2 (≥ 72 hours):
$\hat{p}_2$ = 5/21 = 0.238
$\hat{p}_{2A} - \hat{p}_{2B} = -0.221$
$\widehat{\text{Var}}(\hat{p}_{2A} - \hat{p}_{2B}) = 0.03662$
$\text{SE}(\hat{p}_{2A} - \hat{p}_{2B}) = 0.191$
Let $(p_A - p_B)_W$ denote the weighted difference between treatments A and B.
Let G = fraction of patients in Stratum 1 (< 72 hours) = 58/79 = 0.734
$$(\hat{p}_A - \hat{p}_B)_W = G(\hat{p}_{1A} - \hat{p}_{1B}) + (1 - G)(\hat{p}_{2A} - \hat{p}_{2B}) = -0.223$$
compared to -0.275 for the unweighted difference.
$$\widehat{\text{Var}}(\hat{p}_A - \hat{p}_B)_W = G^2 \, \widehat{\text{Var}}(\hat{p}_{1A} - \hat{p}_{1B}) + (1 - G)^2 \, \widehat{\text{Var}}(\hat{p}_{2A} - \hat{p}_{2B}) = 0.00938$$
$\text{SE}(\hat{p}_A - \hat{p}_B)_W = 0.097$
A 22% reduction in the variance is achieved with post-stratification:
$$RE = \frac{\text{Var (Post-Stratification)}}{\text{Var (No Stratification)}} = \frac{0.00938}{0.01195} = 0.78$$
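
The post-stratified calculation for the tetanus data can be reproduced in a few lines. This is a sketch under the same assumptions as the hand calculation (pooled death rate within each stratum, weights equal to the stratum fractions); the variable names are illustrative, and the unrounded results may differ from the figures above in the last digit.

```python
# (deaths, group size) by stratum of time from first symptoms to admission
strata = {
    "under 72 h":   {"A": (18, 28), "B": (26, 30)},
    "72 h or more": {"A": (2, 13),  "B": (3, 8)},
}
n_total = sum(n for s in strata.values() for _, n in s.values())   # 79

diff_w = 0.0   # weighted difference in death rates
var_w = 0.0    # variance of the weighted difference
for s in strata.values():
    (d_a, n_a), (d_b, n_b) = s["A"], s["B"]
    weight = (n_a + n_b) / n_total                 # G or 1 - G
    p_pooled = (d_a + d_b) / (n_a + n_b)
    diff_w += weight * (d_a / n_a - d_b / n_b)
    var_w += weight**2 * (1 / n_a + 1 / n_b) * p_pooled * (1 - p_pooled)

# Unstratified analysis for comparison
d_a, n_a, d_b, n_b = 20, 41, 29, 38
p = (d_a + d_b) / (n_a + n_b)
var_ns = (1 / n_a + 1 / n_b) * p * (1 - p)

rel_eff = var_w / var_ns
print(round(diff_w, 3), round(var_w, 5), round(var_ns, 5), round(rel_eff, 2))
```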
Question: How much would one have gained in precision if stratification had been used in
the design? To consider this question for this example, force balance
within stratum and assume the pij's do not change.
Meier considered this question and found little gain in precision for this example.
Meier also showed that in general the loss of efficiency resulting from a disproportion of
patients given A and B in a particular stratum compared to using a stratified design was
small.
Suppose there are 2n patients in stratum 1. In a stratified design
$$\text{Var}(p_{A1} - p_{B1}) = \left( \frac{1}{n_{A1}} + \frac{1}{n_{B1}} \right) p_1 (1 - p_1)$$
and $n_{A1} = n_{B1} = n$ because of the stratified design; therefore
$$\text{Var}(p_{A1} - p_{B1}) = p_1 (1 - p_1) \left( \frac{1}{n} + \frac{1}{n} \right) = p_1 (1 - p_1) \left( \frac{2}{n} \right)$$
Without stratification and with 2n patients in stratum 1, of whom n + h receive one
treatment and n - h the other,
$$\text{Var}(p_{A1} - p_{B1}) = \left( \frac{1}{n + h} + \frac{1}{n - h} \right) p_1 (1 - p_1)$$
so that
$$RE = \frac{\text{Var (Stratified Design)}}{\text{Var (Unstratified Design and Post-Stratification)}} = \frac{2/n}{\dfrac{1}{n+h} + \dfrac{1}{n-h}} = \frac{2/n}{2n/(n^2 - h^2)} = 1 - h^2/n^2$$

n = 10                           n = 100
Allocation    h     RE           Allocation    h      RE
(11, 9)       1     .99          (105, 95)     5      .9975
(12, 8)       2     .96          (110, 90)     10     .99
(14, 6)       4     .84          (120, 80)     20     .96
(15, 5)       5     .75          (130, 70)     30     .91

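
Meier's result can be tabulated for any stratum size and imbalance. The following is a minimal sketch; the function name is an assumption, and it computes the ratio of the two variance factors directly rather than the simplified form, so the algebra can be checked numerically.

```python
def meier_re(n, h):
    """RE for a stratum of 2n patients split (n + h, n - h) instead of (n, n)."""
    stratified = 2 / n                          # 1/n + 1/n
    unstratified = 1 / (n + h) + 1 / (n - h)
    return stratified / unstratified            # equals 1 - h**2 / n**2

for n, hs in [(10, (1, 2, 4, 5)), (100, (5, 10, 20, 30))]:
    for h in hs:
        print((n + h, n - h), h, round(meier_re(n, h), 4))
```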
Conclusion:
In most situations post-stratification does not result in much loss of precision compared
to stratification in the design.
Now consider how the following factors influence the variance when the response is
Bernoulli.
1. Prevalence of prognostic factor (G)
2. Importance of prognostic factor (p1. vs. p2.)
Assume the response variable is Bernoulli (success or failure, alive or dead). The
layout of the study in terms of key parameters is as follows:

                      Treatment
                   A        B        Total
Stratum 1 (S1)     p1A      p1B      p1.
Stratum 2 (S2)     p2A      p2B      p2.
Total              pA       pB       p

The hypothesis of interest is pA = pB
pij = probability of success in stratum i for treatment j.
p = total success rate in the population and pA and pB are the overall success rates for
treatments A and B, respectively.
Let G = fraction of patients in stratum 1 in the population
After conducting the study one observes the following:

Stratum 1
             A              B
Success      X1A            X1B
Failure      n1A-X1A        n1B-X1B
Total        n1A            n1B

Stratum 2
             A              B
Success      X2A            X2B
Failure      n2A-X2A        n2B-X2B
Total        n2A            n2B

TOTAL
             A                        B
Success      X1A + X2A                X1B + X2B
Failure      n1A + n2A - X1A - X2A    n1B + n2B - X1B - X2B
Total        nA                       nB

Xij = the number of successes on treatment j in stratum i
nij = the number of patients in stratum i given treatment j
N = total number of patients
NA = NB = N/2 = no. of patients given A and B (assume the randomization
is restricted to assure NA = NB)
Consider estimates of Var(pA-pB) for 2 situations:
1. no stratification and
2. stratification on S.

1. No stratification
$$\text{Var}(p_A - p_B) = p(1 - p) \left( \frac{1}{N_A} + \frac{1}{N_B} \right) = p(1 - p) \frac{4}{N}$$
Note that p can also be written as a weighted average of the $p_{i.}$:
$p = G p_{1.} + (1 - G) p_{2.}$; therefore
$$\text{Var}(p_A - p_B) = [G p_{1.} + (1 - G) p_{2.}][1 - G p_{1.} - (1 - G) p_{2.}] \frac{4}{N}$$
2. Stratification on S
$$p_A = \frac{n_{1A} p_{1A} + n_{2A} p_{2A}}{N_A}$$
$$\text{Var}(p_A) = \frac{n_{1A} p_{1A}(1 - p_{1A}) + n_{2A} p_{2A}(1 - p_{2A})}{N_A^2} = \frac{G p_{1A}(1 - p_{1A}) + (1 - G) p_{2A}(1 - p_{2A})}{N_A}$$
$$\text{Var}(p_A - p_B) = \frac{2}{N} \left[ G p_{1A}(1 - p_{1A}) + (1 - G) p_{2A}(1 - p_{2A}) + G p_{1B}(1 - p_{1B}) + (1 - G) p_{2B}(1 - p_{2B}) \right]$$
Substituting $p_{1.}$ for $p_{1A}$ and $p_{1B}$, and $p_{2.}$ for $p_{2A}$ and $p_{2B}$, we have
$$\text{Var}(p_A - p_B) = \frac{4}{N} \left[ G p_{1.}(1 - p_{1.}) + (1 - G) p_{2.}(1 - p_{2.}) \right]$$
$$RE = \frac{G p_{1.}(1 - p_{1.}) + (1 - G) p_{2.}(1 - p_{2.})}{[G p_{1.} + (1 - G) p_{2.}][1 - G p_{1.} - (1 - G) p_{2.}]}$$
Note if $p_{1.} = p_{2.}$, then RE = 1
Consider RE for various values of p1., p2. and G

G        p1.      p2.      RE
0.50     0.6      0.3      0.91
0.20     0.6      0.3      0.94
0.10     0.6      0.3      0.96
0.50     0.6      0.2      0.83
0.20     0.6      0.2      0.87
0.10     0.6      0.2      0.92
0.50     0.8      0.2      0.64
0.20     0.8      0.2      0.74
0.10     0.8      0.2      0.83
0.50     0.1      0.05     0.991
0.20     0.1      0.05     0.992
0.10     0.1      0.05     0.996
0.50     0.1      0.01     0.96
0.20     0.1      0.01     0.95
0.10     0.1      0.01     0.96

The reduction in variance achieved with stratification depends on:
1. G, the distribution of the prognostic factor in the population
2. The difference in p1 and p2, i.e., the relative strength of the prognostic factor
3. p, the overall success rate in the population
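
The dependence of RE on G, p1. and p2. can be explored numerically. This is a minimal sketch of the RE formula for a Bernoulli response derived above; the function and argument names are illustrative.

```python
def bernoulli_re(G, p1, p2):
    """RE of a stratified vs. unstratified design for a Bernoulli response.

    G      : fraction of the population in stratum 1
    p1, p2 : success rates in strata 1 and 2 (pooled over treatments)
    """
    within = G * p1 * (1 - p1) + (1 - G) * p2 * (1 - p2)
    p = G * p1 + (1 - G) * p2                  # overall success rate
    return within / (p * (1 - p))

for p1, p2 in [(0.6, 0.3), (0.6, 0.2), (0.8, 0.2), (0.1, 0.05), (0.1, 0.01)]:
    for G in (0.50, 0.20, 0.10):
        print(G, p1, p2, round(bernoulli_re(G, p1, p2), 3))
```

RE stays close to 1 unless the strata differ sharply in success rate and neither stratum is rare.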
Example MRFIT - Although not seriously considered, a logical stratification variable
would have been pc, the estimated six year probability of dying from coronary heart
disease in the control group based on Framingham data.
$$p_c = \frac{1}{1 + \exp[-B_0 - B_1 X_1 - B_2 X_2 - B_3 X_3]}$$
where
X1 = serum cholesterol level
X2 = diastolic BP
X3 = cigarettes smoked per day
pe = experimental group event rate
Consider 4 strata - Framingham Data

Class of pc      No.     Percent    mean pc    mean pe    p(1-p)
< .0225          122     44.4       .0188      .0094      0.0139
.0225-.0274      52      18.9       .0248      .0134      0.0187
.0275-.0324      32      11.6       .0296      .0158      0.0222
≥ .0325          69      25.1       .0436      .0213      0.0314
TOTAL            275     100.0      .0274      .0139      0.0202

Let A denote the Special Intervention group and B denote the Usual Care group; then,
using the notation previously developed,
$$\widehat{\text{Var}}(\hat{p}_A - \hat{p}_B) \text{ without stratification} = \left( \frac{1}{n_A} + \frac{1}{n_B} \right) (0.0202) = \frac{4}{N} (0.0202) \quad \text{with } n_A = n_B = \frac{N}{2}$$
(in MRFIT N was 12,000 and nA and nB = 6000)
$$\widehat{\text{Var}}(\hat{p}_A - \hat{p}_B) \text{ with stratification} = \frac{4}{N} [0.444(0.0139) + 0.189(0.0187) + 0.116(0.0222) + 0.251(0.0314)] = \frac{4}{N} (0.0199)$$
RE = 0.0199/0.0202 = 0.987
In a small study with very important prognostic factors, one-to-one matching of patients
is an alternative to stratification.
E.g., randomized block matched pairs experiment: Comparison of Imipramine and
placebo for the treatment of depression (Fleiss, The Design and Analysis of Clinical
Experiments, p. 121).
- 60 patients were paired to form 30 matched pairs or blocks.
- matching was based on time of entry to the study (within one month), sex and age.
- one member of each pair was randomly assigned to receive Imipramine, an
  anti-depressant drug; the other patient received placebo.
- the study was double-blind.
- primary aim of the blocking is to reduce the variability between patients and
  eliminate the chance of an imbalance on these important characteristics.
Findings for Hamilton rating scale for depression:

            Imipramine    Placebo    Matched Pair Difference
No.         30            30         30
Mean        6.3           7.6        -1.27
SD          2.4           2.6        2.92

$\text{SE}(\bar{d})$ for matched design: $2.92 / \sqrt{30} = 0.53$
$\text{SE}(\bar{X}_1 - \bar{X}_2)$ ignoring matching $= s_p \sqrt{\frac{1}{30} + \frac{1}{30}} = 2.50 \sqrt{2/30} = 0.64$
RE (blocking : no blocking) = $(0.53)^2 / (0.64)^2 = 0.69$
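
The matched-pairs comparison can be recomputed from the summary statistics. This is a sketch only: the variable names are illustrative, the pooled SD is taken as the root mean square of the two group SDs (appropriate for equal group sizes), and the unrounded RE differs slightly from the hand calculation, which uses the rounded standard errors.

```python
from math import sqrt

n_pairs = 30
sd_imipramine, sd_placebo = 2.4, 2.6
sd_diff = 2.92                                   # SD of the 30 within-pair differences

se_matched = sd_diff / sqrt(n_pairs)             # uses the pairing
s_pooled = sqrt((sd_imipramine**2 + sd_placebo**2) / 2)
se_unmatched = s_pooled * sqrt(1 / n_pairs + 1 / n_pairs)   # ignores the pairing

rel_eff = se_matched**2 / se_unmatched**2
print(round(se_matched, 2), round(s_pooled, 2), round(rel_eff, 2))
```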
Summary Remarks on Stratification
1. Only moderate increases in power are obtained with stratification when the
   response is Bernoulli.
2. Stratification is more important with small sample sizes since there is a greater
   probability of a chance imbalance. In small studies with a very important
   prognostic variable, a matched pairs design should be considered.
3. Stratification may result in more mistakes in the randomization process.
4. The precision achieved with post-stratification is nearly as great as that achieved
   with pre-stratification.
5. Common methods for adjustment of prognostic factors in comparing treatments are
   analysis of covariance for continuous response variables, logistic regression or the
   Mantel-Haenszel test for binary response variables and proportional hazards
   regression for survival times.
6. Most investigators generally feel much better if stratified randomization is used;
   many investigators are skeptical of post-stratification and "adjusted" results.
Listed below are some quotes from clinical trial texts and selected papers on the
issue of "adjusted" analyses:
1. Peto R, Pike MC, Armitage P, et al. Design and analysis of randomized clinical
trials requiring prolonged observation of each patient. Br. J. Cancer, vol 35, 1-39,
1977.
In section 22 of this paper the authors write:
"In clinical trial analysis, we are interested in whether apparent differences between
treatments might be due merely to random allocation of more of the good-prognosis
patients to one treatment than to the other treatment. Obviously,
anything we know about the major determinants of prognosis can help us to
answer this question correctly, and help us to see whether, given the different
numbers on each treatment in various prognostic categories, there is any residual
relationship of treatment with survival."
These authors go on to describe the arithmetic required to adjust for chance
baseline differences in prognostic factors.
2. Armitage P, Gehan E. Statistical methods for the identification and use of
prognostic factors. Int. J. Cancer, vol 13, 16-36, 1974.
On page 17 the authors write:
"There are three main reasons for allowing prognostic factors in the analysis. First,
even when patients are randomized, there may be certain differences between
treatment groups in the distribution of prognostic variables; if the biases caused by
these differences can be corrected, this should be done. Second, if the prognostic
variables have a high correlation with the outcome, much of the random variation in
outcome can be explained by these variables; the residual random variation is
thereby reduced and comparisons between treatments are correspondingly more
precise. Third, interactions between treatments and prognostic variables may be
detected, i.e., any tendency for the relative merits of the treatments to differ
according to the patient's prognosis."
3. Pocock SJ, Clinical Trials. A Practical Approach, John Wiley & Sons Ltd., 1983,
Chapter 14.
On page 216, Pocock writes:
"If one has comparable treatment groups, as discussed earlier in this section, then
any adjustment for prognostic factors will scarcely affect the magnitude of the
treatment difference but may improve the precision of one's estimate, e.g., by
narrowing the confidence interval. However, if treatment groups differ with respect
to some prognostic factors then both the magnitude and significance of treatment
differences may be altered (i.e. they are determined more correctly) by adjustment
for prognostic factors."
4. Friedman LM, Furberg CD, DeMets DL. Fundamentals of clinical trials, John
Wright, 1981, Chapter 13.
On page 165 these authors write:
"The goal in a clinical trial is to have groups of subjects that are comparable except
for the intervention being studied. Even if randomization is used, all prognostic
factors may not be perfectly balanced, especially in smaller studies. Even if no
prognostic factors are significantly unbalanced in the statistical sense, an
investigator may, nevertheless, observe that one or more factors favor one of the
groups. In either case, covariate-adjustment can be used in the analysis to
minimize the effect of the differences."
On page 167 they write:
"Analysis, strictly speaking, should always be stratified if stratification was used in
the randomization. In such cases, the adjusted analysis should include not only
those covariates found to be different between the groups, but also those stratified
during the randomization."
5. Meinert CL. Clinical trials. Design, conduct, and analysis. Oxford University
Press, 1986, Chapter 18.
On page 193 Meinert writes:
"To be valid, the evaluation of treatment effects must be performed on treatment
groups that are comparable with regard to baseline characteristics. Usually, the
comparability provided by randomization is adequate. However, randomization
does not guarantee comparability. As noted in Chapter 10, stratification can be
used to assure comparability for a few variables, but the distribution with regard to
others must be left to chance. As a result, there can be minor, and sometimes
even major, differences in the baseline composition of the study groups. The
impact of such differences on treatment comparisons should be removed using
procedures such as those outlined below."
Meinert goes on to describe subgroup and regression analysis.
6. Byar DP, Chapter 24, Identification of prognostic factors. In Cancer clinical trials
methods and practice. Edited by Buyse ME, Staquet MJ, Sylvester RJ, Oxford
University Press, 1984.
On page 424 Byar writes:
"One of the most important reasons for studying prognostic variables is that by
definition they affect the outcome variable. If two treatment groups are being
compared which are not nearly identical with respect to important prognostic
variables, then apparent differences in the results of treatment may result from our
failure to compare `like with like', that is, they may be due to imbalances in the
prognostic factors. In deciding whether or not a prognostic variable is balanced
across treatment groups, it is common practice to form tables of treatment group
versus categories of the variable and test these for independence. Although this
procedure may be useful in detecting gross imbalances, it is an improper use of
statistical significance testing because large imbalances in unimportant variables
will not matter, but even small imbalances in important ones may seriously bias
treatment comparisons."
Kernan et al. (1999) provide a review of research on stratification in clinical trials.