PubH 7420 Clinical Trials: Supplemental Notes for Lectures 5 and 6

1. Friedman, Furberg, and DeMets. Fundamentals of Clinical Trials, Chapter 5 and Chapter 16, pages 297-304.

Supplemental Reading References

1. Grizzle JE. A note on stratifying versus complete random assignment in clinical trials. Cont. Clin. Trials, 3:365-368, 1982.
2. Meier P. Stratification in the design of a clinical trial. Cont. Clin. Trials, 1:355-361, 1981.
3. Scott NW, McPherson GC, Ramsay CR, Campbell MK. The method of minimization for allocation in clinical trials: a review. Cont. Clin. Trials, 23:662-674, 2002.
4. Hallstrom A, Davis K. Imbalance in treatment assignments in stratified blocked randomization. Cont. Clin. Trials, 9:375-382, 1988.
5. Kernan WN, Viscoli CM, Makuch RW et al. Stratified randomization for clinical trials. J Clin Epidemiol, 52:19-26, 1999.
6. Pocock SJ. Clinical Trials: A Practical Approach. John Wiley and Sons, Ltd. Chapter 5 and Chapter 13, pages 216-220.
7. Meinert CL. Clinical Trials: Design, Conduct and Analysis, Chapter 10.

Stratification (def.) - A procedure whereby factors which are known to be associated with the response of interest (prognostic variables) are taken into account in the randomization scheme, i.e., in the design of the study. Stratification aims to help ensure that prognostic variables have the same distribution in all treatment groups.

Stratification is used to refer to restrictions on the randomization other than time (blocking). In other words, blocking is a restriction placed on the randomization to ensure the desired allocation ratio, while stratification is a restriction to ensure comparability of the treatment groups with respect to the stratifying variables.
As noted previously, in multi-clinic trials stratification on clinic is usually carried out, since the types of patients can vary widely from clinic to clinic, as can the use of concomitant treatments and compliance with study treatment; it is not surprising to see a marked clinic effect on the outcomes of interest. In the analysis, sites may have to be grouped, e.g., by region, by size, or by type (HMO, university); otherwise, sparse strata could result in a loss of power.

Stratum (def.) - a large group of experimental units more homogeneous than a randomly assembled group of experimental units by virtue of classification on some variable or set of variables at baseline.

Advantages:
- May prevent bias (an unfair treatment comparison) arising as a result of a chance imbalance between treatment groups on an important baseline prognostic factor.
- Will increase the precision (reduce the variance) of the treatment comparisons made.
- Will facilitate within-stratum (subgroup) analysis since the treatments will be balanced.
- If important prognostic factors are balanced, then the study will be subject to less criticism.

Disadvantages:
- Results in a randomization scheme which is more difficult to implement and therefore more prone to error. For example, in a multicenter trial, if one is stratifying on three baseline variables and each variable has two possible outcomes, eight schedules would have to be prepared in advance for each clinic.

It is important to differentiate stratified randomization (also referred to as pre-stratification or a stratified design) from post-stratification. Whether or not one uses stratified randomization, post-stratification can be used in the analysis.

Post-Stratification (def.) - the classification of experimental units into strata after they have been randomized for the purpose of data analysis. Usually strata are defined by pre-randomization (baseline) measurements.

Stratified vs. Unstratified Design - Considerations

1. Size of the study; the gain in statistical efficiency is minimal for studies with more than 50 patients.
2. Stratifying variables should be easily observed or measured prior to randomization; variables used for stratification should be relatively free of measurement error.
3. Risk of errors in carrying out the mechanics of randomization is greater with stratification. Stratifying variables that involve complicated computations or interpretations should be avoided.
4. The gain in statistical efficiency is small unless the stratifying variables are strongly related to the outcome variables.
5. The desired allocation ratios may not be achieved if several strata are used with only a small number of patients per stratum.
6. It is unreasonable to expect to control for all prognostic variables in the design. Post-stratification in the analysis using variables that were not considered in the design is usually necessary.

Most clinical trials that employ stratification in the design do so to ensure balance with respect to important prognostic factors. Also, for most trials with stratified designs, sample size is estimated based on the overall number to be enrolled (all strata pooled), and the analysis stipulates that treatment differences will be pooled across strata. Very few trials are designed to provide high power for detecting treatment differences within each stratum.

Usual Implementation

Block randomization within stratum, i.e., a separate randomization schedule is prepared for each stratum. Note that it does not really make any sense to stratify unless the treatments are assigned within stratum by block randomization or an equivalent scheme, since one of the aims is to avoid chance imbalances. For this same reason, the block size chosen is usually relatively small so that balance is achieved in small strata.

Example: A multi-clinic trial (2 clinics) with 2 treatments (A and B) and 20 patients expected in each clinic.
We would like equal allocation to treatments A and B and also to ensure that a similar number of men and women receive each treatment, i.e., stratify by clinic and gender, giving 4 strata.

Schedules for Clinic #1*

Accession No.        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Schedule for Men     B  A  A  B  A  B  B  A  B  A  B  A  B  A  B  A  A  B  B  A
Schedule for Women   B  A  B  A  A  B  A  B  B  A  B  A  B  A  B  A  A  B  A  B

* Generated using randomly mixed blocks of size 2 and 4, selected with random digits:
  1-5 => block size = 2; then 1-5 => AB, 6-0 => BA
  6-0 => block size = 4; then 1-6 => use the appropriate permutation as in the previous example, don't use 7-0.

Schedules for Clinic #2

Accession No.        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Schedule for Men     B  A  B  A  A  A  B  B  B  A  A  B  A  B  B  A  A  A  B  B
Schedule for Women   B  A  B  A  B  A  A  B  A  B  A  B  A  B  A  A  B  B  B  A

This method of stratification can be self-defeating in a small study with several strata, i.e., over-stratification. More generally, Therneau (Cont Clin Trials, 1993;14:98-108) has shown that problems can arise when the number of strata (distinct combinations of factor levels) becomes large relative to the sample size. A couple of examples:

Example 1: (Lancet 2000, 356:1521-1522, Letter concerning SYMPHONY study). Stratification was carried out by site (670 sites) and indication for treatment (myocardial infarction or unstable angina), giving 1,340 total strata, and a block size of 6 was used. Even though 9,233 patients were randomized, imbalances arose across the indication grouping, and this resulted in queries about the report. The imbalances likely arose because not all of the blocks were filled, i.e., many sites enrolled fewer than 6 patients with each indication.

Example 2: A study of testicular cancer with 2 treatments (A and B) and 3 stratification factors: 1) stage (I or II), 2) histology (teratocarcinoma, embryonal carcinoma, elements of choriocarcinoma), and 3) age (<15, >15); 2 x 3 x 2 = 12 schedules.

Reference: Tagnon HJ, Staquet MJ, editors. Controversies in Cancer, Design of Trials and Treatment.

Blocks of size 6 were used and the following 12 schedules were prepared:

                        Stage I                               Stage II
Histology               <15               >15                 <15               >15
Teratocarcinoma         A* A* B  A  B  B  A* A* A* B  B  B    A* A* A* B  B  B  B* A* A* B  B  A
Embryonal carcinoma     A* A* B  B  B  A  B  B  A  A  A  B    B* B* A  A  B  A  A* B* B* B* A  A
Choriocarcinoma         B* B  A  A  B  A  B  A  A  B  B  A    A* B* B* B* A  A  B* B* A  A  A  B

The asterisks mark the assignments given to the 26 patients randomized. The number of patients at each level of each stratification factor is given below:

                              A     B
Histology:
  Teratocarcinoma            10     1
  Embryonal carcinoma         3     5
  Choriocarcinoma             1     6
Stage:
  I                           7     1
  II                          7    11
Age:
  <15                         8     6
  >15                         6     6
Total                        14    12

Note that the distributions of histology and stage are very unbalanced. This problem could be addressed by using smaller block sizes, fewer stratifying variables, or an adaptive stratification scheme.

Minimization

An adaptive stratification procedure developed to cope with the problem of a small study and several strata. This approach was described in papers by Taves (Clin Pharmacol Ther, 1974) and Pocock and Simon (Biometrics, 1975).

Instead of trying to achieve balance with respect to treatment assignments for all possible combinations of the prognostic variables (in our example all 12 strata), the minimization procedure restricts its aim to equalizing treatment numbers at the different levels of each variable taken separately. This is accomplished by choosing the treatment for each new patient entering the study in such a way that the "treatment imbalance" after admitting that patient is as small as possible.
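As a rough illustration of this idea (a sketch, not code from the notes), the Python below scores each candidate treatment by summing, over the prognostic factors, the range of the treatment counts at the new patient's factor levels, and then favours the treatment with the smaller score. The function names `imbalance` and `minimization_choice` and the `p_preferred` argument are hypothetical; the counts are the marginal totals at the new patient's levels in the example from Pocock's book that is worked through later in these notes.

```python
import random

def imbalance(counts, t):
    """Lack of balance B(t) if the new patient receives treatment t.

    counts: one dict per prognostic factor, mapping treatment label to the
            number of earlier patients sharing the new patient's level.
    """
    total = 0
    for factor_counts in counts:
        trial = dict(factor_counts)
        trial[t] += 1  # tentatively add the new patient to treatment t
        total += max(trial.values()) - min(trial.values())  # range
    return total

def minimization_choice(counts, treatments=("A", "B"), p_preferred=1.0, rng=random):
    """Favour the treatment with the smaller B(t); toss a coin on ties."""
    scores = {t: imbalance(counts, t) for t in treatments}
    a, b = treatments
    if scores[a] == scores[b]:
        return rng.choice(treatments), scores
    preferred, other = (a, b) if scores[a] < scores[b] else (b, a)
    chosen = preferred if rng.random() < p_preferred else other
    return chosen, scores

# Marginal counts at the new patient's levels (assumed from Pocock's example)
pocock_counts = [
    {"A": 30, "B": 31},  # performance status: ambulatory
    {"A": 18, "B": 17},  # age under 50
    {"A": 9, "B": 8},    # disease-free interval of 2 years or more
    {"A": 19, "B": 21},  # dominant metastatic lesion: visceral
]

chosen, scores = minimization_choice(pocock_counts)
print(scores, chosen)
```

Only the marginal totals need to be stored and updated after each assignment, which is why the scheme stays manageable when the number of factor combinations is large. Setting `p_preferred` below 1 keeps an element of chance in every assignment.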
Let x_ik = the number of patients already assigned treatment k (k = 1, 2, or in our case A, B) among those patients who have the same level of prognostic factor i (i = 1, 2, ..., f) as the new patient.

Let

\[ x_{ik}^{t} = \begin{cases} x_{ik} & \text{if } t \neq k \\ x_{ik} + 1 & \text{if } t = k \end{cases} \]

x^t_ik represents the allocation counts that would result if the new patient is assigned to treatment t.

B(t) = function of the x^t_ik's which measures the "lack of balance" over all prognostic factors if the next patient is assigned treatment t.

In his book, Pocock considers the simple case of

\[ B(t) = \sum_{i=1}^{f} x_{ik}, \quad k = t \]

A similar rule is:

\[ B(t) = \sum_{i=1}^{f} \mathrm{Range}(x_{i1}^{t}, x_{i2}^{t}) \]

where the range is the absolute difference between the largest and smallest values of x^t_i1 and x^t_i2. Our rule will be to use the treatment with the smallest B(t).

Example (Pocock, page 85):

                                                Number on each treatment
Factor                        Level               A       B       Next Patient
Performance status            Ambulatory          30      31      x
                              Non-ambulatory      10       9
Age                           < 50                18      17      x
                              > 50                22      23
Disease-free interval         < 2 years           31      32
                              > 2 years            9       8      x
Dominant metastatic lesion    Visceral            19      21      x
                              Osseous              8       7
                              Soft tissue         13      12

2 x 2 x 2 x 3 = 24 strata; x denotes the characteristics of the next patient to be randomized.

1. Determine B(1), i.e., the next patient is assigned to treatment 1 (A)

                                x_i1   x_i2   x^1_i1   x^1_i2   Range(x^1_i1, x^1_i2)
   i)   Factor 1, Level 1        30     31      31       31     |31 - 31| = 0
   ii)  Factor 2, Level 1        18     17      19       17     |19 - 17| = 2
   iii) Factor 3, Level 2         9      8      10        8     |10 - 8|  = 2
   iv)  Factor 4, Level 1        19     21      20       21     |20 - 21| = 1

   B(1) = 0 + 2 + 2 + 1 = 5

2. Determine B(2), i.e., the next patient is assigned to treatment 2 (B)

                                x_i1   x_i2   x^2_i1   x^2_i2   Range(x^2_i1, x^2_i2)
   i)   Factor 1, Level 1        30     31      30       32     |30 - 32| = 2
   ii)  Factor 2, Level 1        18     17      18       18     |18 - 18| = 0
   iii) Factor 3, Level 2         9      8       9        9     |9 - 9|   = 0
   iv)  Factor 4, Level 1        19     21      19       22     |19 - 22| = 3

   B(2) = 2 + 0 + 0 + 3 = 5

Since B(1) = B(2), toss a coin for the next patient.

Generalizations of this procedure

1. \[ B(t) = \sum_{i=1}^{f} w_i \, \mathrm{Range}(x_{i1}^{t}, x_{i2}^{t}) \] where w_i is the importance of factor i relative to the other factors.
2. Assign the patient with probability > 1/2 to the "preferred" treatment.

The major disadvantage of this method is that it is more difficult to implement. For additional reading see Taves DR: Minimization: a new method of assigning patients to treatment and control groups. Clin Pharmacol Ther, 15:443-453, 1974, and Pocock SJ, Simon R: Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. Biometrics, 31:103-115, 1975.

Implementation of Minimization

Requirements:

1. Need an easy way to update the marginal totals for each treatment after each randomization so that the data are available for the next patient.
   - Pocock proposes index cards
   - If central randomization is used, a computer could be used
   - If each center has a microcomputer, the calculation of even complex B(t)'s can be accomplished quickly
2. Having determined B(t), need a procedure based on a pre-chosen value of P, where P = Prob(assign "preferred" treatment).

Examples:

1. P = 1 if B(1) ≠ B(2); P = 1/2 if B(1) = B(2), i.e., simple randomization
2. P = 2/3 if B(1) ≠ B(2); P = 1/2 if B(1) = B(2)

Stratification and Variance Reduction

How much of an increase in precision is expected with stratification? How much of a price does one pay for trusting randomization to achieve reasonable balance? To address this question, consider the relative efficiency (RE) of two designs:

\[ RE = \frac{\mathrm{Var}(\text{treatment contrast with stratification})}{\mathrm{Var}(\text{treatment contrast with no stratification, but post-stratified analysis})} \]

Consider the comparison of 2 treatments, A and B, and a single dichotomous prognostic factor, S. Also assume a balanced design, i.e., an equal number of patients given A and B.

Grizzle (Cont. Clinical Trials, 1982) considered the question of how much loss in efficiency occurred by trusting unstratified randomization to achieve reasonable balance.
He considered the situation of a continuous response variable with equal variances at each level of the prognostic factor and showed that the relative efficiency (RE) as defined above could be written as:

\[ RE = \left[ \frac{(n_A g + n_B h)\left(1 - \dfrac{n_A g + n_B h}{n_A + n_B}\right)}{n_A g(1 - g) + n_B h(1 - h)} \right]^{-1} \]

n_A = total number randomly assigned to A.
n_B = total number randomly assigned to B.
g = fraction of those given A at level 1 of prognostic factor.
h = fraction of those given B at level 1 of prognostic factor.

              Treatment
Stratum       A              B
1             n_A g          n_B h
2             n_A (1-g)      n_B (1-h)
Total         n_A            n_B

If the average responses in each cell are denoted \bar{Y}_{1A}, \bar{Y}_{1B}, \bar{Y}_{2A} and \bar{Y}_{2B}, then

\[ \mathrm{Var}(\bar{Y}_{1A} - \bar{Y}_{1B}) = \sigma^2 \left( \frac{1}{n_A g} + \frac{1}{n_B h} \right) \]

and:

\[ \mathrm{Var}(\bar{Y}_{2A} - \bar{Y}_{2B}) = \sigma^2 \left( \frac{1}{n_A (1 - g)} + \frac{1}{n_B (1 - h)} \right) \]

Grizzle's RE follows from noting that the overall estimated difference between A and B is a weighted average of the stratum-specific differences, with weights inversely proportional to the above variances. The weights are:

\[ \left( \frac{1}{n_A g} + \frac{1}{n_B h} \right)^{-1} = \frac{n_A g \, n_B h}{n_A g + n_B h} \]

and:

\[ \frac{n_A (1 - g) \, n_B (1 - h)}{n_A (1 - g) + n_B (1 - h)} \]

For the stratified design, g = h, and for that situation the weights are maximized. Note that RE = 1 when g = h. When n_A = n_B, this simplifies to:

\[ RE = \left[ \frac{(g + h)\left(1 - \dfrac{g + h}{2}\right)}{g(1 - g) + h(1 - h)} \right]^{-1} \]

Suppose g = 2h, i.e., the prevalence of the prognostic factor is twice as high for treatment A as for treatment B.

g, h            RE
0.10, 0.05      0.99
0.25, 0.125     0.97
0.50, 0.25      0.93
0.75, 0.375     0.86

Suppose g = 1 - h. Then RE = 4g(1 - g).

g, h            RE
0.6, 0.4        0.96
0.7, 0.3        0.84
0.8, 0.2        0.64
0.9, 0.1        0.36

Grizzle concludes that "for sample sizes of 50 or more, and a prevalence of the prognostic factor of 0.5, large deviations are unlikely, which implies that randomization can be trusted to prevent large losses in efficiency in this case."

One can calculate the probability of obtaining a certain imbalance before the study begins. This can be used to decide whether to stratify the randomization.
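A minimal Python sketch of such a calculation, assuming two strata and two treatment groups so that the number of stratum-1 patients who land in group A follows a hypergeometric distribution. The function names are illustrative only, and the numbers (100 patients per group, 40 patients in stratum 1, a 16 versus 24 split or worse) match the worked example that follows.

```python
from math import comb

def p_split(t, n_a, n_b, t1):
    """P(exactly t of the t1 stratum-1 patients are randomized to group A)."""
    return comb(n_a, t) * comb(n_b, t1 - t) / comb(n_a + n_b, t1)

def prob_imbalance_or_worse(n_a, n_b, t1, low, high):
    """Sum p(t) over all splits with t <= low or t >= high."""
    t_min = max(0, t1 - n_b)   # smallest feasible count in group A
    t_max = min(n_a, t1)       # largest feasible count in group A
    return sum(
        p_split(t, n_a, n_b, t1)
        for t in range(t_min, t_max + 1)
        if t <= low or t >= high
    )

# Assumed inputs: 100 per group, 40 patients in stratum 1,
# imbalance of 16 vs 24 (g = 0.16, h = 0.24) or worse in either direction
prob = prob_imbalance_or_worse(n_a=100, n_b=100, t1=40, low=16, high=24)
print(round(prob, 3))
```

The result should come out at roughly 0.22, i.e., an imbalance at least this large is far from rare with 200 patients. The same two-sided sum is what a Fisher exact test on the corresponding 2 x 2 table of counts reports.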
With 2 strata and 2 groups, the calculation is based on the hypergeometric distribution:

\[ p(t) = \frac{\dbinom{N_a}{t} \dbinom{N_b}{t_1 - t}}{\dbinom{N_a + N_b}{t_1}} \]

p(t) is the probability that t of the t_1 patients in stratum 1 are randomized to group A. For a certain imbalance, one can sum p(t) over all t's that give that imbalance or worse.

e.g., N_a = 100, N_b = 100, t_1 = 40, g = 0.16, h = 0.24

\[ p(t) = \frac{\dbinom{100}{t} \dbinom{100}{40 - t}}{\dbinom{200}{40}} \]

            Stratum 1    Stratum 2    Total
Group A        16           84         100
Group B        24           76         100
Total          40          160         200

Want the probability of obtaining the imbalance given by g = 0.16, h = 0.24 or worse:

\[ \sum_{t \le 16 \text{ or } t \ge 24} p(t) = 0.216 \]

This probability can be obtained using the SAS function PROBHYPR or by creating a data set with cell counts as above and then obtaining the 2-tailed probability for the Fisher exact test.

Probability of Given Imbalance

                        N
  g       h       50      100     1000
 .52     .48     1        .84     .23
 .55     .45      .57     .42     .002
 .60     .40      .25     .07     -
 .70     .30      .01     -       -

Fleiss (The Design and Analysis of Clinical Experiments, John Wiley & Sons) argues that only when the relative efficiency of the unstratified design to the stratified design exceeds 130% is the added effort in setting up the design worth it. The relative efficiency of an unstratified to a stratified design can be determined by computing the following variance estimates.

Let s^2_S = pooled variance across treatment and stratum (S):

\[ s_S^2 = \frac{\sum_j \sum_i (n_{ji} - 1) s_{ji}^2}{n_{..} - 2S} \]

Let s^2_NS = pooled variance across treatments, but no stratification (NS):

\[ s_{NS}^2 = \frac{1}{n_{..} - 2} \left[ (n_{..} - 2S) s_S^2 + \sum_i \sum_j n_{ji} (\bar{x}_{ji} - \bar{x}_{..})^2 \right] \]

Here 2S in the denominator is the number of treatment-by-stratum cells. Note that the RE (s^2_NS / s^2_S) depends on the sum of squared deviations of the stratum-specific means about the overall mean. As the variability among stratum-specific means increases, more consideration should be given to stratification in the design and/or in the analysis.

Now consider a Bernoulli response variable and the impact of post-stratification on RE.
Example of Post-Stratification: Brown et al, Lancet, 227-230, 1960, Clinical Trial of Tetanus Anti-toxin in Treatment of Tetanus; see also Meier (Controlled Clinical Trials, 1981).

           AntiToxin (A)    No AntiToxin (B)    Total
Alive           21                  9             30
Dead            20                 29             49
Total           41                 38             79

\[ \hat{p} = \text{overall death rate} = 49/79 = 0.620 \]
\[ \hat{p}_A = 20/41 = 0.488, \qquad \hat{p}_B = 29/38 = 0.763, \qquad \hat{p}_A - \hat{p}_B = -0.275 \]
\[ \widehat{\mathrm{Var}}(\hat{p}_A - \hat{p}_B) = \left( \frac{1}{n_A} + \frac{1}{n_B} \right) \hat{p}(1 - \hat{p}) = 0.01195 \]
\[ SE(\hat{p}_A - \hat{p}_B) = 0.109 \]

Time from first symptoms to admission turned out to be an important prognostic factor, and since the anti-toxin group had a smaller fraction of high-risk patients (28/41 = 0.68) than the control group (30/38 = 0.79), post-stratification was carried out.

Stratum 1: < 72 Hours                Stratum 2: ≥ 72 Hours
           A      B                             A      B
Alive     10      4                  Alive     11      5
Dead      18     26                  Dead       2      3
Total     28     30                  Total     13      8

For Stratum 1:

\[ \hat{p}_{1A} = 0.643, \qquad \hat{p}_{1B} = 0.866, \qquad \hat{p}_1 = 0.759 \]
\[ \hat{p}_{1A} - \hat{p}_{1B} = -0.223 \]
\[ \widehat{\mathrm{Var}}(\hat{p}_{1A} - \hat{p}_{1B}) = 0.01263, \qquad SE(\hat{p}_{1A} - \hat{p}_{1B}) = 0.112 \]

Similarly for Stratum 2:

\[ \hat{p}_2 = 0.238, \qquad \hat{p}_{2A} - \hat{p}_{2B} = -0.221 \]
\[ \widehat{\mathrm{Var}}(\hat{p}_{2A} - \hat{p}_{2B}) = 0.03662, \qquad SE(\hat{p}_{2A} - \hat{p}_{2B}) = 0.191 \]

Let (p_A - p_B)_W denote the weighted difference between treatments A and B.
Let G = fraction of patients in Stratum 1 (< 72 hours) = 58/79 = 0.734.

\[ (\hat{p}_A - \hat{p}_B)_W = G(\hat{p}_{1A} - \hat{p}_{1B}) + (1 - G)(\hat{p}_{2A} - \hat{p}_{2B}) = -0.223 \]

compared to -0.275 for the unweighted difference.

\[ \widehat{\mathrm{Var}}(\hat{p}_A - \hat{p}_B)_W = G^2 \, \widehat{\mathrm{Var}}(\hat{p}_{1A} - \hat{p}_{1B}) + (1 - G)^2 \, \widehat{\mathrm{Var}}(\hat{p}_{2A} - \hat{p}_{2B}) = 0.00938 \]
\[ SE(\hat{p}_A - \hat{p}_B)_W = 0.097 \]

A 22% reduction in the variance is achieved with post-stratification:

\[ RE = \frac{\mathrm{Var}(\text{Post-Stratification})}{\mathrm{Var}(\text{No Stratification})} = \frac{0.00938}{0.01195} = 0.78 \]

Question: How much would one have gained in precision if stratification had been used in the design? To consider this question for this example, force balance within stratum and assume the p_ij's do not change. Meier considered this question and found little gain in precision for this example.
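The arithmetic of this example can be retraced with a short Python sketch. It assumes, as in the calculations above, that each variance uses the pooled death rate of the table it comes from; the helper name `diff_and_var` is hypothetical.

```python
def diff_and_var(dead_a, n_a, dead_b, n_b):
    """Difference in death rates (A minus B) and its variance,
    using the pooled death rate of the two groups."""
    p_a, p_b = dead_a / n_a, dead_b / n_b
    p_pooled = (dead_a + dead_b) / (n_a + n_b)
    var = (1 / n_a + 1 / n_b) * p_pooled * (1 - p_pooled)
    return p_a - p_b, var

# Unstratified comparison: 20 of 41 died on antitoxin, 29 of 38 without
d_all, v_all = diff_and_var(20, 41, 29, 38)

# Stratum-specific comparisons
d1, v1 = diff_and_var(18, 28, 26, 30)  # under 72 hours
d2, v2 = diff_and_var(2, 13, 3, 8)     # 72 hours or more

# Weight each stratum by its share of all patients
G = (28 + 30) / 79
d_w = G * d1 + (1 - G) * d2
v_w = G**2 * v1 + (1 - G) ** 2 * v2
re = v_w / v_all

print(round(d_all, 3), round(v_all, 5))
print(round(d_w, 3), round(v_w, 5), round(re, 2))
```

Because the script carries full precision rather than rounding intermediate values, its output agrees with the hand calculation only up to rounding in the last digit.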
Meier also showed that, in general, the loss of efficiency resulting from a disproportion of patients given A and B in a particular stratum, compared to using a stratified design, was small.

Suppose there are 2n patients in stratum 1. In a stratified design

\[ \mathrm{Var}(p_{A1} - p_{B1}) = \left( \frac{1}{n_{A1}} + \frac{1}{n_{B1}} \right) p_1 (1 - p_1) \]

and n_A1 = n_B1 = n because of the stratified design; therefore

\[ \mathrm{Var}(p_{A1} - p_{B1}) = p_1 (1 - p_1) \left( \frac{1}{n} + \frac{1}{n} \right) = p_1 (1 - p_1) \frac{2}{n} \]

Without stratification and with 2n patients in stratum 1, of whom n + h are given one treatment and n - h the other,

\[ \mathrm{Var}(p_{A1} - p_{B1}) = \left( \frac{1}{n + h} + \frac{1}{n - h} \right) p_1 (1 - p_1) \]

\[ RE = \frac{\mathrm{Var}(\text{Stratified Design})}{\mathrm{Var}(\text{Unstratified Design and Post-Stratification})} = \frac{2/n}{\dfrac{1}{n + h} + \dfrac{1}{n - h}} = \frac{2/n}{2n/(n^2 - h^2)} = 1 - \frac{h^2}{n^2} \]

n = 10                              n = 100
Allocation    h     RE              Allocation    h     RE
(11, 9)       1     .99             (105, 95)     5     .9975
(12, 8)       2     .96             (110, 90)     10    .99
(14, 6)       4     .84             (120, 80)     20    .96
(15, 5)       5     .75             (130, 70)     30    .91

Conclusion: In most situations post-stratification does not result in much loss of precision compared to stratification in the design.

Now consider how the following factors influence the variance when the response is Bernoulli.

1. Prevalence of prognostic factor (G)
2. Importance of prognostic factor (p1. vs. p2.)

Assume the response variable is Bernoulli (success or failure, alive or dead). The layout of the study in terms of key parameters is as follows:

                    Treatment
                    A        B        Total
Stratum 1 (S1)      p1A      p1B      p1.
Stratum 2 (S2)      p2A      p2B      p2.
Total               pA       pB       p

The hypothesis of interest is pA = pB.
pij = probability of success in stratum i for treatment j.
p = total success rate in the population, and pA and pB are the overall success rates for treatments A and B, respectively.
Let G = fraction of patients in stratum 1 in the population.

After conducting the study one observes the following:

Stratum 1
             A              B
Success      X1A            X1B
Failure      n1A - X1A      n1B - X1B
Total        n1A            n1B

Stratum 2
             A              B
Success      X2A            X2B
Failure      n2A - X2A      n2B - X2B
Total        n2A            n2B

TOTAL
             A                          B
Success      X1A + X2A                  X1B + X2B
Failure      n1A + n2A - X1A - X2A      n1B + n2B - X1B - X2B
Total        nA                         nB

Xij = the number of successes on treatment j in stratum i
nij = the number of patients in stratum i given treatment j
N = total number of patients
NA = NB = N/2 = no. of patients given A and B (assume the randomization is restricted to assure NA = NB)

Consider estimates of Var(pA - pB) for 2 situations: 1. no stratification and 2. stratification on S.

1. No stratification

\[ \mathrm{Var}(p_A - p_B) = p(1 - p) \left( \frac{1}{N_A} + \frac{1}{N_B} \right) = p(1 - p) \frac{4}{N} \]

Note that p can also be written as a weighted estimate of the p_i.: p = G p1. + (1 - G) p2.; therefore

\[ \mathrm{Var}(p_A - p_B) = [G p_{1.} + (1 - G) p_{2.}] \, [1 - G p_{1.} - (1 - G) p_{2.}] \, \frac{4}{N} \]

2. Stratification on S

\[ p_A = \frac{n_{1A} p_{1A} + n_{2A} p_{2A}}{N_A} \]

\[ \mathrm{Var}(p_A) = \frac{n_{1A} p_{1A}(1 - p_{1A}) + n_{2A} p_{2A}(1 - p_{2A})}{N_A^2} = \frac{G p_{1A}(1 - p_{1A}) + (1 - G) p_{2A}(1 - p_{2A})}{N_A} \]

\[ \mathrm{Var}(p_A - p_B) = \frac{2}{N} \left[ G p_{1A}(1 - p_{1A}) + (1 - G) p_{2A}(1 - p_{2A}) + G p_{1B}(1 - p_{1B}) + (1 - G) p_{2B}(1 - p_{2B}) \right] \]

Substituting p1. for p1A and p1B, and p2. for p2A and p2B, we have

\[ \mathrm{Var}(p_A - p_B) = \frac{4}{N} \left[ G p_{1.}(1 - p_{1.}) + (1 - G) p_{2.}(1 - p_{2.}) \right] \]

\[ RE = \frac{G p_{1.}(1 - p_{1.}) + (1 - G) p_{2.}(1 - p_{2.})}{[G p_{1.} + (1 - G) p_{2.}] \, [1 - G p_{1.} - (1 - G) p_{2.}]} \]

Note if p1. = p2., then RE = 1.

Consider RE for various values of p1., p2. and G:

 G       p1.     p2.     RE
0.50     0.6     0.3     0.91
0.20     0.6     0.3     0.94
0.10     0.6     0.3     0.96
0.50     0.6     0.2     0.83
0.20     0.6     0.2     0.87
0.10     0.6     0.2     0.92
0.50     0.8     0.2     0.64
0.20     0.8     0.2     0.74
0.10     0.8     0.2     0.83
0.50     0.1     0.05    0.991
0.20     0.1     0.05    0.992
0.10     0.1     0.05    0.996
0.50     0.1     0.01    0.96
0.20     0.1     0.01    0.95
0.10     0.1     0.01    0.96

The reduction in variance achieved with stratification depends on:

1. G, the distribution of the prognostic factor in the population
2. The difference in p1. and p2., i.e., the relative strength of the prognostic factor
3. p, the overall success rate in the population

Example

MRFIT - Although not seriously considered, a logical stratification variable would have been pc, the estimated six-year probability of dying from coronary heart disease in the control group based on Framingham data.

\[ p_c = \frac{1}{1 + \exp[-B_0 - B_1 X_1 - B_2 X_2 - B_3 X_3]} \]

where
X1 = serum cholesterol level
X2 = diastolic BP
X3 = cigarettes smoked per day

pe = experimental group event rate

Consider 4 strata - Framingham Data

Class of pc      No.    Percent    mean pc    mean pe    \bar{p}(1 - \bar{p})
< .0225          122     44.4      .0188      .0094      0.0139
.0225-.0274       52     18.9      .0248      .0134      0.0187
.0275-.0324       32     11.6      .0296      .0158      0.0222
≥ .0325           69     25.1      .0436      .0213      0.0314
TOTAL            275    100.0      .0274      .0139      0.0202

The tabulated values are consistent with \bar{p} being the average of the mean pc and mean pe within each class.

Let A denote the Special Intervention group and B denote the Usual Care group; then, using the notation previously developed,

\[ \widehat{\mathrm{Var}}(\hat{p}_A - \hat{p}_B) \text{ without stratification} = \left( \frac{1}{n_A} + \frac{1}{n_B} \right)(0.0202) = \frac{4}{N}(0.0202) \quad \text{with } n_A = n_B = N/2 \]

(in MRFIT N was 12,000 and n_A = n_B = 6000)

\[ \widehat{\mathrm{Var}}(\hat{p}_A - \hat{p}_B) \text{ with stratification} = \frac{4}{N} \left[ 0.444(0.0139) + 0.189(0.0187) + 0.116(0.0222) + 0.251(0.0314) \right] = \frac{4}{N}(0.02016) \]

\[ RE = 0.02016 / 0.0202 = 0.998 \]

In a small study with very important prognostic factors, one-to-one matching of patients is an alternative to stratification.

E.g., randomized block matched pairs experiment: Comparison of Imipramine and placebo for the treatment of depression (Fleiss, The Design and Analysis of Clinical Experiments, p. 121).

- 60 patients were paired to form 30 matched pairs or blocks.
- matching was based on time of entry to the study (within one month), sex and age.
- one member of each pair was randomly assigned to receive Imipramine, an anti-depressant drug; the other patient received placebo.
- the study was double-blind.
- primary aim of the blocking is to reduce the variability between patients and eliminate the chance of an imbalance on these important characteristics.

Findings for Hamilton rating scale for depression:

                             No.    Mean     SD
Imipramine                   30     6.3      2.4
Placebo                      30     7.6      2.6
Matched Pair Difference      30    -1.27     2.92

\[ SE(\bar{d}) \text{ for matched design} = 2.92 / \sqrt{30} = 0.53 \]

\[ SE(\bar{X}_1 - \bar{X}_2) \text{ ignoring matching} = s_p \sqrt{\frac{1}{30} + \frac{1}{30}} = 2.50 \sqrt{2/30} = 0.64 \]

\[ RE \text{ (blocking : no blocking)} = (0.53)^2 / (0.64)^2 = 0.69 \]

Summary Remarks on Stratification

1. Only moderate increases in power are obtained with stratification when the response is Bernoulli.
2. Stratification is more important with small sample sizes since there is a greater probability of a chance imbalance. In small studies with a very important prognostic variable, a matched pairs design should be considered.
3. Stratification may result in more mistakes in the randomization process.
4. The precision achieved with post-stratification is nearly as great as that achieved with pre-stratification.
5. Common methods for adjustment of prognostic factors in comparing treatments are analysis of covariance for continuous response variables, logistic regression or the Mantel-Haenszel test for binary response variables, and proportional hazards regression for survival times.
6. Most investigators generally feel much better if stratified randomization is used; many investigators are skeptical of post-stratification and "adjusted" results.

Listed below are some quotes from clinical trial texts and selected papers on the issue of "adjusted" analyses:

1. Peto R, Pike MC, Armitage P, et al. Design and analysis of randomized clinical trials requiring prolonged observation of each patient. Br. J. Cancer, vol 35, 1-39, 1977.
In section 22 of this paper the authors write:

"In clinical trial analysis, we are interested in whether apparent differences between treatments might be due merely to random allocation of more of the good-prognosis patients to one treatment than to the other treatment. Obviously, anything we know about the major determinants of prognosis can help us to answer this question correctly, and help us to see whether, given the different numbers on each treatment in various prognostic categories, there is any residual relationship of treatment with survival."

These authors go on to describe the arithmetic required to adjust for chance baseline differences in prognostic factors.

2. Armitage P, Gehan E. Statistical methods for the identification and use of prognostic factors. Int. J. Cancer, vol 13, 16-36, 1974.

On page 17 the authors write:

"There are three main reasons for allowing prognostic factors in the analysis. First, even when patients are randomized, there may be certain differences between treatment groups in the distribution of prognostic variables; if the biases caused by these differences can be corrected, this should be done. Second, if the prognostic variables have a high correlation with the outcome, much of the random variation in outcome can be explained by these variables; the residual random variation is thereby reduced and comparisons between treatments are correspondingly more precise. Third, interactions between treatments and prognostic variables may be detected, i.e., any tendency for the relative merits of the treatments to differ according to the patient's prognosis."

3. Pocock SJ, Clinical Trials. A Practical Approach, John Wiley & Sons Ltd., 1983, Chapter 14.
On page 216, Pocock writes:

"If one has comparable treatment groups, as discussed earlier in this section, then any adjustment for prognostic factors will scarcely affect the magnitude of the treatment difference but may improve the precision of one's estimate, e.g., by narrowing the confidence interval. However, if treatment groups differ with respect to some prognostic factors then both the magnitude and significance of treatment differences may be altered (i.e. they are determined more correctly) by adjustment for prognostic factors."

4. Friedman LM, Furberg CD, DeMets DL. Fundamentals of clinical trials, John Wright, 1981, Chapter 13.

On page 165 these authors write:

"The goal in a clinical trial is to have groups of subjects that are comparable except for the intervention being studied. Even if randomization is used, all prognostic factors may not be perfectly balanced, especially in smaller studies. Even if no prognostic factors are significantly unbalanced in the statistical sense, an investigator may, nevertheless, observe that one or more factors favor one of the groups. In either case, covariate-adjustment can be used in the analysis to minimize the effect of the differences."

On page 167 they write:

"Analysis, strictly speaking, should always be stratified if stratification was used in the randomization. In such cases, the adjusted analysis should include not only those covariates found to be different between the groups, but also those stratified during the randomization."

5. Meinert CL. Clinical trials. Design, conduct, and analysis. Oxford University Press, 1986, Chapter 18.

On page 193 Meinert writes:

"To be valid, the evaluation of treatment effects must be performed on treatment groups that are comparable with regard to baseline characteristics. Usually, the comparability provided by randomization is adequate. However, randomization does not guarantee comparability.
As noted in Chapter 10, stratification can be used to assure comparability for a few variables, but the distribution with regard to others must be left to chance. As a result, there can be minor, and sometimes even major, differences in the baseline composition of the study groups. The impact of such differences on treatment comparisons should be removed using procedures such as those outlined below."

Meinert goes on to describe subgroup and regression analysis.

6. Byar DP, Chapter 24, Identification of prognostic factors. In Cancer clinical trials methods and practice. Edited by Buyse ME, Staquet MJ, Sylvester RJ, Oxford University Press, 1984.

On page 424 Byar writes:

"One of the most important reasons for studying prognostic variables is that by definition they affect the outcome variable. If two treatment groups are being compared which are not nearly identical with respect to important prognostic variables, then apparent differences in the results of treatment may result from our failure to compare `like with like', that is, they may be due to imbalances in the prognostic factors. In deciding whether or not a prognostic variable is balanced across treatment groups, it is common practice to form tables of treatment group versus categories of the variable and test these for independence. Although this procedure may be useful in detecting gross imbalances, it is an improper use of statistical significance testing because large imbalances in unimportant variables will not matter, but even small imbalances in important ones may seriously bias treatment comparisons."

Kernan et al. (1999) provide a review of research on stratification in clinical trials.