PubH 7420 Clinical Trials: Supplemental Notes for Lectures 13 and 14

1. Friedman, Furberg, and DeMets, Fundamentals of Clinical Trials. Chapters 4, 9 and 10.

Supplemental Reading References

1. Pocock SJ. Clinical Trials: A Practical Approach. John Wiley and Sons, Ltd. Chapter 3.
2. Meinert CL. Clinical Trials: Design, Conduct and Analysis. Chapters 5 and 16.
3. Yusuf S, Held P, Teo KK, et al. Selection of patients for randomized controlled trials: implications of wide or narrow eligibility criteria. Stat Med 9:73-86, 1990.
4. Rothwell PM. External validity of randomised controlled trials: "To whom do the results of this trial apply?" Lancet 365:82-93, 2005.
5. Pablos-Mendez A, Barr RG, Shea S. Run-in periods in randomized trials. Implications for the application of results in clinical practice. JAMA 279:222-225, 1998.
6. Temple RJ. Special study designs: early escape, enrichment, studies in nonresponders. Commun Statist Theory Meth 23:499-531, 1994.
7. Sargent DJ, Conley BA, Allegra C, Collette L. Clinical trial designs for predictive marker validation in cancer treatment trials. J Clin Oncol 23:2020-2027, 2005.
8. Leber PD, Davis CS. Threats to the validity of clinical trials employing enrichment strategies for sample selection. Cont Clin Trials 19:178-187, 1998.
9. Davis CE et al. A single cholesterol measurement underestimates the risk of coronary heart disease. JAMA 264:3044-3046, 1990.
10. Prospective Studies Collaboration. Age-specific relevance of usual blood pressure to vascular mortality: a meta-analysis of individual data for one million adults in 61 prospective studies. Lancet 360:1903-1913, 2002.
11. Gardner MJ, Heady JA. Some effects of within-person variability in epidemiologic studies. J Chron Dis 26:781-795, 1973.
12. Ederer F. Serum cholesterol changes: effects of diet and regression toward the mean. J Chron Dis 25:277-289, 1972.
13. Davis CE. The effect of regression to the mean in epidemiologic and clinical studies. Amer J Epid 104:493-498, 1976.
14. Cutter GR. Some examples for teaching regression toward the mean from a sampling viewpoint. Amer Stat 30:194-197, 1976.
15. Yudkin PL, Stratton IM. How to deal with regression to the mean in intervention studies. Lancet 347:241-243, 1996.
16. Clarke R, Shipley M, Lewington S, et al. Under-estimation of risk associations due to regression dilution in long-term follow-up of prospective studies. Amer J Epid 150:341-353, 1999.

Eligibility Criteria for Clinical Trials

Prior to randomization in a clinical trial, patients must meet pre-specified eligibility criteria. These criteria, which are stated in advance of the trial, are frequently referred to as inclusion and exclusion criteria. An example of an inclusion criterion would be the signing of the consent form. An example of an exclusion criterion would be a contraindication to one of the treatments being studied. The study population (those randomized), as defined by the eligibility criteria, can be thought of as a subset of the larger target population to which we wish to generalize the findings of our study.

In many trials eligibility criteria are specified so as to identify a group of "homogeneous" patients. This allows the question to be stated more precisely and may permit the question to be addressed with a smaller sample size because variability among subjects is reduced. The disadvantage of this approach is that the results of the study may not be as widely generalizable, and some patients may eventually be given treatments about which little is known from controlled studies, e.g., lipid-lowering drugs in women and the elderly, or antiretroviral drugs in women, in non-whites, or in persons with very early disease. If a heterogeneous group is not enrolled, the consistency of the overall findings across subgroups cannot be assessed.
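
The link between reduced variability and a smaller sample size can be illustrated with the usual normal-approximation formula for comparing two means, n per group = 2(z_{1-alpha/2} + z_{1-beta})^2 sigma^2 / delta^2. This formula is not taken from these notes; the sketch below is only a minimal illustration, and the function name, the two-sided 5% significance level, the 90% power, and the example standard deviations are all assumptions chosen for the example.

```python
import math
from statistics import NormalDist


def n_per_group(sd: float, delta: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Approximate subjects per arm for a two-sided comparison of two means.

    Assumes equal allocation, a common standard deviation `sd`, and a true
    difference in means of `delta` (all illustrative assumptions).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # critical value for the test
    z_beta = z.inv_cdf(power)               # quantile matching the desired power
    return math.ceil(2.0 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


if __name__ == "__main__":
    # Hypothetical values: the same 5-unit treatment difference, studied in a
    # heterogeneous (sd = 10) versus a more homogeneous (sd = 5) population.
    for sd in (10.0, 5.0):
        print(f"sd = {sd:4.1f}: about {n_per_group(sd, delta=5.0)} per group")
```

Because n is proportional to sigma^2, halving the standard deviation cuts the required sample size to roughly a quarter. The price of that saving is the narrower generalizability described above.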

Many attribute our poor understanding of how to use current antiretroviral treatments, even though thousands of patients have been studied, to the restrictions on eligibility and patient management in antiretroviral trials. A State of the Art Conference on antiretroviral therapy in 1993 attributed our inability to make firm recommendations on when to start and change therapy, and on what treatments to use, to the way clinical trials have been designed. In their words most of the randomized trials "were designed to answer questions about comparative clinical safety and efficacy in certain specific clinical scenarios." Also, "trials tended to enroll relatively uniform study groups". Tunis et al. have argued that diverse populations should be enrolled in trials and that they should be enrolled by clinicians in heterogeneous practice settings (JAMA 2003;290:1624-1632).

Entry criteria for ACTG 016, a study conducted by the NIH-sponsored AIDS Clinical Trials Group, are typical. In ACTG 016, 711 patients with mildly symptomatic HIV disease were randomized to receive AZT or placebo. According to the published report of this study, to be eligible patients had to have mildly symptomatic HIV infection, defined as the presence of one or two of the following: oral candidiasis, oral hairy leukoplakia, single dermatomal herpes zoster, unintentional weight loss exceeding 4.5 kg or 10% of usual body weight, all occurring within 3 years of study entry; recurrent dermatitis or pruritic folliculitis; intermittent diarrhea, defined as 2 or more liquid stools per day for no more than 7 consecutive days but for at least 14 days within 120 days of study entry; or chronic fatigue that interfered with normal activity within 6 months of study entry. The CD4+ count had to be between 200 and 800 cells/mm³; hematocrit had to be greater than 34%; neutrophil count had to be 1500 cells/mm³ or more; platelet count had to be greater than 100 × 10⁹/L; and liver tests had to be no more than 5 times the upper limit of normal.
Patients with AIDS were excluded. Use of other anti-HIV drugs, biologic response modifiers, or systemic corticosteroids was not allowed. In addition, prophylaxis for PCP was not allowed until 4 months before the study concluded.

Patient management during the study was restrictive in other ways besides concomitant treatments. Participants with a serious toxicity lasting more than 21 consecutive days, or those with the same recurring toxicity after dose reduction, were removed from study medication. Patients were also removed from study therapy if they developed AIDS or if they failed to complete follow-up appointments or take study medication as directed.

These restrictive eligibility criteria and patient management rules are consistent with much that we teach about the scientific method: carefully define your study population, taking care to eliminate as many sources of variability as possible. But does it make sense to narrowly define the study population and then use the drug for others who did not meet the eligibility criteria? Does it make sense to restrict concomitant medications if, in the "real world", patients take multiple medications for prophylaxis of opportunistic diseases and for other reasons? Does the reduced variability resulting from these restrictions limit our understanding of how the drug really works? For example, if we don't really know when a new drug should be initiated, would it not make more sense to be as inclusive as possible? You cannot study factors which are fixed by design, e.g., if you only enroll men, then you cannot assess whether the treatment works the same in women. More generally, should the number and scope of restrictions made in a trial to ensure more homogeneity depend on the research question posed?

Friedman, Furberg and DeMets define 5 general considerations on which to base individual criteria for eligibility.

1. Only participants who have the potential to benefit from the intervention should be enrolled.
2. Participants for whom there is high probability of detecting the results of the intervention should be enrolled.
3. Participants to whom the intervention is potentially harmful should not be enrolled.
4. Participants at high risk of developing conditions which preclude ascertainment of the event of interest should not be enrolled.
5. Participants who cannot comply with the protocol should not be enrolled.

As noted previously, many trial designers feel that eligibility criteria for trials are unnecessarily complex and restrictive, resulting in slow enrollment, eligibility violations and many patients being treated outside of the trial with the same treatments. Also, as noted above, if a treatment is found to be effective in the trial, its use frequently becomes more widespread after the trial is over, among patients whose disease status differs from that of the patients studied in the trial.

This thinking has led to development of a concept called the "uncertainty principle". With this approach any patient for whom the effects of treatment are uncertain in the mind of the clinician and patient is randomized. This is the only eligibility criterion apart from a signed consent. The resulting increase in patient heterogeneity is accounted for by increasing sample size.

Run-In Periods

In some trials run-in periods have been used to ensure that more "compliant" patients are enrolled. For example, prior to randomization, potential trial participants may be required to take the study medication and/or attend clinic visits and comply with other study procedures (e.g., diaries, blood tests, collection of urine samples). Those who "comply" are randomized; those who do not are excluded.
In some trials active treatment has been given during the run-in period to ensure that patients can tolerate the treatment and/or respond to it based on intermediate variables (see Packer et al, NEJM 334:1349-55, 1996 for a placebo-controlled trial of carvedilol in heart failure, or the CAST study in NEJM 324:781-788, 1991). In other studies, placebo is used to exclude "placebo responders" (see Lancet 349:1594-1597, 1997). In the Physicians' Health Study, active aspirin and placebo for beta-carotene were used during an 18-week run-in period to identify good compliers.

Run-in periods may improve the power for treatment comparisons. One disadvantage is that in blinded studies patients may notice the change in treatment after randomization. Another disadvantage is that trial results will be less generalizable. Pablos-Mendez et al. discuss the implications of run-ins for the applicability of trial results (JAMA 1998;279:222-225).

Enrichment Designs

Designs with run-ins can be viewed as just one way of "enriching" the study population of a clinical trial with participants who are likely to respond to the treatment. Biomarkers, including genetic traits, can be used to target therapy. Sargent et al (2005) discuss trial designs to determine whether targeted treatment is warranted. Temple (1994) discusses enrichment in a pre-randomization period. The CAST study is an example of such a trial. Maitournam and Simon (Stat Med 2005) consider the efficiency of targeted clinical trials and describe situations in which sample size is reduced.

Logs and Checklists

If exclusions are many, some investigators suggest keeping a log of ineligible patients. Recent guidelines for reporting clinical trials indicate that reports should count and characterize the patients who were not included. A log may make it easier to define the type of patient to whom the results of the study apply.
Many reports of clinical trials are difficult to interpret and generalize because the patient population from which the randomized groups were selected is inadequately defined (see Charlson ME and Horwitz RI, BMJ, 289:1281-1284, 1984). Disadvantages of logs are that they add to the cost of studies and that it is often difficult to define the "denominator" (all patients considered for entry). Some investigators consider them to be pointless (Peto R, Lancet 348:894-895, 1996).

Logs may be helpful in monitoring the recruitment effort. If eligibility criteria are too stringent, it may not be possible to recruit the necessary number of patients. Variations from clinic to clinic, ambiguous eligibility criteria, and criteria that exclude large numbers of patients can be identified and modified if appropriate.

To ensure that eligibility criteria are understood and adhered to, forms with checklists should be used. Verification of eligibility should be part of the randomization procedure. In some studies, source documentation (documentation in the patient's medical chart), in addition to case-report forms, is frequently required for study eligibility criteria.

Screening and Regression Toward the Mean

For some trials, eligibility is established over a series of screening visits. For example, in MRFIT, participants attended 3 screening visits prior to randomization. One reason multiple screening visits are used in risk factor intervention studies is to ensure that patients do in fact have elevated risk factors.

Regression toward the mean is the term used to describe the phenomenon that a variable that is extreme on its first measurement will tend to be closer to the center of the distribution on a later measurement. This phenomenon was noted by Francis Galton (J. Anthropol.
Inst., 15, pp 246-263, 1886), who observed that the children of tall parents were on the average shorter than their parents, and that the children of short parents were on the average taller than their parents. This phenomenon arises in clinical and epidemiologic studies because of "within person" or temporal variability and measurement error.

Regression toward the mean often goes unrecognized in uncontrolled clinical investigations. Patients have measurements taken before and after treatment, and the change is attributed to a treatment effect when in fact the change may simply result from temporal variability and measurement error. Many ailments improve over time, regardless of the intervention (recall the study on the treatment for the common cold). Likewise, patients whose ailments are better than usual might be expected to worsen over time. Here are a few examples:

1. Patients with elevated blood pressure (BP) are recruited to receive muscle-relaxation training and bio-feedback techniques. Substantial BP reductions are noted. Later scientists randomly allocate patients to relaxation training or no training. Both groups experience substantial reductions. Does regular monitoring of BP itself result in a reduction? (Science News, June 17, 1987.)
2. Five obstetricians each re-evaluated the case notes of 50 women in labor who had had an emergency caesarean section for fetal distress. In 30% of cases at least four of them felt that the caesarean had been unnecessary. What is the appropriate control group? An unselected group? Those that did not receive a caesarean?
3. Cholesterol screening has been encouraged by public health agencies. How does the natural variability of cholesterol influence how this screening is carried out?
4. For patients infected with HIV, prophylaxis against PCP is recommended when the CD4+ count drops below 200 cells/mm³. CD4+ counts are difficult to measure and vary considerably over short time periods.
How do we know when the CD4+ count truly drops below 200 cells/mm³?
5. Mantel (Cont. Clinical Trials 3:369-370, 1982) describes problems with investigators dividing patients into "responders" and "non-responders". He cites an example of a study of a treatment called Krebiozen for cancer. In that example a Krebiozen-responder was defined as one who had survived some minimal amount of time; those dying early were classified as Krebiozen non-responders. Claims of significantly improved survival for Krebiozen-responders were then made.

Regression toward the mean also influences studies of the association of risk factors and disease endpoints. Random, short-term fluctuations in blood pressure, for example, result in substantial underestimation of the strength of the real association. This has been referred to as "regression dilution bias" (see Clarke et al.).

How can regression toward the mean affect the design, implementation and interpretation of a clinical trial?

1) The methods by which patients are selected for inclusion can influence the estimated event rate in the control group (pc) and the estimated treatment difference. Gardner and Heady (J. Chronic Diseases, Vol 26, pp 781-795, 1973) note: "it is important to select an experimental group with as far as possible the highest 'true' values. In this way more of those at highest risk of the disease, in whom the treatment might be most effective, would be included, thus increasing the probability of detecting a stated difference with a given number of subjects. It seems undesirable to have the experimental risk groups contaminated with persons without truly high levels..." Neaton and Bartsch (Stat Med, Vol 11, pp 1719-1729, 1992) showed that in the design of trials regression toward the mean could result in an underestimation of the effect of treatment in some cases (regression dilution bias) and an overestimation of the effect in other cases (due to misclassification at entry).
2) The number of measurements used to establish eligibility will have an impact on how many people have to be screened to find a group "at risk", i.e., the screening yield.
3) The choice of a baseline from which to measure the effect of treatment.

Suppose our clinical trial has two eligibility/screening visits. To be eligible for randomization a subject must have a screen 1 risk factor level $x_1 > x_c$. Consider:

1) $E(x_1 \mid x_1 > x_c)$: average level observed at screen 1 for those eligible at screen 1
2) $E(x_2 \mid x_1 > x_c)$: average level observed at screen 2 for those eligible at screen 1
3) $E(X \mid x_1 > x_c)$: average of "true" level for those eligible at screen 1

STATISTICAL MODEL (2 Screens)

$X$ = subject's "true" risk level
$e_i$ = error resulting from measurement and temporal variation at screen $i$
$x_i$ = subject's observed risk level at screen $i$
$x_c$ = cutpoint for eligibility at screen 1 (no risk cutoff at screen 2)

$X \sim N(\mu_X, \sigma_X^2)$
$e_i \sim N(0, \sigma_e^2)$
$x_i = X + e_i$

$\sigma_X^2$ = between-subject variability
$\sigma_e^2$ = within-subject variability

If the errors in different subjects and within subjects are independent:

$x_i \sim N(\mu_{x_i}, \sigma_x^2)$ and $\sigma_x^2 = \sigma_X^2 + \sigma_e^2$

In an unselected population $\mu_{x_1} = \mu_{x_2} = \mu_x$, and $\sigma_{x_1}^2 = \sigma_{x_2}^2 = \sigma_x^2$.

Expected Regression to the Mean in Standard Deviation Units for Various Proportions Excluded in Screening and Levels of Correlation

                               Proportion Excluded in Screening
Correlation     .01    .05    .25    .50    .60    .70    .80    .85    .90    .95    .99
  0            .027   .108   .424   .798   .966  1.159  1.400  1.554  1.755  2.063  2.666
  .10          .024   .098   .381   .718   .869  1.043  1.260  1.399  1.580  1.857  2.399
  .20          .022   .087   .339   .638   .772   .927  1.120  1.244  1.404  1.650  2.132
  .30          .019   .076   .297   .559   .676   .811   .980  1.088  1.229  1.444  1.866
  .40          .016   .065   .254   .479   .579   .695   .840   .933  1.053  1.238  1.599
  .50          .013   .054   .212   .399   .483   .579   .700   .777   .878  1.032  1.333
  .60          .011   .043   .170   .319   .386   .463   .560   .622   .702   .825  1.066
  .70          .008   .033   .127   .239   .290   .348   .420   .466   .527   .619   .800
  .80          .005   .022   .085   .160   .193   .232   .280   .311   .351   .413   .533
  .90          .003   .011   .042   .080   .097   .116   .140   .155   .176   .206   .267
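
Under the two-screen normal model, the three conditional means listed earlier have closed forms that follow from standard results for the truncated bivariate normal distribution. The notes do not spell these expressions out, so what follows is a sketch under that model. Write $z_c = (x_c - \mu_x)/\sigma_x$, $\lambda = \phi(z_c)/(1 - \Phi(z_c))$, and $\rho = \sigma_X^2/\sigma_x^2$ (the correlation between the two screens). Then $E(x_1 \mid x_1 > x_c) = \mu_x + \sigma_x \lambda$ and $E(x_2 \mid x_1 > x_c) = E(X \mid x_1 > x_c) = \mu_x + \rho \sigma_x \lambda$, so the expected regression to the mean is $(1 - \rho)\lambda$ in units of $\sigma_x$. The Python sketch below evaluates these expressions. It assumes that "correlation" in the table means $\rho$ and that everyone above the cutpoint is retained; the function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1


def mean_above_cut(z_cut: float) -> float:
    """E(Z | Z > z_cut) for a standard normal Z."""
    return STD_NORMAL.pdf(z_cut) / (1.0 - STD_NORMAL.cdf(z_cut))


def conditional_means(mu: float, sd_between: float, sd_within: float, cutpoint: float):
    """Return E(x1 | x1 > xc), E(x2 | x1 > xc) and E(X | x1 > xc).

    Assumes x_i = X + e_i with independent normal X and e_i, as in the model.
    """
    sd_obs = math.sqrt(sd_between ** 2 + sd_within ** 2)  # sigma_x
    rho = sd_between ** 2 / sd_obs ** 2                   # screen-to-screen correlation
    shift = sd_obs * mean_above_cut((cutpoint - mu) / sd_obs)
    return mu + shift, mu + rho * shift, mu + rho * shift


def expected_rtm_sd_units(prop_excluded: float, correlation: float) -> float:
    """Expected regression to the mean, in units of sigma_x."""
    z_cut = STD_NORMAL.inv_cdf(prop_excluded)  # cutpoint that excludes this fraction
    return (1.0 - correlation) * mean_above_cut(z_cut)


if __name__ == "__main__":
    proportions = [0.01, 0.05, 0.25, 0.50, 0.60, 0.70, 0.80, 0.85, 0.90, 0.95, 0.99]
    for rho in [i / 10 for i in range(10)]:
        row = " ".join(f"{expected_rtm_sd_units(p, rho):6.3f}" for p in proportions)
        print(f"{rho:4.2f} {row}")
```

The computed values agree with the tabulated entries to within about 0.001; the small differences presumably reflect rounding in the original table.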