PubH 7420 Clinical Trials: Supplemental Notes for Lectures 13 and 14
1. Friedman, Furberg, and DeMets, Fundamentals of Clinical Trials. Chapters 4, 9
and 10.
Supplemental Reading References
1. Pocock SJ. Clinical Trials: A Practical Approach. John Wiley and Sons, Ltd.
Chapter 3.
2. Clinical Trials. Design, Conduct and Analysis. Chapters 5 and 16.
3. Yusuf S, Held P, Teo KK, et al. Selection of patients for randomized controlled
trials: Implications of wide or narrow eligibility criteria. Stat Med 9:73-86, 1990.
4. Rothwell PM. External validity of randomized controlled trials: “To whom do the
results of this trial apply?” Lancet 365:82-93, 2005.
5. Pablos-Mendez A, Barr RG, Shea S. Run-in periods in randomized trials.
Implications for the application of results in clinical practice. JAMA 279:222-225, 1998.
6. Temple RJ. Special study designs: early escape, enrichment, studies in
non-responders. Commun Statist Theory Meth 23:499-531, 1994.
7. Sargent DJ, Conley BA, Allegra C, Collette L. Clinical trial designs for predictive
marker validation in cancer treatment trials. J Clin Oncol 23:2020-2027, 2005.
8. Leber PD, Davis CS. Threats to the validity of clinical trials employing enrichment
strategies for sample selection. Cont Clin Trials 19:178-187, 1998.
9. Davis CE et al. A single cholesterol measurement underestimates the risk of
coronary heart disease. JAMA 264:3044-3046, 1991.
10. Prospective Studies Collaboration. Age-specific relevance of usual blood
pressure to vascular mortality: a meta-analysis of individual data for one million
adults in 61 prospective studies. Lancet 360:1903-1913, 2002.
11. Gardner MJ, Heady JA. Some effects of within-person variability in
epidemiologic studies. J Chron Dis 26:781-795, 1973.
12. Ederer F. Serum cholesterol changes: effects of diet and regression toward the
mean. J Chron Dis 25:277-289, 1972.
13. Davis CE. The effect of regression to the mean in epidemiologic and clinical
studies. Amer J Epid 104:493-498, 1976.
14. Cutter GR. Some examples for teaching regression toward the mean from a
sampling viewpoint. Amer Stat 30:194-197, 1976.
15. Yudkin PL, Stratton IM. How to deal with regression to the mean in intervention
studies. Lancet 347:241-243, 1996.
16. Clarke R, Shipley M, Lewington S, et al. Under-estimation of risk associations
due to regression dilution in long-term follow-up of prospective studies. Amer J
Epid 150:341-353, 1999.
Eligibility Criteria for Clinical Trials
Prior to randomization in a clinical trial, patients must meet pre-specified eligibility
criteria. These criteria, which are stated in advance of the trial, are frequently referred
to as inclusion and exclusion criteria. An example of an inclusion criterion would be the
signing of the consent form. An example of an exclusion criterion would be a
contraindication to one of the treatments being studied. The study population (those
randomized), as defined by the eligibility criteria, can be thought of as a subset of the
larger target population to which we wish to generalize the findings of our study.
In many trials eligibility criteria are specified so as to identify a group of "homogeneous"
patients. This allows the question to be stated more precisely and may permit the
question to be addressed with a smaller sample size because variability among subjects
is reduced. The disadvantage of this approach is that the results of the study may not
be as widely generalizable and some patients may eventually be given treatments for
which little is known from controlled studies, e.g., lipid lowering drugs in women and the
elderly, antiretroviral drugs in women and non-whites or in persons with very early
disease. If a heterogeneous group is not enrolled, the consistency of the overall
findings cannot be addressed in different subgroups.
Many attribute our poor understanding of how to use current antiretroviral treatments,
even though thousands of patients have been studied, to restrictions on eligibility and
patient management in antiretroviral trials. A State of the Art Conference on
antiretroviral therapy in 1993 attributed our inability to make firm recommendations on
when to start and change therapy, and on what treatments to use, to the way clinical
trials have been designed. In their words, most of the randomized trials "were designed
to answer questions about comparative clinical safety and efficacy in certain specific
clinical scenarios." Also, "trials tended to enroll relatively uniform study groups".
Tunis et al. have argued that diverse populations should be enrolled in trials and that
they should be enrolled by clinicians in heterogeneous practice settings
(JAMA 2003;290:1624-1632).
Entry criteria for ACTG 016, a study conducted by the NIH-sponsored AIDS Clinical
Trials Group, are typical. In ACTG 016, 711 patients with mildly symptomatic HIV
disease were randomized to receive AZT or placebo. According to the published report
of this study, to be eligible patients had to have mildly symptomatic HIV infection
defined as the presence of one or two of the following: oral candidiasis, oral hairy
leukoplakia, single dermatomal herpes zoster, unintentional weight loss exceeding 4.5
kg or 10% of usual body weight, all occurring within 3 years of study entry; recurrent
dermatitis or pruritic folliculitis; intermittent diarrhea, defined as 2 or more liquid stools
per day for no more than 7 consecutive days but for at least 14 days within 120 days of
study entry; or chronic fatigue that interfered with normal activity within 6 months of
study entry. The CD4+ count had to be between 200 and 800 cells/mm3; hematocrit had to be
greater than 34%; neutrophil count had to be 1500 cells/mm3 or more; platelet count
had to be greater than 100 x 10^9/L; liver tests had to be not more than 5 times the
upper limit of normal. Patients with AIDS were excluded. Use of other anti-HIV drugs,
biologic response modifiers, or systemic corticosteroids was not allowed. In addition,
prophylaxis for PCP was not allowed until 4 months before the study concluded.
Patient management during the study was restrictive in other ways besides concomitant
treatments. Participants with a serious toxicity lasting more than 21 consecutive days or
those with the same recurring toxicity after dose reduction were removed from study
medication. Patients were also removed from study therapy if they developed AIDS or if
they failed to complete follow-up appointments or take study medication as directed.
These restrictive eligibility criteria and patient management rules are consistent with
much that we teach about the scientific method: carefully define your study population,
taking care to eliminate as many sources of variability as possible. But does it
make sense to narrowly define the study population and then use the drug for others
who did not meet the eligibility criteria? Does it make sense to restrict concomitant
medications if, in the "real world", patients take multiple medications for prophylaxis of
opportunistic diseases and for other reasons? Does the reduced variability resulting
from these restrictions limit our understanding of how the drug really works? For
example, if we don't really know when a new drug should be initiated, would it not make
more sense to be as inclusive as possible? You cannot study factors which are fixed by
design, e.g., if you only enroll men, then you cannot assess whether the treatment
works the same in women. More generally, should the number and scope of restrictions
made in a trial to ensure more homogeneity depend on the research question posed?
Friedman, Furberg and DeMets define 5 general considerations on which to develop
individual criteria for eligibility.
1. Only participants who have the potential to benefit from the intervention
should be enrolled.
2. Participants for whom there is high probability of detecting the results of
the intervention should be enrolled.
3. Participants to whom the intervention is potentially harmful should not be
enrolled.
4. Participants at high risk of developing conditions which preclude
ascertainment of the event of interest should not be enrolled.
5. Participants who cannot comply with the protocol should not be enrolled.
As noted previously, many trial designers feel that eligibility criteria for trials are
unnecessarily complex and restrictive, resulting in slow enrollment, eligibility violations
and many patients being treated outside of the trial with the same treatments. Also, as
noted above, if a treatment is found to be effective in the trial, its use frequently
becomes more widespread after the trial is over among patients whose disease status
differs from that of the patients studied in the trial. This thinking has led to
development of a concept called the "uncertainty principle". With this approach any
patient for whom the effects of treatment are uncertain in the mind of the clinician and
patient is randomized. This is the only eligibility criterion apart from a signed consent.
The resulting increase in patient heterogeneity is accounted for by increasing sample size.
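As a rough illustration of how added heterogeneity translates into sample size, the sketch below uses the usual normal-approximation formula for comparing two means, n per group = 2 (z_alpha + z_beta)^2 SD^2 / delta^2. The choice of this formula, the function name n_per_group, and all numeric values are assumptions made for illustration; none of them comes from a trial discussed in these notes.

```python
import math
from statistics import NormalDist


def n_per_group(sd, delta, alpha=0.05, power=0.90):
    """Approximate per-group sample size for a two-sided comparison of two means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile corresponding to the desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


# Hypothetical values: the same treatment difference (delta) studied in a
# narrowly defined population and in a more heterogeneous one.
n_narrow = n_per_group(sd=10.0, delta=5.0)
n_broad = n_per_group(sd=15.0, delta=5.0)

print(f"narrow eligibility: {n_narrow} per group")
print(f"broad eligibility:  {n_broad} per group")
```

Under these assumptions the required sample size grows with the square of the standard deviation, so a 50% increase in between-patient variability roughly doubles the number of patients needed.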
Run-In Periods
In some trials run-in periods have been used to ensure more "compliant" patients are
enrolled. For example, prior to randomization, potential trial participants may be
required to take the study medication and/or attend clinic visits and comply with other
study procedures (e.g., diaries, blood tests, collection of urine samples). Those who
"comply" are randomized; those who do not are excluded. In some trials active
treatment has been given during the run-in period to ensure that patients can tolerate
the treatment and/or respond to it based on intermediate variables (see Packer et al,
NEJM 334:1349-55, 1996 for a placebo-controlled trial of carvedilol in heart failure or
the CAST study in NEJM 324:781-788, 1991). In other studies, placebo is used to
exclude "placebo responders" (see Lancet 349:1594-1597, 1997). In the Physicians'
Health Study, active aspirin and placebo for beta-carotene were used during an 18-week
run-in period to identify good compliers. Run-in periods may improve the power for
treatment comparisons. One disadvantage is that in blinded studies patients may notice
the change in treatment after randomization. Another disadvantage is that trial results
will be less generalizable. Pablos-Mendez et al. discuss the implications of run-ins for
the applicability of trial results (JAMA 1998;279:222-225).
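One way to see why a run-in may improve power is to treat non-compliance as diluting the treatment difference. The sketch below assumes, as a simplification not stated in these notes, that a non-complier in the treatment arm receives no benefit, so the observed difference shrinks from delta to delta x (1 - non-compliance rate). The function name and all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_diluted(sd, delta, noncompliance, alpha=0.05, power=0.90):
    """Per-group sample size when non-compliers dilute the treatment difference."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    diluted_delta = delta * (1 - noncompliance)  # assumed: non-compliers get no benefit
    return math.ceil(2 * z_sum ** 2 * sd ** 2 / diluted_delta ** 2)


# Hypothetical non-compliance rates without and with a run-in period.
n_without_run_in = n_per_group_diluted(sd=10.0, delta=5.0, noncompliance=0.30)
n_with_run_in = n_per_group_diluted(sd=10.0, delta=5.0, noncompliance=0.10)

print(f"no run-in (30% non-compliance): {n_without_run_in} per group")
print(f"run-in    (10% non-compliance): {n_with_run_in} per group")
```

The smaller number randomized comes at a price: more patients must be screened, and the randomized group is a selected subset of those who would be treated in practice.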
Enrichment Designs
Designs with run-ins can be viewed as just one way of “enriching” the study population
chosen for a clinical trial with participants who are likely to respond to the treatment.
Biomarkers, including genetic traits, can be used to target therapy. Sargent et al (2005)
discuss trial designs to determine whether targeted treatment is warranted. Temple
(1994) discusses enrichment in a pre-randomization period. The CAST study is an
example of such a trial. Maitournam and Simon (Stat Med 2005) consider the efficiency
of targeted clinical trials and describe situations in which the required sample size is
reduced.
Logs and Checklists
If exclusions are many, some investigators suggest keeping a log of ineligible patients.
Recent guidelines for reporting clinical trials indicate that reports should count and
characterize the patients that they did not include. A log may make it easier to define
the type of patient to whom the results of the study apply. Many reports of
clinical trials are difficult to interpret and generalize because the patient population
from which the randomized groups were selected is inadequately defined (see Charlson
ME and Horwitz RI, BMJ, 289:1281-1284, 1984). Disadvantages of logs are that they
add to the cost of studies and often it is difficult to define the "denominator" -- all
patients considered for entry. Some investigators consider them to be pointless (Peto
R, Lancet 348:894-895, 1996).
Logs may be helpful in monitoring the recruitment effort. If eligibility criteria are too
stringent, it may not be possible to recruit the necessary number of patients. Variations
from clinic to clinic, ambiguous eligibility criteria, and criteria which are excluding
large numbers of patients can be identified, and the criteria modified if appropriate.
To ensure that eligibility criteria are understood and adhered to, forms with checklists
should be used. Verification of eligibility should be part of the randomization procedure.
In some studies, source documentation (documentation in the patient's medical chart), in
addition to case-report forms, is required for study eligibility criteria.
Screening and Regression Toward the Mean
For some trials, eligibility is established over a series of screening visits. For example,
in MRFIT, participants attended 3 screening visits prior to randomization. One reason
multiple screening visits are used in risk factor intervention studies is to ensure that
patients do in fact have elevated risk factors.
Regression toward the mean is the term used to describe the phenomenon that a
variable that is extreme on its first measurement will tend to be closer to the center of
the distribution on a later measurement.
This phenomenon was noted by Francis Galton (J. Anthropol. Inst., 15, pp 246-263,
1886), who observed that the children of tall parents were on the average shorter than
their parents, and that the children of short parents were on the average taller than their
parents.
This phenomenon arises in clinical and epidemiologic studies because of
"within-person" or temporal variability and measurement error.
Regression toward the mean often goes unrecognized in uncontrolled clinical
investigations. Patients have measurements taken before and after treatment and the
change is attributed to a treatment effect when in fact the change may simply result
from temporal variability and measurement error. Many ailments improve over time,
regardless of the intervention (recall the study on the treatment for the common cold).
Likewise patients whose ailments are better than usual might be expected to worsen
over time. Here are a few examples:
1. Patients with elevated blood pressure (BP) are recruited to receive
muscle-relaxation training and bio-feedback techniques. Substantial BP reductions are
noted. Later scientists randomly allocate patients to relaxation training or no
training. Both groups experience substantial reductions. Does regular monitoring of
BP itself result in a reduction? (Science News, June 17, 1987.)
2. Five obstetricians each re-evaluated the case notes of 50 women in labor who
had had an emergency caesarean section for fetal distress. In 30% of cases at
least four of them felt that the caesarean had been unnecessary. What is the
appropriate control group? An unselected group? Those that did not receive a
caesarean?
3. Cholesterol screening has been encouraged by public health agencies. How does
the natural variability of cholesterol influence how this screening is carried out?
4. For patients infected with HIV, prophylaxis against PCP is recommended when
the CD4+ count drops below 200 cells/mm3. CD4+ counts are difficult to measure
and vary considerably over short time periods. How do we know when the CD4+
count truly drops below 200 cells/mm3?
5. Mantel (Cont. Clinical Trials 3:369-370, 1982) describes problems with
investigators dividing patients into "responders" and "non-responders". He cites
an example of a study of a treatment called Krebiozen for cancer. In that example
a Krebiozen-responder was defined as one who had survived some minimal
amount of time -- those dying early were classified as Krebiozen non-responders.
Claims of significantly improved survival for Krebiozen-responders were then
made.
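The before-and-after pattern in the first example can be reproduced with no treatment at all. The simulation below is a minimal sketch that assumes each observed value is a stable "true" level plus independent, normally distributed error, and that patients are selected because their first measurement exceeds a cutoff. The mean, standard deviations, and cutoff are hypothetical values loosely styled on diastolic blood pressure.

```python
import random
from statistics import mean

random.seed(7420)  # fixed seed so the sketch is reproducible

# Hypothetical parameters (mm Hg): population mean, SD of true levels,
# SD of within-person error, and the selection cutoff on the first visit.
MU, SD_TRUE, SD_ERR, CUTOFF = 90.0, 8.0, 6.0, 95.0


def simulate(n=100_000):
    """Return mean first and second measurements among those selected on the first."""
    first, second = [], []
    for _ in range(n):
        true_level = random.gauss(MU, SD_TRUE)
        x1 = true_level + random.gauss(0.0, SD_ERR)  # screening measurement
        x2 = true_level + random.gauss(0.0, SD_ERR)  # follow-up, no treatment given
        if x1 > CUTOFF:
            first.append(x1)
            second.append(x2)
    return mean(first), mean(second)


mean_x1, mean_x2 = simulate()
print(f"mean at selection: {mean_x1:.1f}")
print(f"mean at follow-up: {mean_x2:.1f}")
print(f"apparent 'reduction' with no treatment: {mean_x1 - mean_x2:.1f}")
```

Because nothing was done to these simulated patients, the whole drop between visits is regression toward the mean. A randomized control group experiences the same drop, which is why the comparison between arms remains valid.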
Regression toward the mean also influences studies of the association of risk factors
and disease endpoints. Random, short-term fluctuations in blood pressure, for
example, result in substantial underestimation of the strength of the real association.
This has been referred to as "regression dilution bias" (see Clarke et al.).
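Regression dilution can be sketched the same way. The simulation below assumes a continuous outcome that depends linearly on the true risk factor level (a stand-in for disease risk, chosen for simplicity) and compares the slope estimated from true levels with the slope estimated from a single error-prone measurement. All parameter values and names are hypothetical.

```python
import random
from statistics import mean

random.seed(13)

SD_TRUE, SD_ERR, BETA = 8.0, 6.0, 1.0  # hypothetical values


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


true_levels = [random.gauss(0.0, SD_TRUE) for _ in range(50_000)]
observed = [t + random.gauss(0.0, SD_ERR) for t in true_levels]    # one noisy reading
outcome = [BETA * t + random.gauss(0.0, 5.0) for t in true_levels]  # depends on true level

slope_true = ols_slope(true_levels, outcome)
slope_observed = ols_slope(observed, outcome)
reliability = SD_TRUE ** 2 / (SD_TRUE ** 2 + SD_ERR ** 2)

print(f"slope using true levels:     {slope_true:.2f}")
print(f"slope using single readings: {slope_observed:.2f}")
print(f"expected attenuation factor: {reliability:.2f}")
```

In this sketch the slope based on single readings is attenuated by approximately the ratio of between-subject variance to total variance.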
How can regression toward the mean affect the design, implementation and
interpretation of a clinical trial?
1) The methods by which patients are selected for inclusion can influence the
estimated event rate in the control group ($p_c$) and the estimated treatment
difference. Gardner and Heady (J. Chronic Diseases, Vol 26, pp 781-795, 1973)
note:
"it is important to select an experimental group with as far as possible the highest
'true' values. In this way more of those at highest risk of the disease, in whom the
treatment might be most effective, would be included, thus increasing the
probability of detecting a stated difference with a given number of subjects. It
seems undesirable to have the experimental risk groups contaminated with
persons without truly high levels..."
Neaton and Bartsch (Stat Med, Vol 11, pp 1719-1729, 1992) showed that in the
design of trials regression toward the mean could result in an underestimation of
the effect of treatment in some cases (regression dilution bias) and an
overestimation of the effect in other cases (due to misclassification at entry).
2) The number of measurements used to establish eligibility will have an impact on
how many people have to be screened to find a group "at risk", i.e., the screening
yield.
3) The choice of a baseline from which to measure the effect of treatment.
Suppose our clinical trial has two eligibility/screening visits. To be eligible for
randomization a subject must have a screen 1 risk factor level $x_1 > x_c$.
Consider:
1) $E(x_1 \mid x_1 > x_c)$ - Average level observed at screen 1 for those eligible at screen 1
2) $E(x_2 \mid x_1 > x_c)$ - Average level observed at screen 2 for those eligible at screen 1
3) $E(X \mid x_1 > x_c)$ - Average of "true" level for those eligible at screen 1
STATISTICAL MODEL (2 Screens)
$X$ = Subject's "true" risk level
$e_i$ = Error resulting from measurement and temporal variation at screen $i$
$x_i$ = Subject's observed risk level at screen $i$
$x_c$ = cutpoint for eligibility at screen 1 (no risk cutoff at screen 2)
$X \sim N(\mu_X, \sigma_X^2)$
$e_i \sim N(0, \sigma_e^2)$
$x_i = X + e_i$
$\sigma_X^2$ = between subject variability
$\sigma_e^2$ = within subject variability
If the errors in different subjects and within subjects are independent:
$x_i \sim N(\mu_{x_i}, \sigma_x^2)$
and $\sigma_x^2 = \sigma_X^2 + \sigma_e^2$
In an unselected population
$\mu_{x_1} = \mu_{x_2} = \mu_x$, and $\sigma_{x_1}^2 = \sigma_{x_2}^2 = \sigma_x^2$
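Under this model the three conditional averages listed before the model have closed forms. The following is a sketch based on standard results for a truncated bivariate normal distribution. The symbols $c$, $\lambda(c)$ and $\rho$ are introduced here for convenience and are not part of the notation above; $\phi$ and $\Phi$ denote the standard normal density and distribution function.

```latex
\begin{align*}
c &= \frac{x_c - \mu_x}{\sigma_x}, \qquad
\lambda(c) = \frac{\phi(c)}{1 - \Phi(c)}, \qquad
\rho = \operatorname{corr}(x_1, x_2) = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_e^2} \\[4pt]
E(x_1 \mid x_1 > x_c) &= \mu_x + \sigma_x \, \lambda(c) \\
E(x_2 \mid x_1 > x_c) &= \mu_x + \rho \, \sigma_x \, \lambda(c) \\
E(X \mid x_1 > x_c)   &= \mu_x + \rho \, \sigma_x \, \lambda(c) \\[4pt]
E(x_1 \mid x_1 > x_c) - E(x_2 \mid x_1 > x_c) &= (1 - \rho) \, \sigma_x \, \lambda(c)
\end{align*}
```

The second and third lines agree because $e_2$ has mean zero and is independent of $x_1$, so the screen 2 average estimates the average "true" level without bias. Dividing the last line by $\sigma_x$ gives the expected regression to the mean in standard deviation units, $(1 - \rho)\lambda(c)$. For example, with $\rho = 0$ and half of those screened excluded ($c = 0$), this is $\phi(0)/0.5 \approx 0.798$.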
Expected Regression to the Mean in Standard Deviation Units
for Various Proportions Excluded in Screening and Levels of Correlation

                          Proportion Excluded in Screening
Correlation    .01    .05    .25    .50    .60    .70    .80    .85    .90    .95    .99
0             .027   .108   .424   .798   .966  1.159  1.400  1.554  1.755  2.063  2.666
.10           .024   .098   .381   .718   .869  1.043  1.260  1.399  1.580  1.857  2.399
.20           .022   .087   .339   .638   .772   .927  1.120  1.244  1.404  1.650  2.132
.30           .019   .076   .297   .559   .676   .811   .980  1.088  1.229  1.444  1.866
.40           .016   .065   .254   .479   .579   .695   .840   .933  1.053  1.238  1.599
.50           .013   .054   .212   .399   .483   .579   .700   .777   .878  1.032  1.333
.60           .011   .043   .170   .319   .386   .463   .560   .622   .702   .825  1.066
.70           .008   .033   .127   .239   .290   .348   .420   .466   .527   .619   .800
.80           .005   .022   .085   .160   .193   .232   .280   .311   .351   .413   .533
.90           .003   .011   .042   .080   .097   .116   .140   .155   .176   .206   .267
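The tabulated values appear to follow the expression (1 - correlation) x phi(c) / (1 - Phi(c)), where c is the standard normal quantile corresponding to the proportion excluded. This is an inference from the entries themselves under the two-screen normal model, not a formula stated in these notes. The sketch below recomputes the table on that assumption; the function name rtm_sd_units is hypothetical.

```python
from statistics import NormalDist


def rtm_sd_units(proportion_excluded, correlation):
    """Expected regression to the mean, in SD units, under the assumed normal model."""
    z = NormalDist()
    c = z.inv_cdf(proportion_excluded)           # standardized eligibility cutpoint
    mills = z.pdf(c) / (1 - proportion_excluded)  # mean excess above c, in SD units
    return (1 - correlation) * mills


proportions = [.01, .05, .25, .50, .60, .70, .80, .85, .90, .95, .99]
correlations = [0.0, .10, .20, .30, .40, .50, .60, .70, .80, .90]

print("Correlation" + "".join(f"{p:7.2f}" for p in proportions))
for rho in correlations:
    row = "".join(f"{rtm_sd_units(p, rho):7.3f}" for p in proportions)
    print(f"{rho:<11.2f}" + row)
```

Read this way, regression to the mean grows as screening becomes more selective and shrinks as the correlation between screening measurements rises, for instance when eligibility is based on the average of several measurements.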