Clinical Trial Design Principles for Dummies

Some Clinical Trial Design Questions and Answers
Peter A. Lachenbruch
Oregon State University
Usual Disclaimer
• The views expressed here are mine. Although this
presentation was initially developed while I was
at the FDA, the opinions may not reflect those of
the FDA.
• There are lots of acronyms – FDA never met an
acronym it didn’t like.
• Please feel free to stop me and ask for
clarification
Orientation
• I discuss some questions regarding clinical trial
design principles. There are many books on the
basics of clinical trial design and analysis:
– Pocock
– Friedman, Furberg, and DeMets
– Piantadosi
– Chow and Liu
Orientation (2)
• FDA has many guidance documents on its
web site (www.fda.gov); consult these for further
details.
• The International Conference on Harmonization
(ICH) has issued many reports that worldwide
regulatory agencies will abide by. Web site:
www.ich.org
• See E9 (Statistical Principles for Clinical Trials), E10 (Choice
of Control Group and Related Issues in Clinical Trials), E3
(Structure and Content of Clinical Study Reports), E6 (Good
Clinical Practice), and E5 (Ethnic Factors in the Acceptability
of Foreign Clinical Data) for particularly useful documents.
Orientation (3)
• Off-label use
– Use of a product for a condition that it has not been approved for
(i.e., not on label)
– FDA does not regulate medical practice and there may be
drawbacks – e.g., drug interactions, adverse events
– Limited information from these uses
• Investigational use
– Product being studied (usually at a limited number of sites) for
a specific indication; such studies are subject to many controls.
– Compassionate use – when products are close to approval, FDA
may allow use for patients not in a clinical trial
Orientation (4)
• FDA organization
– Center for Biologics Evaluation and Research (CBER) – things
like vaccines, blood, genomics
– Center for Devices and Radiological Health (CDRH) – stents,
wheelchairs, band-aids, TV radiation – mechanical stuff.
– Center for Drug Evaluation and Research (CDER)
– Others: Center for Veterinary Medicine (CVM), Center for Food
Safety & Applied Nutrition (CFSAN), National Center for
Toxicological Research (NCTR)
General Statistical Ideas
• Clarity of approach
– Full disclosure of design, sample size calculations
– Analysis methods
– Distinction between CONFIRMATORY and Exploratory analyses
• Confirmatory analyses are specified in the protocol (and will lead to
licensure)
• Exploratory analyses are those that are suggested by the data
(think of shooting an arrow into a barn and then painting the target
around the arrow!)
– If “new” methods are used, there should be a peer-reviewed
citation for them
Statistical Ideas (2)
• Must be analytically appropriate
– Maintain size (α level)
– Maintain blinding as appropriate
• Have endpoints (outcomes) that are appropriate
– Show a clinical benefit
– Reliable and valid
• Minimize missing data and provide a plan for dealing with
them when they occur
Question 1
• What pitfalls does the FDA see when information
from pre-clinical data (or early clinical data on a
similar investigational product) is formulated into
a Phase I protocol?
Q1: (1)
• Safety issues
– A main use of pre-clinical data is to ascertain basic safety
information. If animal data is limited, the FDA may ask for further
study
• Carcinogenicity studies – are there excess cancers
• Immunogenicity studies
• Teratogenicity studies – do fetuses develop normally
– The choice of animal model is important: if it is not accepted as
appropriate, there may be a need to obtain further information or
information on another model (the lists below are not exhaustive)
• Rodents and rabbits: mice, rats, rabbits
• Non-primate vertebrates: dogs, cats, pigs
• Primates: rhesus monkeys, macaques
Q1: (2)
• Need to characterize the product
– Potency assays
– Purity
– Identity
– All specifications should be at least partly understood at
this stage, with a path leading to full GMP compliance
(GMP = good manufacturing practice)
Q1: (3)
• It is recognized that proof-of-concept evidence from
pre-clinical studies may be limited. However,
there should be no evidence of poorer
outcomes than with comparators.
– The criterion is that the results are not statistically
significantly poorer; a mean that is merely numerically
lower in one group than in the other is not, by itself,
evidence of poorer outcomes.
Q2:
• What are the various types of study
designs available when there are
more than two comparators?
Q2:
• Depends on purpose of study
– Testing 2 or more dosage levels versus control
• Parallel group design – may wish to account for ordering of
dosages in the analysis
– Testing both schedule and dose
• Factorial design (high and low levels of dose and of schedule
crossed), which allows examination of interactions in the analysis
Q2: (2)
– Test amount of adjuvant and dose
• Factorial design
– Crossover designs cannot be used in vaccine
trials because the immune system is permanently
affected (or at least affected for a long time). Thus,
carryover effects are always present
• Smallpox immunization may last for 20 years or more, and
may be refreshed by exposure to the virus
• Influenza may have rapidly waning immunity, and different
strains may appear each year. There may be some cross-protection.
Q3:
• What controls are appropriate when
there cannot be any blinding in the
trial?
Q3:
• Almost any control is reasonable: placebo,
standard of care
– Compare treatment with control
• In some cases, historical controls may be used,
but these are rare
– A historical control is a group that was observed
previously. Problems with concurrency, lack of
comparability, etc.
– Generally not recommended
Q3 (2)
• The endpoint / outcome variable that is being
used and how it’s evaluated is most important
– A subjective endpoint is usually a problem, so FDA
expects that a blinded evaluator will be used in these
cases.
– An objective endpoint (e.g., confirmed disease by
laboratory measures) is preferable
– Survival is always objective, but may have too few
events.
Q4:
• What are the problems seen by the
FDA with randomization in trials?
Q4:
• Randomization is absolutely essential in vaccine
trials
• Issues
– Cheating: unblinding the treatment assignment; a
robust way of preventing this is needed
– Stratification: too many strata make it unlikely that
there will be sufficient numbers in each stratum for
precise estimation. The number of strata is the
product of the numbers of levels of the stratification
factors (Sex (2) × Age (4) × Ethnicity (3) = 24 strata)
Q4: (2)
– Issues (continued)
• Inadequate number of strata
– age<2, 2 ≤ age <12, 12 ≤ age < 18, 18 ≤ age < 50,
50 ≤ age often important in vaccine studies
• Introduction of bias
• Not accounting for the design of the study in the
analysis: if you have stratified the randomization, you
must account for the stratification in the analysis.
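A minimal sketch of stratified randomization with permuted blocks, one common way to carry out the stratification discussed above (the slides do not prescribe a method). The arm labels, the block size of 4, and the function and class names are hypothetical; the age cut points are the ones listed on the slide.

```python
import math
import random

def n_strata(levels_per_factor):
    """Number of strata = product of the number of levels of each factor."""
    return math.prod(levels_per_factor)

def age_stratum(age):
    """Map an age in years to the vaccine-study age groups from the slide."""
    for upper, label in [(2, "<2"), (12, "2-11"), (18, "12-17"), (50, "18-49")]:
        if age < upper:
            return label
    return "50+"

class StratifiedBlockRandomizer:
    """Permuted-block randomization run separately within each stratum."""

    def __init__(self, arms=("vaccine", "control"), block_size=4, seed=None):
        if block_size % len(arms) != 0:
            raise ValueError("block size must be a multiple of the number of arms")
        self.arms = list(arms)
        self.block_size = block_size
        self.rng = random.Random(seed)
        self.pending = {}  # stratum -> assignments left in its current block

    def assign(self, stratum):
        queue = self.pending.setdefault(stratum, [])
        if not queue:
            # start a new balanced block and shuffle its order
            block = self.arms * (self.block_size // len(self.arms))
            self.rng.shuffle(block)
            queue.extend(block)
        return queue.pop(0)

print(n_strata([2, 4, 3]))  # Sex (2) x Age (4) x Ethnicity (3)
rand = StratifiedBlockRandomizer(seed=2024)
stratum = ("F", age_stratum(34), "group 1")
print([rand.assign(stratum) for _ in range(8)])
```

Because every completed block is balanced, each stratum stays close to 1:1 throughout enrollment; the same stratum labels would then be carried into the analysis model.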
Q5:
• What are the steps in designing a
dose-ranging/dose-escalation study?
22
Q5:
• Goals:
– To establish maximum tolerated dose (MTD), dose
limiting toxicities, and/or maximum feasible dose
– To establish minimum effective dose
• Designs
– One dose per subject, gradual increase by a fixed
amount (typically half-log increases, with rules for
stopping)
• What’s the right starting dose?
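To make “half-log increases” concrete: each step multiplies the dose by 10^0.5 ≈ 3.16. The sketch below builds such a ladder and applies a stopping rule. The rule (stop when a cohort has more than one dose-limiting toxicity, DLT), the DLT counts, and the function names are hypothetical, chosen only for illustration and not as a recommendation.

```python
def half_log_doses(start, n_levels):
    """Dose ladder where each level is 10**0.5 (about 3.16) times the previous one."""
    return [start * 10 ** (0.5 * i) for i in range(n_levels)]

def highest_cleared_dose(doses, dlt_counts, max_dlts=1):
    """Escalate in order; stop at the first cohort with more than max_dlts DLTs.

    Returns the last dose that did not trigger the (hypothetical) stopping rule,
    or None if the first dose already triggers it.
    """
    cleared = None
    for dose, dlts in zip(doses, dlt_counts):
        if dlts > max_dlts:
            break
        cleared = dose
    return cleared

doses = half_log_doses(1.0, 5)  # e.g., micrograms: 1, 3.16, 10, 31.6, 100
print([round(d, 2) for d in doses])
print(highest_cleared_dose(doses, [0, 0, 1, 2, 3]))  # illustrative DLT counts
```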
Q5: (2) Dose ranging / dose escalation
– Multiple doses per subject for short (3-7 days) or long
(1-4 weeks or greater) periods
– What dose range generates useful levels of
antibodies? What level produces adverse events?
– Dose Escalation
• Give successively larger doses or number of doses (booster
doses) until subject responds
• May not be helpful with vaccines because of permanent
effect of a vaccine, but can use different subjects. In this
case, subjects should be randomized to dose.
Q5: (3)
• For a vaccine both dose and schedule need to
be determined
– A factorial design may be useful
• Test all doses and all schedules
• Can look for interactions to see if the response is additive or
not
• The specific adjuvant may be important
– This may be expanded to look at a response surface
• Useful for first trials to pick a dose-schedule combination for
later trials
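A 2 × 2 factorial layout can be enumerated directly, and the question “is the response additive?” becomes a single interaction contrast. The dose and schedule labels and the cell means below are made up for illustration; in practice the interaction would be tested in a regression or ANOVA model that also gives its standard error.

```python
from itertools import product

doses = ["low", "high"]
schedules = ["2-dose", "3-dose"]        # hypothetical schedules
arms = list(product(doses, schedules))  # every dose crossed with every schedule

def interaction_contrast(means):
    """(schedule effect at high dose) - (schedule effect at low dose).

    Zero means the dose and schedule effects are additive on this scale.
    """
    high = means[("high", "3-dose")] - means[("high", "2-dose")]
    low = means[("low", "3-dose")] - means[("low", "2-dose")]
    return high - low

# Illustrative mean log titers for each of the four arms
means = {("low", "2-dose"): 2.0, ("low", "3-dose"): 2.5,
         ("high", "2-dose"): 3.0, ("high", "3-dose"): 3.9}
print(arms)
print(interaction_contrast(means))
```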
Q5: (4)
• In vaccine studies, it is important to establish the
duration of protection.
– In clinical trials, the sponsor can follow subjects for a year
or more and observe whether there are any changes in
disease incidence over that time.
– Alternatively, the sponsor can obtain serum samples
to determine the level of antibodies.
– It is not always clear how the serum antibody level
relates to disease incidence (called a correlate of
protection).
Q6:
• How does one deal with multiple
variables that will affect the outcome
measure (with an understanding of
fixed randomization schemes and
adaptive/dynamic randomization
schemes)?
Q6:
• If there are strata, including study sites, these
always should be included in the analysis model.
• Common covariates include (if they are not
strata) age, sex, ethnicity, disease stage.
• The usual method is to conduct an analysis of
covariance; some covariates may be unordered
(categorical), but often they are ordered.
Q6: (2)
• The analysis of covariance does an analysis of
variance on the adjusted response.
• Assumptions:
– Normal distribution of residual error
• Can handle with a permutation test
– Covariates are not affected by treatment – measure
at baseline!
– Parallelism: no treatment-by-covariate interaction,
i.e., the slope of the response on each covariate is
the same in every treatment group
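For two groups and a single baseline covariate, the ANCOVA estimate of the treatment effect (under the parallelism assumption) is the raw difference in mean responses corrected by the pooled within-group slope times the difference in mean covariate values. A minimal sketch using only the standard library; the function name and data are illustrative, the data are constructed so the answer is known, and a real analysis would use a regression package that also reports standard errors.

```python
def ancova_effect(groups):
    """Covariate-adjusted difference (treated - control), common-slope ANCOVA.

    groups: {"control": (xs, ys), "treated": (xs, ys)} with x measured at baseline.
    """
    sxy = sxx = 0.0
    means = {}
    for name, (xs, ys) in groups.items():
        xbar = sum(xs) / len(xs)
        ybar = sum(ys) / len(ys)
        means[name] = (xbar, ybar)
        # pool within-group sums so the slope is not driven by group differences
        sxy += sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
        sxx += sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx  # common (parallel) slope
    (xc, yc), (xt, yt) = means["control"], means["treated"]
    raw = yt - yc
    adjusted = raw - slope * (xt - xc)
    return raw, slope, adjusted

data = {
    "control": ([1, 2, 3, 4], [2.5, 3.0, 3.5, 4.0]),
    "treated": ([3, 4, 5, 6], [6.5, 7.0, 7.5, 8.0]),
}
print(ancova_effect(data))
```

Here the treated group happens to start with higher baseline values, so the raw difference (4.0) overstates the treatment effect; the adjusted difference is 3.0.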
Q6a: Fixed and Adaptive
Randomization Schemes
• Fixed scheme
– Same proportion assigned to each treatment group
– Different proportion assigned to groups such as 2:1 or
3:1
• The variance of the estimated treatment effect is smallest with 1:1
allocation, but it may be important to gain more understanding of
the safety profile, so a more extreme allocation may be used.
Allocation more extreme than 3:1 is not very useful and leads to a much
larger sample size
Q6a: (2)
• Here are total sample sizes for α = 0.05, β = 0.1,
mean difference = 1, σ = 3
– 1:1 allocation: 380
– 2:1 allocation: 429 (13% increase)
– 3:1 allocation: 508 (34% increase)
– 5:1 allocation: 684 (80% increase)
• Thus, unbalanced allocation leads to a substantial
increase in sample size and consequently in budget
Q6a: (3)
• Adaptive randomization
– Next randomization depends on outcome of prior
subjects in the trial
• Need fairly early response in trial. May be possible with skin
or other reactions (if they occur within a few hours of
treatment), a bit less so with immunogenicity (outcome after
first series of treatment that may take 6 months), unlikely with
clinical outcome (occurs after series of treatments and a
relatively long follow up period)
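One well-known outcome-adaptive scheme, not named on the slide, is the randomized play-the-winner urn. The sketch below assumes, as the slide warns is often unrealistic for vaccines, that each subject's response is known before the next subject is randomized; the success probabilities and the function name are hypothetical.

```python
import random

def play_the_winner(p_success, n_subjects, seed=0):
    """Randomized play-the-winner urn with one initial ball per arm.

    A success adds a ball for the arm just used; a failure adds a ball
    for the other arm, so allocation drifts toward the better arm.
    """
    rng = random.Random(seed)
    urn = {"A": 1, "B": 1}
    allocated = {"A": 0, "B": 0}
    for _ in range(n_subjects):
        arm = "A" if rng.random() < urn["A"] / (urn["A"] + urn["B"]) else "B"
        allocated[arm] += 1
        success = rng.random() < p_success[arm]  # response assumed immediate
        other = "B" if arm == "A" else "A"
        urn[arm if success else other] += 1
    return allocated

print(play_the_winner({"A": 0.9, "B": 0.1}, n_subjects=500))
```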
Q6a: (4)
• Another form of adaptive randomization attempts
to balance covariates (e.g., minimization)
– This can be done with vaccine studies
– Needs to have a measure of imbalance
• Must adjust for covariates used in imbalance score
• Potential for manipulation?
– There is considerable disagreement among
statisticians about the appropriate analysis model.
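A minimal sketch of minimization in the spirit of Pocock and Simon, using the range of arm counts as the imbalance measure (one of several possible choices). The factor names, the 0.8 probability of taking the preferred arm, and the function name are hypothetical. Keeping a random element in the assignment is one guard against the manipulation concern noted above.

```python
import random

def minimization_assign(new_subject, enrolled, factors, arms=("A", "B"),
                        p_preferred=0.8, rng=None):
    """Pick an arm for new_subject that reduces covariate imbalance.

    enrolled: list of (covariates dict, arm) for subjects already randomized.
    Imbalance for each factor = max - min count across arms among subjects
    sharing the new subject's level, after hypothetically adding the subject.
    """
    rng = rng or random.Random()
    scores = {}
    for candidate in arms:
        score = 0
        for f in factors:
            level = new_subject[f]
            counts = {a: sum(1 for cov, arm in enrolled if arm == a and cov[f] == level)
                      for a in arms}
            counts[candidate] += 1
            score += max(counts.values()) - min(counts.values())
        scores[candidate] = score
    best = min(scores.values())
    preferred = [a for a in arms if scores[a] == best]
    if len(preferred) > 1:           # tie: plain randomization among tied arms
        return rng.choice(preferred)
    if rng.random() < p_preferred:   # usually take the arm that reduces imbalance
        return preferred[0]
    return rng.choice([a for a in arms if a != preferred[0]])

enrolled = [({"sex": "F", "age": "18-49"}, "A"),
            ({"sex": "F", "age": "50+"}, "A"),
            ({"sex": "M", "age": "18-49"}, "B")]
print(minimization_assign({"sex": "F", "age": "18-49"}, enrolled, ["sex", "age"],
                          rng=random.Random(7)))
```

As the slide notes, the covariates used in the imbalance score should also be adjusted for in the analysis.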
Q7:
• What factors are used to estimate
sample size?
Q7: What factors are used to estimate
sample size?
• Most popular question to statisticians
– Tell me what n I need? It depends on the context of
the study
• Factors (using a two-group test as an example)
– Significance level (α, often 0.05)
– Type 2 error (β, often 0.2 or 0.1; 1 − β is the power)
– Standard deviation of observation (σ)
– Difference in means (μ1 − μ2)
– Allocation ratio (1:1, 2:1, etc.)
Q7: (2)
• Significance level, type 2 error and allocation
ratio are relatively easy to determine
• Mean difference and standard deviation are
usually based on preliminary studies and may
be quite uncertain.
– It is useful to take these preliminary differences and
halve them
– What is really needed is the ratio of treatment
difference to the standard deviation
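As a worked formula (the standard normal approximation for a two-sided comparison of two means; an assumption, since the slides do not give one), with k:1 allocation and Δ = μ1 − μ2 the smaller group needs

$$
n_{\text{small}} = \left(1 + \frac{1}{k}\right)\frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{(\Delta/\sigma)^2},
\qquad N = (1 + k)\, n_{\text{small}}
$$

so Δ and σ enter only through the standardized difference Δ/σ. Halving Δ/σ multiplies the required sample size by four, which is why halving a preliminary difference has such a large effect on n.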
Q7: (3)
• In vaccines we often want to estimate the vaccine efficacy (VE)
and find a confidence interval with a lower bound that
gives us assurance the vaccine is working well

$$\mathrm{VE} = 1 - \frac{I_V}{I_C}$$

where $I_V$ is the incidence rate for the vaccine group and $I_C$ is the
incidence rate for the control group
• Lower bound must be substantially greater than 0
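A minimal sketch of one common way to obtain the interval: treat I_V and I_C as risks (cases divided by subjects), build a confidence interval for the relative risk I_V/I_C on the log scale, and transform its limits with VE = 1 − RR. This is an assumption; the slides do not specify the interval method, and trials with person-time denominators or few cases typically use other (e.g., exact conditional) methods. The function name and case counts are made up.

```python
import math
from statistics import NormalDist

def vaccine_efficacy_ci(cases_v, n_v, cases_c, n_c, level=0.95):
    """VE = 1 - I_V/I_C with a log relative-risk (Wald) confidence interval."""
    rr = (cases_v / n_v) / (cases_c / n_c)
    se_log_rr = math.sqrt(1 / cases_v - 1 / n_v + 1 / cases_c - 1 / n_c)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    rr_low = math.exp(math.log(rr) - z * se_log_rr)
    rr_high = math.exp(math.log(rr) + z * se_log_rr)
    # VE decreases as RR increases, so the upper RR limit gives the lower VE limit
    return 1 - rr, 1 - rr_high, 1 - rr_low

ve, lower, upper = vaccine_efficacy_ci(cases_v=10, n_v=1000, cases_c=50, n_c=1000)
print(f"VE = {ve:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

With these illustrative counts the estimate is 0.80 with an interval of roughly (0.61, 0.90).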
Q7: (4)
• It’s easy to find an expression for the confidence
interval and then one sets the lower bound of
the interval to the desired level (set in
consultation with FDA)
– Often ¾ of the observed VE, with the target VE set by the needs
of clinical prevention. For example, if the target VE is
0.8 (80%), the lower bound would be 0.6 (60%)
– These are not absolute criteria!
Q8:
• How does the investigator choose the margin of
equivalence or non-inferiority (delta, δ) in
comparative clinical trials?
– The object is to show that a new product is not poorer
than the approved product by more than a clinically
unimportant amount.
– This is the concept for generic products – want to
show a) works about as well; b) about as safe or
more so; c) costs less
Q8:
• Demonstrate that active control has assay sensitivity –
that is, it consistently shows itself better than placebo –
need evidence of this (the active control is the approved
product).
– Some conditions are so variable that a non-inferiority study isn’t
feasible. These may include psychiatric conditions, some
rheumatological conditions, etc.
• Is the control an appropriate one? Compare to the best
licensed product, not the worst, if there are multiple options.
• What is clinically important? Depends on context of
disease
Q8: (2)
• When citing a % difference for VE (or anything) be sure
to clarify whether you mean 10 percentage points or
10% of the comparator
– If the comparator has a VE of 80%, do we want the estimated VE of
the new vaccine to be at least 72% or at least 70% if we choose a 10% margin?
Or do we want the relative VE (vaccine vs. control) to be at least
90%?
– It’s easy to become confused so it’s good to be specific
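The two readings of a “10% margin” in the example above can be computed side by side; the function name is hypothetical.

```python
def noninferiority_thresholds(comparator_ve, margin=0.10):
    """Minimum acceptable VE for the new vaccine under two readings of the margin."""
    absolute = comparator_ve - margin        # margin in percentage points
    relative = comparator_ve * (1 - margin)  # margin as a fraction of the comparator
    return absolute, relative

absolute, relative = noninferiority_thresholds(0.80)
print(f"10 percentage points: {absolute:.0%}; 10% of comparator: {relative:.0%}")
```

The second reading is the same as requiring the ratio of the new vaccine's VE to the comparator's VE to be at least 90%.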
Q9:
• How does the investigator deal with
missing data?
Q9:
• Don’t have any
• Don’t have very much (under 5% is my initial
break point)
• Discuss ways of dealing with it prospectively!!!
– Complete case analysis (ugh!)
– Last Observation Carried Forward (LOCF) (also ugh!)
– Mean values for replacement
– Regression models for replacement
– Imputation models
Q9: (2)
• Types of missing values
– Patient misses visit, and an ‘interior’ value is missing
– Patient drops out, and a series of values at the end
are missing
– (Some) Covariates are missing in one or more visits
Q9: (3)
• Last Observation Carried Forward: when a
patient drops out, observations from that time
onward are replaced by the last observation.
– Is almost always a problem for me
– It ignores any trends in data
– It reduces variability arbitrarily
– Can change significance level substantially, in either
direction.
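A small illustration of why LOCF ignores trends: for a subject whose values are rising, carrying the last value forward freezes the trajectory. The values and the function name are made up, and `None` marks a missing visit.

```python
def locf(values):
    """Replace each missing value (None) with the last observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None if nothing has been observed yet
    return filled

observed = [10, 12, 14, None, None]  # subject drops out after visit 3
print(locf(observed))  # [10, 12, 14, 14, 14]
```

If the trend had continued (say, to 16 and 18), LOCF understates this subject's change, and across subjects the repeated values shrink the apparent variability.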
Q9: (4)
• Mean values imputation
– Need to be sure you don’t increase the apparent sample size
(i.e., replaced values don’t give a more precise estimate)
• When observations are replaced by mean values, most statistics
programs treat the replaced values like other values. This reduces
the variability and ‘improves’ the power.
– This doesn’t account for patient specific characteristics
– This can be especially tricky if a long series of values is imputed
– Mean of patient values or mean of other patients at that visit?
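The following sketch shows the problem numerically: filling gaps with the observed mean leaves the mean unchanged but shrinks the standard deviation and inflates the apparent sample size, so the standard error looks smaller than the data justify. The values and the function name are made up.

```python
import math
from statistics import mean, stdev

def mean_impute(values):
    """Replace missing values (None) with the mean of the observed values."""
    m = mean(v for v in values if v is not None)
    return [m if v is None else v for v in values]

raw = [4, 8, None, 6, None, 10]
observed = [v for v in raw if v is not None]
imputed = mean_impute(raw)

for label, data in (("observed only", observed), ("mean-imputed", imputed)):
    se = stdev(data) / math.sqrt(len(data))
    print(f"{label}: n={len(data)}, mean={mean(data):.2f}, "
          f"SD={stdev(data):.2f}, SE={se:.2f}")
```

Here the observed values give n = 4, SD ≈ 2.58, SE ≈ 1.29, while the mean-imputed data give n = 6, SD = 2.00, SE ≈ 0.82, even though no new information was collected.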
Q9: (5)
• Regression models
– Determine what variables are “good” predictors of the missing
value – usually useful to have a small number of such variables
so they won’t have a lot of missing value issues
– Predict missing value using a regression model (try to show that
variance of predicted value isn’t too big)
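A minimal sketch of regression replacement with a single predictor (for example, a baseline value predicting a missed follow-up value), including the standard error of a new prediction so one can check that the predicted value is not too uncertain. The data and function names are illustrative.

```python
import math

def fit_line(xs, ys):
    """Least-squares line y = a + b*x; also returns pieces needed for prediction SEs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    s2 = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)  # residual variance
    return a, b, s2, xbar, sxx, n

def predict_missing(x0, fit):
    """Predicted value at x0 and the standard error for a new observation."""
    a, b, s2, xbar, sxx, n = fit
    pred = a + b * x0
    se = math.sqrt(s2 * (1 + 1 / n + (x0 - xbar) ** 2 / sxx))
    return pred, se

# Complete cases: baseline (x) and follow-up (y) values
fit = fit_line([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1])
print(predict_missing(3.5, fit))  # subject with baseline 3.5 but missing follow-up
```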
Q9: (6)
• More sophisticated imputation models have been
developed recently
– Determine classes of similar patients (“propensity scores”)
– Fill in missing values by selecting randomly from observations in
the same class.
– Do this multiple times (5 to 10 is usually enough)
– Analyze the data and pool results
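The steps above can be sketched as a hot-deck multiple imputation: within each class of similar patients, missing values are drawn at random from observed donors; the analysis (here, just the mean and its variance) is repeated on each completed data set; and the results are pooled. The pooling step uses Rubin's rules, a standard choice that the slide does not name. Classes are given directly rather than derived from propensity scores, and the data and function names are made up.

```python
import random
from statistics import mean, variance

def hot_deck(values, classes, rng):
    """Fill each None with a random observed value from the same class."""
    donors = {}
    for v, c in zip(values, classes):
        if v is not None:
            donors.setdefault(c, []).append(v)
    return [rng.choice(donors[c]) if v is None else v for v, c in zip(values, classes)]

def rubin_pool(estimates, variances):
    """Rubin's rules: pooled estimate and total variance across m imputations."""
    m = len(estimates)
    within = mean(variances)
    between = variance(estimates)
    return mean(estimates), within + (1 + 1 / m) * between

values = [5.1, None, 6.0, 4.8, None, 7.2, 6.9, None]
classes = ["low", "low", "low", "low", "high", "high", "high", "high"]
rng = random.Random(42)

estimates, variances = [], []
for _ in range(10):  # 5 to 10 imputations
    completed = hot_deck(values, classes, rng)
    estimates.append(mean(completed))
    variances.append(variance(completed) / len(completed))  # variance of the mean
print(rubin_pool(estimates, variances))
```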
Q10:
• How does an investigator determine what data
should and should not be included in analyses
(especially in the cases of protocol violations,
withdrawals and drop-outs)?
Q10:
• The fundamental efficacy data set includes all subjects
as randomized
– Note this does not say “ignore patients who didn’t get
medication” – doing this messes up the randomization plan and
allows data shaping
– This is called Intent to Treat (ITT)
– Modified ITT relaxes this to patients who have had at least one
treatment. It still messes up the randomization design.
Q10: (2)
• Per Protocol Data set
– Subjects who had no protocol violations (completed study, etc.)
– This is often done, but may be misleading if many subjects have
protocol violations.
• Can provide various analysis data sets that may be
subsets – e.g. no protocol violations, no withdrawals or
dropouts
– Important to examine comparability of groups and outcomes to
ITT, mITT
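The nested analysis sets described above can be made explicit in code, which also makes it easy to tabulate each set by arm and compare it with ITT. The record fields, inclusion rules, and function names are hypothetical; the protocol defines the real ones.

```python
from dataclasses import dataclass

@dataclass
class Subject:
    subject_id: int
    arm: str                 # arm assigned at randomization
    doses_received: int
    protocol_violation: bool
    completed: bool

def analysis_sets(subjects):
    itt = list(subjects)                                # everyone as randomized
    mitt = [s for s in itt if s.doses_received >= 1]    # at least one treatment
    pp = [s for s in mitt if s.completed and not s.protocol_violation]
    return {"ITT": itt, "mITT": mitt, "PP": pp}

def counts_by_arm(subset):
    counts = {}
    for s in subset:
        counts[s.arm] = counts.get(s.arm, 0) + 1
    return counts

subjects = [
    Subject(1, "vaccine", 2, False, True),
    Subject(2, "vaccine", 0, False, False),  # randomized but never dosed
    Subject(3, "control", 2, True, True),    # protocol violation
    Subject(4, "control", 1, False, False),  # dropped out
    Subject(5, "vaccine", 2, False, True),
]
for name, subset in analysis_sets(subjects).items():
    print(name, counts_by_arm(subset))
```

In this toy example the per-protocol set contains no control subjects at all, an extreme version of the comparability problem noted above.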
Q10: (3)
• Data to be included will depend on the purposes of the
analysis. All substantial differences from total study
population need to be explained
– E.g., the immunogenicity analysis included only 48% of the total sample
because only 50% of subjects were selected for blood draws (may
be acceptable if pre-specified in the protocol)
• FDA expects to have access to all data and may audit it.
Q11:
• What types of analyses should be used
when there are multiple time points
(multiple observations) of data collection
(and there is no dichotomous outcome /
endpoint)?
Q11:
• Longitudinal data analysis has been an active area of
statistical research in the past few years
– In vaccine research, the common issue is
immunogenicity levels over time. These are often
expressed as log(GMC) (geometric mean concentration)
or log(titer), and look like multiple continuous measurements
– Can also have whether a four-fold increase in titer
has been achieved at various times. This is a series
of dichotomous variables (yes or no)
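A minimal sketch of both views of the same titers: a yes/no four-fold-rise indicator at each visit, relative to the subject's baseline, and a geometric mean (the mean on the log scale, transformed back). The titer values and the function name are made up.

```python
from statistics import geometric_mean

def fourfold_rise(baseline, followups):
    """For each follow-up visit, True if the titer is at least 4 times baseline."""
    return [titer >= 4 * baseline for titer in followups]

# One subject's titers at three follow-up visits (baseline titer 10)
print(fourfold_rise(10, [20, 40, 160]))

# Geometric mean titer of a group at one visit: exp(mean(log titer))
print(geometric_mean([40, 40, 160]))
```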
Q11: (2)
• Longitudinal analysis accounts for subject, treatment,
other covariates and time in the analysis
– Measurements made at different times are correlated
– Must determine appropriate correlation structure
– Shape of response curve (linear over time? Curve? )
Q11: (3)
• Longitudinal analysis
• GEE (generalized estimating equations) models provide great flexibility
• Dichotomous variables (e.g., seroconversion at different
times) can be handled with GEE
• Alternatives – may be appropriate
• Change from baseline to final observation (usually need to
have a common last time) - does this depend on baseline?
Should we use last observation with baseline as covariate?
Q12:
• In analyzing data from clinical trials involving
multiple sites, should site be treated as a fixed or
random effect?
Q12:
• The site should be included in the analysis model,
especially if randomization is stratified. If it’s not
stratified, you may not wish to include it if sites are
generally small. With small sites, the degrees of freedom
used up reduce the precision of the comparison
Q12: (2)
• If we include sites, I prefer to treat them as random
effects rather than fixed effects since the intent is to
generalize beyond those sites in the trial
– With random effects, the inference extends to all possible sites
(i.e., the population of sites)
– With fixed effects, the population is just the sites that have
enrolled patients
• Main idea is to analyze the data according to the way in
which they have been collected
Q12: (3)
• Interesting question (at least for statisticians!)
– How can we regard the sites as a random sample of all sites if
we have selected them because they have talented and
committed physicians conducting the research there?
– No decent answer – but there is usually little interest in drawing
conclusions that apply only to the sites that have entered
patients
Q13:
• When is a planned interim analysis (for safety
and/or efficacy and/or sample size re-estimation)
appropriate? What are the pros and cons? What are the
methods used?
Q13:
• There are three reasons for doing an interim
analysis:
– Examine the safety of the product at an early time to
ensure that we are not harming subjects - either stop
or continue
– Examine efficacy of the product at an early time –
may stop because vaccine is very good or very bad,
or may continue
– Re-estimate sample size – learn that study is too
small to show a difference (variability too large,
treatment effect is too small) – probably don’t want to
increase sample size by more than 50% total
Q13: (2)
• All interim analyses carry a risk of unblinding the study
– If the study stops, everyone will know the results
– If the study continues, a reasonable inference is that the study
has a small p-value (but > 0.05) since we would have stopped if
there was little hope of showing a difference
– Someone might inadvertently (or deliberately) let information slip
out
Q13: (3)
• Interim analyses require adjustment of the critical values.
It is complicated and several programs (EaST, PEST,
SPlus for sequential analysis) are available
• All interim analyses or sample size re-estimation
analyses need to be specified prospectively in the
protocol
• One sponsor reported to us that they had been looking
at the data as each patient came in and stopped when
the p-value was <0.05.
– Such repeated, unplanned testing inflates the type I error well above 0.05
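A small simulation shows why the sponsor's approach is a problem. Under the null hypothesis (no true difference), testing at the nominal 0.05 level after every subject and stopping at the first p < 0.05 rejects far more often than 5% of the time. For simplicity the sketch uses a z-test on a running sum of standard normal observations (known variance); the function name, the number of subjects, and the number of simulations are arbitrary choices.

```python
import math
import random
from statistics import NormalDist

def false_positive_rate(n_max=100, n_sims=2000, alpha=0.05, look_every=1, seed=1):
    """Fraction of null trials that ever reach p < alpha, testing every `look_every` subjects."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        total = 0.0
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)  # null: true effect is zero
            if n % look_every == 0 and abs(total) / math.sqrt(n) > crit:
                rejections += 1
                break
    return rejections / n_sims

print("look after every subject:", false_positive_rate(look_every=1))
print("single look at the end:  ", false_positive_rate(look_every=100))
```

Group sequential boundaries, such as those computed by the programs named above, are designed to keep this overall rate at the nominal level.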
Thank You