Design of Clinical Research Protocols

Study Design and Hypothesis
Testing in Clinical Research
Jonathan J. Shuster, Ph.D
(jshuster@biostat.ufl.edu)
Research Professor of Biostatistics
Univ. of Florida, College of Medicine
1
Take-home Messages
• Rely on Evidence-Based Medicine. Conventional
wisdom can easily lead us astray.
• The objective of Statistics is to make informed
inferences about a population, based on a sample. It is
imperative to quantify the uncertainty.
• The P-value is a quantity that allows us to infer
something about whether a scientific hypothesis is false.
• Non-significant results are inconclusive
• Randomization and intent-to-treat are vital
components in sound clinical research
2
3
Topics
1. Motivating Evidence-Based Clinical
Studies
2. Objective of Statistics
3. Hypothesis testing and P-values
4. Real Examples and their lessons
4
5
1. Motivating Evidence-Based
Medicine
• A coin is “loaded”, with a 70% chance of
landing heads. One player picks a three
outcome sequence (e.g. HTH), then the
other picks a different sequence. Whoever’s
sequence comes up first is the winner.
• Do you want to choose first, and if so, what
sequence do you select?
6
Evidence-Based Medicine
• So you decided to go first and pick HHH, right?
• OK, I pick THH.
• HHH can only occur before THH if it is on the
first three flips. (If the first time HHH occurs is
flips 6,7,8, then flip 5 is T, so flips 5,6,7 are THH and I
win.) I make your first 2 my last 2, so I tend to
stay ahead.
• Your chance of winning = 0.7^3 = 0.343 (34.3%)
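The 34.3% figure can be checked by simulation. The following is a minimal Python sketch, not part of the original slides; the function name `play_game`, the seed, and the number of games are illustrative assumptions.

```python
import random

def play_game(first, second, p_heads, rng):
    """Flip a loaded coin until one of two 3-flip sequences appears; return the winner."""
    recent = ""
    while True:
        recent += "H" if rng.random() < p_heads else "T"
        recent = recent[-3:]  # only the last three flips matter
        if recent == first:
            return "first"
        if recent == second:
            return "second"

rng = random.Random(2006)  # fixed seed so the run is repeatable
n_games = 100_000
wins = sum(play_game("HHH", "THH", 0.7, rng) == "first" for _ in range(n_games))
hhh_win_rate = wins / n_games
print(f"HHH beats THH in about {hhh_win_rate:.1%} of games (theory: {0.7 ** 3:.1%})")
```

Changing the two sequences in the call lets you try other choices for the first player and see how the second player can respond.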
7
Evidence-Based Medicine
• Lesson from this example.
• Things are not always what they seem. You
need to be a healthy skeptic.
• Reference: Shuster, J. A two-player coin
game paradox in the classroom. American
Statistician, 2006(Feb), vol 60, pp 68-70.
8
9
2. Objective of Statistics
• To make an inference about a defined target
population from a representative sample.
• That is, for us, to start from a medical hypothesis
about a medical condition, help design a study that
can collect data to test the question, and draw
conclusions. Quantifying the uncertainty about
the inference is a key part.
10
2. Comment on This
• Should we compare treatment groups
statistically in a randomized study with
respect to baseline parameter (e.g. age,
gender, ethnicity, blood pressure)?
11
2. Provenzano: Clin J Am Soc
Nephrol 4, 386-93, 2009
• “Baseline characteristics were similar
except for more men in the oral iron group
compared with the ferumoxytol group
(62.9% versus 50.0%, P = 0.04). Mean
baseline laboratory measures were similar
between the two treatment groups.”
12
2. Comment on This
• For hypothesis driven research, should we
test for normality before using a t-test, and
if we reject try to transform the data?
13
Nissen Article
• JAMA. 2008;299(13):1561-1573. Comparison of
Pioglitazone vs Glimepiride on Progression of
Coronary Atherosclerosis in Patients With Type
2 Diabetes
• ‘For continuous variables with a normal
distribution, the mean and 95% confidence
intervals (CIs) are reported. For variables not
normally distributed, median and interquartile
ranges are reported and 95% CIs around median
changes were computed using bootstrap
resampling.’ (N=273 vs 270 in groups)
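The quoted methods mention bootstrap confidence intervals around medians. As a rough illustration of the idea only, here is a percentile-bootstrap sketch in Python. The data values are hypothetical (not from the trial), the function name is made up for this example, and the quote does not say which bootstrap variant the authors used.

```python
import random
import statistics

def bootstrap_median_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the median."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, record each resample's median, then sort.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    tail = (1 - level) / 2
    lower = medians[int(tail * n_resamples)]
    upper = medians[int((1 - tail) * n_resamples) - 1]
    return lower, upper

# Hypothetical skewed "change from baseline" values (NOT data from the trial).
changes = [-4.1, -2.0, -1.2, -0.8, -0.5, -0.3, 0.0, 0.1,
           0.4, 0.6, 0.9, 1.5, 2.2, 6.8, 11.3]
ci_low, ci_high = bootstrap_median_ci(changes)
print(f"median = {statistics.median(changes)}, 95% CI = ({ci_low}, {ci_high})")
```

No normality assumption is needed here, which is why the approach is attractive for skewed measurements.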
14
2. Testing Assumptions
Diagnostic Test
Passes
Fails
15
16
3. Testing a Hypothesis (P-Value)
• Put a statement on Trial: “Null Hypothesis”
• ISIS #2 (Second International Study of Infarct
Survival): The five-week mortality rates for
Streptokinase and Placebo are equal in
patients with recent MIs
• Results: Strep(791/8592=9.2%) vs.
Plac(1029/8595=12.0%)
17
3. P-Value
• P = 3.8 × 10^-9
• If you replicated the experiment in
a population where the null
hypothesis was true, there is a 3.8
in a billion chance of seeing a
difference at least as extreme in
either direction (2-sided)
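For readers who want to see where a number of this size comes from, here is a minimal Python sketch of a two-sided, pooled two-proportion z-test applied to the ISIS #2 counts. The slides do not state which test produced the reported value, so the choice of test (and the function name) is an assumption; with these counts the normal approximation lands close to the figure on the slide.

```python
import math

def two_proportion_p_value(deaths_a, n_a, deaths_b, n_b):
    """Two-sided P-value from the pooled two-proportion z-test (normal approximation)."""
    rate_a, rate_b = deaths_a / n_a, deaths_b / n_b
    pooled = (deaths_a + deaths_b) / (n_a + n_b)  # common rate under the null
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    return math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for standard normal Z

p_isis2 = two_proportion_p_value(791, 8592, 1029, 8595)
print(f"ISIS #2, streptokinase vs. placebo: P = {p_isis2:.1e}")
```

Other tests (a continuity-corrected chi-square or Fisher's exact test) would give somewhat different values but the same conclusion here.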
18
3. ISIS #2 Reference
• ISIS #2 Collaborative Group. (1988)
Randomised trial of intravenous
streptokinase, oral aspirin, both, or neither
among 17,187 cases of acute myocardial
infarction: ISIS 2, Lancet 2: 349-360.
19
3. P-Value and Proof by
Contradiction
• The P-value answers this question: if you replicated your
experiment in a target population where your null
hypothesis is true, what is the probability that you would
see differences at least as extreme as what you actually observed?
If this value (the P-value) is small, it is evidence
against the null hypothesis.
• The analogy is “beyond a reasonable doubt”. Science
uses 5%, arbitrarily, as the threshold for “reasonable” doubt in most
cases.
20
3. Was this overkill in terms of
sample size?
• Suppose the results were 79/859 vs.
103/860 (same percentages of 9.2% vs.
12.0% but with one tenth the sample size).
• Now P=0.071 (7.1%), which would not be
statistically significant. Would we be using
this clot buster today? It was the
biostatistician Sir Richard Peto who
determined this sample size.
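One way to see why the larger sample was not overkill is to estimate power at each sample size. The sketch below uses the standard normal-approximation power formula for a two-sided 5% test with equal arms, and assumes the true mortality rates equal the observed 12.0% and 9.2%. It is an illustration only, not the calculation actually used to plan ISIS #2, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def approx_power(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test with equal arms."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    p_bar = (p_control + p_treated) / 2
    sd_null = math.sqrt(2 * p_bar * (1 - p_bar))  # spread under the null
    sd_alt = math.sqrt(p_control * (1 - p_control) + p_treated * (1 - p_treated))
    effect = abs(p_control - p_treated) * math.sqrt(n_per_arm)
    # The (tiny) chance of rejecting in the wrong direction is ignored.
    return norm.cdf((effect - z_alpha * sd_null) / sd_alt)

power_small = approx_power(0.120, 0.092, 860)    # about one tenth of ISIS #2
power_large = approx_power(0.120, 0.092, 8600)   # roughly the ISIS #2 arm size
print(f"power with 860 per arm:  {power_small:.2f}")
print(f"power with 8600 per arm: {power_large:.4f}")
```

Under these assumptions the smaller study would have had roughly a coin-flip chance of reaching significance, while the full-size study was almost certain to.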
21
3. ISIS #2:
• Any other questions about the study?
22
3. ISIS #2 Issues
• Who was watching the store? Accrual took
3.5 years and outcome was known for each
patient within five weeks.
• Always report a sample size justification in
your papers (Provenzano, slide 12, did not).
23
4. Real Example
• Coronary Drug Project
24
The Coronary Drug Project
Research Group (1980)
• Influence of adherence to treatment and
response of cholesterol on mortality in the
Coronary Drug Project. NEJM 303: 1038-1041.
• Double blind randomized study of
Clofibrate vs. Placebo in men who had prior
MI.
25
Compliers vs. Not on Drug
Coronary Drug Project
[Bar chart: 5Yr Mortality (%), axis 0 to 25, for C_Drug vs. NC_Drug]
26
Compliers vs. Not
27
Drug vs. Placebo
28
Coronary Drug Project Take-home
Message
What can this study teach us about Clinical
Studies?
29
Intent-to-Treat
• The gold standard for analyzing randomized
clinical trials is Intent-to-treat. Patients are
analyzed in the groups they were assigned
to, irrespective of what they actually
received.
30
31
4. Real UF Example:
• Effectiveness of Nesiritide on Dialysis or
All-Cause Mortality in Patients Undergoing
Cardiothoracic Surgery. Clinical
Cardiology. 2006 Jan;29(1):18-24. With T.
Beaver et al.
• Motivation: The impression at Shands was that it
was harmful and costly.
32
4. Nesiritide Example
• Study Null Hypothesis: The 20-day
death/dialysis rate in patients getting
nesiritide within two days of surgery is
the same as in “similar” patients not
getting it.
• Design Suggestions?
33
4. Possible Designs (+/-)
• Observational: Historical Control. Compare the
period before the drug was used to the period after it
started to be given to a sizable fraction of patients (leaving a gap
for the ramp-up of use). Must include all
comers and use electronic chart review.
• Observational: Compare those getting to
those not getting the drug.
• Randomized controlled prospective trial
34
4. Sources of Variation
• Within treatments, why might we not get
the same result for every patient?
• Historical Control?
• Comparing concurrent nesiritide vs. not?
• Randomized prospective trial?
35
4. Sources of Bias (Confounders)
• Why might we see differences that might be
totally unrelated to the treatment (nesiritide
vs. not)?
• Historical Control?
• Comparing concurrent nesiritide vs. not?
• Randomized prospective trial?
36
4. Nesiritide: Propensity Scoring
• Actual Design: Compared Nesiritide vs. Not
by Propensity Score Matching.
• Using 12 key covariates, we estimated the
probability that a patient would get
Nesiritide given these covariates. Then we
matched the nesiritide patients to non-nesiritide patients on the propensity, and
did a matched analysis.
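The matching step can be sketched as follows. The slide does not say which matching algorithm was used, so this example assumes greedy 1:1 nearest-neighbour matching within a caliper, which is one common choice. The propensity scores, patient labels, caliper, and function name are all hypothetical, and the step that estimates the scores from the 12 covariates (for example by logistic regression) is not shown.

```python
def greedy_match(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on propensity score.

    treated, controls: dicts mapping patient id -> estimated propensity score.
    Returns a list of (treated_id, control_id) pairs; each control is used once.
    """
    available = dict(controls)
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda item: item[1]):
        if not available:
            break
        # Closest remaining control by absolute difference in propensity.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs

# Hypothetical scores: probability of receiving nesiritide given covariates.
nesiritide = {"T1": 0.30, "T2": 0.62, "T3": 0.90}
no_nesiritide = {"C1": 0.28, "C2": 0.33, "C3": 0.60, "C4": 0.10}
matched_pairs = greedy_match(nesiritide, no_nesiritide)
print(matched_pairs)  # T3 has no control within the caliper and is left unmatched
```

Treated patients with no comparable control are dropped, which is the price paid for comparing like with like; outcomes are then compared within the matched pairs.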
37
4. Conclusions
• Nesiritide showed no significant difference
(inconclusive) within CABG patients.
• Nesiritide showed promise in aneurysm subjects
with elevated baseline serum creatinine (SCR), but was inconclusive
in other aneurysm patients.
• Run a future randomized double-blind trial in
aneurysm patients with elevated SCR. (Just completed and
close to being in press, with an inconclusive
result.)
38
4. Conclusion (continued)
• Note that the Shands study data were very
important in designing the randomized
follow-up study, in terms of the number of
subjects needed (power analysis).
39
Take-home Messages
• Rely on Evidence-Based Medicine. Conventional
wisdom can easily lead us astray.
• The objective of Statistics is to make informed
inferences about a population, based on a sample. It is
imperative to quantify the uncertainty.
• The P-value is a quantity that allows us to infer
something about whether a scientific hypothesis is false.
• Non-significant results are inconclusive
• Randomization and intent-to-treat are vital
components in sound clinical research
40
Design One Together
• Medical Question: Does Caffeine
Withdrawal cause Headaches?
41
Eligibility
42
Design
• What are the sources of variation besides
caffeine consumption?
• How do we control caffeine consumption?
• Should we use deception—hide purpose of
study? Is this ethical?
43
Design
• Pre-Post?
• Double Blind Parallel Study?
• Double Blind Crossover Study?
44
Forensics for Irregularity
Phenylephrine
45
Phenylephrine Crossover Studies
46
Phenylephrine (Baseline NAR)
Study (10 mg vs Placebo)    Std Dev    CV=100SD/Mean
1 (N=16) (EB)               2.0        15.3%
2 (N=10) (EB)               0.9        6.7%
3 (N=16)                    7.8        36.3%
4 (N=15)                    9.5        35.6%
5 (N=16)                    6.2        29.3%
6 (N=16)                    9.8        40.4%
7 (N=14)                    9.4        35.3%
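Because CV = 100 × SD / Mean, the table also implies a baseline mean for each study (Mean = 100 × SD / CV). The short Python sketch below backs these out from the rounded values in the table; the variable and function names are illustrative, and the results are only as precise as the rounding allows.

```python
# (study label, standard deviation, CV in percent) as listed in the table
baseline_nar = [
    ("1 (EB)", 2.0, 15.3),
    ("2 (EB)", 0.9, 6.7),
    ("3", 7.8, 36.3),
    ("4", 9.5, 35.6),
    ("5", 6.2, 29.3),
    ("6", 9.8, 40.4),
    ("7", 9.4, 35.3),
]

def implied_mean(std_dev, cv_percent):
    """Invert CV = 100 * SD / Mean to recover the mean."""
    return 100 * std_dev / cv_percent

implied = {label: implied_mean(sd, cv) for label, sd, cv in baseline_nar}
for label, mean in implied.items():
    print(f"Study {label}: implied baseline mean = {mean:.1f}")
```

Laying the implied means next to the standard deviations makes it easier to see how far the two (EB) studies sit from the other five in both level and spread.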
47
How do we test for Data
Irregularities?
• Background: Baseline NAR (Nasal Airway
Resistance) measures are typically reported to one
decimal place, in the form xx.x (e.g. 20.2), and are
always based on the mean of
10 observations (5 from each nostril).
• What null hypothesis can we test to find
potential irregularities? What P-value might
we use to declare significance?
48
Baseline Last Digit (3rd significant digit)
Digit    Study 1    Study 2
0        2          5
1        4          2
2        2          1
3        6          9
4        2          4
5        23         7
6        8          5
7        9          10
8        3          3
9        5          4
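One candidate null hypothesis is that the last recorded digit is equally likely to be any of 0 through 9. The Python sketch below computes a Pearson chi-square goodness-of-fit statistic for each study under that assumption. The uniform null, the treatment of the readings as independent, and the choice of thresholds are all assumptions made for illustration; the slides leave these questions open for discussion. The critical values are the standard upper-tail values for 9 degrees of freedom.

```python
def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)

# Last-digit counts for digits 0..9, as read from the slide's table.
study_1 = [2, 4, 2, 6, 2, 23, 8, 9, 3, 5]
study_2 = [5, 2, 1, 9, 4, 7, 5, 10, 3, 4]

# Upper-tail critical values of chi-square with 9 degrees of freedom.
CRITICAL_5_PERCENT = 16.919
CRITICAL_0_1_PERCENT = 27.877

for name, counts in (("Study 1", study_1), ("Study 2", study_2)):
    stat = chi_square_uniform(counts)
    print(
        f"{name}: chi-square = {stat:.2f} on 9 df; "
        f"exceeds 5% cutoff: {stat > CRITICAL_5_PERCENT}; "
        f"exceeds 0.1% cutoff: {stat > CRITICAL_0_1_PERCENT}"
    )
```

When screening for irregularities rather than testing a planned hypothesis, a threshold much stricter than 5% is worth considering, since several studies and several digit positions may be examined.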
49
• Thank You!!
50
Coronary Drug Project Data
Five Year Mortality (Clofibrate)
• Compliers: 15.0% (15.7%) (N=708)
• Non-Compliers: 24.6%(22.5%) (N=357)
• Compliers took >80% of their meds until death
or 5 years, whichever came first.
• Values in parentheses are 5-year mortality, adjusted for
prognostic factors.
51
Coronary Drug Project
Five Year Mortality (Placebo)
• Compliers: 15.1% (16.4%) (N=1813)
• Non-Compliers: 28.2%(25.8%) (N=882)
• Compliers took >80% of their meds until death
or 5 years, whichever came first.
• Values in parentheses are 5-year mortality, adjusted for
prognostic factors.
52
Coronary Drug Project
Five-year mortality (As randomized)
• Clofibrate: 20.0% (N=1103)
• Placebo: 20.9% (N=2789)
• NB: Compliance could not be assessed in a
small number of patients.
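The three Coronary Drug Project slides can be put side by side with a few lines of arithmetic. This Python sketch uses only the unadjusted percentages listed above; the variable names are illustrative.

```python
# Five-year mortality (%) from the Coronary Drug Project slides (unadjusted).
clofibrate = {"compliers": 15.0, "non_compliers": 24.6, "as_randomized": 20.0}
placebo = {"compliers": 15.1, "non_compliers": 28.2, "as_randomized": 20.9}

# Intent-to-treat: compare the groups exactly as randomized.
itt_difference = placebo["as_randomized"] - clofibrate["as_randomized"]

# Tempting but biased: drug compliers vs. everyone randomized to placebo.
compliers_vs_all_placebo = placebo["as_randomized"] - clofibrate["compliers"]

# Compliance "effect" inside the placebo arm, where no active drug was given.
placebo_compliance_gap = placebo["non_compliers"] - placebo["compliers"]

print(f"Intent-to-treat difference:        {itt_difference:.1f} points")
print(f"Drug compliers vs. all placebo:    {compliers_vs_all_placebo:.1f} points")
print(f"Placebo non-compliers - compliers: {placebo_compliance_gap:.1f} points")
```

The gap between compliers and non-compliers appears even on placebo, so it cannot be a drug effect. Only the as-randomized comparison is protected from this kind of selection.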
53