Clinical Investigation and Outcomes Research
Statistical Issues in Designing Clinical Research
Marcia A. Testa, MPH, PhD
Department of Biostatistics
Harvard School of Public Health
Objective of Presentation
• Introduce statistical issues that are critical
for designing a clinical research study and
developing a research protocol, with a
special focus on
• Power and sample size
– Readings: Textbook, Designing Clinical Research, Chapter 6,
Estimating Sample Size and Power: Applications and Examples
and Chapter 19, Writing and Funding a Research Proposal.
Research Proposal
• Carefully planning the analytical and
statistical methods is critical to any clinical
research study.
• An outline of the main elements of a research
proposal are listed in Table 19.1 of your
textbook.
• Two very important components of the
“Research Methods” section are
“Measurements” and “Statistical Issues”.
Measurement and Statistical Components
of the Research Proposal
• Measurements – you first must define:
– Main predictor/independent variables
(intervention, if an experiment)
– Potential confounding variables
– Outcome/dependent variables
• Statistical Issues – you should outline:
– Approach to statistical analyses
– Hypothesis, sample size and power
Power and Sample Size
• Depends upon:
– measurements and study hypotheses
– statistical test used on primary outcome
– study design
– variability and precision of the dependent measure
– alpha (type 1 error)
– effect size
– number of hypotheses that you want to test
Types of Errors
[Figure: table of error types – type 1 error (alpha) and type 2 error (beta) – together with power and confidence.]
What is power analysis?
• Statistical power:
– the probability of correctly identifying a trend
or effect
(Being correct that there is a trend or effect)
• Statistical confidence:
– the probability of not identifying a false trend
or effect (false alarm)
(Being correct that there is no trend)
Why is power analysis useful in
research planning?
• Clinical research is primarily concerned with
detecting improvements or worsening due to
interventions or risk factors.
• Power analysis answers the question:
“How likely is my statistical test to
detect important clinical effects given
my research design?”
Elements of power analysis
• Variability (stochastic noise in the data) – beyond our control
• Sample size (accumulated information) – within our control
– time horizon (e.g., survival analysis)
– sampling frequency
– replication
• Confidence level/statistical test – within our control
Dealing with Variability
• Variability is often a barrier to detection
• Minimizing variability is often the goal
• Choose variables with a high signal-to-noise ratio
– Caution: these variables may be less sensitive to change
• Sample within a more homogeneous population
– Caution: greater homogeneity often means we are limiting the inferences we can make. At the extreme, we would have highly reliable results that are for the most part clinically irrelevant
The Balancing of Cost and Power
[Figure: power curve plotting sample power (low to high, 0-100) against sample size (small/low cost to large/high cost). Three points (A, B, C) mark regions of low return on investment, optimal use of resources, and effective but inefficient use of resources along the curve.]
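The shape of this trade-off is easy to reproduce. Below is a minimal illustrative sketch (not from the slides) using Python's statsmodels, assuming a two-sample t-test and an arbitrary standardized effect size of 0.3, showing how power rises steeply at first and then flattens as the sample size grows.

```python
# Illustrative only: power of a two-sample t-test versus sample size,
# assuming a standardized effect size of 0.3 and alpha = 0.05.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n in (10, 25, 50, 100, 175, 300, 500):
    power = analysis.power(effect_size=0.3, nobs1=n, alpha=0.05, ratio=1.0)
    print(f"n per group = {n:3d}  ->  power = {power:.2f}")
# Power climbs quickly at small n and flattens near 1.0 at large n,
# which is why very large samples buy little extra power per subject.
```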
Limitations of power analysis
• Power analysis is only as good as the information you
provide:
– How appropriate is the statistical test?
– How accurate are estimates of variability?
• Power analysis can’t tell you:
– How much power is enough?
– What’s a meaningful change?
How much power is enough?
• There is no universal standard
• What is more important?
– Not missing a trend? → Power > Confidence
– Reporting a false trend? → Confidence > Power
• Usual range for confidence and power: 80-95%
What's a meaningful change?
Example: You want to be able to detect the withdrawal (decline in participation) from a diet and exercise program under "usual care".
[Figure: power (%) plotted against the annual rate of decline (%) – the effect size – from -20% to 0%. Power = 95% for a decline of -17%.]
What's a meaningful change?
[Figure: same power curve. Power = 80% for a decline of -13%.]
What's a meaningful change?
[Figure: same power curve. Power = 60% for a decline of -10%.]
Is a 17% annual withdrawal rate clinically meaningful?
• Example – start with 100 patients, with 17% withdrawing after each year:

Year                1     2     3     4     5
No. of individuals  100   83    69    57    47

• After 5 years, more than 50% of your original population has withdrawn from the program.
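As a quick check of the arithmetic above, a constant 17% annual withdrawal rate compounds multiplicatively; a minimal sketch in Python:

```python
# Worked check of the withdrawal table: 17% of the remaining participants
# withdraw each year, so counts shrink by a factor of 0.83 per year.
n = 100.0
for year in range(1, 6):
    print(f"Year {year}: {round(n)} individuals")
    n *= 1 - 0.17   # 17% withdraw before the next year
# Prints 100, 83, 69, 57, 47 - after 5 years, fewer than half remain.
```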
What is a meaningful change?
• Most people would concur that a withdrawal rate of 17% per year from a diet and exercise program is large enough to be considered clinically meaningful.
• However, how meaningful are smaller withdrawal rates (13%, 10%, 5%, 1%)?
• This cannot be answered using a formula.
• The answer will depend on the research objectives, the clinical objectives, and the research budget.
1. Choose the Statistical Hypothesis
• Set up the null hypothesis: examples
1. Compare a sample group mean to a known value μ0
– Mean of group = known population mean
(H0: μ = μ0) vs. (HA: μ ≠ μ0)
2. Compare two sample group means
– Mean of Group 1 = Mean of Group 2
(H0: μ1 = μ2) vs. (HA: μ1 ≠ μ2)
Note – because you are testing "not equal" (≠) in the alternative hypothesis, you have selected a "two-tailed test".
2. Choose the Statistical Test
• There are many statistical tests used in clinical research; however, for this presentation we will restrict ourselves to the following:

                                  Outcome/Dependent Variable
Predictor/Independent Variable    Dichotomous         Continuous
Dichotomous                       Chi-squared test    t-test
Continuous                        t-test              Correlation coefficient
3. Choose the Alpha Level and Effect Size
• Alpha = 0.05 – probability of rejecting
the null when the null is true = 5%
– You will conclude that there was a
difference 5% of the time when there really
was no difference
• You would like to detect a difference of
X units or higher (effect size) in one
group as compared to the other
4. Need SD of the Dependent Variable
• Use historical data if available
• Use the sample data from a feasibility study
(e.g. 15 subjects)
• If you have no data to serve as a reference, you have to make an educated guess. Here's a trick if your data are mound-shaped and approximately normal:
– Choose a representative low and high value from your clinical experience, take the difference, and divide by 4.
– SD estimate = (high – low) / 4
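A minimal sketch of the range/4 trick, assuming made-up low and high values purely for illustration (they are not data from the slides):

```python
# Rough SD guess for mound-shaped, approximately normal data:
# take a representative low and high value and divide the range by 4.
def sd_from_range(low: float, high: float) -> float:
    return (high - low) / 4

# Hypothetical example values only:
print(sd_from_range(90, 220))   # 32.5
```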
5. Calculate a Standardized Effect Size
• Effect size / standard deviation = standardized effect size
• Choose the β (type 2) error
– Remember, power = 1 − β, so a type 2 error of 0.20 yields a power of 0.80.
– β is the probability of failing to reject the null hypothesis when the null hypothesis is false, i.e., concluding there is no difference when there really is one; power is the probability of detecting that difference.
Power and Sample Size
Example
Continuous Glucose Monitoring
Diabetes Study
CGM Study
Two-group Comparison
• How many subjects do we need to be
able to detect a difference in CGM
mean daily glucose between patients on
Lantus and Apidra insulin versus Premix
analogue insulin?
– Before you can answer this question, you
must gather some more information.
Break down the problem
• CGM glucose at Week 12 = dependent
variable of interest
• Want to compare two groups – each
group has different patients
• Simple independent t-test
• Need SD of daily glucose
• Need to specify how large an effect you
want to detect
Data from feasibility study
[Figure: Week 12 CGM data from the 15-patient feasibility study.]
CGM Study
Two-group Comparison
• Compare Lantus & Apidra to Premix at
12 weeks
• Feasibility data available on 15 patients
• Independent t test will be used
• Alpha = 0.05, beta = 0.20, 2-tailed test
• Power = 0.80
– Null: Mean L & A = Mean Premix
(H0 : 1  2 ) (HA : 1  2 )
28
CGM Study
Two-group Comparison
• SD from 15 patient feasibility study = 33
Estimating the Sample Size for the CGM Study
• Alpha = 0.05 for the two-sided test (equivalent to 0.025 in each tail)
• Beta = 0.20; hence, power = 0.80
• Clinically meaningful effect = 10 mg/dL difference (based upon clinical judgement)
• SD of CGM glucose = 33 mg/dL (from the feasibility study)
• Standardized effect = 10/33 ≈ 0.30
• Check Appendix 6A in the textbook for power
• Table 6A says you need 176 subjects per treatment group, for a total of 352 subjects.
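As a cross-check of this table lookup, the same calculation can be done in software. Here is a hedged sketch using statsmodels (an assumed tool; the slides themselves use the textbook's Appendix 6A table):

```python
# Sample size per group for an independent two-sample t-test,
# alpha = 0.05 (two-sided), power = 0.80, standardized effect = 10/33.
from statsmodels.stats.power import TTestIndPower

effect = 10 / 33   # clinically meaningful difference divided by the SD (~0.30)
n_per_group = TTestIndPower().solve_power(effect_size=effect, alpha=0.05,
                                          power=0.80, ratio=1.0,
                                          alternative='two-sided')
print(round(n_per_group))
# ~172 with the exact ratio 10/33; rounding the effect to 0.30 gives ~175,
# and the textbook table reports 176 per group (352 total).
```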
http://www.epibiostat.ucsf.edu/biostat/sampsize.html
This is a directory where you can find sample size and power programs.
Useful Power Calculator Website
http://www.stat.uiowa.edu/~rlenth/Power/
Online Power/Sample Size
• Power = 0.8, detect ES = 0.3 (10 mg/dL): N = 175 per group
• Power = 0.9, detect ES = 0.35 (11.6 mg/dL): N = 175 per group
Online Power/Sample Size
• Power = 0.8, detect ES = 0.5 (16.5 mg/dL): sample size = 64 per group
• Power = 0.8, detect ES = 1.57 (52 mg/dL): sample size N1 = 7, N2 = 8
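These calculator results illustrate how strongly the required sample size depends on the effect size you must detect. A hedged sketch of that trade-off (statsmodels assumed), holding alpha = 0.05 and power = 0.80 fixed:

```python
# Required sample size per group shrinks sharply as the detectable
# standardized effect size grows (SD = 33 mg/dL from the feasibility study).
from statsmodels.stats.power import TTestIndPower

solver = TTestIndPower()
for es in (0.30, 0.35, 0.50, 1.57):
    n = solver.solve_power(effect_size=es, alpha=0.05, power=0.80,
                           ratio=1.0, alternative='two-sided')
    print(f"ES = {es:.2f} ({es * 33:4.1f} mg/dL): ~{round(n)} per group")
# Roughly 175, 129, 64, and 8 per group - consistent with the 64/group and
# N1 = 7, N2 = 8 figures above (small differences reflect rounding).
```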
CGM Study
Paired Comparison
• Useful for longitudinal assessments
• CGM Study – You want to detect a
decrease between Week 12 and Week
24 of 10 mg/dL
• You only have one group of patients,
but they are measured on two separate
occasions (Week 12 and Week 24).
[Diagram: 15-patient feasibility study with measurements at Wk 0, Wk 12, and Wk 24.]
What is the mean glucose for the subjects at Week 12 versus Week 24?
For simplicity, we are going to use the single-value summary mean glucose levels at Wk 12 and Wk 24.
Power and Sample Size for
Paired t-test
Power = 0.8, detect ES = 0.30: you need 92 subjects, or "pairs" of (Wk 12, Wk 24) data.
Remember, with two independent groups we needed 175 subjects per group, for a total of 350 subjects.
When patients serve as their own control, you need fewer subjects to detect an equivalent effect size (ES) with the same power.
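A paired t-test is just a one-sample t-test on the within-patient differences, so the paired calculation can be sketched the same way (statsmodels assumed; here 0.30 is the mean change divided by the SD of the differences):

```python
# Number of pairs for a paired (one-sample on differences) t-test,
# alpha = 0.05 (two-sided), power = 0.80, standardized effect = 0.30.
from statsmodels.stats.power import TTestPower

n_pairs = TTestPower().solve_power(effect_size=0.30, alpha=0.05,
                                   power=0.80, alternative='two-sided')
print(round(n_pairs))
# Roughly 89-92 pairs depending on the approximation used (the slide's
# calculator reports 92) - far fewer than the 175 per group needed with
# two independent groups.
```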
HRV Study
Correlation and Multiple Regression
• Single-Group Study
– Session 1 – Signal 1 → HRV
– Session 1 – Signal 2 → BP
– Demographic variables = Age, Gender
– Clinical characteristics = Disease Status
• Suppose you want to look at associations between HRV, BP, and demographic and clinical characteristics – use the bivariate correlation coefficient for 2 variables, or multiple regression (R2) for multiple predictors.
Power and Sample Size for
Correlations (H0: r = 0)
• Power = 0.80, r = 0.3, ES = R2 = 0.09, sample size = 85
• Power = 0.97, r = 0.4, ES = R2 = 0.16, sample size = 85
• Only 1 "regressor" or predictor
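For the single-predictor case, the power of the test of H0: r = 0 can be approximated with the standard Fisher z transformation. A minimal sketch (this is the usual approximation and may not be the exact method behind the slide's software):

```python
# Approximate power for testing H0: r = 0 (two-sided) via Fisher's z.
import numpy as np
from scipy.stats import norm

def corr_power(r, n, alpha=0.05):
    z_r = np.arctanh(r)            # Fisher z transform of the assumed correlation
    se = 1 / np.sqrt(n - 3)        # approximate standard error of z
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.sf(z_crit - z_r / se) + norm.cdf(-z_crit - z_r / se)

print(round(corr_power(0.3, 85), 2))   # ~0.80, as on the slide
print(round(corr_power(0.4, 85), 2))   # ~0.97, as on the slide
```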
Power and Sample Size for
Correlations (H0: r = 0)
• Power = 0.80, r = 0.3, ES = R2 = 0.09, sample size = 139 if the number of predictor variables = 5
• Power = 0.80, r = 0.3, ES = R2 = 0.09, sample size = 177 if the number of predictor variables = 10
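With several predictors, the corresponding quantity is the overall F test of R2 = 0. A hedged sketch using the noncentral F distribution (the noncentrality convention here is Cohen's λ = n·f²; the slide's software may use a slightly different one, so small departures from 0.80 are expected):

```python
# Approximate power of the overall F test that R^2 = 0 with k predictors.
from scipy.stats import f as f_dist, ncf

def r2_power(r2, n, k, alpha=0.05):
    df1, df2 = k, n - k - 1
    f2 = r2 / (1 - r2)                       # Cohen's f^2 effect size
    lam = f2 * n                             # noncentrality (Cohen's convention)
    f_crit = f_dist.ppf(1 - alpha, df1, df2)
    return ncf.sf(f_crit, df1, df2, lam)

print(round(r2_power(0.09, 139, 5), 2))      # roughly 0.8 with 5 predictors
print(round(r2_power(0.09, 177, 10), 2))     # roughly 0.8 with 10 predictors
```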
Power and Sample Size for
Test of Two Proportions
You want to detect a difference
between two proportions.
Example: How many patients do you need in each group to detect a difference in the proportion of patients who adhere to diet and exercise at the end of 5 years?
Old Program = 0.5 Adhere
New Program = 0.7 Adhere
Alpha = 0.05, Power = 0.8.
You will need 103 individuals in each
group.
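A hedged sketch of the two-proportion calculation with statsmodels (an assumed tool). This arcsine-approximation answer comes out near 93 per group; the slide's 103 most likely comes from a formula that includes a continuity correction, so the two figures differ a little:

```python
# Sample size per group to detect 70% vs. 50% adherence,
# alpha = 0.05 (two-sided), power = 0.80.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

h = proportion_effectsize(0.7, 0.5)          # Cohen's h for the two proportions
n_per_group = NormalIndPower().solve_power(effect_size=h, alpha=0.05,
                                           power=0.80, ratio=1.0,
                                           alternative='two-sided')
print(round(n_per_group))                    # ~93 per group (slide: 103 with
                                             # a continuity correction)
```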
Final Points
• Design your study such that you will have a
sufficient number of subjects to be able to
detect the effects that are clinically
meaningful (high power).
• If you have a limited budget, you cannot afford to increase your sample size to the necessary level, and lowering the variability is not feasible, you should consider alternative designs and hypotheses rather than proceeding with a study design that has low power.