Minimal Detectable Difference Calculations

Sample Size Estimation, Power Analysis,
and Minimal Detectable Difference
Calculations
John Sorkin, M.D., Ph.D.
Chief of Biostatistics and Informatics
Baltimore VA GRECC and
University of Maryland School of
Medicine Claude D. Pepper OAIC
7/29/2008
John Sorkin M.D. Ph.D.
• Internal Medicine
• Endocrinology
• Gerontology
• Geriatrics
• Ph.D. (Epidemiology)
– Out of the closet statistician
What Motivates This Session?
• Sample size, power, and minimal
detectable difference calculations play a
role in
– Designing studies
– IRB applications
– Grant applications
– Evaluating study results
© 2001 John Sorkin. Do not use without
permission
Aim
• Describe the concepts that must be
understood when a
• sample size,
• power, or
• minimal detectable difference
calculation is performed
• This presentation is not intended to be
mathematically rigorous.
Sample Size
• The number of subjects that need to be
studied if a significant difference
between treatment and control groups is
to be shown
– Assuming a known treatment effect
• Sample size analyses help determine
the cost of a study.
Power Analyses
• The probability that a treatment effect
will be correctly identified if one exists.
– Assuming a known sample size
• Power analyses help determine a
study’s probability of success, i.e., the
probability that a proposed study will
find a treatment effect if one exists.
Minimal Detectable Difference
• The smallest treatment effect that can
be identified.
– Assuming a known sample size
• Minimal detectable difference analyses
help determine the feasibility of a study
Primum non Nocere
• It is unethical to perform any experiment
that puts subjects at risk for injury or
other harm if the study as designed has
no hope of answering the question the
study was designed to address.
Basic Concepts
• Study design
• Mean value of the characteristic
• Distribution of the measurements
• Variability of the measurements (SD)
• Statistical significance (α error rate)
• Number of subjects per group
• Effect size
• One or two-sided test
• Power (1-β)
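Several of the quantities listed above are tied together by a single relationship. As a rough sketch, assuming two equal-sized groups, a normally distributed characteristic with a common SD, and a two-sided test (a standard textbook approximation, not necessarily the method used by any particular software package), the relationship can be written as follows. The symbols n (subjects per group), σ (SD), δ (effect size), z (standard normal quantile), and Φ (standard normal cumulative distribution) are notation introduced here for illustration.

```latex
% Sample size per group, given effect size and power
n \approx \frac{2\,\sigma^{2}\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}}{\delta^{2}}

% Minimal detectable difference, given sample size and power
\delta \approx \left(z_{1-\alpha/2} + z_{1-\beta}\right)\,\sigma\,\sqrt{\frac{2}{n}}

% Power, given sample size and effect size
1-\beta \approx \Phi\!\left(\frac{\delta}{\sigma\sqrt{2/n}} - z_{1-\alpha/2}\right)
```

The three lines are the same equation solved for a different unknown, which is why fixing any of the other quantities determines the remaining one.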
Study Design
• Single sample
• Two samples
• Multiple samples
Single Sample
• A characteristic is measured in a group.
The group receives a treatment and the
characteristic is measured again.
Single Sample - Example
• Serum cholesterol concentration is
measured in a single group of subjects.
The subjects receive a drug and the
subjects’ cholesterol concentration is
measured again.
Two Samples
• A characteristic is measured in a control
group and an experimental group. The
experimental group receives a
treatment, the control group receives a
placebo. The characteristic is measured
in both groups after treatment.
Two Samples – Example
• Pulse rate is measured in a control
group and an experimental group. The
experimental group receives a pill
containing caffeine. The control group
receives a sugar pill. Pulse rate is
measured in both groups after
treatment.
Multiple Samples
• A characteristic is measured in a control
group and several experimental groups.
The experimental groups receive
different treatments, the control group
receives a placebo. The characteristic is
measured in all groups after the
treatments.
The Mean Value of the
Characteristic Being Studied
• Also called
– Null hypothesis or
– H0.
• Source of estimates
– Pilot study
– Literature review
– Prior knowledge.
H0, the Null Hypothesis
[Figure: probability distribution of x under H0; axes: x vs. Probability]
Distribution of the Measurements
of the Characteristic
• Normal
• Poisson
• Binomial
• Exponential
Distribution of the Measurements
of the Characteristic
[Figure: probability distribution of the measurements; axes: x vs. Probability]
Variability of the
Measurements
• Quantified by the standard deviation
(SD) of the characteristic.
Variability of the
Measurements: SD
[Figure: two probability distributions of x with different spreads; axes: x vs. Probability]
Significance
• Also known as:
– Probability of a Type-I error, α.
– The probability of incorrectly saying there
is a treatment effect when there is none.
Significance:
Probability of saying there is a
treatment effect when there is none
[Figure: probability distribution of x under H0; axes: x vs. Probability]
Significance
• The probability of saying there is a
treatment effect when there is none
Absolute Truth: H0: Treatment has no effect
                     True     False
You State: True
You State: False     α
Number of Subjects in Each
Group
• Single sample
– Only one number needed
• Two or more samples
– Sample size for each group
– Groups need not be the same size.
Effect Size
• Increase (or decrease) in the mean brought
about by treatment.
• Also known as
– Alternate hypothesis, post treatment mean, Ha.
• Source of estimates
– Pilot study
– Literature review
– Prior knowledge.
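Distribution of Measurements
Resulting in Ha: SDa
[Figure: probability distribution of x under Ha; axes: x vs. Probability]
Ha: The Post Treatment Mean
[Figure: probability distribution of x centered on the post-treatment mean; axes: x vs. Probability]
Distribution of H0 and Ha
[Figure: probability distributions of x under H0 and Ha; axes: x vs. Probability]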
One or Two-Sided Test
• Two-sided
– Can the intervention either increase or decrease
the pre-treatment mean?
• One-sided
– Does the intervention only increase the mean?
– Does the intervention only decrease the mean?
• Use one-sided tests infrequently!
– Consider recent estrogen supplementation study
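The practical difference between the two choices shows up in the critical value the test statistic must exceed. The following is a minimal sketch, assuming a normally distributed test statistic and α = 0.05; the variable names are illustrative only.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # standard normal: mean 0, SD 1

# Two-sided test: alpha is split between the two tails
z_two_sided = std_normal.inv_cdf(1 - alpha / 2)

# One-sided test: all of alpha sits in a single tail
z_one_sided = std_normal.inv_cdf(1 - alpha)

print(f"two-sided critical z: {z_two_sided:.3f}")
print(f"one-sided critical z: {z_one_sided:.3f}")
```

The one-sided threshold is lower, so a one-sided test needs fewer subjects for the same power, but it cannot detect an effect in the unexpected direction. That trade-off is why its use needs to be justified.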
Power
• Also known as
– 1-β.
• The probability of correctly saying there
is a treatment effect when there is a
treatment effect.
Power
• The probability of saying there is a
treatment effect when there is a
treatment effect
Absolute Truth: H0: Treatment has no effect
                     True     False
You State: True               β
You State: False     α        1-β
Power: Probability of saying there
is a treatment effect when there is a
treatment effect
[Figure: probability distributions of x; axes: x vs. Probability]
Software
• DSTPLAN
(http://odin.mdacc.tmc.edu/anonftp/)
– FREE! Excellent documentation.
– Runs on Macintosh, DOS, and Windows
• Analyst application in SAS
• S-Plus
• PASS 2000 – NCSS 2000
Books
• Cohen, J. Statistical Power Analysis for
the Behavioral Sciences, 2nd Edition.
Lawrence Erlbaum Associates, Inc.
Hillsdale New Jersey 1988.
• Mace, AE. Sample-Size Determination.
Robert E. Krieger Pub Co., Huntington,
New York 1974.
References
Structural Equation Modeling
• MacCallum RC, Brown MW, Sugawara HM.
Power analysis and determination of sample
size for covariance structure equation
modeling. Psychological Methods 1(2), 130-149.
• MacCallum RC, Brown MW, Hong S. Power
analysis in covariance structure equation
modeling using GFI and AGFI. Multivariate
Behavioral Research 32(2), 193-210.
Problem 1 – Sample Size
• A clinical trial of a drug is planned in
which half of the subjects will receive an
active drug, the other half of the
subjects will receive a placebo.
• The mean HDL2 cholesterol
concentration in the population is
10±2.5 mg/dl (mean ± SD).
Problem 1 (cont)
• A pilot study indicates that the drug
increases mean HDL2 concentration to
14 mg/dl.
• How many subjects need to be studied
in each group assuming a significance
(α) of 0.05 and a power (1-β) of 0.80?
Assume HDL2 concentration is normally
distributed on repeated testing.
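Problem 1 can be sketched in a few lines of Python. This is a minimal illustration using the normal approximation for two equal-sized groups and a two-sided test; the function name `sample_size_per_group` is hypothetical, and this is not necessarily the method used by the software named in this presentation.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(mean_control, mean_treated, sd, alpha=0.05, power=0.80):
    """Subjects per group (normal approximation, two-sided test, equal groups)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = z.inv_cdf(power)           # quantile corresponding to power
    delta = abs(mean_treated - mean_control)  # effect size in original units
    n = 2 * (sd * (z_alpha + z_beta) / delta) ** 2
    return ceil(n)  # round up to whole subjects


# Problem 1: HDL2 of 10 +/- 2.5 mg/dl, expected to rise to 14 mg/dl
n_per_group = sample_size_per_group(mean_control=10, mean_treated=14, sd=2.5)
print(n_per_group)
```

With these inputs the approximation gives 7 subjects per group. Calculations based on the t distribution, which is more appropriate for groups this small, typically give a slightly larger number.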
Describing the Power Analysis
• Prior data from our laboratory indicate
that the mean HDL2 concentration in our
population is 10±2.5 mg/dl (mean ± SD)
and that the drug increases HDL2
concentration an average of 4 mg/dl to
14 mg/dl.
Describing the Power Analysis
(cont)
• Sample size was calculated for a two-tailed comparison of an experimental
group to a control group with a
significance (α) of 0.05 and a power (1-β) of 0.80. Under these assumptions,
we calculate that we will need 8
subjects per group to complete the
study. Assuming a 50% dropout rate,
we will enroll 16 subjects per group.
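The dropout adjustment in the paragraph above is simple arithmetic: divide the number of subjects who must complete the study by the fraction expected to remain. A minimal sketch follows; the function name `enrollment_target` is hypothetical.

```python
from math import ceil


def enrollment_target(n_completers, dropout_rate):
    """Subjects to enroll so that n_completers are expected to finish."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in the range [0, 1)")
    return ceil(n_completers / (1 - dropout_rate))


# 8 completers needed per group, 50% expected dropout
n_enroll = enrollment_target(n_completers=8, dropout_rate=0.50)
print(n_enroll)
```

This reproduces the 16 subjects per group stated above.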
Problem 2 - Power
• A clinical trial of a drug is planned in
which half of the subjects will receive an
active drug, the other half of the
subjects will receive a placebo.
• The mean HDL2 cholesterol
concentration in the population is
10±2.5 mg/dl (mean ± SD).
Problem 2 – Power (Cont)
• A pilot study indicates that the drug increases
mean HDL2 concentration to 14 mg/dl.
• Ten subjects will be enrolled in the control
group and ten subjects in the experimental
group.
• What is the power of the study (1-β)
assuming a significance (α) of 0.05? Assume
HDL2 concentration is normally distributed on
repeated testing.
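Problem 2 can be sketched the same way, this time solving for power. The code below is a minimal illustration using the normal approximation for a two-sided test with equal groups; it ignores the negligible chance of rejecting in the wrong direction, and the function name `power_two_groups` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_groups(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)     # critical value for significance
    std_error = sd * sqrt(2 / n_per_group)  # SE of the difference in means
    return z.cdf(delta / std_error - z_alpha)


# Problem 2: HDL2 rises from 10 to 14 mg/dl, SD 2.5, ten subjects per group
power = power_two_groups(delta=14 - 10, sd=2.5, n_per_group=10)
print(f"{power:.3f}")
```

Under these assumptions the approximate power is a little under 0.95. A t-based calculation would be somewhat lower for groups this small.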
Problem 3 – Minimal
Detectable Difference
• A clinical trial of a drug is planned in
which half of the subjects will receive an
active drug, the other half of the
subjects will receive a placebo. The
mean HDL2 cholesterol concentration in
the population is known to be 10±2.5
mg/dl (mean ± SD).
Problem 3 (cont)
• Five subjects will be enrolled in the control
group and five subjects in the experimental
group.
• What is the minimal increase in HDL2
concentration that can be detected assuming
a significance (α) of 0.05 and a power (1-β) of
0.80? Assume HDL2 concentration is
normally distributed on repeated testing.
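Problem 3 solves the same relationship for the effect size. The sketch below uses the normal approximation for a two-sided test with equal groups; the function name `minimal_detectable_difference` is hypothetical, and with only five subjects per group a t-based calculation would give a larger value.

```python
from math import sqrt
from statistics import NormalDist


def minimal_detectable_difference(sd, n_per_group, alpha=0.05, power=0.80):
    """Smallest difference in means detectable with the given power."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = z.inv_cdf(power)           # quantile corresponding to power
    return (z_alpha + z_beta) * sd * sqrt(2 / n_per_group)


# Problem 3: SD 2.5 mg/dl, five subjects per group
mdd = minimal_detectable_difference(sd=2.5, n_per_group=5)
print(f"{mdd:.2f} mg/dl")
```

Under these assumptions the smallest detectable increase is roughly 4.4 mg/dl, i.e. nearly two standard deviations.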
Suggestions
• Consult a statistician (early and often).
– Include a statistician in your grant.
• Adjust all calculations for loss to follow-up.
• Avoid one-sided tests.
– If you use a one-sided test, justify its use!
Suggestions
• Use a significance of 0.05 or better, e.g.
0.01
– If you use a value >0.05 (e.g. 0.10) justify
its use.
• Use a power of 0.80 or better, e.g. 0.90
– If you use a value <0.80 (e.g. 0.60) justify
its use.
How to Contact Me
(I Can’t Run and I Can’t Hide)
John Sorkin, M.D. Ph.D.
University of Maryland School of Medicine
Baltimore VA Medical Center
10 North Greene Street (BT/18/GR)
Baltimore MD 21201
410 605-7119
JSorkin@grecc.umaryland.edu
Response Types
Continuous Outcome
• Analyses done so far assume a
continuous outcome (or response)
– Used to study a continuous variable that
can take on a large range of values
• Serum cholesterol
• IQ
• Height
Response Types
Two-State
• To study a variable that can take on
only one of two states
– Disease vs. non-diseased
– Yes vs. no
– Dead vs. alive
• Other techniques are needed
Response Types
Two-State Response
• Follow-up time
– Not important
– Important
Two-State Response
Yes vs. No
(Follow-up time not important)
• Follow-up time not of interest
– Binomial distribution
• Other techniques are needed
– Not addressed in this presentation.
Two-State Response
Survival Time
• Survival in two groups
– Treated vs. untreated
– Follow-up (i.e. survival) time of primary interest.
• Survival time
– Time to death
– Time to failure
Survival: Yes vs. No
(Follow-up time important)
• Exponential distribution
– Assumes constant risk of death
• Risk is NOT a function of age
– Can be used where age does not influence
survival
• Short term survival
• Survival immediately after treatment
• Analyses matched by age
Exponential distribution vs.
Proportional hazards model
• Does age affect survival?
– No: Exponential distribution
– Yes: Proportional hazards model
Normal Distribution
Requires Two Parameters
$$\text{Probability}(x) = \frac{e^{-(x-\mu)^{2}/2\sigma^{2}}}{\sigma\sqrt{2\pi}}$$
μ = Mean
σ = Standard Deviation
Normal Distribution
Probability vs. Value
[Figure: normal curve with Mean 0, SD 5; axes: X (-30 to 30) vs. Probability (0 to 0.08)]
Normal Distribution
Probability vs. Value
[Figures: normal curves for Mean 0, SD 5; Mean -10, SD 5; and Mean 0, SD 10; axes: X (-30 to 30) vs. Probability (0 to 0.08)]
Normal Distribution
Standard Deviation Defines Spread
[Figure: normal curves for Mean 0, SD 5 and Mean 0, SD 10; axes: X (-30 to 30) vs. Probability (0 to 0.08)]
Normal Distribution
Mean Defines Location
[Figure: normal curves for Mean -10, SD 5; Mean 0, SD 5; and Mean 0, SD 10; axes: X (-30 to 30) vs. Probability (0 to 0.08)]
Exponential Distribution
Requires One Parameter
$$\text{MortalityRisk}(t) = \lambda e^{-\lambda t}$$
$1/\lambda$ = Mean
Exponential Distribution
Mortality Risk vs. Time
[Figure: exponential curves for Lambda=1 and Lambda=0.5; axes: Time (0 to 4) vs. Mortality Risk (0 to 1)]
Exponential Distribution
Requires One Parameter
$$\text{MortalityRisk}(t) = \lambda e^{-\lambda t}$$
$1/\lambda$ = Mean
$\sqrt{1/\lambda^{2}} = 1/\lambda$ = Standard Deviation
$\ln 2/\lambda \cong 0.693/\lambda$ = Median
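The link between the rate λ, the mean, and the median can be checked numerically. The following is a minimal sketch, assuming a mean survival of 2 months (the untreated value used in Problem 4); the helper `surviving_fraction` is a hypothetical name.

```python
from math import exp, log

mean_survival = 2.0      # months (assumed for illustration)
lam = 1 / mean_survival  # the single parameter of the exponential distribution

# Median survival: the time by which half the subjects have died
median_survival = log(2) / lam


def surviving_fraction(t, lam):
    """Fraction of subjects still alive at time t under constant risk."""
    return exp(-lam * t)


frac_at_median = surviving_fraction(median_survival, lam)
print(f"lambda = {lam}, median = {median_survival:.3f} months")
print(f"fraction alive at the median: {frac_at_median:.2f}")
```

The median (about 1.39 months) is shorter than the mean (2 months) because the exponential distribution is skewed: a few long survivors pull the mean upward.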
Basic Concepts for Survival
• Study design
• Mean value of the characteristic
– Mean (or median) survival
• Distribution of the measurements
– Exponential
• Variability of the measurements (SD)
• Statistical significance (α error rate)
• Number of subjects per group
– Accrual time and rate
• Effect size
– Alternative survival (mean or median)
• One or two-sided test
• Power (1-β)
• Follow-up time
Follow-up Study
Accrual and Follow-up Time
Survival Time
Taking Age Into Account
• Used for Cox Proportional Hazards
Regression
• Simulations
– Hard to do
– Take age into account
Problem 4 – Sample Size
• A clinical trial of a drug is planned in
which half of the subjects will receive an
active drug, the other half of the
subjects will receive a placebo.
• Mean survival in untreated subjects is
known to be 2 months, and has been
successfully modeled assuming
mortality risk follows a pattern of
exponential decay.
Problem 4 (cont)
• A pilot study indicates that the drug
increases mean survival to four months.
• How many subjects need to be studied
in each group assuming a significance
(α) of 0.05 and a power (1-β) of 0.80?
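For survival outcomes, power is driven by the number of deaths observed rather than the number of subjects enrolled. The sketch below uses a widely cited approximation for the total number of deaths needed with equal allocation and a two-sided test (often attributed to Schoenfeld). It is an assumption chosen for illustration, not the method used in this presentation, so it will not in general reproduce enrollment figures produced by other software. The function name `required_deaths` is hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def required_deaths(mean_control, mean_treated, alpha=0.05, power=0.80):
    """Approximate total deaths needed across both groups (equal allocation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    # With exponential survival, rate = 1 / mean, so the hazard ratio
    # (treated vs. control) equals mean_control / mean_treated.
    hazard_ratio = mean_control / mean_treated
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)


# Problem 4: mean survival 2 months untreated, 4 months treated
deaths = required_deaths(mean_control=2, mean_treated=4)
print(deaths)
```

Under these assumptions about 66 deaths are needed in total. The number of subjects to enroll is larger and depends on the accrual period, the follow-up time, and the expected loss to follow-up.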
Describing the Power Analysis
• We performed sample size calculations under
the assumption that the risk for mortality in
the control and treatment groups follows
exponential decay. Our calculations were
performed assuming a Type I error rate (α) of
5% and a power (1-β) of 80%.
• Clinical studies have shown an average pre-treatment survival of 2 months. Based on
preliminary studies in our laboratory, we
believe that our new treatment will result in an
average post-treatment survival of 4 months.
Describing the Power Analysis
(cont.)
• Based on the assumptions noted above, we
will need to recruit and enroll a total of 90
subjects (45 control and 45 who will receive
our new drug) over a one-year period, and
then follow subjects for four years beyond the
one-year recruiting period. Given the short
nature of the study and the universal
need for medical follow-up in these patients,
we anticipate 100% follow-up.