Drug Development Statistics & Data Management
July 2014
Cathryn Lewis
Professor of Genetic Epidemiology & Statistics
Department of Medical & Molecular Genetics
King’s College London
With thanks to Irene Rebollo Mesa and Frühling Rijsdijk
1. Concepts of power
2. Power and types of error
3. Software to calculate power
4. Power for continuous outcome
5. Power for proportion, success/failure
6. Quiz!
Power and Sample size 2
Question: What are the study endpoints?
Types of endpoints:
• Binary clinical outcome: death from disease
• Quantitative: creatinine, cholesterol levels, quality of life (QOL)
• Time to event: time to graft failure, time to death, time to recovery
Good qualities:
• Clinically meaningful
• Practical and feasible to measure
• Occur frequently enough throughout the duration of the trial
Question: What is the expected prevalence of the outcome (discrete) or variability of the outcome (continuous)?
• Based on previous studies, a pilot study or a hospital/NHS report.
• Variability and prevalence are vital for power.
• Both are best at intermediate levels.
Question: What is the expected difference between groups
• in proportion of events (if discrete), or
• in mean measure (if continuous)?
• Based on previous studies or a pilot study
• Alternatively, the minimum clinically relevant difference
• The larger the difference, the higher the power
1. Superiority
Objective: to determine whether there is evidence of a statistical difference in the comparison of interest between two Tx regimes:
A: Tx of interest; B: placebo or active control Tx
$H_0$: The two Txs have equal effect with respect to the mean response: $\mu_A = \mu_B$
$H_1$: The two Txs are different with respect to the mean response: $\mu_A \neq \mu_B$
• Definition: power is the expected proportion of samples in which we correctly decide against (reject) the null hypothesis when it is false
• It depends on:
1. Size of the (treatment) effect in the population (d)
2. The significance level at which we reject the null (e.g. 0.05)
3. Sample size (N)
4. Design of the study: parallel, crossover, etc.
5. Endpoint measurement (categorical, ordinal, continuous)
6. The expected dropout rate
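Dropout is commonly allowed for by inflating the calculated sample size so that enough patients remain for the analysis. The sketch below assumes that dropout occurs at random and at the same rate in both groups (an assumption, not stated on the slides); the helper name `inflate_for_dropout` and the numbers are hypothetical.

```python
import math


def inflate_for_dropout(n_analysable: int, dropout_rate: float) -> int:
    """Hypothetical helper: number to recruit per group so that roughly
    n_analysable patients remain after the expected dropout."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    # Divide by the proportion expected to complete the trial, then round up.
    return math.ceil(n_analysable / (1 - dropout_rate))


# Illustrative only: 393 analysable patients needed per group, 10% dropout.
print(inflate_for_dropout(393, 0.10))  # 437
```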
• We summarise the results of a trial in a statistical analysis with a test statistic (e.g. chi-squared, Z score)
• The test statistic provides a measure of support for a certain hypothesis
• We pre-determine a threshold on the test statistic at which to reject the null hypothesis
[Figure: test statistic axis divided into NO and YES decision regions]
YES or NO decision-making: significance testing
This inevitably leads to two types of mistake: false positive (YES instead of NO, Type I) and false negative (NO instead of YES, Type II)
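To see how a pre-set threshold fixes the false-positive rate, the sketch below simulates many trials in which the null hypothesis is true (both groups drawn from the same normal distribution with a known SD) and counts how often a two-sided Z test crosses the 5% threshold. The group size, SD, mean of 80 and number of simulated trials are arbitrary illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean


def simulate_type1_rate(n_per_group=50, sd=10.0, alpha=0.05,
                        n_trials=4000, seed=1):
    """Proportion of simulated null trials that a two-sided Z test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    se = sd * (2 / n_per_group) ** 0.5            # SE of the difference in means (known SD)
    rejections = 0
    for _ in range(n_trials):
        # Both groups share the same true mean, so H0 is true.
        a = [rng.gauss(80, sd) for _ in range(n_per_group)]
        b = [rng.gauss(80, sd) for _ in range(n_per_group)]
        z = (fmean(a) - fmean(b)) / se
        if abs(z) > z_crit:
            rejections += 1  # a false positive (Type I error)
    return rejections / n_trials


print(simulate_type1_rate())  # should be close to 0.05
```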
[Figure: sampling distributions of the test statistic T if $H_0$ were true and if $H_A$ were true; α = 0.05; power = 1 − β]
Decision table:
• $H_0$ true, rejection of $H_0$: Type I error = α (the significance level)
• $H_A$ true, rejection of $H_0$: power = 1 − Type II error = 1 − β
• $H_A$ true, non-rejection of $H_0$: Type II error = β
• Null hypothesis: no effect
• A ‘significant’ result means that we can reject the null hypothesis
• A ‘non-significant’ result means that we cannot reject the null hypothesis
Statistical significance
• The ‘p-value’
• The probability, if the null hypothesis were in fact true, of obtaining a result at least as extreme as the one observed
• Typically, we are willing to incorrectly reject the null 5% or 1% of the time (Type I error)
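For a Z score, the two-sided p-value is the probability under the null hypothesis of a statistic at least as extreme, in either direction, as the one observed. A minimal sketch using Python's standard-library normal distribution (the function name is illustrative); Z scores of 1.0, 1.96 and 2.58 give p-values of roughly 0.32, 0.05 and 0.01.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) when Z ~ N(0, 1), i.e. under the null hypothesis."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


for z in (1.0, 1.96, 2.58):
    print(z, round(two_sided_p_value(z), 3))
```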
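Decision table with rates:
• $H_0$ true, rejection of $H_0$: Type I error at rate α
• $H_0$ true, non-rejection of $H_0$: non-significant result (1 − α)
• $H_A$ true, rejection of $H_0$: significant result (1 − β)
• $H_A$ true, non-rejection of $H_0$: Type II error at rate β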
[Figures: sampling distributions of the test statistic T if $H_0$ were true and if $H_A$ were true, under different settings]
• α = 0.05: power = 1 − β
• α = 0.05: power 1 − β ↑
• α = 0.01: power 1 − β ↓
• α = 0.1: power 1 − β ↑
• α = 0.05: power 1 − β ↑
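The effect of the significance level on power can be checked numerically. This sketch uses the normal approximation for a two-sided, two-sample comparison of means with a common SD; the values d = 2, σ = 10 and 393 per group are illustrative assumptions. Lowering α to 0.01 reduces power, and raising it to 0.1 increases power, as in the figures.

```python
from statistics import NormalDist


def power_two_sample_z(d, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample Z test for a difference
    in means d, with common SD sd and equal group sizes."""
    z = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(d) / se
    # Probability that the statistic falls in either rejection region under H_A.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


for alpha in (0.01, 0.05, 0.10):
    print(alpha, round(power_two_sample_z(d=2, sd=10, n_per_group=393, alpha=alpha), 2))
```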
We need:
– An acceptable Type I error rate (α), usually 0.05, or 0.025 if one-sided
– A meaningful difference d in the response: the smallest Tx effect that is clinically worth detecting / that we wish to detect
– The desired power (1 − β) to detect this difference, minimum 80%
– The ratio of allocation to the groups (equal sample sizes?)
– Whether to use a one-sided or a two-sided test
In addition:
– The variability common to the two populations, for a continuous endpoint
– The response (event) rate of the control group, for a binary endpoint
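These inputs combine in the standard normal-approximation formulas for the number n per group (equal allocation, two-sided α). This is a sketch: calculators may instead use t-based or continuity-corrected versions and give slightly different answers.

```latex
% Continuous endpoint: difference in means d, common SD \sigma
n = \frac{2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \sigma^2}{d^2}

% Binary endpoint: proportions p_1, p_2, with \bar{p} = (p_1 + p_2)/2
n = \frac{\left( z_{1-\alpha/2}\sqrt{2\bar{p}(1-\bar{p})}
      + z_{1-\beta}\sqrt{p_1(1-p_1) + p_2(1-p_2)} \right)^2}{(p_1 - p_2)^2}
```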
PRISM StatMate ($50)
G*Power 3 (free)
Statistical software: SPSS, SAS, Stata, R
PS Power and Sample Size Calculation (free) (Windows)
Web: Google “Statistical Power Calculation”
Russell V. Lenth: http://www.stat.uiowa.edu/~rlenth/Power/
David Schoenfeld: http://hedwig.mgh.harvard.edu/sample_size/size.html
Perform the calculation by two methods and check that they give similar answers
Russ Lenth’s Power and Sample size page http://www.stat.uiowa.edu/~rlenth/Power/
Statistical Considerations 22
http://hedwig.mgh.harvard.edu/sample_size/size.html
Determining Sample Size: Continuous outcome
• Two antihypertensives:
– Testing for superiority
• Endpoint: difference in diastolic BP
– Continuous variable
• Relevant parameters
– Difference in diastolic BP between drugs: d = 2 mm Hg
– Standard deviation of diastolic BP in each group: σ = 10 mm Hg
– Significance level: α = 0.05
– Required power: 0.8
– Assume equal-sized groups
• Calculate sample size required
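A minimal sketch of this calculation, assuming the normal approximation to the two-sample test of means (a t-based calculator such as Lenth's page may return a slightly larger number, because the t test is a little less powerful for the same n). The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_means(d, sd, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for comparing two means
    (two-sided test, equal allocation, common SD)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / d ** 2)


# Diastolic BP example: d = 2 mm Hg, sigma = 10 mm Hg.
print(n_per_group_means(d=2, sd=10))  # 393
```

Doubling the difference to 4 mm Hg cuts the requirement to about a quarter (99 per group), since n scales with 1/d².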
Russ Lenth’s Power and Sample size page http://www.stat.uiowa.edu/~rlenth/Power/
[Figure: Power, by difference between two groups]
Continuous outcome:
• Power
• Standard deviation (equal in each group)
• Significance level
• Difference in means
• Sample size (equal in each group? fixed ratio?)
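A curve of power by difference between groups can be sketched by holding σ and n fixed and varying the difference. This assumes the normal approximation and the illustrative values σ = 10 mm Hg and 393 per group; only the upper rejection tail is counted, which is negligible loss unless the difference is very small.

```python
from statistics import NormalDist


def power_for_difference(d, sd=10.0, n_per_group=393, alpha=0.05):
    """Approximate power (upper tail only) of a two-sided two-sample Z test."""
    z = NormalDist()
    shift = abs(d) / (sd * (2 / n_per_group) ** 0.5)
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(shift - z_crit)


for d in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    print(f"d = {d} mm Hg: power = {power_for_difference(d):.2f}")
```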
Determining Sample Size: Discrete Example
• APT070 perfusion vs. cold storage of kidney
• Testing for superiority
• Endpoint: delayed graft function after transplantation
• Proportion of patients experiencing delayed graft function
• Relevant parameters
• Baseline prevalence: 35%
• Minimum clinically significant difference: 10%
• p1 = 0.35, p2 = 0.25 [proportion with delayed graft function in each group]
• Significance level α = 0.05
• Power = 80%
• Calculate sample size required
Russ Lenth’s Power and Sample size page http://www.stat.uiowa.edu/~rlenth/Power/
http://hedwig.mgh.harvard.edu/sample_size/size.html
Result: 349 patients on treatment A and 349 patients on treatment B, at the 0.05 significance level.
This assumes that the response rate of treatment A is 0.35 and the response rate of treatment B is 0.25.
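The figure of 349 per group can be approximately reproduced by hand. The sketch below uses the normal-approximation formula for two proportions, optionally with a Fleiss-type continuity correction (one of several corrections in use). Without the correction it gives 329 per group; with it, 349, in line with the calculator output for p1 = 0.35 and p2 = 0.25. Which formula the Schoenfeld page actually uses is an assumption here, not something the slides state, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80, continuity=True):
    """Sample size per group to compare two proportions (two-sided test,
    equal allocation), optionally with a Fleiss-type continuity correction."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    delta = abs(p1 - p2)
    n = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2
    if continuity:
        # Continuity correction for equal group sizes.
        n = n / 4 * (1 + math.sqrt(1 + 4 / (n * delta))) ** 2
    return math.ceil(n)


print(n_per_group_proportions(0.35, 0.25, continuity=False))  # 329
print(n_per_group_proportions(0.35, 0.25))                    # 349
```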
Discrete outcome:
• Power
• Proportion responding in Group 1
• Proportion responding in Group 2
• Significance level
• Sample size (equal in each group? fixed ratio?)
• Use power prospectively for planning future studies
– Determine an appropriate sample size
– Evaluate a planned study: will it yield useful information?
• Put science before statistics.
– Use effect sizes that are clinically relevant
– Don’t get distracted by statistical considerations
• Perform a pilot study
– Helps establish procedures, and understand and protect against the unexpected
– Gives the variance estimates needed to determine sample size
1. Superiority
2. Equivalence:
Objective: to demonstrate that two treatments have no clinically meaningful difference; d = largest difference that is clinically acceptable
$H_0$: The two Txs’ effects are different with respect to the mean response: $\mu_A - \mu_B \le -d$ or $\mu_A - \mu_B \ge d$
$H_1$: The two Txs are equal (within d) with respect to the mean response: $-d < \mu_A - \mu_B < d$
3. Non-inferiority:
Objective: to demonstrate that a given treatment is not clinically inferior to another
$H_0$: A given Tx is inferior with respect to the mean response: $\mu_A - \mu_B \le -d$
$H_1$: A given Tx is non-inferior with respect to the mean response: $\mu_A - \mu_B > -d$
(taking A as the Tx of interest and a higher mean response as better)
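One common way to carry out these tests is through a confidence interval for $\mu_A - \mu_B$. Non-inferiority at one-sided level 0.025 is concluded if the lower limit of the two-sided 95% CI lies above −d; equivalence by two one-sided tests at 0.05 each is concluded if the 90% CI lies entirely within (−d, d). The sketch below assumes a normal approximation, that higher responses are better, and hypothetical values for the estimated difference, its standard error and the margin.

```python
from statistics import NormalDist


def ci(diff, se, level):
    """Two-sided normal-approximation CI for mu_A - mu_B."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return diff - z * se, diff + z * se


def non_inferior(diff, se, margin, one_sided_alpha=0.025):
    # Reject H0 (mu_A - mu_B <= -margin) if the lower CI limit exceeds -margin.
    lower, _ = ci(diff, se, 1 - 2 * one_sided_alpha)
    return lower > -margin


def equivalent(diff, se, margin, alpha=0.05):
    # Two one-sided tests at level alpha <=> the (1 - 2*alpha) CI lies inside (-margin, margin).
    lower, upper = ci(diff, se, 1 - 2 * alpha)
    return -margin < lower and upper < margin


# Hypothetical trial result: estimated difference -0.5, SE 1.0.
print(non_inferior(-0.5, 1.0, margin=3), equivalent(-0.5, 1.0, margin=3))
print(non_inferior(-0.5, 1.0, margin=2), equivalent(-0.5, 1.0, margin=2))
```

The same data can be non-inferior or equivalent with a generous margin but not with a tighter one, which is why the margin d must be justified clinically before the trial.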
How many subjects? Assume 80% power, α = 0.05, two-sided.
For each question, which study needs more subjects: (x) more with A, (y) more with B, or (z) the same?
1. Mortality: Study A 20% vs 10%; Study B 20% vs 15%
2. Mortality: Study A 20% vs 10%; Study B 40% vs 30%
3. Diastolic BP: Study A 80 vs 85 mmHg, st. dev 10; Study B 90 vs 95 mmHg, st. dev 10
4. Diastolic BP: Study A st. dev 10; Study B st. dev 8
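Answers:
1. B: a smaller difference needs more subjects
2. B: the effect size is bigger in A (mortality halved, 20% vs 10%, compared with 40% vs 30%); the smaller the effect, the larger the sample size needed to detect it
3. Same: only the difference in means and the standard deviation matter, not the absolute BP level
4. A: a bigger standard deviation needs more subjects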
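The quiz answers can be checked with the usual normal-approximation formulas (80% power, two-sided α = 0.05, equal groups, no continuity correction). For question 4 the slide does not give the means, so a 5 mm Hg difference is assumed here purely for illustration.

```python
import math
from statistics import NormalDist

Z = NormalDist()
Z_A = Z.inv_cdf(0.975)  # two-sided alpha = 0.05
Z_B = Z.inv_cdf(0.80)   # 80% power


def n_props(p1, p2):
    """Per-group n to compare two proportions."""
    p_bar = (p1 + p2) / 2
    num = (Z_A * math.sqrt(2 * p_bar * (1 - p_bar))
           + Z_B * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)


def n_means(m1, m2, sd):
    """Per-group n to compare two means with a common SD."""
    return math.ceil(2 * (Z_A + Z_B) ** 2 * sd ** 2 / (m1 - m2) ** 2)


print("Q1:", n_props(0.20, 0.10), "vs", n_props(0.20, 0.15))
print("Q2:", n_props(0.20, 0.10), "vs", n_props(0.40, 0.30))
print("Q3:", n_means(80, 85, 10), "vs", n_means(90, 95, 10))
# Q4: assumed 5 mm Hg difference in means (not given on the slide).
print("Q4:", n_means(80, 85, 10), "vs", n_means(80, 85, 8))
```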