What size of study do I need?

Statistics for Health Research
What size of trial do
I need?
Peter T. Donnan
Professor of Epidemiology and Biostatistics
Co-Director of TCTU
Tayside Clinical Trials Unit
• TCTU now fully registered with
UKCRC Trials units
• Added randomisation service
Outline of Talk
• Basis of sample size
• Effect size
• Significance level (Type I error, α)
• Type II error (β)
• Power (1 – Type II error)
What size of study do I
need?
10 or 10,000?
Answer
•As large as possible!
•Data is information, so the
more data the sounder the
conclusions
•In real world, data limited by
resources: access to patients,
money, time, etc.
What size of study do I need?
Expand the question:
What size of study do I need to
answer the question posed, given
the size of my practice / clinic, or
no. of samples, given the amount of
resources (time and money) I have
to collect the information?
What is the question?
•Efficacy: new drugs (CTIMP),
weight loss programme (non-CTIMP)
•Equivalence (not very common)
•Non-inferiority (e.g. me-too
drug with fewer side effects)
Why bother?
1.You will not get your study
past ethics!
2.You will not get your proposal
past a statistical review by
funders!
3.It will be difficult to publish
your results!
Why bother?
•Is the study feasible?
•Is likely sample size enough to
show meaningful differences
with statistical significance?
•Does number planned give
enough power or need larger
number?
OBJECTIVES
•Understand issues involved in
estimating sample size
•Sample size is dependent on
design and type of analysis
•Parameters needed for sample
size estimation
OBJECTIVES
•Understand what is necessary
to carry out some simple
sample size calculations
•Carry out these calculations
with software
•Note SPSS does not yet have
a sample size calculator
What is the measure of outcome?
•Difference in change in:
•Depression scores, physiological
measures (BP, cholesterol),
QOL, hospitalisations, mortality,
etc.
•Choose PRIMARY OUTCOME
•A number of secondary but not too
many!
Unit of Randomisation
1) Randomisation by patient
(Individual Randomisation) RCT, e.g. crossover trial
2) Randomisation by practice
(Cluster randomisation)
RANDOMISED CONTROLLED
TRIAL (RCT)
Gold standard
method to assess
Efficacy of
treatment
RANDOMISED CONTROLLED
TRIAL (RCT)
Random allocation to
intervention or control so
likely balance of all factors
affecting outcome
Hence any difference in outcome
‘caused’ by the intervention
Randomised Controlled Trial
Eligible subjects → RANDOMISED → Intervention or Control
INTERVENTION:
To improve patient care and/or
efficiency of care delivery
•new drug/therapy
•patient education
•Health professional education
•organisational change
Example
Evaluate cost-effectiveness
of new statin
RCT of new statin vs. old
Randomise eligible individuals
to either receive new statin
or old statin
Eligible subjects
Evaluate cost-effectiveness
of new statin on:
Men? Aged over 50?
Cardiovascular disease?
Previous MI?
Requires precise INCLUSION and
EXCLUSION criteria in protocol
WHAT IS THE OUTCOME?
•Improvement in patients’ health
•Reduction in CV hospitalisations
•More explicitly a greater reduction
in mean lipid levels in those receiving
the new statin compared with the old
statin
•Reduction in costs
Decide on Effect size?
Sounds a bit chicken and egg!
Likely size of effect:
What is the minimum effect
size you will accept as being
clinically or scientifically
meaningful?
Likely Effect size?
Change in Percentage with Total
Cholesterol < 5 mmol/l
New    Old    Difference
40%    20%    20%
30%    20%    10%
25%    20%    5%
Variability of effect?
Variability of size of effect:
Obtained from previous
published studies and/or
Obtained from pilot work prior
to main study
Variability of effect?
For a comparison of two
proportions the variability of
size of effect is dependent on:
1) the size of the study and
2) the size of the proportions
or percentages
How many subjects?
•1) Likely size of effect ✓
•2) Variability of effect ✓
•3) Statistical significance level
•4) Power
•5) 1 or 2-sided tests
Hypothesis Testing
                 True State
Test Result      H0 True             H0 False
H0 True          Correct Decision    Type II Error
H0 False         Type I Error        Correct Decision
α = P(Type I Error)   β = P(Type II Error)
• Goal is to keep α, β reasonably small
Statistical significance or type I
error (α)
Type I error – rejecting null hypothesis
when it is true: False positive (Prob = α)
Generally use 5% level (α = 0.05) i.e.
accept evidence that null hypothesis
unlikely with p < 0.05
May decrease this for multiple testing
e.g. with 10 tests accept p < 0.005
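The multiple-testing adjustment above divides the overall significance level by the number of tests. This is commonly called a Bonferroni correction (a name not used on the slide). A minimal sketch, with a hypothetical function name:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the overall type I error near alpha.

    Assumption: a simple Bonferroni-style split of alpha across the tests.
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 10)
print(f"{threshold:.3f}")  # prints 0.005
```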
1 or 2-sided?
Generally use 2-sided significance
tests unless VERY strong belief
one treatment could not be worse
than the other
e.g. Weakest NSAID compared
with new Cox-2 NSAID
How many subjects?
•1) Likely size of effect ✓
•2) Variability of effect ✓
•3) Statistical significance level ✓
•4) Power
•5) 1 or 2-sided tests ✓
Power and type II error
Type II error (False negative):
Not rejecting the null
hypothesis (non-significance)
when it is false
Probability of type II error = β
Power = 1 − β, typically 80%
Type I and Type II errors
Analogy with sensitivity and specificity
Error                       Prob.   Screening
Type I (false positive)     α       1 − specificity
Type II (false negative)    β       1 − sensitivity
Power
Acceptable power 70% - 99%
If sample size is not a problem
go for 90% or 95% power;
If sample size could be
problematic go for lower power
but still sufficient e.g. 80%
Power
In some studies finite limit on
the possible size of the study
then question is rephrased:
What likely effect size will I
be able to detect given a fixed
sample size?
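When the sample size is fixed, the rephrased question can be answered by running the two-proportion formula used later in this talk backwards. The sketch below searches for the smallest percentage in the new group that could be detected. The function names, the 200-per-arm figure and the bisection search are illustrative assumptions, not part of the talk.

```python
from statistics import NormalDist


def n_per_arm(p1, p2, alpha=0.05, power=0.90):
    """Number per arm to compare two percentages p1 and p2 (2-sided test)."""
    z = NormalDist()
    multiplier = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    return multiplier * (p1 * (100 - p1) + p2 * (100 - p2)) / (p1 - p2) ** 2


def smallest_detectable_p1(p2, n_available, alpha=0.05, power=0.90):
    """Smallest percentage p1 > p2 detectable with n_available per arm."""
    if not 0 <= p2 < 100:
        raise ValueError("p2 must be a percentage in [0, 100)")
    low, high = p2, 100.0
    if n_per_arm(high, p2, alpha, power) > n_available:
        raise ValueError("sample too small to detect any difference")
    # Required n falls steadily as p1 moves away from p2, so bisection works.
    for _ in range(60):
        mid = (low + high) / 2
        if n_per_arm(mid, p2, alpha, power) > n_available:
            low = mid   # difference too small for this sample size
        else:
            high = mid  # detectable; try a smaller difference
    return high


# Hypothetical scenario: 20% on the old drug, only 200 patients per arm.
detectable = smallest_detectable_p1(20, 200)
print(f"Smallest detectable percentage on new drug: {detectable:.1f}%")
```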
How many subjects?
•1) Likely size of effect ✓
•2) Variability of effect ✓
•3) Statistical significance level ✓
•4) Power ✓
•5) 1 or 2-sided tests ✓
Sample size for difference in
two proportions
Number needed for comparison
depends on statistical test used
For comparison of two
proportions or percentages
use Chi-Squared (χ²) test
Comparison of two proportions
Number in each arm:
$$n = \frac{(z_{\alpha} + z_{2\beta})^2 \left[ p_1 (100 - p_1) + p_2 (100 - p_2) \right]}{(p_1 - p_2)^2}$$
Where p1 and p2 are the percentages
in group 1 and group 2 respectively
Assume 90% power and 5%
statistical significance (2-sided)
Number in each arm:
$$n = \frac{10.507 \left[ p_1 (100 - p_1) + p_2 (100 - p_2) \right]}{(p_1 - p_2)^2}$$
zα = 1.96 (5% significance level, 2-sided) and
z2β = 1.28 (90% power) are obtained
from the Normal distribution
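The two z values and the 10.507 multiplier can be checked with the standard Normal distribution in Python's standard library. This is a small sketch; the function name is hypothetical.

```python
from statistics import NormalDist


def z_multiplier(alpha=0.05, power=0.90, two_sided=True):
    """Return z for the significance level, z for the power, and (sum)^2."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    tail = alpha / 2 if two_sided else alpha
    z_alpha = std_normal.inv_cdf(1 - tail)
    z_beta = std_normal.inv_cdf(power)
    return z_alpha, z_beta, (z_alpha + z_beta) ** 2


z_alpha, z_beta, multiplier = z_multiplier()
print(f"{z_alpha:.2f} {z_beta:.2f} {multiplier:.3f}")  # prints 1.96 1.28 10.507
```

Changing `power` to 0.80 or `two_sided` to `False` gives the multiplier for other design choices.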
Assume 40% reach lipid target on
new statin and 20% on old drug
$$n = \frac{10.507 \left[ 40 \times 60 + 20 \times 80 \right]}{(40 - 20)^2} \approx 105$$
Number in each arm = 105
Total = 210
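The statin calculation can be reproduced with a few lines of Python. This is a sketch of the simple formula from the talk, not of any particular software package; the function name is hypothetical and the result is rounded to the nearest whole number to match the slide (many analysts round up instead).

```python
from statistics import NormalDist


def n_per_arm_proportions(p1, p2, alpha=0.05, power=0.90):
    """Number per arm to compare percentages p1 and p2 with a 2-sided test."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z = NormalDist()
    multiplier = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    variance_term = p1 * (100 - p1) + p2 * (100 - p2)
    return multiplier * variance_term / (p1 - p2) ** 2


n = n_per_arm_proportions(40, 20)  # 40% on new statin, 20% on old
print(round(n), 2 * round(n))  # prints 105 210
```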
Comparison of two proportions
Repeat for different effects
New    Old    Difference    n       Total
40%    20%    20%           105     210
30%    20%    10%           389     778
25%    20%    5%            1460    2920
n.b. Halving effect size increases size by roughly a factor of 4!
Increase in sample size with
decrease in difference
[Figure: n per group (0 to 1800) against Group 2 proportion, π2 (0.10 to 0.34). Two group χ² test of equal proportions (odds ratio = 1) (equal n's); α = 0.050 (2-sided), π1 = 0.400, power = 90%]
Increase in power with sample size
[Figure: power (50% to 100%) against sample size per group (100 to 500). Two group χ² test of equal proportions (odds ratio = 1) (equal n's); α = 0.050 (2-sided), π1 = 0.400, π2 = 0.300]
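The way power rises with sample size can be approximated by solving the two-proportion formula from this talk for the power instead of n. This is a sketch under that simple Normal approximation. Dedicated software may use a slightly different formula, so its values can differ a little, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power to detect percentages p1 vs p2 (2-sided test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    variance_term = p1 * (100 - p1) + p2 * (100 - p2)
    # Rearranged from n = (z_alpha + z_beta)^2 * variance_term / (p1 - p2)^2
    z_beta = abs(p1 - p2) * sqrt(n_per_group / variance_term) - z_alpha
    return z.cdf(z_beta)


# Same scenario as the plot: 40% vs 30%, 5% significance, 2-sided.
for n in (100, 200, 300, 400, 500):
    print(n, f"{100 * approx_power(40, 30, n):.0f}%")
```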
Comparison of two means has a
similar formula
Number in each arm:
$$n = \frac{2 (z_{\alpha} + z_{2\beta})^2 \sigma^2}{(x_1 - x_2)^2}$$
Where x1 and x2 are the means in group
1 and group 2 respectively and σ is
the assumed standard deviation
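A short sketch of the two-means formula follows. The cholesterol means and standard deviation in the example are made-up values chosen only to illustrate the arithmetic, and the function name is hypothetical.

```python
from statistics import NormalDist


def n_per_arm_means(mean1, mean2, sd, alpha=0.05, power=0.90):
    """Number per arm to compare two means with a common standard deviation."""
    if mean1 == mean2:
        raise ValueError("the two means must differ")
    z = NormalDist()
    multiplier = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    return 2 * multiplier * sd ** 2 / (mean1 - mean2) ** 2


# Hypothetical example: mean cholesterol 5.0 vs 5.5 mmol/l, SD 1.0 mmol/l.
n = n_per_arm_means(5.0, 5.5, sd=1.0)
print(f"{n:.1f}")  # prints 84.1
```

For means the scaling is exact: halving the difference between the means multiplies the required number by 4.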
Allowing for loss to follow-up /
non-compliance
The number estimated for
statistical purposes may need to
be inflated if likely that a
proportion will be lost to follow-up
For example if you know approx.
20% will drop-out inflate sample
size by 1/(1-0.2) = 1.25
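The inflation step is a single division. The sketch below applies it to the 105 per arm from the statin example and rounds up to a whole patient; the function name and the choice to round up are assumptions.

```python
import math


def inflate_for_dropout(n_required, dropout_rate):
    """Number to recruit so that n_required remain after expected drop-out."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(n_required / (1 - dropout_rate))


print(inflate_for_dropout(105, 0.20))  # prints 132
```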
Software and other sample size
estimation
The formula depends on the
nature of the outcome and likely
statistical test
Numerous texts with sample size
tables and formulae
Software – nQuery Advisor®
IBM SPSS SamplePower ~$1600
SUMMARY
In planning consider:
design, type of intervention,
outcomes, sample size,
power, and ethics together
at the design stage
SUMMARY
Sample size follows from type
of analysis which follows from
design
Invaluable information is gained
from pilot work and also more
likely to be funded (CSO, MRC,
NIHR, etc.)
SUMMARY
Pilot also gives information on
recruitment rate
You may need to inflate sample
size due to:
Loss to follow-up / drop-out
Low compliance
Remember the checklist
•1) Likely size of effect ✓
•2) Variability of effect ✓
•3) Statistical significance level ✓
•4) Power ✓
•5) 1 or 2-sided tests ✓
SUMMARY
Remember in Scientific Research:
Size Matters