Introduction to sample size and power calculations

How much chance do we have to reject the null hypothesis when the alternative is in fact true? (What's the probability of detecting a real effect?) Can we quantify how much power we have for given sample sizes?
study 1: 263 cases, 1241 controls

[Figure: null distribution with the rejection region shaded in yellow]

Null distribution: difference = 0.
Rejection region: any value ≥ 6.5 (0 + 3.3×1.96).
For a 5% significance level, one-tail area = 2.5% (Zα/2 = 1.96).
Clinically relevant alternative: difference = 10%.
Power = chance of being in the rejection region if the alternative is true = area to the right of this line (in yellow).
study 1: 263 cases, 1241 controls

[Figure: alternative distribution with the rejection region shaded in yellow]

Rejection region: any value ≥ 6.5 (0 + 3.3×1.96).
Power here:
P(Z > (6.5 − 10)/3.3) = P(Z > −1.06) = 85%
Power = chance of being in the rejection region if the alternative is true = area to the right of this line (in yellow).
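As a quick numeric check of the slide's calculation, here is a minimal sketch using only Python's standard library, assuming the slide's numbers (standard error of the difference = 3.3, critical value = 6.5, alternative difference = 10):

```python
from statistics import NormalDist

# Slide's numbers (assumed): s.e.(diff) = 3.3, critical value = 6.5,
# clinically relevant alternative difference = 10
se = 3.3
critical = 6.5
alt_diff = 10.0

# Under the alternative, power = P(Z > (critical - alt_diff)/se)
z = (critical - alt_diff) / se        # about -1.06
power = 1 - NormalDist().cdf(z)       # about 0.86, matching the slide's ~85%
print(round(z, 2), round(power, 2))
```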
study 1: 50 cases, 50 controls

Critical value = 0 + 10×1.96 = 20 (Zα/2 = 1.96; 2.5% tail area).
Power is closer to 15% now.
Study 2: 18 treated, 72 controls, std dev = 2

Critical value = 0 + 0.52×1.96 ≈ 1.
Clinically relevant alternative: difference = 4 points.
Power is nearly 100%!
Study 2: 18 treated, 72 controls, std dev = 10

Critical value = 0 + 2.58×1.96 ≈ 5.
Power is about 40%.
Study 2: 18 treated, 72 controls, effect size = 1.0

Critical value = 0 + 0.52×1.96 ≈ 1.
Clinically relevant alternative: difference = 1 point.
Power is about 50%.
Factors Affecting Power
1. Size of the effect
2. Standard deviation of the characteristic
3. Bigger sample size
4. Significance level desired
1. Bigger difference from the null mean
[Figure: null vs. clinically relevant alternative distributions; average weight from samples of 100]

2. Bigger standard deviation
[Figure: average weight from samples of 100]

3. Bigger sample size
[Figure: average weight from samples of 100]

4. Higher significance level
[Figure: rejection region; average weight from samples of 100]
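The four factors can be checked numerically. Below is a minimal sketch (not from the slides; the function name and example numbers are illustrative) using the normal-approximation power formula for comparing two means with equal groups:

```python
from statistics import NormalDist

def power_two_means(diff, sd, n_per_group, z_alpha2=1.96):
    """Approximate power for detecting `diff` between two equal groups:
    Z_power = diff/s.e.(diff) - Z_alpha/2, power = area left of Z_power."""
    se = sd * (2 / n_per_group) ** 0.5
    return NormalDist().cdf(diff / se - z_alpha2)

base = power_two_means(diff=3, sd=10, n_per_group=100)
# 1. Bigger difference from the null mean -> more power
assert power_two_means(6, 10, 100) > base
# 2. Bigger standard deviation -> less power
assert power_two_means(3, 20, 100) < base
# 3. Bigger sample size -> more power
assert power_two_means(3, 10, 400) > base
# 4. Higher significance level (larger alpha, smaller Z_alpha/2) -> more power
assert power_two_means(3, 10, 100, z_alpha2=1.645) > base
```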
Sample size calculations

Based on these elements, you can write a formal mathematical equation that relates power, sample size, effect size, standard deviation, and significance level…

**WE WILL DERIVE THESE FORMULAS FORMALLY SHORTLY**
Simple formula for difference in means

n = 2σ²(Z_power + Zα/2)² / difference²

where:
n = sample size in each group (assumes equal-sized groups)
σ = standard deviation of the outcome variable
difference = effect size (the difference in means)
Z_power = the desired power (typically .84 for 80% power)
Zα/2 = the desired level of statistical significance (typically 1.96)
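A hedged sketch of this formula in code (the function name is mine; it rounds up to a whole subject per group and computes the Z quantiles exactly rather than using the rounded .84 and 1.96):

```python
from math import ceil
from statistics import NormalDist

def n_per_group_means(sd, difference, power=0.80, alpha=0.05):
    """Per-group n for comparing two means (equal groups):
    n = 2*sd^2*(Z_power + Z_alpha/2)^2 / difference^2."""
    z_power = NormalDist().inv_cdf(power)          # about .84 for 80% power
    z_alpha2 = NormalDist().inv_cdf(1 - alpha / 2) # about 1.96 for alpha = .05
    return ceil(2 * sd**2 * (z_power + z_alpha2)**2 / difference**2)
```

For the IQ example later in the notes (σ = 10, difference = 3), the exact quantiles give 175 per group, slightly above the 174 obtained from the rounded z values.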
Simple formula for difference in proportions

n = 2 p̄(1 − p̄)(Z_power + Zα/2)² / (p1 − p2)²

where:
n = sample size in each group (assumes equal-sized groups)
p̄(1 − p̄) = a measure of variability (similar to standard deviation)
p1 − p2 = effect size (the difference in proportions)
Z_power = the desired power (typically .84 for 80% power)
Zα/2 = the desired level of statistical significance (typically 1.96)
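The same sketch for proportions (again, the function name is mine and the quantiles are exact rather than rounded):

```python
from math import ceil
from statistics import NormalDist

def n_per_group_props(p_bar, difference, power=0.80, alpha=0.05):
    """Per-group n for comparing two proportions (equal groups):
    n = 2*p(1-p)*(Z_power + Z_alpha/2)^2 / (p1 - p2)^2."""
    z_power = NormalDist().inv_cdf(power)
    z_alpha2 = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(2 * p_bar * (1 - p_bar) * (z_power + z_alpha2)**2 / difference**2)
```

For the coffee example later in the notes (p̄ = .5, difference = .10), exact quantiles return 393; the rounded .84 and 1.96 give exactly 392.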
Derivation of sample size formula…

Study 2: 18 treated, 72 controls, effect size = 1.0
Critical value = 0 + .52×1.96 ≈ 1; power close to 50%.

SAMPLE SIZE AND POWER FORMULAS

Critical value = 0 + standard error(diff)×Zα/2

Power = area to the right of Z, where:

Z = (critical value − alternative difference) / standard error(diff)

e.g., here: Z = (1 − 1)/standard error(diff) = 0, so power ≈ 50%.

Substituting the critical value:

Z = (Zα/2 × standard error(diff) − difference) / standard error(diff)
  = Zα/2 − difference/standard error(diff)

Power is the area to the right of Z, OR power is the area to the left of −Z:

−Z = difference/standard error(diff) − Zα/2

Since normal charts give us the area to the left by convention, we need to use −Z to get the correct value. Most textbooks just call this "Zβ"; I'll use the term Z_power to avoid confusion:

Z_power = −Z = difference/standard error(diff) − Zα/2

The area to the left of Z_power = the area to the right of Z.
All-purpose power formula…

Z_power = difference/standard error(difference) − Zα/2
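The all-purpose formula is one line of code. A minimal sketch (the function name is mine), checked against the study-1 numbers from earlier in the notes:

```python
from statistics import NormalDist

def power_from_se(difference, se_diff, z_alpha2=1.96):
    """All-purpose power formula:
    Z_power = difference/s.e.(difference) - Z_alpha/2;
    power = area to the left of Z_power."""
    z_power = difference / se_diff - z_alpha2
    return NormalDist().cdf(z_power)

# Study-1-style check (slide numbers assumed): difference = 10, s.e. = 3.3
print(round(power_from_se(10, 3.3), 2))  # about 0.86, the slide's ~85%
```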
Derivation of a sample size formula…

Sample size is embedded in the standard error:

s.e.(diff) = √(σ²/n1 + σ²/n2)

If the ratio of group 2 to group 1 is r:

s.e.(diff) = √(σ²/n1 + σ²/(rn1))
Algebra…

Z_power = difference/√(σ²/n1 + σ²/(rn1)) − Zα/2

Z_power + Zα/2 = difference/√((r + 1)σ²/(rn1))

(Z_power + Zα/2)² = difference²/((r + 1)σ²/(rn1))

(r + 1)σ²(Z_power + Zα/2)² = rn1·difference²

n1 = (r + 1)σ²(Z_power + Zα/2)²/(r × difference²)

If r = 1 (equal groups), then n1 = 2σ²(Z_power + Zα/2)²/difference²
Sample size formula for difference in means

n1 = (r + 1)/r × σ²(Z_power + Zα/2)²/difference²

where:
n1 = size of smaller group
r = ratio of larger group to smaller group
σ = standard deviation of the characteristic
difference = clinically meaningful difference in means of the outcome
Z_power = corresponds to power (.84 = 80% power)
Zα/2 = corresponds to two-tailed significance level (1.96 for α = .05)
Examples

Example 1: You want to calculate how much power you will have to see a difference of 3.0 IQ points between two groups: 30 male doctors and 30 female doctors. If you expect the standard deviation to be about 10 on an IQ test for both groups, then the standard error for the difference will be about:

√(10²/30 + 10²/30) = 2.58
Power formula…

Z_power = d*/s.e.(diff) − Zα/2 = d*/(σ√(2/n)) − Zα/2 = d*√n/(σ√2) − Zα/2

Here:

Z_power = 3/2.58 − 1.96 = −.79, or equivalently Z_power = 3/(10√(2/30)) − 1.96 = −.79

P(Z≤ -.79) =.21; only 21% power to see a difference of 3 IQ points.
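The same arithmetic in code, as a quick check of Example 1 (using the slide's numbers: d* = 3 IQ points, σ = 10, n = 30 per group):

```python
from statistics import NormalDist

# Example 1: d* = 3 IQ points, sd = 10, n = 30 per group
se = (10**2 / 30 + 10**2 / 30) ** 0.5      # about 2.58
z_power = 3 / se - 1.96                    # about -0.80
power = NormalDist().cdf(z_power)          # about 0.21
print(round(se, 2), round(power, 2))
```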
Example 2: How many people would you need to sample in each group to achieve power of 80% (corresponds to Z_power = .84)?

n = 2σ²(Z_power + Zα/2)²/(d*)² = (2)(100)(.84 + 1.96)²/(3)² ≈ 174

174/group; 348 altogether
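Checking Example 2's arithmetic directly, with the slides' rounded z values (.84 and 1.96):

```python
# Example 2: n = 2*sd^2*(Z_power + Z_alpha/2)^2 / (d*)^2
n = 2 * 10**2 * (0.84 + 1.96) ** 2 / 3**2
print(round(n, 1))   # 174.2 -> 174 per group, 348 altogether
```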
Sample size needed for comparing two proportions:

Example: I am going to run a case-control study to determine if pancreatic cancer is linked to drinking coffee. If I want 80% power to detect a 10% difference in the proportion of coffee drinkers among cases vs. controls (if coffee drinking and pancreatic cancer are linked, we would expect that a higher proportion of cases would be coffee drinkers than controls), how many cases and controls should I sample? About half the population drinks coffee.
Derivation of a sample size formula:

The standard error of the difference of two proportions is:

√(p(1 − p)/n1 + p(1 − p)/n2)

Here, if we assume equal sample sizes and that, under the null hypothesis, the proportion of coffee drinkers is .5 in both cases and controls, then:

s.e.(diff) = √(.5(1 − .5)/n + .5(1 − .5)/n) = √(.5/n)
Z_power = test statistic/s.e.(test statistic) − Zα/2

Z_power = .10/√(.5/n) − 1.96
For 80% power…

.84 = .10/√(.5/n) − 1.96

.84 + 1.96 = .10/√(.5/n)

There is 80% area to the left of a Z-score of .84 on a standard normal curve; therefore, there is 80% area to the right of −.84.
(.84 + 1.96)² = (.10)²n/.5

n = .5(.84 + 1.96)²/(.10)² = 392

It would take 392 cases and 392 controls to have 80% power! Total = 784.
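The coffee-example arithmetic as a one-line check (p = .5, difference = .10, equal groups, rounded z values):

```python
# Coffee case-control example: n = p(1-p)(Z_power + Z_alpha/2)^2 / difference^2 * 2... 
# here s.e. algebra reduces to n = .5*(.84 + 1.96)^2 / .10^2
n = 0.5 * (0.84 + 1.96) ** 2 / 0.10**2
print(round(n))   # 392 cases and 392 controls: 784 total
```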
Question 2:

How many total cases and controls would I have to sample to get 80% power for the same study, if I sample 2 controls for every case?

Ask yourself, what changes here?

Z_power = test statistic/s.e.(test statistic) − Zα/2

s.e.(diff) = √(p(1 − p)/2n + p(1 − p)/n) = √(.25/2n + .25/n) = √(.25/2n + .5/2n) = √(.75/2n)
Different size groups…

.84 = .10/√(.75/2n) − 1.96

.84 + 1.96 = .10/√(.75/2n)

(.84 + 1.96)² = (.10²)(2n)/.75

n = .75(.84 + 1.96)²/((2).10²) = 294

Need: 294 cases and 2×294 = 588 controls. 882 total.
Note: you get the best power for the lowest sample size if you keep both groups equal (882 > 784).
You would only want to make groups unequal if there was an obvious difference in the cost or ease of
collecting data on one group. E.g., cases of pancreatic cancer are rare and take time to find.
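The equal-vs-unequal comparison can be reproduced with the general (ratio-r) formula derived next. A minimal sketch, with a function name of my own choosing and the slides' rounded z values:

```python
def n_smaller_group_props(p, difference, r, z_power=0.84, z_alpha2=1.96):
    """General formula: n = (r+1)/r * p(1-p)(Z_power + Z_alpha/2)^2 / (p1-p2)^2,
    where n is the smaller group and r is the larger:smaller ratio."""
    return (r + 1) / r * p * (1 - p) * (z_power + z_alpha2) ** 2 / difference ** 2

equal = n_smaller_group_props(0.5, 0.10, r=1)       # 392 per group, 784 total
two_to_one = n_smaller_group_props(0.5, 0.10, r=2)  # 294 cases + 588 controls = 882
print(round(equal), round(two_to_one))
```

The total sample is smaller with equal groups (784 vs. 882), matching the note above.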
General sample size formula

s.e.(diff) = √(p(1 − p)/rn + p(1 − p)/n) = √(p(1 − p)/rn + rp(1 − p)/rn) = √((r + 1)p(1 − p)/rn)

n = (r + 1)/r × p(1 − p)(Z_power + Zα/2)²/(p1 − p2)²
General sample size needs when outcome is binary:

n = (r + 1)/r × p(1 − p)(Z_power + Zα/2)²/(p1 − p2)²

where:
n = size of smaller group
r = ratio of larger group to smaller group
p1 − p2 = clinically meaningful difference in proportions of the outcome
Z_power = corresponds to power (.84 = 80% power)
Zα/2 = corresponds to two-tailed significance level (1.96 for α = .05)
Compare with when outcome is continuous:

n1 = (r + 1)/r × σ²(Z_power + Zα/2)²/difference²

where:
n1 = size of smaller group
r = ratio of larger group to smaller group
σ = standard deviation of the characteristic
difference = clinically meaningful difference in means of the outcome
Z_power = corresponds to power (.84 = 80% power)
Zα/2 = corresponds to two-tailed significance level (1.96 for α = .05)
Question

How many subjects would we need to sample to have 80% power to detect an average increase in MCAT biology score of 1 point, if the average change without instruction (just due to chance) is plus or minus 3 points (= standard deviation of change)?

Standard error here = σ_change/√n = 3/√n
Z_power = test statistic/s.e.(test statistic) − Zα/2

Z_power = D√n/σ_D − Zα/2, where D = change from test 1 to test 2 (the difference).

(Z_power + Zα/2)² = nD²/σ_D²

n = σ_D²(Z_power + Zα/2)²/D²

Therefore, need: (9)(1.96 + .84)²/1² = 70.6, or about 71 people total.
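The MCAT arithmetic as a quick check (σ_D = 3, D = 1, rounded z values):

```python
# MCAT example: paired-data formula n = sd_D^2 * (Z_power + Z_alpha/2)^2 / D^2
n = 3**2 * (0.84 + 1.96) ** 2 / 1**2
print(round(n, 1))   # 70.6 -> about 70-71 subjects, depending on rounding
```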
Sample size for paired data:

n = σ_d²(Z_power + Zα/2)²/difference²

where:
n = sample size
σ_d = standard deviation of the within-pair difference
difference = clinically meaningful difference
Z_power = corresponds to power (.84 = 80% power)
Zα/2 = corresponds to two-tailed significance level (1.96 for α = .05)
Paired data difference in proportion: sample size:

n = p(1 − p)(Z_power + Zα/2)²/(p1 − p2)²

where:
n = sample size for 1 group
p1 − p2 = clinically meaningful difference in dependent proportions
Z_power = corresponds to power (.84 = 80% power)
Zα/2 = corresponds to two-tailed significance level (1.96 for α = .05)