Sample Size Determination

Module 28
Sample Size Determination
This module explores the process of estimating the
sample size required for detecting differences of a
specified magnitude for three common circumstances.
Reviewed 19 July 05/ Module 28
The General Situation
An important issue in planning a new study is the
determination of an appropriate sample size required
to meet certain conditions. For example, for a study
dealing with blood cholesterol levels, these conditions
are typically expressed in terms such as “How large a
sample do I need to be able to reject the null
hypothesis that two population means are equal if the
difference between them is d = 10 mg/dl?”
The General Approach
We focus on the sample size required to test a specific
hypothesis. In general, there exists a formula for
calculating a sample size for the specific test statistic
appropriate to test a specified hypothesis. Typically, these
formulae require that the user specify the α-level and
Power = (1 – β) desired, as well as the difference to be
detected and the variability of the measure in question.
Importantly, it is usually wise not to calculate a single
number for the sample size. Rather, calculate a range of
values by varying the assumptions so that you can get a
sense of their impact on the resulting projected sample size.
Then you can pick a more suitable sample size from this
range.
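The advice above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original module: it assumes the normal-approximation formula for a one-sample, two-tailed test that this module presents later, and the function name `n_one_sample` and the scenario values are hypothetical choices.

```python
from itertools import product
from math import ceil
from statistics import NormalDist


def n_one_sample(sigma, d, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a two-tailed one-sample test."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil((sigma * (z_alpha + z_beta) / d) ** 2)


# Hypothetical planning scenarios: vary sigma, the difference d, and power.
scenarios = {
    (sigma, d, power): n_one_sample(sigma, d, power=power)
    for sigma, d, power in product([25, 30, 35], [5, 10], [0.80, 0.90])
}

for (sigma, d, power), n in sorted(scenarios.items()):
    print(f"sigma={sigma:>2}  d={d:>2}  power={power:.2f}  n={n}")

n_low, n_high = min(scenarios.values()), max(scenarios.values())
print(f"Projected sample size ranges from {n_low} to {n_high}")
```

Seeing the whole range at once makes it easier to judge which assumption (σ, d, or power) drives the required sample size the most.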
Three Common Situations
In this module, we examine the process of estimating
sample size for three common circumstances:
1. One-sample t-test and paired t-test,
2. Two-sample t-test, and
3. Comparison of P1 versus P2 with a z-test.
The tools required for these three situations are broadly
applicable and cover many of the circumstances that are
typically encountered. There are sophisticated software
packages that cover much more than these three and
most professional biostatisticians have them readily
available.
1. One-sample t-test and Paired t-test
For testing the hypothesis:
H0: μ = k vs. H1: μ ≠ k
with a two-tailed test, the formula is:
\[
n = \left[ \frac{\sigma \, (z_{1-\alpha/2} + z_{1-\beta})}{d} \right]^2
\]
Note: this formula is used even though the test statistic
could be a t-test.
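A brief sketch of where this formula comes from may help. It is not part of the original slides, and it assumes that σ is known, that the sample mean is approximately normal, and that the negligible chance of rejecting in the opposite tail can be ignored. Requiring power 1 − β when the true mean differs from k by d gives:

\[
\frac{d}{\sigma / \sqrt{n}} = z_{1-\alpha/2} + z_{1-\beta}
\quad\Longrightarrow\quad
\sqrt{n} = \frac{\sigma \, (z_{1-\alpha/2} + z_{1-\beta})}{d}
\quad\Longrightarrow\quad
n = \left[ \frac{\sigma \, (z_{1-\alpha/2} + z_{1-\beta})}{d} \right]^2
\]

Because the derivation works with a known σ, normal quantiles (z-values) appear in the result even when the eventual analysis uses a t-test.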
One-Sample Example
We want to determine the sample size for a study of blood
cholesterol levels in a population. We know that
typically σ is about 30 mg/dl for these populations.
The following tables show sample sizes for different
levels of the factors included in the equation
for a one-sample t-test of the difference between a
specified population mean and the true mean.
One-Sample Example (contd.)
α = 0.05, σ = 25, d = 5.0, Power = 0.80
\[
n = \left[ \frac{\sigma \, (z_{1-\alpha/2} + z_{1-\beta})}{d} \right]^2
  = \left[ \frac{(1.96 + 0.842) \times 25}{5} \right]^2
  = (14.01)^2 = 196.28
\]
n ≈ 197
Sample Size for One-Sample t-test
Blood Cholesterol Levels: α = 0.05, σ = 25
Power 1−β        0.50     0.80     0.85     0.90     0.95
z(1−β)              0    0.842    1.036    1.282    1.645
d = 0.5         9,604   19,628   22,440   26,276   32,490
d = 1.0         2,401    4,907    5,610    6,569    8,123
d = 3.0           267      545      623      730      903
d = 5.0            96      196      224      263      325
d = 10.0           24       49       56       66       81
d = 20.0            6       12       14       16       20
d = 30.0            3        5        6        7        9
Blood Cholesterol Levels: α = 0.05, σ = 30
Power 1−β        0.50     0.80     0.85     0.90     0.95
z(1−β)              0    0.842    1.036    1.282    1.645
d = 0.5        13,830   28,264   32,314   37,838   46,786
d = 1.0         3,457    7,066    8,078    9,460   11,696
d = 3.0           384      785      898    1,051    1,300
d = 5.0           138      283      323      378      468
d = 10.0           35       71       81       95      117
d = 20.0            9       18       20       24       29
d = 30.0            4        8        9       11       13
Blood Cholesterol Levels: α = 0.05, σ = 35
Power 1−β        0.50     0.80     0.85     0.90     0.95
z(1−β)              0    0.842    1.036    1.282    1.645
d = 0.5        18,824   38,471   43,982   51,502   63,681
d = 1.0         4,706    9,618   10,996   12,875   15,920
d = 3.0           523    1,069    1,222    1,431    1,769
d = 5.0           188      385      440      515      637
d = 10.0           47       96      110      129      159
d = 20.0           12       24       27       32       40
d = 30.0            5       11       12       14       18
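The one-sample tables above can be regenerated with a short script. The sketch below is an illustration rather than the module's own tooling: it assumes the three-decimal z-values shown in the table headers and assumes that entries are rounded to the nearest integer, which appears to match the tabulated values. The name `one_sample_table` is hypothetical.

```python
# z-values as tabulated on the slides (rounded to three decimals).
Z_ALPHA = 1.96  # z_{1-alpha/2} for alpha = 0.05, two-tailed
Z_BETA = {0.50: 0.0, 0.80: 0.842, 0.85: 1.036, 0.90: 1.282, 0.95: 1.645}
D_VALUES = [0.5, 1.0, 3.0, 5.0, 10.0, 20.0, 30.0]


def one_sample_table(sigma):
    """Return {d: [n for each power]} using n = (sigma * (z_a + z_b) / d)^2."""
    return {
        d: [round((sigma * (Z_ALPHA + zb) / d) ** 2) for zb in Z_BETA.values()]
        for d in D_VALUES
    }


for sigma in (25, 30, 35):
    print(f"sigma = {sigma}")
    print("     d " + "".join(f"{p:>9.2f}" for p in Z_BETA))
    for d, row in one_sample_table(sigma).items():
        print(f"{d:>6.1f} " + "".join(f"{n:>9,}" for n in row))
```

Note that rounding to the nearest integer gives 196 for σ = 25, d = 5, Power = 0.80, whereas the worked example rounds up to 197. Rounding up is the more conservative choice when fixing a final sample size.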
2. Two-sample t-test
For the hypothesis:
H0: μ1 = μ2 vs. H1: μ1 ≠ μ2
For a two-tailed t-test, the formula is:
\[
N = n_1 + n_2 = \frac{4 \sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2}{d^2},
\qquad d = \mu_1 - \mu_2
\]
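The factor of 4 in this formula can be traced with one extra step. The following sketch is not from the original slides and assumes two equal-sized groups with a common standard deviation σ, so that the difference of the two sample means has variance 2σ²/n:

\[
n_1 = n_2 = n = \frac{2 \sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2}{d^2}
\quad\Longrightarrow\quad
N = n_1 + n_2 = 2n = \frac{4 \sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2}{d^2}
\]

Under these assumptions, a formula with a leading 2 gives the size of one group, and a formula with a leading 4 gives the total for both groups.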
Sample Size for a Two-Tailed, Two-Sample t-test
H0: μ1 = μ2 vs. H1: μ1 ≠ μ2
How large a sample would be needed for comparing two
approaches to cholesterol lowering using α = 0.05, to
detect a difference of d = 20 mg/dl or more with
Power = 1 − β = 0.90?
The formula is:
\[
N = n_1 + n_2 = \frac{4 \sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2}{d^2},
\qquad d = \mu_1 - \mu_2
\]
Note: Textbooks do not always clearly indicate whether
the formula they provide is for one group only or for
both groups combined.
When σ = 30 mg/dl, β = 0.10, α = 0.05: $z_{1-\alpha/2}$ = 1.96,
Power = 1 − β, $z_{1-\beta}$ = 1.282, d = 20 mg/dl
\[
N = n_1 + n_2 = \frac{4 (30)^2 (1.96 + 1.282)^2}{(20)^2}
  = \frac{4 \times 900 \times (3.242)^2}{400}
  = \frac{37{,}838.03}{400}
\]
N ≈ 94.6
Hence about 50 for each group
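The calculation above can be checked with a short Python sketch. It is an illustration only: the function name `two_sample_total_n` is hypothetical, equal group sizes are assumed, and unrounded normal quantiles from the standard library are used in place of the three-decimal values on the slide.

```python
from math import ceil
from statistics import NormalDist


def two_sample_total_n(sigma, d, alpha=0.05, power=0.90):
    """Unrounded total N = n1 + n2 for a two-tailed two-sample comparison."""
    z = NormalDist()  # standard normal distribution
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return 4 * sigma**2 * z_sum**2 / d**2


total = two_sample_total_n(sigma=30, d=20)  # cholesterol example
per_group = ceil(total / 2)                 # round up within each group
print(f"total N = {total:.1f}, per group = {per_group}, N used = {2 * per_group}")
```

Returning the unrounded total and rounding up per group keeps the distinction between total N and per-group n explicit, which is the point of the note about textbook formulas.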
Sample Sizes: σ = 25 mg/dl, α = 0.05
Power 1−β        0.50     0.80     0.85     0.90     0.95
z(1−β)              0    0.842    1.036    1.282    1.645
d = 0.5        38,416   78,512   89,760  105,106  129,960
d = 1           9,604   19,628   22,440   26,276   32,490
d = 3           1,067    2,181    2,493    2,920    3,610
d = 5             384      785      898    1,051    1,300
d = 10             96      196      224      263      325
d = 20             24       49       56       66       81
d = 30             11       22       25       29       36
Sample Sizes: σ = 30 mg/dl, α = 0.05
Power 1−β        0.50     0.80     0.85     0.90     0.95
z(1−β)              0    0.842    1.036    1.282    1.645
d = 0.5        55,319  113,057  129,255  151,352  187,143
d = 1          13,830   28,264   32,314   37,838   46,786
d = 3           1,537    3,140    3,590    4,204    5,198
d = 5             553    1,131    1,293    1,514    1,871
d = 10            138      283      323      378      468
d = 20             35       71       81       95      117
d = 30             15       31       36       42       52
Sample Sizes: σ = 35 mg/dl, α = 0.05
Power 1−β        0.50     0.80     0.85     0.90     0.95
z(1−β)              0    0.842    1.036    1.282    1.645
d = 0.5        75,295  153,884  175,930  206,007  254,722
d = 1          18,824   38,471   43,982   51,502   63,681
d = 3           2,092    4,275    4,887    5,722    7,076
d = 5             753    1,539    1,759    2,060    2,547
d = 10            188      385      440      515      637
d = 20             47       96      110      129      159
d = 30             21       43       49       57       71
3. Two-sample proportions
H0: P1 = P2 vs. H1: P1 ≠ P2
\[
N = n_1 + n_2 =
\frac{4 (z_{1-\alpha/2} + z_{1-\beta})^2
      \left[ \left( \frac{P_1 + P_2}{2} \right)
             \left( 1 - \frac{P_1 + P_2}{2} \right) \right]}
     {d^2},
\qquad d = P_1 - P_2
\]
Example: d = P1 − P2 = 0.7 − 0.5 = 0.2
When β = 0.10, α = 0.05: $z_{1-\alpha/2}$ = 1.96,
Power = 1 − β, $z_{1-\beta}$ = 1.282
(P1+P2)/2 = (0.7+0.5)/2 = 0.6
\[
N = n_1 + n_2 = \frac{4 (1.96 + 1.282)^2 (0.6)(1 - 0.6)}{(0.2)^2}
  = \frac{4 (3.242)^2 (0.6)(0.4)}{(0.2)^2}
  = \frac{10.09}{0.04} = 252.25
\]
N = 252.25
Consider using N = 260, or 130 per group
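The same arithmetic can be written as a small Python function. This is a hedged sketch, not the module's own code: the name `two_proportion_total_n` is hypothetical, equal group sizes are assumed, and unrounded normal quantiles are used, so the result differs slightly from the 252.25 obtained with three-decimal z-values.

```python
from math import ceil
from statistics import NormalDist


def two_proportion_total_n(p1, p2, alpha=0.05, power=0.90):
    """Unrounded total N = n1 + n2 for a two-tailed z-test of P1 vs. P2."""
    z = NormalDist()  # standard normal distribution
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    p_bar = (p1 + p2) / 2  # average of the two proportions
    d = p1 - p2            # difference to be detected
    return 4 * z_sum**2 * p_bar * (1 - p_bar) / d**2


total = two_proportion_total_n(0.7, 0.5)
per_group = ceil(total / 2)  # round up within each group
print(f"total N = {total:.1f}, at least {per_group} per group")
```

The computed minimum is a little under the N = 260 (130 per group) suggested above, so the suggested figure includes a small safety margin.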
Sample size for testing P1- P2 with α = 0.05
Power 1−β        0.50     0.80     0.85     0.90     0.95
z(1−β)              0    0.842    1.036    1.282    1.645
P1    P2
0.9   0.8         196      400      458      536      663
0.8   0.7         288      589      673      788      975
0.7   0.6         350      714      817      956    1,183
0.6   0.5         380      777      889    1,041    1,287
0.5   0.4         380      777      889    1,041    1,287
0.4   0.3         350      714      817      956    1,183
0.3   0.2         288      589      673      788      975
0.2   0.1         196      400      458      536      663
0.1   0.0          73      149      171      200      247
Power 1−β        0.50     0.80     0.85     0.90     0.95
z(1−β)              0    0.842    1.036    1.282    1.645
P1    P2
0.9   0.7          61      126      144      168      208
0.8   0.6          81      165      188      221      273
0.7   0.5          92      188      215      252      312
0.6   0.4          96      196      224      263      325
0.5   0.3          92      188      215      252      312
0.4   0.2          81      165      188      221      273
0.3   0.1          61      126      144      168      208
0.2   0.0          35       71       81       95      117

P1    P2
0.9   0.6          32       65       75       88      108
0.8   0.5          39       79       91      106      131
0.7   0.4          42       86       99      116      143
0.6   0.3          42       86       99      116      143
0.5   0.2          39       79       91      106      131
0.4   0.1          32       65       75       88      108
0.3   0.0          22       44       51       60       74
Power 1−β        0.50     0.80     0.85     0.90     0.95
z(1−β)              0    0.842    1.036    1.282    1.645
P1    P2
0.9   0.5          20       41       47       55       68
0.8   0.4          23       47       54       63       78
0.7   0.3          24       49       56       66       81
0.6   0.2          23       47       54       63       78
0.5   0.1          20       41       47       55       68
0.4   0.0          15       31       36       42       52

P1    P2
0.9   0.4          14       29       33       38       47
0.8   0.3          15       31       36       42       51
0.7   0.2          15       31       36       42       51
0.6   0.1          14       29       33       38       47
0.5   0.0          12       24       27       32       39

P1    P2
0.9   0.3          10       21       24       28       35
0.8   0.2          11       22       25       29       36
0.7   0.1          10       21       24       28       35
0.6   0.0           9       18       21       25       30
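The tables for P1 versus P2 can be regenerated for any difference d with the sketch below. It is illustrative only: it assumes the three-decimal z-values from the table headers and rounding to the nearest integer, and the name `proportion_rows` is hypothetical. Proportions are handled in tenths to keep the (P1, P2) pairs exact.

```python
# z-values as tabulated on the slides (rounded to three decimals).
Z_ALPHA = 1.96  # z_{1-alpha/2} for alpha = 0.05, two-tailed
Z_BETA = {0.50: 0.0, 0.80: 0.842, 0.85: 1.036, 0.90: 1.282, 0.95: 1.645}


def proportion_rows(d_tenths):
    """Rows (P1, P2, [N per power]) for a difference of d_tenths / 10."""
    rows = []
    d = d_tenths / 10
    for p1_tenths in range(9, d_tenths - 1, -1):  # P1 from 0.9 down to d
        p1, p2 = p1_tenths / 10, (p1_tenths - d_tenths) / 10
        p_bar = (p1 + p2) / 2
        ns = [round(4 * (Z_ALPHA + zb) ** 2 * p_bar * (1 - p_bar) / d**2)
              for zb in Z_BETA.values()]
        rows.append((p1, p2, ns))
    return rows


for d_tenths in range(1, 7):
    print(f"d = {d_tenths / 10:.1f}")
    for p1, p2, ns in proportion_rows(d_tenths):
        print(f"  P1={p1:.1f} P2={p2:.1f} " + " ".join(f"{n:>6,}" for n in ns))
```

For a fixed d, the required N is largest when the average of P1 and P2 is closest to 0.5, because the term p̄(1 − p̄) peaks there. This explains the symmetry visible in each table.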